Python API¶
Note
This section is currently incomplete. We’re working to fill out the details of the Python API as soon as possible.
Configuration¶
The immunedb.common.config module provides methods to initialize a connection to a new or existing database.
Most programs using ImmuneDB will start with code similar to:
import immunedb.common.config as config
parser = config.get_base_arg_parser('Some description of the program')
# ... add any additional arguments to the parser ...
args = parser.parse_args()
session = config.init_db(args.db_config)
When this script is run, it requires at least one argument: the path to a database configuration (as generated with immunedb_admin). From that configuration, a Session object is created and connected to the associated database.
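The "add any additional arguments" step can be sketched with the standard library alone. This assumes that the object returned by config.get_base_arg_parser behaves like an argparse.ArgumentParser with a positional db_config argument. The stand-in parser, the --min-copies option, and both helper names are hypothetical and serve only as an illustration.

```python
import argparse


def make_stand_in_parser(description):
    """Hypothetical stand-in for config.get_base_arg_parser (an assumption)."""
    parser = argparse.ArgumentParser(description=description)
    parser.add_argument('db_config', help='path to the database configuration')
    return parser


def add_program_args(parser):
    """Add a program-specific option (hypothetical) to an existing parser."""
    parser.add_argument('--min-copies', type=int, default=1,
                        help='ignore sequences with fewer copies than this')
    return parser


parser = add_program_args(make_stand_in_parser('Some description of the program'))
# Parse an explicit argument list so the sketch runs without a command line.
args = parser.parse_args(['path/to/config', '--min-copies', '5'])
print(args.db_config, args.min_copies)
```

In a real program, parser.parse_args() would be called without arguments so that the values come from the command line.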
The path to a configuration can also be passed directly:
import immunedb.common.config as config
session = config.init_db('path/to/config')
Alternatively, a dictionary with the same information can be passed:
import immunedb.common.config as config
session = config.init_db({
    'host': '...',
    'database': '...',
    'username': '...',
    'password': '...',
})
The return value is a Session object, which can be used to interact with the database.
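When the connection details are not stored in a configuration file, the dictionary form can be assembled in code. The sketch below is one possible approach, not part of ImmuneDB: it reads the four keys from environment-style variables. The variable names (such as IMMUNEDB_HOST) and the settings_from_env helper are hypothetical.

```python
import os

REQUIRED_KEYS = ('host', 'database', 'username', 'password')


def settings_from_env(env=os.environ, prefix='IMMUNEDB_'):
    """Build the connection dictionary from environment-style variables.

    The variable names (e.g. IMMUNEDB_HOST) are hypothetical.
    """
    missing = [k for k in REQUIRED_KEYS if prefix + k.upper() not in env]
    if missing:
        raise KeyError('missing settings: ' + ', '.join(missing))
    return {k: env[prefix + k.upper()] for k in REQUIRED_KEYS}


# Example with an explicit mapping instead of the real environment.
example_env = {
    'IMMUNEDB_HOST': 'localhost',
    'IMMUNEDB_DATABASE': 'example_db',
    'IMMUNEDB_USERNAME': 'example_user',
    'IMMUNEDB_PASSWORD': 'example_password',
}
settings = settings_from_env(example_env)
# session = config.init_db(settings)  # passed as in the dictionary form above
```

Failing early on a missing key keeps the error close to its cause, rather than surfacing later as a connection failure.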
Using the Session¶
ImmuneDB is built using SQLAlchemy as a MySQL abstraction layer. Simply put, the database is queried using Python constructs instead of hand-written SQL. Full documentation on using the session can be found in SQLAlchemy’s documentation.
Once a session is created, the models listed below can be queried.
Example Queries¶
Below are some example queries that demonstrate how to use the ImmuneDB API.
Clone CDR3s¶
Get all clones with a given V-gene and print their CDR3 AA sequences.
Input
import immunedb.common.config as config
from immunedb.common.models import Clone
session = config.init_db(...)
for clone in session.query(Clone).filter(Clone.v_gene == 'IGHV3-30'):
    print('clone {} has AAs {}'.format(clone.id, clone.cdr3_aa))
Output
clone 37884 has AAs CARGYSSSYFDYW
clone 37886 has AAs CARSRTSLSIYGVVPTGDFDSW
clone 37885 has AAs CARNGLNTVSGVVISPKYWLDPW
clone 37887 has AAs CARDLFRGVDFYYYGMDVW
Clone Frequency¶
Determine how many sequences appear in each sample belonging to clone 1234.
Note that the CloneStats model has one entry for each clone/sample combination, plus one entry where the sample_id field is null, which represents the overall clone.
Input
import immunedb.common.config as config
from immunedb.common.models import CloneStats
session = config.init_db(...)
for stat in session.query(CloneStats).filter(
        CloneStats.clone_id == 1234).order_by(CloneStats.sample_id):
    print('clone {} has {} unique sequences and {} copies {}'.format(
        stat.clone_id,
        stat.unique_cnt,
        stat.total_cnt,
        ('in sample ' + stat.sample.name) if stat.sample else 'overall'))
Output
clone 1234 has 53 unique sequences and 1331 copies overall
clone 1234 has 27 unique sequences and 379 copies in sample sample1
clone 1234 has 27 unique sequences and 339 copies in sample sample3
clone 1234 has 24 unique sequences and 311 copies in sample sample4
clone 1234 has 28 unique sequences and 302 copies in sample sample10
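To make the relationship between the overall row and the per-sample rows concrete, the following sketch recomputes each sample's share of the clone's copies from the output above. It uses plain tuples rather than CloneStats objects so that it runs without a database. The tuple layout and the copy_fractions helper are assumptions made for illustration.

```python
# (sample name or None for the overall row, unique_cnt, total_cnt),
# copied from the output above.
rows = [
    (None, 53, 1331),
    ('sample1', 27, 379),
    ('sample3', 27, 339),
    ('sample4', 24, 311),
    ('sample10', 28, 302),
]


def copy_fractions(rows):
    """Return {sample: fraction of the clone's total copies in that sample}."""
    overall = next(total for sample, _, total in rows if sample is None)
    return {
        sample: total / overall
        for sample, _, total in rows
        if sample is not None
    }


fractions = copy_fractions(rows)
for sample, fraction in fractions.items():
    print('{}: {:.1%} of copies'.format(sample, fraction))
```

In this output the per-sample copies add up to the overall total (379 + 339 + 311 + 302 = 1331). The per-sample unique counts sum to 106, more than the overall 53, which is what one would expect if the same sequence can appear in more than one sample.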
V-gene Usage¶
This is a more complex query. It gathers the V-gene usage of all sequences which (a) belong to the subject with ID 1, (b) are associated with a clone, and (c) are unique to the subject, then prints the V-genes from least to most frequent.
Input
import immunedb.common.config as config
from immunedb.common.models import Sequence, SequenceCollapse
from sqlalchemy import func  # required for func.count below
session = config.init_db(...)
# Count sequences per V-gene, ordered from least to most frequent.
subject_unique_seqs = session.query(
    func.count(Sequence.seq_id).label('count'),
    Sequence.v_gene
).join(
    SequenceCollapse
).filter(
    Sequence.subject_id == 1,
    ~Sequence.clone_id.is_(None),
    SequenceCollapse.copy_number_in_subject > 0
).group_by(
    Sequence.v_gene
).order_by(
    'count'
)
for seq in subject_unique_seqs:
    print(seq.v_gene, seq.count)
Output
# ... output truncated ...
IGHV4-34 1128
IGHV1-2 1160
IGHV3-48 1169
IGHV4-39 1310
IGHV3-7 1345
IGHV3-30|3-30-5|3-33 1607
IGHV3-23|3-23D 1626
IGHV3-21 1878
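For readers less familiar with SQL aggregation, the group_by and order_by steps of the query correspond roughly to the pure-Python sketch below. The input rows and the v_gene_usage helper are hypothetical; the real query performs this work in the database.

```python
from collections import Counter

# Hypothetical (seq_id, v_gene) pairs standing in for the filtered rows.
filtered_rows = [
    ('seq1', 'IGHV3-21'),
    ('seq2', 'IGHV3-7'),
    ('seq3', 'IGHV3-21'),
    ('seq4', 'IGHV4-34'),
    ('seq5', 'IGHV3-21'),
    ('seq6', 'IGHV3-7'),
]


def v_gene_usage(rows):
    """Count rows per V-gene; return (v_gene, count) pairs, least frequent first."""
    counts = Counter(v_gene for _, v_gene in rows)
    return sorted(counts.items(), key=lambda item: item[1])


usage = v_gene_usage(filtered_rows)
for v_gene, count in usage:
    print(v_gene, count)
```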