Usage¶
This package provides a command line interface.
Retrieve a reference¶
To retrieve a reference mention its id with the --id
option.
$ mutalyzer_retriever --id "NG_012337.1"
##sequence-region NG_012337.1 1 15948
...
Retrieve a reference model¶
To retrieve the reference model add --parse
(-p
). Optionally, choose the
preferred indentation with --indent
.
$ mutalyzer_retriever --id "NG_012337.1" -p --indent 2
{
"annotations": {
"id": "NG_012337.1",
"type": "record",
...
Output directory and split the model¶
Specify an output directory with --output
.
$ mutalyzer_retriever --id "NG_012337.1" -p --indent 2 --output .
$ less NG_012337.1
{
"annotations": {
"id": "NG_012337.1",
"type": "record",
...
Split the model between annotations and sequence with --split
.
$ mutalyzer_retriever --id "NG_012337.1" -p --indent 2 --output . --split
$ less NG_012337.1.annotations
{
"id": "NG_012337.1",
"type": "record",
...
$ less NG_012337.1.sequence
GGGCTTGGTTCTACCATATCTCTACTTTGTGTTTATGTTTGTGTATGCATGTACTCCAAAGTCTT
...
Choose the retrieval source¶
By default all the sources are accessed (in the following order: LRG, NCBI,
Esembl) and the reference is retrieved from the first one found. However,
a specific source can be specified with -source
(-s
).
$ mutalyzer_retriever --id "NG_012337.1" -s ncbi
...
Choose the retrieval file type¶
For NCBI and Ensembl the default retrieved reference is gff3
. However,
a fasta
file can be also retrieved with --type
(-t
).
$ mutalyzer_retriever --id "NG_012337.1" -t fasta
>NG_012337.1 Homo sapiens succinate dehydrogenase complex, ...
GGGCTTGGTTCTACCATATCTCTACTTTGTGTTTATGTTTGTGTATGCATGTACTCCAA...
...
If --parse
(p
) is added to the previous command, the sequence model
is obtained (no annotations are included).
$ mutalyzer_retriever --id "NG_012337.1" -t fasta -p
{"sequence": {"seq": "GGGCTTGGTTCTACCATATCTCTACTTT
For the moment, this is not the case when --parse
(p
) is used in
combination with -t gff3
.
Raw genbank files can be retrieved from NCBI with -t genbank
, but they
cannot be parsed to obtain a model.
Parse local files¶
To obtain a model from local files (gff3
with fasta
and lrg
) use
the from_file
command.
$ mutalyzer_retriever from_file -h
usage: mutalyzer_retriever from_file [-h]
[--paths PATHS [PATHS ...]]
[--is_lrg]
optional arguments:
-h, --help show this help message and exit
--paths PATHS [PATHS ...]
both gff3 and fasta paths or just an lrg
--is_lrg there is one file which is lrg
An example with gff3
and fasta
is as follows.
$ mutalyzer_retriever from_file --paths NG_012337.1.gff3 NG_012337.1.fasta
{"annotations": {"id": "NG_012337.1", "type": "record", "location": ...
...
For an lrg
file the --is_lrg
flag needs to be added.
$ mutalyzer_retriever from_file --paths LRG_417 --is_lrg
{"annotations": {"type": "record", "id": "LRG_417", "location": ...
Retrieve the NCBI reference models from FTP¶
Starting from scratch, i.e., connect to the FTP location to retrieve the assembly versions and to download the annotations files.
$ mutalyzer_retriever ncbi_assemblies
- local output directory set up to ./models
done
...
Restrict only to specific reference ids and assuming that the input files are
already present in the downloads/
directory.
$ mutalyzer_retriever ncbi_assemblies --input downloads/ --ref_id_start NC_000023 --downloaded
- local output directory set up to ./models
done
- processing 109 from 20180213, (GRCh38.p12, GCF_000001405.38)
- NC_000023.11
...
- processing 105.20220307 from 20220307, (GRCh37.p13, GCF_000001405.25)
- NC_000023.10
- writing ./models/NC_000023.10