Usage¶

This package provides a command line interface.

Retrieve a reference¶

To retrieve a reference mention its id with the --id option.

$ mutalyzer_retriever --id "NG_012337.1"
##sequence-region NG_012337.1 1 15948
...

Retrieve a reference model¶

To retrieve the reference model add --parse (-p). Optionally, choose the preferred indentation with --indent.

$ mutalyzer_retriever --id "NG_012337.1" -p --indent 2
{
  "annotations": {
    "id": "NG_012337.1",
    "type": "record",
...

Output directory and split the model¶

Specify an output directory with --output.

$ mutalyzer_retriever --id "NG_012337.1" -p --indent 2 --output .
$ less NG_012337.1
{
  "annotations": {
    "id": "NG_012337.1",
    "type": "record",
...

Split the model between annotations and sequence with --split.

$ mutalyzer_retriever --id "NG_012337.1" -p --indent 2 --output . --split
$ less NG_012337.1.annotations
{
  "id": "NG_012337.1",
  "type": "record",
...
$ less NG_012337.1.sequence
GGGCTTGGTTCTACCATATCTCTACTTTGTGTTTATGTTTGTGTATGCATGTACTCCAAAGTCTT
...

Choose the retrieval source¶

By default all the sources are accessed (in the following order: LRG, NCBI, Esembl) and the reference is retrieved from the first one found. However, a specific source can be specified with -source (-s).

$ mutalyzer_retriever --id "NG_012337.1" -s ncbi
...

Choose the retrieval file type¶

For NCBI and Ensembl the default retrieved reference is gff3. However, a fasta file can be also retrieved with --type (-t).

$ mutalyzer_retriever --id "NG_012337.1" -t fasta
>NG_012337.1 Homo sapiens succinate dehydrogenase complex, ...
GGGCTTGGTTCTACCATATCTCTACTTTGTGTTTATGTTTGTGTATGCATGTACTCCAA...
...

If --parse (p) is added to the previous command, the sequence model is obtained (no annotations are included).

$ mutalyzer_retriever --id "NG_012337.1" -t fasta -p
{"sequence": {"seq": "GGGCTTGGTTCTACCATATCTCTACTTT

For the moment, this is not the case when --parse (p) is used in combination with -t gff3.

Raw genbank files can be retrieved from NCBI with -t genbank, but they cannot be parsed to obtain a model.

Parse local files¶

To obtain a model from local files (gff3 with fasta and lrg) use the from_file command.

$ mutalyzer_retriever from_file -h
usage: mutalyzer_retriever from_file [-h]
                                     [--paths PATHS [PATHS ...]]
                                     [--is_lrg]

optional arguments:
  -h, --help            show this help message and exit
  --paths PATHS [PATHS ...]
                        both gff3 and fasta paths or just an lrg
  --is_lrg              there is one file which is lrg

An example with gff3 and fasta is as follows.

$ mutalyzer_retriever from_file --paths NG_012337.1.gff3 NG_012337.1.fasta
{"annotations": {"id": "NG_012337.1", "type": "record", "location": ...
...

For an lrg file the --is_lrg flag needs to be added.

$ mutalyzer_retriever from_file --paths LRG_417 --is_lrg
{"annotations": {"type": "record", "id": "LRG_417", "location": ...

Retrieve the NCBI reference models from FTP¶

Starting from scratch, i.e., connect to the FTP location to retrieve the assembly versions and to download the annotations files.

$ mutalyzer_retriever ncbi_assemblies
- local output directory set up to ./models
  done
  ...

Restrict only to specific reference ids and assuming that the input files are already present in the downloads/ directory.

$ mutalyzer_retriever ncbi_assemblies  --input downloads/ --ref_id_start NC_000023 --downloaded
- local output directory set up to ./models
  done
- processing 109 from 20180213, (GRCh38.p12, GCF_000001405.38)
  - NC_000023.11
...
- processing 105.20220307 from 20220307, (GRCh37.p13, GCF_000001405.25)
  - NC_000023.10
- writing ./models/NC_000023.10

Usage¶

Retrieve a reference¶

Retrieve a reference model¶

Output directory and split the model¶

Choose the retrieval source¶

Choose the retrieval file type¶

Parse local files¶

Retrieve the NCBI reference models from FTP¶

mutalyzer_retriever

Navigation

Related Topics

Usage¶

Retrieve a reference¶

Retrieve a reference model¶

Output directory and split the model¶

Choose the retrieval source¶

Choose the retrieval file type¶

Parse local files¶

Retrieve the NCBI reference models from FTP¶

Retrieve related reference ids¶