Retrieval sources
=================
NCBI
----
Reference annotations are retrieved using the `Eutils sviewer `_ endpoint (`example `_), while the FASTA
sequences are retrieved using the Entrez API (`example `__).
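The two NCBI requests can be sketched as simple URL builders. This is a minimal sketch under stated assumptions, not the retriever's code. The base URLs (``https://eutils.ncbi.nlm.nih.gov/sviewer/viewer.cgi`` with ``report=gff3`` for the annotations, and the E-utilities ``efetch.fcgi`` with ``rettype=fasta`` for the sequences) and the helper names ``annotations_url`` and ``fasta_url`` are assumptions. The linked examples remain the reference for the actual parameters. The sketch only builds the URLs and does not send any request.

.. code-block:: python

    from urllib.parse import urlencode

    # Assumed base URLs (not taken from the retriever); check them against
    # the linked examples before relying on them.
    SVIEWER_URL = "https://eutils.ncbi.nlm.nih.gov/sviewer/viewer.cgi"
    EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


    def annotations_url(reference_id: str) -> str:
        """Hypothetical helper: sviewer URL for the GFF3 annotations of a record."""
        params = {"db": "nuccore", "report": "gff3", "id": reference_id}
        return f"{SVIEWER_URL}?{urlencode(params)}"


    def fasta_url(reference_id: str) -> str:
        """Hypothetical helper: Entrez efetch URL for the FASTA sequence as plain text."""
        params = {"db": "nuccore", "id": reference_id,
                  "rettype": "fasta", "retmode": "text"}
        return f"{EFETCH_URL}?{urlencode(params)}"


    if __name__ == "__main__":
        print(annotations_url("NC_000001.11"))
        print(fasta_url("NC_000001.11"))

Keeping the URL construction separate from the HTTP call makes the request parameters easy to inspect and test offline.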
Assemblies
^^^^^^^^^^
For human chromosomal references (NC\_), the annotations are retrieved manually from the following `FTP location `__, so
that the annotation history is taken into account.

Ensembl
-------
Ensembl offers an `API `_ from which the most
recent reference versions can be retrieved. Queries that include the version,
e.g., `ENST00000383925.1 `_, are not accepted. Instead, the version is returned
as part of the response to the unversioned query, e.g., `ENST00000383925 `_. For this reason, we check whether the provided reference ID includes a
version. If it does, we use the following `endpoint `_ to check whether the most recent version equals the
provided one. If it does not, and the reference is human, we check whether the version matches the one served by the `GRCh37
dedicated API `_. If the reference has the
same ID in GRCh37 and GRCh38, the GRCh38 one is retrieved. The
`transcript archive `_ may be used in the
future to retrieve other versions, but the annotations it currently provides are
not complete.
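The version check described above can be sketched as follows. This is a minimal sketch, not the retriever's implementation. The two API queries are passed in as plain callables (the hypothetical ``latest_main`` and ``latest_grch37``) that map an unversioned stable ID to its current version, so the decision logic can be read and tested without network access. Wiring them to the Ensembl REST API, for example a lookup on ``rest.ensembl.org`` and ``grch37.rest.ensembl.org``, is an assumption and is not shown. The ``ENS`` prefix check mirrors the retriever's restriction to stable IDs. The ``human`` flag is also a hypothetical parameter of this sketch.

.. code-block:: python

    from typing import Callable, Optional, Tuple

    # Hypothetical lookup: unversioned stable ID -> its current version, or None.
    # In practice this could wrap an Ensembl REST lookup (assumption, not shown).
    Lookup = Callable[[str], Optional[int]]


    def split_stable_id(reference_id: str) -> Tuple[str, Optional[int]]:
        """Split 'ENST00000383925.1' into ('ENST00000383925', 1); no version -> None."""
        if not reference_id.startswith("ENS"):
            raise ValueError(f"not a stable Ensembl ID: {reference_id!r}")
        stable_id, _, version = reference_id.partition(".")
        return stable_id, int(version) if version else None


    def resolve_source(
        reference_id: str,
        latest_main: Lookup,
        latest_grch37: Lookup,
        human: bool = True,
    ) -> Optional[str]:
        """Decide which API serves the requested version: 'main', 'grch37' or None.

        The main API is checked first, so for human references an ID whose
        version is the same in GRCh37 and GRCh38 resolves to 'main' (GRCh38).
        """
        stable_id, version = split_stable_id(reference_id)
        current = latest_main(stable_id)
        if version is None:
            # No version requested: use whatever the main API currently serves.
            return "main" if current is not None else None
        if current == version:
            return "main"
        # The GRCh37 API is only consulted for human references.
        if human and latest_grch37(stable_id) == version:
            return "grch37"
        return None


    if __name__ == "__main__":
        # Made-up lookup tables standing in for the two APIs.
        main_api = {"ENST00000383925": 1}.get
        grch37_api = {"ENST00000383925": 1}.get
        print(resolve_source("ENST00000383925.1", main_api, grch37_api))  # prints: main

Passing the lookups in keeps the version logic separate from HTTP handling. Checking the main API before the GRCh37 one is what makes an ID present in both assemblies resolve to GRCh38.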
Note that the retriever accepts only `stable Ensembl IDs `_, which start with ``ENS``.

LRG
---
LRG references are retrieved from the following `location `__.
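A minimal sketch of building the download URL for an LRG record. The base URL ``https://ftp.ebi.ac.uk/pub/databases/lrgex/`` and the ``<ID>.xml`` file naming are assumptions here, and the location linked above is authoritative. ``lrg_url`` is a hypothetical helper. As a choice of this sketch, only record IDs of the form ``LRG_<number>`` are accepted, so transcript-level IDs such as ``LRG_1t1`` are rejected.

.. code-block:: python

    import re

    # Assumed location of the LRG XML records; verify against the link above.
    LRG_BASE_URL = "https://ftp.ebi.ac.uk/pub/databases/lrgex/"
    # Record-level LRG IDs only, e.g. 'LRG_1' (choice of this sketch).
    LRG_ID = re.compile(r"LRG_\d+")


    def lrg_url(reference_id: str) -> str:
        """Hypothetical helper: assumed download URL of the XML record for an LRG ID."""
        if not LRG_ID.fullmatch(reference_id):
            raise ValueError(f"not an LRG record ID: {reference_id!r}")
        return f"{LRG_BASE_URL}{reference_id}.xml"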