DRCAT Resource Catalogue

Bioinformatics Web-based Data Resources

Introduction
Resource metadata

Resource identifiers

ID
IDalt
Acc
Name

Basic metadata

Desc
URL
URLlink
URLrest
URLsoap

Annotation

Cat
Taxon
EDAMdat
EDAMfmt
EDAMid
EDAMtpc

Cross-references

Xref

Queryable data

Query
Example

Example entries
Browse DRCAT

Introduction

What is DRCAT?

DRCAT (the data resource catalogue) collates metadata on bioinformatics Web-based data resources including databases, ontologies, taxonomies and catalogues. A DRCAT entry includes information such as resource identifier(s), name, description and URL. 'Query' lines are defined for each resource; they describe what type(s) of data are available, in what format, how (by what identifier) the data can be retrieved, and from where (URL).

DRCAT was developed to provide more extensive data integration for EMBOSS, but it has many applications beyond EMBOSS. DRCAT entries (including 'Query' lines) are annotated with terms from the EDAM ontology of common bioinformatics concepts.

Download and Status

An "alpha" version is available:

http://sourceforge.net/projects/drcat/files/

It contains a comprehensive set of resources that are fully annotated, and a starting point for 'Query' line definition. It includes:

655 entries
521 'Query' lines
2147 EDAM annotations

The "alpha" version is intended primarily to solicit feedback. DRCAT is being actively developed: contributions and suggestions are welcome. For further information contact Jon Ison (jison@ebi.ac.uk).

Viewing

DRCAT can be viewed in any text editor. It can also be browsed.

License

DRCAT is made available to all without any constraint or license on its use or redistribution other than:

DRCAT is clearly acknowledged as the source of the product
DRCAT files displayed publicly include the publication date and/or version number
DRCAT files are not altered and subsequently redistributed under their original name

Contacts

All enquiries to Jon Ison (jison@ebi.ac.uk), cc'ing Peter Rice (pmr@ebi.ac.uk) and Matus Kalas (matus.kalas@bccs.uib.no).

Thanks to Chris Southan for providing a comprehensive list of databases. Thanks to Peter Rice and Matus Kalas for valuable work and discussions.

Mailing lists

Feel free to subscribe to one or both of the mailing lists, drcat-developers and drcat-users. Once subscribed, you can mail the lists.

drcat-developers is for technical discussions between EDAM developers / contributors. drcat-users is for general discussions and announcements. Traffic will be kept to a minimum.

Resource metadata

Comment lines begin with '#' and can appear anywhere.

Resource identifiers

Resources might be cross-referenced from an EMBL or SwissProt entry. Database identifiers and names are taken (where available) from:

http://www.expasy.ch/sprot/userman.html#DR_line (DR line of a SwissProt record)
http://www.insdc.org/page.php?page=db_xref (db_xref field of a sequence feature table)

Note that SwissProt identifiers are listed in the file dbxref.

ID

ID        <ID>

Recommended / official unique identifier.

e.g.

ID      EcID

Value of <ID> is a string (no whitespace). A single ID line is given per entry.

IDalt

IDalt     <IDalt>

An alternative identifier.

e.g.

IDalt   2DBase-Ecoli

Value of <IDalt> is a string (no whitespace). Multiple IDalt lines may be given per entry (one IDalt / line).

Acc

Acc       <Acc>

Accession number of database.

e.g.

Acc     DB-0115

Value of <Acc> is a string (no whitespace). Values are taken from dbxref (if defined). A single Acc line only is given per entry.

Name

Name      <Text>

Verbose name.

e.g.

Name    Structural classification of proteins (SCOP) database

Values are taken from dbxref (if defined) or otherwise are assigned. A single Name line only is given per entry.

Basic metadata

Desc

Desc      <Text>

Description of resource.

e.g.

Desc    The SCOP database, created by manual inspection and abetted by a battery of automated methods, aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known. As such, it provides a broad survey of all known protein folds, detailed information about the close relatives of any particular protein, and a framework for future research and classification.

Value of <Text> is any free text (but typically text from the resource home page). A single Desc line only is given per entry.

URL

URL       <URL>

URL of resource server.

e.g.

URL     http://scop.mrc-lmb.cam.ac.uk/scop

Value of <URL> is a resolvable URL. A single URL line only is given per entry.

URLlink

URLlink   <URLlink>

URL for instructions on how to link to the database.

e.g.

URLlink http://gene3d.biochem.ucl.ac.uk/Gene3D/linking

Value of <URLlink> is a resolvable URL. A single URLlink line only is given per entry.

URLrest

URLrest   <URLrest>

URL of documentation on REST-based interface (if available).

e.g.

URLrest http://www.ebi.ac.uk/pride/prideMartWebService.do

Value of <URLrest> is a resolvable URL. A single URLrest line only is given per entry.

URLsoap

URLsoap   <URLsoap>

URL of documentation on SOAP-based interface (if available).

e.g.

URLsoap http://api.cathdb.info/api/soap/dataservices/wsdl

Value of <URLsoap> is a resolvable URL. A single URLsoap line only is given per entry.

Annotation

Cat

Cat       <Cat>

Database category.

e.g.

Cat     2D gel databases

Values taken from dbxref.txt (if defined).

A single Cat line only is given per entry.
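The "single line only" rules stated so far (for ID, Acc, Name, Desc, URL, URLlink, URLrest, URLsoap and Cat) lend themselves to a mechanical check. The following is a minimal Python sketch, not part of DRCAT itself: the function name repeated_single_tags is hypothetical, and it assumes that the tag is the first whitespace-delimited word of a line and that a comment line is one whose first non-blank character is '#'.

```python
from collections import Counter

# Tags that the text above says may appear at most once per entry.
SINGLE_LINE_TAGS = ("ID", "Acc", "Name", "Desc", "URL",
                    "URLlink", "URLrest", "URLsoap", "Cat")


def repeated_single_tags(lines):
    """Return the single-line tags that occur more than once in one entry.

    `lines` holds the lines of ONE entry. Blank lines and '#' comment
    lines are skipped (assumption: '#' is the first non-blank character).
    """
    counts = Counter()
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        tag = stripped.split(None, 1)[0]  # first whitespace-delimited word
        counts[tag] += 1
    return [tag for tag in SINGLE_LINE_TAGS if counts[tag] > 1]


entry = [
    "ID      ECO2DBASE",
    "IDalt   2DBase-Ecoli",
    "IDalt   EC-2D-GEL",
    "Acc     DB-0115",
    "Cat     2D gel databases",
]
print(repeated_single_tags(entry))  # IDalt may repeat, so nothing is reported
```

An empty list means the entry respects the cardinality rules; tags that may repeat, such as IDalt, are simply not checked.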

Taxon

Taxon    <tax_id> | <scientific_name>

Annotation of the taxonomic scope of the resource.

e.g.

Taxon   562 | Escherichia coli

Values of <tax_id> and <scientific_name> are the taxonomic ID and scientific name of an organism taken from the NCBI Taxonomy. Multiple Taxon lines may be given.

EDAMdat

EDAMdat   <EDAM_id> | <EDAM_term>

EDAM annotation of the data returned by a query.

e.g.

EDAMdat 0001554 | SCOP node

Values of <EDAM_id> and <EDAM_term> are a unique identifier and term name for a concept from the EDAM ontology "Data" branch. Multiple EDAMdat lines may be given.

EDAMfmt

EDAMfmt   <EDAM_id> | <EDAM_term>

EDAM annotation of the format of data returned by a query.

e.g.

EDAMfmt 0001929 | FASTA format

Values of <EDAM_id> and <EDAM_term> are a unique identifier and term name for a concept from the EDAM ontology "Format" branch. Multiple EDAMfmt lines may be given.

EDAMid

EDAMid    <EDAM_id> | <EDAM_term>

EDAM annotation of the data identifier used as a query.

e.g.

EDAMid  0001033 | Gene ID (Ensembl)

Values of <EDAM_id> and <EDAM_term> are a unique identifier and term name for a concept from the EDAM ontology "Identifier" branch. Multiple EDAMid lines may be given.

EDAMtpc

EDAMtpc   <EDAM_id> | <EDAM_term>

EDAM annotation of the resource itself.

e.g.

EDAMtpc 0000147 | Protein-protein interactions

Values of <EDAM_id> and <EDAM_term> are a unique identifier and term name for a concept from the EDAM ontology "Topic" branch. Multiple EDAMtpc lines may be given.
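The Taxon, EDAMdat, EDAMfmt, EDAMid and EDAMtpc lines all share the same "<id> | <name>" layout, so one small helper can split any of them. This is a sketch under the assumption that the first '|' is the separator and that surrounding whitespace is not significant; parse_pair is a hypothetical name. Identifiers are kept as strings so that leading zeros (e.g. 0001554) survive.

```python
def parse_pair(value):
    """Split the value of a Taxon or EDAM* line into (identifier, name).

    `value` is the text after the tag, e.g. "562 | Escherichia coli".
    The identifier is returned as a string to preserve leading zeros.
    """
    left, separator, right = value.partition("|")
    if not separator:
        raise ValueError(f"expected '<id> | <name>', got {value!r}")
    return left.strip(), right.strip()


print(parse_pair("562 | Escherichia coli"))
print(parse_pair("0001033 | Gene ID (Ensembl)"))
```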

Cross-references

Xref

Xref      <token> | <ID1;ID2>

Nature of cross-reference from a SwissProt or EMBL database entry.

e.g.

Xref    SP_explicit | UniProt accession

Where <token> is one of:

SP_explicit (explicit cross-reference from SP in database cross-reference (DR) line)
SP_implicit (implicit cross-reference from SP in DR line)
SP_CC (cross-reference from SP via URL address under the CC topic DATABASE)
SP_FT (cross-reference from SP via key types in the feature table)
SP_lit (cross-reference from SP to MedLine/PubMed stored in RX (Reference cross-reference) line)
EMBL_DR (cross-reference from EMBL DR line)
Other (some other type of reference)
None (not cross-linked from SP or EMBL)

ID1, ID2 etc. give the type of identifier(s), i.e. term names from the EDAM "Identifier" branch, used in the cross-reference, e.g.

SP_explicit | EC number;OrganismID

Multiple Xref lines may be given. UniProt/SwissProt xrefs are described in detail in the UniProt userman.htm file.
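As an illustration of the Xref layout, the sketch below splits the value of an Xref line into its token and identifier types and rejects tokens outside the list given above. The name parse_xref is hypothetical. Treating a literal "None" in the identifier field (as in "Xref    SP_FT | None") as "no identifier types" is an assumption made here, not something the format description states.

```python
# Tokens permitted in the first field of an Xref line (from the list above).
XREF_TOKENS = {"SP_explicit", "SP_implicit", "SP_CC", "SP_FT",
               "SP_lit", "EMBL_DR", "Other", "None"}


def parse_xref(value):
    """Split an Xref value into (token, [identifier type, ...]).

    `value` is the text after the tag, e.g. "SP_explicit | EC number;OrganismID".
    Assumption: an identifier field of "None" means no identifier types.
    """
    token, separator, id_field = value.partition("|")
    token = token.strip()
    if not separator:
        raise ValueError(f"expected '<token> | <ID1;ID2>', got {value!r}")
    if token not in XREF_TOKENS:
        raise ValueError(f"unknown Xref token: {token!r}")
    names = [name.strip() for name in id_field.split(";") if name.strip()]
    if names == ["None"]:
        names = []
    return token, names


print(parse_xref("SP_explicit | EC number;OrganismID"))
```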

`SP_explicit`

SP_explicit links are generally of the form:

DR database_name; primary_id; secondary_id

primary_id is usually an accession and secondary_id usually complements the first, e.g. entry name or version number.

`SP_implicit`

SP_implicit links are to databases that (typically) lack their own accession number scheme, but may be cross-referenced by 1) SP primary accession number or 2) some other identifier used by SP, e.g. gene name in the GN line. In both cases no extra DR line is present.

`SP_CC`

SP_CC links are to databases that (typically) are accessed via one URL, not by individual accessions.

`SP_FT`

SP_FT links are provided where the link concerns a feature.

`SP_lit`

For SP_lit links, the RX line provides the Medline or PubMed identifier.

`Other`

Other links include, for example, the taxonomy identifier (Tax_id), which uniquely identifies an organism in the NCBI taxonomy classification, and enzyme EC numbers, which are found in SP description (DE) lines.

Queryable data

Query

Query     <Data_type> {<comment>} | <Data_format> {<comment>} | <Data_identifier> {<comment>} | <URL>

Specification of data resource query.

e.g.

Query   SCOP node | HTML | SCOP sunid | http://scop.mrc-lmb.cam.ac.uk/scop/search.cgi?sunid=%s
Query   Fungi annotation | HTML | Genus name;Species name | http://aftol.umn.edu/species/%s1_%s2
Query   Fungi annotation (anamorph) | HTML | Genus name;Species name | http://aftol.umn.edu/species/%s1_%s2__(anamorph)

<Data_type>, <Data_format>, <Data_identifier> are term names from the EDAM ontology ("Data", "Format" and "Identifier" branches respectively). For each one, a corresponding EDAMdat, EDAMfmt or EDAMid annotation is given:

<Data_type> is the type of data retrieved
<Data_format> is the format of the data
<Data_identifier> is the type of data identifier(s) (such as a sequence accession or EC number) that's used for query (in the URL component) and also in database cross-references (from SwissProt / EMBL).
<URL> is used for data retrieval. %s in the URL should be replaced by an instance of the data identifier, e.g. a particular accession number.

Query lines may employ two or more data identifiers. In such cases identifiers are separated by a ';' and the URL should use %s1, %s2, %s3 etc (for first, second, third identifiers etc.). In the rare cases where the same ID is used twice in the URL then (e.g.) %s1 %s1 is used (for two uses of the first id) and %s2, %s3 etc (for second, third identifiers).
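The placeholder rules above can be sketched in a few lines of Python. This is an illustrative helper only (fill_query_url is a hypothetical name): it treats a bare %s as the first identifier, %sN as the N-th (counting from 1), and inserts values verbatim without any URL escaping, which is an assumption rather than documented behaviour.

```python
import re


def fill_query_url(template, identifiers):
    """Replace %s, %s1, %s2, ... in a Query URL with identifier values.

    `identifiers` lists the values in the order the identifier types
    appear in the Query line. Values are inserted as-is (no escaping).
    """
    def substitute(match):
        number = match.group(1)
        # A bare %s refers to the first identifier; %sN is 1-based.
        index = int(number) - 1 if number else 0
        if not 0 <= index < len(identifiers):
            raise ValueError(f"no identifier supplied for {match.group(0)}")
        return identifiers[index]

    return re.sub(r"%s(\d*)", substitute, template)


print(fill_query_url("http://aftol.umn.edu/species/%s1_%s2",
                     ["Aspergillus", "giganteus"]))
# prints http://aftol.umn.edu/species/Aspergillus_giganteus
```

Because each placeholder is looked up independently, a template that uses %s1 twice receives the first identifier in both positions.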

Multiple Query lines may be given. Note that an optional comment may be given after <Data_type>, <Data_format> or <Data_identifier>; it can be used for provider-supplied names or comments for that type, format or identifier.
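To make the field layout concrete, here is a hedged Python sketch that splits the value of a Query line (the text after the "Query" tag) into its four fields and separates any {comment} from the term it follows. The names parse_field and parse_query are hypothetical. It assumes that '|' does not occur inside the first three fields and that a comment, when present, is the last thing in its field.

```python
import re

# A field is a term name, optionally followed by a {comment}.
FIELD = re.compile(r"^(?P<term>[^{}]*?)\s*(?:\{(?P<comment>[^{}]*)\})?$")


def parse_field(text):
    """Return (term, comment) for one field; comment is None if absent."""
    match = FIELD.match(text.strip())
    if match is None:
        raise ValueError(f"malformed Query field: {text!r}")
    return match.group("term"), match.group("comment")


def parse_query(value):
    """Parse '<type> | <format> | <identifier(s)> | <URL>' into a dict."""
    parts = value.split("|", 3)  # at most three splits; the URL stays whole
    if len(parts) != 4:
        raise ValueError("expected four '|'-separated fields")
    data_type, type_comment = parse_field(parts[0])
    data_format, format_comment = parse_field(parts[1])
    identifier, id_comment = parse_field(parts[2])
    return {
        "data_type": data_type,
        "data_format": data_format,
        # Several identifier types are separated by ';'.
        "identifiers": [name.strip() for name in identifier.split(";")],
        "url": parts[3].strip(),
        "comments": {"data_type": type_comment,
                     "data_format": format_comment,
                     "identifier": id_comment},
    }


query = parse_query("Data reference {PDB Entry search} | HTML | PDB ID | "
                    "http://scop.mrc-lmb.cam.ac.uk/scop/search.cgi?PDB=%s")
print(query["data_type"], query["comments"]["data_type"])
```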

Example

Example   <Data_identifier> | <Example>

Example identifiers used in Query lines.

e.g.

Example Genus name | Aspergillus
Example Species name | giganteus

<Data_identifier> is a term name from the EDAM ontology ("Identifier" branch) as used in one or more Query lines. <Example> is a valid value of the identifier for use in a query. Multiple Example lines may be given.

Example entries

ID      AFTOL
Name    Assembling the Fungal Tree of Life (AFTOL) database
Desc    Fungal structural and biochemical database.
URL     http://www.aftol.org/index.php
Cat     Not available
Taxon   4751 | Fungi
EDAMtpc 0000782 | Fungal
EDAMid  0001045 | Species name
EDAMid  0001870 | Genus name
EDAMdat 0002395 | Fungi annotation
EDAMdat 0002396 | Fungi annotation (anamorph)
EDAMfmt 0002331 | HTML
Xref    SP_FT | None
Query   Fungi annotation | HTML | Genus name;Species name | http://aftol.umn.edu/species/%s1_%s2
Query   Fungi annotation (anamorph) | HTML | Genus name;Species name | http://aftol.umn.edu/species/%s1_%s2__(anamorph)
Example Genus name | Aspergillus
Example Species name | giganteus

ID      ANU-2DPAGE
Acc     DB-0002
Name    Australian National University 2-DE database (ANU-2DPAGE)
Desc    2-DE PAGE database.
URL     http://semele.anu.edu.au
Cat     2D gel databases
Taxon   1 | all
EDAMtpc 0000133 | Two-dimensional gel electrophoresis
EDAMid  0003021 | UniProt accession
EDAMdat 0002364 | Experiment annotation (2D PAGE)
EDAMfmt 0002331 | HTML
Xref    SP_explicit | UniProt accession
Query   Experiment annotation (2D PAGE) | HTML | UniProt accession | http://semele.anu.edu.au/cgi-bin/get-2d-entry?%s
Example UniProt accession | P02930
Example UniProt accession | Q9SIB9

ID      SCOP
Name    Structural classification of proteins (SCOP) database
Desc    The SCOP database, created by manual inspection and abetted by a battery of automated methods, aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known. As such, it provides a broad survey of all known protein folds, detailed information about the close relatives of any particular protein, and a framework for future research and classification.
URL     http://scop.mrc-lmb.cam.ac.uk/scop
Taxon   1 | all
EDAMtpc 0000736 | Protein domains
EDAMdat 0001554 | SCOP node
EDAMdat 0002093 | Data reference
EDAMid  0001042 | SCOP sunid
EDAMid  0001127 | PDB ID
EDAMid  0000842 | Identifier
EDAMfmt 0002331 | HTML
Query   SCOP node | HTML | SCOP sunid | http://scop.mrc-lmb.cam.ac.uk/scop/search.cgi?sunid=%s
Query   Data reference {PDB Entry search} | HTML | PDB ID | http://scop.mrc-lmb.cam.ac.uk/scop/search.cgi?PDB=%s
Query   Data reference | HTML | Identifier {Keyword} | http://scop.mrc-lmb.cam.ac.uk/scop/search.cgi?key=%s
Example SCOP sunid | 47718
Example PDB ID | 1djh
Example Identifier {Keyword} | immunoglobulin

ID      ECO2DBASE
IDalt   2DBase-Ecoli
IDalt   EC-2D-GEL
Acc     DB-0115
Name    2D-PAGE database of Escherichia coli
Desc    This Database currently contains 12 gels consisting of 1185 protein spots information in which 723 proteins where identified and annotated. Individual protein spots in the existing gels can be displayed, queried, analysed and compared in a tabular format based on varios functional categories enabling quick and subsequent analysis.
URL     http://2dbase.techfak.uni-bielefeld.de/cgi-bin/2d/2d.cgi
Cat     2D gel databases
Taxon   562 | Escherichia coli
EDAMtpc 0000133 | Two-dimensional gel electrophoresis
EDAMdat 0002364 | Experiment annotation (2D PAGE)
EDAMid  0003021 | UniProt accession
EDAMfmt 0002331 | HTML
Xref    SP_explicit | None
Query   Experiment annotation (2D PAGE) {ECO2DBASE entry} | HTML | UniProt accession | http://2dbase.techfak.uni-bielefeld.de/cgi-bin/2d/2d.cgi?%s
Example UniProt accession | P02930
Example UniProt accession | P52697

ID      Ensembl
Acc     DB-0023
Name    Ensembl eukaryotic genome annotation project
Desc    Genome databases for vertebrates and other eukaryotic species.
URL     http://www.ensembl.org/
Cat     Genome annotation databases
Taxon   33208 | Metazoa
EDAMtpc 0000643 | Genomes
EDAMtpc 0002818 | Eukaryote
EDAMtpc 0000643 | Genomes
EDAMdat 0000849 | Sequence record
EDAMdat 0000916 | Gene annotation
EDAMid  0001033 | Gene ID (Ensembl)
EDAMid  0002725 | Transcript ID (Ensembl)
EDAMfmt 0001929 | FASTA format
EDAMfmt 0002331 | HTML
Xref    SP_explicit | None
Xref    SP_FT | None
Query   Gene annotation | HTML | Gene ID (Ensembl) | http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=%s
Query   Sequence record | FASTA format | Gene ID (Ensembl);Transcript ID (Ensembl) | http://www.ensembl.org/Homo_sapiens/Gene/Export?db=core;g=%s1;output=fasta;r=13:31787617-31871809;strand=feature;t=%s2;time=1244110856.85314;st=cdna;st=coding;st=peptide;st=utr5;st=utr3;st=exons;st=introns;genomic=unmasked;_format=Text
Example Gene ID (Ensembl);Transcript ID (Ensembl) | ENSG00000139618;ENST00000380152
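
Entries like those above can be read into simple data structures. The following Python sketch is one possible approach, not an official DRCAT parser: parse_entries is a hypothetical name, and it assumes that every entry starts with its ID line, that the tag is the first whitespace-delimited word of a line, and that blank lines and '#' comment lines carry no data. Each tag maps to a list of values so that repeatable lines (IDalt, Taxon, Query and so on) are preserved in order.

```python
def parse_entries(text):
    """Split DRCAT text into a list of {tag: [value, ...]} dictionaries.

    Assumption: a new entry begins at each ID line.
    """
    entries = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments carry no data
        parts = stripped.split(None, 1)
        tag = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        if tag == "ID":
            current = {}
            entries.append(current)
        if current is None:
            raise ValueError(f"line found before the first ID line: {line!r}")
        current.setdefault(tag, []).append(value)
    return entries


sample = """\
# abbreviated copies of two of the entries above
ID      AFTOL
Taxon   4751 | Fungi

ID      ECO2DBASE
IDalt   2DBase-Ecoli
IDalt   EC-2D-GEL
Acc     DB-0115
"""
for entry in parse_entries(sample):
    print(entry["ID"][0], sorted(entry))
```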