DRCAT (the data resource catalogue) collates metadata on bioinformatics Web-based data resources including databases, ontologies, taxonomies and catalogues. A DRCAT entry includes information such as resource identifier(s), name, description and URL. 'Query' lines are defined for each resource; they describe what type(s) of data are available, in what format, how (by what identifier) the data can be retrieved and from where (URL).
DRCAT was developed to provide more extensive data integration for EMBOSS, but it has many applications beyond EMBOSS. DRCAT entries (including 'Query' lines) are annotated with terms from the EDAM ontology of common bioinformatics concepts.
An "alpha" version is available:
http://sourceforge.net/projects/drcat/files/
It contains a comprehensive set of fully annotated resources and a starting point for 'Query' line definition. It includes:
The "alpha" version is intended primarily to solicit feedback. DRCAT is being actively developed: contributions and suggestions are welcome. For further information contact Jon Ison (jison@ebi.ac.uk).
DRCAT can be viewed in any text editor. It can also be browsed.
DRCAT is made available to all without any constraint or license on its use or redistribution other than:
All enquiries to Jon Ison (jison@ebi.ac.uk), cc'ing Peter Rice (pmr@ebi.ac.uk) and Matus Kalas (matus.kalas@bccs.uib.no).
Thanks to Chris Southan for providing a comprehensive list of databases. Thanks to Peter Rice and Matus Kalas for valuable work and discussions.
Feel free to subscribe to one or both of the mailing lists:
Once subscribed, you can mail the lists:
drcat-developers is for technical discussions between EDAM developers / contributors. drcat-users is for general discussions and announcements. Traffic will be kept to a minimum.
Comment lines begin with '#' and can appear anywhere.
Resources might be cross-referenced from an EMBL or SwissProt entry. Database identifiers and names are taken (where available) from:
Note that SwissProt identifiers are listed in the file dbxref.
ID <ID>
Recommended / official unique identifier.
e.g.
ID EcID
Value of <ID> is a string (no whitespace). A single ID line is given per entry.
IDalt <ID>
An alternative identifier.
e.g.
IDalt 2DBase-Ecoli
Value of <ID> is a string (no whitespace). Multiple IDalt lines may be given per entry (one IDalt per line).
Acc <Acc>
Accession number of database.
e.g.
Acc DB-0115
Value of <Acc> is a string (no whitespace). Values are taken from dbxref (if defined). A single Acc line only is given per entry.
Name <Text>
Verbose name.
e.g.
Name Structural classification of proteins (SCOP) database
Values are taken from dbxref (if defined) or otherwise are assigned. A single Name line only is given per entry.
Desc <Text>
Description of resource.
e.g.
Desc The SCOP database, created by manual inspection and abetted by a battery of automated methods, aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known. As such, it provides a broad survey of all known protein folds, detailed information about the close relatives of any particular protein, and a framework for future research and classification.
Value of <Text> is any free text (but typically text from the resource home page). A single Desc line only is given per entry.
URL <URL>
URL of resource server.
e.g.
URL http://scop.mrc-lmb.cam.ac.uk/scop
Value of <URL> is a resolvable URL. A single URL line only is given per entry.
URLlink <URLlink>
URL for instructions on how to link to the database.
e.g.
URLlink http://gene3d.biochem.ucl.ac.uk/Gene3D/linking
Value of <URLlink> is a resolvable URL. A single URLlink line only is given per entry.
URLrest <URLrest>
URL of documentation on REST-based interface (if available).
e.g.
URLrest http://www.ebi.ac.uk/pride/prideMartWebService.do
Value of <URLrest> is a resolvable URL. A single URLrest line only is given per entry.
URLsoap <URLsoap>
URL of documentation on SOAP-based interface (if available).
e.g.
URLsoap http://api.cathdb.info/api/soap/dataservices/wsdl
Value of <URLsoap> is a resolvable URL. A single URLsoap line only is given per entry.
Cat <Cat>
Database category.
e.g.
Cat 2D gel databases
Values taken from dbxref.txt (if defined).
A single Cat line only is given per entry.
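The cardinality and whitespace rules described above can be checked mechanically. The following is a minimal sketch and not part of DRCAT itself: the function name check_entry and the representation of an entry as a list of (tag, value) pairs are assumptions made for illustration, as is treating the ID line as mandatory.

```python
from collections import Counter

# Tags that the descriptions above limit to a single line per entry.
SINGLE_TAGS = {"ID", "Acc", "Name", "Desc", "URL",
               "URLlink", "URLrest", "URLsoap", "Cat"}
# Tags whose value is described as a string with no whitespace.
NO_WHITESPACE_TAGS = {"ID", "IDalt", "Acc"}


def check_entry(fields):
    """Return a list of problems found in a list of (tag, value) pairs."""
    problems = []
    counts = Counter(tag for tag, _ in fields)
    # Assumption: every entry carries an ID line.
    if counts["ID"] == 0:
        problems.append("missing ID line")
    for tag in sorted(SINGLE_TAGS):
        if counts[tag] > 1:
            problems.append(f"{tag} given {counts[tag]} times, expected at most one")
    for tag, value in fields:
        if tag in NO_WHITESPACE_TAGS and any(ch.isspace() for ch in value):
            problems.append(f"{tag} value contains whitespace: {value!r}")
    return problems


entry = [
    ("ID", "ECO2DBASE"),
    ("IDalt", "2DBase-Ecoli"),
    ("IDalt", "EC-2D-GEL"),
    ("Acc", "DB-0115"),
    ("Cat", "2D gel databases"),
]
print(check_entry(entry))
```

An empty list means that no rule was violated; each string in a non-empty list names one violated rule.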
Taxon <tax_id> | <scientific_name>
Annotation of the taxonomic scope of the resource.
e.g.
Taxon 562 | Escherichia coli
Values of <tax_id> and <scientific_name> are the taxonomic ID and scientific name of an organism taken from the NCBI Taxonomy. Multiple Taxon lines may be given.
EDAMdat <EDAM_id> | <EDAM_term>
EDAM annotation of the data returned by a query.
e.g.
EDAMdat 0001554 | SCOP node
Values of <EDAM_id> and <EDAM_term> are a unique identifier and term name for a concept from the EDAM ontology "Data" branch. Multiple EDAMdat lines may be given.
EDAMfmt <EDAM_id> | <EDAM_term>
EDAM annotation of the format of data returned by a query.
e.g.
EDAMfmt 0001929 | FASTA format
Values of <EDAM_id> and <EDAM_term> are a unique identifier and term name for a concept from the EDAM ontology "Format" branch. Multiple EDAMfmt lines may be given.
EDAMid <EDAM_id> | <EDAM_term>
EDAM annotation of the data identifier used as a query.
e.g.
EDAMid 0001033 | Gene ID (Ensembl)
Values of <EDAM_id> and <EDAM_term> are a unique identifier and term name for a concept from the EDAM ontology "Identifier" branch. Multiple EDAMid lines may be given.
EDAMtpc <EDAM_id> | <EDAM_term>
EDAM annotation of the resource itself.
e.g.
EDAMtpc 0000147 | Protein-protein interactions
Values of <EDAM_id> and <EDAM_term> are a unique identifier and term name for a concept from the EDAM ontology "Topic" branch. Multiple EDAMtpc lines may be given.
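The four EDAM annotation lines share one layout (tag, identifier, '|', term name), so a single routine can read all of them. This is an illustrative sketch only: parse_edam_line and EdamAnnotation are hypothetical names, and the identifier is kept as a string so that its leading zeros survive.

```python
from typing import NamedTuple

# EDAM branch for each tag, as given in the field descriptions above.
EDAM_BRANCH = {
    "EDAMdat": "Data",
    "EDAMfmt": "Format",
    "EDAMid": "Identifier",
    "EDAMtpc": "Topic",
}


class EdamAnnotation(NamedTuple):
    branch: str
    edam_id: str
    term: str


def parse_edam_line(line):
    """Parse one EDAMdat / EDAMfmt / EDAMid / EDAMtpc line."""
    tag, _, rest = line.strip().partition(" ")
    if tag not in EDAM_BRANCH:
        raise ValueError(f"not an EDAM annotation line: {line!r}")
    # Split on the first '|' only; term names may contain parentheses.
    edam_id, sep, term = rest.partition("|")
    if not sep:
        raise ValueError(f"missing '|' separator: {line!r}")
    return EdamAnnotation(EDAM_BRANCH[tag], edam_id.strip(), term.strip())


print(parse_edam_line("EDAMid 0001033 | Gene ID (Ensembl)"))
```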
Xref <token> | <ID1;ID2>
Nature of cross-reference from a SwissProt or EMBL database entry.
e.g.
Xref SP_explicit | UniProt accession
Where <token> is one of:
ID1, ID2 etc. give the type of identifier(s), i.e. term names from the EDAM "Identifier" branch, used in the cross-reference, e.g.
SP_explicit | EC number;OrganismID
Multiple Xref lines may be given. UniProt/SwissProt xrefs are described in detail in the UniProt userman.htm file.
SP_explicit links are generally of the form:
DR database_name; primary_id; secondary_id
primary_id is usually an accession and secondary_id usually complements the first, e.g. entry name or version number.
SP_implicit links are to databases that (typically) lack their own accession number scheme, but may be cross-referenced by 1) SP primary accession number or 2) some other identifier used by SP, e.g. gene name in the GN line. In both cases no extra DR line is present.
SP_CC links are to databases that (typically) are accessed via one URL, not by individual accessions.
SP_FT links are provided where the link concerns a feature.
SP_lit links: the RX line provides the Medline or PubMed identifier.
Other links include e.g. taxonomy identifier (Tax_id) that uniquely identifies an organism in NCBI taxonomy classification, and Enzyme EC numbers which are found in SP description (DE) lines.
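An Xref value can be split into its token and its list of identifier types as sketched below. The function name parse_xref is hypothetical. Treating the literal value None (which appears in the sample entries in this document) as "no identifier type recorded" is an assumption, not something the format description states.

```python
def parse_xref(value):
    """Split the text after the Xref tag into (token, [identifier types])."""
    token, sep, id_field = value.partition("|")
    if not sep:
        raise ValueError(f"missing '|' separator: {value!r}")
    # Identifier types are separated by ';'.
    names = [name.strip() for name in id_field.split(";") if name.strip()]
    # Assumption: 'None' stands for an empty list of identifier types.
    if names == ["None"]:
        names = []
    return token.strip(), names


print(parse_xref("SP_explicit | EC number;OrganismID"))
print(parse_xref("SP_FT | None"))
```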
Query <Data_type> {<comment>} | <Data_format> {<comment>} | <Data_identifier> {<comment>} | <URL>
Specification of data resource query.
e.g.
Query SCOP node | HTML | SCOP sunid | http://scop.mrc-lmb.cam.ac.uk/scop/search.cgi?sunid=%s
Query Fungi annotation | HTML | Genus name;Species name | http://aftol.umn.edu/species/%s1_%s2
Query Fungi annotation (anamorph) | HTML | Genus name;Species name | http://aftol.umn.edu/species/%s1_%s2__(anamorph)
<Data_type>, <Data_format>, <Data_identifier> are term names from the EDAM ontology ("Data", "Format" and "Identifier" branches respectively). For each one, a corresponding EDAMdat, EDAMfmt or EDAMid annotation is given:
Query lines may employ two or more data identifiers. In such cases identifiers are separated by a ';' and the URL should use %s1, %s2, %s3 etc (for first, second, third identifiers etc.). In the rare cases where the same ID is used twice in the URL then (e.g.) %s1 %s1 is used (for two uses of the first id) and %s2, %s3 etc (for second, third identifiers).
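The placeholder convention can be illustrated with a short substitution routine. This is a sketch under stated assumptions: fill_query_url is a hypothetical name, a bare %s is taken to mean the first (only) identifier, values are inserted verbatim without URL-escaping, and the example.org URL is invented purely to show a repeated placeholder.

```python
import re

# Matches '%s' optionally followed by a 1-based identifier number.
_PLACEHOLDER = re.compile(r"%s(\d*)")


def fill_query_url(template, identifiers):
    """Replace %s, %s1, %s2 ... in a Query URL with identifier values."""
    def substitute(match):
        number = match.group(1)
        # A bare %s refers to the first identifier; %sN to the N-th.
        index = int(number) - 1 if number else 0
        if not 0 <= index < len(identifiers):
            raise ValueError(f"no identifier for placeholder {match.group(0)!r}")
        return identifiers[index]

    return _PLACEHOLDER.sub(substitute, template)


print(fill_query_url(
    "http://scop.mrc-lmb.cam.ac.uk/scop/search.cgi?sunid=%s", ["47718"]))
print(fill_query_url(
    "http://aftol.umn.edu/species/%s1_%s2", ["Aspergillus", "giganteus"]))
# Hypothetical URL in which the first identifier is used twice.
print(fill_query_url("http://example.org/%s1/%s1/%s2", ["a", "b"]))
```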
Multiple Query lines may be given. Note that an optional comment, enclosed in braces, may be given after <Data_type>, <Data_format> or <Data_identifier>; it can be used for a provider-supplied name or comment for that type, format or identifier.
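A Query line, including its optional comments in braces, might be decomposed as follows. This is a sketch rather than a reference parser: the names Term, Query, parse_term and parse_query are hypothetical, the input is the text after the Query tag, and it is assumed that each ';'-separated identifier may carry its own comment.

```python
import re
from typing import List, NamedTuple, Optional


class Term(NamedTuple):
    name: str
    comment: Optional[str]


class Query(NamedTuple):
    data_type: Term
    data_format: Term
    identifiers: List[Term]
    url: str


# A term name optionally followed by a comment in braces.
_TERM = re.compile(r"^(?P<name>[^{}]*?)\s*(?:\{(?P<comment>[^{}]*)\})?$")


def parse_term(text):
    match = _TERM.match(text.strip())
    if match is None or not match.group("name"):
        raise ValueError(f"cannot parse term: {text!r}")
    comment = match.group("comment")
    return Term(match.group("name"),
                comment.strip() if comment is not None else None)


def parse_query(value):
    """Parse the text after the Query tag into its four '|'-separated parts."""
    # Split at most three times so the URL is kept whole.
    parts = [part.strip() for part in value.split("|", 3)]
    if len(parts) != 4:
        raise ValueError(f"expected four '|'-separated fields: {value!r}")
    data_type, data_format, id_field, url = parts
    identifiers = [parse_term(item) for item in id_field.split(";")]
    return Query(parse_term(data_type), parse_term(data_format),
                 identifiers, url)


query = parse_query(
    "Data reference {PDB Entry search} | HTML | PDB ID | "
    "http://scop.mrc-lmb.cam.ac.uk/scop/search.cgi?PDB=%s")
print(query)
```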
Example <Data_identifier> | <Example>
Example identifiers used in Query lines.
e.g.
Example Genus name | Aspergillus
Example Species name | giganteus
<Data_identifier> is a term name from the EDAM ontology ("Identifier" branch) as used in one or more Query lines. <Example> is a valid value of the identifier for use in a query. Multiple Example lines may be given.
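Example lines can be gathered into a lookup from identifier name to example values, which is convenient when testing Query URLs. The sketch below makes two assumptions drawn from the sample entries in this document rather than from the description above: a line may pair several ';'-separated identifiers with the same number of ';'-separated values, and a trailing comment in braces after an identifier name is dropped from the lookup key. The function name collect_examples is hypothetical.

```python
import re
from collections import defaultdict

# A trailing comment in braces, e.g. ' {Keyword}'.
_COMMENT = re.compile(r"\s*\{[^{}]*\}\s*$")


def collect_examples(values):
    """Map identifier name -> list of example values.

    `values` holds the text after each Example tag.
    """
    examples = defaultdict(list)
    for value in values:
        name_field, sep, sample_field = value.partition("|")
        if not sep:
            raise ValueError(f"missing '|' separator: {value!r}")
        names = [_COMMENT.sub("", n.strip()) for n in name_field.split(";")]
        if len(names) == 1:
            samples = [sample_field.strip()]
        else:
            # Assumption: values are paired with names in order.
            samples = [s.strip() for s in sample_field.split(";")]
            if len(samples) != len(names):
                raise ValueError(f"names and values do not pair up: {value!r}")
        for name, sample in zip(names, samples):
            examples[name].append(sample)
    return dict(examples)


lookup = collect_examples([
    "Genus name | Aspergillus",
    "Species name | giganteus",
    "UniProt accession | P02930",
    "UniProt accession | Q9SIB9",
    "Identifier {Keyword} | immunoglobulin",
])
print(lookup)
```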
ID AFTOL
Name Assembling the Fungal Tree of Life (AFTOL) database
Desc Fungal structural and biochemical database.
URL http://www.aftol.org/index.php
Cat Not available
Taxon 4751 | Fungi
EDAMtpc 0000782 | Fungal
EDAMid 0001045 | Species name
EDAMid 0001870 | Genus name
EDAMdat 0002395 | Fungi annotation
EDAMdat 0002396 | Fungi annotation (anamorph)
EDAMfmt 0002331 | HTML
Xref SP_FT | None
Query Fungi annotation | HTML | Genus name;Species name | http://aftol.umn.edu/species/%s1_%s2
Query Fungi annotation (anamorph) | HTML | Genus name;Species name | http://aftol.umn.edu/species/%s1_%s2__(anamorph)
Example Genus name | Aspergillus
Example Species name | giganteus

ID ANU-2DPAGE
Acc DB-0002
Name Australian National University 2-DE database (ANU-2DPAGE)
Desc 2-DE PAGE database.
URL http://semele.anu.edu.au
Cat 2D gel databases
Taxon 1 | all
EDAMtpc 0000133 | Two-dimensional gel electrophoresis
EDAMid 0003021 | UniProt accession
EDAMdat 0002364 | Experiment annotation (2D PAGE)
EDAMfmt 0002331 | HTML
Xref SP_explicit | UniProt accession
Query Experiment annotation (2D PAGE) | HTML | UniProt accession | http://semele.anu.edu.au/cgi-bin/get-2d-entry?%s
Example UniProt accession | P02930
Example UniProt accession | Q9SIB9

ID SCOP
Name Structural classification of proteins (SCOP) database
Desc The SCOP database, created by manual inspection and abetted by a battery of automated methods, aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known. As such, it provides a broad survey of all known protein folds, detailed information about the close relatives of any particular protein, and a framework for future research and classification.
URL http://scop.mrc-lmb.cam.ac.uk/scop
Taxon 1 | all
EDAMtpc 0000736 | Protein domains
EDAMdat 0001554 | SCOP node
EDAMdat 0002093 | Data reference
EDAMid 0001042 | SCOP sunid
EDAMid 0001127 | PDB ID
EDAMid 0000842 | Identifier
EDAMfmt 0002331 | HTML
Query SCOP node | HTML | SCOP sunid | http://scop.mrc-lmb.cam.ac.uk/scop/search.cgi?sunid=%s
Query Data reference {PDB Entry search} | HTML | PDB ID | http://scop.mrc-lmb.cam.ac.uk/scop/search.cgi?PDB=%s
Query Data reference | HTML | Identifier {Keyword} | http://scop.mrc-lmb.cam.ac.uk/scop/search.cgi?key=%s
Example SCOP sunid | 47718
Example PDB ID | 1djh
Example Identifier {Keyword} | immunoglobulin

ID ECO2DBASE
IDalt 2DBase-Ecoli
IDalt EC-2D-GEL
Acc DB-0115
Name 2D-PAGE database of Escherichia coli
Desc This Database currently contains 12 gels consisting of 1185 protein spots information in which 723 proteins where identified and annotated. Individual protein spots in the existing gels can be displayed, queried, analysed and compared in a tabular format based on varios functional categories enabling quick and subsequent analysis.
URL http://2dbase.techfak.uni-bielefeld.de/cgi-bin/2d/2d.cgi
Cat 2D gel databases
Taxon 562 | Escherichia coli
EDAMtpc 0000133 | Two-dimensional gel electrophoresis
EDAMdat 0002364 | Experiment annotation (2D PAGE)
EDAMid 0003021 | UniProt accession
EDAMfmt 0002331 | HTML
Xref SP_explicit | None
Query Experiment annotation (2D PAGE) {ECO2DBASE entry} | HTML | UniProt accession | http://2dbase.techfak.uni-bielefeld.de/cgi-bin/2d/2d.cgi?%s
Example UniProt accession | P02930
Example UniProt accession | P52697

ID Ensembl
Acc DB-0023
Name Ensembl eukaryotic genome annotation project
Desc Genome databases for vertebrates and other eukaryotic species.
URL http://www.ensembl.org/
Cat Genome annotation databases
Taxon 33208 | Metazoa
EDAMtpc 0000643 | Genomes
EDAMtpc 0002818 | Eukaryote
EDAMtpc 0000643 | Genomes
EDAMdat 0000849 | Sequence record
EDAMdat 0000916 | Gene annotation
EDAMid 0001033 | Gene ID (Ensembl)
EDAMid 0002725 | Transcript ID (Ensembl)
EDAMfmt 0001929 | FASTA format
EDAMfmt 0002331 | HTML
Xref SP_explicit | None
Xref SP_FT | None
Query Gene annotation | HTML | Gene ID (Ensembl) | http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=%s
Query Sequence record | FASTA format | Gene ID (Ensembl);Transcript ID (Ensembl) | http://www.ensembl.org/Homo_sapiens/Gene/Export?db=core;g=%s1;output=fasta;r=13:31787617-31871809;strand=feature;t=%s2;time=1244110856.85314;st=cdna;st=coding;st=peptide;st=utr5;st=utr3;st=exons;st=introns;genomic=unmasked;_format=Text
Example Gene ID (Ensembl);Transcript ID (Ensembl) | ENSG00000139618;ENST00000380152
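Entries such as those above could be read into simple Python structures as sketched here. The sketch assumes one field per line and that every entry begins with its ID line; the description above does not state how entries are delimited, so this is a guess based on the sample entries. The function name parse_entries is hypothetical, and the sample text is an abbreviated copy of two of the entries above.

```python
def parse_entries(text):
    """Group field lines into entries (dict of tag -> list of values)."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # Comment lines begin with '#' and can appear anywhere.
        if not line or line.startswith("#"):
            continue
        tag, _, value = line.partition(" ")
        # Assumption: an ID line opens a new entry.
        if tag == "ID":
            current = {}
            entries.append(current)
        if current is None:
            raise ValueError(f"field line before first ID line: {line!r}")
        current.setdefault(tag, []).append(value.strip())
    return entries


sample = """\
# Abbreviated sample, for illustration only
ID AFTOL
Name Assembling the Fungal Tree of Life (AFTOL) database
Taxon 4751 | Fungi
Query Fungi annotation | HTML | Genus name;Species name | http://aftol.umn.edu/species/%s1_%s2
Query Fungi annotation (anamorph) | HTML | Genus name;Species name | http://aftol.umn.edu/species/%s1_%s2__(anamorph)

ID ANU-2DPAGE
Acc DB-0002
# A comment inside an entry
Cat 2D gel databases
"""

entries = parse_entries(sample)
for entry in entries:
    print(entry["ID"][0], sorted(entry))
```

Every tag maps to a list so that single and repeatable fields are handled the same way; a caller that expects a single value can take the first element.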