Databases


Protein databases and GPMAW

GPMAW uses databases for a variety of purposes. In most cases you will need to download the database from the Internet. The scientific community has made most of the databases freely available, so the only drawback is the enormous size of some of them.

Format. The databases are (of course) available in different formats. The most common is the FastA format, which exists in a couple of variants. In all variants, a database record begins with a name line that starts with a '>' sign, usually followed by one or more accession numbers, the protein name and the species. The following lines contain the sequence in one-letter code, usually formatted with 60 characters per line.
Another popular format is the Swiss-Prot (or EMBL) format, where each sequence record contains much additional information. For a detailed description see here.
Finally, many records are obtained in GenBank (Entrez/NCBI) format. This format is similar in information content to the Swiss-Prot format and is easier for humans to read, but more difficult for computer programs to parse. More information here.
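
The FastA layout described above is simple enough to read with a few lines of code. The following Python sketch is only an illustration and is not part of GPMAW or DBindex: the function names and the two example records are hypothetical, and it assumes the plain variant in which everything after the '>' sign is treated as one name line.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name_line, sequence) pairs from an iterable of text lines."""
    name = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new name line closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name = line[1:].strip()
            chunks = []
        elif name is not None:
            # Sequence lines are joined into a single one-letter-code string.
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def wrap_sequence(sequence: str, width: int = 60) -> List[str]:
    """Split a sequence into lines of at most `width` characters."""
    return [sequence[i:i + width] for i in range(0, len(sequence), width)]


# Hypothetical records, invented for this example only.
example = """\
>EX0001 Hypothetical example protein - Homo sapiens
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVK
ALPDAQFEVV
>EX0002 Second hypothetical protein - Mus musculus
MSDNGPQ
""".splitlines()

records = list(read_fasta(example))
for name, sequence in records:
    print(name, len(sequence))
```

Because the reader is a generator, it can also be fed an open file object, so a large database does not have to be held in memory all at once.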

GPMAW can read individual records in most formats, but before it can read a whole database, the database has to be indexed by the utility program DBindex (freely available from Lighthouse data, download it here).
DBindex can handle databases in FastA and Swiss-Prot format. A Swiss-Prot database has to be converted into FastA before indexing; even so, when GPMAW retrieves an individual record, it retrieves the fully annotated sequence.
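
To give an idea of what a Swiss-Prot to FastA conversion involves, here is a minimal Python sketch. It is not the conversion performed by DBindex, whose details are not described here. It assumes the usual Swiss-Prot flat-file layout (a two-letter line code followed by data, AC for accession numbers, DE for the description, OS for the species, SQ introducing the sequence and // ending the record). The layout of the generated name line and the example record are assumptions made for illustration.

```python
from typing import Iterable, Iterator, List, Tuple


def swissprot_to_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name_line, sequence) for each Swiss-Prot style record."""
    accessions: List[str] = []
    description: List[str] = []
    species: List[str] = []
    sequence: List[str] = []
    in_sequence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("//"):
            # End of record: build a name line (layout is an assumption).
            if accessions or sequence:
                desc = " ".join(description).rstrip(".")
                org = " ".join(species).rstrip(".")
                name = f"{' '.join(accessions)} {desc} - {org}"
                yield name, "".join(sequence)
            accessions, description, species, sequence = [], [], [], []
            in_sequence = False
        elif in_sequence:
            # Sequence lines are written in blocks separated by spaces.
            sequence.append(line.replace(" ", ""))
        else:
            code, data = line[:2], line[5:].strip()
            if code == "AC":
                accessions.extend(a.strip() for a in data.split(";") if a.strip())
            elif code == "DE":
                description.append(data)
            elif code == "OS":
                species.append(data)
            elif code == "SQ":
                in_sequence = True


def format_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Return one FastA record with the sequence wrapped at `width`."""
    rows = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(rows) + "\n"


# Hypothetical record, invented for this example only.
example = """\
ID   EXAMPLE_HUMAN           Reviewed;          17 AA.
AC   X00001; X00002;
DE   Hypothetical example protein.
OS   Homo sapiens (Human).
SQ   SEQUENCE   17 AA;  0 MW;  0000000000000000 CRC64;
     MKTAYIAKQR QISFVKS
//
""".splitlines()

converted = list(swissprot_to_fasta(example))
print(format_fasta(*converted[0]), end="")
```

Note that the annotation lines other than AC, DE and OS are dropped by this sketch, which is why the original Swiss-Prot file is still needed if the fully annotated record is to be shown later.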

How are databases used?
Retrieval of sequence records.
Digest mass searches.
BLAST homology searches.
General description of files generated and how they are stored.

Table of useful databases.

Some databases and how to handle them in GPMAW/DBindex:
Swiss-Prot - The best annotated database. Reference for most other databases. TrEMBL is handled in an identical manner and when added to Swiss-Prot makes a good complete non-redundant database.
IPI human, mouse, rat - "Complete" protein database for human, mouse and rat respectively. Although in Swiss-Prot format, it is only partially annotated.
NCBInr - The "complete" set of protein sequences collected from most existing protein databases. However, it is not non-redundant.
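
As a rough way of judging how redundant a database is, identical sequences can be grouped together. The Python sketch below is an illustration only and is not a GPMAW or DBindex function. It makes the simplifying assumption that "redundant" means two records carrying exactly the same sequence (real databases may apply other criteria), and the records shown are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_identical(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each sequence that occurs more than once to its name lines."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, sequence in records:
        # Compare in upper case so that letter case does not hide duplicates.
        groups[sequence.upper()].append(name)
    return {seq: names for seq, names in groups.items() if len(names) > 1}


# Hypothetical (name_line, sequence) pairs, invented for this example only.
records = [
    ("A1 Example protein - Homo sapiens", "MKTAYIAK"),
    ("B7 Example protein (copy) - Homo sapiens", "mktayiak"),
    ("C3 Other protein - Mus musculus", "MSDNGPQ"),
]

duplicates = group_identical(records)
for sequence, names in duplicates.items():
    print(sequence, "is shared by", len(names), "records")
```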

 
