Wednesday, July 8, 2009

Things about PDB

1, As of July 2009, PDB contains almost 60,000 structures, of which about 54,000 are proteins; of those, about 47,000 are X-ray structures.

2, You can download individual PDB files from ftp://ftp.wwpdb.org/pub/pdb/data/structures/all/pdb/ or http://www.rcsb.org/pdb/files/. For example:
wget ftp://ftp.wwpdb.org/pub/pdb/data/structures/all/pdb/pdb1a2k.ent.gz
wget http://www.rcsb.org/pdb/files/4hhb.pdb.gz
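Yes, it is also possible to read the file with BioPerl (decompress it first with gunzip):
use strict;
use warnings;
use Bio::Structure::IO;

# Open the decompressed PDB file
my $in = Bio::Structure::IO->new(-file   => "pdb1a2k.ent",
                                 -format => 'pdb');

# Print the ID of each structure found in the file
while ( my $struc = $in->next_structure() ) {
    print "Structure ", $struc->id, "\n";
}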
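
Both download locations follow a fixed naming pattern, so the URL can be built from the four-character PDB ID. Here is a minimal shell sketch, assuming the URL layouts in the wget examples above are still valid. The function names pdb_ftp_url, pdb_rcsb_url and fetch_pdb are made up for this illustration and are not part of any tool.

```shell
#!/bin/sh
# Hypothetical helpers; the URL patterns are copied from the wget examples above.

# Build the wwPDB FTP URL for a PDB ID (lowercased to match the file names there).
pdb_ftp_url() {
    id=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
    printf 'ftp://ftp.wwpdb.org/pub/pdb/data/structures/all/pdb/pdb%s.ent.gz\n' "$id"
}

# Build the RCSB URL for the same entry.
pdb_rcsb_url() {
    id=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
    printf 'http://www.rcsb.org/pdb/files/%s.pdb.gz\n' "$id"
}

# Download one entry from the FTP site and decompress it,
# leaving a plain pdbXXXX.ent file in the current directory.
fetch_pdb() {
    id=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
    wget "$(pdb_ftp_url "$id")" && gunzip -f "pdb${id}.ent.gz"
}
```

For example, `fetch_pdb 1a2k` should leave pdb1a2k.ent in the current directory, which is the file name the BioPerl snippet reads.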

3, EBI hosts some curated information on PDB structures; check here:

ftp://ftp.ebi.ac.uk/pub/databases/rcsb/pdb-remediated/

4, PDB sequences can be downloaded from the NCBI FTP site, so you do not need to generate them yourself by parsing the structures, which is error-prone.
wget ftp://ftp.ncbi.nih.gov/blast/db/FASTA/pdbaa.gz
5, Each entry in pdbaa represents one sequence, but it can correspond to multiple chains from different structures. Each sequence has an NCBI gi, so it is easy to track. The residue numbers in .pdb files are generally based on these sequences. Usually, a PDB chain contains only part of the sequence. Yes, sometimes a chain introduces extra residues, but usually this does not matter much.
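
To see which chains share one pdbaa sequence, the FASTA headers can be split with awk. This is only a sketch: it assumes gi-style NCBI deflines of the form `>gi|NUMBER|pdb|ID|CHAIN description`, with the deflines of identical chains joined by Ctrl-A on a single header line. The function name pdbaa_chains is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: reads FASTA on stdin and prints, for each entry,
#   first_gi <TAB> number_of_chains <TAB> comma-separated ID_CHAIN list
# Assumes deflines like ">gi|123|pdb|1A2K|A Chain A, ..." joined by Ctrl-A (\001).
pdbaa_chains() {
    awk '
    /^>/ {
        n = split(substr($0, 2), defs, "\001")   # one element per merged defline
        gi = ""; chains = ""; k = 0
        for (i = 1; i <= n; i++) {
            split(defs[i], f, "|")   # f[2]=gi, f[4]=PDB ID, f[5]="CHAIN description"
            if (f[1] != "gi" || f[3] != "pdb") continue
            split(f[5], c, " ")      # c[1] is the chain ID
            if (gi == "") gi = f[2]  # keep the first gi as the entry label
            if (chains != "") chains = chains ","
            chains = chains f[4] "_" c[1]
            k++
        }
        printf "%s\t%d\t%s\n", gi, k, chains
    }'
}
```

A possible use is `gunzip -c pdbaa.gz | pdbaa_chains | head`. Headers that do not match the assumed layout are reported with a chain count of 0.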
