Further information on these alignments, and refinements to them, will be made available in the future. Aligned FASTA or similar-format files have not been provided for now, because those formats cannot easily record which regions are considered reliably aligned (see below). We plan to address this, probably by using a format such as NEXUS.


These files are produced by running the EMBOSS showalign program, using a scoring matrix we created (ESIMILARITY), on alignments that we made with contributions from other databases (3D_Ali, HOMSTRAD, and Pfam). The showalign output is then processed further with locally written Perl programs.

Reliably structurally aligned portions are shown in blue. Portions that cannot be structurally aligned because of gaps in the structures are shown in red. Files ending in ".#.htm" are split into separate pages for printing. Each group of lines separated by lines of "=" consists of sequences that are at least 65% identical to a sequence with known structure; in our terminology, these groups are clusters. Regions not shown in blue (that is, not reliably structurally aligned) are aligned only within clusters.
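The 65% cluster rule can be illustrated with a short Python sketch. It is not the locally written Perl code. The helper names, the gap characters, and the definition of percent identity (identical positions divided by positions where neither sequence has a gap) are all assumptions made for illustration.

```python
from __future__ import annotations

# Hypothetical sketch of the 65% identity cluster rule; not the original Perl code.
# Assumption: identity = identical positions / positions where neither
# aligned sequence has a gap.

GAP_CHARS = set("-.")       # assumed gap symbols
CLUSTER_THRESHOLD = 65.0    # percent, from the text


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue  # assumption: gapped positions are not counted
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def assign_cluster(seq: str, structures: dict[str, str]) -> str | None:
    """Name of the known-structure sequence most identical to `seq`,
    if that identity is at least 65%; otherwise None."""
    best_name, best_pid = None, 0.0
    for name, ref in structures.items():
        pid = percent_identity(seq, ref)
        if pid > best_pid:
            best_name, best_pid = name, pid
    return best_name if best_pid >= CLUSTER_THRESHOLD else None
```

For example, with the toy reference set `{"REFA": "MKV-LLAG", "REFB": "QRSTWYPD"}`, the sequence `"MKVALLAG"` is assigned to `REFA` (7 of 7 compared positions identical). `"QRSTLLAG"` is assigned to no cluster, because its best identity is about 57%.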

Sequences with known structure have names that are either entirely uppercase or, in less important cases, begin with a PDB file ID and chain ID (a digit from 1 to 9 followed by four letters or digits). Names of archaeal sequences are shown in purple, names of fungal or metazoan sequences in red, names of plant sequences in green, and names of bacterial sequences in blue.
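The naming convention can be checked mechanically. The Python sketch below tests whether a name matches either known-structure rule and maps taxonomic groups to the name colours listed above. The function name, the dictionary keys, and the assumption that names of sequences without known structure are never entirely uppercase are all hypothetical.

```python
import re

# Hypothetical sketch of the naming convention; not part of the original tools.
# PDB-derived names: a digit 1-9, then four letters or digits
# (the 4-character PDB ID, starting with that digit, followed by a chain ID).
PDB_CHAIN_NAME = re.compile(r"[1-9][A-Za-z0-9]{4}")

# Name colours by taxonomic group, as listed in the text (keys are hypothetical).
NAME_COLOURS = {
    "archaea": "purple",
    "fungi": "red",
    "metazoa": "red",
    "plants": "green",
    "bacteria": "blue",
}


def has_known_structure(name: str) -> bool:
    """True if `name` follows either known-structure naming rule.

    Assumes that names of sequences without known structure are never
    entirely uppercase.
    """
    return name.isupper() or PDB_CHAIN_NAME.match(name) is not None
```

`re.match` anchors at the start of the string, so the pattern checks only how the name begins, matching the "begin with" wording of the rule.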

The PostScript files are plots, made with the EMBOSS plotcon program, of the degree of residue conservation against position along the alignment, using a window of 20 residues. They were made with an all-positive matrix (ESIMILARITY+1), so gaps, including those at the ends, are treated the same as amino acids that do not match at all.
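The following toy Python sketch shows why shifting the matrix makes gaps equivalent to complete mismatches in a windowed conservation profile. It is not plotcon's actual formula. It assumes that the unshifted matrix's lowest score is -1 (so the +1 shift makes the worst mismatch 0), that any pair involving a gap scores 0, and that a column's score is the mean pairwise score. The function names and the toy matrix are hypothetical.

```python
from itertools import combinations

# Toy sketch of a windowed conservation profile; NOT plotcon's exact formula.
# Assumptions (hypothetical): the unshifted matrix's minimum is -1, so the
# +1 shift makes the worst amino-acid mismatch score 0, and any pair that
# involves a gap (including end gaps) also scores 0.

GAP = "-"
WINDOW = 20  # residues, as in the text


def pair_score(x, y, matrix):
    """Shifted score for one residue pair; gaps count as complete mismatches."""
    if x == GAP or y == GAP:
        return 0.0
    return matrix[x][y] + 1  # stand-in for ESIMILARITY+1


def column_scores(alignment, matrix):
    """Mean pairwise shifted score per column (needs at least 2 sequences)."""
    scores = []
    for column in zip(*alignment):
        pairs = list(combinations(column, 2))
        scores.append(sum(pair_score(x, y, matrix) for x, y in pairs) / len(pairs))
    return scores


def windowed(scores, window=WINDOW):
    """Average of each run of `window` consecutive column scores."""
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]
```

With a toy matrix scoring +1 for identical residues and -1 otherwise, the shifted scores are 2 for a match and 0 for either a mismatch or a gap. This is the equivalence the text describes.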

