                    UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2011_03 STATISTICS
                    
                    
                    1.  INTRODUCTION
                    
                    Release 2011_03 of 08-Mar-2011 of UniProtKB/TrEMBL contains 13897064 sequence entries,
                    comprising 4465597779 amino acids.
                    
                    447081 sequences have been added since release 2011_02, the sequence data of
                    314 existing entries has been updated and the annotations of
                    5030796 entries have been revised. This represents an increase of 3%
                    in the number of entries.
                    
                    Number of fragments: 2274657
                    
                    Protein existence (PE):              entries      %
                    1: Evidence at protein level           17816     0.13%
                    2: Evidence at transcript level       488164     3.51%
                    3: Inferred from homology            2918264    21.00%
                    4: Predicted                        10472820    75.36%
                    5: Uncertain                               0     0.00%
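
The percentages in the protein existence table follow directly from the entry counts. A minimal Python sketch, with the counts copied from the table above; the variable and function names are illustrative and not part of any UniProt tooling:

```python
# Entry counts per protein existence (PE) level, copied from the table above.
PE_COUNTS = {
    "1: Evidence at protein level": 17816,
    "2: Evidence at transcript level": 488164,
    "3: Inferred from homology": 2918264,
    "4: Predicted": 10472820,
    "5: Uncertain": 0,
}


def pe_percentages(counts):
    """Return each PE level's share of all entries, in percent, rounded to 2 places."""
    total = sum(counts.values())
    return {level: round(100 * n / total, 2) for level, n in counts.items()}


if __name__ == "__main__":
    for level, pct in pe_percentages(PE_COUNTS).items():
        print(f"{level:<34}{PE_COUNTS[level]:>9}{pct:>9.2f}%")
```

The five counts sum to the 13897064 entries stated in the introduction, so the shares are taken over the whole release.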
                    
                    The growth of the database is summarized below.
                    
                    
                    
                    
                    
                    2.  TAXONOMIC ORIGIN
                    
                    Total number of species represented in this release of UniProtKB/TrEMBL: 333707
                    
                    The twenty most represented species account for 1248768 sequences: 9% of the
                    total number of entries.
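
The figure for the twenty most represented species can be reproduced from the frequencies of ranks 1 to 20 in the table of section 2.2. A short sketch, assuming the frequencies are copied by hand from that table (the names are illustrative):

```python
# Total number of entries in the release, from the introduction.
TOTAL_ENTRIES = 13897064

# Frequencies of the species ranked 1 to 20 in the table of section 2.2.
TOP20_FREQUENCIES = [
    362997, 95319, 80225, 58515, 52127,
    50941, 50470, 49307, 44037, 43148,
    41969, 40574, 39840, 39360, 34791,
    33720, 33648, 33195, 32625, 31960,
]

top20_total = sum(TOP20_FREQUENCIES)
top20_share = 100 * top20_total / TOTAL_ENTRIES  # percent of all entries

if __name__ == "__main__":
    print(f"{top20_total} sequences, {top20_share:.0f}% of all entries")
```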
                    
                    
                    2.1 Table of the frequency of occurrence of species
                    
                    Species represented      1x: 15682
                                             2x: 58333
                                             3x: 30079
                                             4x: 18010
                                             5x: 11329
                                             6x:  7840
                                             7x:  5621
                                             8x:  4369
                                             9x:  3590
                                            10x:  7008
                                        11- 20x: 17488
                                        21- 50x:  6190
                                        51-100x:  2239
                                          >100x:  4786
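
As a consistency check, the bin counts in this table can be summed and compared with the species total given at the start of section 2. The sketch below copies the counts as they appear in this copy of the table; they sum to fewer species than the stated total, so it reports the gap rather than asserting equality. All names are illustrative.

```python
# Total number of species stated at the start of section 2.
SPECIES_TOTAL = 333707

# Number of species per occurrence bin, as listed in this copy of the table.
SPECIES_BINS = {
    "1x": 15682,
    "2x": 58333,
    "3x": 30079,
    "4x": 18010,
    "5x": 11329,
    "6x": 7840,
    "7x": 5621,
    "8x": 4369,
    "9x": 3590,
    "10x": 7008,
    "11-20x": 17488,
    "21-50x": 6190,
    "51-100x": 2239,
    ">100x": 4786,
}

listed_species = sum(SPECIES_BINS.values())
gap = SPECIES_TOTAL - listed_species  # species not accounted for by the bins

if __name__ == "__main__":
    print(f"listed: {listed_species}, stated: {SPECIES_TOTAL}, gap: {gap}")
```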
                    
                    
                    2.2  Table of the most represented species
                    
                    ------  ---------  --------------------------------------------
                    Number  Frequency  Species
                    ------  ---------  --------------------------------------------
                    1     362997  Human immunodeficiency virus 1
                    2      95319  Oryza sativa subsp. japonica (Rice)
                    3      80225  Homo sapiens (Human)
                    4      58515  Hepatitis C virus
                    5      52127  uncultured bacterium
                    6      50941  Vitis vinifera (Grape)
                    7      50470  Trichomonas vaginalis
                    8      49307  Mus musculus (Mouse)
                    9      44037  Populus trichocarpa (Western balsam poplar) 
                    10      43148  Hepatitis B virus (HBV)
                    11      41969  Zea mays (Maize)
                    12      40574  Arabidopsis thaliana (Mouse-ear cress)
                    13      39840  Paramecium tetraurelia
                    14      39360  Oryza sativa subsp. indica (Rice)
                    15      34791  Physcomitrella patens subsp. patens (Moss)
                    16      33720  Danio rerio (Zebrafish) (Brachydanio rerio)
                    17      33648  Sorghum bicolor (Sorghum) (Sorghum vulgare)
                    18      33195  Selaginella moellendorffii (Spikemoss)
                    19      32625  Arabidopsis lyrata subsp. lyrata
                    20      31960  Drosophila melanogaster (Fruit fly)
                    21      31830  Caenorhabditis remanei (Caenorhabditis vulgaris)
                    22      31262  Ricinus communis (Castor bean)
                    23      29115  Branchiostoma floridae (Florida lancelet) (Amphioxus)
                    24      29022  Oikopleura dioica (Tunicate)
                    25      28089  Tetraodon nigroviridis (Green puffer)
                    26      25271  Ralstonia solanacearum (Pseudomonas solanacearum)
                    27      24812  Nematostella vectensis (Starlet sea anemone)
                    28      23453  Rattus norvegicus (Rat)
                    29      23115  Perkinsus marinus ATCC 50983
                    30      22105  Escherichia coli
                    31      21548  Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz) 
                    32      21411  Caenorhabditis elegans
                    33      21086  Ixodes scapularis (Black-legged tick) (Deer tick)
                    34      20734  Trypanosoma cruzi
                    35      20437  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
                    36      18883  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
                    37      18771  mine drainage metagenome
                    38      18065  Drosophila simulans (Fruit fly)
                    39      17933  Caenorhabditis briggsae
                    40      17848  Laccaria bicolor (strain S238N-H82) (Bicoloured deceiver) 
                    41      17804  Ailuropoda melanoleuca (Giant panda)
                    42      17605  Phytophthora infestans T30-4
                    43      16974  Tribolium castaneum (Red flour beetle)
                    44      16929  Drosophila yakuba (Fruit fly)
                    45      16735  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
                    46      16707  Drosophila persimilis (Fruit fly)
                    47      16425  Ectocarpus siliculosus (Brown alga)
                    48      16277  Loa loa (Eye worm)
                    49      16269  Bos taurus (Bovine)
                    50      16246  Botryotinia fuckeliana (strain B05.10) (Noble rot fungus) (Botrytis cinerea)
                    51      16220  Trichinella spiralis (Trichina worm)
                    52      16180  Drosophila sechellia (Fruit fly)
                    53      15982  Drosophila pseudoobscura pseudoobscura (Fruit fly)
                    54      15864  Phaeosphaeria nodorum (Glume blotch fungus) (Septoria nodorum)
                    55      15715  Naegleria gruberi (Amoeba)
                    56      15658  Nectria haematococca (strain 77-13-4 / FGSC 9596 / MPVI) 
                    57      15418  Drosophila willistoni (Fruit fly)
                    58      15248  Tetrahymena thermophila SB210
                    59      15166  Canis familiaris (Dog) (Canis lupus familiaris)
                    60      15137  Drosophila ananassae (Fruit fly)
                    61      15029  Harpegnathos saltator
                    62      14921  Drosophila erecta (Fruit fly)
                    63      14817  Chlamydomonas reinhardtii (Chlamydomonas smithii)
                    64      14791  Camponotus floridanus
                    65      14775  Drosophila mojavensis (Fruit fly)
                    66      14758  Anopheles gambiae (African malaria mosquito)
                    67      14696  Drosophila virilis (Fruit fly)
                    68      14671  Plasmodium chabaudi
                    69      14652  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
                    70      14634  Volvox carteri f. nagariensis
                    71      14626  Toxoplasma gondii
                    72      14262  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
                    73      13780  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
                    74      13560  Moniliophthora perniciosa FA553
                    75      13508  Schistosoma mansoni (Blood fluke)
                    76      13393  Hepatitis C virus subtype 1b
                    77      13357  Aspergillus flavus 
                    78      13286  Coprinopsis cinerea (strain Okayama-7 / 130 / FGSC 9003) (Inky cap fungus) 
                    79      13191  Magnaporthe oryzae (strain 70-15 / FGSC 8958) (Rice blast fungus) 
                    80      13127  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
                    81      13031  Gallus gallus (Chicken)
                    82      12960  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
                    83      12950  Stigmatella aurantiaca (strain DW4/3-1)
                    84      12710  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
                    85      12567  Glycine max (Soybean) (Glycine hispida)
                    86      12543  Xenopus laevis (African clawed frog)
                    87      12535  Leptosphaeria maculans (Blackleg fungus) (Phoma lingam)
                    88      12444  Polysphondylium pallidum (Cellular slime mold)
                    89      12124  Plasmodium falciparum
                    90      12021  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
                    91      12019  Colletotrichum graminicola (strain M1.001 / M2 / FGSC 10212)  
                    92      11849  Aspergillus oryzae
                    93      11705  Pyrenophora teres f. teres (strain 0-1) (Barley net blotch fungus) 
                    94      11646  Plasmodium berghei (strain Anka)
                    95      11645  Anopheles darlingi (Mosquito)
                    96      11564  Trichoplax adhaerens (Trichoplax reptans)
                    97      11498  Brugia malayi (Filarial nematode worm)
                    98      11449  Hepatitis C virus subtype 1a
                    99      11350  Helicobacter pylori (Campylobacter pylori)
                    100      11272  Picea sitchensis (Sitka spruce) (Pinus sitchensis)
                    101      11211  Ktedonobacter racemifer DSM 44963
                    102      10966  Streptomyces clavuligerus ATCC 27064
                    103      10916  Schistosoma japonicum (Blood fluke)
                    104      10858  Chaetomium globosum (Soil fungus)
                    105      10832  Pediculus humanus subsp. corporis (Body louse)
                    106      10796  Sordaria macrospora (strain ATCC MYA-333 / DSM 997 / K(L3346) / K-hell)
                    107      10673  Podospora anserina
                    108      10407  Neurospora crassa
                    109      10404  Penicillium marneffei (strain ATCC 18224 / CBS 334.59 / QM 7333)
                    110      10387  Aspergillus nidulans FGSC A4
                    111      10357  Phaeodactylum tricornutum (strain CCAP 1055/1)
                    112      10277  Micromonas pusilla (strain CCMP1545) (Picoplanktonic green alga)
                    113      10258  Rabies virus
                    114      10222  Verticillium albo-atrum (strain VaMs.102) (Verticillium wilt)
                    115      10114  Micromonas sp. (strain RCC299 / NOUM17) (Picoplanktonic green alga)
                    116      10110  Aspergillus terreus (strain NIH 2624 / FGSC A1156)
                    117      10084  Neosartorya fischeri (strain ATCC 1020 / DSM 3700 / FGSC A1164 / NRRL 181) 
                    118      10052  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
                    119      10048  Porcine reproductive and respiratory syndrome virus (PRRSV)
                    120      10015  Streptomyces bingchenggensis (strain BCW-1)
                    121       9755  Chlorella variabilis
                    122       9739  Aspergillus fumigatus (strain CEA10 / CBS 144.89 / FGSC A1163) 
                    123       9671  Cryptococcus neoformans (Filobasidiella neoformans)
                    124       9662  Trypanosoma brucei gambiense DAL972
                    125       9560  Ajellomyces dermatitidis (strain SLH14081) (Blastomyces dermatitidis)
                    126       9549  Aspergillus fumigatus (Sartorya fumigata)
                    127       9528  Ajellomyces capsulata (strain H143) (Darling's disease fungus) 
                    128       9512  Ajellomyces dermatitidis (strain ER-3) (Blastomyces dermatitidis)
                    129       9484  Streptomyces violaceusniger Tu 4113
                    130       9482  Trypanosoma brucei
                    131       9386  Salmo salar (Atlantic salmon)
                    132       9240  Monosiga brevicollis (Choanoflagellate)
                    133       9230  Candida albicans (Yeast)
                    134       9202  Amycolatopsis mediterranei (strain U-32)
                    135       9183  Ajellomyces capsulata (strain ATCC 26029 / G186AR / H82 / RMSCC 2432)  
                    136       9183  Emericella nidulans (Aspergillus nidulans)
                    137       9177  Streptomyces hygroscopicus ATCC 53653
                    138       9164  Ajellomyces capsulata (strain NAm1 / WU24) (Darling's disease fungus) 
                    139       9114  Sorangium cellulosum (strain So ce56) (Polyangium cellulosum (strain So ce56))
                    140       9087  Paracoccidioides brasiliensis (strain ATCC MYA-826 / Pb01)
                    141       8987  Dictyostelium discoideum (Slime mold)
                    142       8972  Postia placenta (strain ATCC 44394 / Madison 698-R) (Brown rot fungus) 
                    143       8958  Thalassiosira pseudonana (Marine diatom)
                    144       8945  Streptosporangium roseum (strain ATCC 12428 / DSM 43021 / JCM 3005 / NI 9100)
                    145       8907  Arthroderma gypseum CBS 118893
                    146       8901  Catenulispora acidiphila 
                    147       8858  Aspergillus clavatus
                    148       8757  Rhodococcus sp. (strain RHA1)
                    149       8724  Paracoccidioides brasiliensis (strain Pb18)
                    150       8692  Nannizzia otae (strain CBS 113480) (Microsporum canis) (Arthroderma otae)
                    151       8691  Streptomyces scabies (strain 87.22) (Streptomyces scabiei)
                    152       8601  Entamoeba dispar SAW760
                    153       8437  Plesiocystis pacifica SIR-1
                    154       8394  Streptomyces sp. AA4
                    155       8302  Entamoeba histolytica
                    156       8249  Microscilla marina ATCC 23134
                    157       8228  Leishmania major
                    158       8202  Streptomyces sviceus ATCC 29083
                    159       8201  Microcoleus chthonoplastes PCC 7420
                    160       8164  Frankia sp. EUN1f
                    161       8154  Burkholderia xenovorans (strain LB400)
                    162       8095  Pseudomonas aeruginosa
                    163       8019  Leishmania infantum
                    164       7986  Trichophyton verrucosum (strain HKI 0517)
                    165       7978  Toxoplasma gondii ME49
                    166       7955  Ostreococcus tauri
                    167       7943  Rhodococcus opacus (strain B4)
                    168       7937  Arthroderma benhamiae (strain CBS 112371) (Trichophyton mentagrophytes)
                    169       7917  Methylobacterium nodulans (strain ORS2060 / LMG 21967)
                    170       7883  Leishmania braziliensis
                    171       7867  Streptomyces ghanaensis ATCC 14672
                    172       7856  Acaryochloris marina (strain MBIC 11017)
                    173       7848  Paracoccidioides brasiliensis (strain Pb03)
                    174       7823  Burkholderia sp. Ch1-1
                    175       7809  Plasmodium yoelii yoelii
                    176       7735  Uncinocarpus reesii (strain UAMH 1704)
                    177       7708  Streptomyces viridochromogenes DSM 40736
                    178       7607  Bradyrhizobium japonicum USDA 110
                    179       7606  uncultured archaeon
                    180       7571  Clostridium hathewayi DSM 13479
                    181       7563  Burkholderia pseudomallei MSHR346
                    182       7528  Streptomyces sp. C
                    183       7523  Streptomyces lividans TK24
                    184       7519  Solibacter usitatus (strain Ellin6076)
                    185       7500  Tuber melanosporum (Perigord truffle)
                    186       7476  Streptomyces coelicolor
                    187       7475  Burkholderia pseudomallei 1710a
                    188       7465  Burkholderia pseudomallei Pakistan 9
                    189       7459  Burkholderia sp. H160
                    190       7443  Kitasatospora setae  
                    191       7386  Ostreococcus lucimarinus (strain CCE9901)
                    192       7379  Streptomyces sp. ACT-1
                    193       7367  Burkholderia pseudomallei 576
                    194       7349  Burkholderia pseudomallei 305
                    195       7274  Clostridium bolteae ATCC BAA-613
                    196       7243  Burkholderia sp. (strain 383) (Burkholderia cepacia 
                    197       7231  Bradyrhizobium sp. (strain BTAi1 / ATCC BAA-1182)
                    198       7227  Streptomyces avermitilis
                    199       7199  Coccidioides posadasii (strain C735) (Valley fever fungus)
                    200       7178  Chitinophaga pinensis (strain ATCC 43595 / DSM 2588 / NCIB 11800 / UQM 2034)
                    201       7146  Giardia intestinalis (strain ATCC 50803 / WB clone C6) (Giardia lamblia)
                    202       7140  Burkholderia pseudomallei 1106b
                    203       7131  Burkholderia phymatum (strain DSM 17167 / STM815)
                    204       7124  Burkholderia ambifaria MEX-5
                    205       7096  Medicago truncatula (Barrel medic) (Medicago tribuloides)
                    206       7079  Frankia sp. (strain EuI1c)
                    207       7033  Burkholderia vietnamiensis (strain G4 / LMG 22486) (Burkholderia cepacia 
                    208       7016  Myxococcus xanthus (strain DK 1622)
                    209       7005  Mucilaginibacter paludis DSM 18603
                    210       6985  Rhizobium leguminosarum bv. trifolii (strain WSM1325)
                    211       6975  Rhodopirellula baltica
                    212       6959  Frankia sp. (strain EAN1pec)
                    213       6943  Streptomyces sp. Mg1
                    214       6932  Kribbella flavida (strain DSM 17836 / JCM 10339 / NBRC 14399)
                    215       6923  Burkholderia ambifaria IOP40-10
                    216       6903  Actinosynnema mirum (strain ATCC 29888 / DSM 43827 / NBRC 14064 / IMRU 3971)
                    217       6902  Saccharopolyspora erythraea (strain NRRL 23338)
                    218       6892  Streptomyces roseosporus NRRL 15998
                    219       6882  Burkholderia multivorans (strain ATCC 17616 / 249)
                    220       6867  Spirosoma linguale (strain ATCC 33905 / DSM 74 / LMG 10896)
                    221       6866  Burkholderia sp. (strain CCGE1002)
                    222       6866  Streptomyces pristinaespiralis ATCC 25486
                    223       6859  Burkholderia phytofirmans (strain DSM 17436 / PsJN)
                    224       6849  Sus scrofa (Pig)
                    225       6823  Rhizobium loti (Mesorhizobium loti)
                    226       6817  Clostridium asparagiforme DSM 15981
                    227       6798  Achromobacter xylosoxidans (strain A8)
                    228       6771  Burkholderia pseudomallei (strain 1106a)
                    229       6769  Sinorhizobium meliloti AK83
                    230       6744  Streptococcus pneumoniae
                    231       6740  Burkholderia pseudomallei (strain 668)
                    232       6736  Streptomyces griseus subsp. griseus (strain JCM 4626 / NBRC 13350)
                    233       6725  Burkholderia graminis C4D1M
                    234       6713  Rhizobium leguminosarum bv. viciae (strain 3841)
                    235       6712  Rhodococcus erythropolis SK121
                    236       6706  Sporisorium reilianum
                    237       6705  Chthoniobacter flavus Ellin428
                    238       6702  Streptomyces flavogriseus ATCC 33331
                    239       6692  Bacillus thuringiensis IBL 200
                    240       6690  delta proteobacterium NaphS2
                    241       6682  Sinorhizobium meliloti BL225C
                    242       6680  Haliangium ochraceum (strain DSM 14365 / JCM 11303 / SMP-2)
                    243       6679  Mesorhizobium opportunistum WSM2075
                    244       6662  Burkholderia pseudomallei S13
                    245       6657  Burkholderia cepacia (strain J2315 / LMG 16656) (Burkholderia cenocepacia 
                    246       6655  Bacillus thuringiensis IBL 4222
                    247       6644  Beggiatoa sp. PS
                    248       6627  Burkholderia cenocepacia (strain MC0-3)
                    249       6614  Burkholderia multivorans CGD2
                    250       6613  Burkholderia pseudomallei Pasteur 52237
                    
                    
                    
                    
                    2.3  Taxonomic distribution of the sequences
                    
                    
                    
                    Kingdom        sequences (% of the database)
                    Archaea          243352 (  2%)
                    Bacteria        8789180 ( 63%)
                    Eukaryota       3743030 ( 27%)
                    Viruses         1084794 (  8%)
                    Other             36707 ( <1%)
                    
                    
                    
                    Within Eukaryota:
                    
                    
                    
                    Category            sequences (% of Eukaryota) (% of the complete database)
                    Human                  80261 (  2%)           (  1%)
                    Other Mammalia        225822 (  6%)           (  2%)
                    Other Vertebrata      342547 (  9%)           (  2%)
                    Viridiplantae         856512 ( 23%)           (  6%)
                    Fungi                 760842 ( 20%)           (  5%)
                    Insecta               597503 ( 16%)           (  4%)
                    Nematoda              126278 (  3%)           (  1%)
                    Other                 753265 ( 20%)           (  5%)
                    
                    
                    
                    3.  SEQUENCE SIZE
                    
                    Distribution of the sequences by size (excluding fragments)
                    
                    From   To   Number             From   To   Number
                       1-  50   303528             1001-1100    81605
                      51- 100  1111986             1101-1200    57569
                     101- 150  1277012             1201-1300    39648
                     151- 200  1233046             1301-1400    26022
                     201- 250  1237960             1401-1500    20925
                     251- 300  1200174             1501-1600    14980
                     301- 350  1088981             1601-1700    11198
                     351- 400   842948             1701-1800     8676
                     401- 450   712051             1801-1900     7005
                     451- 500   595272             1901-2000     5907
                     501- 550   403469             2001-2100     4777
                     551- 600   311320             2101-2200     4928
                     601- 650   225537             2201-2300     3882
                     651- 700   175385             2301-2400     3099
                     701- 750   151387             2401-2500     2607
                     751- 800   136174             >2500        22925
                     801- 850   100158
                     851- 900    91202
                     901- 950    62145
                     951-1000    46919
                    
                    
                    
                    
                    The average sequence length in UniProtKB/TrEMBL is 321 amino acids.

                    The shortest sequence is Q16047_HUMAN: 4 amino acids.
                    The longest sequence is Q3ASY8_CHLCH: 36805 amino acids.
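
The average length is consistent with the totals given in the introduction. A minimal sketch, assuming the average is taken over all entries of the release, fragments included (the names are illustrative):

```python
# Totals from the introduction of this release.
TOTAL_AMINO_ACIDS = 4465597779
TOTAL_ENTRIES = 13897064

# Mean sequence length over all entries (assumed to include fragments).
mean_length = TOTAL_AMINO_ACIDS / TOTAL_ENTRIES

if __name__ == "__main__":
    print(f"average sequence length: {mean_length:.0f} amino acids")
```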
                    
                    
                    
                    4.  STATISTICS FOR SOME LINE TYPES
                    
                    The following table summarizes the total number of some UniProtKB/TrEMBL lines,
                    as well as the number of entries with at least one such line, and the
                    frequency of the lines.
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry
                    ---------------------------------  -------- ---------  ---------

                    References (RL)                    16832612                 1.21
                      Submitted to EMBL/GenBank/DDBJ    9945608   8756176       0.72
                      Journal                           6704584   6034692       0.48
                      Submitted to other databases        74948     74360       0.01
                      Thesis                               7628      7570      <0.01
                      Book citation                        5249      5198      <0.01
                      Other                               94595     92900       0.01
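
The "Average per entry" column appears to be the total number of lines divided by the total number of entries in the release, not by the number of entries carrying such a line. The sketch below reproduces the column under that assumption; the "<0.01" rule for values that round to zero is inferred from the table, and the function name is illustrative.

```python
# Total number of entries in the release, from the introduction.
TOTAL_ENTRIES = 13897064


def format_average(total_lines, n_entries=TOTAL_ENTRIES):
    """Format lines-per-entry to 2 decimals, as in the 'Average per entry' column.

    Assumption inferred from the table: a non-zero average that rounds
    to 0.00 is printed as '<0.01'.
    """
    text = f"{total_lines / n_entries:.2f}"
    if total_lines > 0 and text == "0.00":
        return "<0.01"
    return text


if __name__ == "__main__":
    # Totals copied from the references table above.
    for name, total in [("References (RL)", 16832612),
                        ("Journal", 6704584),
                        ("Thesis", 7628)]:
        print(f"{name:<20}{format_average(total):>8}")
```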
                    
                    Total number of distinct authors cited in UniProtKB/TrEMBL: 314187
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank
                    ---------------------------------  -------- ---------  ---------  ----

                    Comments (CC)                      12830346                 0.92
                      CATALYTIC ACTIVITY                1331700   1234842       0.10     4
                      CAUTION                           3145656   3145656       0.23     2
                      COFACTOR                           426402    408406       0.03     8
                      DOMAIN                              27685     25786      <0.01    10
                      FUNCTION                          1564867   1454087       0.11     3
                      INTERACTION                          2462      2462      <0.01    11
                      MISCELLANEOUS                       29441     29437      <0.01     9
                      PATHWAY                            663707    612847       0.05     6
                      SIMILARITY                        3961245   3432376       0.29     1
                      SUBCELLULAR LOCATION              1096353   1090919       0.08     5
                      SUBUNIT                            580828    578644       0.04     7
                    
                    Total number of comment topics: 11
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank
                    ---------------------------------  -------- ---------  ---------  ----

                    Features (FT)                       4635142                 0.33
                      CHAIN                              487109    381967       0.04     2
                      NON_TER                           3830217   2273172       0.28     1
                      SIGNAL                             317221    317162       0.02     3
                      TRANSIT                               595       595      <0.01     4
                    
                    Total number of feature keys: 4
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank  Category
                    ---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
                    Cross-references (DR)             159040861               11.44                                                    
                    AGD                                 2558      2558     <0.01    77   Organism-specific databases                
                    ANU-2DPAGE                            56        56     <0.01    95   2D gel databases                           
                    Allergome                           1933      1393     <0.01    81   Protein family/group databases             
                    ArachnoServer                         66        66     <0.01    93   Organism-specific databases                
                    ArrayExpress                       92635     92624      0.01    49   Gene expression databases                  
                    BRENDA                              2831      2768     <0.01    75   Enzyme and pathway databases               
                    Bgee                              142952    142806      0.01    45   Gene expression databases                  
                    BioCyc                           1624197   1589579      0.12    21   Enzyme and pathway databases               
                    CAZy                               74682     70173      0.01    52   Protein family/group databases             
                    CGD                                 6766      6766     <0.01    72   Organism-specific databases                
                    COMPLUYEAST-2DPAGE                     5         5     <0.01    98   2D gel databases                           
                    CTD                               182941    182198      0.01    42   Organism-specific databases                
                    CYGD                                   5         5     <0.01    99   Organism-specific databases                
                    DIP                                 2753      2748     <0.01    76   Protein-protein interaction databases      
                    EMBL                            15539106  13868242      1.12     3   Sequence databases                         
                    Ensembl                           396567    243850      0.03    31   Genome annotation databases                
                    EnsemblBacteria                   567021    536786      0.04    29   Genome annotation databases                
                    EnsemblFungi                      106161    106084      0.01    46   Genome annotation databases                
                    EnsemblMetazoa                    292146    272847      0.02    34   Genome annotation databases                
                    EnsemblPlants                     261903    234482      0.02    35   Genome annotation databases                
                    EnsemblProtists                    33708     33070     <0.01    60   Genome annotation databases                
                    EuPathDB                          151357    151357      0.01    44   Organism-specific databases                
                    FlyBase                           195061    193525      0.01    40   Organism-specific databases                
                    GO                              25617476   8542338      1.84     2   Ontologies                                 
                    Gene3D                           5679749   4553068      0.41     6   Family and domain databases                
                    GeneDB_Spombe                          1         1     <0.01   102   Organism-specific databases                
                    GeneID                           5563169   5448092      0.40     9   Genome annotation databases                
                    GeneTree                         1160885   1160544      0.08    23   Phylogenomic databases                     
                    Genevestigator                    100213    100203      0.01    48   Gene expression databases                  
                    GenoList                           14752     14479     <0.01    66   Organism-specific databases                
                    GenomeReviews                    3653651   3567593      0.26    12   Genome annotation databases                
                    Gramene                            68786     68786     <0.01    53   Organism-specific databases                
                    H-InvDB                              599       488     <0.01    85   Organism-specific databases                
                    HAMAP                             979494    967883      0.07    25   Family and domain databases                
                    HGNC                               68331     66564     <0.01    54   Organism-specific databases                
                    HOGENOM                          2201254   2201210      0.16    18   Phylogenomic databases                     
                    HOVERGEN                          317561    317561      0.02    33   Phylogenomic databases                     
                    HSSP                              253620    253359      0.02    36   3D structure databases                     
                    IPI                               240237    240233      0.02    38   Sequence databases                         
                    InParanoid                        195019    194929      0.01    41   Phylogenomic databases                     
                    IntAct                             15558     15558     <0.01    65   Protein-protein interaction databases      
                    InterPro                        28871249  10577460      2.08     1   Family and domain databases                
                    KEGG                             4780027   4683918      0.34    10   Genome annotation databases                
                    LegioList                           5142      5114     <0.01    73   Organism-specific databases                
                    Leproma                              936       935     <0.01    84   Organism-specific databases                
                    MEROPS                             66538     65094     <0.01    55   Protein family/group databases             
                    MGI                                43048     43007     <0.01    58   Organism-specific databases                
                    MINT                                8997      8997     <0.01    70   Protein-protein interaction databases      
                    NMPDR                             921154    921144      0.07    26   Genome annotation databases                
                    NextBio                            46879     46876     <0.01    57   Other                                      
                    OMA                              2429257   2429255      0.17    16   Phylogenomic databases                     
                    OrthoDB                           579294    579127      0.04    28   Phylogenomic databases                     
                    PANTHER                          1887882   1818669      0.14    20   Family and domain databases                
                    PDB                                14186      8407     <0.01    67   3D structure databases                     
                    PDBsum                             13941      8257     <0.01    68   3D structure databases                     
                    PHCI-2DPAGE                          102       102     <0.01    90   2D gel databases                           
                    PIR                               175060    142226      0.01    43   Sequence databases                         
                    PIRSF                             777310    777310      0.06    27   Family and domain databases                
                    PMAP-CutDB                           253       253     <0.01    87   Other                                      
                    PMMA-2DPAGE                            3         3     <0.01   100   2D gel databases                           
                    PRIDE                             103219    103217      0.01    47   Proteomic databases                        
                    PRINTS                           2250821   2001059      0.16    17   Family and domain databases                
                    PROSITE                          6798684   4550783      0.49     5   Family and domain databases                
                    Pathway_Interaction_DB                11         9     <0.01    97   Enzyme and pathway databases               
                    PeptideAtlas                         147       147     <0.01    89   Proteomic databases                        
                    PeroxiBase                          2497      2488     <0.01    78   Protein family/group databases             
                    Pfam                            13423011   9997104      0.97     4   Family and domain databases                
                    PharmGKB                              83        83     <0.01    92   Organism-specific databases                
                    PhosphoSite                         1754      1754     <0.01    82   PTM databases                              
                    PhylomeDB                         372478    372446      0.03    32   Phylogenomic databases                     
                    ProDom                            252250    236190      0.02    37   Family and domain databases                
                    ProMEX                               419       419     <0.01    86   Proteomic databases                        
                    ProtClustDB                      2734857   2734857      0.20    14   Phylogenomic databases                     
                    ProteinModelPortal               4374383   4373748      0.31    11   3D structure databases                     
                    PseudoCAP                           4343      4340     <0.01    74   Organism-specific databases                
                    REBASE                             16019     15469     <0.01    64   Protein family/group databases             
                    REPRODUCTION-2DPAGE                   94        93     <0.01    91   2D gel databases                           
                    RGD                                17470     17356     <0.01    63   Organism-specific databases                
                    Reactome                              58        55     <0.01    94   Enzyme and pathway databases               
                    RefSeq                           5583979   5455922      0.40     8   Sequence databases                         
                    SGD                                  246       246     <0.01    88   Organism-specific databases                
                    SMART                            2824275   2184491      0.20    13   Family and domain databases                
                    SMR                              2160950   2160950      0.16    19   3D structure databases                     
                    STRING                           1204083   1203945      0.09    22   Protein-protein interaction databases      
                    SUPFAM                           5649904   4670364      0.41     7   Family and domain databases                
                    SWISS-2DPAGE                          29        29     <0.01    96   2D gel databases                           
                    Siena-2DPAGE                           2         2     <0.01   101   2D gel databases                           
                    TAIR                               18491     18407     <0.01    62   Organism-specific databases                
                    TCDB                                2391      2382     <0.01    79   Protein family/group databases             
                    TIGR                              195075    188027      0.01    39   Genome annotation databases                
                    TIGRFAMs                         2719863   2480443      0.20    15   Family and domain databases                
                    TubercuList                         2228      2223     <0.01    80   Organism-specific databases                
                    UCSC                               49705     49705     <0.01    56   Genome annotation databases                
                    UniGene                           462371    432591      0.03    30   Sequence databases                         
                    VectorBase                         78967     78455      0.01    50   Genome annotation databases                
                    World-2DPAGE                         946       941     <0.01    83   2D gel databases                           
                    WormBase                           41431     41301     <0.01    59   Organism-specific databases                
                    Xenbase                            13204     13168     <0.01    69   Organism-specific databases                
                    ZFIN                               21590     21585     <0.01    61   Organism-specific databases                
                    dictyBase                           8078      8078     <0.01    71   Organism-specific databases                
                    eggNOG                           1146164   1146164      0.08    24   Phylogenomic databases                     
                    euHCVdb                            75268     75265      0.01    51   Organism-specific databases                
                    
                    Number of explicitly cross-referenced databases: 129
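The third numeric column of the table above appears to be the number of cross-references divided by the total number of entries in the release (13897064, from the introduction). That reading is an assumption checked against a few rows, not a documented definition; the helper name `refs_per_entry` and the `<0.01` cutoff are likewise illustrative. A minimal sketch:

```python
# Total entries in release 2011_03, taken from the introduction.
TOTAL_ENTRIES = 13897064


def refs_per_entry(cross_refs, total=TOTAL_ENTRIES):
    """Format cross-references per entry the way the table seems to.

    Assumption: values that would round to 0.00 are shown as "<0.01".
    """
    ratio = cross_refs / total
    if ratio < 0.005:
        return "<0.01"
    return f"{ratio:.2f}"


# Cross-reference counts copied from the table rows.
for name, count in [("InterPro", 28871249), ("GO", 25617476),
                    ("BioCyc", 1624197), ("Gramene", 68786)]:
    print(f"{name:10s} {refs_per_entry(count)}")
```

Under this assumption the sketch reproduces the listed values for these rows (2.08, 1.84, 0.12 and <0.01).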
                    
                    
                    5.  AMINO ACID COMPOSITION
                    
                    5.1  Composition in percent for the complete database
                    
                    Ala (A) 8.61   Gln (Q) 3.84   Leu (L) 9.83   Ser (S) 6.69
                    Arg (R) 5.45   Glu (E) 6.13   Lys (K) 5.27   Thr (T) 5.62
                    Asn (N) 4.15   Gly (G) 7.12   Met (M) 2.48   Trp (W) 1.31
                    Asp (D) 5.30   His (H) 2.19   Phe (F) 4.04   Tyr (Y) 3.06
                    Cys (C) 1.26   Ile (I) 6.04   Pro (P) 4.72   Val (V) 6.75
                    
                    Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.05
                    
                    
                    
                    [Chart: amino acid composition in percent]
                    Legend: gray = aliphatic, red = acidic, green = small hydroxy,
                    blue = basic, black = aromatic, white = amide, yellow = sulfur
                    
                    
                    5.2  Classification of the amino acids by their frequency
                    
                    Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
                    Gln, Tyr, Met, His, Trp, Cys
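The ordering in 5.2 can be reproduced by sorting the percentages of section 5.1 in descending order. The sketch below does that; the dictionary values are copied from 5.1, while the function name `rank_by_frequency` is only illustrative.

```python
# Percent composition for the complete database, copied from section 5.1.
composition = {
    "Ala": 8.61, "Gln": 3.84, "Leu": 9.83, "Ser": 6.69,
    "Arg": 5.45, "Glu": 6.13, "Lys": 5.27, "Thr": 5.62,
    "Asn": 4.15, "Gly": 7.12, "Met": 2.48, "Trp": 1.31,
    "Asp": 5.30, "His": 2.19, "Phe": 4.04, "Tyr": 3.06,
    "Cys": 1.26, "Ile": 6.04, "Pro": 4.72, "Val": 6.75,
}


def rank_by_frequency(percentages):
    """Return residue names ordered from most to least frequent."""
    return sorted(percentages, key=percentages.get, reverse=True)


ranking = rank_by_frequency(composition)
print(", ".join(ranking))
```

No two residues share the same percentage in this release, so the ordering is unambiguous.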
                    
                    
                    
                    6.  MISCELLANEOUS STATISTICS
                    
                    Total number of entries encoded on a Mitochondrion: 469495
                    Total number of entries encoded on a Plasmid: 191530
                    Total number of entries encoded on a Plastid: 11911
                    Total number of entries encoded on a Plastid; Apicoplast: 368
                    Total number of entries encoded on a Plastid; Chloroplast: 131187
                    Total number of entries encoded on a Plastid; Cyanelle: 8
                    Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 448