                    UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2011_01 STATISTICS
                    
                    
                    1.  INTRODUCTION
                    
                    Release 2011_01 of 11-Jan-2011 of UniProtKB/TrEMBL contains 13069501 sequence entries,
                    comprising 4207640687 amino acids.
                    
                    332281 sequences have been added since release 2010_12, the sequence data of
                    704 existing entries has been updated and the annotations of
                    3513283 entries have been revised. The new sequences represent an increase of
                    about 3% in the number of entries.
                    
                    Number of fragments: 2168794
                    
                    Protein existence (PE):              entries      %
                    1: Evidence at protein level           17505     0.13%
                    2: Evidence at transcript level       482663     3.69%
                    3: Inferred from homology            2725312    20.85%
                    4: Predicted                         9844021    75.32%
                    5: Uncertain                               0     0.00%
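
Each percentage in the table above is the entry count divided by the release total of 13069501 entries. The following is a minimal Python sketch of that arithmetic. The names `PE_COUNTS`, `TOTAL_ENTRIES` and `pe_percentages` are illustrative and not part of any UniProt tooling, and rounding to two decimals is an assumption that happens to reproduce the published figures.

```python
# Entry counts per protein existence (PE) level, copied from the table above.
PE_COUNTS = {
    "1: Evidence at protein level": 17505,
    "2: Evidence at transcript level": 482663,
    "3: Inferred from homology": 2725312,
    "4: Predicted": 9844021,
    "5: Uncertain": 0,
}
TOTAL_ENTRIES = 13069501  # release total quoted in the introduction


def pe_percentages(counts, total):
    """Return each count as a percentage of total, rounded to two decimals."""
    return {level: round(100.0 * n / total, 2) for level, n in counts.items()}


if __name__ == "__main__":
    for level, pct in pe_percentages(PE_COUNTS, TOTAL_ENTRIES).items():
        print(f"{level:<34}{PE_COUNTS[level]:>10}{pct:>9.2f}%")
```

The five counts add up to the release total, so every entry carries exactly one PE level.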
                    
                    The growth of the database is summarized below.
                    
                    
                    
                    
                    
                    2.  TAXONOMIC ORIGIN
                    
                    Total number of species represented in this release of UniProtKB/TrEMBL: 304914
                    
                    The first twenty species represent 1236437 sequences, or 9.5% of the
                    total number of entries.
                    
                    
                    2.1 Table of the frequency of occurrence of species
                    
                    Species represented 1x:138260
                                        2x: 53921
                                        3x: 28349
                                        4x: 17268
                                        5x: 10540
                                        6x:  7501
                                        7x:  5297
                                        8x:  4211
                                        9x:  3449
                                       10x:  6573
                                   11- 20x: 16919
                                   21- 50x:  5943
                                   51-100x:  2172
                                     >100x:  4511
                    
                    
                    2.2  Table of the most represented species
                    
                    ------  ---------  --------------------------------------------
                    Number  Frequency  Species
                    ------  ---------  --------------------------------------------
                    1     358709  Human immunodeficiency virus 1
                    2      95359  Oryza sativa subsp. japonica (Rice)
                    3      76951  Homo sapiens (Human)
                    4      58369  Hepatitis C virus
                    5      50897  Vitis vinifera (Grape)
                    6      50404  Trichomonas vaginalis
                    7      50249  uncultured bacterium
                    8      49349  Mus musculus (Mouse)
                    9      44030  Populus trichocarpa (Western balsam poplar) 
                    10      42188  Hepatitis B virus (HBV)
                    11      41944  Zea mays (Maize)
                    12      40947  Arabidopsis thaliana (Mouse-ear cress)
                    13      39839  Paramecium tetraurelia
                    14      39335  Oryza sativa subsp. indica (Rice)
                    15      34791  Physcomitrella patens subsp. patens (Moss)
                    16      33634  Sorghum bicolor (Sorghum) (Sorghum vulgare)
                    17      33195  Selaginella moellendorffii (Spikemoss)
                    18      32625  Arabidopsis lyrata subsp. lyrata
                    19      31830  Caenorhabditis remanei (Caenorhabditis vulgaris)
                    20      31792  Drosophila melanogaster (Fruit fly)
                    21      31263  Ricinus communis (Castor bean)
                    22      29115  Branchiostoma floridae (Florida lancelet) (Amphioxus)
                    23      28089  Tetraodon nigroviridis (Green puffer)
                    24      26783  Danio rerio (Zebrafish) (Brachydanio rerio)
                    25      25201  Ralstonia solanacearum (Pseudomonas solanacearum)
                    26      24810  Nematostella vectensis (Starlet sea anemone)
                    27      23475  Rattus norvegicus (Rat)
                    28      23115  Perkinsus marinus ATCC 50983
                    29      21925  Escherichia coli
                    30      21380  Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz) 
                    31      21357  Caenorhabditis elegans
                    32      21086  Ixodes scapularis (Black-legged tick) (Deer tick)
                    33      20734  Trypanosoma cruzi
                    34      20437  Puccinia graminis f. sp. tritici CRL 75-36-700-3
                    35      18880  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
                    36      17964  Caenorhabditis briggsae
                    37      17946  Drosophila simulans (Fruit fly)
                    38      17849  Laccaria bicolor (strain S238N-H82) (Bicoloured deceiver) 
                    39      17799  Ailuropoda melanoleuca (Giant panda)
                    40      17604  Phytophthora infestans T30-4
                    41      16973  Tribolium castaneum (Red flour beetle)
                    42      16901  Drosophila yakuba (Fruit fly)
                    43      16737  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
                    44      16707  Drosophila persimilis (Fruit fly)
                    45      16386  Ectocarpus siliculosus (Brown alga)
                    46      16277  Loa loa (Eye worm)
                    47      16249  Botryotinia fuckeliana (strain B05.10) (Noble rot fungus) (Botrytis cinerea)
                    48      16231  Bos taurus (Bovine)
                    49      16180  Drosophila sechellia (Fruit fly)
                    50      15943  Drosophila pseudoobscura pseudoobscura (Fruit fly)
                    51      15868  Phaeosphaeria nodorum (Glume blotch fungus) (Septoria nodorum)
                    52      15715  Naegleria gruberi (Amoeba)
                    53      15661  Nectria haematococca (strain 77-13-4 / FGSC 9596 / MPVI) 
                    54      15419  Drosophila willistoni (Fruit fly)
                    55      15248  Tetrahymena thermophila SB210
                    56      15157  Canis familiaris (Dog) (Canis lupus familiaris)
                    57      15138  Drosophila ananassae (Fruit fly)
                    58      15029  Harpegnathos saltator
                    59      14923  Drosophila erecta (Fruit fly)
                    60      14818  Chlamydomonas reinhardtii
                    61      14791  Camponotus floridanus
                    62      14776  Drosophila mojavensis (Fruit fly)
                    63      14758  Anopheles gambiae (African malaria mosquito)
                    64      14697  Drosophila virilis (Fruit fly)
                    65      14671  Plasmodium chabaudi
                    66      14653  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
                    67      14634  Volvox carteri f. nagariensis
                    68      14263  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
                    69      13785  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
                    70      13560  Moniliophthora perniciosa FA553
                    71      13505  Schistosoma mansoni (Blood fluke)
                    72      13361  Aspergillus flavus 
                    73      13290  Coprinopsis cinerea (strain Okayama-7 / 130 / FGSC 9003) (Inky cap fungus) 
                    74      13128  Schizophyllum commune H4-8
                    75      12983  Gallus gallus (Chicken)
                    76      12964  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
                    77      12954  Stigmatella aurantiaca DW4/3-1
                    78      12715  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
                    79      12705  Magnaporthe grisea (Rice blast fungus) (Pyricularia grisea)
                    80      12546  Glycine max (Soybean) (Glycine hispida)
                    81      12469  Xenopus laevis (African clawed frog)
                    82      12445  Polysphondylium pallidum (Cellular slime mold)
                    83      12300  Hepatitis C virus subtype 1b
                    84      12022  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
                    85      12019  Glomerella graminicola M1.001
                    86      11850  Aspergillus oryzae
                    87      11806  Plasmodium falciparum
                    88      11802  Plasmodium berghei
                    89      11705  Pyrenophora teres f. teres 0-1
                    90      11565  Trichoplax adhaerens (Trichoplax reptans)
                    91      11496  Brugia malayi (Filarial nematode worm)
                    92      11211  Ktedonobacter racemifer DSM 44963
                    93      11181  Picea sitchensis (Sitka spruce) (Pinus sitchensis)
                    94      11087  Helicobacter pylori (Campylobacter pylori)
                    95      10966  Streptomyces clavuligerus ATCC 27064
                    96      10916  Schistosoma japonicum (Blood fluke)
                    97      10861  Chaetomium globosum (Soil fungus)
                    98      10832  Pediculus humanus subsp. corporis (Body louse)
                    99      10806  Sordaria macrospora (strain ATCC MYA-333 / DSM 997 / K(L3346) / K-hell)
                    100      10676  Podospora anserina
                    101      10405  Neurospora crassa
                    102      10391  Aspergillus nidulans FGSC A4
                    103      10387  Penicillium marneffei (strain ATCC 18224 / CBS 334.59 / QM 7333)
                    104      10328  Phaeodactylum tricornutum CCAP 1055/1
                    105      10277  Micromonas pusilla (strain CCMP1545) (Picoplanktonic green alga)
                    106      10223  Verticillium albo-atrum (strain VaMs.102) (Verticillium wilt)
                    107      10114  Micromonas sp. (strain RCC299 / NOUM17) (Picoplanktonic green alga)
                    108      10114  Aspergillus terreus (strain NIH 2624 / FGSC A1156)
                    109      10089  Neosartorya fischeri (strain ATCC 1020 / DSM 3700 / FGSC A1164 / NRRL 181) 
                    110      10061  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
                    111      10015  Streptomyces bingchenggensis (strain BCW-1)
                    112       9996  Porcine reproductive and respiratory syndrome virus (PRRSV)
                    113       9762  Rabies virus
                    114       9755  Chlorella variabilis
                    115       9743  Aspergillus fumigatus (strain CEA10 / CBS 144.89 / FGSC A1163) 
                    116       9675  Cryptococcus neoformans (Filobasidiella neoformans)
                    117       9662  Trypanosoma brucei gambiense DAL972
                    118       9564  Ajellomyces dermatitidis (strain SLH14081) (Blastomyces dermatitidis)
                    119       9553  Aspergillus fumigatus (Sartorya fumigata)
                    120       9532  Ajellomyces capsulata (strain H143) (Darling's disease fungus) 
                    121       9516  Ajellomyces dermatitidis (strain ER-3) (Blastomyces dermatitidis)
                    122       9484  Streptomyces violaceusniger Tu 4113
                    123       9481  Trypanosoma brucei
                    124       9381  Salmo salar (Atlantic salmon)
                    125       9289  Plasmodium vivax
                    126       9241  Monosiga brevicollis (Choanoflagellate)
                    127       9234  Candida albicans (Yeast)
                    128       9202  Amycolatopsis mediterranei (strain U-32)
                    129       9187  Ajellomyces capsulata (strain ATCC 26029 / G186AR / H82 / RMSCC 2432)  
                    130       9184  Emericella nidulans (Aspergillus nidulans)
                    131       9177  Streptomyces hygroscopicus ATCC 53653
                    132       9165  Ajellomyces capsulata (strain NAm1 / WU24) (Darling's disease fungus) 
                    133       9114  Sorangium cellulosum (strain So ce56) (Polyangium cellulosum (strain So ce56))
                    134       9090  Paracoccidioides brasiliensis (strain ATCC MYA-826 / Pb01)
                    135       8994  Dictyostelium discoideum (Slime mold)
                    136       8974  Postia placenta (strain ATCC 44394 / Madison 698-R) (Brown rot fungus) 
                    137       8958  Thalassiosira pseudonana (Marine diatom)
                    138       8945  Streptosporangium roseum (strain ATCC 12428 / DSM 43021 / JCM 3005 / NI 9100)
                    139       8901  Catenulispora acidiphila 
                    140       8862  Aspergillus clavatus
                    141       8764  Rhodococcus sp. (strain RHA1)
                    142       8744  Toxoplasma gondii
                    143       8712  Paracoccidioides brasiliensis (strain Pb18)
                    144       8695  Nannizzia otae (strain CBS 113480) (Microsporum canis) (Arthroderma otae)
                    145       8691  Streptomyces scabies (strain 87.22) (Streptomyces scabiei)
                    146       8601  Entamoeba dispar SAW760
                    147       8437  Plesiocystis pacifica SIR-1
                    148       8394  Streptomyces sp. AA4
                    149       8299  Entamoeba histolytica
                    150       8249  Microscilla marina ATCC 23134
                    151       8228  Leishmania major
                    152       8215  Bradyrhizobium japonicum
                    153       8202  Streptomyces sviceus ATCC 29083
                    154       8201  Microcoleus chthonoplastes PCC 7420
                    155       8164  Frankia sp. EUN1f
                    156       8154  Burkholderia xenovorans (strain LB400)
                    157       8084  Pseudomonas aeruginosa
                    158       8019  Leishmania infantum
                    159       7989  Trichophyton verrucosum (strain HKI 0517)
                    160       7978  Toxoplasma gondii ME49
                    161       7955  Ostreococcus tauri
                    162       7943  Rhodococcus opacus (strain B4)
                    163       7940  Arthroderma benhamiae (strain CBS 112371) (Trichophyton mentagrophytes)
                    164       7916  Methylobacterium nodulans (strain ORS2060 / LMG 21967)
                    165       7883  Leishmania braziliensis
                    166       7867  Streptomyces ghanaensis ATCC 14672
                    167       7856  Acaryochloris marina (strain MBIC 11017)
                    168       7852  Paracoccidioides brasiliensis (strain Pb03)
                    169       7836  Toxoplasma gondii VEG
                    170       7823  Burkholderia sp. Ch1-1
                    171       7809  Plasmodium yoelii yoelii
                    172       7739  Uncinocarpus reesii (strain UAMH 1704)
                    173       7708  Streptomyces viridochromogenes DSM 40736
                    174       7571  Clostridium hathewayi DSM 13479
                    175       7563  Burkholderia pseudomallei MSHR346
                    176       7528  Streptomyces sp. C
                    177       7523  Streptomyces lividans TK24
                    178       7519  Solibacter usitatus (strain Ellin6076)
                    179       7501  Tuber melanosporum (Perigord truffle)
                    180       7480  Streptomyces coelicolor
                    181       7475  Burkholderia pseudomallei 1710a
                    182       7465  Burkholderia pseudomallei Pakistan 9
                    183       7459  Burkholderia sp. H160
                    184       7386  Ostreococcus lucimarinus (strain CCE9901)
                    185       7379  Streptomyces sp. ACT-1
                    186       7367  Burkholderia pseudomallei 576
                    187       7349  Burkholderia pseudomallei 305
                    188       7274  Clostridium bolteae ATCC BAA-613
                    189       7243  Burkholderia sp. (strain 383) (Burkholderia cepacia 
                    190       7231  Bradyrhizobium sp. (strain BTAi1 / ATCC BAA-1182)
                    191       7228  Streptomyces avermitilis
                    192       7202  Coccidioides posadasii (strain C735) (Valley fever fungus)
                    193       7179  Chitinophaga pinensis (strain ATCC 43595 / DSM 2588 / NCIB 11800 / UQM 2034)
                    194       7147  Giardia intestinalis (strain ATCC 50803 / WB clone C6) (Giardia lamblia)
                    195       7140  Burkholderia pseudomallei 1106b
                    196       7131  Burkholderia phymatum (strain DSM 17167 / STM815)
                    197       7124  Burkholderia ambifaria MEX-5
                    198       7094  Medicago truncatula (Barrel medic)
                    199       7079  Frankia sp. EuI1c
                    200       7068  uncultured archaeon
                    201       7033  Burkholderia vietnamiensis (strain G4 / LMG 22486) (Burkholderia cepacia 
                    202       7016  Myxococcus xanthus (strain DK 1622)
                    203       7005  Mucilaginibacter paludis DSM 18603
                    204       6985  Rhizobium leguminosarum bv. trifolii (strain WSM1325)
                    205       6976  Rhodopirellula baltica
                    206       6959  Frankia sp. (strain EAN1pec)
                    207       6943  Streptomyces sp. Mg1
                    208       6932  Kribbella flavida (strain DSM 17836 / JCM 10339 / NBRC 14399)
                    209       6923  Burkholderia ambifaria IOP40-10
                    210       6903  Actinosynnema mirum (strain ATCC 29888 / DSM 43827 / NBRC 14064 / IMRU 3971)
                    211       6902  Saccharopolyspora erythraea (strain NRRL 23338)
                    212       6897  Hepatitis C virus subtype 1a
                    213       6892  Streptomyces roseosporus NRRL 15998
                    214       6882  Burkholderia multivorans (strain ATCC 17616 / 249)
                    215       6867  Spirosoma linguale (strain ATCC 33905 / DSM 74 / LMG 10896)
                    216       6866  Burkholderia sp. (strain CCGE1002)
                    217       6866  Streptomyces pristinaespiralis ATCC 25486
                    218       6859  Burkholderia phytofirmans (strain DSM 17436 / PsJN)
                    219       6817  Clostridium asparagiforme DSM 15981
                    220       6816  Rhizobium loti (Mesorhizobium loti)
                    221       6798  Achromobacter xylosoxidans A8
                    222       6783  Sus scrofa (Pig)
                    223       6771  Burkholderia pseudomallei (strain 1106a)
                    224       6769  Sinorhizobium meliloti AK83
                    225       6740  Burkholderia pseudomallei (strain 668)
                    226       6736  Streptomyces griseus subsp. griseus (strain JCM 4626 / NBRC 13350)
                    227       6725  Burkholderia graminis C4D1M
                    228       6713  Rhizobium leguminosarum bv. viciae (strain 3841)
                    229       6712  Rhodococcus erythropolis SK121
                    230       6705  Chthoniobacter flavus Ellin428
                    231       6702  Streptomyces flavogriseus ATCC 33331
                    232       6695  Streptococcus pneumoniae
                    233       6692  Bacillus thuringiensis IBL 200
                    234       6690  delta proteobacterium NaphS2
                    235       6682  Sinorhizobium meliloti BL225C
                    236       6680  Haliangium ochraceum (strain DSM 14365 / JCM 11303 / SMP-2)
                    237       6679  Mesorhizobium opportunistum WSM2075
                    238       6662  Burkholderia pseudomallei S13
                    239       6657  Burkholderia cepacia (strain J2315 / LMG 16656) (Burkholderia cenocepacia 
                    240       6655  Bacillus thuringiensis IBL 4222
                    241       6644  Beggiatoa sp. PS
                    242       6627  Burkholderia cenocepacia (strain MC0-3)
                    243       6614  Burkholderia multivorans CGD2
                    244       6613  Burkholderia pseudomallei Pasteur 52237
                    245       6606  Burkholderia multivorans CGD2M
                    246       6583  Bacillus thuringiensis serovar sotto str. T04001
                    247       6559  Cyanothece sp. (strain PCC 7822)
                    248       6527  Burkholderia multivorans CGD1
                    249       6521  Streptomyces sp. ACTE
                    250       6504  Frankia alni (strain ACN14a)
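
The figure quoted at the start of section 2 (1236437 sequences in the first twenty species, 9.5% of all entries) can be re-derived from the rows of the table above. The sketch below parses rows of the form "rank, frequency, species" with a regular expression and sums the frequencies. The regular expression and the helper name `parse_rows` are assumptions made for illustration; they describe the layout seen here, not an official file format.

```python
import re

TOTAL_ENTRIES = 13069501  # release total quoted in the introduction

# First twenty rows of the table above (rank, frequency, species).
TOP_TWENTY = """\
1     358709  Human immunodeficiency virus 1
2      95359  Oryza sativa subsp. japonica (Rice)
3      76951  Homo sapiens (Human)
4      58369  Hepatitis C virus
5      50897  Vitis vinifera (Grape)
6      50404  Trichomonas vaginalis
7      50249  uncultured bacterium
8      49349  Mus musculus (Mouse)
9      44030  Populus trichocarpa (Western balsam poplar)
10      42188  Hepatitis B virus (HBV)
11      41944  Zea mays (Maize)
12      40947  Arabidopsis thaliana (Mouse-ear cress)
13      39839  Paramecium tetraurelia
14      39335  Oryza sativa subsp. indica (Rice)
15      34791  Physcomitrella patens subsp. patens (Moss)
16      33634  Sorghum bicolor (Sorghum) (Sorghum vulgare)
17      33195  Selaginella moellendorffii (Spikemoss)
18      32625  Arabidopsis lyrata subsp. lyrata
19      31830  Caenorhabditis remanei (Caenorhabditis vulgaris)
20      31792  Drosophila melanogaster (Fruit fly)
"""

# rank, frequency, then the species name up to the end of the line
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(.+?)\s*$")


def parse_rows(text):
    """Return a list of (rank, frequency, species) tuples from table text."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append((int(match.group(1)), int(match.group(2)), match.group(3)))
    return rows


rows = parse_rows(TOP_TWENTY)
top_twenty_total = sum(freq for _, freq, _ in rows)
top_twenty_share = 100.0 * top_twenty_total / TOTAL_ENTRIES

if __name__ == "__main__":
    print(f"{top_twenty_total} sequences, {top_twenty_share:.1f}% of all entries")
```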
                    
                    
                    
                    
                    2.3  Taxonomic distribution of the sequences
                    
                    
                    
                    Kingdom        sequences (% of the database)
                    Archaea          239924 (  2%)
                    Bacteria        8179374 ( 63%)
                    Eukaryota       3581090 ( 27%)
                    Viruses         1051570 (  8%)
                    Other             17542 ( <1%)
                    
                    
                    
                    Within Eukaryota:
                    
                    
                    
                    Category            sequences (% of Eukaryota) (% of the complete database)
                    Human                  76986 (  2%)           (  1%)
                    Other Mammalia        223137 (  6%)           (  2%)
                    Other Vertebrata      326635 (  9%)           (  2%)
                    Viridiplantae         847295 ( 24%)           (  6%)
                    Fungi                 725480 ( 20%)           (  6%)
                    Insecta               555504 ( 16%)           (  4%)
                    Nematoda              109890 (  3%)           (  1%)
                    Other                 716163 ( 20%)           (  5%)
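
The Eukaryota table reports each category against two denominators: the Eukaryota subtotal and the complete database. A short Python sketch of both calculations follows. The names used are illustrative, and rounding to the nearest whole percent is an assumption that matches the published columns.

```python
TOTAL_ENTRIES = 13069501  # complete database, from the introduction

# Sequence counts per category, copied from the "Within Eukaryota" table.
EUKARYOTA = {
    "Human": 76986,
    "Other Mammalia": 223137,
    "Other Vertebrata": 326635,
    "Viridiplantae": 847295,
    "Fungi": 725480,
    "Insecta": 555504,
    "Nematoda": 109890,
    "Other": 716163,
}


def whole_percent(part, whole):
    """Percentage of whole, rounded to the nearest integer."""
    return round(100.0 * part / whole)


eukaryota_total = sum(EUKARYOTA.values())

# category -> (% of Eukaryota, % of the complete database)
shares = {
    name: (whole_percent(n, eukaryota_total), whole_percent(n, TOTAL_ENTRIES))
    for name, n in EUKARYOTA.items()
}

if __name__ == "__main__":
    for name, (of_euk, of_db) in shares.items():
        print(f"{name:<18}{EUKARYOTA[name]:>9} ({of_euk:>3}%) ({of_db:>3}%)")
```

The category counts sum to 3581090, the Eukaryota figure given in the kingdom table.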
                    
                    
                    
                    3.  SEQUENCE SIZE
                    
                    Distribution of the sequences by size (excluding fragments)
                    
                    From   To   Number             From   To   Number
                       1-  50   279604             1001-1100    76936
                      51- 100  1037437             1101-1200    54323
                     101- 150  1197364             1201-1300    37418
                     151- 200  1155436             1301-1400    24668
                     201- 250  1160195             1401-1500    19774
                     251- 300  1124852             1501-1600    14135
                     301- 350  1022462             1601-1700    10640
                     351- 400   792289             1701-1800     8298
                     401- 450   668208             1801-1900     6718
                     451- 500   558943             1901-2000     5626
                     501- 550   380503             2001-2100     4564
                     551- 600   292825             2101-2200     4723
                     601- 650   212063             2201-2300     3721
                     651- 700   165194             2301-2400     2938
                     701- 750   142510             2401-2500     2526
                     751- 800   128824             >2500        22111
                     801- 850    94063
                     851- 900    85690
                     901- 950    58578
                     951-1000    44548
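
Because the size table excludes fragments, its bins should add up to the total number of entries minus the number of fragments reported in the introduction. The Python sketch below performs that consistency check and picks out the most populated bin. The tuple layout and names are illustrative choices, with `None` standing in for the open-ended ">2500" bin.

```python
TOTAL_ENTRIES = 13069501  # from the introduction
FRAGMENTS = 2168794       # "Number of fragments" in the introduction

# (lower bound, upper bound, number of sequences); upper bound None means ">2500".
SIZE_BINS = [
    (1, 50, 279604), (51, 100, 1037437), (101, 150, 1197364),
    (151, 200, 1155436), (201, 250, 1160195), (251, 300, 1124852),
    (301, 350, 1022462), (351, 400, 792289), (401, 450, 668208),
    (451, 500, 558943), (501, 550, 380503), (551, 600, 292825),
    (601, 650, 212063), (651, 700, 165194), (701, 750, 142510),
    (751, 800, 128824), (801, 850, 94063), (851, 900, 85690),
    (901, 950, 58578), (951, 1000, 44548), (1001, 1100, 76936),
    (1101, 1200, 54323), (1201, 1300, 37418), (1301, 1400, 24668),
    (1401, 1500, 19774), (1501, 1600, 14135), (1601, 1700, 10640),
    (1701, 1800, 8298), (1801, 1900, 6718), (1901, 2000, 5626),
    (2001, 2100, 4564), (2101, 2200, 4723), (2201, 2300, 3721),
    (2301, 2400, 2938), (2401, 2500, 2526), (2501, None, 22111),
]


def binned_total(bins):
    """Sum the sequence counts over all size bins."""
    return sum(count for _, _, count in bins)


def modal_bin(bins):
    """Return the bin holding the largest number of sequences."""
    return max(bins, key=lambda b: b[2])


if __name__ == "__main__":
    print("binned sequences:     ", binned_total(SIZE_BINS))
    print("entries minus fragments:", TOTAL_ENTRIES - FRAGMENTS)
    print("most populated bin:   ", modal_bin(SIZE_BINS))
```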
                    
                    
                    
                    
                    The average sequence length in UniProtKB/TrEMBL is 321 amino acids.

                    The shortest sequence is Q16047_HUMAN: 4 amino acids.
                    The longest sequence is Q3ASY8_CHLCH: 36805 amino acids.
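
The reported average of 321 is consistent with dividing the total number of amino acids by the number of entries and dropping the fractional part. The release notes do not say how the value is rounded or whether fragments are included, so the Python sketch below simply assumes all entries are counted and the result is truncated.

```python
TOTAL_AMINO_ACIDS = 4207640687  # from the introduction
TOTAL_ENTRIES = 13069501        # from the introduction


def average_length(total_residues, total_entries):
    """Mean sequence length in amino acids, as a float."""
    return total_residues / total_entries


avg = average_length(TOTAL_AMINO_ACIDS, TOTAL_ENTRIES)
truncated = TOTAL_AMINO_ACIDS // TOTAL_ENTRIES  # integer part only

if __name__ == "__main__":
    print(f"exact mean: {avg:.2f} amino acids, truncated: {truncated}")
```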
                    
                    
                    
                    4.  STATISTICS FOR SOME LINE TYPES
                    
                    The following table summarizes the total number of some UniProtKB/TrEMBL lines,
                    as well as the number of entries with at least one such line, and the
                    frequency of the lines.
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry
                    ---------------------------------  -------- ---------  ---------

                    References (RL)                    15865447                1.21
                       Submitted to EMBL/GenBank/DDBJ   9356129   8238169      0.72
                       Journal                          6338224   5705206      0.48
                       Submitted to other databases       63672     63018     <0.01
                       Thesis                              7607      7549     <0.01
                       Book citation                       5234      5183     <0.01
                       Other                              94581     92892      0.01
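
The "Average per entry" column is the total number of lines of a given type divided by the total number of entries in the release (13069501), not by the number of entries that carry such a line. The Python sketch below reproduces the column for the reference subtypes. The `<0.01` cut-off at 0.005 is inferred from the published values and is an assumption, as are the function and variable names.

```python
TOTAL_ENTRIES = 13069501  # from the introduction

# Total number of lines per reference subtype, copied from the table above.
REFERENCE_TOTALS = {
    "Submitted to EMBL/GenBank/DDBJ": 9356129,
    "Journal": 6338224,
    "Submitted to other databases": 63672,
    "Thesis": 7607,
    "Book citation": 5234,
    "Other": 94581,
}


def format_average(line_count, total_entries):
    """Format lines-per-entry to two decimals, or '<0.01' for tiny values."""
    avg = line_count / total_entries
    if avg < 0.005:  # would otherwise print as 0.00 (assumed threshold)
        return "<0.01"
    return f"{avg:.2f}"


averages = {
    name: format_average(n, TOTAL_ENTRIES) for name, n in REFERENCE_TOTALS.items()
}
overall = format_average(sum(REFERENCE_TOTALS.values()), TOTAL_ENTRIES)

if __name__ == "__main__":
    print(f"{'References (RL)':<34}{overall:>6}")
    for name, text in averages.items():
        print(f"   {name:<31}{text:>6}")
```

The subtype totals sum to the 15865447 reference lines shown in the first row.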
                    
                    Total number of distinct authors cited in UniProtKB/TrEMBL: 311466
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank
                    ---------------------------------  -------- ---------  ---------  ----

                    Comments (CC)                      11641161                0.89
                       CATALYTIC ACTIVITY               1218235   1126693      0.09     4
                       CAUTION                          2705609   2705609      0.21     2
                       COFACTOR                          398395    383130      0.03     8
                       DOMAIN                             24844     23055     <0.01    10
                       FUNCTION                         1430655   1324862      0.11     3
                       INTERACTION                         2477      2477     <0.01    11
                       MISCELLANEOUS                      27791     27787     <0.01     9
                       PATHWAY                           587973    541275      0.04     6
                       SIMILARITY                       3721983   3227182      0.28     1
                       SUBCELLULAR LOCATION              988554    984880      0.08     5
                       SUBUNIT                           534645    533017      0.04     7
                    
                    Total number of comment topics: 11
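
The Rank column in the comments table appears to order the topics by their total number of lines, largest first. That reading is inferred from the figures rather than stated in the release notes. A small Python sketch of the ranking, with illustrative names:

```python
# Total number of comment lines per topic, copied from the table above.
COMMENT_TOTALS = {
    "CATALYTIC ACTIVITY": 1218235,
    "CAUTION": 2705609,
    "COFACTOR": 398395,
    "DOMAIN": 24844,
    "FUNCTION": 1430655,
    "INTERACTION": 2477,
    "MISCELLANEOUS": 27791,
    "PATHWAY": 587973,
    "SIMILARITY": 3721983,
    "SUBCELLULAR LOCATION": 988554,
    "SUBUNIT": 534645,
}


def rank_by_total(totals):
    """Map each topic to its rank, where 1 is the largest total."""
    ordered = sorted(totals, key=totals.get, reverse=True)
    return {topic: position for position, topic in enumerate(ordered, start=1)}


ranks = rank_by_total(COMMENT_TOTALS)

if __name__ == "__main__":
    for topic, rank in sorted(ranks.items(), key=lambda item: item[1]):
        print(f"{rank:>2}  {topic:<22}{COMMENT_TOTALS[topic]:>9}")
```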
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank
                    ---------------------------------  -------- ---------  ---------  ----

                    Features (FT)                       4414788                0.34
                       CHAIN                             458083    359480      0.04     2
                       NON_TER                          3649333   2167312      0.28     1
                       SIGNAL                            306782    305964      0.02     3
                       TRANSIT                              590       590     <0.01     4
                    
                    Total number of feature keys: 4
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank  Category
                    ---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
                    Cross-references (DR)             152003555               11.63                                                    
                    AGD                                 2567      2567     <0.01    76   Organism-specific databases                
                    ANU-2DPAGE                            56        56     <0.01    94   2D gel databases                           
                    Allergome                           1822      1287     <0.01    80   Protein family/group databases             
                    ArachnoServer                         66        66     <0.01    92   Organism-specific databases                
                    ArrayExpress                       93280     93267      0.01    48   Gene expression databases                  
                    BRENDA                              2842      2779     <0.01    74   Enzyme and pathway databases               
                    Bgee                              128753    128653      0.01    44   Gene expression databases                  
                    BioCyc                           1624015   1589426      0.12    21   Enzyme and pathway databases               
                    CAZy                               74739     70230      0.01    49   Protein family/group databases             
                    CGD                                 6773      6773     <0.01    71   Organism-specific databases                
                    COMPLUYEAST-2DPAGE                     5         5     <0.01    97   2D gel databases                           
                    CTD                               170588    169861      0.01    42   Organism-specific databases                
                    CYGD                                   5         5     <0.01    98   Organism-specific databases                
                    DIP                                 2765      2759     <0.01    75   Protein-protein interaction databases      
                    EMBL                            14586789  13040658      1.12     3   Sequence databases                         
                    Ensembl                           372998    219395      0.03    30   Genome annotation databases                
                    EnsemblBacteria                   501523    471651      0.04    27   Genome annotation databases                
                    EnsemblFungi                       98065     97959      0.01    47   Genome annotation databases                
                    EnsemblMetazoa                    295150    251283      0.02    33   Genome annotation databases                
                    EnsemblPlants                     207330    192399      0.02    37   Genome annotation databases                
                    EnsemblProtists                    24279     24100     <0.01    59   Genome annotation databases                
                    EuPathDB                          151307    151307      0.01    43   Organism-specific databases                
                    FlyBase                           194708    193181      0.01    40   Organism-specific databases                
                    GO                              25546307   8034862      1.95     2   Ontologies                                 
                    Gene3D                           5236993   4248204      0.40     9   Family and domain databases                
                    GeneDB_Spombe                          1         1     <0.01   101   Organism-specific databases                
                    GeneID                           5378310   5264682      0.41     7   Genome annotation databases                
                    Genevestigator                    100690    100680      0.01    46   Gene expression databases                  
                    GenoList                           14755     14482     <0.01    64   Organism-specific databases                
                    GenomeReviews                    3653964   3567915      0.28    12   Genome annotation databases                
                    Gramene                            68842     68842      0.01    51   Organism-specific databases                
                    H-InvDB                              601       490     <0.01    84   Organism-specific databases                
                    HAMAP                             852493    842842      0.07    25   Family and domain databases                
                    HGNC                               65791     64004      0.01    53   Organism-specific databases                
                    HOGENOM                          2201693   2201565      0.17    17   Phylogenomic databases                     
                    HOVERGEN                          319987    318279      0.02    32   Phylogenomic databases                     
                    HSSP                              253814    253540      0.02    34   3D structure databases                     
                    IPI                               241712    241690      0.02    36   Sequence databases                         
                    InParanoid                        195416    195325      0.01    38   Phylogenomic databases                     
                    IntAct                             15611     15611     <0.01    63   Protein-protein interaction databases      
                    InterPro                        27424202  10036137      2.10     1   Family and domain databases                
                    KEGG                             4664032   4572886      0.36    10   Genome annotation databases                
                    LegioList                           5142      5114     <0.01    72   Organism-specific databases                
                    Leproma                              936       935     <0.01    83   Organism-specific databases                
                    MEROPS                             66599     65155      0.01    52   Protein family/group databases             
                    MGI                                42963     42918     <0.01    57   Organism-specific databases                
                    MINT                                9022      9022     <0.01    69   Protein-protein interaction databases      
                    NMPDR                             921022    921012      0.07    24   Genome annotation databases                
                    NextBio                            47113     47110     <0.01    56   Other                                      
                    OMA                              2429733   2429731      0.19    16   Phylogenomic databases                     
                    OrthoDB                           429397    429396      0.03    29   Phylogenomic databases                     
                    PANTHER                          2114193   1993143      0.16    20   Family and domain databases                
                    PDB                                13547      8055     <0.01    66   3D structure databases                     
                    PDBsum                             13215      7861     <0.01    67   3D structure databases                     
                    PHCI-2DPAGE                          102       102     <0.01    89   2D gel databases                           
                    PIR                               175378    142540      0.01    41   Sequence databases                         
                    PIRSF                             729544    729544      0.06    26   Family and domain databases                
                    PMAP-CutDB                           253       253     <0.01    86   Other                                      
                    PMMA-2DPAGE                            3         3     <0.01    99   2D gel databases                           
                    PRIDE                             103511    103510      0.01    45   Proteomic databases                        
                    PRINTS                           2117851   1878980      0.16    19   Family and domain databases                
                    PROSITE                          6375984   4261976      0.49     5   Family and domain databases                
                    Pathway_Interaction_DB                11         9     <0.01    96   Enzyme and pathway databases               
                    PeptideAtlas                         147       147     <0.01    88   Proteomic databases                        
                    PeroxiBase                          2482      2473     <0.01    77   Protein family/group databases             
                    Pfam                            12742293   9489467      0.97     4   Family and domain databases                
                    PharmGKB                              85        85     <0.01    91   Organism-specific databases                
                    PhosphoSite                         1796      1796     <0.01    81   PTM databases                              
                    PhylomeDB                         372692    372660      0.03    31   Phylogenomic databases                     
                    ProDom                            246152    230443      0.02    35   Family and domain databases                
                    ProMEX                               423       423     <0.01    85   Proteomic databases                        
                    ProtClustDB                      2623391   2623375      0.20    14   Phylogenomic databases                     
                    ProteinModelPortal               4147940   4147583      0.32    11   3D structure databases                     
                    PseudoCAP                           4344      4341     <0.01    73   Organism-specific databases                
                    REBASE                             14534     14001     <0.01    65   Protein family/group databases             
                    REPRODUCTION-2DPAGE                   95        94     <0.01    90   2D gel databases                           
                    RGD                                17471     17373     <0.01    62   Organism-specific databases                
                    Reactome                              58        55     <0.01    93   Enzyme and pathway databases               
                    RefSeq                           5393654   5266686      0.41     6   Sequence databases                         
                    SGD                                  246       246     <0.01    87   Organism-specific databases                
                    SMART                            2700266   2089173      0.21    13   Family and domain databases                
                    SMR                              2143753   2143753      0.16    18   3D structure databases                     
                    STRING                           1204600   1204458      0.09    22   Protein-protein interaction databases      
                    SUPFAM                           5363920   4441297      0.41     8   Family and domain databases                
                    SWISS-2DPAGE                          29        29     <0.01    95   2D gel databases                           
                    Siena-2DPAGE                           2         2     <0.01   100   2D gel databases                           
                    TAIR                               18363     18283     <0.01    61   Organism-specific databases                
                    TCDB                                2348      2339     <0.01    78   Protein family/group databases             
                    TIGR                              195095    188048      0.01    39   Genome annotation databases                
                    TIGRFAMs                         2564572   2338857      0.20    15   Family and domain databases                
                    TubercuList                         2240      2235     <0.01    79   Organism-specific databases                
                    UCSC                               49949     49949     <0.01    54   Genome annotation databases                
                    UniGene                           465312    435142      0.04    28   Sequence databases                         
                    VectorBase                         47555     47087     <0.01    55   Genome annotation databases                
                    World-2DPAGE                         947       942     <0.01    82   2D gel databases                           
                    WormBase                           41076     40948     <0.01    58   Organism-specific databases                
                    Xenbase                            13158     13134     <0.01    68   Organism-specific databases                
                    ZFIN                               21606     21601     <0.01    60   Organism-specific databases                
                    dictyBase                           8434      8434     <0.01    70   Organism-specific databases                
                    eggNOG                           1146637   1146637      0.09    23   Phylogenomic databases                     
                    euHCVdb                            74732     74729      0.01    50   Organism-specific databases                
                    
                    Number of explicitly cross-referenced databases: 127
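
                    The fourth numeric column of the table above is consistent with the
                    number of cross-references divided by the total number of entries in
                    this release (13069501, from the introduction). This reading is an
                    assumption inferred from the figures, not a definition stated in this
                    section. A minimal Python sketch that checks it against three rows
                    (the function and variable names are illustrative):

```python
# Total number of entries in release 2011_01, taken from the introduction.
TOTAL_ENTRIES = 13069501

# (database, number of cross-references, value printed in the table)
rows = [
    ("InterPro", 27424202, "2.10"),
    ("GO", 25546307, "1.95"),
    ("EMBL", 14586789, "1.12"),
]


def refs_per_entry(cross_refs, total=TOTAL_ENTRIES):
    """Average number of cross-references per database entry (assumed meaning)."""
    return cross_refs / total


for name, cross_refs, printed in rows:
    computed = f"{refs_per_entry(cross_refs):.2f}"
    print(f"{name:10s} computed={computed} table={printed}")
```

                    Under the same assumption, values that round below 0.01 would be
                    the ones shown as <0.01 in the table.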
                    
                    
                    5.  AMINO ACID COMPOSITION
                    
                    5.1  Composition in percent for the complete database
                    
                    Ala (A) 8.61   Gln (Q) 3.86   Leu (L) 9.83   Ser (S) 6.69
                    Arg (R) 5.46   Glu (E) 6.13   Lys (K) 5.27   Thr (T) 5.61
                    Asn (N) 4.15   Gly (G) 7.12   Met (M) 2.47   Trp (W) 1.31
                    Asp (D) 5.29   His (H) 2.19   Phe (F) 4.03   Tyr (Y) 3.06
                    Cys (C) 1.27   Ile (I) 6.02   Pro (P) 4.73   Val (V) 6.74
                    
                    Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.05
                    
                    
                    
                    [Figure: amino acid composition of the complete database]
                    Legend: gray = aliphatic, red = acidic, green = small hydroxy,
                    blue = basic, black = aromatic, white = amide, yellow = sulfur
                    
                    
                    5.2  Classification of the amino acids by their frequency
                    
                    Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
                    Gln, Tyr, Met, His, Trp, Cys
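
                    The ordering in 5.2 can be reproduced by sorting the percentages
                    from 5.1 in descending order. A minimal Python sketch using those
                    values (the variable names are illustrative):

```python
# Percent composition for the 20 standard amino acids, copied from section 5.1.
composition = {
    "Ala": 8.61, "Gln": 3.86, "Leu": 9.83, "Ser": 6.69,
    "Arg": 5.46, "Glu": 6.13, "Lys": 5.27, "Thr": 5.61,
    "Asn": 4.15, "Gly": 7.12, "Met": 2.47, "Trp": 1.31,
    "Asp": 5.29, "His": 2.19, "Phe": 4.03, "Tyr": 3.06,
    "Cys": 1.27, "Ile": 6.02, "Pro": 4.73, "Val": 6.74,
}

# Sort residue names by decreasing frequency; no two values are equal,
# so the resulting order is unambiguous.
ranked = sorted(composition, key=composition.get, reverse=True)

print(", ".join(ranked))
```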
                    
                    
                    
                    6.  MISCELLANEOUS STATISTICS
                    
                    Total number of entries encoded on a Mitochondrion: 424928
                    Total number of entries encoded on a Plasmid: 184198
                    Total number of entries encoded on a Plastid: 11244
                    Total number of entries encoded on a Plastid; Apicoplast: 365
                    Total number of entries encoded on a Plastid; Chloroplast: 127641
                    Total number of entries encoded on a Plastid; Cyanelle: 8
                    Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 444