 UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2011_04 STATISTICS
                    
                    
                    1.  INTRODUCTION
                    
                    Release 2011_04 (05-Apr-2011) of UniProtKB/TrEMBL contains 14555721 sequence entries,
                    comprising 4685791965 amino acids.
                    
                    712852 sequences have been added since release 2011_03, the sequence data of
                    3843 existing entries has been updated and the annotations of
                    4070435 entries have been revised. The added sequences represent an increase
                    of 5% in the number of entries.
                    
                    Number of fragments: 2336722
                    
                    Protein existence (PE):              entries      %
                    1: Evidence at protein level           18015     0.12%
                    2: Evidence at transcript level       490770     3.37%
                    3: Inferred from homology            3012532    20.70%
                    4: Predicted                        11034404    75.81%
                    5: Uncertain                               0     0.00%
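The percentages in the table above can be recomputed from the entry counts. The following Python sketch is only an illustration: the dictionary `pe_counts` and the function name `pe_shares` are assumptions introduced here and are not part of any UniProt tooling.

```python
# Entry counts per protein existence (PE) level, copied from the table above.
pe_counts = {
    "1: Evidence at protein level": 18015,
    "2: Evidence at transcript level": 490770,
    "3: Inferred from homology": 3012532,
    "4: Predicted": 11034404,
    "5: Uncertain": 0,
}


def pe_shares(counts):
    """Return each level's share of all entries, in percent."""
    total = sum(counts.values())
    return {level: 100.0 * n / total for level, n in counts.items()}


# The five levels should add up to the number of entries in the release.
total_entries = sum(pe_counts.values())

for level, share in pe_shares(pe_counts).items():
    print(f"{level:<34}{pe_counts[level]:>10}{share:>9.2f}%")
```

The counts sum to the 14555721 entries reported in the introduction, so every entry carries exactly one PE level in this release.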
                    
                    The growth of the database is summarized below.
                    
                    
                    
                    
                    
                    2.  TAXONOMIC ORIGIN
                    
                    Total number of species represented in this release of UniProtKB/TrEMBL: 336908
                    
                    The first twenty species represent 1277658 sequences: 8.8% of the
                    total number of entries.
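As a cross-check, the figure for the first twenty species can be reproduced from the frequencies listed in table 2.2. This is a minimal sketch; the variable names are assumptions made for illustration.

```python
# Frequencies of the twenty most represented species, copied from table 2.2.
top_twenty = [
    375423, 95377, 85473, 58563, 56149,
    53815, 50948, 50470, 44039, 43314,
    41987, 40460, 39840, 39367, 36147,
    34791, 33649, 33195, 32625, 32026,
]

TOTAL_ENTRIES = 14555721  # number of entries stated in the introduction

top_twenty_total = sum(top_twenty)
top_twenty_share = 100.0 * top_twenty_total / TOTAL_ENTRIES

print(f"{top_twenty_total} sequences: {top_twenty_share:.1f}% of all entries")
```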
                    
                    
                    2.1 Table of the frequency of occurrence of species
                    
                    Species represented      1x: 15766
                                             2x: 59026
                                             3x: 30437
                                             4x: 18234
                                             5x: 11428
                                             6x:  7941
                                             7x:  5642
                                             8x:  4437
                                             9x:  3641
                                            10x:  7066
                                        11- 20x: 17811
                                        21- 50x:  6336
                                        51-100x:  2282
                                          >100x:  4960
                    
                    
                    2.2  Table of the most represented species
                    
                    ------  ---------  --------------------------------------------
                    Number  Frequency  Species
                    ------  ---------  --------------------------------------------
                    1     375423  Human immunodeficiency virus 1
                    2      95377  Oryza sativa subsp. japonica (Rice)
                    3      85473  Homo sapiens (Human)
                    4      58563  Hepatitis C virus
                    5      56149  Mus musculus (Mouse)
                    6      53815  uncultured bacterium
                    7      50948  Vitis vinifera (Grape)
                    8      50470  Trichomonas vaginalis
                    9      44039  Populus trichocarpa (Western balsam poplar) 
                    10      43314  Hepatitis B virus (HBV)
                    11      41987  Zea mays (Maize)
                    12      40460  Arabidopsis thaliana (Mouse-ear cress)
                    13      39840  Paramecium tetraurelia
                    14      39367  Oryza sativa subsp. indica (Rice)
                    15      36147  Danio rerio (Zebrafish) (Brachydanio rerio)
                    16      34791  Physcomitrella patens subsp. patens (Moss)
                    17      33649  Sorghum bicolor (Sorghum) (Sorghum vulgare)
                    18      33195  Selaginella moellendorffii (Spikemoss)
                    19      32625  Arabidopsis lyrata subsp. lyrata
                    20      32026  Drosophila melanogaster (Fruit fly)
                    21      31830  Caenorhabditis remanei (Caenorhabditis vulgaris)
                    22      31296  Ricinus communis (Castor bean)
                    23      30815  Trypanosoma cruzi
                    24      30505  Daphnia pulex (Water flea)
                    25      29114  Branchiostoma floridae (Florida lancelet) (Amphioxus)
                    26      29024  Oikopleura dioica (Tunicate)
                    27      28089  Tetraodon nigroviridis (Green puffer)
                    28      25275  Ralstonia solanacearum (Pseudomonas solanacearum)
                    29      24811  Nematostella vectensis (Starlet sea anemone)
                    30      24045  Rattus norvegicus (Rat)
                    31      23115  Perkinsus marinus ATCC 50983
                    32      22601  Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz) 
                    33      22342  Escherichia coli
                    34      21469  Caenorhabditis elegans
                    35      21086  Ixodes scapularis (Black-legged tick) (Deer tick)
                    36      20437  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
                    37      18890  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
                    38      18771  mine drainage metagenome
                    39      18065  Drosophila simulans (Fruit fly)
                    40      17933  Caenorhabditis briggsae
                    41      17846  Laccaria bicolor (strain S238N-H82) (Bicoloured deceiver) 
                    42      17802  Ailuropoda melanoleuca (Giant panda)
                    43      17605  Phytophthora infestans T30-4
                    44      16974  Tribolium castaneum (Red flour beetle)
                    45      16929  Drosophila yakuba (Fruit fly)
                    46      16734  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
                    47      16707  Drosophila persimilis (Fruit fly)
                    48      16425  Ectocarpus siliculosus (Brown alga)
                    49      16303  Bos taurus (Bovine)
                    50      16295  Loa loa (Eye worm)
                    51      16244  Botryotinia fuckeliana (strain B05.10) (Noble rot fungus) (Botrytis cinerea)
                    52      16220  Trichinella spiralis (Trichina worm)
                    53      16180  Drosophila sechellia (Fruit fly)
                    54      15982  Drosophila pseudoobscura pseudoobscura (Fruit fly)
                    55      15855  Phaeosphaeria nodorum (Glume blotch fungus) (Septoria nodorum)
                    56      15715  Naegleria gruberi (Amoeba)
                    57      15652  Nectria haematococca (strain 77-13-4 / FGSC 9596 / MPVI) 
                    58      15418  Drosophila willistoni (Fruit fly)
                    59      15247  Tetrahymena thermophila SB210
                    60      15232  Canis familiaris (Dog) (Canis lupus familiaris)
                    61      15137  Drosophila ananassae (Fruit fly)
                    62      15029  Harpegnathos saltator
                    63      14921  Drosophila erecta (Fruit fly)
                    64      14819  Chlamydomonas reinhardtii (Chlamydomonas smithii)
                    65      14791  Camponotus floridanus
                    66      14775  Drosophila mojavensis (Fruit fly)
                    67      14757  Anopheles gambiae (African malaria mosquito)
                    68      14696  Drosophila virilis (Fruit fly)
                    69      14671  Plasmodium chabaudi
                    70      14652  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
                    71      14634  Volvox carteri f. nagariensis
                    72      14626  Toxoplasma gondii
                    73      14308  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
                    74      14258  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
                    75      13774  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
                    76      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)  
                    77      13510  Schistosoma mansoni (Blood fluke)
                    78      13432  Hepatitis C virus subtype 1b
                    79      13349  Aspergillus flavus 
                    80      13284  Coprinopsis cinerea (strain Okayama-7 / 130 / FGSC 9003) (Inky cap fungus) 
                    81      13186  Magnaporthe oryzae (strain 70-15 / FGSC 8958) (Rice blast fungus) 
                    82      13126  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
                    83      13041  Gallus gallus (Chicken)
                    84      12956  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
                    85      12950  Stigmatella aurantiaca (strain DW4/3-1)
                    86      12704  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
                    87      12622  Glycine max (Soybean) (Glycine hispida)
                    88      12545  Xenopus laevis (African clawed frog)
                    89      12533  Leptosphaeria maculans (Blackleg fungus) (Phoma lingam)
                    90      12444  Polysphondylium pallidum (Cellular slime mold)
                    91      12130  Plasmodium falciparum
                    92      12017  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
                    93      12014  Colletotrichum graminicola (strain M1.001 / M2 / FGSC 10212)  
                    94      11715  Thalassiosira pseudonana (Marine diatom)
                    95      11710  Hepatitis C virus subtype 1a
                    96      11704  Pyrenophora teres f. teres (strain 0-1) (Barley net blotch fungus) 
                    97      11647  Anopheles darlingi (Mosquito)
                    98      11645  Plasmodium berghei (strain Anka)
                    99      11611  Aspergillus oryzae (strain ATCC 42149 / RIB 40)
                    100      11563  Trichoplax adhaerens (Trichoplax reptans)
                    101      11497  Brugia malayi (Filarial nematode worm)
                    102      11350  Helicobacter pylori (Campylobacter pylori)
                    103      11272  Picea sitchensis (Sitka spruce) (Pinus sitchensis)
                    104      11211  Ktedonobacter racemifer DSM 44963
                    105      10966  Streptomyces clavuligerus ATCC 27064
                    106      10918  Schistosoma japonicum (Blood fluke)
                    107      10857  Chaetomium globosum (Soil fungus)
                    108      10832  Pediculus humanus subsp. corporis (Body louse)
                    109      10791  Sordaria macrospora (strain ATCC MYA-333 / DSM 997 / K(L3346) / K-hell)
                    110      10669  Podospora anserina
                    111      10581  Metarhizium anisopliae ARSEF 23
                    112      10408  Neurospora crassa
                    113      10399  Penicillium marneffei (strain ATCC 18224 / CBS 334.59 / QM 7333)
                    114      10377  Aspergillus nidulans FGSC A4
                    115      10357  Phaeodactylum tricornutum (strain CCAP 1055/1)
                    116      10277  Rabies virus
                    117      10276  Micromonas pusilla (strain CCMP1545) (Picoplanktonic green alga)
                    118      10218  Verticillium albo-atrum (strain VaMs.102) (Verticillium wilt)
                    119      10212  Coccidioides posadasii str. Silveira
                    120      10113  Micromonas sp. (strain RCC299 / NOUM17) (Picoplanktonic green alga)
                    121      10105  Aspergillus terreus (strain NIH 2624 / FGSC A1156)
                    122      10092  Porcine reproductive and respiratory syndrome virus (PRRSV)
                    123      10076  Neosartorya fischeri (strain ATCC 1020 / DSM 3700 / FGSC A1164 / NRRL 181) 
                    124      10057  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
                    125      10015  Streptomyces bingchenggensis (strain BCW-1)
                    126       9830  Metarhizium acridum CQMa 102
                    127       9755  Chlorella variabilis
                    128       9729  Aspergillus fumigatus (strain CEA10 / CBS 144.89 / FGSC A1163) 
                    129       9669  Cryptococcus neoformans (Filobasidiella neoformans)
                    130       9662  Trypanosoma brucei gambiense DAL972
                    131       9555  Ajellomyces dermatitidis (strain SLH14081) (Blastomyces dermatitidis)
                    132       9526  Ajellomyces capsulata (strain H143) (Darling's disease fungus) 
                    133       9508  Ajellomyces dermatitidis (strain ER-3) (Blastomyces dermatitidis)
                    134       9484  Streptomyces violaceusniger Tu 4113
                    135       9482  Trypanosoma brucei
                    136       9386  Salmo salar (Atlantic salmon)
                    137       9239  Monosiga brevicollis (Choanoflagellate)
                    138       9228  Candida albicans (Yeast)
                    139       9202  Amycolatopsis mediterranei (strain U-32)
                    140       9180  Ajellomyces capsulata (strain ATCC 26029 / G186AR / H82 / RMSCC 2432)  
                    141       9177  Streptomyces hygroscopicus ATCC 53653
                    142       9173  Emericella nidulans (Aspergillus nidulans)
                    143       9161  Ajellomyces capsulata (strain NAm1 / WU24) (Darling's disease fungus) 
                    144       9114  Sorangium cellulosum (strain So ce56) (Polyangium cellulosum (strain So ce56))
                    145       9084  Paracoccidioides brasiliensis (strain ATCC MYA-826 / Pb01)
                    146       9035  Neosartorya fumigata (strain ATCC MYA-4609 / Af293 / CBS 101355 / FGSC A1100) 
                    147       8982  Dictyostelium discoideum (Slime mold)
                    148       8971  Postia placenta (strain ATCC 44394 / Madison 698-R) (Brown rot fungus) 
                    149       8944  Streptosporangium roseum (strain ATCC 12428 / DSM 43021 / JCM 3005 / NI 9100)
                    150       8900  Catenulispora acidiphila 
                    151       8882  Arthroderma gypseum (strain ATCC MYA-4604 / CBS 118893) (Microsporum gypseum)
                    152       8820  Aspergillus clavatus 
                    153       8757  Rhodococcus sp. (strain RHA1)
                    154       8720  Paracoccidioides brasiliensis (strain Pb18)
                    155       8691  Streptomyces scabies (strain 87.22) (Streptomyces scabiei)
                    156       8682  Arthroderma otae (strain CBS 113480) (Microsporum canis)
                    157       8599  Entamoeba dispar SAW760
                    158       8437  Plesiocystis pacifica SIR-1
                    159       8394  Streptomyces sp. AA4
                    160       8374  Capsaspora owczarzaki ATCC 30864
                    161       8300  Entamoeba histolytica
                    162       8249  Microscilla marina ATCC 23134
                    163       8202  Streptomyces sviceus ATCC 29083
                    164       8201  Microcoleus chthonoplastes PCC 7420
                    165       8186  Leishmania infantum
                    166       8164  Frankia sp. EUN1f
                    167       8154  Burkholderia xenovorans (strain LB400)
                    168       8119  Pseudomonas aeruginosa
                    169       8044  Leishmania mexicana MHOM/GT/2001/U1103
                    170       8010  Leishmania major strain Friedlin
                    171       7997  Leishmania braziliensis
                    172       7978  Toxoplasma gondii ME49
                    173       7975  Trichophyton verrucosum (strain HKI 0517)
                    174       7961  Leishmania donovani BPK282A1
                    175       7955  Ostreococcus tauri
                    176       7943  Rhodococcus opacus (strain B4)
                    177       7924  Arthroderma benhamiae (strain CBS 112371) (Trichophyton mentagrophytes)
                    178       7917  Methylobacterium nodulans (strain ORS2060 / LMG 21967)
                    179       7867  Streptomyces ghanaensis ATCC 14672
                    180       7856  Acaryochloris marina (strain MBIC 11017)
                    181       7845  Paracoccidioides brasiliensis (strain Pb03)
                    182       7823  Burkholderia sp. Ch1-1
                    183       7808  Plasmodium yoelii yoelii
                    184       7726  Uncinocarpus reesii (strain UAMH 1704)
                    185       7708  Streptomyces viridochromogenes DSM 40736
                    186       7657  uncultured archaeon
                    187       7607  Bradyrhizobium japonicum USDA 110
                    188       7571  Clostridium hathewayi DSM 13479
                    189       7563  Burkholderia pseudomallei MSHR346
                    190       7528  Streptomyces sp. C
                    191       7523  Streptomyces lividans TK24
                    192       7519  Solibacter usitatus (strain Ellin6076)
                    193       7503  Tuber melanosporum (Perigord truffle)
                    194       7476  Streptomyces coelicolor
                    195       7475  Burkholderia pseudomallei 1710a
                    196       7465  Burkholderia pseudomallei Pakistan 9
                    197       7459  Burkholderia sp. H160
                    198       7443  Kitasatospora setae  
                    199       7385  Ostreococcus lucimarinus (strain CCE9901)
                    200       7379  Streptomyces sp. ACT-1
                    201       7367  Burkholderia pseudomallei 576
                    202       7349  Burkholderia pseudomallei 305
                    203       7274  Clostridium bolteae ATCC BAA-613
                    204       7241  Burkholderia sp. (strain 383) (Burkholderia cepacia 
                    205       7231  Bradyrhizobium sp. (strain BTAi1 / ATCC BAA-1182)
                    206       7227  Streptomyces avermitilis
                    207       7177  Chitinophaga pinensis (strain ATCC 43595 / DSM 2588 / NCIB 11800 / UQM 2034)
                    208       7171  Coccidioides posadasii (strain C735) (Valley fever fungus)
                    209       7146  Giardia intestinalis (strain ATCC 50803 / WB clone C6) (Giardia lamblia)
                    210       7140  Burkholderia pseudomallei 1106b
                    211       7130  Burkholderia phymatum (strain DSM 17167 / STM815)
                    212       7124  Burkholderia ambifaria MEX-5
                    213       7097  Medicago truncatula (Barrel medic) (Medicago tribuloides)
                    214       7079  Frankia sp. (strain EuI1c)
                    215       7033  Burkholderia vietnamiensis (strain G4 / LMG 22486) (Burkholderia cepacia 
                    216       7016  Myxococcus xanthus (strain DK 1622)
                    217       7005  Mucilaginibacter paludis DSM 18603
                    218       6985  Rhizobium leguminosarum bv. trifolii (strain WSM1325)
                    219       6975  Rhodopirellula baltica
                    220       6959  Frankia sp. (strain EAN1pec)
                    221       6943  Streptomyces sp. Mg1
                    222       6932  Kribbella flavida (strain DSM 17836 / JCM 10339 / NBRC 14399)
                    223       6923  Burkholderia ambifaria IOP40-10
                    224       6903  Actinosynnema mirum (strain ATCC 29888 / DSM 43827 / NBRC 14064 / IMRU 3971)
                    225       6902  Saccharopolyspora erythraea (strain NRRL 23338)
                    226       6892  Streptomyces roseosporus NRRL 15998
                    227       6882  Burkholderia multivorans (strain ATCC 17616 / 249)
                    228       6877  Streptococcus pneumoniae
                    229       6869  Sus scrofa (Pig)
                    230       6867  Spirosoma linguale (strain ATCC 33905 / DSM 74 / LMG 10896)
                    231       6866  Streptomyces pristinaespiralis ATCC 25486
                    232       6865  Burkholderia sp. (strain CCGE1002)
                    233       6859  Burkholderia phytofirmans (strain DSM 17436 / PsJN)
                    234       6831  Rhizobium loti (Mesorhizobium loti)
                    235       6817  Clostridium asparagiforme DSM 15981
                    236       6798  Achromobacter xylosoxidans (strain A8)
                    237       6771  Burkholderia pseudomallei (strain 1106a)
                    238       6769  Sinorhizobium meliloti AK83
                    239       6740  Burkholderia pseudomallei (strain 668)
                    240       6736  Streptomyces griseus subsp. griseus (strain JCM 4626 / NBRC 13350)
                    241       6725  Burkholderia graminis C4D1M
                    242       6713  Rhizobium leguminosarum bv. viciae (strain 3841)
                    243       6712  Rhodococcus erythropolis SK121
                    244       6706  Sporisorium reilianum
                    245       6705  Chthoniobacter flavus Ellin428
                    246       6692  Bacillus thuringiensis IBL 200
                    247       6690  delta proteobacterium NaphS2
                    248       6682  Sinorhizobium meliloti BL225C
                    249       6680  Haliangium ochraceum (strain DSM 14365 / JCM 11303 / SMP-2)
                    250       6679  Mesorhizobium opportunistum WSM2075
                    
                    
                    
                    
                    2.3  Taxonomic distribution of the sequences
                    
                    
                    
                    Kingdom        sequences (% of the database)
                    Archaea          249101 (  2%)
                    Bacteria        9250503 ( 64%)
                    Eukaryota       3912488 ( 27%)
                    Viruses         1106614 (  8%)
                    Other             37014 ( <1%)
                    
                    
                    
                    Within Eukaryota:
                    
                    
                    
                    Category            sequences (% of Eukaryota) (% of the complete database)
                    Human                  85509 (  2%)           (  1%)
                    Other Mammalia        233967 (  6%)           (  2%)
                    Other Vertebrata      350910 (  9%)           (  2%)
                    Viridiplantae         859668 ( 22%)           (  6%)
                    Fungi                 818825 ( 21%)           (  6%)
                    Insecta               616042 ( 16%)           (  4%)
                    Nematoda              126421 (  3%)           (  1%)
                    Other                 821146 ( 21%)           (  6%)
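The two percentage columns of the Eukaryota table use different denominators: the Eukaryota total and the complete database. The sketch below recomputes both; the helper name `two_level_shares` is hypothetical, and rounding to whole percent is an assumption that happens to match the printed values.

```python
TOTAL_ENTRIES = 14555721  # all UniProtKB/TrEMBL entries in this release

# Sequence counts per category, copied from the table above.
eukaryota = {
    "Human": 85509,
    "Other Mammalia": 233967,
    "Other Vertebrata": 350910,
    "Viridiplantae": 859668,
    "Fungi": 818825,
    "Insecta": 616042,
    "Nematoda": 126421,
    "Other": 821146,
}


def two_level_shares(groups, database_total):
    """Return {name: (percent of the group total, percent of the database)}."""
    group_total = sum(groups.values())
    return {
        name: (100.0 * n / group_total, 100.0 * n / database_total)
        for name, n in groups.items()
    }


eukaryota_shares = two_level_shares(eukaryota, TOTAL_ENTRIES)

for name, (of_group, of_db) in eukaryota_shares.items():
    print(f"{name:<18}{eukaryota[name]:>8} ({of_group:3.0f}%) ({of_db:3.0f}%)")
```

The category counts add up to the 3912488 Eukaryota sequences given in the kingdom table, so the categories partition that kingdom.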
                    
                    
                    
                    3.  SEQUENCE SIZE
                    
                    Distribution of the sequences by size (excluding fragments)
                    
                    From   To   Number      From   To   Number
                       1-  50   318946      1001-1100    85585
                      51- 100  1166206      1101-1200    60376
                     101- 150  1342145      1201-1300    41855
                     151- 200  1296812      1301-1400    27339
                     201- 250  1303610      1401-1500    22043
                     251- 300  1261962      1501-1600    15776
                     301- 350  1146272      1601-1700    11855
                     351- 400   884522      1701-1800     9144
                     401- 450   749719      1801-1900     7421
                     451- 500   626659      1901-2000     6234
                     501- 550   423335      2001-2100     5009
                     551- 600   326576      2101-2200     5167
                     601- 650   237215      2201-2300     4065
                     651- 700   184581      2301-2400     3235
                     701- 750   159086      2401-2500     2768
                     751- 800   142587      >2500        24107
                     801- 850   105781
                     851- 900    96063
                     901- 950    65527
                     951-1000    49416
                    
                    
                    
                    
                    The average sequence length in UniProtKB/TrEMBL is 321 amino acids.

                    The shortest sequence is Q16047_HUMAN: 4 amino acids.
                    The longest sequence is Q3ASY8_CHLCH: 36805 amino acids.
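The average length can be related to the totals given in the introduction. The sketch below assumes that the average is taken over all entries, which the source does not state explicitly. Under that assumption the quotient lies between 321 and 322, so the reported value seems to be the integer part rather than a rounded figure.

```python
TOTAL_AMINO_ACIDS = 4685791965  # from the introduction
TOTAL_ENTRIES = 14555721        # from the introduction

# Exact mean and its integer part (assumed to be what the release reports).
mean_length = TOTAL_AMINO_ACIDS / TOTAL_ENTRIES
mean_length_truncated = TOTAL_AMINO_ACIDS // TOTAL_ENTRIES

print(f"mean length:  {mean_length:.2f} amino acids")
print(f"integer part: {mean_length_truncated} amino acids")
```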
                    
                    
                    
                    4.  STATISTICS FOR SOME LINE TYPES
                    
                    The following table summarizes the total number of some UniProtKB/TrEMBL lines,
                    as well as the number of entries with at least one such line, and the
                    frequency of the lines.
                    
                                                            Total Number of   Average
                    Line type / subtype                    number   entries per entry
                    ---------------------------------    -------- --------- ---------

                    References (RL)                      17480136                1.20
                      Submitted to EMBL/GenBank/DDBJ     10433939   9228966      0.72
                      Journal                             6825139   6190717      0.47
                      Submitted to other databases          89932     89319      0.01
                      Thesis                                 7644      7586     <0.01
                      Book citation                          5660      5609     <0.01
                      Other                                117822    115985      0.01
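The "Average per entry" column appears to be the total number of lines divided by the number of entries in the release (14555721), printed with two decimals, with values that would print as 0.00 shown as "<0.01". That display rule is inferred from the figures, not documented in the source. A minimal sketch, with the hypothetical helper `average_per_entry`:

```python
TOTAL_ENTRIES = 14555721  # number of entries stated in the introduction


def average_per_entry(line_count, entries=TOTAL_ENTRIES):
    """Format lines-per-entry the way the table seems to (inferred rule)."""
    text = f"{line_count / entries:.2f}"
    # Assumption: anything that rounds to 0.00 is displayed as "<0.01".
    return "<0.01" if text == "0.00" else text


# Total numbers of reference lines, copied from the table above.
reference_lines = {
    "References (RL)": 17480136,
    "Submitted to EMBL/GenBank/DDBJ": 10433939,
    "Journal": 6825139,
    "Submitted to other databases": 89932,
    "Thesis": 7644,
    "Book citation": 5660,
    "Other": 117822,
}

for name, count in reference_lines.items():
    print(f"{name:<32}{count:>10}{average_per_entry(count):>8}")
```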
                    
                    Total number of distinct authors cited in UniProtKB/TrEMBL: 315880
                    
                    
                                                            Total Number of   Average
                    Line type / subtype                    number   entries per entry  Rank
                    ---------------------------------    -------- --------- ---------  ----

                    Comments (CC)                        13443604                0.92
                      CATALYTIC ACTIVITY                  1375458   1270079      0.09     4
                      CAUTION                             3508509   3508509      0.24     2
                      COFACTOR                             438795    420517      0.03     8
                      DOMAIN                                28319     26431     <0.01    10
                      FUNCTION                            1606135   1491555      0.11     3
                      INTERACTION                            2451      2451     <0.01    11
                      MISCELLANEOUS                         29594     29590     <0.01     9
                      PATHWAY                              680124    629465      0.05     6
                      SIMILARITY                          4061126   3512969      0.28     1
                      SUBCELLULAR LOCATION                1119096   1114185      0.08     5
                      SUBUNIT                              593997    591902      0.04     7
                    
                    Total number of comment topics: 11
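The Rank column is consistent with ordering the comment topics by their total number of lines, largest first. That reading is an inference from the figures, not something the source states. A short sketch with illustrative variable names:

```python
# Total numbers of comment lines per topic, copied from the table above.
comment_lines = {
    "CATALYTIC ACTIVITY": 1375458,
    "CAUTION": 3508509,
    "COFACTOR": 438795,
    "DOMAIN": 28319,
    "FUNCTION": 1606135,
    "INTERACTION": 2451,
    "MISCELLANEOUS": 29594,
    "PATHWAY": 680124,
    "SIMILARITY": 4061126,
    "SUBCELLULAR LOCATION": 1119096,
    "SUBUNIT": 593997,
}

# Rank 1 goes to the topic with the most lines (assumed ranking rule).
ranked_topics = sorted(comment_lines, key=comment_lines.get, reverse=True)
ranks = {topic: position for position, topic in enumerate(ranked_topics, start=1)}

for topic in ranked_topics:
    print(f"{ranks[topic]:>2}  {topic:<22}{comment_lines[topic]:>9}")
```

The per-topic totals also add up to the 13443604 comment lines reported in the first row of the table.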
                    
                    
                                                            Total Number of   Average
                    Line type / subtype                    number   entries per entry  Rank
                    ---------------------------------    -------- --------- ---------  ----

                    Features (FT)                         4750794                0.33
                      CHAIN                                495787    389407      0.03     2
                      NON_TER                             3929909   2335245      0.27     1
                      SIGNAL                               324503    324446      0.02     3
                      TRANSIT                                 595       595     <0.01     4
                    
                    Total number of feature keys: 4
                    
                    
                    Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank  Category
                    ---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
                    Cross-references (DR)             164127657               11.28                                                    
                    AGD                                 2556      2556     <0.01    77   Organism-specific databases                
                    ANU-2DPAGE                            56        56     <0.01    94   2D gel databases                           
                    Allergome                           1990      1427     <0.01    81   Protein family/group databases             
                    ArachnoServer                         66        66     <0.01    92   Organism-specific databases                
                    ArrayExpress                       92072     92061      0.01    49   Gene expression databases                  
                    BRENDA                              2833      2772     <0.01    75   Enzyme and pathway databases               
                    Bgee                              142018    141841      0.01    45   Gene expression databases                  
                    BioCyc                           1638692   1596900      0.11    21   Enzyme and pathway databases               
                    CAZy                               74772     70249      0.01    52   Protein family/group databases             
                    CGD                                 6760      6760     <0.01    72   Organism-specific databases                
                    COMPLUYEAST-2DPAGE                     5         5     <0.01    97   2D gel databases                           
                    CTD                               186951    186221      0.01    42   Organism-specific databases                
                    DIP                                 2747      2742     <0.01    76   Protein-protein interaction databases      
                    EMBL                            16228402  14526328      1.11     3   Sequence databases                         
                    Ensembl                           224310    194863      0.02    38   Genome annotation databases                
                    EnsemblBacteria                   565775    535564      0.04    29   Genome annotation databases                
                    EnsemblFungi                      106266    106177      0.01    47   Genome annotation databases                
                    EnsemblMetazoa                    320469    297461      0.02    32   Genome annotation databases                
                    EnsemblPlants                     262396    234835      0.02    34   Genome annotation databases                
                    EnsemblProtists                    72641     71482     <0.01    54   Genome annotation databases                
                    EuPathDB                          148662    148662      0.01    44   Organism-specific databases                
                    FlyBase                           195012    193476      0.01    40   Organism-specific databases                
                    GO                              27852130   8787762      1.91     2   Ontologies                                 
                    Gene3D                           5761962   4572367      0.40     6   Family and domain databases                
                    GeneDB_Spombe                          1         1     <0.01   100   Organism-specific databases                
                    GeneID                           5632535   5517105      0.39     8   Genome annotation databases                
                    GeneTree                         1167985   1167642      0.08    23   Phylogenomic databases                     
                    Genevestigator                    100050    100040      0.01    48   Gene expression databases                  
                    GenoList                           14752     14479     <0.01    66   Organism-specific databases                
                    GenomeReviews                    3758282   3659081      0.26    12   Genome annotation databases                
                    Gramene                            68784     68784     <0.01    56   Organism-specific databases                
                    H-InvDB                              598       487     <0.01    85   Organism-specific databases                
                    HAMAP                            1031530   1019580      0.07    25   Family and domain databases                
                    HGNC                               73165     71402      0.01    53   Organism-specific databases                
                    HOGENOM                          2205191   2205147      0.15    18   Phylogenomic databases                     
                    HOVERGEN                          317241    317240      0.02    33   Phylogenomic databases                     
                    HSSP                              254759    254498      0.02    36   3D structure databases                     
                    IPI                               241080    241051      0.02    37   Sequence databases                         
                    InParanoid                        194858    194768      0.01    41   Phylogenomic databases                     
                    IntAct                             16918     16918     <0.01    64   Protein-protein interaction databases      
                    InterPro                        29830670  10894489      2.05     1   Family and domain databases                
                    KEGG                             4802406   4690987      0.33    10   Genome annotation databases                
                    LegioList                           5142      5114     <0.01    73   Organism-specific databases                
                    Leproma                              936       935     <0.01    84   Organism-specific databases                
                    MEROPS                             70819     69336     <0.01    55   Protein family/group databases             
                    MGI                                49328     49278     <0.01    58   Organism-specific databases                
                    MINT                                8975      8975     <0.01    70   Protein-protein interaction databases      
                    NMPDR                             920603    920593      0.06    26   Genome annotation databases                
                    NextBio                            46747     46744     <0.01    59   Other                                      
                    OMA                              2432762   2432760      0.17    16   Phylogenomic databases                     
                    OrthoDB                           579141    578973      0.04    28   Phylogenomic databases                     
                    PANTHER                          1822628   1758658      0.13    20   Family and domain databases                
                    PDB                                14593      8630     <0.01    67   3D structure databases                     
                    PDBsum                             14359      8495     <0.01    68   3D structure databases                     
                    PHCI-2DPAGE                          102       102     <0.01    89   2D gel databases                           
                    PIR                               174812    141979      0.01    43   Sequence databases                         
                    PIRSF                             812681    812681      0.06    27   Family and domain databases                
                    PMAP-CutDB                           253       253     <0.01    87   Other                                      
                    PMMA-2DPAGE                            3         3     <0.01    98   2D gel databases                           
                    PRIDE                             134374    134368      0.01    46   Proteomic databases                        
                    PRINTS                           2301299   2045030      0.16    17   Family and domain databases                
                    PROSITE                          6727057   4441902      0.46     5   Family and domain databases                
                    Pathway_Interaction_DB                11         9     <0.01    96   Enzyme and pathway databases               
                    PeptideAtlas                         147       147     <0.01    88   Proteomic databases                        
                    PeroxiBase                          2500      2491     <0.01    78   Protein family/group databases             
                    Pfam                            13840578  10308901      0.95     4   Family and domain databases                
                    PharmGKB                              83        83     <0.01    91   Organism-specific databases                
                    PhosphoSite                         1574      1574     <0.01    82   PTM databases                              
                    PhylomeDB                         372152    372120      0.03    31   Phylogenomic databases                     
                    ProDom                            257340    240823      0.02    35   Family and domain databases                
                    ProMEX                               419       419     <0.01    86   Proteomic databases                        
                    ProtClustDB                      2738594   2738583      0.19    15   Phylogenomic databases                     
                    ProteinModelPortal               4646223   4646146      0.32    11   3D structure databases                     
                    PseudoCAP                           4343      4340     <0.01    74   Organism-specific databases                
                    REBASE                             16488     15919     <0.01    65   Protein family/group databases             
                    REPRODUCTION-2DPAGE                   94        93     <0.01    90   2D gel databases                           
                    RGD                                18124     17924     <0.01    63   Organism-specific databases                
                    Reactome                              58        55     <0.01    93   Enzyme and pathway databases               
                    RefSeq                           5648061   5519339      0.39     7   Sequence databases                         
                    SMART                            2913342   2254637      0.20    13   Family and domain databases                
                    SMR                              2158907   2158907      0.15    19   3D structure databases                     
                    STRING                           1203195   1203056      0.08    22   Protein-protein interaction databases      
                    SUPFAM                           5624326   4611742      0.39     9   Family and domain databases                
                    SWISS-2DPAGE                          29        29     <0.01    95   2D gel databases                           
                    Siena-2DPAGE                           2         2     <0.01    99   2D gel databases                           
                    TAIR                               18432     18348     <0.01    62   Organism-specific databases                
                    TCDB                                2405      2398     <0.01    79   Protein family/group databases             
                    TIGR                              211333    196214      0.01    39   Genome annotation databases                
                    TIGRFAMs                         2828371   2579443      0.19    14   Family and domain databases                
                    TubercuList                         2194      2189     <0.01    80   Organism-specific databases                
                    UCSC                               49565     49563     <0.01    57   Genome annotation databases                
                    UniGene                           467095    438359      0.03    30   Sequence databases                         
                    VectorBase                         78962     78450      0.01    50   Genome annotation databases                
                    World-2DPAGE                         946       941     <0.01    83   2D gel databases                           
                    WormBase                           41369     41239     <0.01    60   Organism-specific databases                
                    Xenbase                            13232     13194     <0.01    69   Organism-specific databases                
                    ZFIN                               21584     21579     <0.01    61   Organism-specific databases                
                    dictyBase                           7930      7930     <0.01    71   Organism-specific databases                
                    eggNOG                           1145419   1145419      0.08    24   Phylogenomic databases                     
                    euHCVdb                            75268     75265      0.01    51   Organism-specific databases                
                    
                    Number of explicitly cross-referenced databases: 129
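
The rows of the table above are whitespace-separated, so they can be split into fields mechanically. The sketch below is a minimal illustration, not an official parser. The field names (`xrefs`, `entries`, `per_entry`, `rank`, `category`) are assumptions inferred from the numbers, since the column header is not part of this section. For example, 164127657 cross-references divided by the 14555721 entries of this release gives about 11.28, which matches the total row. It applies to the per-database rows only, not to the "Cross-references (DR)" total row, whose label contains a space.

```python
from typing import NamedTuple


class XrefRow(NamedTuple):
    """One per-database row; field names are hypothetical, not from the source."""
    database: str
    xrefs: int        # assumed: total number of cross-references
    entries: int      # assumed: number of entries carrying at least one
    per_entry: str    # kept as text because values such as "<0.01" are not numbers
    rank: int
    category: str


def parse_row(line: str) -> XrefRow:
    """Split one table row into typed fields.

    Database names in this table contain no spaces, while the trailing
    category may, so everything after the fifth token is re-joined.
    """
    parts = line.split()
    name, xrefs, entries, per_entry, rank = parts[:5]
    return XrefRow(name, int(xrefs), int(entries), per_entry, int(rank),
                   " ".join(parts[5:]))


TOTAL_ENTRIES = 14555721  # entry count stated in the introduction of this release

row = parse_row("InterPro   29830670  10894489   2.05   1   Family and domain databases")
# Recompute the assumed "cross-references per entry" figure from the raw counts.
print(row.database, round(row.xrefs / TOTAL_ENTRIES, 2))
```

Keeping `per_entry` as a string avoids having to invent a numeric value for the "<0.01" entries.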
                    
                    
                    5.  AMINO ACID COMPOSITION
                    
                    5.1  Composition in percent for the complete database
                    
                    Ala (A) 8.62   Gln (Q) 3.86   Leu (L) 9.84   Ser (S) 6.69
                    Arg (R) 5.46   Glu (E) 6.13   Lys (K) 5.26   Thr (T) 5.62
                    Asn (N) 4.14   Gly (G) 7.12   Met (M) 2.48   Trp (W) 1.31
                    Asp (D) 5.30   His (H) 2.19   Phe (F) 4.03   Tyr (Y) 3.05
                    Cys (C) 1.26   Ile (I) 6.02   Pro (P) 4.72   Val (V) 6.75
                    
                    Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.05
                    
                    
                    
                    Legend for the composition chart: gray = aliphatic, red = acidic,
                    green = small hydroxy, blue = basic, black = aromatic, white = amide,
                    yellow = sulfur
                    
                    
                    5.2  Classification of the amino acids by their frequency
                    
                    Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
                    Gln, Tyr, Met, His, Trp, Cys
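
The ordering in 5.2 follows directly from the percentages in 5.1. As a quick cross-check, the sketch below sorts the twenty standard amino acids by their listed percentage in descending order. The dictionary is transcribed from the table in 5.1, and the variable names are illustrative only.

```python
# Percent composition of the twenty standard amino acids, copied from section 5.1.
composition = {
    "Ala": 8.62, "Gln": 3.86, "Leu": 9.84, "Ser": 6.69,
    "Arg": 5.46, "Glu": 6.13, "Lys": 5.26, "Thr": 5.62,
    "Asn": 4.14, "Gly": 7.12, "Met": 2.48, "Trp": 1.31,
    "Asp": 5.30, "His": 2.19, "Phe": 4.03, "Tyr": 3.05,
    "Cys": 1.26, "Ile": 6.02, "Pro": 4.72, "Val": 6.75,
}

# Most frequent first; no two residues share a value, so the order is unambiguous.
ranking = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranking))

# The twenty values need not add up to exactly 100: each is rounded to two
# decimals, and the ambiguous codes (B, Z, X) are listed separately.
total = sum(composition.values())
print(f"sum of the twenty standard residues: {total:.2f}")
```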
                    
                    
                    
                    6.  MISCELLANEOUS STATISTICS
                    
                    Total number of entries encoded on a Mitochondrion: 476624
                    Total number of entries encoded on a Plasmid: 197022
                    Total number of entries encoded on a Plastid: 12227
                    Total number of entries encoded on a Plastid; Apicoplast: 368
                    Total number of entries encoded on a Plastid; Chloroplast: 133111
                    Total number of entries encoded on a Plastid; Cyanelle: 8
                    Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 448
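
To put these counts in proportion, each can be divided by the 14555721 entries reported in the introduction. The sketch below does only that arithmetic. It does not add the plastid lines together, because the source does not state whether the plain "Plastid" count overlaps with the named plastid subtypes. The dictionary keys are shortened labels chosen for illustration.

```python
TOTAL_ENTRIES = 14555721  # entry count stated in the introduction of this release

# Counts copied from section 6; keys are shortened labels, not official names.
encoded_on = {
    "Mitochondrion": 476624,
    "Plasmid": 197022,
    "Plastid": 12227,
    "Plastid; Apicoplast": 368,
    "Plastid; Chloroplast": 133111,
    "Plastid; Cyanelle": 8,
    "Plastid; Non-photosynthetic plastid": 448,
}


def share_percent(count: int, total: int = TOTAL_ENTRIES) -> float:
    """Return count as a percentage of all entries in the release."""
    return 100.0 * count / total


for label, count in encoded_on.items():
    print(f"{label:<38}{count:>8}  {share_percent(count):.4f}%")
```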