                    UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2010_05 STATISTICS
                    
                    
                    1.  INTRODUCTION
                    
                    Release 2010_05 of 20-Apr-2010 of UniProtKB/TrEMBL contains 10706472 sequence entries,
                    comprising 3452790297 amino acids.
                    
                    161141 sequences have been added since release 2010_04, the sequence data of
                    391 existing entries has been updated, and the annotations of
                    1504904 entries have been revised. The added sequences represent an increase
                    of about 2% in the number of entries.
                    
                    Number of fragments: 1815229
                    
                    Protein existence (PE):              entries      %
                    1: Evidence at protein level           19391     0.18%
                    2: Evidence at transcript level       462560     4.32%
                    3: Inferred from homology            2227524    20.81%
                    4: Predicted                         7996997    74.69%
                    5: Uncertain                               0     0.00%
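
The percentages in the protein existence table can be re-derived from the entry counts. The following is a minimal sketch; the names `PE_COUNTS` and `pe_shares` are illustrative and not part of any UniProt tool.

```python
# Counts copied from the protein existence (PE) table above.
PE_COUNTS = {
    "1: Evidence at protein level": 19391,
    "2: Evidence at transcript level": 462560,
    "3: Inferred from homology": 2227524,
    "4: Predicted": 7996997,
    "5: Uncertain": 0,
}


def pe_shares(counts):
    """Map each PE level to its percentage of all entries, rounded to 2 places."""
    total = sum(counts.values())
    return {level: round(100 * n / total, 2) for level, n in counts.items()}


total_entries = sum(PE_COUNTS.values())
shares = pe_shares(PE_COUNTS)

print(f"entries: {total_entries}")
for level, pct in shares.items():
    print(f"{level:<34}{PE_COUNTS[level]:>9}{pct:>9.2f}%")
```

The five counts add up to the number of entries quoted for the release, so every entry carries exactly one PE level.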
                    
                    The growth of the database is summarized below.
                    
                    
                    
                    
                    
                    2.  TAXONOMIC ORIGIN
                    
                    Total number of species represented in this release of UniProtKB/TrEMBL: 235184
                    
                    The twenty most represented species account for 1153259 sequences, or 10.8% of the
                    total number of entries.
                    
                    
                    2.1 Table of the frequency of occurrence of species
                    
                    Species represented      1x: 101348
                                             2x:  41404
                                             3x:  22969
                                             4x:  13883
                                             5x:   8735
                                             6x:   6311
                                             7x:   4339
                                             8x:   3522
                                             9x:   2870
                                            10x:   4480
                                        11- 20x:  14451
                                        21- 50x:   5111
                                        51-100x:   1970
                                          >100x:   3791
                    
                    
                    2.2  Table of the most represented species
                    
                    ------  ---------  --------------------------------------------
                    Number  Frequency  Species
                    ------  ---------  --------------------------------------------
                    1     329423  Human immunodeficiency virus 1
                    2      95730  Oryza sativa subsp. japonica (Rice)
                    3      71286  Homo sapiens (Human)
                    4      56279  Hepatitis C virus
                    5      50402  Trichomonas vaginalis
                    6      47743  Mus musculus (Mouse)
                    7      44033  Populus trichocarpa (Western balsam poplar) 
                    8      43659  uncultured bacterium
                    9      42332  Arabidopsis thaliana (Mouse-ear cress)
                    10      41879  Zea mays (Maize)
                    11      39843  Paramecium tetraurelia
                    12      39282  Oryza sativa subsp. indica (Rice)
                    13      37527  Hepatitis B virus (HBV)
                    14      34761  Physcomitrella patens subsp. patens
                    15      33629  Sorghum bicolor (Sorghum) (Sorghum vulgare)
                    16      31216  Ricinus communis (Castor bean)
                    17      30303  Drosophila melanogaster (Fruit fly)
                    18      29074  Branchiostoma floridae (Florida lancelet) (Amphioxus)
                    19      28085  Tetraodon nigroviridis (Green puffer)
                    20      26773  Danio rerio (Zebrafish) (Brachydanio rerio)
                    21      25081  Vitis vinifera (Grape)
                    22      24830  Nematostella vectensis (Starlet sea anemone)
                    23      23527  Rattus norvegicus (Rat)
                    24      23115  Perkinsus marinus ATCC 50983
                    25      21081  Ixodes scapularis (Black-legged tick) (Deer tick)
                    26      21041  Caenorhabditis elegans
                    27      20632  Trypanosoma cruzi
                    28      18873  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
                    29      18122  Caenorhabditis briggsae
                    30      17861  Laccaria bicolor (strain S238N-H82) (Bicoloured deceiver) 
                    31      17792  Ailuropoda melanoleuca (Giant panda)
                    32      17610  Phytophthora infestans T30-4
                    33      17447  Drosophila simulans (Fruit fly)
                    34      17425  Escherichia coli
                    35      16906  Drosophila yakuba (Fruit fly)
                    36      16753  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
                    37      16720  Drosophila persimilis (Fruit fly)
                    38      16708  Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz) 
                    39      16257  Botryotinia fuckeliana (strain B05.10) (Noble rot fungus) (Botrytis cinerea)
                    40      16197  Drosophila sechellia (Fruit fly)
                    41      15960  Drosophila pseudoobscura pseudoobscura (Fruit fly)
                    42      15874  Phaeosphaeria nodorum (Glume blotch fungus) (Septoria nodorum)
                    43      15716  Naegleria gruberi (Amoeba)
                    44      15675  Nectria haematococca (strain 77-13-4 / FGSC 9596 / MPVI) 
                    45      15432  Drosophila willistoni (Fruit fly)
                    46      15251  Tetrahymena thermophila SB210
                    47      15154  Drosophila ananassae (Fruit fly)
                    48      14941  Drosophila erecta (Fruit fly)
                    49      14787  Drosophila mojavensis (Fruit fly)
                    50      14770  Anopheles gambiae (African malaria mosquito)
                    51      14762  Chlamydomonas reinhardtii
                    52      14705  Drosophila virilis (Fruit fly)
                    53      14673  Plasmodium chabaudi
                    54      14666  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
                    55      14275  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
                    56      13809  Candida albicans (Yeast)
                    57      13685  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
                    58      13472  Coprinopsis cinerea (strain Okayama-7 / 130 / FGSC 9003) (Inky cap fungus) 
                    59      13440  Aspergillus flavus 
                    60      13436  Schistosoma mansoni (Blood fluke)
                    61      12979  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
                    62      12732  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
                    63      12714  Magnaporthe grisea (Rice blast fungus) (Pyricularia grisea)
                    64      12520  Xenopus laevis (African clawed frog)
                    65      12434  Glycine max (Soybean)
                    66      12340  Polysphondylium pallidum PN500
                    67      12033  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
                    68      11886  Aspergillus oryzae
                    69      11801  Plasmodium berghei
                    70      11571  Trichoplax adhaerens
                    71      11569  Hepatitis C virus subtype 1b
                    72      11496  Brugia malayi (Filarial nematode worm)
                    73      10940  Sordaria macrospora
                    74      10898  Schistosoma japonicum (Blood fluke)
                    75      10868  Chaetomium globosum (Soil fungus)
                    76      10692  Podospora anserina
                    77      10663  Ralstonia solanacearum (Pseudomonas solanacearum)
                    78      10494  Aspergillus nidulans FGSC A4
                    79      10421  Neurospora crassa
                    80      10403  Penicillium marneffei (strain ATCC 18224 / CBS 334.59 / QM 7333)
                    81      10335  Phaeodactylum tricornutum CCAP 1055/1
                    82      10279  Micromonas pusilla CCMP1545
                    83      10239  Plasmodium falciparum
                    84      10233  Verticillium albo-atrum (strain VaMs.102) (Verticillium wilt)
                    85      10185  Aspergillus terreus (strain NIH 2624 / FGSC A1156)
                    86      10174  Neosartorya fischeri (strain ATCC 1020 / DSM 3700 / FGSC A1164 / NRRL 181) 
                    87      10115  Micromonas sp. RCC299
                    88      10094  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
                    89       9846  Bos taurus (Bovine)
                    90       9817  Aspergillus fumigatus (strain CEA10 / CBS 144.89 / FGSC A1163) 
                    91       9666  Trypanosoma brucei gambiense DAL972
                    92       9649  Porcine reproductive and respiratory syndrome virus (PRRSV)
                    93       9637  Cryptococcus neoformans (Filobasidiella neoformans)
                    94       9627  Helicobacter pylori (Campylobacter pylori)
                    95       9625  Aspergillus fumigatus (Sartorya fumigata)
                    96       9574  Ajellomyces dermatitidis (strain SLH14081) (Blastomyces dermatitidis)
                    97       9540  Ajellomyces capsulata (strain H143) (Darling's disease fungus) 
                    98       9526  Ajellomyces dermatitidis (strain ER-3) (Blastomyces dermatitidis)
                    99       9469  Trypanosoma brucei
                    100       9360  Salmo salar (Atlantic salmon)
                    101       9295  Emericella nidulans (Aspergillus nidulans)
                    102       9244  Monosiga brevicollis (Choanoflagellate)
                    103       9195  Ajellomyces capsulata (strain ATCC 26029 / G186AR / H82 / RMSCC 2432)  
                    104       9173  Ajellomyces capsulata (strain NAm1 / WU24) (Darling's disease fungus) 
                    105       9122  Sorangium cellulosum (strain So ce56) (Polyangium cellulosum (strain So ce56))
                    106       9096  Paracoccidioides brasiliensis (strain ATCC MYA-826 / Pb01)
                    107       9028  Dictyostelium discoideum (Slime mold)
                    108       8978  Postia placenta (strain ATCC 44394 / Madison 698-R) (Brown rot fungus) 
                    109       8964  Thalassiosira pseudonana (Marine diatom)
                    110       8955  Streptosporangium roseum (strain ATCC 12428 / DSM 43021 / JCM 3005 / NI 9100)
                    111       8913  Aspergillus clavatus
                    112       8912  Catenulispora acidiphila 
                    113       8775  Rhodococcus sp. (strain RHA1)
                    114       8720  Paracoccidioides brasiliensis (strain Pb18)
                    115       8708  Nannizzia otae (strain CBS 113480) (Microsporum canis) (Arthroderma otae)
                    116       8700  Streptomyces scabies (strain 87.22) (Streptomyces scabiei)
                    117       8603  Entamoeba dispar SAW760
                    118       8567  Plasmodium vivax
                    119       8523  Stigmatella aurantiaca DW4/3-1
                    120       8437  Plesiocystis pacifica SIR-1
                    121       8411  Rabies virus
                    122       8302  Entamoeba histolytica
                    123       8253  Streptomyces sviceus ATCC 29083
                    124       8249  Microscilla marina ATCC 23134
                    125       8201  Microcoleus chthonoplastes PCC 7420
                    126       8163  Frankia sp. EUN1f
                    127       8154  Burkholderia xenovorans (strain LB400)
                    128       8116  Picea sitchensis (Sitka spruce)
                    129       8098  Toxoplasma gondii GT1
                    130       8025  Leishmania infantum
                    131       7989  Pseudomonas aeruginosa
                    132       7980  Toxoplasma gondii ME49
                    133       7958  Ostreococcus tauri
                    134       7953  Rhodococcus opacus (strain B4)
                    135       7916  Methylobacterium nodulans (strain ORS2060 / LMG 21967)
                    136       7891  Leishmania braziliensis
                    137       7860  Paracoccidioides brasiliensis (strain Pb03)
                    138       7857  Acaryochloris marina (strain MBIC 11017)
                    139       7838  Toxoplasma gondii VEG
                    140       7813  Plasmodium yoelii yoelii
                    141       7747  Uncinocarpus reesii (strain UAMH 1704)
                    142       7612  Bradyrhizobium japonicum USDA 110
                    143       7571  Clostridium hathewayi DSM 13479
                    144       7563  Burkholderia pseudomallei MSHR346
                    145       7520  Solibacter usitatus (strain Ellin6076)
                    146       7489  Streptomyces coelicolor
                    147       7475  Burkholderia pseudomallei 1710a
                    148       7465  Burkholderia pseudomallei Pakistan 9
                    149       7459  Burkholderia sp. H160
                    150       7397  Ostreococcus lucimarinus (strain CCE9901)
                    151       7379  Streptomyces sp. ACT-1
                    152       7367  Burkholderia pseudomallei 576
                    153       7349  Burkholderia pseudomallei 305
                    154       7310  Frankia sp. EuI1c
                    155       7274  Clostridium bolteae ATCC BAA-613
                    156       7243  Burkholderia sp. (strain 383) (Burkholderia cepacia 
                    157       7237  Streptomyces avermitilis
                    158       7232  Bradyrhizobium sp. (strain BTAi1 / ATCC BAA-1182)
                    159       7225  Burkholderia sp. CCGE1002
                    160       7211  Coccidioides posadasii (strain C735) (Valley fever fungus)
                    161       7179  Chitinophaga pinensis (strain ATCC 43595 / DSM 2588 / NCIB 11800 / UQM 2034)
                    162       7171  Medicago truncatula (Barrel medic)
                    163       7149  Giardia lamblia ATCC 50803
                    164       7140  Burkholderia pseudomallei 1106b
                    165       7132  Burkholderia phymatum (strain DSM 17167 / STM815)
                    166       7124  Rhizobium loti (Mesorhizobium loti)
                    167       7124  Burkholderia ambifaria MEX-5
                    168       7119  Leishmania major
                    169       7033  Burkholderia vietnamiensis (strain G4 / LMG 22486) (Burkholderia cepacia 
                    170       7017  Myxococcus xanthus (strain DK 1622)
                    171       6985  Rhizobium leguminosarum bv. trifolii (strain WSM1325)
                    172       6979  Rhodopirellula baltica
                    173       6967  Frankia sp. (strain EAN1pec)
                    174       6943  Streptomyces sp. Mg1
                    175       6940  Kribbella flavida (strain DSM 17836 / JCM 10339 / NBRC 14399)
                    176       6923  Burkholderia ambifaria IOP40-10
                    177       6913  Saccharopolyspora erythraea (strain NRRL 23338)
                    178       6911  Actinosynnema mirum (strain ATCC 29888 / DSM 43827 / NBRC 14064 / IMRU 3971)
                    179       6882  Burkholderia multivorans (strain ATCC 17616 / 249)
                    180       6867  Spirosoma linguale (strain ATCC 33905 / DSM 74 / LMG 10896)
                    181       6862  Burkholderia phytofirmans (strain DSM 17436 / PsJN)
                    182       6817  Clostridium asparagiforme DSM 15981
                    183       6772  Burkholderia pseudomallei (strain 1106a)
                    184       6744  Streptomyces griseus subsp. griseus (strain JCM 4626 / NBRC 13350)
                    185       6740  Burkholderia pseudomallei (strain 668)
                    186       6725  Burkholderia graminis C4D1M
                    187       6714  Rhizobium leguminosarum bv. viciae (strain 3841)
                    188       6712  Rhodococcus erythropolis SK121
                    189       6705  Chthoniobacter flavus Ellin428
                    190       6702  Streptomyces flavogriseus ATCC 33331
                    191       6692  Bacillus thuringiensis IBL 200
                    192       6684  Haliangium ochraceum (strain DSM 14365 / JCM 11303 / SMP-2)
                    193       6679  Mesorhizobium opportunistum WSM2075
                    194       6662  Burkholderia pseudomallei S13
                    195       6657  Burkholderia cepacia (strain J2315 / LMG 16656) (Burkholderia cenocepacia 
                    196       6655  Bacillus thuringiensis IBL 4222
                    197       6644  Beggiatoa sp. PS
                    198       6627  Burkholderia cenocepacia (strain MC0-3)
                    199       6614  Burkholderia multivorans CGD2
                    200       6613  Burkholderia pseudomallei Pasteur 52237
                    201       6606  Burkholderia multivorans CGD2M
                    202       6583  Bacillus thuringiensis serovar sotto str. T04001
                    203       6579  Hepatitis C virus subtype 1a
                    204       6571  Streptococcus pneumoniae
                    205       6527  Burkholderia multivorans CGD1
                    206       6522  Sus scrofa (Pig)
                    207       6521  Streptomyces sp. ACTE
                    208       6514  Frankia alni (strain ACN14a)
                    209       6498  bacterium Ellin514
                    210       6497  Burkholderia cenocepacia (strain HI2424)
                    211       6488  Bacillus thuringiensis serovar monterrey BGSC 4AJ1
                    212       6463  Planctomyces maris DSM 8797
                    213       6462  Streptomyces clavuligerus ATCC 27064
                    214       6427  Agrobacterium radiobacter (strain K84 / ATCC BAA-868)
                    215       6417  Methylobacterium sp. (strain 4-46)
                    216       6413  Cyanothece sp. CCY0110
                    217       6390  Ustilago maydis (Smut fungus)
                    218       6388  Bradyrhizobium sp. (strain ORS278)
                    219       6379  Stackebrandtia nassauensis DSM 44728
                    220       6372  Micromonospora aurantiaca ATCC 27029
                    221       6357  Rhizobium meliloti (Sinorhizobium meliloti)
                    222       6356  'Nostoc azollae' 0708
                    223       6347  Micromonospora sp. L5
                    224       6336  Burkholderia ambifaria (strain MC40-6)
                    225       6322  Bacillus thuringiensis serovar thuringiensis str. T01001
                    226       6311  Mycobacterium smegmatis (strain ATCC 700084 / mc(2)155)
                    227       6309  Hahella chejuensis (strain KCTC 2396)
                    228       6298  Bacillus thuringiensis Bt407
                    229       6294  Burkholderia pseudomallei 406e
                    230       6290  Nostoc punctiforme (strain ATCC 29133 / PCC 73102)
                    231       6288  Burkholderia pseudomallei 1655
                    232       6272  Labrenzia aggregata IAM 12614
                    233       6252  Clostridiales bacterium 1_7_47FAA
                    234       6242  Bacillus thuringiensis serovar berliner ATCC 10792
                    235       6237  Geobacillus sp. (strain Y412MC10)
                    236       6234  Rhodococcus erythropolis (strain PR4 / NBRC 100887)
                    237       6213  Candida tropicalis (strain ATCC MYA-3404 / T1) (Yeast)
                    238       6210  Burkholderia ambifaria (strain ATCC BAA-244 / AMMD) (Burkholderia cepacia 
                    239       6206  Paenibacillus sp. (strain JDR-2)
                    240       6185  Oryza sativa (Rice)
                    241       6184  Methylobacterium extorquens (strain ATCC 14718 / DSM 1338 / AM1)
                    242       6172  Methylobacterium radiotolerans (strain ATCC 27329 / DSM 1819 / JCM 2831)
                    243       6154  Ralstonia eutropha  (Cupriavidus necator 
                    244       6143  Burkholderia sp. CCGE1003
                    245       6129  Bacillus thuringiensis serovar israelensis ATCC 35646
                    246       6113  Gallus gallus (Chicken)
                    247       6110  Lyngbya sp. PCC 8106
                    248       6092  Azospirillum sp. B510
                    249       6092  Rhizobium leguminosarum bv. trifolii (strain WSM2304)
                    250       6081  Ralstonia metallidurans (strain CH34 / ATCC 43123 / DSM 2839)
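
The share quoted in section 2 for the twenty most represented species can be recomputed from the rows of this table. The sketch below parses rows in the `rank  frequency  species` layout used above; the regular expression and the helper name `parse_rows` are assumptions made for illustration.

```python
import re

TOTAL_ENTRIES = 10706472  # release total quoted in the introduction

# The first twenty rows of the table above, copied verbatim.
TOP_ROWS = """\
1     329423  Human immunodeficiency virus 1
2      95730  Oryza sativa subsp. japonica (Rice)
3      71286  Homo sapiens (Human)
4      56279  Hepatitis C virus
5      50402  Trichomonas vaginalis
6      47743  Mus musculus (Mouse)
7      44033  Populus trichocarpa (Western balsam poplar)
8      43659  uncultured bacterium
9      42332  Arabidopsis thaliana (Mouse-ear cress)
10      41879  Zea mays (Maize)
11      39843  Paramecium tetraurelia
12      39282  Oryza sativa subsp. indica (Rice)
13      37527  Hepatitis B virus (HBV)
14      34761  Physcomitrella patens subsp. patens
15      33629  Sorghum bicolor (Sorghum) (Sorghum vulgare)
16      31216  Ricinus communis (Castor bean)
17      30303  Drosophila melanogaster (Fruit fly)
18      29074  Branchiostoma floridae (Florida lancelet) (Amphioxus)
19      28085  Tetraodon nigroviridis (Green puffer)
20      26773  Danio rerio (Zebrafish) (Brachydanio rerio)
"""

# rank, frequency, then the species name (which may itself end in a digit).
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(.+?)\s*$")


def parse_rows(text):
    """Return (rank, frequency, species) tuples for every well-formed row."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append((int(match.group(1)), int(match.group(2)), match.group(3)))
    return rows


rows = parse_rows(TOP_ROWS)
top20 = sum(freq for rank, freq, _ in rows if rank <= 20)
share = 100 * top20 / TOTAL_ENTRIES
print(f"{top20} sequences, {share:.1f}% of all entries")
```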
                    
                    
                    
                    
                    2.3  Taxonomic distribution of the sequences
                    
                    
                    
                    Kingdom        sequences (% of the database)
                    Archaea          209059 (  2%)
                    Bacteria        6542639 ( 61%)
                    Eukaryota       3016134 ( 28%)
                    Viruses          927418 (  9%)
                    Other             11221 ( <1%)
                    
                    
                    
                    Within Eukaryota:
                    
                    
                    
                    Category            sequences (% of Eukaryota) (% of the complete database)
                    Human                  71308 (  2%)           (  1%)
                    Other Mammalia        192085 (  6%)           (  2%)
                    Other Vertebrata      291020 ( 10%)           (  3%)
                    Viridiplantae         702644 ( 23%)           (  7%)
                    Fungi                 623495 ( 21%)           (  6%)
                    Insecta               409129 ( 14%)           (  4%)
                    Nematoda               61562 (  2%)           (  1%)
                    Other                 664891 ( 22%)           (  6%)
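
The whole-number percentages in the two tables above are consistent with simple rounding, with `<1%` shown for non-zero shares that round to zero. A short sketch of that convention follows; the rounding rule is inferred from the figures rather than documented, and `whole_percent` is a hypothetical helper.

```python
TOTAL_ENTRIES = 10706472  # release total quoted in the introduction

KINGDOMS = {
    "Archaea": 209059,
    "Bacteria": 6542639,
    "Eukaryota": 3016134,
    "Viruses": 927418,
    "Other": 11221,
}

EUKARYOTA = {
    "Human": 71308,
    "Other Mammalia": 192085,
    "Other Vertebrata": 291020,
    "Viridiplantae": 702644,
    "Fungi": 623495,
    "Insecta": 409129,
    "Nematoda": 61562,
    "Other": 664891,
}


def whole_percent(count, total):
    """Format count/total as a whole percentage; non-zero shares that round to 0 become '<1%'."""
    pct = round(100 * count / total)
    if pct == 0 and count > 0:
        return "<1%"
    return f"{pct}%"


for kingdom, count in KINGDOMS.items():
    print(f"{kingdom:<12}{count:>10} ({whole_percent(count, TOTAL_ENTRIES):>4})")

eukaryota_total = KINGDOMS["Eukaryota"]
for category, count in EUKARYOTA.items():
    of_eukaryota = whole_percent(count, eukaryota_total)
    of_database = whole_percent(count, TOTAL_ENTRIES)
    print(f"{category:<18}{count:>9} ({of_eukaryota:>4}) ({of_database:>4})")
```

The eukaryotic categories sum exactly to the Eukaryota row of the kingdom table, which is why the same count can serve as the denominator for the first percentage column.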
                    
                    
                    
                    3.  SEQUENCE SIZE
                    
                    Distribution of the sequences by size (excluding fragments)
                    
                    From   To   Number          From   To   Number
                       1-  50   231002          1001-1100    63953
                      51- 100   843700          1101-1200    45320
                     101- 150   974321          1201-1300    31044
                     151- 200   942583          1301-1400    20611
                     201- 250   943645          1401-1500    16580
                     251- 300   912471          1501-1600    11998
                     301- 350   830795          1601-1700     8850
                     351- 400   646772          1701-1800     7018
                     401- 450   542736          1801-1900     5659
                     451- 500   454214          1901-2000     4750
                     501- 550   312554          2001-2100     3871
                     551- 600   239625          2101-2200     3978
                     601- 650   174883          2201-2300     3147
                     651- 700   136256          2301-2400     2490
                     701- 750   116918          2401-2500     2152
                     751- 800   104488              >2500    18900
                     801- 850    77861
                     851- 900    70215
                     901- 950    48574
                     951-1000    37309
                    
                    
                    
                    
                    The average sequence length in UniProtKB/TrEMBL is 322 amino acids.
                    
                    The shortest sequence is Q16047_HUMAN: 4 amino acids.
                    The longest sequence is Q3ASY8_CHLCH: 36805 amino acids.
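
The size figures in this section can be cross-checked against the totals in the introduction. The sketch below assumes the average length is the total residue count divided by the total number of entries, which matches the quoted value; `SIZE_BINS` and `median_bin` are illustrative names.

```python
TOTAL_ENTRIES = 10706472     # from the introduction
TOTAL_RESIDUES = 3452790297  # from the introduction
FRAGMENTS = 1815229          # from the introduction

# (low, high, number of sequences); high is None for the open-ended last bin.
SIZE_BINS = [
    (1, 50, 231002), (51, 100, 843700), (101, 150, 974321),
    (151, 200, 942583), (201, 250, 943645), (251, 300, 912471),
    (301, 350, 830795), (351, 400, 646772), (401, 450, 542736),
    (451, 500, 454214), (501, 550, 312554), (551, 600, 239625),
    (601, 650, 174883), (651, 700, 136256), (701, 750, 116918),
    (751, 800, 104488), (801, 850, 77861), (851, 900, 70215),
    (901, 950, 48574), (951, 1000, 37309), (1001, 1100, 63953),
    (1101, 1200, 45320), (1201, 1300, 31044), (1301, 1400, 20611),
    (1401, 1500, 16580), (1501, 1600, 11998), (1601, 1700, 8850),
    (1701, 1800, 7018), (1801, 1900, 5659), (1901, 2000, 4750),
    (2001, 2100, 3871), (2101, 2200, 3978), (2201, 2300, 3147),
    (2301, 2400, 2490), (2401, 2500, 2152), (2501, None, 18900),
]


def median_bin(bins):
    """Return the (low, high) bounds of the bin that contains the median sequence."""
    total = sum(count for _, _, count in bins)
    running = 0
    for low, high, count in bins:
        running += count
        if 2 * running >= total:
            return low, high
    raise ValueError("empty histogram")


non_fragments = sum(count for _, _, count in SIZE_BINS)
average_length = round(TOTAL_RESIDUES / TOTAL_ENTRIES)

print("non-fragment sequences:", non_fragments)
print("average length:", average_length)
print("median size bin:", median_bin(SIZE_BINS))
```

The binned counts add up to the number of entries minus the number of fragments, which confirms that the table covers every non-fragment sequence.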
                    
                    
                    
                    4.  STATISTICS FOR SOME LINE TYPES
                    
                    The following table summarizes the total number of some UniProtKB/TrEMBL lines,
                    as well as the number of entries with at least one such line, and the
                    frequency of the lines.
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry
                    ---------------------------------  -------- ---------  ---------
                    
                    References (RL)                    13080628                1.22
                       Submitted to EMBL/GenBank/DDBJ   7678584   6766871      0.72
                       Journal                          5267311   4756379      0.49
                       Submitted to other databases       29295     29282     <0.01
                       Thesis                              7390      7333     <0.01
                       Book citation                       5124      5073     <0.01
                       Other                              92924     92507      0.01
                    
                    Total number of distinct authors cited in UniProtKB/TrEMBL: 291600
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank
                    ---------------------------------  -------- ---------  ---------  ----
                    
                    Comments (CC)                       8134170                0.76
                       CATALYTIC ACTIVITY                726838    669870      0.07     4
                       CAUTION                          2739147   2739147      0.26     1
                       COFACTOR                          224187    218656      0.02     8
                       DOMAIN                              2952      2952     <0.01    11
                       FUNCTION                          797142    733533      0.07     3
                       INTERACTION                         4808      4808     <0.01    10
                       MISCELLANEOUS                      19638     19636     <0.01     9
                       PATHWAY                           261335    239971      0.02     7
                       SIMILARITY                       2638884   2281266      0.25     2
                       SUBCELLULAR LOCATION              453554    453507      0.04     5
                       SUBUNIT                           265685    265681      0.02     6
                    
                    Total number of comment topics: 11
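
The "Average per entry" and "Rank" columns can be reproduced from the "Total number" column. Judging from the figures, the average divides the line count by all entries in the release (not only those carrying the line), and the rank orders topics by total line count; both rules are inferred here rather than documented, and the helper names are hypothetical.

```python
TOTAL_ENTRIES = 10706472  # release total quoted in the introduction

# topic -> (total CC lines, entries with at least one such line)
COMMENTS = {
    "CATALYTIC ACTIVITY": (726838, 669870),
    "CAUTION": (2739147, 2739147),
    "COFACTOR": (224187, 218656),
    "DOMAIN": (2952, 2952),
    "FUNCTION": (797142, 733533),
    "INTERACTION": (4808, 4808),
    "MISCELLANEOUS": (19638, 19636),
    "PATHWAY": (261335, 239971),
    "SIMILARITY": (2638884, 2281266),
    "SUBCELLULAR LOCATION": (453554, 453507),
    "SUBUNIT": (265685, 265681),
}


def average_per_entry(lines, total_entries=TOTAL_ENTRIES):
    """Lines per database entry, shown as '<0.01' when it would print as 0.00."""
    avg = lines / total_entries
    return "<0.01" if avg < 0.005 else f"{avg:.2f}"


def rank_by_lines(table):
    """Rank topics by total line count, most frequent first (rank 1)."""
    ordered = sorted(table, key=lambda topic: table[topic][0], reverse=True)
    return {topic: position for position, topic in enumerate(ordered, start=1)}


ranks = rank_by_lines(COMMENTS)
all_lines = sum(lines for lines, _ in COMMENTS.values())

print(f"{'Comments (CC)':<24}{all_lines:>9}{'':>10}{average_per_entry(all_lines):>7}")
for topic, (lines, entries) in COMMENTS.items():
    print(f"{topic:<24}{lines:>9}{entries:>10}{average_per_entry(lines):>7}{ranks[topic]:>4}")
```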
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank
                    ---------------------------------  -------- ---------  ---------  ----
                    
                    Features (FT)                       3739770                0.35
                       CHAIN                             409777    323199      0.04     2
                       NON_TER                          3063451   1813701      0.29     1
                       SIGNAL                            265955    265955      0.02     3
                       TRANSIT                              587       587     <0.01     4
                    
                    Total number of feature keys: 4
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank  Category
                    ---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
                    Cross-references (DR)             121353520               11.33                                                    
                    AGD                                 3870      3870     <0.01    69   Organism-specific databases                
                    ANU-2DPAGE                            58        58     <0.01    87   2D gel databases                           
                    ArachnoServer                         77        77     <0.01    86   Organism-specific databases                
                    ArrayExpress                       95055     95042      0.01    43   Gene expression databases                  
                    BRENDA                              2915      2845     <0.01    70   Enzyme and pathway databases               
                    Bgee                              108013    107906      0.01    40   Gene expression databases                  
                    BioCyc                            797371    772098      0.07    23   Enzyme and pathway databases               
                    CAZy                               36215     33861     <0.01    52   Protein family/group databases             
                    CGD                                 6804      6804     <0.01    64   Organism-specific databases                
                    COMPLUYEAST-2DPAGE                     5         5     <0.01    92   2D gel databases                           
                    CTD                               152558    151850      0.01    38   Organism-specific databases                
                    CYGD                                   6         6     <0.01    91   Organism-specific databases                
                    DIP                                 2585      2580     <0.01    71   Protein-protein interaction databases      
                    EMBL                            11920305  10690211      1.11     3   Sequence databases                         
                    Ensembl                           385825    230877      0.04    28   Genome annotation databases                
                    EuPathDB                          151377    151377      0.01    39   Organism-specific databases                
                    FlyBase                           195512    193981      0.02    34   Organism-specific databases                
                    GO                              20871381   6367110      1.95     2   Ontologies                                 
                    Gene3D                           3289837   2781676      0.31    10   Family and domain databases                
                    GeneDB_Spombe                          1         1     <0.01    95   Organism-specific databases                
                    GeneID                           4564683   4458995      0.43     7   Genome annotation databases                
                    Genevestigator                    103189    103178      0.01    41   Gene expression databases                  
                    GenoList                           14766     14493     <0.01    57   Organism-specific databases                
                    GenomeReviews                    3125850   3043396      0.29    11   Genome annotation databases                
                    Gramene                            69134     69134      0.01    44   Organism-specific databases                
                    H-InvDB                              542       442     <0.01    78   Organism-specific databases                
                    HAMAP                             420754    418908      0.04    26   Family and domain databases                
                    HGNC                               51899     50034     <0.01    48   Organism-specific databases                
                    HOGENOM                          2206097   2206026      0.21    15   Phylogenomic databases                     
                    HOVERGEN                          320810    320114      0.03    30   Phylogenomic databases                     
                    HSSP                              255428    255161      0.02    31   3D structure databases                     
                    IPI                               200935    200935      0.02    32   Sequence databases                         
                    InParanoid                        197684    197588      0.02    33   Phylogenomic databases                     
                    IntAct                             13497     13497     <0.01    59   Protein-protein interaction databases      
                    InterPro                        21406722   8108551      2.00     1   Family and domain databases                
                    KEGG                             4052815   3962990      0.38     9   Genome annotation databases                
                    LegioList                           5143      5115     <0.01    66   Organism-specific databases                
                    Leproma                              943       942     <0.01    77   Organism-specific databases                
                    MEROPS                             67363     66079      0.01    46   Protein family/group databases             
                    MGI                                37792     37539     <0.01    51   Organism-specific databases                
                    MINT                                4467      4467     <0.01    67   Protein-protein interaction databases      
                    NMPDR                             928493    928482      0.09    22   Genome annotation databases                
                    NextBio                            48520     48517     <0.01    49   Other                                      
                    OMA                              2439266   2439264      0.23    14   Phylogenomic databases                     
                    OrthoDB                           431097    431096      0.04    25   Phylogenomic databases                     
                    PANTHER                          1688828   1591476      0.16    19   Family and domain databases                
                    PDB                                11696      6981     <0.01    61   3D structure databases                     
                    PDBsum                              5317      3111     <0.01    65   3D structure databases                     
                    PHCI-2DPAGE                          102       102     <0.01    83   2D gel databases                           
                    PIR                               176508    143641      0.02    37   Sequence databases                         
                    PIRSF                             561463    561463      0.05    24   Family and domain databases                
                    PMAP-CutDB                           268       268     <0.01    80   Other                                      
                    PMMA-2DPAGE                            3         3     <0.01    93   2D gel databases                           
                    PRIDE                             100657    100657      0.01    42   Proteomic databases                        
                    PRINTS                           1717735   1507593      0.16    18   Family and domain databases                
                    PROSITE                          5171938   3418645      0.48     5   Family and domain databases                
                    Pathway_Interaction_DB                11         9     <0.01    90   Enzyme and pathway databases               
                    PeptideAtlas                         148       148     <0.01    82   Proteomic databases                        
                    PeroxiBase                          2286      2280     <0.01    73   Protein family/group databases             
                    Pfam                            10250021   7645561      0.96     4   Family and domain databases                
                    PharmGKB                              88        88     <0.01    85   Organism-specific databases                
                    PhosphoSite                         1730      1730     <0.01    75   PTM databases                              
                    PhylomeDB                         374154    374122      0.03    29   Phylogenomic databases                     
                    ProDom                            195280    184393      0.02    36   Family and domain databases                
                    ProMEX                               457       457     <0.01    79                                              
                    ProtClustDB                      2625462   2625445      0.25    13   Phylogenomic databases                     
                    PseudoCAP                           4349      4346     <0.01    68   Organism-specific databases                
                    REBASE                              7656      7389     <0.01    63   Protein family/group databases             
                    REPRODUCTION-2DPAGE                   97        96     <0.01    84   2D gel databases                           
                    RGD                                14135     14128     <0.01    58   Organism-specific databases                
                    Reactome                              56        53     <0.01    88   Enzyme and pathway databases               
                    RefSeq                           4633471   4517004      0.43     6   Sequence databases                         
                    SGD                                  250       250     <0.01    81   Organism-specific databases                
                    SMART                            2112033   1646499      0.20    16   Family and domain databases                
                    SMR                              3056355   3056345      0.29    12   3D structure databases                     
                    STRING                           1204728   1204584      0.11    20   Protein-protein interaction databases      
                    SUPFAM                           4401599   3615332      0.41     8   Family and domain databases                
                    SWISS-2DPAGE                          29        29     <0.01    89   2D gel databases                           
                    Siena-2DPAGE                           2         2     <0.01    94   2D gel databases                           
                    TAIR                               19410     19327     <0.01    56   Organism-specific databases                
                    TCDB                                2208      2189     <0.01    74   Protein family/group databases             
                    TIGR                              195291    188238      0.02    35   Genome annotation databases                
                    TIGRFAMs                         2028455   1855792      0.19    17   Family and domain databases                
                    TubercuList                         2321      2315     <0.01    72   Organism-specific databases                
                    UCSC                               53654     53574      0.01    47   Genome annotation databases                
                    UniGene                           397721    364609      0.04    27   Sequence databases                         
                    VectorBase                         47592     47124     <0.01    50   Genome annotation databases                
                    World-2DPAGE                         947       942     <0.01    76   2D gel databases                           
                    WormBase                           19550     19454     <0.01    55   Organism-specific databases                
                    WormPep                            19558     19454     <0.01    54   Organism-specific databases                
                    Xenbase                            12900     12531     <0.01    60   Organism-specific databases                
                    ZFIN                               20297     20292     <0.01    53   Organism-specific databases                
                    dictyBase                           8176      8175     <0.01    62                                              
                    eggNOG                           1150429   1150429      0.11    21                                              
                    euHCVdb                            68440     68439      0.01    45                                              
                    
                    Number of explicitly cross-referenced databases: 125
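
The rows of the cross-reference table above have a regular fixed-width layout, so they can be read mechanically. The following is a minimal sketch, not an official parser. The field names (xrefs, entries, per_entry, rank, category) are assumptions made for illustration, since the table header is not repeated here. The sketch also checks the assumed reading that the fractional column is the number of cross-references divided by the total number of entries in the release (10706472, from the introduction).

```python
import re

# Total entry count stated in the release introduction.
TOTAL_ENTRIES = 10706472

# Field names are assumptions for illustration, not official column names.
ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<xrefs>\d+)\s+(?P<entries>\d+)\s+"
    r"(?P<per_entry><?\d+\.\d+)\s+(?P<rank>\d+)\s*(?P<category>.*?)\s*$"
)


def parse_row(line):
    """Parse one table row into a dict, or return None if it does not match."""
    m = ROW.match(line)
    if m is None:
        return None
    return {
        "name": m["name"],
        "xrefs": int(m["xrefs"]),
        "entries": int(m["entries"]),
        "per_entry": m["per_entry"],        # kept as text because of "<0.01"
        "rank": int(m["rank"]),
        "category": m["category"] or None,  # a few rows carry no category
    }


# Three rows copied from the table above.
SAMPLE = """\
EMBL                            11920305  10690211      1.11     3   Sequence databases
GO                              20871381   6367110      1.95     2   Ontologies
ProMEX                               457       457     <0.01    79
"""

rows = [parse_row(line) for line in SAMPLE.splitlines()]
for row in rows:
    # Recompute the per-entry figure from the raw count for comparison.
    ratio = round(row["xrefs"] / TOTAL_ENTRIES, 2)
    print(row["name"], row["per_entry"], ratio, row["category"])
```

The per-entry field is kept as a string because small values are reported as "<0.01" rather than as a number. Rows with an empty last column yield a category of None.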
                    
                    
                    5.  AMINO ACID COMPOSITION
                    
                    5.1  Composition in percent for the complete database
                    
                    Ala (A) 8.56   Gln (Q) 3.88   Leu (L) 9.81   Ser (S) 6.72
                    Arg (R) 5.47   Glu (E) 6.14   Lys (K) 5.31   Thr (T) 5.61
                    Asn (N) 4.18   Gly (G) 7.08   Met (M) 2.45   Trp (W) 1.31
                    Asp (D) 5.29   His (H) 2.20   Phe (F) 4.04   Tyr (Y) 3.06
                    Cys (C) 1.29   Ile (I) 6.02   Pro (P) 4.74   Val (V) 6.72
                    
                    Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.04
                    
                    
                    
                    Legend: gray = aliphatic, red = acidic, green = small hydroxy,
                    blue = basic, black = aromatic, white = amide, yellow = sulfur
                    
                    
                    5.2  Classification of the amino acids by their frequency
                    
                    In descending order of frequency:
                    
                    Leu, Ala, Gly, Ser, Val, Glu, Ile, Thr, Arg, Lys, Asp, Pro, Asn, Phe,
                    Gln, Tyr, Met, His, Trp, Cys
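
The frequency ordering in section 5.2 follows directly from the percentages in section 5.1. As a minimal sketch, the ranking can be reproduced by sorting the rounded values. The dictionary name COMPOSITION is chosen for illustration. Ser and Val are tied at 6.72 after rounding, so their relative order here comes only from the order in which they are entered, which is an assumption and not necessarily how the release resolved the tie.

```python
# Percentages copied from section 5.1, keyed by three-letter code
# and entered in alphabetical order.
COMPOSITION = {
    "Ala": 8.56, "Arg": 5.47, "Asn": 4.18, "Asp": 5.29, "Cys": 1.29,
    "Gln": 3.88, "Glu": 6.14, "Gly": 7.08, "His": 2.20, "Ile": 6.02,
    "Leu": 9.81, "Lys": 5.31, "Met": 2.45, "Phe": 4.04, "Pro": 4.74,
    "Ser": 6.72, "Thr": 5.61, "Trp": 1.31, "Tyr": 3.06, "Val": 6.72,
}

# sorted() is stable even with reverse=True, so the Ser/Val tie keeps
# the insertion order of the dictionary (Ser before Val).
ranking = sorted(COMPOSITION, key=COMPOSITION.get, reverse=True)
print(", ".join(ranking))
```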
                    
                    
                    
                    6.  MISCELLANEOUS STATISTICS
                    
                    Total number of entries encoded on a Mitochondrion: 308779
                    Total number of entries encoded on a Plasmid: 162346
                    Total number of entries encoded on a Plastid: 10344
                    Total number of entries encoded on a Plastid; Apicoplast: 334
                    Total number of entries encoded on a Plastid; Chloroplast: 113441
                    Total number of entries encoded on a Plastid; Cyanelle: 7
                    Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 441