Database of p53 somatic mutations in human tumors and cell lines
----------------------------------------------------------------

Release July 1995

M. Hollstein, C. Rice, M.S. Greenblatt, T. Soussi, R. Fuchs, T. Sorlie,
E. Hovig, B. Smith-Sorensen, R. Montesano and C.C. Harris

German Cancer Research Center, Heidelberg (MH), International Agency for
Research on Cancer, Lyon (TS, RM), EMBL Heidelberg (CR, RF), Hopital
Saint Louis, Paris (TS), National Cancer Institute, Bethesda (MSG, CCH),
Norwegian Radium Hospital, Oslo (EH, BSS).

Nucleic Acids Research, submitted.

----------------------------------------------------------------------

Release Information
-------------------

The July 1995 release of the database contains 4496 entries.

Content of the database
-----------------------

This database is a compilation of p53 mutations in human tumor cells and
cell lines from a systematic search of reports published before
1 January 1994. These mutations were identified by DNA sequencing of
PCR-amplified material or cloned PCR products. Preliminary screening for
mutations by techniques such as those employing SSCP or DGGE/CDGE
(reviewed in Rossiter & Caskey, 1990; Grompe, 1993) was often performed.
Most analyses were confined to exons 5-8, since early studies noted that
mutations occurred primarily in this evolutionarily conserved midregion.
A bias against identification of DNA sequence alterations outside this
mutation cluster region can thus be expected.

If the same mutations were published in more than one article, only one
report is referenced, either the first or the most complete report, and
the data are entered only once in the database. If the identical
mutation was found in two separate samples from the same patient, for
example in the primary tumor and in the metastatic tissue, the mutation
is considered to be a single event and is entered only once.
Tandem mutations, i.e.
two adjacent base substitutions, are also considered as one mutation
event and are entered together; therefore there will be only one
identification number (see below) for this mutation pair.

Discrepancies in published reports that are clearly due to typographical
errors or that can be explained by other information in the publication
have been corrected. In this case, or if there are uncorrected errors or
ambiguities regarding a mutation record, the letter 'e' appears in
column M (see below). Information that does not permit us to identify
the nature and location of the mutation has not been entered. Mutations
found by digestion of DNA with a restriction enzyme and demonstration of
an RFLP are not entered; however, publications reporting such data will
be cited in the electronic version as a second appendix.

Mutations identified in tumors are presumed to be somatic unless
1) analysis of normal tissue from the same patient demonstrated that the
mutation was constitutional in that individual, or 2) the mutation
corresponded to one of the known constitutional polymorphisms of the
human p53 gene (at codons 21, 31, 47, 72, and 213), as these are
unlikely to be mutations that arose in the tumors. Germline mutations,
including those identified in families with the Li-Fraumeni cancer
syndrome, are not in this database.

Distribution formats
--------------------

The data are provided to the scientific community in two different
formats. First, the database is available as an Excel spreadsheet, which
requires the Microsoft Excel program on either an MS-DOS system or an
Apple Macintosh. Second, the data have been converted into a flatfile
format modeled on the standard used by the EMBL nucleotide sequence
database. In this format the data are stored in a normal text file with
each column of the spreadsheet represented by a special line type. The
flatfile format can be used on any computer system and with standard
text editors.
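
As an informal illustration of how the flatfile can be read by a program,
the Python sketch below splits flatfile text into entries and breaks the
semicolon-separated lines into their fields, following the line types
described under "Flatfile format" further down. It is not part of the
distribution: the names parse_flatfile, codon_of, MULTI_FIELD_TAGS and
SAMPLE are hypothetical, the sample text is simply the two example
entries given in this document, and the codon check assumes only that
positions are numbered from the A of the ATG (position 1).

```python
# Two example entries copied from the "Flatfile format" section.
SAMPLE = """\
ID   466
CD   7; GAT->CAT; Asp->His.
BC   19; G->C.
TS   N16; Skin; N.
CC   SCC
RP   54
//
ID   1207
CD   152; CCG->CTG; Pro->Leu.
BC   455; C->T.
CT   yes
TS   HTC/C3; Thyroid; Y.
RP   149
//
"""

# Line types whose text is a period-terminated list of ';'-separated fields.
MULTI_FIELD_TAGS = {"CD", "BC", "TS"}


def parse_flatfile(text):
    """Return a list of dicts, one per entry, keyed by two-character tag."""
    entries = []
    current = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("//"):
            # End-of-entry marker: store the entry collected so far.
            if current:
                entries.append(current)
            current = {}
            continue
        tag, value = line[:2], line[2:].strip()
        if tag in MULTI_FIELD_TAGS:
            if value.endswith("."):
                value = value[:-1]  # drop the terminating period
            current[tag] = [field.strip() for field in value.split(";")]
        else:
            current[tag] = value
    return entries


def codon_of(position):
    """Codon number containing a coding-sequence position (1-based)."""
    return (position + 2) // 3


if __name__ == "__main__":
    for entry in parse_flatfile(SAMPLE):
        codon, position = entry["CD"][0], entry["BC"][0]
        # Cross-check the codon (column B) against the position (column D).
        print(entry["ID"], codon, position, codon_of(int(position)))
```

Fields recorded as a question mark (unknown) are returned unchanged as
"?", so a caller would need to test for that value before converting a
position or codon number to an integer.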

The database can be obtained from the EBI network servers in the
following ways:

* Anonymous ftp to: ftp.ebi.ac.uk under /pub/databases/p53
* Through the WWW server at: http://www.ebi.ac.uk/
* Through gopher at: gopher.ebi.ac.uk (port 70)
* Send an email message to: netserv@ebi.ac.uk and include the line
  "help p53".

Excel format description
------------------------

Each row represents a single tumor mutation with an arbitrarily assigned
unique number in column A. The columns contain the following information
and abbreviations:

Column A: Unique mutation identification number

Column B: Codon number at which the mutation is located (1-393). If a
          tandem dinucleotide mutation spans two codons, both codons are
          entered. If other mutations span more than one codon, e.g.
          there is a deletion of several bases, only the first (5')
          codon is entered. If the mutation is located in an intron
          sequence, this is indicated by 'intron' and the intron number.

Column C: Normal and mutated base sequence of the codon in which the
          mutation occurred. If the mutation is a base pair deletion or
          insertion, this is indicated by 'del' or 'ins'.

Column D: Nucleotide position at which the mutation is located (1-1179),
          numbered from the ATG to the termination codon.

Column E: Base change, read from the coding strand by convention, for
          base substitutions. For deletions (indicated by '-') and
          insertions (indicated by '+') the number of bases deleted or
          inserted is given in parentheses.

Column F: The name or number given by the authors to the tumor sample or
          cell line is entered here. If the name is not sufficiently
          distinctive, e.g. if the publication refers to samples 1, 2,
          3, etc., then we have assigned a name, usually the first
          letters of the first author's name, followed by the number in
          the series. If more than one mutation has been found in the
          same sample, the tumor name in column F is suffixed with an
          apostrophe.

Column G: Anatomical site or type of the tumor as described in the
          publication cited.
          Abbreviations used in this column are: HCC, hepatocellular
          carcinoma; Leuk/Lym, leukemias and lymphomas.

Column H: Reference number (1-312). The full citation is given in a
          separate file.

Column I: This column contains notes regarding the tumor or the patient,
          such as histological type of tumor, exposure history or other
          clinical parameters emphasized by authors reporting the
          mutations. The terminology used by the authors has been
          retained and therefore may not be uniform. Pre-cancer lesions
          are also included, e.g. dysplasia.

          Abbreviations of tumor subtype or cell type are as follows:
          SCLC, small cell lung cancer; adenoca, adenocarcinoma; osteo,
          osteosarcoma; rhab, rhabdomyosarcoma; leiomyo, leiomyosarcoma;
          eryth, erythroleukemia; medull, medulloblastoma; SCC, squamous
          cell carcinoma; TCC, transitional cell carcinoma; hypoph,
          hypopharynx; NPC, nasopharyngeal carcinoma. For abbreviations
          of leukemia and lymphoma subclassifications, e.g. ATL (adult
          T-cell leukemia), refer to the cited reference. Uniformity of
          these abbreviations in the different reports has not been
          verified.

          Other abbreviations: UC, ulcerative colitis; FAP, familial
          adenomatous polyposis; XP, xeroderma pigmentosum; HPV, tumor
          harbors human papilloma virus DNA (HPV+) or lacks virus DNA
          (HPV-); diff or undiff, (un)differentiated tumor; CIS,
          carcinoma in situ; premal, premalignant.

          Other information:
          1) "metastasis" specifies that the DNA analyzed for the
             mutation was obtained from metastatic tissue. The primary
             tumor is in column G.
          2) exposure history: tobacco smoke; radon gas.

Column J: An entry 'L' indicates the material examined was from a tumor
          cell line. If there is no entry, the material is from tumor
          tissue or biopsy (most instances), or xenograft, or
          unspecified.

Column K: Mutations that are transitions of CpG dinucleotides, i.e. CpG
          to TpG or CpG to CpA, are designated by 'yes'. If there is no
          entry, the mutation does not fall into this category.

Column L: Amino acid substitution.
          Chain termination mutations due to single base substitutions
          are designated by '(amino acid)->stop'. Frameshift mutations
          are designated by 'frameshift', whereas in-frame deletions and
          insertions are designated 'deletion' or 'insertion'. Mutations
          that do not result in an amino acid change are designated by
          'silent'. Mutations that occurred in intron sequences are
          indicated by the term 'splicing', even though in most
          instances it was not determined whether splicing errors did
          result from the mutation; some of these mutations are likely
          to be phenotypically silent.

Column M: If the information on the nature or location of the mutation
          in the reference is ambiguous or contradictory, the letter 'e'
          appears in this column. Where possible we have made a
          presumptive correction of the published discrepancy in the
          database entry.

Examples:

A     B    C         D    E     F       G        H    I    J  K    L         M
466   7    GAT->CAT  19   G->C  N16     Skin     54   SCC          Asp->His
1207  152  CCG->CTG  455  C->T  HTC/C3  Thyroid  149       L  yes  Pro->Leu

Flatfile format
---------------

Each database entry consists of a series of lines, each one tagged by a
two-character identifier separated from the text of the line by three
blanks. The mapping of line contents to the columns in the Excel format
is indicated.

ID   (1 per entry)
     IDentifier; contains mutation id (column A).

DC   (0 or 1 per entry)
     Data Correctness. If the report is ambiguous or incorrect
     (column M), the line "DC   ambiguous" is added to the entry.

CD   (1 per entry)
     CoDon change; this line has three semicolon-separated fields of
     closely related information, terminated by a period: the codon
     number (column B); the codon change (column C); the amino acid
     change (column L). If any field is not known, a question mark is
     substituted.

BC   (1 per entry)
     Base Change; this line has two semicolon-separated fields of
     closely related information, terminated by a period: the nucleotide
     position (column D); the base change (column E). If any field is
     not known, a question mark is substituted.

CT   (0 or 1 per entry)
     CpG Transition; optional line. If a CpG transition occurred
     (column K), the line "CT   yes" appears.

TS   (0 or 1 per entry)
     Tumor Specifics; this line has three semicolon-separated fields of
     closely related information, terminated by a period: the tumor name
     (column F); the tumor source (column G); tumor cell line
     (column J). If any field is not known, a question mark is
     substituted. If the source is a tumor cell line, the third field is
     "Y", otherwise "N".

CC   (0 or 1 per entry)
     Comments; allows free text comments and keeps the contents of
     column I.

RP   (1 per entry)
     Reference Pointer; contains the cross-reference to the literature
     reference file (column H).

//   (1 per entry)
     Marks end of entry.

Examples:

ID   466
CD   7; GAT->CAT; Asp->His.
BC   19; G->C.
TS   N16; Skin; N.
CC   SCC
RP   54
//
ID   1207
CD   152; CCG->CTG; Pro->Leu.
BC   455; C->T.
CT   yes
TS   HTC/C3; Thyroid; Y.
RP   149
//

Updates
-------

This compilation of p53 mutations is intended to provide the scientific
community with a database of rapidly accumulating data that can be
useful to various disciplines in cancer research, including
epidemiology, medicine and basic science. Future versions of the
database may include separate sections on germline mutations, mutations
detected by RFLP, anamnestic data on patients, and standardization of
terminology with the International Classification of Diseases for
Oncology (ICD-O).

Notifications of omissions and errors in the current version would be
gratefully received by the authors. When individual records in the
present version require correction, they will be revised and the date of
revision will be noted in a new column, column N. Data published in the
first six months of 1994 will be added at regular intervals during the
second half of the year.

References
----------

Caron de Fromentel, C. and Soussi, T. (1992) Genes Chromosomes Cancer 4,
    1-15.
Donehower, L.A. and Bradley, A. (1993) Biochim. Biophys. Acta 1155,
    181-207.
Greenblatt, M.S., Bennett, W.P., Hollstein, M.C.
    and Harris, C.C. (1994) Cancer Res. (in press).
Grompe, M. (1993) Nature Genetics 5, 111-117.
Harris, C.C. and Hollstein, M. (1993) New Engl. J. Med. 329, 1318-1327.
Hollstein, M.C., Sidransky, D., Vogelstein, B. and Harris, C.C. (1991)
    Science 253, 49-53.
Jones, P.A., Buckley, J.D., Henderson, B.E., Ross, R.K. and Pike, M.C.
    (1991) Cancer Res. 51, 3617-3620.
Kunkel, T.A. (1990) Biochemistry 29, 8003-8011.
Kunkel, T.A. (1993) Nature 365, 207-208.
Levine, A.J. (1993) Annu. Rev. Biochem. 62, 623-651.
Lindahl, T. (1993) Nature 362, 709-715.
Mellon & Hanawalt (1989) Nature 342, 95-98.
Rice, C., Fuchs, R., Higgins, D.G., Stoehr, P.S. and Cameron, G.N.
    (1993) Nucl. Acids Res. 21, 2967-2971.
Rossiter, B.J.F. and Caskey, C.T. (1990) J. Biol. Chem. 265,
    12753-12756.
Selby & Sancar (1993) Science 260, 53-58.
Takeshima, S., Seyama, T., Bennett, W.P. et al. (1991) Lancet 342,
    1520-1521.