ILAR Journal V39(2/3) 1998
Comparative Gene Mapping
Greg Elgar, Ph.D., is Group Leader, and Melody Clark, Ph.D., is Senior Scientist with the United Kingdom Human Genome Mapping Project Resource Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
Sequence scanning is an extremely economical and efficient way to assay the contents of a genome, particularly in gene-dense genomes such as that of Fugu. As a resource, its value increases continually because other sequence data entered independently into a database may shed light (by homology) on already existing data. In addition, sequence scanning data do not have to be extremely accurate: no complex sequence assembly is necessary, the data are not highly degenerate, and the databases are extremely tolerant of mistakes in sequencing, still allowing the identification of significant homologies.
A well-characterized, gridded Fugu cosmid library is publicly available and provides a common reference point for all clones isolated (http://hgmp.mrc.ac.uk/). Cosmids are sequence scanned by generating random shotgun libraries (by sonication) and sequencing approximately 50 clones from each cosmid library. This methodology provides single-pass sequence data for about 50% of the cosmid insert, in turn allowing identification of all but the smallest genes on the cosmid. For example, sequence scanning would be expected to identify 3 of the 6 exons of a 6-exon gene, assuming that homology (particularly at the amino acid level) is sufficient. Sequences are searched periodically against the SWISSPROT (Bairoch and Apweiler 1997), TREMBL (Bairoch and Apweiler 1997), and EMBL (Rodriguez-Tomé and others 1996) databases using the Basic Local Alignment Search Tool ("BLAST") (Altschul and others 1997). All Fugu sequences in the Fugu database are being submitted to EMBL as GSS clones, and all data from the project are publicly available on the World Wide Web (Web¹).
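The "about 50%" coverage figure can be checked with a back-of-the-envelope calculation. The sketch below assumes reads land at random, so that the expected covered fraction is 1 - e^(-depth), and it treats exons as independent targets. Only the 50 clones per cosmid comes from the text; the 40-kb insert size and the 500-bp read length are assumed round numbers chosen for illustration.

```python
import math


def expected_coverage(n_reads: int, read_len_bp: int, insert_len_bp: int) -> float:
    """Expected fraction of the insert covered by at least one random read."""
    depth = n_reads * read_len_bp / insert_len_bp  # average reads per base
    return 1.0 - math.exp(-depth)


def prob_gene_detected(n_exons: int, coverage: float) -> float:
    """Chance that at least one exon is sampled, treating exons as independent."""
    return 1.0 - (1.0 - coverage) ** n_exons


# ASSUMED values: 500-bp single-pass reads and a 40-kb cosmid insert.
coverage = expected_coverage(n_reads=50, read_len_bp=500, insert_len_bp=40_000)
print(f"expected coverage: {coverage:.2f}")

for exons in (1, 3, 6):
    print(f"{exons} exon(s): P(detected) = {prob_gene_detected(exons, coverage):.3f}")
```

Under these assumptions the expected coverage comes out a little under one half, and the chance of missing a gene entirely falls quickly as the number of exons grows, which is consistent with the statement that only the smallest genes are likely to escape detection.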
Because data are stored on the Web page by cosmid clone and because most cosmids appear to contain more than 1 gene, it is possible to assign close physical linkage to genes in the Fugu genome and then to examine the relationship between these genes in other organisms. Finally, it is possible to assess potential regions of conserved synteny between Fugu and other genomes and to identify areas for more intensive study.
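As a rough illustration of this kind of comparison, the sketch below takes genes that lie on one Fugu cosmid and asks whether their human homologues are assigned to the same human chromosome. The band assignments are copied from Table 1; the helper names are hypothetical, and sharing a chromosome is only a coarse first indication of conserved synteny, not a demonstration of conserved gene order.

```python
import re
from typing import Dict

# Human chromosome band assignments for genes linked in Fugu (from Table 1).
LINKED_GROUPS: Dict[str, Dict[str, str]] = {
    "CNR1/GABRR1": {"CNR1": "6q14-q15", "GABRR1": "6q14-q21"},
    "PDGFRB/CSF1R": {"PDGFRB": "5q33", "CSF1R": "5q33"},
    "WT1/RCN1/PAX6": {"WT1": "11p13", "RCN1": "11p13", "PAX6": "11p13"},
}


def chromosome(band: str) -> str:
    """Return the chromosome part of a band string such as '6q14-q15'."""
    match = re.match(r"(\d+|X|Y)", band)
    if match is None:
        raise ValueError(f"cannot parse chromosome from {band!r}")
    return match.group(1)


def shares_human_chromosome(assignments: Dict[str, str]) -> bool:
    """True if every gene in the group maps to the same human chromosome."""
    return len({chromosome(band) for band in assignments.values()}) == 1


for name, genes in LINKED_GROUPS.items():
    print(name, "->", shares_human_chromosome(genes))
```

Groups that pass this check are candidates for closer inspection of band-level positions and gene order.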
SCIENTIFIC CONTRIBUTIONS OF THE MAP
The Fugu cosmid library was first made available to the scientific community through the Fugu Landmark Mapping Project in 1996. Since then, interest in Fugu has expanded with the generation of numerous collaborative projects. At the time of this writing, Fugu projects cover a wide spectrum of studies that include the following:
ANTICIPATED FUTURE CONTRIBUTIONS OF THE MAP
Sample sequencing produces an overall picture of gene content and organization in Fugu. However, its limitations include the lack of absolute gene order, the inability to correlate the data directly to traditional maps, and the fact that genes are present only in fragments. We expect sequence scanning of the genome to continue; however, the technique will be driven largely by the requirements of collaborators who seek to identify particular regions for more intensive study rather than by in-house projects.
To extend some of the observations and speculations related to the phenomenon of conserved gene order between mammals and Fugu, some cosmid contigs are being built and sequence scanned, usually in collaboration with groups working on the equivalent human gene region. The ultimate objective is to progress to whole cosmid sequencing and high-level resolution comparative mapping.
Whole cosmid sequencing projects in Fugu have already shown that novel genes can be identified in Fugu syntenic regions (Trower and others 1996). The number of known mammalian genes is small compared with the total number present; hence, Fugu sequencing will inevitably move toward gene discovery, analysis of hypothetical genes, and functional analysis. Genomic comparisons across large regions may allow the identification of conserved elements involved in the regulation of gene expression and any other sequences that are critically conserved across 400 million yr of evolution.
USES OF THE MAP AND ACCESSIBILITY
All data from the Fugu genome project are available on our Web site (http://fugu.hgmp.mrc.ac.uk/), which consists of a series of interlinked pages. These pages are designed to allow easy public access to meaningful data as soon as they become available. There may be only a few hours' delay between a sequence being loaded onto an automated sequencer and its appearance on the Web. Users may enter the database by a search of DNA, amino acid sequences, or keywords from a Basic Local Alignment Search Tool output; by requesting information on a specific cosmid or cosmid clone; or by viewing the overall statistics of the project. Other available features on the Web pages include general information about the fish and a complete, up-to-date set of project protocols. A flow chart overview of the Web site links appears in Figure 2.
Users may also download the entire database of sequences as a flat file from the Web, although the amount of data makes this a major task. At the time of this writing, all Fugu sequences are transferred into the GSS section of the EMBL Data Library, with limited annotation (in collaboration with the European Bioinformatics Institute at Hinxton, Cambridge, United Kingdom) so that they may be searched in the same way as the rest of the public databases.
The easiest way for biologists to access data of particular interest is to carry out a keyword search. The database search results for all cosmid clones are filtered for "significant" hits using a program called maximal segment pair ("MSP") crunch, which was developed by Erik Sonnhammer at the Sanger Centre at Hinxton. The keyword search scans the MSP crunch output and then lists any hits by cosmid clone. Ways to access the relevant information produced by the search are shown in Figure 3.
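The idea of a keyword search that reports hits grouped by cosmid can be sketched in a few lines. This is not the project's own code, and it does not parse real MSP crunch output: the `Hit` record, the clone names, and the hit descriptions below are invented placeholders used only to show the filter-then-group step.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Hit(NamedTuple):
    """Hypothetical, simplified record of one significant database hit."""
    cosmid: str
    clone: str
    description: str


def keyword_search(hits: Iterable[Hit], keyword: str) -> Dict[str, List[Hit]]:
    """Return hits whose description contains the keyword, grouped by cosmid."""
    needle = keyword.lower()
    by_cosmid: Dict[str, List[Hit]] = defaultdict(list)
    for hit in hits:
        if needle in hit.description.lower():
            by_cosmid[hit.cosmid].append(hit)
    return dict(by_cosmid)


# Illustrative records only; clone names and descriptions are made up.
example_hits = [
    Hit("126C21", "clone_01", "complement component C8 alpha chain"),
    Hit("126C21", "clone_07", "complement component C8 beta chain"),
    Hit("017D12", "clone_03", "metal-regulatory transcription factor"),
]

for cosmid, found in keyword_search(example_hits, "complement").items():
    print(cosmid, [h.clone for h in found])
```

Grouping by cosmid rather than by individual clone mirrors the way the search results are described here, because several fragments of one gene, or several neighbouring genes, may fall on the same cosmid.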
It is important to evaluate alignments carefully before assuming that a particular gene has been found. Many of the EMBL (DNA) similarities are due to repeat sequences (such as microsatellites) within the genomic sequence rather than being gene specific. Generally, amino acid similarities (from SWISSPROT and TREMBL searches) are more significant and easier to interpret than those between DNA sequences. This difference arises partly because identical residues are less likely to occur by chance at the same position in amino acid sequences (20 amino acids versus 4 bases), which results in less noise, and partly because of the degeneracy of the genetic code, which allows DNA sequences with only 33% identity to code for identical amino acids (for example, AGA and CGG both code for arginine). It is also important to be careful when searching for genes that are members of large gene families because short, isolated sequence fragments are not always sufficient to allow differentiation between similar genes. Relative mapping data from another identifiable gene on the same cosmid may indicate the origin of the gene family member.
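The arginine example can be verified directly. The short sketch below computes the positional identity of the two codons and confirms that both belong to the set of arginine codons in the standard genetic code; the chance-match figures assume, for simplicity, that all bases or all amino acids are equally frequent.

```python
# The six arginine codons of the standard genetic code.
ARGININE_CODONS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}


def identity(seq_a: str, seq_b: str) -> float:
    """Fraction of positions at which two equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


codon_1, codon_2 = "AGA", "CGG"
print("both arginine:", codon_1 in ARGININE_CODONS and codon_2 in ARGININE_CODONS)
print(f"DNA identity: {identity(codon_1, codon_2):.0%}")

# Chance of a random match at one position, ASSUMING uniform composition.
print(f"random DNA match: {1 / 4:.0%}, random amino acid match: {1 / 20:.0%}")
```

Only the middle base is shared, so the two codons are 1/3 identical at the DNA level while being fully identical at the protein level.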
CONCLUSION
The sequence scanning approach has been successfully applied in Fugu to provide an overview of the genome. More intensive study of particular regions using whole cosmid sequencing will not only produce data on absolute gene order and conservation of gene structure but will also help in identifying regulatory elements. An increasing number of genes are being sequenced in zebrafish and other fish species (such as carp and medaka), which will produce meaningful fish maps. In addition, with the emergence of zebrafish genetics, the advantages of Fugu genomics can be effectively combined with zebrafish genetics in gene finding, sequencing, and functional characterization.
¹Abbreviations used in this paper: EMBL, European Molecular Biology Laboratory; Web, World Wide Web.
ACKNOWLEDGMENTS
The Fugu Landmark Mapping Project is funded by the Medical Research Council. We particularly thank Stephen Meek, Sarah Smith, and Sarah Warner, who, with the authors, generated most of the data; and Gary Williams, Yagnesh Umrania, Yvonne Edwards, and Martin Bishop of the UK Human Genome Mapping Project Resource Centre Computing Department, without whom the construction of the Web pages and the analysis would not have been possible.
REFERENCES
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res 25:3389-3402.
Aparicio S, Hawker K, Cottage A, Mikawa Y, Zuo L, Venkatesh B, Chen E, Krumlauf R, Brenner S. 1997. Organization of the Fugu rubripes Hox clusters: Evidence for continuing evolution of vertebrate Hox complexes. Nat Genet 16:79-83.
Aparicio S, Morrison A, Gould A, Gilthorpe J, Chaudhuri C, Rigby P, Krumlauf R, Brenner S. 1995. Detecting conserved regulatory elements with the model genome of the Japanese Puffer Fish, Fugu rubripes. Proc Natl Acad Sci U S A 92:1684-1688.
Bairoch A, Apweiler R. 1997. The SWISS-PROT protein sequence data bank and its supplement TREMBL. Nucleic Acids Res 25:31-36.
Baxendale S, Abdulla S, Elgar G, Buck D, Berks M, Micklem G, Durbin R, Bates G, Brenner S, Beck S, Lehrach H. 1995. Comparative sequence-analysis of the human and pufferfish Huntington's-disease genes. Nat Genet 10:67-76.
Brenner S, Corrochano LM. 1996. Translocation events in the evolution of aminoacyl-transfer-RNA synthetases. Proc Natl Acad Sci U S A 93:8485-8489.
Brenner S, Elgar G, Sandford R, Macrae A, Venkatesh B, Aparicio S. 1993. Characterisation of the pufferfish (Fugu) genome as a compact model vertebrate genome. Nature 366:265-268.
Cecconi F, Crosio C, Mariottini P, Cesareni G, Giorgi M, Brenner S, Amaldi F. 1996. A functional role for some Fugu introns larger than the typical short ones--The example of the gene coding for ribosomal protein S7 and snoRNA U17. Nucleic Acids Res 24:3167-3172.
Cecconi F, Mariottini P, Amaldi F. 1995. The Xenopus intron-encoded U17 snoRNA is produced by exonucleolytic processing of its precursor in oocytes. Nucleic Acids Res 23:4670-4676.
Crosio C, Cecconi F, Mariottini P, Cesareni G, Brenner S, Amaldi F. 1996. Fugu intron oversize reveals the presence of U15 snoRNA coding sequences in some introns of the ribosomal protein S3 gene. Genome Res 6:1227-1231.
Edwards YJK, Elgar G, Clark MS, Bishop MJ. 1998. The identification and characterization of microsatellites in the compact genome of the Japanese Pufferfish, Fugu rubripes: Perspectives in functional and comparative genomic analyses. J Mol Biol 278:843-854.
Elgar G. 1996. Quality not quantity--The pufferfish genome. Hum Mol Genet 5:1437-1442.
Elgar G, Clark M, Green A, Sandford R. 1997. How good a model is the Fugu genome? (Scientific Correspondence). Nature 387:140.
Elgar G, Rattray F, Greystrong J, Brenner S. 1995. Genomic structure and nucleotide-sequence of the p55 gene of the puffer fish Fugu rubripes. Genomics 27:442-446.
Elgar G, Sandford R, Aparicio S, Macrae A, Venkatesh B, Brenner S. 1996. Small is beautiful--Comparative genomes with the pufferfish (Fugu rubripes). Trends Genet 12:145-150.
Gilley J, Armes N, Fried M. 1997. Fugu genome is not a good mammalian model. (Scientific Correspondence). Nature 385:305-306.
How GF, Venkatesh B, Brenner S. 1996. Conserved linkage between the puffer fish (Fugu rubripes) and human genes for platelet-derived growth factor receptor and macrophage colony-stimulating factor receptor. Genome Res 6:1185-1191.
Koh CG, Oon SH, Brenner S. 1997. Serine/threonine phosphatases of the pufferfish, Fugu rubripes. Gene 198:223-228.
Lim EH, Brenner S. 1995. Sequence-analysis of MHC class-II beta-like fragments in the pufferfish Fugu rubripes. Immunogenetics 42:432-433.
Lim EH, Brenner S. 1997. Short-range linkage relationships of the Valyl-tRNA synthetase gene in Fugu rubripes. Immunogenetics 46:332-336.
Lim EH, Corrochano LM, Elgar G, Brenner S. 1997. Genomic structure and sequence analysis of the valyl-tRNA synthetase gene of the Japanese pufferfish, Fugu rubripes. DNA Sequence 7:141-151.
Macrae AD, Brenner S. 1995. Analysis of the dopamine-receptor family in the compact genome of the puffer fish Fugu rubripes. Genomics 25:436-446.
Marshall H, Studer M, Popperl H, Aparicio S, Kuroiwa A, Brenner S, Krumlauf R. 1994. A conserved retinoic acid response element required for early expression of the homeobox gene Hoxb-1. Nature 370:567-571.
Mason PJ, Stevens DJ, Luzzatto L, Brenner S, Aparicio S. 1995. Genomic structure and sequence of the Fugu rubripes glucose-6-phosphate-dehydrogenase gene (G6PD). Genomics 26:587-591.
McNaughton JC, Hughes G, Jones WA, Stockwell PA, Klamut HJ, Petersen GB. 1997. The evolution of an intron: Analysis of a long, deletion-prone intron in the human dystrophin gene. Genomics 40:294-304.
Medrano L, Bernardi G, Couturier J, Dutrillaux B, Bernardi G. 1988. Chromosome banding and genome compartmentalisation in fishes. Chromosoma 96:178-183.
Miles C, Elgar G, Coles E, Kleinjan D-J, van Heyningen V, Hastie N. 1998. Complete sequencing of the Fugu WAGR region from WT1 to PAX6--Dramatic compaction and conservation of synteny with human chromosome 11p13. Proc Natl Acad Sci U S A (Forthcoming).
Miyaki K, Tabeta O, Kayano H. 1995. Karyotypes of six species of puffer-fishes genus Takifugu (Tetraodontidae, Tetraodontiformes). Fisheries Sci 61:594-598.
Popperl H, Bienz M, Studer M, Chan S-K, Aparicio S, Brenner S, Mann RS, Krumlauf R. 1995. Segmental expression of Hoxb-1 is controlled by a highly conserved autoregulatory loop dependent upon exd/pbx. Cell 81:1031-1042.
Rodriguez-Tomé P, Stoehr PJ, Cameron GN, Flores TP. 1996. The European Bioinformatics Institute (EBI) databases. Nucleic Acids Res 24:6-13.
Sandford R, Burn T, Vaudin M, Elgar G, Brenner S. 1996a. The Fugu genome--A novel tool for transcription mapping. (Abstract). Cytogenet Cell Genet 72:20.
Sandford R, Sgotto B, Aparicio S, Brenner S, Vaudin M, Wilson RK, Chissoe S, Pepin K, Bateman A, Chothia C, Hughes J, Harris P. 1997. Comparative analysis of the polycystic kidney disease 1 (PKD1) gene reveals an integral membrane glycoprotein with multiple evolutionary conserved domains. Hum Mol Genet 6:1483-1489.
Sandford R, Sgotto B, Burn T, Brenner S. 1996b. The tuberin (TSC2), autosomal-dominant polycystic kidney disease (PKD1) and somatostatin type-V receptor (SSTR5) genes form a synteny group in the Fugu genome. Genomics 38:84-86.
Sarwal MM, Sontag JM, Hoang L, Brenner S, Wilkie TM. 1996. G protein alpha subunit multigene family in the Japanese puffer fish Fugu rubripes: PCR from a compact vertebrate genome. Genome Res 6:1207-1215.
Sathasivam K, Baxendale S, Mangiarini L, Bertaux F, Hetherington C, Kanazawa I, Lehrach H, Bates GP. 1997. Aberrant processing of the Fugu HD (FrHD) mRNA in mouse cells and in transgenic mice. Hum Mol Genet 6:2141-2149.
Schofield JP, Elgar G, Greystrong J, Lye G, Deadman R, Micklem G, King A, Brenner S, Vaudin M. 1997. Regions of human chromosome 2 (2q32-q35) and mouse chromosome 1 show synteny with the pufferfish genome (Fugu rubripes). Genomics 45:158-167.
Timon M, Elgar G, Habu S, Okumura K, Beverley PCL. 1998. Molecular cloning of major histocompatibility complex class I cDNAs from the pufferfish Fugu rubripes. Immunogenetics 47:170-173.
Trower MK, Orton SM, Purvis IJ, Sanseau P, Riley J, Christodoulou C, Burt D, See CG, Elgar G, Sherrington R, Rogaev EI, St. George-Hyslop P, Brenner S, Dykes CW. 1996. Conservation of synteny between the genome of the pufferfish (Fugu rubripes) and the region on human-chromosome-14 (14q24.3) associated with familial Alzheimer-disease (AD3 Locus). Proc Natl Acad Sci U S A 93:1366-1369.
Venkatesh B, Brenner S. 1997. Genomic structure and sequence of the pufferfish (Fugu rubripes) growth hormone-encoding gene: A comparative analysis of teleost growth hormone genes. Gene 187:211-215.
Venkatesh B, Sihoe SL, Murphy D, Brenner S. 1997. Transgenic rats reveal functional conservation of regulatory controls between the Fugu isotocin and rat oxytocin genes. Proc Natl Acad Sci U S A 94:12462-12466.
Venkatesh B, Tay BH, Elgar G, Brenner S. 1996. Isolation, characterization and evolution of 9 pufferfish (Fugu rubripes) actin genes. J Mol Biol 259:655-665.
Verma-Kurvari S, Johnson JE. 1997. Identification of an achaete-scute homolog, Fash1, from Fugu rubripes. Gene 200:145-148.
Vogel G. 1998. Doubled genes may explain fish diversity. Science 281:1119-1121.
Yamaguchi F, Brenner S. 1997. Molecular cloning of 5-hydroxytryptamine (5-HT) type 1 receptor genes from the Japanese puffer fish, Fugu rubripes. Gene 191:219-223.
Yamaguchi F, Macrae AD, Brenner S. 1996. Molecular-cloning of 2 cannabinoid type 1-like receptor genes from the puffer fish Fugu rubripes. Genomics 3:603-605.
Yeo GSH, Elgar G, Sandford R, Brenner S. 1997. Cloning and sequencing of complement component C9 and its linkage to DOC-2 in the pufferfish Fugu rubripes. Gene 200:203-211.
TABLE 1 Physically linked genes in Fuguᵃ
| Physically linked genes in Fugu | Distance apart (kb) | Human chromosome assignment |
| CNR1 GABRR1 | Both on 1 cosmid (1) | 6q14-q15 6q14-q21 |
| TH NAP2 IGFII | Within 10 kb of each other (2) | 11p15.5 11p 11p15.5 |
| PAH IGFI | Within 20 kb of each other (3) | 12q22 12q22-q24 |
| TSC2 ADPKD1 | Within 1 kb of each other (4) | 16p13.3 16p13.3 |
| WNT1 ERBB3 ARF3 WNT10B | All on 1 cosmid (5) | 12q13 12q13 12q13 12q13 |
| FOS S31iii125 S20i15 7SL RNA gene ATF3 fos-like gene DLST | All on 1 cosmid (6) | 14q24.3 14q24.3 14q24.3 Unassigned Unassigned Unassigned 14q24. |
| HOXB cluster (9 genes) | 90 kb (7) | 17q21-22 |
| HOXC cluster (9 genes) | 66 kb (8) | 12q12-q13 |
| Immunoglobulin cluster VH genes (at least 6 genes) | All on 1 cosmid (9) | 14q32.3 |
| SURF2 SURF4 ASS DNM1 GOLGA2 | All on 1 cosmid (10) | 9q34 9q34 9q34 9q34 9q32-q34.1 |
| SRD5A1 ADCY2 | 152B12 | 5p15 5p15.2-15.1 |
| WT1 RCN1 PAX6 | All on 1 cosmid (11) | 11p13 11p13 11p13 |
| MMP2 SLC6A2 | 011A17 | 16q12.2 16q12.2 |
| NTRKR3 RGS2 | 026G17 | 1q21-23 1q31 |
| TOP1 PLCG1 KIAA0181 | All on 1 cosmid (12) | 20q12-q13 20q12-q13 20q12-q13 |
| C4A CSNK2B CYP21 AIF1 | All on 1 cosmid (13) | 6p21.3 6p21.3 6p21.3 6p21.3 |
| C8A C8B | 126C21 | 1p32 1p32 |
| MTF1 IT5-P | 017D12 | 1p32 1p32 |
| CPS1 MAP2 MYL1 | All on 1 cosmid (14) | 2q135 2q34 2q33-q34 |
| PDGFRB CSF1R | 2.2 kb apart (15) | 5q33 5q33 |
TABLE 2 Evolutionary breakpoints in Fuguᵃ
| Physically linked genes in Fugu | Distance apart (kb) | Human chromosome assignment |
| G6PD | All in 1 60-kb contig (1) | Xq28 |
| GPD1 ADCY6 CACNB3 | Breakpoint | 12 12q13-q13 12q3 |
| C9 DOC2 | All on 1 cosmid (2) | 5p13 5p13 |
| GAS1 FBP1 | Breakpoint | 9q21.3-q22.1 9q22 |
| MSH2 PIGF MTA1 | 042H13 | 2p16-p15/2p22-p2 2p21-16 unassigned |
| GCH1 | Breakpoint | 14q22 |


FIGURE 2 Links between Fugu Web pages.

FIGURE 3 Example of Fugu Web page data from a keyword search.
Copyright © 2008. National Academy of Sciences.
All rights reserved.