- Empowerment: Our goal is to develop the technologies, platforms and bioinformatics infrastructures to rapidly and inexpensively sequence large and complex genomes of coniferous forest trees. This will allow the forestry community to begin sequencing the many genomes of economic and ecological importance without a dependence on centralized genome centers.
- Adaptive: We recognize that sequencing technologies are developing rapidly and that we must have the expertise and flexibility to quickly incorporate new approaches into our overall sequencing strategy.
- Comparative: We recognize the power of comparative genomics approaches in assembling and annotating genome sequences and will use this approach throughout the project.
- Open Access: We have a policy of sharing all data generated by this project with the research community.
Impact and Outcomes
Our project will achieve broad community empowerment through the development of a reference conifer genome sequence. Short-term impacts include facilitating genomics-based breeding for wood products and energy, developing diagnostic tools to estimate the risks and impacts of changing environments, and facilitating genome sequencing in other conifers. Additional outcomes and longer-term impacts include more complete integration of forest and horticultural tree genome projects and integration with other areas of tree biology (e.g., via the NSF iPlant Tree Biology Cyberinfrastructure initiative).
- Zimin et al: An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing
- Gonzalez-Ibeas et al: Assessing the Gene Content of the Megagenome: Sugar Pine (Pinus lambertiana)
- Stevens et al: Sequence of the Sugar Pine Megagenome
- Neale et al: Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies
- Zimin et al: Sequencing and Assembly of the 22-Gb Loblolly Pine Genome
- Wegrzyn et al: Unique Features of the Loblolly Pine (Pinus taeda L.) Megagenome Revealed Through Sequence Annotation
- Wegrzyn et al: Insights into the Loblolly Pine Genome: Characterization of BAC and Fosmid Sequences
Pinus taeda Genomics Projects
- Allele Discovery of Economic Pine Traits (ADEPT)
- Allele Discovery of Economic Pine Traits 2 (ADEPT2)
- Conifer Comparative Genomics Project (CCGP)
- Conifer Translational Genomics Network (CTGN)
- Forest Tree Genetic Stock Center (FTGSC)
- Accelerating Pine Genomics – MGEL
- Expanded EST Resource for Pines and Other Conifers
- Specific Aim 1: High-quality reference genome sequences of loblolly pine and three other conifer species
- Effective deployment of new technologies in a hierarchical whole-genome shotgun (WGS) approach will yield reference sequences based on well-defined milestones. An initial, early deliverable will be 21X WGS sequence and preliminary assemblies (gene-boosted and whole-genome) of the loblolly pine genome, based on paired-end Illumina reads of at least 100 bp from a mix of 500-bp, 5-kbp, and 40-kbp (fosmid-diTag) libraries. Within two years, a 10×18 hierarchical WGS (180X total read depth), based on 18X read depth (from 500-bp, 5-kbp, and 40-kbp libraries) of many small pools of fosmids, will provide the fundamental data for two types of assemblies: a consensus based on all the data, and a second consensus based on hierarchical analysis of subassemblies of the haploid fosmid pools. Polishing will follow, using longer end reads from a 10X BAC library, deep fosmid-end sequencing, and any existing or emerging long-read technologies deemed effective for improving assembly quality. A high-resolution (0.1 cM) genetic scaffold based on a new genotyping resource will incorporate all genotypable contigs and validate the contiguity of larger ones. In later years, comparable reference sequences will be created for sugar pine, slash pine, and Douglas-fir. Comparative genomic analysis of these four conifer genomes will provide rich annotation and further improve assembly quality and contiguity.
- Specific Aim 2: Transcriptome sequencing for gene discovery, reference building, and aids to genome assembly
- We will build transcriptome references using multiple sequencing approaches to maximize evidence-based gene discovery in parallel with reference genome assembly and annotation, and we will provide full transcript assemblies for functional genomics studies. Initially, RNA from a large number of loblolly pine organs, developmental stages, and tissues exposed to biotic and abiotic stresses will be sequenced using the long reads of Roche/454 GS-FLX Titanium technology. Subsequently, higher-depth RNA-Seq approaches will be employed on the Illumina platform, including sequencing of various mRNA and noncoding RNA libraries. These data will be used first to add depth and detail to the transcriptome and to catalog transcribed polymorphisms. Transcriptome analysis will then profile gene expression differences of biological importance, including changes during the development of reproductive tissues, embryos, seedlings, and wood, and in response to biotic and abiotic stresses.
- Specific Aim 3: Dendrome and TreeGenes databases: Annotation, data integration, and distribution
- The transcriptome and genome sequences will be delivered to the community via TreeGenes as sequence becomes available. Collaboration with GDR will provide the primary annotation and will integrate GenSAS, a custom web-based tool from GDR, with GBrowse from Dendrome to facilitate community-level annotation. We will apply and expand existing pipelines to deliver a comprehensive SNP resource and distribute it through the existing DiversiTree interface. We will work continuously with existing projects such as the Gene Ontology and Plant Ontology to implement conifer-specific ontologies that consistently describe gene products and phenotypes. All pipelines and tools developed in this project will be made freely available to the academic community.
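The read-depth targets in Specific Aim 1 follow from simple coverage arithmetic: mean depth equals total sequenced bases divided by genome size. The sketch below is illustrative only. It assumes the ~22-Gb genome size given in the Zimin et al. title listed above and 2 × 100 bp per Illumina read pair, and it reads "10×18" as a 10X multiplier applied to 18X read depth, which is one plausible interpretation. The constant and function names are hypothetical.

```python
"""Back-of-the-envelope read-depth arithmetic for a hierarchical WGS plan.

Assumptions (illustrative only): a ~22 Gb haploid genome and
paired-end reads of 2 x 100 bp.
"""

GENOME_SIZE_BP = 22e9          # assumed genome size (~22 Gb)
READ_LENGTH_BP = 100           # assumed Illumina read length
BASES_PER_PAIR = 2 * READ_LENGTH_BP


def read_pairs_for_depth(depth: float, genome_size_bp: float = GENOME_SIZE_BP) -> float:
    """Number of read pairs needed to reach a target mean depth."""
    return depth * genome_size_bp / BASES_PER_PAIR


def hierarchical_total_depth(multiplier: float, depth_per_unit: float) -> float:
    """Total depth when depth_per_unit is applied `multiplier` times (e.g. 10 x 18X)."""
    return multiplier * depth_per_unit


if __name__ == "__main__":
    pairs_21x = read_pairs_for_depth(21)
    gb_21x = 21 * GENOME_SIZE_BP / 1e9
    print(f"21X WGS: {pairs_21x:.3g} read pairs, {gb_21x:.0f} Gb of sequence")
    print(f"10 x 18X hierarchical WGS: {hierarchical_total_depth(10, 18):.0f}X total depth")
```

Under these assumptions, 21X of a 22-Gb genome is about 462 Gb of sequence, or roughly 2.3 billion 2 × 100 bp read pairs, before any allowance for read filtering or duplicates.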
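Specific Aim 2 calls for profiling expression differences across organs, developmental stages, and stress treatments. As a hedged illustration only, the sketch below shows one common starting point for such comparisons: raw counts are converted to transcripts per million (TPM) so that genes of different lengths and samples of different depths are comparable, and a log2 fold change is then taken. The gene names, lengths, and counts are invented, and nothing here represents the project's actual pipeline.

```python
"""Minimal sketch: TPM normalization and log2 fold change for RNA-Seq counts.

Gene names, lengths and counts below are invented for illustration.
"""
import math


def tpm(counts: dict[str, int], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Transcripts per million: divide counts by length in kb, then rescale to sum to 1e6."""
    per_kb = {g: counts[g] / (lengths_bp[g] / 1000) for g in counts}
    total = sum(per_kb.values())
    return {g: v / total * 1e6 for g, v in per_kb.items()}


def log2_fold_change(
    base: dict[str, float], other: dict[str, float], pseudocount: float = 1.0
) -> dict[str, float]:
    """log2((other + pc) / (base + pc)) per gene; the pseudocount avoids log2(0)."""
    return {g: math.log2((other[g] + pseudocount) / (base[g] + pseudocount)) for g in base}


# Hypothetical genes: lengths in bp and raw read counts for two conditions.
LENGTHS = {"geneA": 1000, "geneB": 2000, "geneC": 500}
CONTROL = {"geneA": 100, "geneB": 200, "geneC": 50}
STRESS = {"geneA": 100, "geneB": 200, "geneC": 400}

if __name__ == "__main__":
    control_tpm = tpm(CONTROL, LENGTHS)
    stress_tpm = tpm(STRESS, LENGTHS)
    for gene, lfc in log2_fold_change(control_tpm, stress_tpm).items():
        print(f"{gene}: log2FC = {lfc:+.2f}")
```

TPM values sum to one million in every sample, so the strong increase in geneC lowers the TPM of geneA and geneB even though their raw counts are unchanged. This compositional effect is one reason why dedicated differential-expression methods that use biological replicates and statistical testing are normally preferred over raw fold changes.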
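Specific Aim 3 mentions delivering a comprehensive SNP resource. Because the pipelines themselves are not described here, the following is only a hypothetical sketch of one elementary step such a pipeline might include: reading variant calls in the standard VCF text format and keeping biallelic single-nucleotide sites that meet a quality cutoff. The function name, the default cutoff of 30, and the example records (including the scaffold names) are assumptions made for illustration.

```python
"""Minimal sketch: pull biallelic SNPs out of VCF text with a quality cutoff."""
from typing import Iterable, Iterator, NamedTuple

BASES = {"A", "C", "G", "T"}


class Snp(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float


def biallelic_snps(vcf_lines: Iterable[str], min_qual: float = 30.0) -> Iterator[Snp]:
    """Yield single-base, single-ALT variants whose QUAL is at least min_qual."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines plus meta-information and header lines
        chrom, pos, _id, ref, alt, qual = line.rstrip("\n").split("\t")[:6]
        if ref.upper() not in BASES or alt.upper() not in BASES:
            continue  # drops indels, multiallelic ALTs ("T,A") and missing ALTs
        if qual == "." or float(qual) < min_qual:
            continue  # drops records with missing or low quality
        yield Snp(chrom, int(pos), ref, alt, float(qual))


# Hypothetical records: one passing SNP, one multiallelic site, one indel, one low-QUAL SNP.
EXAMPLE_VCF = """##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
scaffold1\t1200\t.\tA\tG\t55.0\tPASS\t.
scaffold1\t1350\t.\tC\tT,A\t80.0\tPASS\t.
scaffold2\t88\t.\tGT\tG\t99.0\tPASS\t.
scaffold2\t410\t.\tT\tC\t12.0\tPASS\t.
"""

if __name__ == "__main__":
    for snp in biallelic_snps(EXAMPLE_VCF.splitlines()):
        print(snp)
```

A production pipeline would more likely rely on established VCF tooling and also consider the FILTER column, genotype fields, and read depth. The sketch only shows the shape of the filtering logic.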