June 30, 2014 | Vol. 2
Decoded Genome Sequence of the Cultivated Cotton Gossypium arboreum

The genome sequence of the cultivated cotton Gossypium arboreum (AA) has been decoded, following the successful sequencing of Gossypium raimondii (DD) in 2012. The work was carried out by the Institute of Cotton Research (ICR) of the Chinese Academy of Agricultural Sciences (CAAS), and the findings were published online in Nature Genetics on May 18, 2014.

ICR and its partner, USDA-ARS, initiated the Cotton Genome Project (CGP) in December 2007. Sequencing the genomes of these two diploid species, the putative progenitors of tetraploid cottons, lays an important foundation for the sequencing, assembly and evolutionary analysis of tetraploid cotton genomes.

How was it done? Through this project, a highly homozygous cultivar of G. arboreum, Shixiya 1, was sequenced. Paired-end sequencing yielded a total of 193.6 Gb of clean sequence, covering the genome 112.6-fold. Of the assembly, 90.4% was anchored and oriented on 13 pseudochromosomes. Repetitive DNA sequences were found to occupy 68.5% of the genome, and 41,330 protein-coding genes were predicted in G. arboreum.
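As a quick sanity check on these sequencing figures, dividing the total clean sequence by the fold coverage gives a rough genome size. The short Python sketch below does this arithmetic for 193.6 Gb of clean sequence at 112.6-fold coverage. Applying the 68.5% repeat share to that implied size is an assumption made only for illustration, and the variable and function names are hypothetical.

```python
# Figures quoted in the article.
clean_sequence_gb = 193.6   # total clean paired-end sequence, in Gb
fold_coverage = 112.6       # reported depth of coverage
repeat_fraction = 0.685     # share of the genome occupied by repetitive DNA


def implied_genome_size_gb(total_gb: float, coverage: float) -> float:
    """Return the genome size implied by total sequence / fold coverage."""
    return total_gb / coverage


genome_gb = implied_genome_size_gb(clean_sequence_gb, fold_coverage)

# Assumption: the repeat share is applied to the implied genome size,
# not to the assembled length, so this is only a rough estimate.
repeat_gb = genome_gb * repeat_fraction

print(f"Implied genome size: {genome_gb:.2f} Gb")
print(f"Implied repetitive DNA: {repeat_gb:.2f} Gb")
```

The result, roughly 1.7 Gb, is the genome size implied by the reported yield and coverage; it is not a figure stated in the article itself.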

Molecular phylogenetic analyses suggested that G. arboreum and G. raimondii diverged about 5 (2-13) million years ago. The two species shared two whole-genome duplications before speciation. Insertions of long terminal repeats over the past 5 million years are responsible for the twofold difference in the sizes of these genomes. These results will facilitate understanding of both the complexity of the cotton genome and the genetic diversity of the cotton genus.

Comparison of the G. arboreum genome with the G. raimondii and Theobroma cacao genome sequences identified differences in the expression patterns of NBS domain–encoding genes. The results showed that disease resistance-related genes were significantly expanded in G. raimondii compared with T. cacao, whereas their number in G. arboreum was similar to that in T. cacao. This might be the main reason for the large difference between the two cottons in resistance to Verticillium wilt. Moreover, tandem duplications seem to have played a significant role in the expansion of the NBS-encoding gene family in G. raimondii after its divergence from G. arboreum about 5 million years ago, while segmental loss contributed to the contraction of this family in G. arboreum.

Equally important, ethylene is a key signaling molecule that promotes cotton fiber elongation. Dot plots of promoter regions showed that a deletion of ~130 bp beginning at −470 bp relative to the transcription start site of GaACO1 resulted in the loss of a putative MYB-binding site. Very high levels of ACO transcripts in G. raimondii ovules, in conjunction with an ethylene burst, might force an early fiber senescence phenotype, whereas the inactivation of ACO gene transcription in G. arboreum ovules might be responsible for the short-fiber phenotype in this species.

The completion of the genome sequence of the diploid cotton G. arboreum is expected to improve understanding of the molecular mechanisms underlying important traits and to support molecular breeding of new cotton varieties. The findings also provide a solid foundation for elucidating the origin and evolution of cotton and for revealing how tetraploid cotton and other polyploid species formed.