Genetic Diversity in Chimpanzee Transcriptomics Does Not Represent Wild Populations (Skuklaa, 2021)
Pan troglodytes comprises four genetically distinct subspecies. Comparisons of human and chimpanzee transcriptomes have been used to identify gene expression differences that may underlie phenotypic differences between the two species. However, the subspecies of origin for these transcriptome datasets is not reported in NCBI's Sequence Read Archive (SRA), and inconsistent labeling of RNA sequencing (RNA-seq) samples across studies makes it difficult to determine how many individuals have transcriptomic data. We therefore analyzed subspecies ancestry and individual genetic diversity in 486 public RNA-seq datasets from the SRA, covering the great majority of publicly available chimpanzee transcriptome data. Multiple population genetic methods show that 96.6% of samples have Western chimpanzee ancestry. At the level of individual donors, we find that several samples were reused across different studies, and we identify 135 genetically unique individuals, a number that drops to 89 when potential first- and second-degree relatives are excluded. Our results suggest that current chimpanzee transcriptome data capture limited genetic diversity compared with wild populations. These findings should inform the design and interpretation of future chimpanzee transcriptomic studies.
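The abstract does not state how samples were collapsed into donors or how relatives were removed. As a minimal sketch, assuming pairwise kinship coefficients have already been estimated (for example with a KING-style estimator), the two counts can be produced in two passes. The first pass merges samples whose kinship indicates the same individual into one donor with union-find. The second pass greedily drops donors related at second degree or closer to a donor already kept. The cutoffs 0.354 and 0.0884 are the conventional KING thresholds and are an assumption here; the study's own procedure and thresholds may differ. Sample names and kinship values are hypothetical.

```python
from itertools import combinations

# Assumed KING-style cutoffs (not necessarily those used in the study).
DUPLICATE = 0.354       # above this: same individual (or monozygotic twin)
SECOND_DEGREE = 0.0884  # at or above this: second-degree relative or closer


def group_samples(samples, kinship):
    """Merge samples that appear to come from the same donor (union-find)."""
    parent = {s: s for s in samples}

    def find(s):
        # Walk to the root, halving the path as we go.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in combinations(samples, 2):
        if kinship.get(frozenset((a, b)), 0.0) > DUPLICATE:
            parent[find(a)] = find(b)

    groups = {}
    for s in samples:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())


def unrelated_subset(groups, kinship):
    """Greedily keep donors with no close relative among those already kept."""
    kept = []
    for g in groups:
        related = any(
            kinship.get(frozenset((a, b)), 0.0) >= SECOND_DEGREE
            for k in kept for a in g for b in k
        )
        if not related:
            kept.append(g)
    return kept


# Hypothetical example: S1/S2 are one donor, S3/S4 are first-degree relatives.
samples = ["S1", "S2", "S3", "S4", "S5"]
kinship = {
    frozenset(("S1", "S2")): 0.49,
    frozenset(("S3", "S4")): 0.25,
    frozenset(("S1", "S5")): 0.01,
}
donors = group_samples(samples, kinship)
unrelated = unrelated_subset(donors, kinship)
print(len(donors), len(unrelated))
```

Greedy pruning is order-dependent and does not guarantee the largest possible unrelated set; an exact answer would require solving a maximum independent set problem on the relatedness graph.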