Prostate cancer is the second most common cancer in men worldwide1. Over the past decade, large-scale integrative genomics efforts have enhanced our understanding of this disease by characterizing its genetic and epigenetic landscape in thousands of patients2,3. However, most tumours profiled in these studies were obtained from patients from Western populations. Here we produced and analysed whole-genome, whole-transcriptome and DNA methylation data for 208 pairs of tumour tissue samples and matched healthy control tissue from Chinese patients with primary prostate cancer. Systematic comparison with published data from 2,554 prostate tumours revealed that the genomic alteration signatures in Chinese patients were markedly distinct from those of Western cohorts: specifically, 41% of tumours contained mutations in FOXA1 and 18% each had deletions in ZNF292 and CHD1. Alterations of the genome and epigenome were correlated and were predictive of disease phenotype and progression. Coding and noncoding mutations, as well as epimutations, converged on pathways that are important for prostate cancer, providing insights into this devastating disease. These discoveries underscore the importance of including population context in constructing comprehensive genomic maps for disease.
All data, including raw data, mutation calls, and clinical information, have been deposited to the Genome Sequence Archive for Human (http://bigd.big.ac.cn/gsa-human/) at the BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, under the accession number PRJCA001124. The raw sequencing data and somatic and germ-line mutation calls contain information unique to an individual and require controlled access. The deposited and publicly available data are compliant with the regulations of the Ministry of Science and Technology of the People’s Republic of China. Source Data for Figs. 2, 4 and Extended Data Figs. 6–8 are provided with the paper.
All computational code used in this study is available at the supporting website (http://www.cpgea.com).
Bray, F. et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 68, 394–424 (2018).
Cancer Genome Atlas Research Network. The molecular taxonomy of primary prostate cancer. Cell 163, 1011–1025 (2015).
Armenia, J. et al. The long tail of oncogenic drivers in prostate cancer. Nat. Genet. 50, 645–651 (2018).
Shoag, J. & Barbieri, C. E. Clinical variability and molecular heterogeneity in prostate cancer. Asian J. Androl. 18, 543–548 (2016).
Kimura, T. East meets West: ethnic differences in prostate cancer epidemiology between East Asians and Caucasians. Chin. J. Cancer 31, 421–429 (2012).
Baca, S. C. et al. Punctuated evolution of prostate cancer genomes. Cell 153, 666–677 (2013).
Barbieri, C. E. et al. Exome sequencing identifies recurrent SPOP, FOXA1 and MED12 mutations in prostate cancer. Nat. Genet. 44, 685–689 (2012).
Beltran, H. et al. Divergent clonal evolution of castration-resistant neuroendocrine prostate cancer. Nat. Med. 22, 298–305 (2016).
Fraser, M. et al. Genomic hallmarks of localized, non-indolent prostate cancer. Nature 541, 359–364 (2017).
Gao, D. et al. Organoid cultures derived from patients with advanced prostate cancer. Cell 159, 176–187 (2014).
Grasso, C. S. et al. The mutational landscape of lethal castration-resistant prostate cancer. Nature 487, 239–243 (2012).
Hieronymus, H. et al. Copy number alteration burden predicts prostate cancer relapse. Proc. Natl Acad. Sci. USA 111, 11139–11144 (2014).
Kumar, A. et al. Substantial interindividual and limited intraindividual genomic diversity among tumors from men with metastatic prostate cancer. Nat. Med. 22, 369–378 (2016).
Robinson, D. et al. Integrative clinical genomics of advanced prostate cancer. Cell 161, 1215–1228 (2015).
Taylor, B. S. et al. Integrative genomic profiling of human prostate cancer. Cancer Cell 18, 11–22 (2010).
Yuan, J. et al. Integrated analysis of genetic ancestry and genomic alterations across cancers. Cancer Cell 34, 549–560.e9 (2018).
Abida, W. et al. Prospective genomic profiling of prostate cancer across disease states reveals germline and somatic alterations that may affect clinical decision making. JCO Precis. Oncol. https://doi.org/10.1200/PO.17.00029 (2017).
Dall’Era, M. A., deVere-White, R., Rodriguez, D. & Cress, R. Changing incidence of metastatic prostate cancer by race and age, 1988–2015. Eur. Urol. Focus 5, 1014–1021 (2019).
Ren, S. et al. Whole-genome and transcriptome sequencing of prostate cancer identify new genetic alterations driving disease progression. Eur. Urol. 73, 322–339 (2017).
Vogelstein, B. et al. Cancer genome landscapes. Science 339, 1546–1558 (2013).
Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
Shen, M. M. & Abate-Shen, C. Molecular genetics of prostate cancer: new prospects for old challenges. Genes Dev. 24, 1967–2000 (2010).
Tomlins, S. A. et al. Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science 310, 644–648 (2005).
Quigley, D. A. et al. Genomic hallmarks and structural variation in metastatic prostate cancer. Cell 174, 758–769.e9 (2018).
Viswanathan, S. R. et al. Structural alterations driving castration-resistant prostate cancer revealed by linked-read genome sequencing. Cell 174, 433–447.e19 (2018).
Cortés-Ciriano, I. et al. Comprehensive analysis of chromothripsis in 2,658 human cancers using whole-genome sequencing. Nat. Genet. 52, 331–341 (2020).
Yu, Y. P. et al. Novel fusion transcripts associate with progressive prostate cancer. Am. J. Pathol. 184, 2840–2849 (2014).
Jang, J. S. et al. Common oncogene mutations and novel SND1-BRAF transcript fusion in lung adenocarcinoma from never smokers. Sci. Rep. 5, 9755 (2015).
Fishilevich, S. et al. GeneHancer: genome-wide integration of enhancers and target genes in GeneCards. Database (Oxford) 2017, bax028 (2017).
Huang, F. W. et al. Highly recurrent TERT promoter mutations in human melanoma. Science 339, 957–959 (2013).
Rheinbay, E. et al. Analyses of non-coding somatic drivers in 2,658 cancer whole genomes. Nature 578, 102–111 (2020).
Zhu, H. et al. Candidate cancer driver mutations in distal regulatory elements and long-range chromatin interaction networks. Mol. Cell. https://doi.org/10.1016/j.molcel.2019.12.027 (2020).
Jozwik, K. M. & Carroll, J. S. Pioneer factors in hormone-dependent cancers. Nat. Rev. Cancer 12, 381–385 (2012).
Sahu, B. et al. Dual role of FoxA1 in androgen receptor binding to chromatin, androgen signalling and prostate cancer. EMBO J. 30, 3962–3976 (2011).
Espiritu, S. M. G. et al. The evolutionary landscape of localized prostate cancers drives clinical aggression. Cell 173, 1003–1013.e15 (2018).
Gao, N. et al. The role of hepatocyte nuclear factor-3 alpha (Forkhead Box A1) and androgen receptor in transcriptional regulation of prostatic genes. Mol. Endocrinol. 17, 1484–1507 (2003).
Adams, E. J. et al. FOXA1 mutations alter pioneering activity, differentiation and prostate cancer phenotypes. Nature 571, 408–412 (2019).
Parolia, A. et al. Distinct structural classes of activating FOXA1 alterations in advanced prostate cancer. Nature 571, 413–418 (2019).
McGranahan, N. et al. Clonal status of actionable driver events and the timing of mutational processes in cancer evolution. Sci. Transl. Med. 7, 283ra54 (2015).
Mina, M. et al. Conditional selection of genomic alterations dictates cancer evolution and oncogenic dependencies. Cancer Cell 32, 155–168.e6 (2017).
Ishizaki, F. et al. Androgen deprivation promotes intratumoral synthesis of dihydrotestosterone from androgen metabolites in prostate cancer. Sci. Rep. 3, 1528 (2013).
Berman, B. P. et al. Regions of focal DNA hypermethylation and long-range hypomethylation in colorectal cancer coincide with nuclear lamina-associated domains. Nat. Genet. 44, 40–46 (2011).
Hansen, K. D. et al. Increased methylation variation in epigenetic domains across cancer types. Nat. Genet. 43, 768–775 (2011).
Hon, G. C. et al. Global DNA hypomethylation coupled to repressive chromatin domain formation and gene silencing in breast cancer. Genome Res. 22, 246–258 (2012).
Mazor, T. et al. DNA methylation and somatic mutations converge on the cell cycle and define similar evolutionary histories in brain tumors. Cancer Cell 28, 307–317 (2015).
Xiao, Q. et al. Systematic analysis reveals molecular characteristics of ERG-negative prostate cancer. Sci. Rep. 8, 12868 (2018).
Chakravarty, D. et al. OncoKB: a precision oncology knowledge base. JCO Precis. Oncol. https://doi.org/10.1200/PO.17.00011 (2017).
Xu, B. et al. Altered chromatin recruitment by FOXA1 mutations promotes androgen independence and prostate cancer progression. Cell Res. 29, 773–775 (2019).
Gao, S. et al. Forkhead domain mutations in FOXA1 drive prostate cancer progression. Cell Res. 29, 770–772 (2019).
Gao, X., Wang, H., Wang, Y., Xu, C. & Sun, Y. Construction and clinical application of prostate cancer database (PC-Follow) based on browser/server schema. Chin. J. Urol. 36, 694–698 (2015).
Bergmann, E. A., Chen, B. J., Arora, K., Vacic, V. & Zody, M. C. Conpair: concordance and contamination estimator for matched tumor-normal pairs. Bioinformatics 32, 3196–3198 (2016).
Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27, 1571–1572 (2011).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36 (2013).
Anders, S., Pyl, P. T. & Huber, W. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169 (2015).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Trapnell, C. et al. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat. Biotechnol. 31, 46–53 (2013).
Trapnell, C. et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protocols 7, 562–578 (2012).
Friedländer, M. R., Mackowiak, S. D., Li, N., Chen, W. & Rajewsky, N. miRDeep2 accurately identifies known and hundreds of novel microRNA genes in seven animal clades. Nucleic Acids Res. 40, 37–52 (2012).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013).
Saunders, C. T. et al. Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics 28, 1811–1817 (2012).
Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).
Thorvaldsdóttir, H., Robinson, J. T. & Mesirov, J. P. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief. Bioinform. 14, 178–192 (2013).
Boeva, V. et al. Control-FREEC: a tool for assessing copy number and allelic content using next-generation sequencing data. Bioinformatics 28, 423–425 (2012).
Amemiya, H. M., Kundaje, A. & Boyle, A. P. The ENCODE blacklist: identification of problematic regions of the genome. Sci. Rep. 9, 9354 (2019).
Mermel, C. H. et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 12, R41 (2011).
Yang, L. et al. Diverse mechanisms of somatic structural variations in human cancer genomes. Cell 153, 919–929 (2013).
Jia, W. et al. SOAPfuse: an algorithm for identifying fusion transcripts from paired-end RNA-Seq data. Genome Biol. 14, R12 (2013).
Panigrahi, P., Jere, A. & Anamika, K. FusionHub: A unified web platform for annotation and visualization of gene fusion events in human cancer. PLoS One 13, e0196588 (2018).
Shugay, M., Ortiz de Mendíbil, I., Vizmanos, J. L. & Novo, F. J. Oncofuse: a computational framework for the prediction of the oncogenic potential of gene fusions. Bioinformatics 29, 2539–2546 (2013).
Gonzalez-Perez, A. et al. Computational approaches to identify functional genetic variants in cancer genomes. Nat. Methods 10, 723–729 (2013).
Porta-Pardo, E. et al. Comparison of algorithms for the detection of cancer drivers at subgene resolution. Nat. Methods 14, 782–788 (2017).
Dees, N. D. et al. MuSiC: identifying mutational significance in cancer genomes. Genome Res. 22, 1589–1598 (2012).
Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).
Sondka, Z. et al. The COSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers. Nat. Rev. Cancer 18, 696–705 (2018).
Fu, Y. et al. FunSeq2: a framework for prioritizing noncoding regulatory variants in cancer. Genome Biol. 15, 480 (2014).
Melton, C., Reuter, J. A., Spacek, D. V. & Snyder, M. Recurrent somatic mutations in regulatory regions of human cancer genomes. Nat. Genet. 47, 710–716 (2015).
Weinhold, N., Jacobsen, A., Schultz, N., Sander, C. & Lee, W. Genome-wide analysis of noncoding regulatory mutations in cancer. Nat. Genet. 46, 1160–1165 (2014).
Clark, K. L., Halay, E. D., Lai, E. & Burley, S. K. Co-crystal structure of the HNF-3/fork head DNA-recognition motif resembles histone H5. Nature 364, 412–420 (1993).
Humphrey, W., Dalke, A. & Schulten, K. VMD: visual molecular dynamics. J. Mol. Graph. 14, 33–38 (1996).
The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Wu, H. et al. Detection of differentially methylated regions from whole-genome bisulfite sequencing data without replicates. Nucleic Acids Res. 43, e141 (2015).
Kishore, K. et al. methylPipe and compEpiTools: a suite of R packages for the integrative analysis of epigenomics data. BMC Bioinformatics 16, 313 (2015).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Harrow, J. et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 22, 1760–1774 (2012).
McLean, C. Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 28, 495–501 (2010).
Weisenberger, D. J. et al. CpG island methylator phenotype underlies sporadic microsatellite instability and is tightly associated with BRAF mutation in colorectal cancer. Nat. Genet. 38, 787–793 (2006).
Noushmehr, H. et al. Identification of a CpG island methylator phenotype that defines a distinct subgroup of glioma. Cancer Cell 17, 510–522 (2010).
Mo, Q. et al. Pattern discovery and cancer gene identification in integrated cancer genomic data. Proc. Natl Acad. Sci. USA 110, 4245–4250 (2013).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
Cerami, E. et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2, 401–404 (2012).
We thank the patients and their families. This study was supported by the ‘Key Research and Development Project on Precision Medicine’ fund (2016YFC090220) granted by the Chinese Ministry of Science and Technology, the Shanghai Key Laboratory of Cell Engineering (14DZ2272300), the Shanghai ‘Top Priority’ Medical Center Project (2017ZZ01005), the ‘National Major New Drug Discovery Initiative’ Fund (2017ZX093040300002) granted by the ‘13th Five-Year Plan’ (Subproject), National Natural Science Foundation of China (81602467, J.L.), and the ‘Zhangjiang National Innovation Demonstration Zone’ Initiative Development Fund. H.J.L., N.M.S., E.C.P. and Ting Wang were supported by American Cancer Society grant RSG-14-049-01-DMC, and E.C.P. was supported by a Postdoctoral Fellowship, PF-17-201-01, from the American Cancer Society. We thank K.-l. Huang for technical assistance on iCluster, and X. Zhang for managing and organizing this project.
The authors declare no competing interests.
Peer review information Nature thanks Arul Chinnaiyan, Colin Collins, Colin Cooper and Charlie Massie for their contribution to the peer review of this work.
Extended data figures and tables
a, Clinical and pathological patient characterization. b, Study design, indicating the number of tumours with each data type. The cohort consisted of 208 patients who underwent radical prostatectomy. All tumours were analysed by WGS, as was a matched normal para-tumour specimen from each patient. In addition, RNA-seq (n = 134 tumours), miRNA-seq (n = 105), and whole-genome DNA methylation (n = 187) data were generated for a subset of patients. c–f, Comparison between somatic alteration calls from two pipelines for the TCGA PRAD (primary prostate tumour) cohort. ‘CPGEA pipeline’ indicates the pipeline used in this study. ‘TCGA report’ indicates publicly available somatic alteration calls. c, Distribution of mutation burdens in each cohort. Each dot corresponds to a mutation burden calculated from a tumour–normal pair. Red horizontal bars indicate the median mutation burden from the CPGEA pipeline and TCGA (both 0.70 per Mb). d, Genomic regions with significantly recurrent somatic CNAs called by GISTIC2.0. e, Heat map showing genome-wide CNAs. Top, 114 tumours clustered using the WGS-based CPGEA pipeline. Bottom, array-based TCGA results for the same tumours, arranged in the same order. f, Gene-level alteration frequencies from the two pipelines for the TCGA cohort. g, Alexandrov signatures in CPGEA and their association with clinical features. Top, percentage of samples per signature. Bottom, mutation counts for each signature, ordered from low to high by individual patient. h, Box plot showing the correlation of signatures 8 and 16 with Gleason score (Kruskal–Wallis test). Box plots as in Fig. 4b. Each dot corresponds to a tumour sample.
a, Heat map representation of CNA segments grouped by CNA burden subgroup (high, intermediate and low). b, Kaplan–Meier plot of biochemical relapse-free survival in different CNA burden subgroups, using the intermediate CNA burden (7.85%) as a cut-off value. P = 0.0024, two-sided log-rank test. c, Cancer genes with a significant CNA in CPGEA and Western cohorts. The inner circle displays a CNA heat map of individual patients sorted by chromosome, with CNA frequencies and significantly altered genes on the outer rim. d, Number of intra-chromosomal rearrangements as a function of the deletion status of CHD1. P values were determined by two-sided Mann–Whitney U-test. Box plots as in Fig. 4b.
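The survival comparison in panel b can be illustrated with a minimal Kaplan–Meier sketch. This is not the study's code: the function names `split_by_cutoff` and `kaplan_meier` are hypothetical, the toy follow-up values are invented for illustration, and treating a burden equal to the cut-off as "high" is an assumption. The log-rank test itself is not implemented here.

```python
def split_by_cutoff(burdens, cutoff):
    """Split patient indices into (low, high) groups by CNA burden (%).

    Assumption: a burden equal to the cut-off counts as high.
    """
    low = [i for i, b in enumerate(burdens) if b < cutoff]
    high = [i for i, b in enumerate(burdens) if b >= cutoff]
    return low, high


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of relapse-free survival.

    times:  follow-up duration per patient
    events: True if biochemical relapse was observed, False if censored
    Returns [(time, survival)] at each time with at least one relapse.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        relapses = 0
        removed = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            relapses += data[i][1]
            removed += 1
            i += 1
        if relapses:
            survival *= 1 - relapses / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Toy example (invented values): five patients, two censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])
groups = split_by_cutoff([2.0, 7.85, 10.0], 7.85)
```

In practice one curve would be estimated per CNA burden subgroup and the curves compared with a log-rank test from an established statistics package.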
a, Types of structural variation and numbers for individual tumours (columns). Chromoplexy and chromothripsis status, CHD1 deletion status, and ERG fusion status are displayed as a heat map. b, Frequency of recurrent structural variations and their affected genes for five types of structural variation. c, A recurrent inversion potentially disrupts a TAD boundary and results in enhancer hijacking. Hi-C map for the LNCaP cell line over the inversion. The inversion and TAD boundaries are marked. Expression levels of potentially affected genes are displayed as box plots. P values were determined by two-sided Mann–Whitney U-test. Box plots are as in Fig. 4b. Each dot corresponds to a normal sample (n = 134), a tumour with no structural variation (wild-type (WT), n = 131), or a tumour with structural variation (n = 3). d, Definition of five tiers of structural variation patterns based on genomic annotation of the 5′ and 3′ breakpoints. e, Genomic location distribution of 5′ (left) and 3′ (middle) breakpoints, and distribution of different types of structural variation across the five defined tiers (right).
a, The circle represents gene fusions in Chinese and Western cohorts. Recurrent fusions (more than two samples) are displayed as connected gene pairs, in which the width of the connecting arc represents the number of samples that contained the fusion. Red indicates novel gene fusions not present in public databases (FusionHub). b, Fusion was validated by Sanger sequencing and RNA-seq data. Red cells indicate validated fusion events, and green cells indicate PCR failure. c, Circos plot displaying ETS family fusions. Expression levels are shown as a function of copy number. d, The SCHLAP1–UBE2E3 gene fusion. e, AMACR fusions. f, A heterozygous SND1–BRAF fusion found in CPGEA. g, In total, 83 SMGs were detected by MuSiC, including 7 genes called by both MuSiC and MutSigCV. h, Fraction of primary, metastatic, and other cancer types investigated by each study. i, Venn diagrams of SMGs defined in different studies. j, Genes significantly mutated in CPGEA, Western primary, and Western metastatic cohorts. Purple cells indicate that the gene was defined as an SMG in the study. h–j, The Western cohorts are from CPCG9, SU2C11, T/C/B (Trento/Cornell/Broad, neuroendocrine prostate cancer)8, B/C (Broad/Cornell)7, CRC13, M/DFCI3, TCGA2, Michigan11, MSKCC15, Organoid10, CNA-PNAS12 and MSK17.
a, Schematic workflow of noncoding mutation analysis in CPGEA. b, Distribution of noncoding mutations across different genomic features. c, Significance of mutation hotspots in noncoding regulatory regions. Each hotspot is colour-coded for its regulatory region annotation, and the statistical significance (false discovery rate (FDR)) and number of hits per sample are displayed. d, Significance of recurrent mutations in regulatory regions of interest. Regulatory regions for individual genes are displayed based on local and global measures of statistical significance (FDR). Colours indicate regulatory region annotations, and key genes are labelled. e, Enrichment of noncoding mutations resulting in gain or loss of transcription factor-binding sites. For each transcription factor, the match score to the position weight matrix (PWM) was determined for mutations that could potentially destroy or create a binding site for that transcription factor. Plotted for each transcription factor is the mean difference in the match scores for the mutated and reference alleles. Red indicates FDR < 0.05. P values for differences in mean match score were computed by two-sided paired Wilcoxon rank-sum test. f–h, Examples of noncoding mutations in selected genes. TBL1XR1 (f), FOXA1 (g) and FLI1 (h) are shown. Genome browser views show the location of the noncoding mutation. The genomic coordinates and types of noncoding mutation are labelled above the genome browser. Gene expression of genes with noncoding mutations is depicted.
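The match-score comparison described for panel e can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: it assumes a log2-odds PWM against a uniform background, scores the best window in a short flanking sequence, and uses an invented three-position toy motif. The names `log_odds_pwm`, `best_window_score` and `score_delta` are hypothetical.

```python
import math

BASES = "ACGT"


def log_odds_pwm(freqs, background=0.25):
    """Convert per-position base frequencies into log2-odds scores."""
    return [{b: math.log2(col[b] / background) for b in BASES} for col in freqs]


def pwm_score(pwm, seq):
    """Score a sequence with the same length as the PWM."""
    return sum(col[base] for col, base in zip(pwm, seq))


def best_window_score(pwm, seq):
    """Best match score over all PWM-length windows in seq."""
    k = len(pwm)
    return max(pwm_score(pwm, seq[i:i + k]) for i in range(len(seq) - k + 1))


def score_delta(pwm, ref_seq, pos, alt_base):
    """Match score of the mutated allele minus that of the reference allele.

    Negative values suggest loss of a binding site, positive values gain.
    """
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return best_window_score(pwm, alt_seq) - best_window_score(pwm, ref_seq)


def toy_column(consensus):
    """Toy frequency column: 0.85 for the consensus base, 0.05 otherwise."""
    return {b: (0.85 if b == consensus else 0.05) for b in BASES}


# Invented motif with consensus TGT, for illustration only.
toy_pwm = log_odds_pwm([toy_column(b) for b in "TGT"])
loss = score_delta(toy_pwm, "ATGTA", 2, "C")  # G>C disrupts the motif
gain = score_delta(toy_pwm, "ATCTA", 2, "G")  # C>G creates the motif
```

Averaging such differences over all mutations that overlap candidate sites for one transcription factor gives a per-factor mean difference of the kind plotted in the panel; the significance testing is not shown here.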
a, FOXA1 mutation validation. Two representative validations by Sanger sequencing and reconstructed RNA-seq analysis. b, Validation of a FOXA1 in-frame deletion-derived peptide by mass spectrometry. c, Mapping of FOXA1 mutations onto the three-dimensional structure of FOXA1 and bound DNA (based on PDB registry 1VTN78). d, DNA methylation over FOXA1-binding sites in tumours with FOXA1 truncation/in-frame deletion. Top, FOXA1-binding motifs in the ENCODE chromatin immunoprecipitation with high-throughput sequencing (ChIP–seq) dataset (left) versus FOXA1-binding motifs outside of FOXA1 ChIP–seq peaks (right). Bottom, wild-type FOXA1-binding sites (left) and mutant FOXA1-binding sites (right) from recently published ChIP–seq data38. P values were determined by one-sided Mann–Whitney U-tests. Box plots are as in Fig. 4b. Each dot corresponds to a normal or tumour sample. e, Clonal analysis of FOXA1 in CPGEA. f, Mutual exclusivity or co-occurrence of gene alterations between genes belonging to 12 important curated pathways. Only alterations with at least one significant interaction (P < 0.05) are included. Asterisks indicate significant relationships. g, Allele frequency distribution of FOXA1 mutations in CPGEA and TCGA processed with the CPGEA pipeline. h, Significant mutual exclusions and co-occurrences between FOXA1 mutations and other genetic lesions in CPGEA, identified by OncoPrint from cBioPortal92. i, FOXA1 mutations and downstream pathways. Pairwise comparison of expression levels of important pathways. The z-score of specific genes and clinical features are displayed in a heat map grouped by different mutation subtypes.
a, Heat map of DNA methylation levels in the CPGEA cohort. Rows represent defined genomic regions including PMDs, hypoDMRs and hyperDMRs, and columns represent samples. Tumours (right) and matched normal samples (left) are sorted by epimutation rate. In each category, genomic regions are sorted by chromosomal coordinates. The top panel shows clinicopathological features of patients (as in Fig. 1), genetic alterations including fusions and coding mutations, and other molecular phenotypes. b, Two-dimensional density plot of the average CpG methylation level in normal versus tumour samples from the same patient. c, Average methylation level of CpGs overlapping different genomic features. P values determined by two-sided Wilcoxon signed-rank test. CDS, coding sequence. Each dot corresponds to a normal prostate or tumour sample. d, Average methylation level of CpGs overlapping different repeat element classes. P values were determined by two-sided Wilcoxon signed-rank test. Each dot corresponds to a normal prostate or tumour sample. e, Average non-CG methylation level in tumours and matched normal samples. Each dot represents a sample. Mean 0.37% for each group. P values were determined by two-sided Wilcoxon signed-rank test. Each dot corresponds to a normal prostate or tumour sample. f, Genome-wide methylation levels in 100-kb bins, clustered across tumour samples. Rows represent samples, and columns represent 100-kb genomic bins, with the DNA methylation level of each bin represented by the heat map. g, The genome fraction of total PMD length in each tumour, in decreasing order. The leftmost bar represents the genome fraction of the union set of PMDs across all tumours. h, PMD recurrence. The red line represents PMDs shared by at least 100 tumours (711 out of 2,218). i, Mutation frequency inside versus outside PMDs. P = 7.5 × 10−32, two-sided Wilcoxon signed-rank test. Mutation frequency was measured as the average number of SNVs per Mb. 
Each dot corresponds to a tumour sample (n = 187). j, Expression level of genes located in PMDs (n = 4,043) or outside PMDs (n = 15,344) in tumours versus matched normal samples. P values determined by one-sided Wilcoxon signed-rank test. Genes in PMDs had significantly lower expression than genes outside PMDs in both tumours and normal samples (P = 0, two-sided Mann–Whitney U-test). Outlier genes with very high expression were omitted from the plot. All box plots are as in Fig. 4b.
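The mutation frequency measure used in panel i (average number of SNVs per Mb inside versus outside PMDs) can be sketched for a single tumour as below. This is a simplified illustration under stated assumptions: one chromosome, half-open intervals, and invented toy coordinates. The function names are hypothetical and not taken from the study's code.

```python
def merge_intervals(intervals):
    """Merge overlapping half-open (start, end) intervals on one chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def snv_rate_inside_outside(snv_positions, pmds, genome_length):
    """Return (SNVs per Mb inside PMDs, SNVs per Mb outside PMDs)."""
    merged = merge_intervals(pmds)
    inside_bp = sum(end - start for start, end in merged)
    outside_bp = genome_length - inside_bp
    inside_n = sum(
        any(start <= pos < end for start, end in merged) for pos in snv_positions
    )
    outside_n = len(snv_positions) - inside_n
    return inside_n / (inside_bp / 1e6), outside_n / (outside_bp / 1e6)


# Toy example (invented coordinates): a 10-Mb region with 3 Mb of PMDs.
rates = snv_rate_inside_outside(
    [100, 1_000_000, 2_999_999, 3_000_000, 5_000_000, 9_000_000],
    [(0, 2_000_000), (1_500_000, 3_000_000)],
    10_000_000,
)
```

Computing one such pair per tumour yields the paired values that a signed-rank test would then compare across the cohort.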
a, Recurrence of hypoDMRs. In total, 1,172 hypoDMRs were shared by at least 10 tumours (red line). b, Recurrence of hyperDMRs. In total, 4,214 hyperDMRs were shared by at least 10 tumours (red line). c, Genomic location of the union set of hypoDMRs and recurrent hypoDMRs. The innermost circle represents the reference genome background. d, Genomic location of the union set of hyperDMRs and recurrent hyperDMRs. The innermost circle represents the reference genome background. e, MSigDB perturbation enrichment analysis of recurrent hypoDMRs (n = 1,172) using GREAT87. f, Gene Ontology (GO) enrichment analysis of recurrent hyperDMRs (n = 4,214) using GREAT. The top 20 GO biological process terms are shown. g, Scatter plots of example epigenetically silenced genes. Each dot represents a normal sample (red), a tumour without a silenced gene (blue), or a tumour with a silenced gene (black). TPM, transcripts per million. h, Heat map of CIMP-CGI methylation levels. Rows represent CIMP-CGIs, and columns represent samples. Tumours (right) were clustered by CIMP-CGI methylation levels, and matched normal samples (left) were sorted in the same order. CIMP-CGIs were sorted by chromosome and genomic coordinates. The top panel shows clinicopathological features of patients (as in Fig. 1), genetic alterations, including fusions and coding mutations, and other molecular phenotypes. i, Proportion of recurrent hyperDMRs overlapping CGIs. j, Association of CIMP+ tumours (n = 33) with gene mutation status. Red vertical line represents P = 0.05 (two-sided Fisher’s exact test). k, Kaplan–Meier plot of biochemical recurrence-free survival in patients with CIMP+ and CIMP− tumours. P values were determined by two-sided log-rank test. l, m, Correlation between epimutation burden and mutation (l) or CNA (m) burden. Spearman’s correlation coefficient ρ = 0.37, P = 2.5 × 10−7 for mutation burden, and ρ = 0.65, P = 1.2 × 10−23 for CNA burden. Each dot represents a tumour (n = 187).
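The rank correlation reported for panels l and m can be illustrated with a small, dependency-free sketch of Spearman's ρ (Pearson correlation of average ranks, which handles ties). The function names and the toy values are assumptions made for illustration; the P values in the legend would come from a statistics package and are not reproduced here.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rank correlation coefficient."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy example (invented values): a monotonic but nonlinear relationship.
rho = spearman_rho([1, 2, 3, 4, 5], [2, 4, 8, 16, 32])
```

Because only ranks are used, a monotonic but nonlinear relationship between two burden measures still gives a coefficient of 1.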
a, Molecular taxonomy across eight cohorts based on seven important oncogenic drivers identified by TCGA. b, Mutation burden, CNA burden and epimutation burden across the four molecular subtypes in CPGEA. c, Key CNA events, CIMP and fusion events across the four subtypes. ERG fusion-positive status combines results from Meerkat and SOAPfuse with high-expression samples. d, Annotation of each molecular subtype. e, Kaplan–Meier plot of biochemical relapse-free survival for iCluster subtype D compared to the other three iCluster subtypes. P values were determined by two-sided log-rank test. f–h, Clustering of tumours using single datasets, using RNA-seq analysis (f), DNA methylation (g), and miRNA data (h). h, Rows represent miRNAs and columns represent tumours. The top panel shows clinical features of patients (as in Fig. 1) along with four miRNA clusters and four iCluster subtypes. i, Violin plots of mutation, CNA and epimutation burdens for four miRNA clusters. Mutation burden, P = 0.85, 0.43, 0.61, 0.58, 0.24 and 0.16, for the comparison between miRNA clusters of 1–2, 1–3, 1–4, 2–3, 2–4 and 3–4, respectively. CNA burden, P = 5.9 × 10−6, 0.00025, 0.29, 0.045, 1.3 × 10−26, and 4.1 × 10−5, in the same order. Epimutation burden, P = 0.0052, 0.090, 0.24, 0.20, 6.1 × 10−5 and 0.0080, in the same order. P values determined by two-sided Mann–Whitney U-test. Each dot corresponds to a tumour sample belonging to miRNA cluster 1 (n = 21), 2 (n = 37), 3 (n = 34), or 4 (n = 13). j, Box plots of miRNA expression levels in normal samples and four miRNA-based tumour clusters (cluster 1 (n = 21), 2 (n = 37), 3 (n = 34), or 4 (n = 13)). Box plots are as in Fig. 4b. k, Kaplan–Meier plot of biochemical recurrence-free survival in patients with tumours belonging to miRNA cluster 2 or other clusters. P values were determined by two-sided log-rank test. Primary tumours without any treatment were included.
a, Summary of genetic and epigenetic lesions in 12 curated pathways across the Chinese prostate cancer subtypes. b, Comparison of the frequency of disturbances in the AR pathway between CPGEA (primary), TCGA (primary) and SU2C (metastasis) cohorts. The frequency of coding mutations in each AR pathway gene is shown. c, The frequency of fusions, structural variations, noncoding mutations and epimutations in each AR pathway gene in the CPGEA cohort. Information on additional pathways is provided at http://www.cpgea.com. d, Comparison of pathway-level alterations across the CPGEA (206 samples, excluding 2 microsatellite instability (MSI) samples), TCGA (114 samples processed with the CPGEA pipeline), and SU2C cohorts (150 samples downloaded from cBioPortal). To compare across cohorts, only coding mutations and CNAs were considered. e, Frequency of coding alterations (CNAs, fusion genes and nonsynonymous coding mutations), noncoding alterations, and both for each pathway in the CPGEA cohort. f, Different levels of actionable mutations predicted by OncoKB in CPGEA and TCGA.
This file contains Supplementary Discussion.
Supplementary Data 1: Metadata of public large cohorts of prostate cancer genomics studies.
Supplementary Data 2: Clinical and pathological information of specimen.
Supplementary Data 3: Comparison of SMG, CNA, and fusion frequencies between CPGEA pipeline and TCGA report on TCGA cohort, and between CPGEA and other public cohorts. Arm level copy number alterations were estimated by GISTIC. Focal copy number alterations and affected genes were also estimated by GISTIC.
Supplementary Data 4: GISTIC output of CPGEA cohort (n = 208).
Supplementary Data 5: Complete list of structural variations in CPGEA cohort.
Supplementary Data 6: Complete list of fusion events and validation in CPGEA cohort.
Supplementary Data 7: Hotspot, local, global, and TF (n = 117) analysis of noncoding mutations in CPGEA cohort.
Supplementary Data 8: FOXA1 mutations, validations, and mutual exclusivity and co-occurrences with other 74 genetic alterations from 206 tumours (SELECT output).
Supplementary Data 9: Pathway comparison between CPGEA, TCGA and SU2C.