2021年也继续陪伴着你
大家好,我是风。这里是风风的CNS美图复现系列专栏。今天是元旦耶,元旦快乐哦各位小伙伴们!预祝各位在2021年喜提SCI和课题无数,发文发到手抽筋,基金多到巨开心!好啦,言归正传,上周我们短暂地进行了放松环节,去到“风风的小故事”专栏听小故事,水蜜桃群的宝宝们已经提出了下一个小故事想要听的内容,最近我还听说我们“37°培养箱”队的度渡队长要搞大动作,让我在新的一年刚开始就充满了小期待呢!到底会是啥大动作呢?(我也不知道ing~)
传送门
学了那么久的ComplexHeatmap,我们也该到了收获的时候了,今天我们就找一张已经发表的文章的热图来复现一下:
这张图来自于Cancer Cell上的一篇文章,doi是
https://doi.org/10.1016/j.ccr.2012.08.024,原文给大家下载好了放在资料区,看文末回复关键词即可获取,此图为Figure 1,作为一张热图,简单给大家介绍下用了什么内容:
1.热图主体为甲基化探针表达量:行为探针名,列为样本名;

2.下面是各个样本的注释,包括:年龄、甲基化聚类、甲基化平均水平、表达水平、基因突变、拷贝数突变
这样看来这张热图好像并不难画对吧?按照我们前面说的拆和叠的内容,只需要先把主体画出来,然后不断叠加注释上去就好了嘛!我们简单复习下,画热图主体我们用Heatmaph函数进行叠加,注释的内容用HeatmapAnnotation函数,然后用draw函数把两部分画在一起就搞定了。听起来好像很简单是不?那我们来试试呗!
数据处理
画图第一步,当然是熟悉我们数据的特征啦,我们用GSE36278的数据,数据从GEO数据库下载整理,也可以用getGEO("GSE36278"),这里我已经下载好了给大家,直接load进来,然后熟悉数据:
rm(list = ls())# 加载R包# BiocManager::install("IlluminaHumanMethylation450kanno.ilmn12.hg19")library(matrixStats)library(GEOquery)library(IlluminaHumanMethylation450kanno.ilmn12.hg19)library(tidyverse)library(RColorBrewer)library(ComplexHeatmap)library(circlize)library(GetoptLong)# gset <- getGEO("GSE36278") # 想自己下载数据的话直接去掉这一行的#号load("GSE36278.Rdata")# gset$GSE36278_series_matrix.txt.gz@experimentData # 实验信息## gset$GSE36278_series_matrix.txt.gz@assayData$exprs# 甲基化位点表达矩阵## gset$GSE36278_series_matrix.txt.gz@phenoData # 表型数据,这里一般包含了我们要的临床信息、样本信息等内容## gset$GSE36278_series_matrix.txt.gz@featureData # 特征数据,与表型数据类似## gset$GSE36278_series_matrix.txt.gz@annotation # 平台信息## gset$GSE36278_series_matrix.txt.gz@protocolData@data # 样本名称## gset$GSE36278_series_matrix.txt.gz@.__classVersion__ # 分析信息# 上述内容可以自己对比一下,一般使用exprs和phenoData就可以满足大部分分析需求# 提取甲基化矩阵# 甲基化表达谱已通过Illumina HumanMethylation450 BeadChip阵列进行了测量, 我们通过IlluminaHumanMethylation450kanno.ilmn12.hg19程序包加载探针数据,同时将矩阵中的行名称调整为与探针相同data(Locations)mat <- exprs(gset[[1]])colnames(mat) <- phenoData(gset[[1]])@data$titlemat <- mat[rownames(Locations), ]
接下来我们需要对探针进行处理,probe包含探针的位置和CpG位点是否与SNP重叠的信息,接下来我们删除性染色体上的探针以及与SNP重叠的探针:
data(SNPs.137CommonSingle) # SNP信息data(Islands.UCSC) # UCSC的CpG岛信息l <- Locations$chr %in% paste0("chr", 1:22) & is.na(SNPs.137CommonSingle$Probe_rs)# 提取chr1-chr22,也就是去除X和Y的性染色体,并过滤掉与SNP重叠的探针mat <- mat[l, ] # 重新提取探针矩阵# 获取探针位置的子集以及相应的CpG岛注释cgi <- Islands.UCSC$Relation_to_Island[l]loc <- Locations[l, ]
加下来我们还要对样本进行分组并且重命名:
# 样本分组mat1 <- as.matrix(mat[, grep("GBM", colnames(mat))]) # grep函数提取肿瘤样本的下标mat2 <- as.matrix(mat[, grep("CTRL", colnames(mat))]) # grep函数提取正常样本的下标colnames(mat1) <- gsub("GBM", "dkfz", colnames(mat1))# 表型数据在文章补充材料中,这里已经给大家下载好了,放在资料区,后台回复关键词获取# 将表型数据的行调整为与甲基化矩阵的列相同phenotype <- read.table("450K_annotation.txt", header = TRUE, sep = "\t", row.names = 1, check.names = FALSE, comment.char = "", stringsAsFactors = FALSE)phenotype <- phenotype[colnames(mat1), ]
提取肿瘤样品中甲基化程度最高的前8000个探针和相应信息:
ind <- order(rowVars(mat1, na.rm = TRUE), # rowVars函数计算行方差,并用order进行排序 decreasing = TRUE)[1:8000] m1 <- mat1[ind, ] # 提取肿瘤样本的探针信息m2 <- mat2[ind, ] # 提取正常样本的探针信息# 提取探针注释信息cgi2 <- cgi[ind] cgi2 <- ifelse(grepl("Shore", cgi2), "Shore", cgi2)cgi2 <- ifelse(grepl("Shelf", cgi2), "Shelf", cgi2)loc <- loc[ind, ]# 处理矩阵中的NA值m1[is.na(m1)] = 0.5m2[is.na(m2)] = 0.5
绘制图片
接下来我们开始绘图,首先给图例进行赋值,使用HeatmapAnnotation绘制图例,这里的图例包括age、Methylation Cluster、TCGA Methylation、TCGA Expression、Mutations、Cytogenetics等内容:
ha = HeatmapAnnotation( age = anno_points(phenotype[[13]], gp = gpar(col = ifelse(phenotype[[13]] > 20, "black", "red")), height = unit(3, "cm")), # 设置age注释 dkfz_cluster = phenotype[[1]], # 设置Methylation Cluster注释 tcga_cluster = phenotype[[2]], # 设置TCGA Methylation注释 tcga_expr = phenotype[[3]], # 设置TCGA Expression注释 IDH1 = phenotype[[5]], # 设置IDH1突变注释 H3F3A = phenotype[[4]], # 设置H3F3A突变注释 TP53 = phenotype[[6]], # 设置TP53突变注释 chr7_gain = ifelse(phenotype[[7]] == 1, "gain", "normal"), # 设置chr7_gain(CNV)注释 chr10_loss = ifelse(phenotype[[8]] == 1, "loss", "normal"), # 设置chr10_loss(CNV)注释 CDKN2A_del = ifelse(phenotype[[9]] == 1, "del", "normal"), # 设置CDKN2A_del(CNV)注释 EGFR_amp = ifelse(phenotype[[10]] == 1, "amp", "normal"), # 设置EGFR_amp(CNV)注释 PDGFRA_amp = ifelse(phenotype[[11]] == 1, "amp", "normal"), # 设置PDGFRA_amp(CNV)注释 col = list(dkfz_cluster = structure(names = c("IDH", "K27", "G34", "RTK I PDGFRA","Mesenchymal", "RTK II Classic"), brewer.pal(6, "Set3")), # 设置Methylation Cluster颜色 tcga_cluster = structure(names = c("G-CIMP+", "Cluster #2", "Cluster #3"), brewer.pal(3, "Set1")), # 设置TCGA Methylation颜色 tcga_expr = structure(names = c("Proneural", "Classical", "Mesenchymal"), c("#377EB8", "#FFFF33", "#FF7F00")), # 设置TCGA Expression颜色 IDH1 = structure(names = c("MUT", "WT", "G34R", "G34V", "K27M"), c("black", "white", "#4DAF4A", "#4DAF4A", "#377EB8")), # 将mutation合并为同一种颜色类型 H3F3A = structure(names = c("MUT", "WT", "G34R", "G34V", "K27M"), c("black", "white", "#4DAF4A", "#4DAF4A", "#377EB8")), TP53 = structure(names = c("MUT", "WT", "G34R", "G34V", "K27M"), c("black", "white", "#4DAF4A", "#4DAF4A", "#377EB8")), chr7_gain = c("gain" = "black", "normal" = "white"), # 将Cytogenetics合并为同一种颜色类型 chr10_loss = c("loss" = "black", "normal" = "white"), CDKN2A_del = c("del" = "black", "normal" = "white"), EGFR_amp = c("amp" = "black", "normal" = "white"), PDGFRA_amp = c("amp" = "black", "normal" = "white")), na_col = "grey", border = TRUE, show_legend = c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE), show_annotation_name = FALSE, annotation_legend_param = list( dkfz_cluster = list(title = "Methylation Cluster"), tcga_cluster = list(title = "TCGA Methylation"), tcga_expr = list(title = "TCGA Expression"), H3F3A = list(title = "Mutations"), chr7_gain = list(title = "Cytogenetics")) # 定义注释名)ha## A HeatmapAnnotation object with 12 annotations## name: heatmap_annotation_0 ## position: column ## items: 136 ## width: 1npc ## height: 88.8660578386606mm ## this object is subsetable## 5.13831111111111mm extension on the left ## ## name annotation_type color_mapping height## age anno_points() 30mm## dkfz_cluster discrete vector user-defined 5mm## tcga_cluster discrete vector user-defined 5mm## tcga_expr discrete vector user-defined 5mm## IDH1 discrete vector user-defined 5mm## H3F3A discrete vector user-defined 5mm## TP53 discrete vector user-defined 5mm## chr7_gain discrete vector user-defined 5mm## chr10_loss discrete vector user-defined 5mm## CDKN2A_del discrete vector user-defined 5mm## EGFR_amp discrete vector user-defined 5mm## PDGFRA_amp discrete vector user-defined 5mm
图例绘制完成后,开始绘制主体热图,主体热图有三个热图组合而成,用我们前面学习的方法,那就是分别绘制Heatmap然后用+组合起来:
col_fun = colorRamp2(c(0, 0.5, 1), c("navy", "white", "red")) # 设置颜色,低表达为navy,高表达为redht_list = Heatmap(m1, col = col_fun, name = "Methylation",clustering_distance_columns = "spearman",show_row_dend = FALSE, show_column_dend = FALSE,show_column_names = FALSE,bottom_annotation = ha, column_title = qq("GBM samples (n = @{ncol(m1)})"),row_split = factor(cgi2, levels = c("Island", "Shore", "Shelf", "OpenSea")),row_title_gp = gpar(col = "#FFFFFF00")) +Heatmap(m2, col = col_fun, show_column_names = FALSE,show_column_dend = FALSE, column_title = "Controls",show_heatmap_legend = FALSE, width = unit(1, "cm")) +Heatmap(cgi2, name = "CGI", show_row_names = FALSE, width = unit(5, "mm"),col = structure(names = c("Island", "Shore", "Shelf", "OpenSea"),c("red", "blue", "green", "#CCCCCC")))
好了,一个热图已经绘制完成,接下来可以使用draw函数进行调整和组合:
draw(ht_list,row_title = paste0("DNA methylation probes (n = ", nrow(m1), ")"),annotation_legend_side = "left",heatmap_legend_side = "left")
是不是感觉还差点意思?图例上面的修饰没跟上对吧?那我们给它补上:
annotation_titles = c(dkfz_cluster = "Methylation Cluster",tcga_cluster = "TCGA Methylation",tcga_expr = "TCGA Expression",IDH1 = "IDH1",H3F3A = "H3F3A",TP53 = "TP53",chr7_gain = "Chr7 gain",chr10_loss = "Chr10 loss",CDKN2A_del = "Chr10 loss",EGFR_amp = "EGFR amp",PDGFRA_amp = "PDGFRA amp")for(an in names(annotation_titles)) {decorate_annotation(an, {grid.text(annotation_titles[an], unit(-2, "mm"), just = "right")grid.rect(gp = gpar(fill = NA, col = "black"))})}decorate_annotation("age", {grid.text("Age", unit(8, "mm"), just = "right")grid.rect(gp = gpar(fill = NA, col = "black"))grid.lines(unit(c(0, 1), "npc"), unit(c(20, 20), "native"), gp = gpar(lty = 2))})decorate_annotation("IDH1", {grid.lines(unit(c(-40, 0), "mm"), unit(c(1, 1), "npc"))})decorate_annotation("chr7_gain", {grid.lines(unit(c(-40, 0), "mm"), unit(c(1, 1), "npc"))})
这样我们整个图片就复现完成啦,其实使用的方法都在我们前面的推文都讲解过了,这次复现就是把前面内容进行组合,并且使用自己现有的数据进行分析,如果你有自己的类似数据,可以直接更改数据套用代码,这样就可以直接出一张Cancer Cell的美图啦!

杂志推荐

只有符合以下条件的杂志才会被推荐:
  1. 至少位于3区及以上,影响因子在3+;
  2. 每年发文量至少在200以上;
  3. 网上可查审稿周期不超过2.5个月
  4. 未被“预警名单”收录(emmmmmm)
  5. 尽量推荐一些大家在其他地方看的比较少的期刊(知道的人越少,目前来说可能相对更大几率接收)
  6. (等着你们提)

本周推荐期刊:

期刊名:OncoImmunology(在“挑圈联靠”公众号回复杂志名可以获得如上图的期刊详细信息👆)
上榜理由:美国老牌期刊,中科院分区2区,JCR分区1区,年发文量251,近几年上面也有不少生信文章,如果有实验验证,中的概率会更大一点,给大家列举几篇看题目就能猜到大概率属于生信分析的文章:
  1. An integrated classifier improves prognostic accuracy in non-metastatic gastric cancer.  2020 Aug 30;9(1):1792038.doi: 10.1080/2162402X.2020.1792038.
  2. HLA class-I and class-II restricted neoantigen loads predict overall survival in breast cancer. 2020 Apr 1;9(1):1744947. doi: 10.1080/2162402X.2020.1744947.
  3. Identification of tumor-infiltrating immune cells and prognostic validation of tumor-infiltrating mast cells in adrenocortical carcinoma: results from bioinformatics and real-world data. 2020 Jun 23;9(1):1784529. doi: 10.1080/2162402X.2020.1784529.
好了,后台回复“feng31”获得本次代码和绘图数据,别忘了引用ComplexHeatmap的文献哦!今天的内容就到这里,我们下次再见吧!*^_^*
我是风,解螺旋先锋班学员,翡翠讲师,挑圈联靠专栏作者,训练营助教。2020过得真快,新的一年已经到来,我加入先锋班大家庭也一年多啦,这一年从解螺旋各位老师身上学习了很多,也有了很多收获和很多珍贵的回忆,希望新的一年能够再接再厉,继续成长,也希望能够在不断输入的同时,也有更好的输出来回馈解螺旋!mua~
END

撰文丨风   风
排版丨四金兄
值班 | 风间琉璃
主编丨小雪球
欢迎大家关注解螺旋生信频道-挑圈联靠公号~
继续阅读
阅读原文