高分文章不讲xx，竟然用这种图片

学会ComplexHeatmap包的组合技用法上

大家好，我是风。这里是风风的CNS美图复现系列专栏。如果你能跟到这里，那你肯定是真爱无疑了，不知道有没用上ComplexHeatmap给你文章加上一两个图呢？再次感谢ComplexHeatmap包开发者顾博士，同时提醒大家，如果你使用了这个包，请引用文章：“Gu, Z. (2016) Complex heatmaps reveal patterns and correlations in multidimensional genomic data. DOI: 10.1093/bioinformatics/btw313”。
传送门
手把手教你复现一篇CNS级别美图！附代码，建议收藏！
CNS级别的美图复现起来难不难？小白也能快速上手（附代码）（二）
小白课堂 | 学个R包，复现一张CNS级别美图（附代码）
学一个R包给文章加一张CNS级别的美图（四）
学一个R包给文章加一张CNS级别的美图（五）
学一个R包给文章加一张CNS级别的美图（六）
拆和叠？原来你和高阶论文的图片就差这点
可繁可简，可盐可甜，这才是upset的真面目吧！

基础技能我们基本都学完了，今天我们继续学习一个组合技：用热图展示甲基化、表达量、基因组特征之间的关系，以期能够找到三者之间的潜在关系。如果你想方便一点，可以把数据整理成示例文件的格式，然后直接复制粘贴运行就等出图即可，当然最好还是能够综合前面学习的内容一步一步进行啦！

请先记住这张结果图，接着我们就要对它进行复现：

好，话不多说，我们马上开始！

加载数据

画图第一步，当然是熟悉我们数据的特征啦，先把数据导入进来：

load("Meth.rda")type <- res_list$typehead(type)## [1] "Tumor" "Tumor" "Tumor" "Tumor" "Tumor" "Tumor"mat_meth <- res_list$mat_methhead(mat_meth)## sample1 sample2 sample3 sample4 sample5 sample6 sample7## [1,] 0.09508993 0.10253955 0.1242478 0.2108422 0.1147976 0.1191285 0.14215247## [2,] 0.17894877 0.20436340 0.3663206 0.2851402 0.1415034 0.2746609 0.24187039## [3,] 0.19883432 0.09013638 0.1634693 0.1049659 0.1783066 0.2483717 0.09150014## [4,] 0.33309078 0.22795393 0.1966336 0.2611358 0.4468088 0.5024891 0.32202946## [5,] 0.37314951 0.37459873 0.4272505 0.3935484 0.4477683 0.2472484 0.45435208## [6,] 0.16395173 0.23372889 0.1180172 0.2565564 0.1958752 0.1374908 0.16019662## sample8 sample9 sample10 sample11 sample12 sample13 sample14## [1,] 0.1455233 0.1238175 0.2356176 0.4242254 0.4987964 0.4239122 0.3477856## [2,] 0.2639306 0.3162204 0.2806442 0.2645001 0.5272360 0.5437152 0.3818492## [3,] 0.1890816 0.1600013 0.1757942 0.2148016 0.3090826 0.5236685 0.2294759## [4,] 0.5213153 0.4827805 0.2993246 0.2261042 0.1588585 0.1133573 0.1050069## [5,] 0.4223604 0.4816269 0.3474320 0.3985611 0.5330987 0.5689659 0.5208801## [6,] 0.2295001 0.1835948 0.1855747 0.2443270 0.4221594 0.4097455 0.3539166## sample15 sample16 sample17 sample18 sample19 sample20## [1,] 0.4225247 0.3855191 0.4825476 0.4302995 0.3766141 0.2590889## [2,] 0.4944915 0.3471979 0.4010059 0.2298062 0.4378647 0.3602383## [3,] 0.3303636 0.3470328 0.4227933 0.3855401 0.3916592 0.4209159## [4,] 0.1308567 0.1549738 0.1158741 0.2261024 0.1545304 0.2265255## [5,] 0.7322305 0.4907163 0.4421528 0.6781560 0.4727099 0.5861254## [6,] 0.3036062 0.4667574 0.3621896 0.3737679 0.4760492 0.3609025mat_expr <- res_list$mat_exprhead(mat_meth)## sample1 sample2 sample3 sample4 sample5 sample6 sample7## [1,] 0.09508993 0.10253955 0.1242478 0.2108422 0.1147976 0.1191285 0.14215247## [2,] 0.17894877 0.20436340 0.3663206 0.2851402 0.1415034 0.2746609 0.24187039## [3,] 0.19883432 0.09013638 0.1634693 0.1049659 0.1783066 0.2483717 0.09150014## [4,] 0.33309078 0.22795393 0.1966336 0.2611358 0.4468088 0.5024891 0.32202946## [5,] 0.37314951 0.37459873 0.4272505 0.3935484 0.4477683 0.2472484 0.45435208## [6,] 0.16395173 0.23372889 0.1180172 0.2565564 0.1958752 0.1374908 0.16019662## sample8 sample9 sample10 sample11 sample12 sample13 sample14## [1,] 0.1455233 0.1238175 0.2356176 0.4242254 0.4987964 0.4239122 0.3477856## [2,] 0.2639306 0.3162204 0.2806442 0.2645001 0.5272360 0.5437152 0.3818492## [3,] 0.1890816 0.1600013 0.1757942 0.2148016 0.3090826 0.5236685 0.2294759## [4,] 0.5213153 0.4827805 0.2993246 0.2261042 0.1588585 0.1133573 0.1050069## [5,] 0.4223604 0.4816269 0.3474320 0.3985611 0.5330987 0.5689659 0.5208801## [6,] 0.2295001 0.1835948 0.1855747 0.2443270 0.4221594 0.4097455 0.3539166## sample15 sample16 sample17 sample18 sample19 sample20## [1,] 0.4225247 0.3855191 0.4825476 0.4302995 0.3766141 0.2590889## [2,] 0.4944915 0.3471979 0.4010059 0.2298062 0.4378647 0.3602383## [3,] 0.3303636 0.3470328 0.4227933 0.3855401 0.3916592 0.4209159## [4,] 0.1308567 0.1549738 0.1158741 0.2261024 0.1545304 0.2265255## [5,] 0.7322305 0.4907163 0.4421528 0.6781560 0.4727099 0.5861254## [6,] 0.3036062 0.4667574 0.3621896 0.3737679 0.4760492 0.3609025direction <- res_list$directionhead(direction)## [1] "hypo" "hypo" "hypo" "hyper" "hypo" "hypo"cor_pvalue <- res_list$cor_pvaluehead(cor_pvalue)## [1] 0.65603378 0.57822828 0.34949399 0.02867702 1.23879545 0.10946518gene_type <- res_list$gene_typehead(gene_type)## [1] "protein_coding" "psedo-gene" "psedo-gene" "lincRNA" ## [5] "others" "psedo-gene"anno_gene <- res_list$anno_genehead(anno_gene)## [1] "intergenic" "intergenic" "intergenic" "intergenic" "intergenic"## [6] "TSS"dist <- res_list$disthead(dist)## [1] 388737 97280 36426 420011 355666 317548anno_enhancer <- res_list$anno_enhancerhead(anno_enhancer)## enhancer_1 enhancer_2 enhancer_3## [1,] 0.07196473 0.0000000 0.2879309## [2,] 0.75837107 0.3554948 0.0000000## [3,] 0.00000000 0.8477110 0.0000000## [4,] 0.00000000 0.0000000 0.0000000## [5,] 0.57110298 0.0000000 0.9173112## [6,] 0.00000000 0.0000000 0.6795756

对数据进行解释：加载的res_list数据总共有9个list：

type是样本类型，标注肿瘤或者正常样本；

mat_meth是甲基化水平矩阵，其中行是差异甲基化区域(DMR),矩阵中的值是每个样品中DMR中的平均甲基化水平；

mat_expr是基因表达矩阵，其中行对应与DMR相关的基因(即离DMR最近的基因),矩阵中的值是每个样本中每个基因的表达水平；

direction表示甲基化趋势，在肿瘤中是高甲基化还是低甲基化；

cor_pvalue表示甲基化水平与相关基因表达量的相关性检验p值；

gene_type表示基因类型，是mRNA或者lncRNA；

anno_gene表示基因功能区域，是intergenic, intragenic 或者 TSS；

dist表示相关基因从DMR到TSS的距离；anno_enhancer表示部分DMR与增强子重叠。

好了，了解了数据的构成，我们就可以开始画图了。

绘制图片

接下来我们开始绘图，首先计算甲基化矩阵的列的聚类，以便调整表达矩阵中的列，使得和甲基化矩阵中的列具有相同的列顺序：

column_tree <- hclust(dist(t(mat_meth))) head(column_tree)## $merge## [,1] [,2]## [1,] -5 -7## [2,] -3 -9## [3,] -1 -2## [4,] -4 -10## [5,] -6 1## [6,] 2 3## [7,] -8 5## [8,] 4 6## [9,] 7 8## [10,] -12 -18## [11,] -13 -14## [12,] -16 -17## [13,] -11 -20## [14,] -15 -19## [15,] 10 14## [16,] 12 13## [17,] 11 16## [18,] 15 17## [19,] 9 18## ## $height## [1] 3.900199 3.901578 3.902258 3.910749 3.967042 3.968184 4.031672 4.167456## [9] 4.211256 4.758352 4.764540 4.788488 4.790655 4.798932 4.939085 4.961162## [17] 5.065518 5.141802 8.025530## ## $order## [1] 8 6 5 7 4 10 3 9 1 2 12 18 15 19 13 14 16 17 11 20## ## $labels## [1] "sample1" "sample2" "sample3" "sample4" "sample5" "sample6" ## [7] "sample7" "sample8" "sample9" "sample10" "sample11" "sample12"## [13] "sample13" "sample14" "sample15" "sample16" "sample17" "sample18"## [19] "sample19" "sample20"## ## $method## [1] "complete"## ## $call## hclust(d = dist(t(mat_meth)))column_order <- column_tree$order # 提取顺序的向量

接着我们先定义一些颜色，方便在后续绘图可以直接设置，让代码更加简洁：

library(RColorBrewer)library(circlize)# 定义颜色meth_col_fun <- colorRamp2(c(0, 0.5, 1), c("blue", "white", "red"))direction_col <- c("hyper" = "red", "hypo" = "blue")expr_col_fun <- colorRamp2(c(-2, 0, 2), c("green", "white", "red"))pvalue_col_fun <- colorRamp2(c(0, 2, 4), c("white", "blue", "red"))gene_type_col <- structure(brewer.pal(length(unique(gene_type)), "Set3"), names = unique(gene_type))anno_gene_col <- structure(brewer.pal(length(unique(anno_gene)), "Set1"), names = unique(anno_gene))dist_col_fun <- colorRamp2(c(0, 10000), c("black", "white"))enhancer_col_fun <- colorRamp2(c(0, 1), c("white", "orange"))

# 这一步目的是对前面每一个参数的颜色进行定义，让图片更加美观

我们来分析一下图片，图片是由两个热图主体加上注释合并在一起绘制而成，按照我们“拆”跟“叠”的二字法则，我们先把注释内容用HeatmapAnnotation整理出来：

library (ComplexHeatmap)ha1 <- HeatmapAnnotation(type = type, col = list(type = c("Tumor" = "red", "Control" = "royalblue")), annotation_name_side = "left") # 第一个热图主体的样本注释ha2 <- HeatmapAnnotation(type = type, col = list(type = c("Tumor" = "red", "Control" = "royalblue")), show_legend = FALSE) # 第二个热图主体的样本注释

弄好注释之后，我们先来画第一个热图：

第一个热图画好之后，我们可以同样的方法再画右边的热图，并且将两个图形合并在一起：

Heatmap(mat_meth,# 使用甲基化矩阵进行绘图，其余参数都为上面注释好的内容name = "methylation", col = meth_col_fun,column_order= column_order,top_annotation = ha1, column_title = "Methylation") +Heatmap(mat_expr[,column_tree$order], # 调整矩阵列的顺序与前面第一个矩阵一致name = "expression", col = expr_col_fun, column_order = column_order, top_annotation = ha2, column_title = "Expression")

两个热图主体中间有一个direction，我们把它加上去：（由于direction在两个主体热图中间，因此我们需要调整顺序，也就是甲基化热图+direction+表达量热图）

这时候的热图基本已经完成，但是还差一些各种各样的注释，我们一个一个加上去,怎么加呢：我们先看只有一个注释时候怎么画：

Heatmap(cor_pvalue, name = "-log10(cor_p)", col = pvalue_col_fun) # 绘制p值

发现没有？还是按照画热图的方法，将数据看成是只有一列的矩阵就好，那如果想画所有的注释是不是只需要像上面一样不断用+呢？没错，聪明的孩子有漂亮图出，我们来画一下所有注释：

Heatmap(cor_pvalue, name = "-log10(cor_p)", col = pvalue_col_fun) + Heatmap(gene_type, name = "gene type", col = gene_type_col) + Heatmap(anno_gene, name = "anno_gene", col = anno_gene_col) + Heatmap(dist, name = "dist_tss", col = dist_col_fun) + Heatmap(anno_enhancer, name = "anno_enhancer", col = enhancer_col_fun, cluster_columns = FALSE, column_title = "Enhancer")

# 颜色都是我们上面定义好的结果，可以自行更换颜色

我们把上面两部分合并起来看看效果：

Heatmap(mat_meth, name = "methylation", col = meth_col_fun, column_order= column_order, top_annotation = ha1, column_title = "Methylation") + Heatmap(direction, name = "direction", col = direction_col) + Heatmap(mat_expr[, column_tree$order], name = "expression", col = expr_col_fun, column_order = column_order, top_annotation = ha2, column_title = "Expression") + Heatmap(cor_pvalue, name = "-log10(cor_p)", col = pvalue_col_fun) + Heatmap(gene_type, name = "gene type", col = gene_type_col) + Heatmap(anno_gene, name = "anno_gene", col = anno_gene_col) + Heatmap(dist, name = "dist_tss", col = dist_col_fun) + Heatmap(anno_enhancer, name = "anno_enhancer", col = enhancer_col_fun, cluster_columns = FALSE, column_title = "Enhancer")

这种图形其实就是不断画热图然后叠加在一起就好了，除了上面这些对象，你还可以不断继续叠加内容，甚至比如单个基因的logFC值，这样一个图片展示的内容就更多了，等等等等，是不是还差点什么？还没划分聚类？好吧好吧，那我们就来加上去，咋搞呢？我们这个系列第一次推文给大家介绍了draw函数还记得吗？我知道你们不记得了，没关系，很简单，我们先把上面的结果赋值为ht_list：

ht_list <-Heatmap(mat_meth, name = "methylation", col = meth_col_fun,column_order= column_order,top_annotation = ha1,column_title = "Methylation") +Heatmap(direction, name = "direction", col = direction_col) +Heatmap(mat_expr[, column_tree$order], name = "expression", col = expr_col_fun,column_order = column_order,top_annotation = ha2,column_title = "Expression") +Heatmap(cor_pvalue, name = "-log10(cor_p)", col = pvalue_col_fun) +Heatmap(gene_type, name = "gene type", col = gene_type_col) +Heatmap(anno_gene, name = "anno_gene", col = anno_gene_col) +Heatmap(dist, name = "dist_tss", col = dist_col_fun) +Heatmap(anno_enhancer, name = "anno_enhancer", col = enhancer_col_fun,cluster_columns = FALSE,column_title = "Enhancer")

然后利用draw函数绘图并且设置聚类：

draw(ht_list,row_km = 2, row_split = direction,column_title = "Comprehensive correspondence between methylation, expression and other genomic features", column_title_gp = gpar(fontsize = 12, fontface = "bold"), merge_legends = TRUE, heatmap_legend_side = "bottom")

这样我们整个图片就大功告成，最开始的图片就复现完成啦！

好了，今天的内容先到这里，后台回复“风29”就能获得本次推文所有数据和代码,这只是甲基化的上篇，我们下次将使用Cancer Cell杂志上面一篇文章的数据复现原文一张图片，敬请期待！我们下次再见吧！*^_^*

—

END

—

撰文丨风风

排版丨四金兄

值班 | 阿琛

主编丨小雪球

欢迎大家关注解螺旋生信频道-挑圈联靠公号~

继续阅读

阅读原文