学会ComplexHeatmap包的组合技用法 上
大家好,我是风。这里是风风的CNS美图复现系列专栏。如果你能跟到这里,那你肯定是真爱无疑了,不知道有没用上ComplexHeatmap给你文章加上一两个图呢?再次感谢ComplexHeatmap包开发者顾博士,同时提醒大家,如果你使用了这个包,请引用文章:“Gu, Z. (2016) Complex heatmaps reveal patterns and correlations in multidimensional genomic data. DOI: 10.1093/bioinformatics/btw313”。
传送门
基础技能我们基本都学完了,今天我们继续学习一个组合技:用热图展示甲基化、表达量、基因组特征之间的关系,以期能够找到三者之间的潜在关系。如果你想方便一点,可以把数据整理成示例文件的格式,然后直接复制粘贴运行就等出图即可,当然最好还是能够综合前面学习的内容一步一步进行啦!
请先记住这张结果图,接着我们就要对它进行复现:


好,话不多说,我们马上开始!
加载数据
画图第一步,当然是熟悉我们数据的特征啦,先把数据导入进来:
load("Meth.rda")type <- res_list$typehead(type)## [1] "Tumor" "Tumor" "Tumor" "Tumor" "Tumor" "Tumor"mat_meth <- res_list$mat_methhead(mat_meth)## sample1 sample2 sample3 sample4 sample5 sample6 sample7## [1,] 0.09508993 0.10253955 0.1242478 0.2108422 0.1147976 0.1191285 0.14215247## [2,] 0.17894877 0.20436340 0.3663206 0.2851402 0.1415034 0.2746609 0.24187039## [3,] 0.19883432 0.09013638 0.1634693 0.1049659 0.1783066 0.2483717 0.09150014## [4,] 0.33309078 0.22795393 0.1966336 0.2611358 0.4468088 0.5024891 0.32202946## [5,] 0.37314951 0.37459873 0.4272505 0.3935484 0.4477683 0.2472484 0.45435208## [6,] 0.16395173 0.23372889 0.1180172 0.2565564 0.1958752 0.1374908 0.16019662## sample8 sample9 sample10 sample11 sample12 sample13 sample14## [1,] 0.1455233 0.1238175 0.2356176 0.4242254 0.4987964 0.4239122 0.3477856## [2,] 0.2639306 0.3162204 0.2806442 0.2645001 0.5272360 0.5437152 0.3818492## [3,] 0.1890816 0.1600013 0.1757942 0.2148016 0.3090826 0.5236685 0.2294759## [4,] 0.5213153 0.4827805 0.2993246 0.2261042 0.1588585 0.1133573 0.1050069## [5,] 0.4223604 0.4816269 0.3474320 0.3985611 0.5330987 0.5689659 0.5208801## [6,] 0.2295001 0.1835948 0.1855747 0.2443270 0.4221594 0.4097455 0.3539166## sample15 sample16 sample17 sample18 sample19 sample20## [1,] 0.4225247 0.3855191 0.4825476 0.4302995 0.3766141 0.2590889## [2,] 0.4944915 0.3471979 0.4010059 0.2298062 0.4378647 0.3602383## [3,] 0.3303636 0.3470328 0.4227933 0.3855401 0.3916592 0.4209159## [4,] 0.1308567 0.1549738 0.1158741 0.2261024 0.1545304 0.2265255## [5,] 0.7322305 0.4907163 0.4421528 0.6781560 0.4727099 0.5861254## [6,] 0.3036062 0.4667574 0.3621896 0.3737679 0.4760492 0.3609025mat_expr <- res_list$mat_exprhead(mat_meth)## sample1 sample2 sample3 sample4 sample5 sample6 sample7## [1,] 0.09508993 0.10253955 0.1242478 0.2108422 0.1147976 0.1191285 0.14215247## [2,] 0.17894877 0.20436340 0.3663206 0.2851402 0.1415034 0.2746609 0.24187039## [3,] 0.19883432 0.09013638 0.1634693 0.1049659 0.1783066 0.2483717 0.09150014## [4,] 0.33309078 0.22795393 0.1966336 0.2611358 0.4468088 0.5024891 0.32202946## [5,] 0.37314951 0.37459873 0.4272505 0.3935484 0.4477683 0.2472484 0.45435208## [6,] 0.16395173 0.23372889 0.1180172 0.2565564 0.1958752 0.1374908 0.16019662## sample8 sample9 sample10 sample11 sample12 sample13 sample14## [1,] 0.1455233 0.1238175 0.2356176 0.4242254 0.4987964 0.4239122 0.3477856## [2,] 0.2639306 0.3162204 0.2806442 0.2645001 0.5272360 0.5437152 0.3818492## [3,] 0.1890816 0.1600013 0.1757942 0.2148016 0.3090826 0.5236685 0.2294759## [4,] 0.5213153 0.4827805 0.2993246 0.2261042 0.1588585 0.1133573 0.1050069## [5,] 0.4223604 0.4816269 0.3474320 0.3985611 0.5330987 0.5689659 0.5208801## [6,] 0.2295001 0.1835948 0.1855747 0.2443270 0.4221594 0.4097455 0.3539166## sample15 sample16 sample17 sample18 sample19 sample20## [1,] 0.4225247 0.3855191 0.4825476 0.4302995 0.3766141 0.2590889## [2,] 0.4944915 0.3471979 0.4010059 0.2298062 0.4378647 0.3602383## [3,] 0.3303636 0.3470328 0.4227933 0.3855401 0.3916592 0.4209159## [4,] 0.1308567 0.1549738 0.1158741 0.2261024 0.1545304 0.2265255## [5,] 0.7322305 0.4907163 0.4421528 0.6781560 0.4727099 0.5861254## [6,] 0.3036062 0.4667574 0.3621896 0.3737679 0.4760492 0.3609025direction <- res_list$directionhead(direction)## [1] "hypo" "hypo" "hypo" "hyper" "hypo" "hypo"cor_pvalue <- res_list$cor_pvaluehead(cor_pvalue)## [1] 0.65603378 0.57822828 0.34949399 0.02867702 1.23879545 0.10946518gene_type <- res_list$gene_typehead(gene_type)## [1] "protein_coding" "psedo-gene" "psedo-gene" "lincRNA" ## [5] "others" "psedo-gene"anno_gene <- res_list$anno_genehead(anno_gene)## [1] "intergenic" "intergenic" "intergenic" "intergenic" "intergenic"## [6] "TSS"dist <- res_list$disthead(dist)## [1] 388737 97280 36426 420011 355666 317548anno_enhancer <- res_list$anno_enhancerhead(anno_enhancer)## enhancer_1 enhancer_2 enhancer_3## [1,] 0.07196473 0.0000000 0.2879309## [2,] 0.75837107 0.3554948 0.0000000## [3,] 0.00000000 0.8477110 0.0000000## [4,] 0.00000000 0.0000000 0.0000000## [5,] 0.57110298 0.0000000 0.9173112## [6,] 0.00000000 0.0000000 0.6795756
对数据进行解释:加载的res_list数据总共有9个list:
type是样本类型,标注肿瘤或者正常样本;
mat_meth是甲基化水平矩阵,其中行是差异甲基化区域(DMR),矩阵中的值是每个样品中DMR中的平均甲基化水平;
mat_expr是基因表达矩阵,其中行对应与DMR相关的基因(即离DMR最近的基因),矩阵中的值是每个样本中每个基因的表达水平;
direction表示甲基化趋势,在肿瘤中是高甲基化还是低甲基化;
cor_pvalue表示甲基化水平与相关基因表达量的相关性检验p值;
gene_type表示基因类型,是mRNA或者lncRNA;
anno_gene表示基因功能区域,是intergenic, intragenic 或者 TSS;
dist表示相关基因从DMR到TSS的距离;anno_enhancer表示部分DMR与增强子重叠。
好了,了解了数据的构成,我们就可以开始画图了。
绘制图片
接下来我们开始绘图,首先计算甲基化矩阵的列的聚类,以便调整表达矩阵中的列,使得和甲基化矩阵中的列具有相同的列顺序:
column_tree <- hclust(dist(t(mat_meth))) head(column_tree)## $merge## [,1] [,2]## [1,] -5 -7## [2,] -3 -9## [3,] -1 -2## [4,] -4 -10## [5,] -6 1## [6,] 2 3## [7,] -8 5## [8,] 4 6## [9,] 7 8## [10,] -12 -18## [11,] -13 -14## [12,] -16 -17## [13,] -11 -20## [14,] -15 -19## [15,] 10 14## [16,] 12 13## [17,] 11 16## [18,] 15 17## [19,] 9 18## ## $height## [1] 3.900199 3.901578 3.902258 3.910749 3.967042 3.968184 4.031672 4.167456## [9] 4.211256 4.758352 4.764540 4.788488 4.790655 4.798932 4.939085 4.961162## [17] 5.065518 5.141802 8.025530## ## $order## [1] 8 6 5 7 4 10 3 9 1 2 12 18 15 19 13 14 16 17 11 20## ## $labels## [1] "sample1" "sample2" "sample3" "sample4" "sample5" "sample6" ## [7] "sample7" "sample8" "sample9" "sample10" "sample11" "sample12"## [13] "sample13" "sample14" "sample15" "sample16" "sample17" "sample18"## [19] "sample19" "sample20"## ## $method## [1] "complete"## ## $call## hclust(d = dist(t(mat_meth)))column_order <- column_tree$order # 提取顺序的向量
接着我们先定义一些颜色,方便在后续绘图可以直接设置,让代码更加简洁:
library(RColorBrewer)library(circlize)# 定义颜色meth_col_fun <- colorRamp2(c(0, 0.5, 1), c("blue", "white", "red"))direction_col <- c("hyper" = "red", "hypo" = "blue")expr_col_fun <- colorRamp2(c(-2, 0, 2), c("green", "white", "red"))pvalue_col_fun <- colorRamp2(c(0, 2, 4), c("white", "blue", "red"))gene_type_col <- structure(brewer.pal(length(unique(gene_type)), "Set3"), names = unique(gene_type))anno_gene_col <- structure(brewer.pal(length(unique(anno_gene)), "Set1"), names = unique(anno_gene))dist_col_fun <- colorRamp2(c(0, 10000), c("black", "white"))enhancer_col_fun <- colorRamp2(c(0, 1), c("white", "orange"))
# 这一步目的是对前面每一个参数的颜色进行定义,让图片更加美观我们来分析一下图片,图片是由两个热图主体加上注释合并在一起绘制而成,按照我们“拆”跟“叠”的二字法则,我们先把注释内容用HeatmapAnnotation整理出来:library (ComplexHeatmap)ha1 <- HeatmapAnnotation(type = type, col = list(type = c("Tumor" = "red", "Control" = "royalblue")), annotation_name_side = "left") # 第一个热图主体的样本注释ha2 <- HeatmapAnnotation(type = type, col = list(type = c("Tumor" = "red", "Control" = "royalblue")), show_legend = FALSE) # 第二个热图主体的样本注释
弄好注释之后,我们先来画第一个热图:
Heatmap(mat_meth,# 使用甲基化矩阵进行绘图,其余参数都为上面注释好的内容name = "methylation", col = meth_col_fun,column_order= column_order,top_annotation = ha1, column_title = "Methylation")
第一个热图画好之后,我们可以同样的方法再画右边的热图,并且将两个图形合并在一起:
Heatmap(mat_meth,# 使用甲基化矩阵进行绘图,其余参数都为上面注释好的内容name = "methylation", col = meth_col_fun,column_order= column_order,top_annotation = ha1, column_title = "Methylation") +Heatmap(mat_expr[,column_tree$order], # 调整矩阵列的顺序与前面第一个矩阵一致name = "expression", col = expr_col_fun, column_order = column_order, top_annotation = ha2, column_title = "Expression")
两个热图主体中间有一个direction,我们把它加上去:(由于direction在两个主体热图中间,因此我们需要调整顺序,也就是 甲基化热图+direction+表达量热图)
Heatmap(mat_meth, name = "methylation", col = meth_col_fun, column_order= column_order, top_annotation = ha1, column_title = "Methylation") + Heatmap(direction, name = "direction", col = direction_col) + # 加上direction热图 Heatmap(mat_expr[, column_tree$order], name = "expression", col = expr_col_fun, column_order = column_order, top_annotation = ha2, column_title = "Expression")
这时候的热图基本已经完成,但是还差一些各种各样的注释,我们一个一个加上去,怎么加呢:我们先看只有一个注释时候怎么画:
Heatmap(cor_pvalue, name = "-log10(cor_p)", col = pvalue_col_fun) # 绘制p值
发现没有?还是按照画热图的方法,将数据看成是只有一列的矩阵就好,那如果想画所有的注释是不是只需要像上面一样不断用+呢?没错,聪明的孩子有漂亮图出,我们来画一下所有注释:
Heatmap(cor_pvalue, name = "-log10(cor_p)", col = pvalue_col_fun) + Heatmap(gene_type, name = "gene type", col = gene_type_col) + Heatmap(anno_gene, name = "anno_gene", col = anno_gene_col) + Heatmap(dist, name = "dist_tss", col = dist_col_fun) + Heatmap(anno_enhancer, name = "anno_enhancer", col = enhancer_col_fun, cluster_columns = FALSE, column_title = "Enhancer")
# 颜色都是我们上面定义好的结果,可以自行更换颜色
我们把上面两部分合并起来看看效果:
Heatmap(mat_meth, name = "methylation", col = meth_col_fun, column_order= column_order, top_annotation = ha1, column_title = "Methylation") + Heatmap(direction, name = "direction", col = direction_col) + Heatmap(mat_expr[, column_tree$order], name = "expression", col = expr_col_fun, column_order = column_order, top_annotation = ha2, column_title = "Expression") + Heatmap(cor_pvalue, name = "-log10(cor_p)", col = pvalue_col_fun) + Heatmap(gene_type, name = "gene type", col = gene_type_col) + Heatmap(anno_gene, name = "anno_gene", col = anno_gene_col) + Heatmap(dist, name = "dist_tss", col = dist_col_fun) + Heatmap(anno_enhancer, name = "anno_enhancer", col = enhancer_col_fun, cluster_columns = FALSE, column_title = "Enhancer")
这种图形其实就是不断画热图然后叠加在一起就好了,除了上面这些对象,你还可以不断继续叠加内容,甚至比如单个基因的logFC值,这样一个图片展示的内容就更多了,等等等等,是不是还差点什么?还没划分聚类?好吧好吧,那我们就来加上去,咋搞呢?我们这个系列第一次推文给大家介绍了draw函数还记得吗?我知道你们不记得了,没关系,很简单,我们先把上面的结果赋值为ht_list:
ht_list <-Heatmap(mat_meth, name = "methylation", col = meth_col_fun,column_order= column_order,top_annotation = ha1,column_title = "Methylation") +Heatmap(direction, name = "direction", col = direction_col) +Heatmap(mat_expr[, column_tree$order], name = "expression", col = expr_col_fun,column_order = column_order,top_annotation = ha2,column_title = "Expression") +Heatmap(cor_pvalue, name = "-log10(cor_p)", col = pvalue_col_fun) +Heatmap(gene_type, name = "gene type", col = gene_type_col) +Heatmap(anno_gene, name = "anno_gene", col = anno_gene_col) +Heatmap(dist, name = "dist_tss", col = dist_col_fun) +Heatmap(anno_enhancer, name = "anno_enhancer", col = enhancer_col_fun,cluster_columns = FALSE,column_title = "Enhancer")
然后利用draw函数绘图并且设置聚类:
draw(ht_list,row_km = 2, row_split = direction,column_title = "Comprehensive correspondence between methylation, expression and other genomic features", column_title_gp = gpar(fontsize = 12, fontface = "bold"), merge_legends = TRUE, heatmap_legend_side = "bottom")
这样我们整个图片就大功告成,最开始的图片就复现完成啦!
好了,今天的内容先到这里,后台回复“风29”就能获得本次推文所有数据和代码,这只是甲基化的上篇,我们下次将使用Cancer Cell杂志上面一篇文章的数据复现原文一张图片,敬请期待!我们下次再见吧!*^_^*
END

撰文丨风   风
排版丨四金兄
值班 | 阿   琛
主编丨小雪球
欢迎大家关注解螺旋生信频道-挑圈联靠公号~
继续阅读
阅读原文