高分文章不讲xx,竟然用这种图片
学会ComplexHeatmap包的组合技用法 上
大家好,我是风。这里是风风的CNS美图复现系列专栏。如果你能跟到这里,那你肯定是真爱无疑了,不知道有没用上ComplexHeatmap给你文章加上一两个图呢?再次感谢ComplexHeatmap包开发者顾博士,同时提醒大家,如果你使用了这个包,请引用文章:“Gu, Z. (2016) Complex heatmaps reveal patterns and correlations in multidimensional genomic data. DOI: 10.1093/bioinformatics/btw313”。 传送门
基础技能我们基本都学完了,今天我们继续学习一个组合技:用热图展示甲基化、表达量、基因组特征之间的关系,以期能够找到三者之间的潜在关系。如果你想方便一点,可以把数据整理成示例文件的格式,然后直接复制粘贴运行就等出图即可,当然最好还是能够综合前面学习的内容一步一步进行啦!
请先记住这张结果图,接着我们就要对它进行复现:
好,话不多说,我们马上开始!
画图第一步,当然是熟悉我们数据的特征啦,先把数据导入进来:
load("Meth.rda")
type <- res_list$type
head(type)
# [1] "Tumor" "Tumor" "Tumor" "Tumor" "Tumor" "Tumor"
mat_meth <- res_list$mat_meth
head(mat_meth)
# sample1 sample2 sample3 sample4 sample5 sample6 sample7
# [1,] 0.09508993 0.10253955 0.1242478 0.2108422 0.1147976 0.1191285 0.14215247
# [2,] 0.17894877 0.20436340 0.3663206 0.2851402 0.1415034 0.2746609 0.24187039
# [3,] 0.19883432 0.09013638 0.1634693 0.1049659 0.1783066 0.2483717 0.09150014
# [4,] 0.33309078 0.22795393 0.1966336 0.2611358 0.4468088 0.5024891 0.32202946
# [5,] 0.37314951 0.37459873 0.4272505 0.3935484 0.4477683 0.2472484 0.45435208
# [6,] 0.16395173 0.23372889 0.1180172 0.2565564 0.1958752 0.1374908 0.16019662
# sample8 sample9 sample10 sample11 sample12 sample13 sample14
# [1,] 0.1455233 0.1238175 0.2356176 0.4242254 0.4987964 0.4239122 0.3477856
# [2,] 0.2639306 0.3162204 0.2806442 0.2645001 0.5272360 0.5437152 0.3818492
# [3,] 0.1890816 0.1600013 0.1757942 0.2148016 0.3090826 0.5236685 0.2294759
# [4,] 0.5213153 0.4827805 0.2993246 0.2261042 0.1588585 0.1133573 0.1050069
# [5,] 0.4223604 0.4816269 0.3474320 0.3985611 0.5330987 0.5689659 0.5208801
# [6,] 0.2295001 0.1835948 0.1855747 0.2443270 0.4221594 0.4097455 0.3539166
# sample15 sample16 sample17 sample18 sample19 sample20
# [1,] 0.4225247 0.3855191 0.4825476 0.4302995 0.3766141 0.2590889
# [2,] 0.4944915 0.3471979 0.4010059 0.2298062 0.4378647 0.3602383
# [3,] 0.3303636 0.3470328 0.4227933 0.3855401 0.3916592 0.4209159
# [4,] 0.1308567 0.1549738 0.1158741 0.2261024 0.1545304 0.2265255
# [5,] 0.7322305 0.4907163 0.4421528 0.6781560 0.4727099 0.5861254
# [6,] 0.3036062 0.4667574 0.3621896 0.3737679 0.4760492 0.3609025
mat_expr <- res_list$mat_expr
head(mat_meth)
# sample1 sample2 sample3 sample4 sample5 sample6 sample7
# [1,] 0.09508993 0.10253955 0.1242478 0.2108422 0.1147976 0.1191285 0.14215247
# [2,] 0.17894877 0.20436340 0.3663206 0.2851402 0.1415034 0.2746609 0.24187039
# [3,] 0.19883432 0.09013638 0.1634693 0.1049659 0.1783066 0.2483717 0.09150014
# [4,] 0.33309078 0.22795393 0.1966336 0.2611358 0.4468088 0.5024891 0.32202946
# [5,] 0.37314951 0.37459873 0.4272505 0.3935484 0.4477683 0.2472484 0.45435208
# [6,] 0.16395173 0.23372889 0.1180172 0.2565564 0.1958752 0.1374908 0.16019662
# sample8 sample9 sample10 sample11 sample12 sample13 sample14
# [1,] 0.1455233 0.1238175 0.2356176 0.4242254 0.4987964 0.4239122 0.3477856
# [2,] 0.2639306 0.3162204 0.2806442 0.2645001 0.5272360 0.5437152 0.3818492
# [3,] 0.1890816 0.1600013 0.1757942 0.2148016 0.3090826 0.5236685 0.2294759
# [4,] 0.5213153 0.4827805 0.2993246 0.2261042 0.1588585 0.1133573 0.1050069
# [5,] 0.4223604 0.4816269 0.3474320 0.3985611 0.5330987 0.5689659 0.5208801
# [6,] 0.2295001 0.1835948 0.1855747 0.2443270 0.4221594 0.4097455 0.3539166
# sample15 sample16 sample17 sample18 sample19 sample20
# [1,] 0.4225247 0.3855191 0.4825476 0.4302995 0.3766141 0.2590889
# [2,] 0.4944915 0.3471979 0.4010059 0.2298062 0.4378647 0.3602383
# [3,] 0.3303636 0.3470328 0.4227933 0.3855401 0.3916592 0.4209159
# [4,] 0.1308567 0.1549738 0.1158741 0.2261024 0.1545304 0.2265255
# [5,] 0.7322305 0.4907163 0.4421528 0.6781560 0.4727099 0.5861254
# [6,] 0.3036062 0.4667574 0.3621896 0.3737679 0.4760492 0.3609025
direction <- res_list$direction
head(direction)
# [1] "hypo" "hypo" "hypo" "hyper" "hypo" "hypo"
cor_pvalue <- res_list$cor_pvalue
head(cor_pvalue)
# [1] 0.65603378 0.57822828 0.34949399 0.02867702 1.23879545 0.10946518
gene_type <- res_list$gene_type
head(gene_type)
# [1] "protein_coding" "psedo-gene" "psedo-gene" "lincRNA"
# [5] "others" "psedo-gene"
anno_gene <- res_list$anno_gene
head(anno_gene)
# [1] "intergenic" "intergenic" "intergenic" "intergenic" "intergenic"
# [6] "TSS"
dist <- res_list$dist
head(dist)
# [1] 388737 97280 36426 420011 355666 317548
anno_enhancer <- res_list$anno_enhancer
head(anno_enhancer)
# enhancer_1 enhancer_2 enhancer_3
# [1,] 0.07196473 0.0000000 0.2879309
# [2,] 0.75837107 0.3554948 0.0000000
# [3,] 0.00000000 0.8477110 0.0000000
# [4,] 0.00000000 0.0000000 0.0000000
# [5,] 0.57110298 0.0000000 0.9173112
# [6,] 0.00000000 0.0000000 0.6795756
对数据进行解释:加载的res_list数据总共有9个list:
type是样本类型,标注肿瘤或者正常样本;
mat_meth是甲基化水平矩阵,其中行是差异甲基化区域(DMR),矩阵中的值是每个样品中DMR中的平均甲基化水平;
mat_expr是基因表达矩阵,其中行对应与DMR相关的基因(即离DMR最近的基因),矩阵中的值是每个样本中每个基因的表达水平;
direction表示甲基化趋势,在肿瘤中是高甲基化还是低甲基化;
cor_pvalue表示甲基化水平与相关基因表达量的相关性检验p值;
gene_type表示基因类型,是mRNA或者lncRNA;
anno_gene表示基因功能区域,是intergenic, intragenic 或者 TSS;
dist表示相关基因从DMR到TSS的距离;anno_enhancer表示部分DMR与增强子重叠。
好了,了解了数据的构成,我们就可以开始画图了。
接下来我们开始绘图,首先计算甲基化矩阵的列的聚类,以便调整表达矩阵中的列,使得和甲基化矩阵中的列具有相同的列顺序:
column_tree <- hclust(dist(t(mat_meth)))
head(column_tree)
# $merge
# [,1] [,2]
# [1,] -5 -7
# [2,] -3 -9
# [3,] -1 -2
# [4,] -4 -10
# [5,] -6 1
# [6,] 2 3
# [7,] -8 5
# [8,] 4 6
# [9,] 7 8
# [10,] -12 -18
# [11,] -13 -14
# [12,] -16 -17
# [13,] -11 -20
# [14,] -15 -19
# [15,] 10 14
# [16,] 12 13
# [17,] 11 16
# [18,] 15 17
# [19,] 9 18
#
# $height
# [1] 3.900199 3.901578 3.902258 3.910749 3.967042 3.968184 4.031672 4.167456
# [9] 4.211256 4.758352 4.764540 4.788488 4.790655 4.798932 4.939085 4.961162
# [17] 5.065518 5.141802 8.025530
#
# $order
# [1] 8 6 5 7 4 10 3 9 1 2 12 18 15 19 13 14 16 17 11 20
#
# $labels
# [1] "sample1" "sample2" "sample3" "sample4" "sample5" "sample6"
# [7] "sample7" "sample8" "sample9" "sample10" "sample11" "sample12"
# [13] "sample13" "sample14" "sample15" "sample16" "sample17" "sample18"
# [19] "sample19" "sample20"
#
# $method
# [1] "complete"
#
# $call
# hclust(d = dist(t(mat_meth)))
column_order <- column_tree$order # 提取顺序的向量
接着我们先定义一些颜色,方便在后续绘图可以直接设置,让代码更加简洁:
library(RColorBrewer)
library(circlize)
# 定义颜色
meth_col_fun <- colorRamp2(c(0, 0.5, 1), c("blue", "white", "red"))
direction_col <- c("hyper" = "red", "hypo" = "blue")
expr_col_fun <- colorRamp2(c(-2, 0, 2), c("green", "white", "red"))
pvalue_col_fun <- colorRamp2(c(0, 2, 4), c("white", "blue", "red"))
gene_type_col <- structure(brewer.pal(length(unique(gene_type)), "Set3"),
names = unique(gene_type))
anno_gene_col <- structure(brewer.pal(length(unique(anno_gene)), "Set1"),
names = unique(anno_gene))
dist_col_fun <- colorRamp2(c(0, 10000), c("black", "white"))
enhancer_col_fun <- colorRamp2(c(0, 1), c("white", "orange"))
# 这一步目的是对前面每一个参数的颜色进行定义,让图片更加美观
我们来分析一下图片,图片是由两个热图主体加上注释合并在一起绘制而成,按照我们“拆”跟“叠”的二字法则,我们先把注释内容用HeatmapAnnotation整理出来:
library (ComplexHeatmap)
ha1 <- HeatmapAnnotation(type = type,
col = list(type = c("Tumor" = "red", "Control" = "royalblue")),
annotation_name_side = "left") # 第一个热图主体的样本注释
ha2 <- HeatmapAnnotation(type = type,
col = list(type = c("Tumor" = "red", "Control" = "royalblue")),
show_legend = FALSE) # 第二个热图主体的样本注释
弄好注释之后,我们先来画第一个热图:
# 使用甲基化矩阵进行绘图,其余参数都为上面注释好的内容
name = "methylation",
col = meth_col_fun,
column_order= column_order,
top_annotation = ha1,
column_title = "Methylation")
第一个热图画好之后,我们可以同样的方法再画右边的热图,并且将两个图形合并在一起:
# 使用甲基化矩阵进行绘图,其余参数都为上面注释好的内容
name = "methylation",
col = meth_col_fun,
column_order= column_order,
top_annotation = ha1,
column_title = "Methylation") +
column_tree$order], # 调整矩阵列的顺序与前面第一个矩阵一致
name = "expression",
col = expr_col_fun,
column_order = column_order,
top_annotation = ha2,
column_title = "Expression")
两个热图主体中间有一个direction,我们把它加上去:(由于direction在两个主体热图中间,因此我们需要调整顺序,也就是 甲基化热图+direction+表达量热图)
Heatmap(mat_meth, name = "methylation", col = meth_col_fun,
column_order= column_order,
top_annotation = ha1, column_title = "Methylation") +
Heatmap(direction, name = "direction", col = direction_col) + # 加上direction热图
Heatmap(mat_expr[, column_tree$order], name = "expression",
col = expr_col_fun,
column_order = column_order,
top_annotation = ha2, column_title = "Expression")
这时候的热图基本已经完成,但是还差一些各种各样的注释,我们一个一个加上去,怎么加呢:我们先看只有一个注释时候怎么画:
Heatmap(cor_pvalue, name = "-log10(cor_p)", col = pvalue_col_fun) # 绘制p值
发现没有?还是按照画热图的方法,将数据看成是只有一列的矩阵就好,那如果想画所有的注释是不是只需要像上面一样不断用+呢?没错,聪明的孩子有漂亮图出,我们来画一下所有注释:
Heatmap(cor_pvalue, name = "-log10(cor_p)", col = pvalue_col_fun) +
Heatmap(gene_type, name = "gene type", col = gene_type_col) +
Heatmap(anno_gene, name = "anno_gene", col = anno_gene_col) +
Heatmap(dist, name = "dist_tss", col = dist_col_fun) +
Heatmap(anno_enhancer, name = "anno_enhancer", col = enhancer_col_fun,
cluster_columns = FALSE, column_title = "Enhancer")
# 颜色都是我们上面定义好的结果,可以自行更换颜色
我们把上面两部分合并起来看看效果:
Heatmap(mat_meth, name = "methylation", col = meth_col_fun,
column_order= column_order,
top_annotation = ha1, column_title = "Methylation") +
Heatmap(direction, name = "direction", col = direction_col) +
Heatmap(mat_expr[, column_tree$order], name = "expression",
col = expr_col_fun,
column_order = column_order,
top_annotation = ha2, column_title = "Expression") +
Heatmap(cor_pvalue, name = "-log10(cor_p)", col = pvalue_col_fun) +
Heatmap(gene_type, name = "gene type", col = gene_type_col) +
Heatmap(anno_gene, name = "anno_gene", col = anno_gene_col) +
Heatmap(dist, name = "dist_tss", col = dist_col_fun) +
Heatmap(anno_enhancer, name = "anno_enhancer", col = enhancer_col_fun,
cluster_columns = FALSE, column_title = "Enhancer")
这种图形其实就是不断画热图然后叠加在一起就好了,除了上面这些对象,你还可以不断继续叠加内容,甚至比如单个基因的logFC值,这样一个图片展示的内容就更多了,等等等等,是不是还差点什么?还没划分聚类?好吧好吧,那我们就来加上去,咋搞呢?我们这个系列第一次推文给大家介绍了draw函数还记得吗?我知道你们不记得了,没关系,很简单,我们先把上面的结果赋值为ht_list:
ht_list <-Heatmap(mat_meth, name = "methylation", col = meth_col_fun,
column_order= column_order,
top_annotation = ha1,column_title = "Methylation") +
Heatmap(direction, name = "direction", col = direction_col) +
Heatmap(mat_expr[, column_tree$order], name = "expression",
col = expr_col_fun,
column_order = column_order,
top_annotation = ha2,column_title = "Expression") +
Heatmap(cor_pvalue, name = "-log10(cor_p)", col = pvalue_col_fun) +
Heatmap(gene_type, name = "gene type", col = gene_type_col) +
Heatmap(anno_gene, name = "anno_gene", col = anno_gene_col) +
Heatmap(dist, name = "dist_tss", col = dist_col_fun) +
Heatmap(anno_enhancer, name = "anno_enhancer", col = enhancer_col_fun,
cluster_columns = FALSE,column_title = "Enhancer")
然后利用draw函数绘图并且设置聚类:
row_km = 2,
row_split = direction,
column_title = "Comprehensive correspondence between methylation, expression and other genomic features",
column_title_gp = gpar(fontsize = 12, fontface = "bold"),
merge_legends = TRUE, heatmap_legend_side = "bottom")
这样我们整个图片就大功告成,最开始的图片就复现完成啦!
好了,今天的内容先到这里,后台回复“风29”就能获得本次推文所有数据和代码,这只是甲基化的上篇,我们下次将使用Cancer Cell杂志上面一篇文章的数据复现原文一张图片,敬请期待!我们下次再见吧!*^_^*
—
END—
撰文丨风 风
排版丨四金兄
值班 | 阿 琛
主编丨小雪球
欢迎大家关注解螺旋生信频道-挑圈联靠公号~
最新评论
推荐文章
作者最新文章
你可能感兴趣的文章
Copyright Disclaimer: The copyright of contents (including texts, images, videos and audios) posted above belong to the User who shared or the third-party website which the User shared from. If you found your copyright have been infringed, please send a DMCA takedown notice to [email protected]. For more detail of the source, please click on the button "Read Original Post" below. For other communications, please send to [email protected].
版权声明:以上内容为用户推荐收藏至CareerEngine平台,其内容(含文字、图片、视频、音频等)及知识版权均属用户或用户转发自的第三方网站,如涉嫌侵权,请通知[email protected]进行信息删除。如需查看信息来源,请点击“查看原文”。如需洽谈其它事宜,请联系[email protected]。
版权声明:以上内容为用户推荐收藏至CareerEngine平台,其内容(含文字、图片、视频、音频等)及知识版权均属用户或用户转发自的第三方网站,如涉嫌侵权,请通知[email protected]进行信息删除。如需查看信息来源,请点击“查看原文”。如需洽谈其它事宜,请联系[email protected]。