一文学会ComplexHeatmap的花式热图分组
大家好,我是风。又到了ComplexHeatmap狂欢时间!上回有学员宝宝留言“想看花式热图分组”(传送门),这回我们就一起来看看通过不同聚类方式的热图分组!
传送门
手把手教你复现一篇CNS级别美图!附代码,建议收藏!
CNS级别的美图复现起来难不难?小白也能快速上手(附代码)(二)
加载包
# 加载R包library(ComplexHeatmap)library(circlize)# 构建一个跟之前一样的矩阵set.seed(123)nr1 = 4; nr2 = 8; nr3 = 6; nr = nr1 + nr2 + nr3nc1 = 6; nc2 = 8; nc3 = 10; nc = nc1 + nc2 + nc3mat <- cbind(rbind(matrix(rnorm(nr1*nc1, mean = 1, sd = 0.5), nr = nr1), matrix(rnorm(nr2*nc1, mean = 0, sd = 0.5), nr = nr2), matrix(rnorm(nr3*nc1, mean = 0, sd = 0.5), nr = nr3)), rbind(matrix(rnorm(nr1*nc2, mean = 0, sd = 0.5), nr = nr1), matrix(rnorm(nr2*nc2, mean = 1, sd = 0.5), nr = nr2), matrix(rnorm(nr3*nc2, mean = 0, sd = 0.5), nr = nr3)), rbind(matrix(rnorm(nr1*nc3, mean = 0.5, sd = 0.5), nr = nr1), matrix(rnorm(nr2*nc3, mean = 0.5, sd = 0.5), nr = nr2), matrix(rnorm(nr3*nc3, mean = 1, sd = 0.5), nr = nr3)))mat <- mat[sample(nr, nr), sample(nc, nc)] rownames(mat) = paste0("gene", seq_len(nr))colnames(mat) = paste0("Sample", seq_len(nc))head(mat)## Sample1 Sample2 Sample3 Sample4 Sample5 Sample6## gene1 0.90474160 -0.35229823 0.5016096 1.26769942 0.8251229 0.16215217## gene2 0.90882972 0.79157121 1.0726316 0.01299521 0.1391978 0.46833693## gene3 0.28074668 0.02987497 0.7052595 1.21514235 0.1747267 0.20949120## gene4 0.02729558 0.75810969 0.5333504 -0.49637424 -0.5261114 0.56724357## gene5 -0.32552445 1.03264652 1.1249573 0.66695147 0.4490584 1.04236865## gene6 0.58403269 -0.47373731 0.5452483 0.86824798 -0.1976372 -0.03565404## Sample7 Sample8 Sample9 Sample10 Sample11 Sample12## gene1 -0.2869867 0.68032622 -0.1629658 0.8254537 0.7821773 -0.49625358## gene2 1.2814948 0.38998256 -0.3473535 1.3508922 1.1183375 2.05005447## gene3 -0.6423579 -0.31395304 0.2175907 -0.2973086 0.4322058 -0.25803192## gene4 0.8127096 -0.01427338 1.0844780 0.2426662 0.8783874 1.38452112## gene5 2.6205200 0.75823530 -0.2333277 1.3439584 0.8517620 0.85980233## gene6 -0.3203530 1.05534136 0.7771690 0.4594983 0.2550648 -0.02778098## Sample13 Sample14 Sample15 Sample16 Sample17 Sample18## gene1 -0.0895258 -0.35520328 0.1072694 0.96322199 -0.39245223 -0.1878014## gene2 1.3770269 -0.77437640 0.9829664 0.23854381 -0.53589561 1.3003544## gene3 -0.5686518 -0.51321045 -0.0451598 0.82272880 -0.02251386 0.2427300## gene4 0.8376570 0.10797078 0.4520019 0.81648036 0.02650211 0.4072600## gene5 1.9986067 -0.50928769 0.7708173 -0.09421702 -1.15458444 1.2715970## gene6 -0.2112484 0.01669142 -0.3259750 0.54460361 0.89101254 0.3699738## Sample19 Sample20 Sample21 Sample22 Sample23 Sample24## gene1 1.08736320 0.7132199 -0.1853300 -0.14238650 0.6407669 1.3266288## gene2 0.14237891 0.4471643 0.4475628 -0.31251963 0.7057150 0.8120937## gene3 1.09511516 0.5852612 0.1926402 0.51278568 1.6361334 1.1339175## gene4 -0.02625664 0.9556956 0.3443201 0.07668656 1.7857291 0.5280084## gene5 0.60599022 0.8304101 -0.1902355 -0.14753574 1.0123366 0.2140750## gene6 0.81537706 1.2737905 1.8575325 0.88491126 0.5670193 0.5372756
热图分组
ComplexHeatmap在热图切割分组上面有非常大的操作空间,就如我们前面所说的cluster一样。使用不一样的聚类方法就可以得到不同的聚类热图,主要使用的参数是row_km, row_split, column_km, column_split
下面我们分别来看一下ComplexHeatmap都可以怎么样对热图进行分组:
###通过k-means聚类进行分割 我们可以直接使用row_km 和 column_km 对热图进行切割:
# 切割成两行Heatmap(mat, name = "mat", row_km = 2)
# 切割成三列Heatmap(mat, name = "mat", column_km = 3)
# 切割成两行三列Heatmap(mat,name = "mat", row_km = 2, row_km_repeats = 100,column_km = 3, column_km_repeats = 100)
# 最后面的代码比前两个多了repeats参数,这是因为Heatmap()会在数据的随机起点调用kmeans,这可能会导致重复运行生成不同的簇。为了这个问题,我们可以将row_km_repeats和column_km_repeats设置为大于1的数字以多次运行kmeans(),并使用最终的共识k均值聚类(有点像我们设置随机种子seed)
当然也可以跟上次推文所说的内容结合起来进行切割:
# 定义矩阵和距离计算函数mat_with_outliers = matfor(i in 1:10) mat_with_outliers[i, i] = 1000# 设置离群值为1000robust_dist = function(x, y) {qx = quantile(x, c(0.1, 0.9)) # 用quantile取百分位比,此处为10%和90% qy = quantile(y, c(0.1, 0.9)) l = x > qx[1] & x < qx[2] & y > qy[1] & y < qy[2]x = x[l]y = y[l]sqrt(sum((x - y)^2)) # sqrt函数取平方根}Heatmap(mat_with_outliers, name = "mat", col = colorRamp2(c(-2, 0, 2), c("navy", "white", "red")), clustering_distance_rows = robust_dist, clustering_distance_columns = robust_dist, row_km = 3, column_km = 2, column_title = "使用鲁棒距离法")
###通过变量类型进行切割 有时候我们需要对绘制热图的变量进行切割,比如进行差异分析后定义上下调基因,或者是定义m6A相关基因的类型,也可以是定义某些基因某条通路相关基因,从而把我们要突出展示的基因集合分成单独的类进行展示,这时候就可以利用变量类型进行切割:
# 已知分子类型和样本类型对应的个数,则可以使用rep函数来定义并进行分割Heatmap(mat, name = "mat", row_split = rep(c("Immune", "Resistant"), c(7,11)), column_split = rep(c("group1", "group2"), 12))
但是大部分时候我们只会得到一个表格,这个表格告诉了我们分子类型和分子的对应的关系,例如m6A分子中哪些是reader,哪些是eraser,哪些是writer,这时候我们可以利用数据框来进行分割:
# 先构建一个数据框Type <- data.frame(rep(c("writer", "reader","eraser"), 6))rownames(Type) <- row.names(mat) head(Type)## rep.c..writer....reader....eraser....6.## gene1 writer## gene2 reader## gene3 eraser## gene4 writer## gene5 reader## gene6                                  eraser# 使用数据框进行注释Heatmap(mat, name = "mat", row_split = Type)
# 对列名添加一个维度Heatmap(mat, name = "mat", row_split = Type, column_split = factor(rep(c("Normal", "Tumor"), c(8,16))))
那么能不能把kmeans的row_km/column_km和split混合使用呢?我们试试:
Heatmap(mat, name = "mat", row_split = rep(c("A", "B"), 9), row_km = 2) # 2×2=4组
# 配合我们前面推文讲过的内容,那么上面的代码实际上等同于:# kmeans聚类cl <- kmeans(mat, centers = 2)$cluster# 将聚类结果映射到热图Heatmap(mat, name = "mat", row_split = cbind(cl, rep(c("A", "B"), 9)))
如果你不喜欢Kmeans聚类,也可以直接套用其他聚类方法:
library(cluster)# 使用PAM聚类法pa <- pam(mat, k = 3)$clusteringHeatmap(mat, name = "mat", row_split = paste0("pam", pa))
# 使用hclust聚类法hc <- hclust(dist(mat))plot(hc)
out.hc <- cutree(hc, k=2)Heatmap(mat, name = "mat", row_split = paste0("hclust", out.hc))
###通过聚类树进行切割# 按照聚类树进行切割 library(dendextend)dend<- hclust(dist(mat))dend<- color_branches(dend, h=5, k = 2)plot(dend)
Heatmap(mat, name = "mat", cluster_rows = dend, row_split = 2)
# 聚类树联合变量进行分割split <- data.frame(cutree(hclust(dist(mat)), k = 2), rep(c("Up", "Down"), 9))Heatmap(mat, name = "mat", row_split = split)

对切割的热图进行再修饰

给每一部分热图的标题按照特定的顺序进行排序:
# 通常聚类后,每一部分热图的标题都会受到聚类影响而无法按照固定顺序进行排序Heatmap(mat,name = "mat", row_split = rep(LETTERS[1:3], 6),column_split = rep(letters[1:6], 4))
# 那么我们可以设置一个factor赋予行列名等级Heatmap(mat,name = "mat", row_split = factor(rep(LETTERS[1:3], 6), levels = LETTERS[3:1]),column_split = factor(rep(letters[1:6], 4), levels = letters[6:1]))
# 这时候还可以只保留子聚类树Heatmap(mat,name = "mat", row_split = factor(rep(LETTERS[1:3], 6), levels = LETTERS[3:1]),column_split = factor(rep(letters[1:6], 4), levels = letters[6:1]),cluster_row_slices = FALSE, cluster_column_slices = FALSE)
# 看起来是不是很有迷惑性?感觉就像是好几个数据集分别做热图一样?现在知道那些看来来很高大上的热图不是靠AI拼起来的了吧?
如果行名列名比较复杂怎么办?能对列名进行设置吗?当然可以,ComplexHeatmap具有三种类型的设置:
# 使用sprintf()中的%ssplit = data.frame(rep(c("up", "down"), each = 9), rep(c("Immune", "Resistance"), 9))Heatmap(mat, name = "mat", row_split = split)
Heatmap(mat, name = "mat", row_split = split, row_title = "%s|%s")
# 上面的名称太长,不利于展示,我们可以旋转一下Heatmap(mat, name = "mat", row_split = split, row_title = "%s|%s", row_title_rot = 0)
# 合并聚类树的设置Heatmap(mat, name = "mat", row_split = split, column_split = factor(rep(c("ICGC","TCGA"), c(8,16)), levels = c("ICGC","TCGA")), cluster_row_slices = FALSE, cluster_column_slices = FALSE, row_title = "%s|%s", row_title_rot = 0)
# 有内味儿了吗?
对于切割出来的每部分“子热图”,我们还能怎么修饰呢?
library(RColorBrewer)brewer.pal(7,"Set1")## [1] "#E41A1C""#377EB8""#4DAF4A""#984EA3""#FF7F00""#FFFF33""#A65628"col_fun <- c("#E41A1C","#377EB8","#4DAF4A","#984EA3")Heatmap(mat, name = "mat", row_split = split, row_title_gp = gpar(col = col_fun, font = 1:4), row_names_gp = gpar(col = col_fun, fontsize = c(10,11,12,13)), column_split = factor(rep(c("ICGC","TCGA"), c(8,16)), levels = c("ICGC","TCGA")), column_title_gp = gpar(fill = c("#FF7F00","#A65628"), font = 1:2), column_names_gp = gpar(col = c("#FF7F00","#A65628"), fontsize = c(10, 14)), cluster_row_slices = FALSE, cluster_column_slices = FALSE, row_title = "%s|%s", row_title_rot = 0)
在上面的图片中,我们可能觉得热图之间的间隔太小,想要设置大一点,最好再加个边框,我们可以用row_gap/column_gap和border进行设置:
Heatmap(mat, name = "mat", row_split = split, row_title_gp = gpar(col = col_fun, font = 1:4), row_names_gp = gpar(col = col_fun, fontsize = c(10,11,12,13)), column_split = factor(rep(c("ICGC","TCGA"), c(8,16)), levels = c("ICGC","TCGA")), column_title_gp = gpar(fill = c("#FF7F00","#A65628"), font = 1:2), column_names_gp = gpar(col = c("#FF7F00","#A65628"), fontsize = c(10, 14)), cluster_row_slices = FALSE, cluster_column_slices = FALSE, row_title = "%s|%s", row_title_rot = 0, column_gap = unit(5, "mm"), row_gap = unit(c(2,4,6), "mm"), border = "blue")
现在这个热图已经很好看了,但是还不够,还想再好看一点点,还能做什么呢?要不就来个注释吧?标注其实可以玩出很多花样,我们这里先来两个给大家尝尝鲜:
Heatmap(mat, name = "mat", row_split = split, row_title_gp = gpar(col = col_fun, font = 1:4), row_names_gp = gpar(col = col_fun, fontsize = c(8,9,10,11)), column_split = factor(rep(c("ICGC","TCGA"), c(8,16)), levels = c("ICGC","TCGA")), column_title_gp = gpar(fill = c("#FF7F00","#A65628"), font = 1:2), column_names_gp = gpar(col = c("#FF7F00","#A65628"), fontsize = c(10, 11.5)), cluster_row_slices = FALSE, cluster_column_slices = FALSE, row_title = "%s|%s", row_title_rot = 0, column_gap = unit(5, "mm"), row_gap = unit(c(2,4,6), "mm"), border = "blue", top_annotation = HeatmapAnnotation(foo1 = 1:24, bar1 = anno_points(runif(24))), right_annotation = rowAnnotation(foo2 = 18:1, bar2 = anno_barplot(runif(18))))
好了,能看到这里的都是真爱,这次我故意不把最好看的图放在前面,就是为了不给不好好学习的学员宝宝看( ̄y▽, ̄)╭ ,我真是个小机灵鬼哈哈哈。那我们下期再见吧!
欢迎大家关注解螺旋生信频道-挑圈联靠公号~
继续阅读
阅读原文