Cancer Cell的一张美图送给你,新年也发个CC
2021年也继续陪伴着你
大家好,我是风。这里是风风的CNS美图复现系列专栏。今天是元旦耶,元旦快乐哦各位小伙伴们!预祝各位在2021年喜提SCI和课题无数,发文发到手抽筋,基金多到巨开心!好啦,言归正传,上周我们短暂地进行了放松环节,去到“风风的小故事”专栏听小故事,水蜜桃群的宝宝们已经提出了下一个小故事想要听的内容,最近我还听说我们“37°培养箱”队的度渡队长要搞大动作,让我在新的一年刚开始就充满了小期待呢!到底会是啥大动作呢?(我也不知道ing~) 传送门
学了那么久的ComplexHeatmap,我们也该到了收获的时候了,今天我们就找一张已经发表的文章的热图来复现一下:
这张图来自于Cancer Cell上的一篇文章,doi是https://doi.org/10.1016/j.ccr.2012.08.024,原文给大家下载好了放在资料区,看文末回复关键词即可获取,此图为Figure 1,作为一张热图,简单给大家介绍下用了什么内容:
这张图来自于Cancer Cell上的一篇文章,doi是https://doi.org/10.1016/j.ccr.2012.08.024,原文给大家下载好了放在资料区,看文末回复关键词即可获取,此图为Figure 1,作为一张热图,简单给大家介绍下用了什么内容:
1.热图主体为甲基化探针表达量:行为探针名,列为样本名;
2.下面是各个样本的注释,包括:年龄、甲基化聚类、甲基化平均水平、表达水平、基因突变、拷贝数突变
这样看来这张热图好像并不难画对吧?按照我们前面说的拆和叠的内容,只需要先把主体画出来,然后不断叠加注释上去就好了嘛!我们简单复习下,画热图主体我们用Heatmaph函数进行叠加,注释的内容用HeatmapAnnotation函数,然后用draw函数把两部分画在一起就搞定了。听起来好像很简单是不?那我们来试试呗!
画图第一步,当然是熟悉我们数据的特征啦,我们用GSE36278的数据,数据从GEO数据库下载整理,也可以用getGEO("GSE36278"),这里我已经下载好了给大家,直接load进来,然后熟悉数据:
rm(list = ls())
加载R包
"IlluminaHumanMethylation450kanno.ilmn12.hg19") BiocManager::install(
library(matrixStats)
library(GEOquery)
library(IlluminaHumanMethylation450kanno.ilmn12.hg19)
library(tidyverse)
library(RColorBrewer)
library(ComplexHeatmap)
library(circlize)
library(GetoptLong)
"GSE36278") # 想自己下载数据的话直接去掉这一行的#号 gset <- getGEO(
load("GSE36278.Rdata")
$GSE36278_series_matrix.txt.gz@experimentData # 实验信息 gset
$GSE36278_series_matrix.txt.gz@assayData$exprs# 甲基化位点表达矩阵 gset
$GSE36278_series_matrix.txt.gz@phenoData # 表型数据,这里一般包含了我们要的临床信息、样本信息等内容 gset
$GSE36278_series_matrix.txt.gz@featureData # 特征数据,与表型数据类似 gset
$GSE36278_series_matrix.txt.gz@annotation # 平台信息 gset
$GSE36278_series_matrix.txt.gz@protocolData@data # 样本名称 gset
$GSE36278_series_matrix.txt.gz@.__classVersion__ # 分析信息 gset
上述内容可以自己对比一下,一般使用exprs和phenoData就可以满足大部分分析需求
提取甲基化矩阵
甲基化表达谱已通过Illumina HumanMethylation450 BeadChip阵列进行了测量, 我们通过IlluminaHumanMethylation450kanno.ilmn12.hg19程序包加载探针数据,同时将矩阵中的行名称调整为与探针相同
data(Locations)
mat <- exprs(gset[[1]])
colnames(mat) <- phenoData(gset[[1]])@data$title
mat <- mat[rownames(Locations), ]
接下来我们需要对探针进行处理,probe包含探针的位置和CpG位点是否与SNP重叠的信息,接下来我们删除性染色体上的探针以及与SNP重叠的探针:
data(SNPs.137CommonSingle) # SNP信息
data(Islands.UCSC) # UCSC的CpG岛信息
l <- Locations$chr %in% paste0("chr", 1:22) & is.na(SNPs.137CommonSingle$Probe_rs)
# 提取chr1-chr22,也就是去除X和Y的性染色体,并过滤掉与SNP重叠的探针
mat <- mat[l, ] # 重新提取探针矩阵
# 获取探针位置的子集以及相应的CpG岛注释
cgi <- Islands.UCSC$Relation_to_Island[l]
loc <- Locations[l, ]
加下来我们还要对样本进行分组并且重命名:
# 样本分组
mat1 <- as.matrix(mat[, grep("GBM", colnames(mat))]) # grep函数提取肿瘤样本的下标
mat2 <- as.matrix(mat[, grep("CTRL", colnames(mat))]) # grep函数提取正常样本的下标
colnames(mat1) <- gsub("GBM", "dkfz", colnames(mat1))
# 表型数据在文章补充材料中,这里已经给大家下载好了,放在资料区,后台回复关键词获取
# 将表型数据的行调整为与甲基化矩阵的列相同
phenotype <- read.table("450K_annotation.txt", header = TRUE, sep = "\t",
row.names = 1, check.names = FALSE,
comment.char = "", stringsAsFactors = FALSE)
phenotype <- phenotype[colnames(mat1), ]
提取肿瘤样品中甲基化程度最高的前8000个探针和相应信息:
ind <- order(rowVars(mat1, na.rm = TRUE),
decreasing = TRUE)[1:8000]
m1 <- mat1[ind, ]
m2 <- mat2[ind, ]
cgi2 <- cgi[ind]
cgi2 <- ifelse(grepl("Shore", cgi2), "Shore", cgi2)
cgi2 <- ifelse(grepl("Shelf", cgi2), "Shelf", cgi2)
loc <- loc[ind, ]
m1[is.na(m1)] = 0.5
m2[is.na(m2)] = 0.5
接下来我们开始绘图,首先给图例进行赋值,使用HeatmapAnnotation绘制图例,这里的图例包括age、Methylation Cluster、TCGA Methylation、TCGA Expression、Mutations、Cytogenetics等内容:
ha = HeatmapAnnotation(
age = anno_points(phenotype[[13]],
gp = gpar(col = ifelse(phenotype[[13]] > 20, "black", "red")),
height = unit(3, "cm")), # 设置age注释
dkfz_cluster = phenotype[[1]], # 设置Methylation Cluster注释
tcga_cluster = phenotype[[2]], # 设置TCGA Methylation注释
tcga_expr = phenotype[[3]], # 设置TCGA Expression注释
IDH1 = phenotype[[5]], # 设置IDH1突变注释
H3F3A = phenotype[[4]], # 设置H3F3A突变注释
TP53 = phenotype[[6]], # 设置TP53突变注释
chr7_gain = ifelse(phenotype[[7]] == 1, "gain", "normal"), # 设置chr7_gain(CNV)注释
chr10_loss = ifelse(phenotype[[8]] == 1, "loss", "normal"), # 设置chr10_loss(CNV)注释
CDKN2A_del = ifelse(phenotype[[9]] == 1, "del", "normal"), # 设置CDKN2A_del(CNV)注释
EGFR_amp = ifelse(phenotype[[10]] == 1, "amp", "normal"), # 设置EGFR_amp(CNV)注释
PDGFRA_amp = ifelse(phenotype[[11]] == 1, "amp", "normal"), # 设置PDGFRA_amp(CNV)注释
col = list(dkfz_cluster = structure(names = c("IDH", "K27", "G34", "RTK I PDGFRA",
"Mesenchymal",
"RTK II Classic"),
brewer.pal(6, "Set3")), # 设置Methylation Cluster颜色
tcga_cluster = structure(names = c("G-CIMP+", "Cluster #2", "Cluster #3"),
brewer.pal(3, "Set1")), # 设置TCGA Methylation颜色
tcga_expr = structure(names = c("Proneural", "Classical", "Mesenchymal"),
c("#377EB8", "#FFFF33", "#FF7F00")), # 设置TCGA Expression颜色
IDH1 = structure(names = c("MUT", "WT", "G34R", "G34V", "K27M"),
c("black", "white", "#4DAF4A", "#4DAF4A", "#377EB8")), # 将mutation合并为同一种颜色类型
H3F3A = structure(names = c("MUT", "WT", "G34R", "G34V", "K27M"),
c("black", "white", "#4DAF4A", "#4DAF4A", "#377EB8")),
TP53 = structure(names = c("MUT", "WT", "G34R", "G34V", "K27M"),
c("black", "white", "#4DAF4A", "#4DAF4A", "#377EB8")),
chr7_gain = c("gain" = "black", "normal" = "white"), # 将Cytogenetics合并为同一种颜色类型
chr10_loss = c("loss" = "black", "normal" = "white"),
CDKN2A_del = c("del" = "black", "normal" = "white"),
EGFR_amp = c("amp" = "black", "normal" = "white"),
PDGFRA_amp = c("amp" = "black", "normal" = "white")),
na_col = "grey", border = TRUE,
show_legend = c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE),
show_annotation_name = FALSE,
annotation_legend_param = list(
dkfz_cluster = list(title = "Methylation Cluster"),
tcga_cluster = list(title = "TCGA Methylation"),
tcga_expr = list(title = "TCGA Expression"),
H3F3A = list(title = "Mutations"),
chr7_gain = list(title = "Cytogenetics")) # 定义注释名
)
ha
## A HeatmapAnnotation object with 12 annotations
## name: heatmap_annotation_0
## position: column
## items: 136
## width: 1npc
## height: 88.8660578386606mm
## this object is subsetable
## 5.13831111111111mm extension on the left
##
## name annotation_type color_mapping height
## age anno_points() 30mm
## dkfz_cluster discrete vector user-defined 5mm
## tcga_cluster discrete vector user-defined 5mm
## tcga_expr discrete vector user-defined 5mm
## IDH1 discrete vector user-defined 5mm
## H3F3A discrete vector user-defined 5mm
## TP53 discrete vector user-defined 5mm
## chr7_gain discrete vector user-defined 5mm
## chr10_loss discrete vector user-defined 5mm
## CDKN2A_del discrete vector user-defined 5mm
## EGFR_amp discrete vector user-defined 5mm
## PDGFRA_amp discrete vector user-defined 5mm
图例绘制完成后,开始绘制主体热图,主体热图有三个热图组合而成,用我们前面学习的方法,那就是分别绘制Heatmap然后用+组合起来:
col_fun = colorRamp2(c(0, 0.5, 1), c("navy", "white", "red")) # 设置颜色,低表达为navy,高表达为red
ht_list = Heatmap(m1, col = col_fun, name = "Methylation",
clustering_distance_columns = "spearman",
show_row_dend = FALSE, show_column_dend = FALSE,
show_column_names = FALSE,
bottom_annotation = ha, column_title = qq("GBM samples (n = @{ncol(m1)})"),
row_split = factor(cgi2, levels = c("Island", "Shore", "Shelf", "OpenSea")),
row_title_gp = gpar(col = "#FFFFFF00")) +
Heatmap(m2, col = col_fun, show_column_names = FALSE,
show_column_dend = FALSE, column_title = "Controls",
show_heatmap_legend = FALSE, width = unit(1, "cm")) +
Heatmap(cgi2, name = "CGI", show_row_names = FALSE, width = unit(5, "mm"),
col = structure(names = c("Island", "Shore", "Shelf", "OpenSea"),
c("red", "blue", "green", "#CCCCCC")))
好了,一个热图已经绘制完成,接下来可以使用draw函数进行调整和组合:
draw(ht_list,
row_title = paste0("DNA methylation probes (n = ", nrow(m1), ")"),
annotation_legend_side = "left",
heatmap_legend_side = "left")
是不是感觉还差点意思?图例上面的修饰没跟上对吧?那我们给它补上:
annotation_titles = c(dkfz_cluster = "Methylation Cluster",
tcga_cluster = "TCGA Methylation",
tcga_expr = "TCGA Expression",
IDH1 = "IDH1",
H3F3A = "H3F3A",
TP53 = "TP53",
chr7_gain = "Chr7 gain",
chr10_loss = "Chr10 loss",
CDKN2A_del = "Chr10 loss",
EGFR_amp = "EGFR amp",
PDGFRA_amp = "PDGFRA amp")
for(an in names(annotation_titles)) {
decorate_annotation(an, {
grid.text(annotation_titles[an], unit(-2, "mm"), just = "right")
grid.rect(gp = gpar(fill = NA, col = "black"))
})
}
decorate_annotation("age", {
grid.text("Age", unit(8, "mm"), just = "right")
grid.rect(gp = gpar(fill = NA, col = "black"))
grid.lines(unit(c(0, 1), "npc"), unit(c(20, 20), "native"), gp = gpar(lty = 2))
})
decorate_annotation("IDH1", {
grid.lines(unit(c(-40, 0), "mm"), unit(c(1, 1), "npc"))
})
decorate_annotation("chr7_gain", {
grid.lines(unit(c(-40, 0), "mm"), unit(c(1, 1), "npc"))
})
这样我们整个图片就复现完成啦,其实使用的方法都在我们前面的推文都讲解过了,这次复现就是把前面内容进行组合,并且使用自己现有的数据进行分析,如果你有自己的类似数据,可以直接更改数据套用代码,这样就可以直接出一张Cancer Cell的美图啦!
杂志推荐
只有符合以下条件的杂志才会被推荐:
至少位于3区及以上,影响因子在3+; 每年发文量至少在200以上; 网上可查审稿周期不超过2.5个月 未被“预警名单”收录(emmmmmm) 尽量推荐一些大家在其他地方看的比较少的期刊(知道的人越少,目前来说可能相对更大几率接收) (等着你们提)
本周推荐期刊:
期刊名:OncoImmunology(在“挑圈联靠”公众号回复杂志名可以获得如上图的期刊详细信息👆)
上榜理由:美国老牌期刊,中科院分区2区,JCR分区1区,年发文量251,近几年上面也有不少生信文章,如果有实验验证,中的概率会更大一点,给大家列举几篇看题目就能猜到大概率属于生信分析的文章:
- An integrated classifier improves prognostic accuracy in non-metastatic gastric cancer. 2020 Aug 30;9(1):1792038.doi: 10.1080/2162402X.2020.1792038.
- HLA class-I and class-II restricted neoantigen loads predict overall survival in breast cancer. 2020 Apr 1;9(1):1744947. doi: 10.1080/2162402X.2020.1744947.
- Identification of tumor-infiltrating immune cells and prognostic validation of tumor-infiltrating mast cells in adrenocortical carcinoma: results from bioinformatics and real-world data. 2020 Jun 23;9(1):1784529. doi: 10.1080/2162402X.2020.1784529.
好了,后台回复“feng31”获得本次代码和绘图数据,别忘了引用ComplexHeatmap的文献哦!今天的内容就到这里,我们下次再见吧!*^_^*
我是风,解螺旋先锋班学员,翡翠讲师,挑圈联靠专栏作者,训练营助教。2020过得真快,新的一年已经到来,我加入先锋班大家庭也一年多啦,这一年从解螺旋各位老师身上学习了很多,也有了很多收获和很多珍贵的回忆,希望新的一年能够再接再厉,继续成长,也希望能够在不断输入的同时,也有更好的输出来回馈解螺旋!mua~
—
END—
撰文丨风 风
排版丨四金兄
值班 | 风间琉璃
主编丨小雪球
欢迎大家关注解螺旋生信频道-挑圈联靠公号~
最新评论
推荐文章
作者最新文章
你可能感兴趣的文章
Copyright Disclaimer: The copyright of contents (including texts, images, videos and audios) posted above belong to the User who shared or the third-party website which the User shared from. If you found your copyright have been infringed, please send a DMCA takedown notice to [email protected]. For more detail of the source, please click on the button "Read Original Post" below. For other communications, please send to [email protected].
版权声明:以上内容为用户推荐收藏至CareerEngine平台,其内容(含文字、图片、视频、音频等)及知识版权均属用户或用户转发自的第三方网站,如涉嫌侵权,请通知[email protected]进行信息删除。如需查看信息来源,请点击“查看原文”。如需洽谈其它事宜,请联系[email protected]。
版权声明:以上内容为用户推荐收藏至CareerEngine平台,其内容(含文字、图片、视频、音频等)及知识版权均属用户或用户转发自的第三方网站,如涉嫌侵权,请通知[email protected]进行信息删除。如需查看信息来源,请点击“查看原文”。如需洽谈其它事宜,请联系[email protected]。