Processing Oncomine data with ROncomine
- Introduction
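Oncomine is an excellent cancer database: it is powerful and offers interactive visualization. Its biggest drawback is that the underlying data are hard to obtain, and the built-in figures are ugly (and PNG only). There are many Oncomine tutorials; a classic one is the 解螺旋 tutorial, which covers most of the commonly used features.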
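However, yikeshu0611 quietly did the hard work for us and developed an R package called `ROncomine`, which makes it easy to extract the data and redraw the figures. It was recently updated and renamed `oncomineR`; I have not studied the new version yet, so this post still covers the older `ROncomine`.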
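The package used to be hosted on GitHub (which, as everyone knows, is hard to reach from mainland China), but it has since been deleted. Fortunately I had already imported it into my Gitee (码云) account, so it can still be installed quickly.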
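There used to be another tutorial, "破解oncomine无法免费下载数据,学员开发了一款R语言包" (roughly: "Working around Oncomine's lack of free data download: a student developed an R package"), but it has also been deleted, probably because of copyright concerns. So I am quietly writing this tutorial for my own reference.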
First install devtools, then call `install_git` to install the package directly:
install.packages('devtools')
devtools::install_git('https://gitee.com/swcyo/oncomineR')
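Every Oncomine view can in fact be captured through the browser. Log in with Chrome, right-click anywhere on the page and choose Inspect to open the element panel, then press Ctrl+F and search for `<map`. Everything we need sits inside that `<map` block. Right-click it, copy it, and save it in a new text file.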
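The principle behind oncomineR is simply to post-process that `<map` block from the inspector, which simplifies the work, and then call plotly to draw the figure. You can of course redraw the figure with ggplot2 instead.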
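To make the idea concrete, here is a minimal base-R sketch of how a `<map` block could be turned into a data frame. It is not the package's actual implementation: the HTML snippet, the `title="key: value; ..."` layout and the helper `parse_title` are all assumptions made up for illustration, and the real Oncomine markup may look different.

```r
# Hypothetical <map> snippet; the real Oncomine attribute layout may differ.
html <- paste0(
  '<map name="demo">',
  '<area shape="rect" coords="0,0,10,10" title="Sample: A; Value: 1.5">',
  '<area shape="rect" coords="10,0,20,10" title="Sample: B; Value: 2.0">',
  '</map>'
)

# Extract every <area ...> tag, then the content of its title attribute.
areas  <- regmatches(html, gregexpr('<area[^>]*>', html))[[1]]
titles <- sub('.*title="([^"]*)".*', '\\1', areas)

# Split one title into "key: value" pairs and return a named character vector.
parse_title <- function(x) {
  parts <- strsplit(strsplit(x, ';\\s*')[[1]], ':\\s*')
  setNames(vapply(parts, `[`, character(1), 2),
           vapply(parts, `[`, character(1), 1))
}

# One row per <area>, one column per key.
df <- as.data.frame(do.call(rbind, lapply(titles, parse_title)),
                    stringsAsFactors = FALSE)
df$Value <- as.numeric(df$Value)
print(df)
```

The point of the sketch is only that the hover values shown in the browser are already present as text in the page source, so extracting them is a string-processing task.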
- Summary of a single gene across diseases (Gene Summary)
Following the 解螺旋 tutorial, search for the CXCL8 gene (the gene name actually shown is `IL8`). A heatmap like this then appears on the right:
Copy the `<map` block and save it as `heatmap.txt`. The file can then be processed with `Oncomine_heatmap_DiseaseSummaryfor_SomeGene`:
library(ROncomine)
heat<-Oncomine_heatmap_DiseaseSummaryfor_SomeGene('/Users/mac/Documents/GitHub/myblog/content/post/2021-08-19-oncomine/heatmap.txt') # store the result as heat
knitr::kable(heat) ## only renders the table for this tutorial; you do not need to run it
 | Cancer Type | Analysis Type | meat threshold | meat intotal | expression |
---|---|---|---|---|---|
X1 | Bladder Cancer | Cancer Histology | 3 | 18 | over |
X2 | Bladder Cancer | Cancer Histology | 3 | 18 | under |
X3 | Bladder Cancer | Multi-cancer | 1 | 9 | over |
X4 | Bladder Cancer | Outlier | 4 | 12 | over |
X5 | Bladder Cancer | Outlier | 2 | 12 | under |
X6 | Brain and CNS Cancer | Cancer vs. Normal | 1 | 36 | over |
X7 | Brain and CNS Cancer | Cancer Histology | 3 | 66 | over |
X8 | Brain and CNS Cancer | Cancer Histology | 2 | 66 | under |
X9 | Brain and CNS Cancer | Outlier | 16 | 37 | over |
X10 | Brain and CNS Cancer | Outlier | 7 | 37 | under |
X11 | Breast Cancer | Multi-cancer | 2 | 23 | under |
X12 | Breast Cancer | Outlier | 29 | 68 | over |
X13 | Breast Cancer | Outlier | 5 | 68 | under |
X14 | Cervical Cancer | Cancer vs. Normal | 4 | 10 | over |
X15 | Cervical Cancer | Outlier | 2 | 8 | over |
X16 | Colorectal Cancer | Cancer vs. Normal | 18 | 35 | over |
X17 | Colorectal Cancer | Cancer Histology | 1 | 32 | over |
X18 | Colorectal Cancer | Cancer Histology | 1 | 32 | under |
X19 | Colorectal Cancer | Multi-cancer | 4 | 24 | over |
X20 | Colorectal Cancer | Outlier | 3 | 35 | over |
X21 | Colorectal Cancer | Outlier | 4 | 35 | under |
X22 | Esophageal Cancer | Cancer vs. Normal | 3 | 11 | over |
X23 | Esophageal Cancer | Cancer Histology | 1 | 8 | over |
X24 | Esophageal Cancer | Cancer Histology | 1 | 8 | under |
X25 | Esophageal Cancer | Multi-cancer | 1 | 8 | over |
X26 | Esophageal Cancer | Outlier | 3 | 12 | over |
X27 | Esophageal Cancer | Outlier | 1 | 12 | under |
X28 | Gastric Cancer | Cancer vs. Normal | 2 | 23 | over |
X29 | Gastric Cancer | Cancer Histology | 1 | 29 | over |
X30 | Gastric Cancer | Cancer Histology | 1 | 29 | under |
X31 | Gastric Cancer | Outlier | 1 | 15 | over |
X32 | Gastric Cancer | Outlier | 3 | 15 | under |
X33 | Head and Neck Cancer | Cancer vs. Normal | 4 | 32 | over |
X34 | Head and Neck Cancer | Cancer Histology | 1 | 14 | over |
X35 | Head and Neck Cancer | Multi-cancer | 2 | 10 | over |
X36 | Head and Neck Cancer | Multi-cancer | 1 | 10 | under |
X37 | Head and Neck Cancer | Outlier | 4 | 22 | over |
X38 | Head and Neck Cancer | Outlier | 6 | 22 | under |
X39 | Kidney Cancer | Cancer Histology | 1 | 44 | over |
X40 | Kidney Cancer | Cancer Histology | 1 | 44 | under |
X41 | Kidney Cancer | Multi-cancer | 1 | 18 | over |
X42 | Kidney Cancer | Multi-cancer | 1 | 18 | under |
X43 | Kidney Cancer | Outlier | 10 | 18 | over |
X44 | Kidney Cancer | Outlier | 2 | 18 | under |
X45 | Leukemia | Cancer vs. Normal | 1 | 39 | over |
X46 | Leukemia | Cancer vs. Normal | 3 | 39 | under |
X47 | Leukemia | Cancer Histology | 6 | 117 | over |
X48 | Leukemia | Cancer Histology | 4 | 117 | under |
X49 | Leukemia | Outlier | 14 | 56 | over |
X50 | Leukemia | Outlier | 12 | 56 | under |
X51 | Liver Cancer | Cancer vs. Normal | 2 | 13 | over |
X52 | Liver Cancer | Outlier | 2 | 15 | over |
X53 | Liver Cancer | Outlier | 7 | 15 | under |
X54 | Lung Cancer | Multi-cancer | 2 | 22 | over |
X55 | Lung Cancer | Outlier | 10 | 36 | over |
X56 | Lung Cancer | Outlier | 4 | 36 | under |
X57 | Lymphoma | Cancer vs. Normal | 2 | 36 | under |
X58 | Lymphoma | Multi-cancer | 5 | 17 | under |
X59 | Lymphoma | Outlier | 18 | 34 | over |
X60 | Lymphoma | Outlier | 2 | 34 | under |
X61 | Melanoma | Cancer Histology | 1 | 4 | over |
X62 | Melanoma | Cancer Histology | 1 | 4 | under |
X63 | Melanoma | Multi-cancer | 1 | 18 | over |
X64 | Melanoma | Outlier | 7 | 22 | over |
X65 | Melanoma | Outlier | 4 | 22 | under |
X66 | Myeloma | Cancer Histology | 1 | 13 | over |
X67 | Myeloma | Cancer Histology | 1 | 13 | under |
X68 | Myeloma | Multi-cancer | 2 | 6 | under |
X69 | Myeloma | Outlier | 2 | 16 | over |
X70 | Myeloma | Outlier | 3 | 16 | under |
X71 | Other Cancer | Cancer vs. Normal | 3 | 32 | over |
X72 | Other Cancer | Cancer Histology | 1 | 32 | under |
X73 | Other Cancer | Outlier | 12 | 36 | over |
X74 | Other Cancer | Outlier | 2 | 36 | under |
X75 | Ovarian Cancer | Outlier | 5 | 19 | over |
X76 | Ovarian Cancer | Outlier | 2 | 19 | under |
X77 | Pancreatic Cancer | Cancer vs. Normal | 3 | 12 | over |
X78 | Pancreatic Cancer | Outlier | 1 | 14 | over |
X79 | Pancreatic Cancer | Outlier | 1 | 14 | under |
X80 | Prostate Cancer | Outlier | 7 | 30 | over |
X81 | Prostate Cancer | Outlier | 1 | 30 | under |
X82 | Sarcoma | Cancer Histology | 2 | 101 | over |
X83 | Sarcoma | Cancer Histology | 1 | 101 | under |
X84 | Sarcoma | Multi-cancer | 2 | 13 | under |
X85 | Sarcoma | Outlier | 9 | 25 | over |
X86 | Sarcoma | Outlier | 2 | 25 | under |
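This way we quickly extract all the data we need. For example, to keep only the `Cancer vs. Normal` rows, you can filter in R, edit interactively with DataEditR, or export the table and process it in Excel.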
# cn<-heat[heat$`Analysis Type` == 'Cancer vs. Normal',]
# DataEditR::data_edit(heat) # or edit interactively
cn<-read.csv('/Users/mac/Documents/GitHub/myblog/content/post/2021-08-19-oncomine/cn.csv')
knitr::kable(cn)
X | Cancer.Type | Analysis.Type. | meat.threshold | meat.intotal | expression |
---|---|---|---|---|---|
X6 | Brain and CNS Cancer | Cancer vs. Normal | 1 | 36 | over |
X14 | Cervical Cancer | Cancer vs. Normal | 4 | 10 | over |
X16 | Colorectal Cancer | Cancer vs. Normal | 18 | 35 | over |
X22 | Esophageal Cancer | Cancer vs. Normal | 3 | 11 | over |
X28 | Gastric Cancer | Cancer vs. Normal | 2 | 23 | over |
X33 | Head and Neck Cancer | Cancer vs. Normal | 4 | 32 | over |
X45 | Leukemia | Cancer vs. Normal | 1 | 39 | over |
X46 | Leukemia | Cancer vs. Normal | 3 | 39 | under |
X51 | Liver Cancer | Cancer vs. Normal | 2 | 13 | over |
X57 | Lymphoma | Cancer vs. Normal | 2 | 36 | under |
X71 | Other Cancer | Cancer vs. Normal | 3 | 32 | over |
X77 | Pancreatic Cancer | Cancer vs. Normal | 3 | 12 | over |
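
The filtering step can also be done entirely in base R. The sketch below uses a small stand-in data frame (`heat_demo`, a hypothetical name) built from four rows of the table above. The `trimws()` call on the column names is a precaution I am assuming may be needed if the parsed names carry stray spaces, and the `fraction` column is an extra illustration, not something the package returns.

```r
# Stand-in for the parsed heatmap table, using four rows shown above.
heat_demo <- data.frame(
  `Cancer Type`    = c("Brain and CNS Cancer", "Cervical Cancer",
                       "Colorectal Cancer", "Colorectal Cancer"),
  `Analysis Type`  = c("Cancer vs. Normal", "Cancer vs. Normal",
                       "Cancer vs. Normal", "Outlier"),
  `meat threshold` = c(1, 4, 18, 3),
  `meat intotal`   = c(36, 10, 35, 35),
  expression       = c("over", "over", "over", "over"),
  check.names = FALSE
)

# Guard against leading or trailing spaces in the column names.
names(heat_demo) <- trimws(names(heat_demo))

# Keep only the Cancer vs. Normal analyses.
cn_demo <- heat_demo[heat_demo$`Analysis Type` == "Cancer vs. Normal", ]

# Share of analyses that meet the threshold, per row.
cn_demo$fraction <- cn_demo$`meat threshold` / cn_demo$`meat intotal`
print(cn_demo)
```

Writing the result out with `write.csv(cn_demo, "cn.csv", row.names = FALSE)` would give a file that can be read back in the same way as the `cn.csv` used above.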
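- Cancer vs. normal tissue comparison (Differential Analysis)
Continuing with the 解螺旋 example, use Oncomine to query the datasets in which CXCL8 is over-expressed and those in which it is under-expressed in colorectal cancer tissue (versus normal). Filters: P-value: 1E-4; Fold Change: 3; Gene Rank: Top 10%.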
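Note that a sub-analysis is selected by default. In Kaiser Colon, for example, the cursor sits on Colon Mucinous Adenocarcinoma vs. Normal, which only compares one tumor subtype with normal tissue, and the GROUP BY control under Differential Analysis is greyed out. To see everything, click Kaiser Colon itself; the GROUP BY drop-down then becomes available. Choose Cancer and Normal Type, and GROUP lists a large number of groups, that is, a large amount of data, all of which can be extracted.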
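The default view is a bar plot. Hover over a bar to see its value, which is exactly what we need.
There is also a box-plot icon. Click it to switch to a Boxplot, and hover over a box to see its parameters.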
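In the same way, copy the `<map` block and save it as a text file, for example `box.txt`. The box and bar views actually carry the same data, so we copy from the bar view and then use the `Oncomine_bar` function: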
box<-Oncomine_bar('/Users/mac/Documents/GitHub/myblog/content/post/2021-08-19-oncomine/box.txt')
knitr::kable(box)
Expression value | Cancer Type | Sample Name | Normal Tissue Type | Legend Value |
---|---|---|---|---|
3.509 | Colon Small Cell Carcinoma | T4360A3 | No value | No value |
4.205 | Colon Signet Ring Cell Adenocarcinoma | T5245A1 | No value | No value |
4.407 | Rectosigmoid Mucinous Adenocarcinoma | T940A | No value | No value |
4.575 | Colon Small Cell Carcinoma | T4360A2 | No value | No value |
5.109 | Rectal Signet Ring Cell Adenocarcinoma | T924C | No value | No value |
5.867 | Colon Signet Ring Cell Adenocarcinoma | T5002A1 | No value | No value |
6.110 | Rectosigmoid Mucinous Adenocarcinoma | T4980A1 | No value | No value |
0.188 | Cancer | N552G | Colon | Colon |
0.315 | Cancer | N1369A | Colon | Colon |
0.533 | Cancer | N773A1 | Colon | Colon |
1.020 | Cancer | N1102A | Colon | Colon |
1.160 | Cancer | N2367A | Colon | Colon |
2.732 | Cancer | T4452A1 | Cecum Adenocarcinoma | Cecum Adenocarcinoma |
2.889 | Cancer | T4550B | Cecum Adenocarcinoma | Cecum Adenocarcinoma |
2.944 | Cancer | T902A | Cecum Adenocarcinoma | Cecum Adenocarcinoma |
3.397 | Cancer | T4834A1 | Cecum Adenocarcinoma | Cecum Adenocarcinoma |
3.453 | Cancer | T4354H | Cecum Adenocarcinoma | Cecum Adenocarcinoma |
4.437 | Cancer | T4544A1 | Cecum Adenocarcinoma | Cecum Adenocarcinoma |
4.492 | Cancer | T4926A1 | Cecum Adenocarcinoma | Cecum Adenocarcinoma |
4.597 | Cancer | T5376A1 | Cecum Adenocarcinoma | Cecum Adenocarcinoma |
4.623 | Cancer | T4452A2 | Cecum Adenocarcinoma | Cecum Adenocarcinoma |
4.724 | Cancer | T4452A | Cecum Adenocarcinoma | Cecum Adenocarcinoma |
4.801 | Cancer | T4452A3 | Cecum Adenocarcinoma | Cecum Adenocarcinoma |
5.183 | Cancer | T5133A | Cecum Adenocarcinoma | Cecum Adenocarcinoma |
5.315 | Cancer | T4452A4 | Cecum Adenocarcinoma | Cecum Adenocarcinoma |
5.407 | Cancer | T5024A1 | Cecum Adenocarcinoma | Cecum Adenocarcinoma |
5.897 | Cancer | T4573A1 | Cecum Adenocarcinoma | Cecum Adenocarcinoma |
5.981 | Cancer | T4984A1 | Cecum Adenocarcinoma | Cecum Adenocarcinoma |
6.215 | Cancer | T5565A1 | Cecum Adenocarcinoma | Cecum Adenocarcinoma |
1.644 | Cancer | T5811A1 | Colon Adenocarcinoma | Colon Adenocarcinoma |
2.586 | Cancer | T4174A | Colon Adenocarcinoma | Colon Adenocarcinoma |
2.747 | Cancer | T4701A1 | Colon Adenocarcinoma | Colon Adenocarcinoma |
3.043 | Cancer | T519A | Colon Adenocarcinoma | Colon Adenocarcinoma |
3.377 | Cancer | T6190A1 | Colon Adenocarcinoma | Colon Adenocarcinoma |
3.553 | Cancer | T5287A1 | Colon Adenocarcinoma | Colon Adenocarcinoma |
3.634 | Cancer | T4660A1 | Colon Adenocarcinoma | Colon Adenocarcinoma |
3.748 | Cancer | T573A | Colon Adenocarcinoma | Colon Adenocarcinoma |
3.838 | Cancer | T740A | Colon Adenocarcinoma | Colon Adenocarcinoma |
3.877 | Cancer | T4373B1 | Colon Adenocarcinoma | Colon Adenocarcinoma |
3.927 | Cancer | T5573A1 | Colon Adenocarcinoma | Colon Adenocarcinoma |
3.931 | Cancer | T826A | Colon Adenocarcinoma | Colon Adenocarcinoma |
3.982 | Cancer | T4448A | Colon Adenocarcinoma | Colon Adenocarcinoma |
4.052 | Cancer | T4612A1 | Colon Adenocarcinoma | Colon Adenocarcinoma |
4.061 | Cancer | T5266A1 | Colon Adenocarcinoma | Colon Adenocarcinoma |
4.161 | Cancer | T433A | Colon Adenocarcinoma | Colon Adenocarcinoma |
4.168 | Cancer | T4376A | Colon Adenocarcinoma | Colon Adenocarcinoma |
4.173 | Cancer | T5589B1 | Colon Adenocarcinoma | Colon Adenocarcinoma |
4.188 | Cancer | T4475A1 | Colon Adenocarcinoma | Colon Adenocarcinoma |
4.294 | Cancer | T5164A1 | Colon Adenocarcinoma | Colon Adenocarcinoma |
4.395 | Cancer | T4956A1 | Colon Adenocarcinoma | Colon Adenocarcinoma |
4.441 | Cancer | T4975A1 | Colon Adenocarcinoma | Colon Adenocarcinoma |
4.478 | Cancer | T773A1 | Colon Adenocarcinoma | Colon Adenocarcinoma |
4.566 | Cancer | T4373B2 | Colon Adenocarcinoma | Colon Adenocarcinoma |
4.761 | Cancer | T949B | Colon Adenocarcinoma | Colon Adenocarcinoma |
4.810 | Cancer | T4750A1 | Colon Adenocarcinoma | Colon Adenocarcinoma |
4.857 | Cancer | T4257A | Colon Adenocarcinoma | Colon Adenocarcinoma |
4.956 | Cancer | T801A1 | Colon Adenocarcinoma | Colon Adenocarcinoma |
5.044 | Cancer | T5162A1 | Colon Adenocarcinoma | Colon Adenocarcinoma |
5.077 | Cancer | T5389A1 | Colon Adenocarcinoma | Colon Adenocarcinoma |
5.172 | Cancer | T4695A1 | Colon Adenocarcinoma | Colon Adenocarcinoma |
5.241 | Cancer | T4667A1 | Colon Adenocarcinoma | Colon Adenocarcinoma |
5.243 | Cancer | T4508A1 | Colon Adenocarcinoma | Colon Adenocarcinoma |
5.286 | Cancer | T4920A1 | Colon Adenocarcinoma | Colon Adenocarcinoma |
5.330 | Cancer | T5107B1 | Colon Adenocarcinoma | Colon Adenocarcinoma |
5.614 | Cancer | T4373B3 | Colon Adenocarcinoma | Colon Adenocarcinoma |
5.638 | Cancer | T4541A1 | Colon Adenocarcinoma | Colon Adenocarcinoma |
5.674 | Cancer | T4489A1 | Colon Adenocarcinoma | Colon Adenocarcinoma |
5.819 | Cancer | T4555A1 | Colon Adenocarcinoma | Colon Adenocarcinoma |
5.852 | Cancer | T5063A | Colon Adenocarcinoma | Colon Adenocarcinoma |
5.959 | Cancer | T5102A1 | Colon Adenocarcinoma | Colon Adenocarcinoma |
3.001 | Cancer | T5513A1 | Colon Mucinous Adenocarcinoma | Colon Mucinous Adenocarcinoma |
3.836 | Cancer | T4948A2 | Colon Mucinous Adenocarcinoma | Colon Mucinous Adenocarcinoma |
4.035 | Cancer | T4611B1 | Colon Mucinous Adenocarcinoma | Colon Mucinous Adenocarcinoma |
4.225 | Cancer | T4799A1 | Colon Mucinous Adenocarcinoma | Colon Mucinous Adenocarcinoma |
4.389 | Cancer | T932B1 | Colon Mucinous Adenocarcinoma | Colon Mucinous Adenocarcinoma |
4.776 | Cancer | T5536C4 | Colon Mucinous Adenocarcinoma | Colon Mucinous Adenocarcinoma |
4.838 | Cancer | T451A | Colon Mucinous Adenocarcinoma | Colon Mucinous Adenocarcinoma |
4.872 | Cancer | T5261B1 | Colon Mucinous Adenocarcinoma | Colon Mucinous Adenocarcinoma |
5.087 | Cancer | T4491A1 | Colon Mucinous Adenocarcinoma | Colon Mucinous Adenocarcinoma |
5.304 | Cancer | T694C | Colon Mucinous Adenocarcinoma | Colon Mucinous Adenocarcinoma |
5.676 | Cancer | T4644B1 | Colon Mucinous Adenocarcinoma | Colon Mucinous Adenocarcinoma |
5.973 | Cancer | T625A | Colon Mucinous Adenocarcinoma | Colon Mucinous Adenocarcinoma |
6.231 | Cancer | T4491A2 | Colon Mucinous Adenocarcinoma | Colon Mucinous Adenocarcinoma |
2.231 | Cancer | T551C | Rectal Adenocarcinoma | Rectal Adenocarcinoma |
2.638 | Cancer | T645A | Rectal Adenocarcinoma | Rectal Adenocarcinoma |
3.811 | Cancer | T521B | Rectal Adenocarcinoma | Rectal Adenocarcinoma |
4.441 | Cancer | T789A | Rectal Adenocarcinoma | Rectal Adenocarcinoma |
4.584 | Cancer | T74A | Rectal Adenocarcinoma | Rectal Adenocarcinoma |
4.669 | Cancer | T565A | Rectal Adenocarcinoma | Rectal Adenocarcinoma |
5.346 | Cancer | T462A | Rectal Adenocarcinoma | Rectal Adenocarcinoma |
5.547 | Cancer | T420A | Rectal Adenocarcinoma | Rectal Adenocarcinoma |
3.798 | Cancer | T4449E4 | Rectal Mucinous Adenocarcinoma | Rectal Mucinous Adenocarcinoma |
4.233 | Cancer | T4449E3 | Rectal Mucinous Adenocarcinoma | Rectal Mucinous Adenocarcinoma |
4.360 | Cancer | T4449E1 | Rectal Mucinous Adenocarcinoma | Rectal Mucinous Adenocarcinoma |
4.784 | Cancer | T4449E2 | Rectal Mucinous Adenocarcinoma | Rectal Mucinous Adenocarcinoma |
2.439 | Cancer | T5139A1 | Rectosigmoid Adenocarcinoma | Rectosigmoid Adenocarcinoma |
2.922 | Cancer | T4919A1 | Rectosigmoid Adenocarcinoma | Rectosigmoid Adenocarcinoma |
2.983 | Cancer | T656A | Rectosigmoid Adenocarcinoma | Rectosigmoid Adenocarcinoma |
3.072 | Cancer | T771A | Rectosigmoid Adenocarcinoma | Rectosigmoid Adenocarcinoma |
3.404 | Cancer | T866D | Rectosigmoid Adenocarcinoma | Rectosigmoid Adenocarcinoma |
3.846 | Cancer | T5430A1 | Rectosigmoid Adenocarcinoma | Rectosigmoid Adenocarcinoma |
4.396 | Cancer | T552G | Rectosigmoid Adenocarcinoma | Rectosigmoid Adenocarcinoma |
4.789 | Cancer | T981A | Rectosigmoid Adenocarcinoma | Rectosigmoid Adenocarcinoma |
5.055 | Cancer | T4603A1 | Rectosigmoid Adenocarcinoma | Rectosigmoid Adenocarcinoma |
5.815 | Cancer | T672A | Rectosigmoid Adenocarcinoma | Rectosigmoid Adenocarcinoma |
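
The parameters that Oncomine shows when hovering over a box can be recomputed from the extracted values. Below is a minimal sketch using base R's `fivenum()` on nine values copied from the table above. The data frame `demo` and its column names `group` and `value` are my own, chosen for illustration, and the group labels are taken from the fourth column of the table.

```r
# Nine expression values copied from the table above.
demo <- data.frame(
  group = c(rep("Colon (normal)", 5),
            rep("Rectal Mucinous Adenocarcinoma", 4)),
  value = c(0.188, 0.315, 0.533, 1.020, 1.160,
            3.798, 4.233, 4.360, 4.784)
)

# Tukey five-number summary per group: the numbers a box plot is drawn from.
stats <- do.call(rbind, lapply(split(demo$value, demo$group), fivenum))
colnames(stats) <- c("min", "lower_hinge", "median", "upper_hinge", "max")
print(stats)
```

Note that `fivenum()` returns hinges, which can differ slightly from the quartiles reported by `quantile()` or by Oncomine itself.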
We can draw the figures with the package's built-in functions. There are two of them: a bar_plot and a box_plot.
Oncomine_bar_plot(box)
## Loading required package: ggplot2
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
Oncomine_box_plot(box)
Of course, we can also draw the plot with ggplot2:
```r
library(ggplot2)
ggplot(box, aes(`Cancer Type`, `Expression value`, color = `Cancer Type`)) +
  geom_boxplot() +
  theme_bw(base_size = 12) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = .5))