
Case study for tidyMass
Hong Yan (https://ysph.yale.edu/profile/hong_yan/)
Xiaotao Shen (https://www.shenxt.info/)
Chuchu Wang (https://www.linkedin.com/in/chuchu-wang-71331190/)
Created on 2022-03-03 and updated on 2022-03-09
Introduction
Data introduction: the case study includes RPLC (positive and negative mode) and HILIC (positive and negative mode) data. More information can be found in the paper below.
Sex Differences in Colon Cancer Metabolism Reveal A Novel Subphenotype
Download data
Mass spectrometry raw data (mzXML) for the case study in this paper are available on MetaboLights under accession numbers MTBLS1122 (HILIC positive), MTBLS1124 (HILIC negative), MTBLS1122 (RPLC positive) and MTBLS1130 (RPLC negative). The MS2 data (mgf) and the processed data (“mass_dataset” class objects generated with the massprocesser package) are available on the tidyMass project website (https://tidymass.github.io/case_study_data/).
Note: Some steps are skipped to save time (their code chunks are set to eval=FALSE). To run such a chunk, change eval=FALSE to eval=TRUE.
For the annotation step, we cannot share the in-house database, so we only provide the mass_dataset class object after annotation. We apologize for the inconvenience.
Data preparation
Please download all the data and code, and put them in one folder named case_study.
Install packages
Please install the packages needed for this analysis.
tidymass
Please refer to this document.
if(!require(remotes)){
install.packages("remotes")
}
# Install tidymass only if it is not already available
if(!require(tidymass)){
remotes::install_gitlab("jaspershen/tidymass")
}
tidyverse
if(!require(tidyverse)){
install.packages("tidyverse")
}
Other packages
if(!require(BiocManager)){
install.packages("BiocManager")
}
if(!require(ComplexHeatmap)){
BiocManager::install("ComplexHeatmap")
}
if(!require(ggraph)){
install.packages("ggraph")
}
if(!require(tidygraph)){
install.packages("tidygraph")
}
if(!require(extrafont)){
install.packages("extrafont")
}
if(!require(shadowtext)){
install.packages("shadowtext")
}
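The if(!require(...)) pattern above tries to load each package and installs it only when loading fails. If you prefer to check the CRAN packages in one pass, a minimal base R sketch follows. missing_packages() is a hypothetical helper written for this illustration (it is not a tidymass function), and the package list simply repeats the CRAN packages named above.

```r
# Hypothetical helper (not part of tidymass): return the packages in `pkgs`
# whose namespaces cannot be loaded in the current R session.
missing_packages <- function(pkgs) {
  # requireNamespace() returns FALSE (instead of raising an error) when a
  # package is not installed; it loads the namespace without attaching it.
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

cran_pkgs <- c("tidyverse", "ggraph", "tidygraph", "extrafont", "shadowtext")
to_install <- missing_packages(cran_pkgs)
to_install

# Uncomment to install only the packages that are missing:
# if (length(to_install) > 0) install.packages(to_install)
```

Using requireNamespace() rather than require() keeps the check from attaching every package to the search path.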
Raw data processing
The massprocesser package is used for raw data processing. Please refer to this website for more information.
RPLC positive mode
The code used for raw data processing is shown below.
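library(massprocesser)
process_data(
  path = "mzxml_ms1_data/RPLC/POS/",
  polarity = "positive",
  ppm = 20, # m/z tolerance in parts per million
  peakwidth = c(5, 30), # expected chromatographic peak width range (seconds)
  threads = 6, # number of threads for parallel processing
  output_tic = FALSE, # do not output TIC plots
  output_bpc = FALSE, # do not output BPC plots
  output_rt_correction_plot = FALSE, # do not output the RT correction plot
  min_fraction = 0.5,
  group_for_figure = "QC"
)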
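The ppm argument is a relative m/z tolerance, so the absolute tolerance in Da grows with m/z. Assuming it is interpreted the usual way (as in xcms centWave peak detection, which massprocesser relies on), the conversion is tolerance = m/z × ppm / 10^6. ppm_to_da() below is a hypothetical helper for illustration only.

```r
# Hypothetical helper (illustration only): convert a relative m/z tolerance
# in ppm into an absolute tolerance in Da at the given m/z value(s).
ppm_to_da <- function(mz, ppm) {
  mz * ppm / 1e6
}

# Absolute tolerance at a few m/z values with ppm = 20, as in process_data() above
tolerance <- ppm_to_da(c(100, 500, 1000), ppm = 20)
tolerance
# [1] 0.002 0.010 0.020
```

So the same ppm = 20 setting allows a 0.002 Da deviation at m/z 100 but a 0.02 Da deviation at m/z 1000.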
All the results will be placed in the folder case_study/data/mzxml_ms1_data/RPLC/POS/Result. More information about the output can be found here.
You can also load the resulting object directly; it is a mass_dataset class object.
load("mzxml_ms1_data/RPLC/POS/Result/object")
object
#> --------------------
#> massdataset version: 0.99.9
#> --------------------
#> 1.expression_data:[ 14585 x 298 data.frame]
#> 2.sample_info:[ 298 x 4 data.frame]
#> 3.variable_info:[ 14585 x 3 data.frame]
#> 4.sample_info_note:[ 4 x 2 data.frame]
#> 5.variable_info_note:[ 3 x 2 data.frame]
#> 6.ms2_data:[ 0 variables x 0 MS2 spectra]
#> --------------------
#> Processing information (extract_process_info())
#> create_mass_dataset ----------
#> Package Function.used Time
#> 1 massdataset create_mass_dataset() 2022-03-02 14:50:02
#> process_data ----------
#> Package Function.used Time
#> 1 massprocesser process_data 2022-03-02 12:25:08
dim(object)
#> [1] 14585 298
There are 14,585 metabolic features (rows) and 298 samples (columns) in the RPLC positive mode dataset.
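The number of features depends on the filtering settings passed to process_data(), such as min_fraction = 0.5. The exact rule is defined by massprocesser and the xcms grouping step underneath it; the sketch below only illustrates one common interpretation (similar to xcms's minFraction): keep a feature if it is detected in at least min_fraction of the samples of at least one group. keep_by_fraction() and the toy table are hypothetical and made up for illustration.

```r
# Toy expression table: 4 features (rows) x 6 samples (columns).
# NA marks a feature that was not detected in that sample.
expr <- data.frame(
  S1 = c(10, NA, 5, NA),
  S2 = c(12, NA, 6, NA),
  S3 = c(11, 3, NA, NA),
  S4 = c(NA, 4, NA, 7),
  S5 = c(9, NA, NA, NA),
  S6 = c(10, NA, NA, NA),
  row.names = paste0("feature_", 1:4)
)
groups <- c("QC", "QC", "QC", "Case", "Case", "Case") # group of each sample column

# Hypothetical helper: keep features detected in >= min_fraction of the
# samples of at least one group (simplified illustration, not massprocesser's code).
keep_by_fraction <- function(expr, groups, min_fraction = 0.5) {
  detected <- !is.na(as.matrix(expr))
  # Detection fraction per feature (rows) for each group (columns)
  frac_by_group <- do.call(cbind, lapply(unique(groups), function(g) {
    rowMeans(detected[, groups == g, drop = FALSE])
  }))
  keep <- apply(frac_by_group >= min_fraction, 1, any)
  expr[keep, , drop = FALSE]
}

filtered <- keep_by_fraction(expr, groups, min_fraction = 0.5)
rownames(filtered)
# [1] "feature_1" "feature_3"
```

Here feature_2 and feature_4 are dropped because neither group detects them in at least half of its samples.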
RT correction plot
Load data
load("mzxml_ms1_data/RPLC/POS/Result/intermediate_data/xdata2")
Set group_for_figure to the groups you want to show, or set it to “all” to show all samples. With interactive = TRUE, the plot_adjusted_rt() function returns an interactive plot.
####set the group_for_figure if you want to show specific groups.
####And set it as "all" if you want to show all samples.
plot <-
massprocesser::plot_adjusted_rt(object = xdata2,
group_for_figure = "QC",
interactive = TRUE)
plot
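The group_for_figure behavior described above (one or more specific groups, or “all” for every sample) can be sketched in base R as follows. select_samples() is a hypothetical helper that mirrors the described behavior, not massprocesser's internal code; it assumes a sample table with sample_id and group columns, and the toy IDs are made up.

```r
# Hypothetical helper mirroring the described group_for_figure behavior:
# "all" keeps every sample; otherwise keep samples whose group matches.
select_samples <- function(sample_info, group_for_figure = "QC") {
  if (identical(group_for_figure, "all")) {
    return(sample_info$sample_id)
  }
  # %in% also allows several groups, e.g. c("QC", "Subject")
  sample_info$sample_id[sample_info$group %in% group_for_figure]
}

# Toy sample table (hypothetical IDs and groups)
sample_info <- data.frame(
  sample_id = c("QC_01", "QC_02", "Subject_01", "Subject_02"),
  group = c("QC", "QC", "Subject", "Subject"),
  stringsAsFactors = FALSE
)

select_samples(sample_info, "QC")
# [1] "QC_01" "QC_02"
select_samples(sample_info, "all")
```

Restricting the figure to QC samples keeps the plot readable when there are hundreds of samples, as in this dataset.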