Introduction
This case study uses data from two chromatographic platforms, RPLC and HILIC, each acquired in positive and negative mode. More information can be found here.
Sex Differences in Colon Cancer Metabolism Reveal A Novel Subphenotype
Download data
Mass spectrometry raw data (mzXML) for the case study in this paper are accessible on MetaboLights
under MTBLS1122 (HILIC positive), MTBLS1124 (HILIC negative), MTBLS1122 (RPLC positive) and MTBLS1130 (RPLC negative). The MS2 data (mgf) and processed data ("mass_dataset" class) from the massprocesser
package are available on the tidyMass project website (https://tidymass.github.io/case_study_data/).
Note: Some steps are skipped to save time. To run a skipped code chunk, change eval=FALSE to eval=TRUE. For the annotation step, we cannot share the in-house database, so we provide only the mass_dataset class object after annotation. We apologize for the inconvenience.
Data preparation
Please download all the data and code, and put them in a single folder named case_study.
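Before running the later steps, it can help to confirm that the folder is laid out as the code expects. The following is a minimal sketch in base R, not part of the tidymass packages: check_case_study() is a hypothetical helper, and the sub-folder it looks for is assumed to sit directly under case_study, based on the path used in the process_data() call in this tutorial.

```r
# Hypothetical helper: report which expected sub-folders exist under `root`.
# The default sub-folder is an assumption taken from the raw data processing
# step of this tutorial; add the folders for the other modes as needed.
check_case_study <- function(root = "case_study",
                             expected = c("mzxml_ms1_data/RPLC/POS")) {
  paths <- file.path(root, expected)
  found <- dir.exists(paths)
  names(found) <- expected
  found
}

# Run from the directory that contains the case_study folder.
# Each entry is TRUE if the sub-folder exists and FALSE otherwise.
status <- check_case_study("case_study")
print(status)
```

A FALSE entry usually means either the download is incomplete or the working directory is not the one that contains case_study.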
Install packages
Please install the packages needed for this analysis.
tidymass
Please refer to this document.
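# Install remotes first; it provides install_gitlab()
if(!require(remotes)){
  install.packages("remotes")
}
# Install tidymass only if it is not already available
if(!require(tidymass)){
  remotes::install_gitlab("jaspershen/tidymass")
}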
tidyverse
if(!require(tidyverse)){
  install.packages("tidyverse")
}
Other packages
if(!require(BiocManager)){
  install.packages("BiocManager")
}
if(!require(ComplexHeatmap)){
  BiocManager::install("ComplexHeatmap")
}
if(!require(ggraph)){
  install.packages("ggraph")
}
if(!require(tidygraph)){
  install.packages("tidygraph")
}
if(!require(extrafont)){
  install.packages("extrafont")
}
if(!require(shadowtext)){
  install.packages("shadowtext")
}
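After the installation steps, a quick check can confirm that nothing is missing. The sketch below uses only base R; missing_packages() is a hypothetical helper written for this illustration, and the package list is simply the set of names mentioned in this section.

```r
# Packages named in this section. tidymass comes from GitLab and
# ComplexHeatmap from Bioconductor; the rest are installed from CRAN.
needed <- c("remotes", "tidymass", "tidyverse", "BiocManager",
            "ComplexHeatmap", "ggraph", "tidygraph", "extrafont", "shadowtext")

# Hypothetical helper: return the names of packages that cannot be found.
# requireNamespace() loads a package's namespace without attaching it,
# so this check does not add anything to the search path.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# An empty character vector means every package is available.
print(missing_packages(needed))
```

Any name that is printed still needs to be installed with the corresponding command above.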
Raw data processing
The massprocesser package is used for raw data processing. Please refer to this website for more information.
library(tidymass)
#> Registered S3 method overwritten by 'Hmisc':
#> method from
#> vcov.default fit.models
#> ── Attaching packages ─────────────────────────────────────── tidymass 0.99.6 ──
#> ✓ massdataset   0.99.20     ✓ massstat      0.99.13
#> ✓ massprocesser 0.99.3      ✓ metpath       0.99.4
#> ✓ masscleaner   0.99.7      ✓ metid         1.2.4
#> ✓ massqc        0.99.7      ✓ masstools     0.99.5
#> ── Conflicts ─────────────────────────────────────────── tidymass_conflicts() ──
#> x massdataset::apply() masks base::apply()
#> x xcms::collect() masks dplyr::collect()
#> x BiocGenerics::colMeans() masks massdataset::colMeans(), base::colMeans()
#> x BiocGenerics::colSums() masks massdataset::colSums(), base::colSums()
#> x MSnbase::combine() masks Biobase::combine(), BiocGenerics::combine(), dplyr::combine()
#> x tidyr::extract() masks magrittr::extract()
#> x metpath::filter() masks massdataset::filter(), tidygraph::filter(), dplyr::filter(), stats::filter()
#> x S4Vectors::first() masks dplyr::first()
#> x xcms::groups() masks tidygraph::groups(), dplyr::groups()
#> x massstat::Heatmap() masks ComplexHeatmap::Heatmap()
#> x S4Vectors::intersect() masks BiocGenerics::intersect(), massdataset::intersect(), base::intersect()
#> x dplyr::lag() masks stats::lag()
#> x masstools::mz_rt_match() masks massdataset::mz_rt_match()
#> x MSnbase::reduce() masks purrr::reduce()
#> x S4Vectors::rename() masks massdataset::rename(), tidygraph::rename(), dplyr::rename()
#> x BiocGenerics::rowMeans() masks massdataset::rowMeans(), base::rowMeans()
#> x BiocGenerics::rowSums() masks massdataset::rowSums(), base::rowSums()
#> x purrr::set_names() masks magrittr::set_names()
library(tidyverse)
RPLC positive mode
The following code performs the raw data processing.
process_data(
  path = "mzxml_ms1_data/RPLC/POS/",
  polarity = "positive",
  ppm = 20,
  peakwidth = c(5, 30),
  threads = 6,
  output_tic = FALSE,
  output_bpc = FALSE,
  output_rt_correction_plot = FALSE,
  min_fraction = 0.5,
  group_for_figure = "QC"
)
All results are placed in the Result folder inside the input path, here mzxml_ms1_data/RPLC/POS/Result. More information about that can be found here.
You can load object directly; it is a mass_dataset class object.
load("mzxml_ms1_data/RPLC/POS/Result/object")
object
#> --------------------
#> massdataset version: 0.99.9
#> --------------------
#> 1.expression_data:[ 14585 x 298 data.frame]
#> 2.sample_info:[ 298 x 4 data.frame]
#> 3.variable_info:[ 14585 x 3 data.frame]
#> 4.sample_info_note:[ 4 x 2 data.frame]
#> 5.variable_info_note:[ 3 x 2 data.frame]
#> 6.ms2_data:[ 0 variables x 0 MS2 spectra]
#> --------------------
#> Processing information (extract_process_info())
#> create_mass_dataset ----------
#> Package Function.used Time
#> 1 massdataset create_mass_dataset() 2022-03-02 14:50:02
#> process_data ----------
#> Package Function.used Time
#> 1 massprocesser process_data 2022-03-02 12:25:08
dim(object)
#> [1] 14585 298
We can see that there are 14,585 metabolic features across 298 samples in RPLC positive mode.
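Note that load() creates a variable with the name stored in the file, here object, and silently replaces any existing variable of that name. When working with more than one mode, it may be convenient to load each file into its own environment and assign the result to a name of your choice. The sketch below shows one way to do this in base R. load_object() is a hypothetical helper, and it assumes that each result file stores its data under the name object, which is only shown for RPLC positive mode in this tutorial.

```r
# Hypothetical helper: load a file written by save() into a private
# environment and return the variable called `name`, so that nothing
# in the calling workspace is overwritten.
load_object <- function(file, name = "object") {
  env <- new.env()
  loaded <- load(file, envir = env)  # load() returns the names it restored
  if (!name %in% loaded) {
    stop("'", name, "' not found in ", file)
  }
  get(name, envir = env)
}

# Usage with the path from this tutorial (requires the downloaded data):
# rplc_pos <- load_object("mzxml_ms1_data/RPLC/POS/Result/object")
# dim(rplc_pos)
```

This keeps each dataset under a descriptive name such as rplc_pos instead of reusing object for every mode.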
RT correction plot
Load the data.
load("mzxml_ms1_data/RPLC/POS/Result/intermediate_data/xdata2")
Set group_for_figure to the group you want to show, or to "all" to show all samples.
We can use the plot_adjusted_rt() function to get the interactive plot.
load("mzxml_ms1_data/RPLC/POS/Result/intermediate_data/xdata2")
##set the group_for_figure if you want to show specific groups.
##And set it as "all" if you want to show all samples.
plot <-
  massprocesser::plot_adjusted_rt(object = xdata2,
                                  group_for_figure = "QC",
                                  interactive = TRUE)
plot