16 Case study for tidyMass


TidyMass packages:

Introduction

Data introduction. The dataset contains four acquisitions: RPLC (positive and negative mode) and HILIC (positive and negative mode). More information can be found here.

Sex Differences in Colon Cancer Metabolism Reveal A Novel Subphenotype

Download data

Mass spectrometry raw data (mzXML) for the case study in this paper are available on MetaboLights under MTBLS1122 (HILIC positive), MTBLS1124 (HILIC negative), MTBLS1122 (RPLC positive) and MTBLS1130 (RPLC negative). The MS2 data (mgf) and the processed data ("mass_dataset" class) from the massprocesser package are available on the tidyMass project website (https://tidymass.github.io/case_study_data/).

Note: Some steps are skipped to save time. To run one of them, change eval=FALSE to eval=TRUE in the corresponding code chunk. For the annotation step, we cannot share the in-house database, so we provide the mass_dataset object after annotation instead. We apologize for the inconvenience.

Data preparation

Please download all the data and code, and put them in one folder named case_study.
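As a minimal sketch of that layout, the R code below creates an empty folder tree under a temporary directory. Only mzxml_ms1_data/RPLC/POS/ appears in this chapter; the other three mode folders and the "QC"/"Subject" group subfolders are assumptions, based on massprocesser's convention (as we understand it) of placing mzXML files in one subfolder per sample class, which is what group_for_figure = "QC" refers to. Replace tempdir() with your own location.

```r
# Sketch of a case_study folder layout (assumed structure, see the text above).
# tempdir() keeps the example self-contained; use your own path in practice.
root <- file.path(tempdir(), "case_study")

# Only "RPLC/POS" is taken from this chapter; the other modes are assumed.
modes <- c("RPLC/POS", "RPLC/NEG", "HILIC/POS", "HILIC/NEG")
# Hypothetical group subfolders, one per sample class.
groups <- c("QC", "Subject")

for (mode in modes) {
  for (grp in groups) {
    dir.create(file.path(root, "mzxml_ms1_data", mode, grp),
               recursive = TRUE, showWarnings = FALSE)
  }
}

# List the created folders relative to case_study/
list.dirs(root, full.names = FALSE, recursive = TRUE)
```

The mzXML files for each mode would then go into the matching group subfolders.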

Install packages

Please install the packages needed for this analysis.

tidymass

Please refer to this document.

# remotes provides install_gitlab()
if(!require(remotes)){
  install.packages("remotes")
}

# Install tidymass only if it is not already available
if(!require(tidymass)){
  remotes::install_gitlab("jaspershen/tidymass")
}

tidyverse

if(!require(tidyverse)){
  install.packages("tidyverse")
}

Other packages

if(!require(BiocManager)){
  install.packages("BiocManager")
}

if(!require(ComplexHeatmap)){
  BiocManager::install("ComplexHeatmap")
}

if(!require(ggraph)){
  install.packages("ggraph")
}

if(!require(tidygraph)){
  install.packages("tidygraph")
}

if(!require(extrafont)){
  install.packages("extrafont")
}

if(!require(shadowtext)){
  install.packages("shadowtext")
}
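The repeated if(!require(...)) blocks above can be condensed into a small helper. The sketch below is our own: install_if_missing() is a hypothetical name, not part of tidymass. It uses requireNamespace() to test for a package without attaching it, and hands any missing ones to an installer function such as install.packages or BiocManager::install.

```r
# Hypothetical helper (not part of tidymass): install packages that are missing.
# `installer` is any function taking a package name, e.g. install.packages.
install_if_missing <- function(pkgs, installer) {
  # requireNamespace() returns FALSE, without an error, when a package
  # cannot be loaded (for example because it is not installed)
  has_pkg <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!has_pkg]
  for (pkg in missing) {
    installer(pkg)
  }
  invisible(missing)
}

# Usage (not run here):
# install_if_missing(c("tidyverse", "ggraph", "tidygraph", "extrafont", "shadowtext"),
#                    install.packages)
# install_if_missing("ComplexHeatmap", BiocManager::install)
```

Passing the installer as an argument keeps one helper usable for both CRAN and Bioconductor packages.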

Raw data processing

The massprocesser package is used for raw data processing. Please refer to this website for more information.

library(tidymass)
#> Registered S3 method overwritten by 'Hmisc':
#>   method       from      
#>   vcov.default fit.models
#> ── Attaching packages ─────────────────────────────────────── tidymass 0.99.6 ──
#> ✓ massdataset   0.99.20     ✓ massstat      0.99.13
#> ✓ massprocesser 0.99.3      ✓ metpath       0.99.4 
#> ✓ masscleaner   0.99.7      ✓ metid         1.2.4  
#> ✓ massqc        0.99.7      ✓ masstools     0.99.5
#> ── Conflicts ─────────────────────────────────────────── tidymass_conflicts() ──
#> x massdataset::apply()     masks base::apply()
#> x xcms::collect()          masks dplyr::collect()
#> x BiocGenerics::colMeans() masks massdataset::colMeans(), base::colMeans()
#> x BiocGenerics::colSums()  masks massdataset::colSums(), base::colSums()
#> x MSnbase::combine()       masks Biobase::combine(), BiocGenerics::combine(), dplyr::combine()
#> x tidyr::extract()         masks magrittr::extract()
#> x metpath::filter()        masks massdataset::filter(), tidygraph::filter(), dplyr::filter(), stats::filter()
#> x S4Vectors::first()       masks dplyr::first()
#> x xcms::groups()           masks tidygraph::groups(), dplyr::groups()
#> x massstat::Heatmap()      masks ComplexHeatmap::Heatmap()
#> x S4Vectors::intersect()   masks BiocGenerics::intersect(), massdataset::intersect(), base::intersect()
#> x dplyr::lag()             masks stats::lag()
#> x masstools::mz_rt_match() masks massdataset::mz_rt_match()
#> x MSnbase::reduce()        masks purrr::reduce()
#> x S4Vectors::rename()      masks massdataset::rename(), tidygraph::rename(), dplyr::rename()
#> x BiocGenerics::rowMeans() masks massdataset::rowMeans(), base::rowMeans()
#> x BiocGenerics::rowSums()  masks massdataset::rowSums(), base::rowSums()
#> x purrr::set_names()       masks magrittr::set_names()
library(tidyverse)
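The startup message lists functions that are masked once tidymass is attached; for example, metpath::filter() masks dplyr::filter() and stats::filter(), so a bare filter() call resolves to whichever attached package comes first on the search path. A minimal base R sketch for checking this and for calling the intended function explicitly with `::` follows. stats::filter() stands in here only because it is always available; in the analysis you would write dplyr::filter() when you mean row subsetting.

```r
# List attached environments that define `filter`, in search order.
# The first entry is the version a bare filter() call will use.
utils::find("filter")

# Sidestep masking by naming the package explicitly.
# stats::filter() computes a linear filter; here a two-point moving average.
x <- c(3, 6, 9, 12)
ma <- stats::filter(x, rep(1 / 2, 2), sides = 1)
as.numeric(ma)  # NA 4.5 7.5 10.5
```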

RPLC positive mode

The code used for raw data processing:

process_data(
  path = "mzxml_ms1_data/RPLC/POS/",
  polarity = "positive",
  ppm = 20,
  peakwidth = c(5, 30),
  threads = 6,
  output_tic = FALSE,
  output_bpc = FALSE,
  output_rt_correction_plot = FALSE,
  min_fraction = 0.5,
  group_for_figure = "QC"
)
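Two of the process_data() settings are easy to reason about numerically. ppm = 20 is a relative m/z tolerance, so the absolute window in Da grows with m/z. min_fraction = 0.5 is, we assume, passed on to xcms peak grouping (minFraction in PeakDensityParam), where a feature is kept only if it is detected in at least that fraction of samples in at least one sample group. The toy functions ppm_to_da() and keep_by_min_fraction() below are hypothetical illustrations of these two rules, not massprocesser code.

```r
# Absolute m/z tolerance (Da) corresponding to a relative tolerance in ppm.
ppm_to_da <- function(mz, ppm) {
  mz * ppm / 1e6
}

ppm_to_da(c(100, 500, 1000), ppm = 20)  # 0.002 0.010 0.020 Da

# Toy version of the min_fraction rule.
# detected: logical matrix (features x samples), TRUE where a peak was found.
# group: sample class of each column, e.g. "QC" or "Subject".
keep_by_min_fraction <- function(detected, group, min_fraction = 0.5) {
  cols_by_group <- split(seq_len(ncol(detected)), group)
  # Detection rate of every feature within each group (features x groups)
  frac <- do.call(cbind, lapply(cols_by_group, function(idx) {
    rowMeans(detected[, idx, drop = FALSE])
  }))
  # Keep a feature if it reaches the threshold in at least one group
  rowSums(frac >= min_fraction) > 0
}

detected <- rbind(
  c(TRUE,  TRUE,  FALSE, FALSE, FALSE, FALSE),  # found in every QC
  c(FALSE, FALSE, TRUE,  FALSE, FALSE, FALSE),  # 1 of 4 subjects only
  c(FALSE, TRUE,  TRUE,  TRUE,  FALSE, FALSE)   # half the QCs, half the subjects
)
group <- c("QC", "QC", "Subject", "Subject", "Subject", "Subject")
keep_by_min_fraction(detected, group)  # TRUE FALSE TRUE
```

The second feature is dropped because its best group (Subject) only reaches 25% detection.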

All the results will be placed in the folder case_study/data/mzxml_ms1_data/RPLC/POS/Result. More information about the output can be found here.

Alternatively, you can skip the processing step and load the processed object, which is a mass_dataset class object.

load("mzxml_ms1_data/RPLC/POS/Result/object")
object
#> -------------------- 
#> massdataset version: 0.99.9 
#> -------------------- 
#> 1.expression_data:[ 14585 x 298 data.frame]
#> 2.sample_info:[ 298 x 4 data.frame]
#> 3.variable_info:[ 14585 x 3 data.frame]
#> 4.sample_info_note:[ 4 x 2 data.frame]
#> 5.variable_info_note:[ 3 x 2 data.frame]
#> 6.ms2_data:[ 0 variables x 0 MS2 spectra]
#> -------------------- 
#> Processing information (extract_process_info())
#> create_mass_dataset ---------- 
#>       Package         Function.used                Time
#> 1 massdataset create_mass_dataset() 2022-03-02 14:50:02
#> process_data ---------- 
#>         Package Function.used                Time
#> 1 massprocesser  process_data 2022-03-02 12:25:08
dim(object)
#> [1] 14585   298

We can see that there are 14,585 metabolic features across 298 samples in RPLC positive mode.
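The dimensions reported by dim(object) follow the layout in the printout: expression_data has one row per feature (variable) and one column per sample, so its rows line up with variable_info and its columns with sample_info. Below is a small base R sketch with made-up toy tables; the IDs and values are invented, check_layout() is a hypothetical helper rather than a massdataset function, and we assume the sample_id and variable_id columns identify samples and features.

```r
# Toy tables mimicking the three main parts of a mass_dataset object.
expression_data <- data.frame(
  QC_01 = c(120000, 34000),
  S_01  = c(110000, NA),
  row.names = c("M100T60_POS", "M200T120_POS")
)
sample_info <- data.frame(sample_id = c("QC_01", "S_01"),
                          class = c("QC", "Subject"))
variable_info <- data.frame(variable_id = c("M100T60_POS", "M200T120_POS"),
                            mz = c(100.0, 200.0),
                            rt = c(60, 120))

# Hypothetical helper: columns must match samples, rows must match features.
check_layout <- function(expression_data, sample_info, variable_info) {
  identical(colnames(expression_data), as.character(sample_info$sample_id)) &&
    identical(rownames(expression_data), as.character(variable_info$variable_id))
}

dim(expression_data)  # 2 features x 2 samples
check_layout(expression_data, sample_info, variable_info)  # TRUE
```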

RT correction plot

Load data

load("mzxml_ms1_data/RPLC/POS/Result/intermediate_data/xdata2")

Set group_for_figure to show specific groups, or set it to "all" to show all samples.

We can use the plot_adjusted_rt() function to get an interactive plot.

load("mzxml_ms1_data/RPLC/POS/Result/intermediate_data/xdata2")
##set the group_for_figure if you want to show specific groups. 
##And set it as "all" if you want to show all samples.
plot <-
  massprocesser::plot_adjusted_rt(object = xdata2,
                                  group_for_figure = "QC",
                                  interactive = TRUE)
plot
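The RT correction plot is easiest to read as the shift applied to each scan. Assuming plot_adjusted_rt() shows the same quantity as xcms::plotAdjustedRtime() (adjusted minus raw retention time, plotted against adjusted retention time, one line per sample), the base R sketch below computes that deviation for a small synthetic table and applies a group_for_figure-style filter. rt_deviation() and the toy data are hypothetical and only illustrate the idea.

```r
# Synthetic scans: raw and adjusted retention times (seconds) for three samples.
scans <- data.frame(
  sample = rep(c("QC_01", "QC_02", "S_01"), each = 3),
  group  = rep(c("QC", "QC", "Subject"), each = 3),
  rt_raw = rep(c(100, 200, 300), times = 3),
  rt_adj = c(101, 202, 303, 99, 198, 297, 100, 200, 300)
)

# Hypothetical helper: keep the requested groups ("all" keeps every sample)
# and add the RT shift introduced by the correction.
rt_deviation <- function(scans, group_for_figure = "all") {
  if (!identical(group_for_figure, "all")) {
    scans <- scans[scans$group %in% group_for_figure, , drop = FALSE]
  }
  scans$deviation <- scans$rt_adj - scans$rt_raw
  scans
}

qc <- rt_deviation(scans, group_for_figure = "QC")
qc$deviation  # 1 2 3 -1 -2 -3

if (interactive()) {
  # One line per sample: deviation against adjusted RT
  plot(qc$rt_adj, qc$deviation, type = "n",
       xlab = "Adjusted RT (s)", ylab = "Adjusted - raw RT (s)")
  for (s in unique(qc$sample)) {
    d <- qc[qc$sample == s, ]
    lines(d$rt_adj, d$deviation)
  }
}
```

In real data, QC samples whose curves stay close together suggest the alignment behaved consistently across the run.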