
Raw data processing
Xiaotao Shen (https://www.shenxt.info/)
Created on 2021-12-04 and updated on 2022-03-21
raw_data_processing.Rmd
Data preparation
Download the demo data and refer this article.
We have positive and negative mode. For each mode, we have control
, case
and QC
groups. Control group have 110 samples, and case group have 110 samples as well.
Positive mode
massprocesser
package is used to do the raw data processing. Please refer this website.
Code
The code used to do raw data processing.
library(tidymass)
#> Registered S3 method overwritten by 'Hmisc':
#> method from
#> vcov.default fit.models
#> ── Attaching packages ─────────────────────────────────────── tidymass 0.99.4 ──
#> ✓ masscleaner 0.99.3 ✓ metid 1.2.2
#> ✓ massqc 0.99.3 ✓ dplyr 1.0.8
#> ✓ massstat 0.99.6 ✓ ggplot2 3.3.5
#> ✓ metpath 0.99.2
#> ── Conflicts ─────────────────────────────────────────── tidymass_conflicts() ──
#> x massdataset::apply() masks base::apply()
#> x dplyr::collect() masks xcms::collect()
#> x massdataset::colMeans() masks BiocGenerics::colMeans(), base::colMeans()
#> x massdataset::colSums() masks BiocGenerics::colSums(), base::colSums()
#> x dplyr::combine() masks MSnbase::combine(), Biobase::combine(), BiocGenerics::combine()
#> x dplyr::filter() masks metpath::filter(), massdataset::filter(), stats::filter()
#> x dplyr::first() masks S4Vectors::first()
#> x tinytools::get_compound_class() masks masstools::get_compound_class()
#> x tinytools::get_os() masks masstools::get_os()
#> x tinytools::getDP() masks masstools::getDP()
#> x dplyr::groups() masks xcms::groups()
#> x massdataset::intersect() masks S4Vectors::intersect(), BiocGenerics::intersect(), base::intersect()
#> x tinytools::keep_one() masks masstools::keep_one()
#> x dplyr::lag() masks stats::lag()
#> x tinytools::ms2_plot() masks masstools::ms2_plot()
#> x tinytools::ms2Match() masks masstools::ms2Match()
#> x tinytools::mz_rt_match() masks masstools::mz_rt_match(), massdataset::mz_rt_match()
#> x tinytools::name_duplicated() masks masstools::name_duplicated()
#> x metid::read_mgf() masks tinytools::read_mgf(), masstools::read_mgf()
#> x tinytools::removeNoise() masks masstools::removeNoise()
#> x dplyr::rename() masks massdataset::rename(), S4Vectors::rename()
#> x massdataset::rowMeans() masks BiocGenerics::rowMeans(), base::rowMeans()
#> x massdataset::rowSums() masks BiocGenerics::rowSums(), base::rowSums()
#> x tinytools::setwd_project() masks masstools::setwd_project()
#> x tinytools::split_formula() masks masstools::split_formula()
#> x tinytools::trans_ID() masks masstools::trans_ID()
#> x tinytools::trans_id_database() masks masstools::trans_id_database()
process_data(
path = "mzxml_ms1_data/POS",
polarity = "positive",
ppm = 10,
peakwidth = c(10, 60),
threads = 4,
output_tic = FALSE,
output_bpc = FALSE,
output_rt_correction_plot = FALSE,
min_fraction = 0.5,
group_for_figure = "QC"
)
#> Reading raw data, it will take a while...
#>
#> Use old saved data in Result.
#>
#> ✔ OK
#>
#> Detecting peaks...
#>
#> Use old saved data in Result.
#>
#> ✔ OK
#>
#> Correcting rentention time...
#>
#> Use old saved data in Result.
#>
#> ✔ OK
#>
#> Grouping peaks across samples...
#>
#> Use old saved data in Result.
#>
#> ✔ OK
#>
#> Outputting peak table...
#>
#> ✔ OK
#>
#> OK
#>
#> ✔ All done!
#>
Results
All the results will be placed in the folder mzxml_ms1_data/POS/Result
. More information can be found here.
You can just load the object
, which is a mass_dataset
class object.
load("mzxml_ms1_data/POS/Result/object")
object
#> --------------------
#> massdataset version: 0.99.8
#> --------------------
#> 1.expression_data:[ 10149 x 259 data.frame]
#> 2.sample_info:[ 259 x 4 data.frame]
#> 3.variable_info:[ 10149 x 3 data.frame]
#> 4.sample_info_note:[ 4 x 2 data.frame]
#> 5.variable_info_note:[ 3 x 2 data.frame]
#> 6.ms2_data:[ 0 variables x 0 MS2 spectra]
#> --------------------
#> Processing information (extract_process_info())
#> create_mass_dataset ----------
#> Package Function.used Time
#> 1 massdataset create_mass_dataset() 2022-02-22 16:37:06
#> process_data ----------
#> Package Function.used Time
#> 1 massprocesser process_data 2022-02-22 16:36:42
We can see that there are 10,149 metabolic features in positive mode.
You can use the plot_adjusted_rt()
function to get the interactive plot.
load("mzxml_ms1_data/POS/Result/intermediate_data/xdata2")
##set the group_for_figure if you want to show specific groups. And set it as "all" if you want to show all samples.
plot =
massprocesser::plot_adjusted_rt(object = xdata2,
group_for_figure = "QC",
interactive = TRUE)
plot