Functional Analysis of Global Metabolomics

Zhiqiang Pang, Jeff Xia

2023-07-23

1. Introduction

MetaboAnalyst encompassed two modules of functional analysis for targeted metabolomics datasets. They are metabolic pathway analysis (MetPA) and metabolite set enrichment analysis (MSEA). These two modules are introduced in another two vignette (Pathway Analysis and Enrichment Analysis).

However, these modules require metabolite identifications prior to use, which remains an important challenge in untargeted metabolomics. In comparison, the mummichog algorithm (Li et al. 2013) bypasses the bottleneck of metabolite identification prior to pathway analysis, leveraging a priori pathway and network knowledge to directly infer biological activity based on mass peaks. We have therefore implemented the mummichog algorithm in R in a new module named “Functional Analysis”. The knowledge-base for this module consists of five genome-scale metabolic models from the original Python implementation which have either been manually curated or downloaded from BioCyc, as well as an expanded library of 21 organisms derived from KEGG metabolic pathways.

2. Functions and Data formats

MetaboAnalyst and MetaboAnalystR accept multiple data format for functional analysis on global metabolomics data. See details below.

2.1 Peak list

To use this module, users have multiple options to upload their data. The typical data formats is a table (*.txt) containing one to four columns. Users can format the table based on the following rules,

1). Three columns - m/z features, p-values, and t-scores or fold-change values.

2). Two columns containing m/z features and either p-values or t-scores

3). One column ranked by either p-values or t-scores.

4). Four columns - m/z features, p-values, t-scores or fold-change values, and the mode (positive or negative).

Data uploading can be performed by using the Read.PeakListData function.

If p-values have not yet been calculated, users can use the “Statistical Analysis” module from MetaboAnalyst website or MetaboAnalystR (see more details from Statistical Analysis (one-factor) vignette) to upload their raw peak tables, process the data, perform t-tests, and then upload these results into the module. With the table, users also need to specify the type of MS instrument, the ion mode (positive or negative), and the p-value cutoff to delineate between significantly enriched and non-significantly enriched m/z features using the UpdateMummichogParameters function.

Currently, MetaboAnalystR only supports the handling of peaks obtained from high-resolution MS instruments such as Orbitrap, or Fourier Transform (FT)-MS instruments as recommended by the original mummichog implementation. Following data upload, Users can then perform the mummichog algorithm on their data using PerformPSEA. First, users will set algorithm to be used to mummichog using SetPeakEnrichMethod. Second, users will set the p-value cutoff to delineate between significantly enriched and non-significantly enriched m/z features using the SetMummichogPval function. Finally, use the PerformPSEA to calculate pathway activity.

See an example from section 3 (Case Study 1) in this vignette.

2.2 Peak table

MetaboAnalystR also accepts a typical peak intensity table, including complete features’ information, groups of samples and peak height(or areas). Here is the basic rules for peak table:

1). Feature information must contains both m/z and retention time (RT). They should be pasted a one string with double underscore (__) . e.g. m/z 157.0241 and RT 28.64 sec should be formatted as “157.0241__28.64”. See example below.

2). At least two groups are required for functional analysis. User may provide more groups, but it is highly recommended to include only TWO groups each time to discover the functional perturbation;

3). Intensity values must be a number, all NAs will be replaced with 0;

Here is an example peak intensity table.

Table 1. An example of a peak intensity table.
Sample Naive_007 Naive_027 Naive_071 Semi_025 Semi_045 Semi_061
Label Naive Naive Naive Semi_immue Semi_immue Semi_immue
85.065__24.64 166020.8 120086.1 19509.1 152651.8 65984.4 149758.9
85.0843__167.65 1645008.2 1190623 638456.2 2314313.4 2527459 3709995.8
86.0602__37.16 18142.5 16616 10886.5 1212986 661407.9 1393239.7
86.0602__129.2 757420.7 1401437.4 986969.3 3910272.1 3044111.5 3490387.8
86.0603__31.13 40168.4 28760.7 40007.5 1880331.9 925490.8 2595342.9
86.0966__58.82 95507679.6 88613771.1 71913129.5 124289527.1 193556499.2 184419890.8
86.0966__93.94 24447611.9 33839015.1 18306222.1 264723805.5 119992641.6 138170927.9
86.0966__82.57 16330118.2 29280720 30329806.6 118751749 93636719.9 178579137.2
86.0966__68.05 175048972.7 211835242 195847604.5 122900362.3 209581146.4 177805709.8

Different from peak list option, there are a few more steps for peak intensity feature processing. In details, Read.TextData is used to read the peak table. ReplaceMin and FilterVariable function is used to replace the minimum value and filter out the redundancy. PreparePrenormData and Normalization is also needed to normalize your peak table. Finally, execute PreparePeakTable4PSEA function to automatically convert peak table into compatible peak list for the following processing. Other steps are similar to the ones implemented for peak list.

See an example from section 4 (Case Study 2) in this vignette.

2.3 MS/MS identification results

Since the version 4.0, MetaboAnalystR has also accepted the MS/MS based identification results, together with MS peak list for more accurate pathway prediction. To prepare the MS peak and MS/MS based identification data, user need to follow some rules to format their data. Here are two files needed for MS/MS identification based functional analysis.

1). MS peak list is same as the one described in section 2.1. But users are highly encouraged to include four columns (mz, rt, t.score and p.value);

2). MS/MS compound list must include same number of rows as the MS peak list. The header of each column can be any characters based on users’ preference. Besides, the compound identifications can be one of these types (InChiKeys, PubChem CIDs, PubChem SIDs and SMILES). Other compound ID will be supported soon;

To use MS/MS based identification results, some extra functions need to be executed together with the typical mummichog functions. In details, SetMS2IDType is used to set ID type of compounds from MS/MS identification. Different from section 2.1 and 2.2, Read.PeakMS2ListData is used to import both MS features and MS/MS identification results consurrently. Other functions and steps are similar to the ones used to process MS peaks in section 2.1.

See an example from section 5 (Case Study 3) in this vignette.

2.4 Mummichog version 1 or version 2?

The SetPeakEnrichMethod now has a parameter to let users select whether to use Version 1 or Version 2 of the MS Peaks to Paths algorithms. With Version 2, users can now upload retention time information to perform pathway analysis. Note that Version 2 should only be used if user’s data contains a retention time (“rt” or “r.t”) column. Retention time is used to move pathway analysis from the “Compound” space to “Empirical Compound” space (see details in “How are Empirical Compounds calculated?”). The inclusion of retention time will increase the confidence and robustness of the potential compound matches. Another difference is that currency compounds are removed directly from the user’s selected pathway library, versus removed from potential compound hits during the permutations.

2.5 Results output

The output of this module first consists of a table of results identifying the top-pathways that are enriched in the user-uploaded data, which can be found in your working directory named “mummichog_pathway_enrichment_mummichog.csv”. The table consists of the total number of hits, the raw p-value (Fisher’s or Hypergeometric), the EASE score, and the adjusted p-value (for permutations) per pathway. A second table can also be found in your working directory named “mummichog_matched_compound_all.csv”, that contains all matched metabolites from the user’s uploaded list of m/z features.

For this tutorial we will directly use several examples collected using LC-high resolution MS) to showcase the performance and implementation.

3. Case Study 1 - Mummichog analysis on MS1 peak list

# This example data set is a peak list from a COVID-19 study
# Two groups (COVID-19 vs. Healthy Controls) are included for this example

download.file("https://raw.githubusercontent.com/Zhiqiang-PANG/MetaboRaw/master/examples/peaks_ms1.txt", 
              destfile = "peaks_ms1.txt", 
              mode = "auto")
# Load MetaboAnalystR
library(MetaboAnalystR)

# Clean global environment
rm(list = ls())
# Create objects for storing processed data
mSet <- InitDataObjects("mass_all", "mummichog", FALSE)
## [1] "MetaboAnalyst R objects initialized ..."
# Set Peak format as "mpt" here.
# This option can be "mp", "mt", "mpt", "mprt", "mrt", "mpr", "rmp", "rmt".
# "rmp" and "rmt" refers to peaks ranked by p values or t scores;
# For other options, "m" means "mz"; "p" means "p value"; "t" means "t score"; "r" means "retention time";
mSet <- SetPeakFormat(mSet, "mpt")

# Set parameters for analysis, in this case the mass accuracy is set to 15 ppm, 
# the mode of the MS instrument is "mixed" (contains both positive and negative), 
# and the p-value cut-off is 0.02
mSet <- UpdateInstrumentParameters(mSet, 15.0, "mixed", "yes", 0.02)

# Read in peak-list data
mSet <- Read.PeakListData(mSet, "peaks_ms1.txt");

# set retention time included, unit is "seconds"
mSet <- SetRTincluded(mSet, "seconds")

# Sanity check of the uploaded data
mSet <- SanityCheckMummichogData(mSet)

# set peak enrichment method. Method can be one of the "mum", "gsea" or "integ";
# Method, "integ" means integration of both mummichog and GSEA algorithm;
# version can be "v1" or "v2" ("v1" will use m/z only; "v2" will use both "m/z" and "retention time")
mSet <- SetPeakEnrichMethod(mSet, "mum", "v2")

# Here we use the top 10% peaks as the p value cutoff
pval <- sort(mSet[["dataSet"]][["mummi.proc"]][["p.value"]])[ceiling(length(mSet[["dataSet"]][["mummi.proc"]][["p.value"]])*0.1)]
mSet <- SetMummichogPval(mSet, pval)

# Perform the mummichog algorith, in this case the model is the human MFN model, using "current" version by default
# This function may take sometime for processing, and will output the pathway-results and the compound matching tables in your working directory
mSet <- PerformPSEA(mSet, "hsa_mfn", "current", 3 , 100)
## [1] "Got 15018 mass features."
## [1] "4884 initial ECs created!"
## Loading required package: plyr
## [1] "2722 merged ECs identified!"
## 
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## The following object is masked from 'package:base':
## 
##     union
## [1] "648 empirical compounds identified in 2.46367573738098 seconds."
## [1] "A total of 3129 of duplicates were merged."
## [1] "A total of 2912 of duplicates were merged."
## [1] "A total of 2912 of duplicates were merged."
## [1] "A total of 3129 of duplicates were merged."
## [1] "A total of 2126 of duplicates were merged."
## [1] "A total of 2126 of duplicates were merged."
## [1] "A total of 3129 of duplicates were merged."
## [1] "A total of 2912 of duplicates were merged."
## [1] "Resampling,  100 permutations to estimate background ..."
# To view the results of the pathway analysis
mSet <- PlotPeaks2Paths(mSet, "peaks_to_paths_ms1_", "png", 72, width=8)
Figure 1. Functional Analysis Results (MS peaks only).

Figure 1. Functional Analysis Results (MS peaks only).

4. Case Study 2 - Mummichog analysis on MS1 peak table

## In this case study, we are downloading a peak table as an example dataset
## This peak table is from a Malaria study, which includes two immune status (Naive vs. Semi-immune)
## Six samples are included in each group
download.file("https://raw.githubusercontent.com/Zhiqiang-PANG/MetaboRaw/master/examples/malaria_feature_table.csv", 
              destfile = "malaria_feature_table.csv", mode = "auto")
# Load MetaboAnalystR
library(MetaboAnalystR)

# Clean global environment
rm(list = ls())
# Create objects for storing processed data
mSet <- InitDataObjects("mass_table", "mummichog", FALSE)
## [1] "MetaboAnalyst R objects initialized ..."
# Set parameters, ppm is 4.3 here
# Only positive mode (ESI+) included
mSet <- SetPeakFormat(mSet, "colu")
mSet <- UpdateInstrumentParameters(mSet, 4.3, "positive", "yes", 0.02);
mSet <- SetRTincluded(mSet, "seconds")

# Read Peak table
mSet <- Read.TextData(mSet, "malaria_feature_table.csv", "mpt", "disc");
mSet <- SanityCheckMummichogData(mSet)
##  [1] "Successfully passed sanity check!"                                                                                
##  [2] "Samples are not paired."                                                                                          
##  [3] "2 groups were detected in samples."                                                                               
##  [4] "Only English letters, numbers, underscore, hyphen and forward slash (/) are allowed."                             
##  [5] "<font color=\"orange\">Other special characters or punctuations (if any) will be stripped off.</font>"            
##  [6] "All data values are numeric."                                                                                     
##  [7] "<font color=\"red\"> 235 features with a constant or single value across samples were found and deleted.</font>"  
##  [8] "A total of 0 (0%) missing values were detected."                                                                  
##  [9] "<u>By default, missing values will be replaced by 1/5 of min positive values of their corresponding variables</u>"
## [10] "Click the <b>Proceed</b> button if you accept the default practice;"                                              
## [11] "Or click the <b>Missing Values</b> button to use other methods."
# Replace minimum value and data filtration
mSet <- ReplaceMin(mSet);
mSet <- SanityCheckMummichogData(mSet)
##  [1] "Successfully passed sanity check!"                                                                                
##  [2] "Samples are not paired."                                                                                          
##  [3] "2 groups were detected in samples."                                                                               
##  [4] "Only English letters, numbers, underscore, hyphen and forward slash (/) are allowed."                             
##  [5] "<font color=\"orange\">Other special characters or punctuations (if any) will be stripped off.</font>"            
##  [6] "All data values are numeric."                                                                                     
##  [7] "<font color=\"red\"> 235 features with a constant or single value across samples were found and deleted.</font>"  
##  [8] "A total of 0 (0%) missing values were detected."                                                                  
##  [9] "<u>By default, missing values will be replaced by 1/5 of min positive values of their corresponding variables</u>"
## [10] "Click the <b>Proceed</b> button if you accept the default practice;"                                              
## [11] "Or click the <b>Missing Values</b> button to use other methods."
mSet <- FilterVariable(mSet, "none", -1, "F", 25, F)
## [1] "  Feature filtering based on Interquantile Range"
# Perform data normalization
# The normailization distribution will be displaced in figure "norm_0_dpi72.png" and "snorm_0_dpi72.png"
# in your working directory
mSet <- PreparePrenormData(mSet)
mSet <- Normalization(mSet, "NULL", "LogNorm", "NULL", ratio=FALSE, ratioNum=20)
## [1]   12 4878
mSet <- PlotNormSummary(mSet, "norm_0_", "png", 72, width=NA)
mSet <- PlotSampleNormSummary(mSet, "snorm_0_", "png", 72, width=NA)

# Perform functional analysis with mummichog algorithm
mSet <- SetPeakEnrichMethod(mSet, "mum", "v2")
mSet <- PreparePeakTable4PSEA(mSet)
## Loading required package: memoise
## [1] "Performing regular t-tests ...."
## [1] "A total of 4878 significant features were found."
mSet <- SetMummichogPval(mSet, 0.003)
mSet <- PerformPSEA(mSet, "hsa_mfn", "current", 3 , 100)
## [1] "Got 4878 mass features."
## [1] "1967 initial ECs created!"
## [1] "1138 merged ECs identified!"
## [1] "336 empirical compounds identified in 0.745071172714233 seconds."
## [1] "A total of 852 of duplicates were merged."
## [1] "A total of 725 of duplicates were merged."
## [1] "A total of 725 of duplicates were merged."
## [1] "A total of 852 of duplicates were merged."
## [1] "A total of 727 of duplicates were merged."
## [1] "A total of 727 of duplicates were merged."
## [1] "A total of 852 of duplicates were merged."
## [1] "A total of 725 of duplicates were merged."
## [1] "Resampling,  100 permutations to estimate background ..."
mSet <- PlotPeaks2Paths(mSet, "peaks_to_paths_0_", "png", 72, width=NA)
Figure 2. Functional Analysis Results (MS table).

Figure 2. Functional Analysis Results (MS table).

5. Case Study 3 - Mummichog analysis on both MS1 and MS/MS data

## In this case study, we are downloading a peak list and the corresponding compound list as an example dataset
## This dataset is from a COVID-19 study, which includes two groups (COVID vs. Healthy Controls)
## Six samples are included in each group
## This example is same as "Case Study 1" above
download.file("https://raw.githubusercontent.com/Zhiqiang-PANG/MetaboRaw/master/examples/peaks_ms1.txt", 
              destfile = "peaks_ms1.txt", mode = "auto")

download.file("https://raw.githubusercontent.com/Zhiqiang-PANG/MetaboRaw/master/examples/compound_ms2.txt", 
              destfile = "compound_ms2.txt", mode = "auto")
# Load MetaboAnalystR
library(MetaboAnalystR)

# Clean global environment
rm(list = ls())
# Create objects for storing processed data
mSet <- InitDataObjects("mass_all", "mummichog", FALSE)
## [1] "MetaboAnalyst R objects initialized ..."
# Set parameters, ppm is 4.3 here
# Only positive mode (ESI+) included
mSet <- SetPeakFormat(mSet, "mpt")
mSet <- SetMS2IDType(mSet, "inchikeys")
mSet <- UpdateInstrumentParameters(mSet, 15.0, "mixed", "yes", 0.02)

# Read in MS peaks and MS/MS based compounds 
mSet <- Read.PeakMS2ListData(mSet, msfile = "peaks_ms1.txt", 
                             msmsfile = "compound_ms2.txt")

# Set parameters
# Set Retention time unit as "seconds"
# set algorithm as "mummichog" and version 2
# Here we use the top 10% peaks as the p value cutoff
mSet <- SetRTincluded(mSet, "seconds")
mSet <- SanityCheckMummichogData(mSet)
mSet<-SetPeakEnrichMethod(mSet, "mum", "v2")
pval <- sort(mSet[["dataSet"]][["mummi.proc"]][["p.value"]])[ceiling(length(mSet[["dataSet"]][["mummi.proc"]][["p.value"]])*0.1)]
mSet<-SetMummichogPval(mSet, pval)

# Perform functional analysis
mSet<-PerformPSEA(mSet, "hsa_mfn", "current", 3 , 100)
## [1] "Got 15018 mass features."
## [1] "Got 8492 Compound features."
## [1] "Empirical Compounds Filtration based on MS/MS results.."
## [1] "Total of 1247/4941 Empirical Compounds has been filtered from ESI- Mode!"
## [1] "Empirical Compounds Filtration based on MS/MS results.."
## [1] "Total of 1049/5732 Empirical Compounds has been filtered from ESI+ Mode!"
## [1] "3828 initial ECs created!"
## [1] "2396 merged ECs identified!"
## [1] "564 empirical compounds identified in 1.86224818229675 seconds."
## [1] "A total of 2392 of duplicates were merged."
## [1] "A total of 2266 of duplicates were merged."
## [1] "A total of 2266 of duplicates were merged."
## [1] "A total of 2392 of duplicates were merged."
## [1] "A total of 1478 of duplicates were merged."
## [1] "A total of 1478 of duplicates were merged."
## [1] "A total of 2392 of duplicates were merged."
## [1] "A total of 2266 of duplicates were merged."
## [1] "Resampling,  100 permutations to estimate background ..."
mSet<-PlotPeaks2Paths(mSet, "peaks_to_paths_msms_", "png", 72, width=8)
Figure 3. Functional Analysis Results (MS peak list + MS/MS compound list).

Figure 3. Functional Analysis Results (MS peak list + MS/MS compound list).

6. Adduct & Currency Customization

6.1 Adduct Customization

Raw MS peaks contain a significant amount of adducts specific to their MS instrument and analytical mode. A comprehensive adduct list is shown below in the “Available” panel. Use this to customize the adduct list used by the Mummichog/GSEA algorithms to match m/z peaks to potential compounds hits. The list of available adducts to choose from can be found in positive; negative; mixed.

For the negative ion mode, the adducts used are: M-H[-], M-2H[2-], M(C13)-H[-], M(S34)-H[-], M(Cl37)-H[-], M+Na-2H[-], M+K-2H[-], M-H2O-H[-], M+Cl[-], M+Cl37[-], M+Br[-], M+Br81[-], M+ACN-H[-], M+HCOO[-], M+CH3COO[-], and M-H+O[-].

For the positive ion mode, the adducts used are: M[1+], M+H[1+], M+2H[2+], M+3H[3+], M(C13)+H[1+], M(C13)+2H[2+], M(C13)+3H[3+], M(S34)+H[1+], M(Cl37)+H[1+], M+Na[1+], M+H+Na[2+], M+K[1+], M+H2O+H[1+], M-H2O+H[1+], M-H4O2+H[1+], M-NH3+H[1+], M-CO+H[1+], M-CO2+H[1+], M-HCOOH+H[1+], M+HCOONa[1+], M-HCOONa+H[1+], M+NaCl[1+], M-C3H4O2+H[1+], M+HCOOK[1+], and M-HCOOK+H[1+].

6.2 Currency Customization

Currency metabolites are abundant substances such as water and carbon dioxide known to occur in normal functioning cells and participate in a large number of metabolic reactions (Huss and Holme, 2007). Because of their ubiquitous nature, removing them can greatly improve pathway analysis and visualization. The list of available currency metabolites can be found here.

By default, the MS Peaks to Paths module considers these metabolites as currency: ‘C00001’, ‘C00080’, ‘C00007’, ‘C00006’, ‘C00005’, ‘C00003’, ‘C00004’, ‘C00002’, ‘C00013’, ‘C00008’, ‘C00009’, ‘C00011’, ‘G11113’, ‘H2O’, ‘H+’, ‘Oxygen’, ‘NADP+’, ‘NADPH’, ‘NAD+’, ‘NADH’, ‘ATP’, ‘Pyrophosphate’, ‘ADP’, ‘Orthophosphate’, and ‘CO2’.

Now we will use another example peak list data obtained from untargeted metabolomics of mice alveolar macrophages in lungs, using an Orbitrap LC-MS (C18 negative ion mode and HILIC positive ion mode). This is an example of the four-column table containing the m.z, mode, p.value, and t.score. We will perform both currency and adduct customization.

# Load MetaboAnalystR
library(MetaboAnalystR)

# Clean global environment
rm(list = ls())
# Create objects for storing processed data
mSet <- InitDataObjects("mass_all", "mummichog", FALSE)
## [1] "MetaboAnalyst R objects initialized ..."
# Set peak formart - contains m/z features, p-values and t-scores
mSet <- SetPeakFormat(mSet, "mpt")
mSet <- UpdateInstrumentParameters(mSet, 5.0, "mixed");
mSet <- Read.PeakListData(mSet, "https://www.metaboanalyst.ca/MetaboAnalyst/resources/data/mummichog_mixed.txt");
mSet <- SanityCheckMummichogData(mSet)

# Customize currency
curr.vec <- c("Water (C00001)", "Proton (C00080)", "Oxygen (C00007)", "NADPH (C00005)",
              "NADP (C00006)", "NADH (C00004)", "NAD (C00003)", "Adenosine triphosphate (C00002)",
              "Pyrophosphate (C00013)","Phosphate (C00009)","Carbon dioxide (C00011)",
              "Hydrogen (C00282)","Hydrogen peroxide (C00027)","Sodium (C01330)");

# Map selected currency to user's data
mSet <- Setup.MapData(mSet, curr.vec);
mSet <- PerformCurrencyMapping(mSet);
## [1] "Loaded files from MetaboAnalyst web-server."
# Now customize adducts
add.vec <- c("M [1+]","M+H [1+]","M+2H [2+]","M+3H [3+]","M+Na [1+]",
             "M+H+Na [2+]","M+K [1+]","M+H2O+H [1+]","M-H2O+H [1+]",
             "M-H4O2+H [1+]","M(C13)+H [1+]","M(C13)+2H [2+]","M(C13)+3H [3+]",
             "M(Cl37)+H [1+]","M-NH3+H [1+]","M-CO+H [1+]","M-CO2+H [1+]",
             "M-HCOOH+H [1+]","M+HCOONa [1+]","M-HCOONa+H [1+]","M+NaCl [1+]",
             "M-C3H4O2+H [1+]","M+HCOOK [1+]","M-HCOOK+H [1+]","M-H [1-]",
             "M-2H [2-]","M-H2O-H [1-]","M-H+O [1-]","M+K-2H [1-]","M+Na-2H [1- ]",
             "M+Cl [1-]","M+Cl37 [1-]","M+HCOO [1-]","M+CH3COO [1-]");

# Set up the selected adducts
mSet <- Setup.AdductData(mSet, add.vec);
mSet <- PerformAdductMapping(mSet, "mixed");
## [1] "Loaded files from MetaboAnalyst web-server."
# Perform mummichog algorithm using selected currency and adducts, using Version1 of the mummichog algorithm
mSet <- SetPeakEnrichMethod(mSet, "mum", "v1");
mSet <- SetMummichogPval(mSet, 1.0E-5);
mSet <- PerformPSEA(mSet, "hsa_mfn", "current", 100);
## [1] "Got 3915 mass features."
## [1] "A total of 1188 of duplicates were merged."
## [1] "A total of 1292 of duplicates were merged."
## [1] "A total of 1188 of duplicates were merged."
## [1] "A total of 1188 of duplicates were merged."
## [1] "Resampling,  100 permutations to estimate background ..."
# Image is not shown below to avoid to be overwhelmed.
mSet <- PlotPeaks2Paths(mSet, "peaks_to_paths_2_", "png", 72, width=NA)

7. Customizing Empirical Compound Formation

By default, V2 of the MS Peaks to Pathways algorithms enforces primary ions to be present when creating Empirical Compounds (step 3 above). Users can disable this in the UpdateInstrumentParameters function by setting the “force_primary_ion” parameter to “no”. Additionally, by default the retention-time window used to split Empirical Compounds is calculated as the maximum retention time in the user’s data multiplied by the retention time fraction (default is 0.02). Users can either change the retention time fraction (rt_frac) or set the retention time-window (rt_tol - in seconds). Code details are shown below.

# Disable force primary ion
mSet <- UpdateInstrumentParameters(mSet, instrumentOpt, msModeOpt, force_primary_ion ="no", rt_frac =0.02,rt_tol =NA)
# Change retention time fraction when calculating the retention time window
mSet <- UpdateInstrumentParameters(mSet, instrumentOpt, msModeOpt,force_primary_ion ="yes", rt_frac =0.025, rt_tol =NA)
# Set the retention time window (in seconds)
mSet <- UpdateInstrumentParameters(mSet, instrumentOpt, msModeOpt, force_primary_ion ="yes", rt_frac =0.02, rt_tol =25)

8. Functional Analysis with GSEA

# Load MetaboAnalystR
library(MetaboAnalystR)

# Clean global environment
rm(list = ls())
# Create objects for storing processed data

mSet<-InitDataObjects("mass_all", "mummichog", FALSE)
## [1] "MetaboAnalyst R objects initialized ..."
mSet<-SetPeakFormat(mSet, "mpt")
mSet<-UpdateInstrumentParameters(mSet, 5.0, "negative", "yes", 0.02);
mSet<-Read.PeakListData(mSet, "https://www.metaboanalyst.ca/MetaboAnalyst/resources/data/mummichog_ibd.txt");
mSet<-SetRTincluded(mSet, "no")
mSet<-SanityCheckMummichogData(mSet)

## Set method as "GSEA"
mSet<-SetPeakEnrichMethod(mSet, "gsea", "v2")
mSet<-PerformPSEA(mSet, "hsa_mfn", "current", 3 , 100)
## [1] "Got 4187 mass features."
## [1] "A total of 2865 of duplicates were merged."
## [1] "A total of 2719 of duplicates were merged."
## [1] "A total of 2865 of duplicates were merged."
## [1] "A total of 2865 of duplicates were merged."
mSet<-PlotPeaks2Paths(mSet, "peaks_to_paths_3_", "png", 72, width=NA)