MetaboAnalystR Package
The MetaboAnalystR package is synchronized with the MetaboAnalyst website and is designed for metabolomics researchers who are comfortable coding in R. MetaboAnalystR 4.0 establishes a unified metabolomics analysis workflow, from LC-MS/MS raw spectral processing to more accurate functional interpretation. The following tutorials are meant to complement our web-based functions by providing step-by-step instructions for several of the most common tasks using the R package.
1. Overview
1.1 Introduction
MetaboAnalystR 4.0 contains the R functions and libraries underlying the popular MetaboAnalyst website,
including metabolomic data analysis, visualization, and functional interpretation.
Version 4 aims to improve the current global metabolomics workflow by implementing a unified LC-MS/MS workflow that takes global metabolomics data through to functional insights.
MetaboAnalystR 4.0 is an open-source R package that has been developed to provide a unified workflow to help address three key bioinformatics
bottlenecks facing LC-MS-based global metabolomics, including:
1) auto-optimized LC-MS spectral processing for feature detection and quantification;
2) streamlined MS/MS spectral deconvolution and compound annotation coupled with comprehensive spectral reference databases (~1.5 million MS2 spectra);
3) a sensitive functional interpretation module for functional analysis directly from LC-MS and MS/MS results.
Read here for more details on the basic design and rules of MetaboAnalystR 4.0.
1.2 Installation
Step 1. Install package dependencies
To use MetaboAnalystR 4.0, first install all package dependencies. Ensure that the necessary system environment is configured:
For Linux (e.g. Ubuntu 18.04/20.04): libcairo2-dev, libnetcdf-dev, libxml2, libxt-dev and libssl-dev should be installed first.
For Windows (e.g. 7/8/8.1/10): Rtools should be installed.
For Mac OS: in order to compile R for Mac OS, you need Xcode and the GNU Fortran compiler installed (https://mac.r-project.org/tools/). We suggest you follow these steps: https://thecoatlessprofessor.com/programming/cpp/r-compiler-tools-for-rcpp-on-macos/ to help with your installation.
R base with version > 4.0 is required. Compatibility with the latest version (v4.2) is under evaluation.
There are two options for installing the package dependencies:
Option 1 Enter the R function (metanr_packages) and then use the function. A printed message will appear informing you whether or not any R packages were installed.
Function to download packages:
Usage of function:
Option 2 Use the pacman R package (for those with >R 3.5.1).
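The dependency check described in Option 1 can be sketched in base R as follows. This is only an illustration: the helper name find_missing_packages is hypothetical, and the three package names are an illustrative subset, not the full dependency list. The installation call is left commented out and assumes the packages are distributed through Bioconductor.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
find_missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# Illustrative subset only (assumption); use the full list from the package README.
deps <- c("impute", "pcaMethods", "limma")
missing_pkgs <- find_missing_packages(deps)

if (length(missing_pkgs) > 0) {
  message("Packages still to install: ", paste(missing_pkgs, collapse = ", "))
  # Assumed installation route via Bioconductor; uncomment to install.
  # if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
  # BiocManager::install(missing_pkgs)
} else {
  message("No new packages needed.")
}
```

Running the check before installation makes it easy to see which dependencies would be added, which mirrors the printed message mentioned in Option 1.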
Step 2. Install the package
MetaboAnalystR 4.0 is freely available from GitHub. The package documentation, including the vignettes for each module and the user manual, is available within the downloaded R package file. You can install MetaboAnalystR via any of three options: A) using the R package devtools, B) installing from a pre-built source package (a manually downloaded .tar.gz file), or C) cloning the GitHub repository and installing locally. Note that the MetaboAnalystR GitHub repository will have the most up-to-date version of the package.
Option A) Install the package directly from GitHub using the devtools package. Open R and enter:
Due to issues with LaTeX, some users may find that they are only able to install MetaboAnalystR without any documentation (i.e. vignettes).
Option B) Install from a pre-built source package
Option C) Clone GitHub and install locally
The * must be replaced by what is actually downloaded and built.
2. Analysis Utilities
MetaboAnalystR has been designed to synchronize with the MetaboAnalyst website for comprehensive metabolomics analysis.
LC-MS/MS Raw Spectral Analysis, Functional Analysis of Global Metabolomics, Biomarker Analysis, Enrichment Analysis, Meta-Analysis,
Pathway Analysis, Integrated Pathway Analysis, Power Analysis, Time Series or Two Factor Design,
Network Explorer, MS Peaks to Pathways, Batch Effect Correction and other modules are all available on the website.
2.1 LC-MS/MS Raw Spectra Processing
Liquid chromatography coupled to high-resolution mass spectrometry platforms are increasingly employed to comprehensively
measure metabolome changes in systems biology and complex diseases. Over the past decade, several powerful computational pipelines
have been developed for spectral processing, annotation, and analysis. However, significant obstacles remain with regard to parameter
settings, computational efficiencies, spectral deconvolution and compound identification.
A detailed vignette has been prepared here to showcase how to perform LC-MS/MS Raw Spectra Processing with MetaboAnalystR 4.0.
2.2 Functional Analysis of Global Metabolomics
Tools for functional interpretation of global metabolomics data are in general lacking or poorly developed.
A prerequisite for metabolomics data interpretation is metabolite identification, thereby permitting the contextualization of
annotated peaks in metabolic pathways and their integration with other omics data.
A step-by-step vignette has been prepared here on how to perform functional analysis for global metabolomics data with MetaboAnalystR 4.0.
2.3 Statistical Analysis (one-factor)
In metabolomics studies, it is often assumed that most observed changes in metabolite concentrations or spectral profiles are a result of normal physiological
variations (background noise) and that only a small proportion of these changes are associated with the experimental condition of interest.
Identifying these “key” features is typically the first step toward finding useful biomarkers or understanding the biological processes involved in the
condition under investigation. A variety of approaches have been developed for these tasks, with the majority based on classical univariate statistical methods.
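As a generic illustration of a classical univariate approach (not the MetaboAnalystR API), the base R sketch below runs a t-test on every feature of a simulated two-group data set and adjusts the p-values for multiple testing. The data, group labels and the 0.05 cutoff are all assumptions made for the example.

```r
set.seed(42)

# Simulated data (assumption): 20 samples x 50 features, two groups of 10.
n_per_group <- 10
n_features <- 50
group <- factor(rep(c("control", "case"), each = n_per_group))
x <- matrix(rnorm(2 * n_per_group * n_features), nrow = 2 * n_per_group)

# Shift the first five features in the "case" group to mimic real changes.
x[group == "case", 1:5] <- x[group == "case", 1:5] + 3

# One Welch t-test per feature, then Benjamini-Hochberg FDR adjustment.
p_raw <- apply(x, 2, function(feature) t.test(feature ~ group)$p.value)
p_fdr <- p.adjust(p_raw, method = "BH")

# Features passing the (assumed) FDR cutoff of 0.05.
significant <- which(p_fdr < 0.05)
```

Adjusting for multiple testing matters here because most features are expected to reflect background variation, so unadjusted p-values would flag many of them by chance.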
MetaboAnalyst and MetaboAnalystR support three common feature selection approaches:
Read here for more details on a step-by-step statistical analysis with MetaboAnalystR 4.0.
2.4 Enrichment Analysis of Targeted Metabolomics
The enrichment analysis module performs metabolite set enrichment analysis (MSEA) for human and mammalian species based on several
libraries containing ~6300 groups of metabolite sets. Users can upload either
Read here for more details on how to perform Enrichment Analysis with MetaboAnalystR 4.0.
2.5 Pathway Analysis of Targeted Metabolomics
The pathway analysis module supports pathway analysis (integrating enrichment analysis and pathway topology analysis) and visualization for 21 model organisms, including Human, Mouse, Rat, Cow, Chicken, Zebrafish, Arabidopsis thaliana, Rice, Drosophila, Malaria, S. cerevisiae, E. coli, and others, with a total of ~1600 metabolic pathways. Read here for more details on the basic design and rules of MetaboAnalystR 4.0.
2.6 Biomarker Analysis
The metabolome is well known to be a sensitive measure of health and disease, reflecting alterations to the genome, proteome, and transcriptome, as well as changes in lifestyle and environment. As such, one common goal of metabolomic studies is biomarker discovery, which aims to identify a metabolite or a set of metabolites capable of classifying conditions or disease with high sensitivity (true-positive rate) and specificity (true-negative rate). This is achieved by building predictive models from one or multiple metabolites and evaluating the performance, or robustness, of the model in classifying new patients into diseased or healthy categories. The main steps for biomarker analysis are as follows:
Detailed tutorial of Pathway Analysis can be downloaded here.
2.7 Statistical Analysis (Metadata table)
Metadata describes the data, and contains details on the experimental conditions, sample sources (i.e., species, tissue), sample collection
(i.e., location, time) and other factors. Such metadata are critical for data interpretation, because they allow researchers to account for the
biological and environmental context when they analyze the data, and facilitate data reuse by allowing other researchers to search for, and meaningfully
compare and potentially integrate, results from across diverse studies. Details on the context and sample source are becoming increasingly important
as observational studies that collect omics data from human populations or animals outside laboratory settings are becoming more common.
Detailed tutorial of Statistical Analysis (Metadata table) will be available soon.
2.8 Joint-Pathway Analysis
This module performs integrated metabolic pathway analysis on results obtained from combined metabolomics and gene expression studies conducted under the same experimental conditions. This approach exploits KEGG metabolic pathway models to complete the analysis. The underlying assumption behind this module is that by combining evidence from both changes in gene expression and metabolite concentrations, one is more likely to pinpoint the pathways involved in the underlying biological processes.
To this end, users need to supply a list of genes and metabolites of interest that have been identified from the same samples or obtained under similar conditions. The metabolite list can be selected from the results of a previous analysis downloaded from MetaboAnalyst. Similarly, the gene list can be easily obtained using many excellent web-based tools such as GEPAS or INVEX. After users have uploaded their data, the genes and metabolites are then mapped to KEGG metabolic pathways for over-representation analysis and pathway topology analysis. Topology analysis uses the structure of a given pathway to evaluate the relative importance of the genes/compounds based on their relative location. Clicking on the name of a specific pathway will generate a graphical representation of that pathway highlighted with the matched genes/metabolites.
Users must keep in mind that unlike transcriptomics, where the entire transcriptome is routinely mapped, current metabolomic technologies only capture a small portion of the metabolome. This difference can lead to potentially biased results. To address this issue, the current implementation of this omic integration module allows users to explore the enriched pathways based either on joint evidence or on the evidence obtained from one particular omic platform for comparison.
Detailed tutorial of Integrated Pathway Analysis can be downloaded here.
2.9 Functional Meta-Analysis
It is notoriously challenging to integrate untargeted metabolomics data across different studies, because different extraction methods, chromatographic conditions and mass spectrometry platforms all lead to heterogeneity of HRMS data. This issue has precluded the use of untargeted metabolomics datasets for large-scale meta-analysis using conventional statistical methods. To address this gap, we have developed a new module to enable researchers to perform functional meta-analysis of global metabolomics datasets.
Detailed tutorial of Functional Meta-Analysis will be available soon.
2.10 Network Analysis
Biological processes are driven by a complex web of interactions amongst numerous molecular entities of a biological system. The classical method of pathway analysis is unable to identify important associations or interactions between molecules belonging to different pathways. Network analysis is therefore commonly used to address this limitation. The aim of the Network Explorer module is to provide an easy-to-use tool that allows users to map their metabolites and/or genes onto different networks for novel insights or development of new hypotheses.
Mapping of both metabolites and genes (including KOs) is supported in this module, whereby either entity can be projected onto five existing biological networks: the KEGG global metabolic network, the gene-metabolite interaction network, the metabolite-disease interaction network, the metabolite-metabolite interaction network, and the metabolite-gene-disease interaction network. The last four networks are created based on information gathered from HMDB and STITCH and are applicable to human studies only. Users can upload either a list of metabolites, a list of genes, or a list of both metabolites and genes.
MetaboAnalystR currently accepts compound names, HMDB IDs, and KEGG compound IDs as metabolite identifiers. For genes, only Entrez IDs, ENSEMBL IDs, Official Gene Symbols, or KEGG Orthologs (KO) are accepted as identifiers. The uploaded list of metabolites and/or genes is then mapped using our internal databases of metabolites and gene annotations. Following this step, users can select which of the five networks to begin visually exploring their data.
Detailed tutorial of Network Explorer Module can be downloaded here.
2.11 Power Analysis
The Power analysis module supports sample size estimation and power analysis for designing population-based or clinical metabolomic studies.
As metabolomics is becoming a more accessible and widely used tool, methods to ensure proper experimental design are crucial to allow for accurate and
robust identification of metabolites linked to disease, drugs, environmental or genetic differences. Traditional power analysis methods are unsuitable
for metabolomics data as the high-throughput nature of this data means that it is highly dimensional and often correlated. Further, the number of
metabolites identified greatly outnumbers the sample size. Thus, modified methods of power analysis are needed to address such concerns.
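To make the concern concrete, the sketch below uses the classical power.t.test function from base R for a single metabolite and shows how the required sample size grows once the significance level is adjusted for many metabolites tested at once. It is not the method used by the Power Analysis module; the effect size, standard deviation, number of features and the Bonferroni adjustment are assumptions chosen for illustration.

```r
# Assumed design: two groups, effect size of 1 SD, 80% power.
effect_size <- 1
sd_within <- 1
target_power <- 0.8

# Classical calculation for one metabolite at alpha = 0.05.
single <- power.t.test(delta = effect_size, sd = sd_within,
                       sig.level = 0.05, power = target_power)
n_single <- ceiling(single$n)

# Same calculation with a Bonferroni-adjusted alpha for 200 metabolites (assumption).
n_features <- 200
adjusted <- power.t.test(delta = effect_size, sd = sd_within,
                         sig.level = 0.05 / n_features, power = target_power)
n_adjusted <- ceiling(adjusted$n)

message("Samples per group, single test: ", n_single)
message("Samples per group, adjusted for ", n_features, " tests: ", n_adjusted)
```

Even this simple adjustment ignores the correlation between metabolites, which is one reason dedicated methods are preferred for metabolomics data.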
Detailed tutorial of Power Analysis Module can be downloaded here.
2.12 Meta-Analysis
A major challenge in biomarker discovery for disease detection, classification, and monitoring is the validation of potential metabolic markers. Questions have been raised about biomarker consistency and robustness across individual metabolomic studies of the same disease, and the importance of external validation to improve statistical power to validate biomarkers has been recently reviewed. Therefore, to address the lack of user-friendly tools for the horizontal integration of metabolomics data, we present a second new module called “Meta-Analysis”. The primary goal of the Meta-Analysis module is to provide a user-friendly and comprehensive tool for the integration of individual metabolomic studies to identify biomarkers of disease. The steps for Meta-Analysis are as follows:
Detailed tutorial of Meta Analysis can be downloaded here.
3. Featured Utilities
3.1 Batch Effect Correction
Introduction of Batch Effect Correction
The batch effect correction module performs either an automatic correction or a correction with a specific method on the peak data table generated by the peak picking step. Multiple correction methods are built in: combat, WaveICA, EigenMS, QC_RLSC, ANCOVA, RUV_random, RUV_2, the RUVseq series (RUV_s, RUV_r and RUV_g), NOMIS and CCMN. The limitations and mathematical mechanisms of the different methods are described in our manuscript. Here, 4 cases are provided to show users a step-by-step workflow for both batch effect and signal drift correction.
Data preparation
The peak table generated by the FormatPeakList function needs to be manually supplemented with the corresponding information below.
Here is an example table.
3.1.1 Case 1 - IBD Benchmark Data
This benchmark data set is a large batch data set with over 600 samples. The data file size is over 70 MB. This part is designed for reproducing our published results, and running it may take over 30 min. You are strongly recommended to use your own data instead of this large file to save time.
Data Downloading and Preparation
Library Package
Data Filtering and Normalization
Initializing mSet Object
Data Reading
Automatic Correction
Data visualization - PCA plotting with groups
3.1.2 Case 2 - Signal Drift Correction
In addition to the batch effect, signal drift is also an important issue in metabolomics studies. QC-RLSC can be used to correct both batch effect and signal drift (PMID: 30253838). Here, we showcase an example of correcting signal drift with a simulated data table. One point to note: for batch effect correction in case 1 and case 2, only batch information is needed. For signal drift correction, however, the injection order is mandatory and must be provided as the 4th column or row, while batch information is optional and should be added as the 3rd column or row. If batch information is missing, please leave the column or row empty or mark it as one batch. You can prepare your data table following the example below.
Data Downloading and Preparation
Initializing mSet Object
Data Reading
Signal Drift Correction
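The idea behind QC-based signal drift correction can be sketched in base R as follows. This is a generic illustration, not the MetaboAnalystR implementation: the data are simulated, the helper name correct_drift is hypothetical, and the LOESS settings (span, degree) are assumptions. A LOESS curve is fitted to the QC injections as a function of injection order, and every injection is divided by the fitted trend.

```r
set.seed(1)

# Simulated run (assumption): 60 injections, a pooled QC sample every 5th injection.
injection_order <- 1:60
is_qc <- injection_order %% 5 == 1

# One feature whose signal decays linearly over the run, plus small noise.
drift <- 1 - 0.005 * injection_order
intensity <- 1000 * drift * exp(rnorm(60, sd = 0.02))

# Hypothetical helper: fit LOESS to the QC intensities, then rescale all injections.
correct_drift <- function(intensity, injection_order, is_qc, span = 0.75) {
  qc <- data.frame(order = injection_order[is_qc], y = intensity[is_qc])
  # surface = "direct" lets predict() extrapolate past the last QC injection.
  fit <- loess(y ~ order, data = qc, span = span, degree = 1,
               control = loess.control(surface = "direct"))
  trend <- predict(fit, newdata = data.frame(order = injection_order))
  # Divide by the fitted trend and restore the original scale via the QC median.
  intensity / trend * median(qc$y)
}

corrected <- correct_drift(intensity, injection_order, is_qc)

# Coefficient of variation of the QC injections before and after correction.
cv <- function(x) sd(x) / mean(x)
cv_before <- cv(intensity[is_qc])
cv_after <- cv(corrected[is_qc])
```

The sketch shows why injection order is mandatory for this type of correction: the trend is modelled along the run order, so the correction cannot be computed without it.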
3.2 Compound name mapping
MetaboAnalyst now provides a microservice for users to perform compound name mapping using our comprehensive in-house metabolite database (>200,000 compounds). To use this microservice in R, users must first have the httr R package installed. The request must be a POST request containing the list of compounds and the input compound type, and it must be sent to api.metaboanalyst.ca/mapcompounds. The code below shows, step by step, how to perform compound name mapping using the MetaboAnalyst API. For other programming languages, please refer to the APIs page of MetaboAnalyst.
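A minimal sketch of such a request with httr is given below. Several details are assumptions rather than confirmed parts of the API: the https scheme, the JSON field names queryList and inputType, and the semicolon-separated compound string. The helper names build_mapping_request and map_compounds are hypothetical. Please check the APIs page of MetaboAnalyst for the authoritative request format.

```r
# Hypothetical helper: build the request body from a character vector of compounds.
# Field names and the ";" separator are assumptions about the expected payload.
build_mapping_request <- function(compounds, input_type = "name") {
  list(queryList = paste(compounds, collapse = ";"), inputType = input_type)
}

# Hypothetical helper: send the POST request and return the raw response text.
# The https scheme is an assumption; the host and path come from the text above.
map_compounds <- function(compounds, input_type = "name",
                          url = "https://api.metaboanalyst.ca/mapcompounds") {
  if (!requireNamespace("httr", quietly = TRUE)) {
    stop("Please install the 'httr' package first.")
  }
  response <- httr::POST(url,
                         body = build_mapping_request(compounds, input_type),
                         encode = "json")
  httr::stop_for_status(response)
  httr::content(response, as = "text", encoding = "UTF-8")
}

# Build (but do not send) an example payload.
payload <- build_mapping_request(c("1,3-Diaminopropane", "2-Ketobutyric acid"))

# To send the request (requires network access):
# result_text <- map_compounds(c("1,3-Diaminopropane", "2-Ketobutyric acid"))
```

Keeping the payload construction separate from the network call makes it possible to inspect the request body before anything is sent to the server.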
4. Other Analysis