

OWL Stat App has been successfully tested on Google Chrome 41.0, Mozilla Firefox 36.0.1, Safari 5.1.7 for Windows, and iOS 8.1.3.

OWL Stat App


An interactive web-application for univariate and multivariate metabolomics data analysis


OWL Stat App is a Shiny-based web application that runs independently of the operating system and requires no local installation. It is implemented entirely in the R language (R v.3.1.1; R Development Core Team, 2011; http://cran.r-project.org). Classification training is performed with the caret package, and classifier performance is visualized with the ROCR package. The pheatmap package is used for drawing heatmaps.

This app combines R-based analytical tools with metabolite identification and pathway mapping tools, overlaying the user's data on the pathway mapping libraries of SMPDB (The Small Molecule Pathway Database) and on pathway outputs originally developed in our laboratory.



Manual



The analysis and plots in this application can be configured from the Settings tab. Here, you can find a description of the different configuration options:

Groups See Abbreviations.

Volcano Plot The volcano plot can be customized through the following settings. The plot will be recalculated after any change in the Groups selection panel.
  • Groups: Check/uncheck the groups you want to show/hide on the Volcano plot.
  • Volcano - Range of X-axis: Using the slider bar, you can enter the minimum and maximum value for the X-axis range.
  • Volcano - Range of Y-axis: Using the slider bar, you can enter the minimum and maximum value for the Y-axis range.
  • Volcano - Dot size: The marker size can be set using this slider bar.
  • Volcano - Axis: Other settings for Volcano plot's plotting area are:
    • Horizontal and vertical lines: Places a certain number of grid lines on the plot.
    • p-value = 0.05: Plots a horizontal dashed red line showing where p-value = 0.05 is.
    • p-value = 0.01: Plots a horizontal dashed red line showing where p-value = 0.01 is.
    • p-value = 0.001: Plots a horizontal dashed red line showing where p-value = 0.001 is.
    • Plain figure: All points are plotted in black dots, showing no distinction between groups.

Transformation The transformations that can be applied to variables in the different analyses are listed in this combo box. The default value is y = x. All plots are recalculated when this value is changed. The available values are:
  • y = x
  • y = x^2
  • y = x^(1/2)
  • y = 1/x
  • y = 1/x^2
  • y = 1/x^(1/2)
  • y = log(x+1)
  • y = sh(x)
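
As an illustration only, the menu entries above can be sketched as a lookup table of functions. The dictionary name, the label strings and the `transform` helper are hypothetical, not the app's code; the sketch also assumes that log is the natural logarithm and that sh(x) denotes the hyperbolic sine.

```python
import math

# Hypothetical mapping from menu labels to functions.
# Assumptions: log is the natural logarithm, sh(x) is the hyperbolic sine.
TRANSFORMATIONS = {
    "y = x": lambda x: x,
    "y = x^2": lambda x: x ** 2,
    "y = x^(1/2)": lambda x: math.sqrt(x),
    "y = 1/x": lambda x: 1.0 / x,
    "y = 1/x^2": lambda x: 1.0 / x ** 2,
    "y = 1/x^(1/2)": lambda x: 1.0 / math.sqrt(x),
    "y = log(x+1)": lambda x: math.log(x + 1.0),
    "y = sh(x)": lambda x: math.sinh(x),
}


def transform(values, label="y = x"):
    """Apply the selected transformation to every value in a list."""
    f = TRANSFORMATIONS[label]
    return [f(v) for v in values]
```

Note that the reciprocal and square-root entries are only defined for positive inputs, which is usually the case for metabolite intensities.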

Boxplot (Distribution plot window) Boxplot customization options are:
  • Boxplot - Color Groups: This panel sets the colors and marker shapes used to represent the comparison groups in the Boxplot analysis. Two drop-down menus are presented for each comparison group.
  • Boxplot - Samples:
    • Show sample distribution: This check box controls whether the sample distribution will be shown on the Boxplot. The width of the area in which the distribution is plotted can be changed through the sliding bar below.

Correlation Plot
  • Correlation between samples: Different values for the correlation coefficients that can be used in the correlation plot can be selected here.
    • pearson
    • kendall
    • spearman
  • Distance: The distance that is used in the correlation plot can be changed in this drop down menu.
    • euclidean
    • maximum
    • manhattan
    • minkowski
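
For readers unfamiliar with the four distance options, the following minimal sketch shows how each one is defined for two numeric vectors. The function names are chosen to mirror the menu entries; whether the app computes them this way internally is an assumption, and the Minkowski exponent `p` is a hypothetical parameter.

```python
def minkowski(a, b, p=2.0):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def euclidean(a, b):
    """Straight-line distance: Minkowski distance with p = 2."""
    return minkowski(a, b, 2.0)


def manhattan(a, b):
    """Sum of absolute coordinate differences: Minkowski with p = 1."""
    return sum(abs(x - y) for x, y in zip(a, b))


def maximum(a, b):
    """Largest absolute coordinate difference (the limit p -> infinity)."""
    return max(abs(x - y) for x, y in zip(a, b))
```

For the vectors (0, 0) and (3, 4), these give 5 (euclidean), 7 (manhattan) and 4 (maximum).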

Fold-change
  • Fold-change - Subsample: The value selected in this sliding bar sets the percentage of samples in the subsample of each group.
  • Fold-change - Repeat: Sets the number of times the process is repeated.
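
The two settings above suggest a resampling scheme. The app's exact procedure is not described here, so the sketch below is only one plausible reading: in each repetition a fraction of the samples is drawn from each group without replacement, and the ratio of the subsample means is recorded. The function name, the use of the mean and the `seed` parameter are all assumptions.

```python
import random
import statistics


def subsampled_fold_changes(group_a, group_b, fraction=0.8, repeats=100, seed=0):
    """Return one fold-change (mean of A / mean of B) per repetition.

    fraction: share of each group drawn in every subsample (0 < fraction <= 1).
    repeats:  number of times the subsampling is repeated.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_a = max(1, round(fraction * len(group_a)))
    n_b = max(1, round(fraction * len(group_b)))
    ratios = []
    for _ in range(repeats):
        sub_a = rng.sample(group_a, n_a)  # sampling without replacement
        sub_b = rng.sample(group_b, n_b)
        ratios.append(statistics.mean(sub_a) / statistics.mean(sub_b))
    return ratios
```

The spread of the returned ratios gives a rough idea of how stable the fold-change is with respect to the choice of samples.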

PCA plot PCA plot settings are the following:
  • X component: sets the component plotted in the X axis.
  • Y component: sets the component plotted in the Y axis.
  • Scale: Scaling is applied to the plot if this option is checked.
  • Horizontal and vertical lines: Sets whether grid lines are to be plotted.
  • Show code: When this field is selected, sample codes are shown.

Heatmap Samples Heatmap plot (multivariate analysis) settings are:
  • Plot width: Plot width can be adjusted with this field.
  • Plot height: Plot height can be adjusted with this field.
  • Rows clustered: This field sets whether rows in the heatmap are clustered or not.
  • Columns clustered: This field sets whether columns in the heatmap are clustered or not.

Heatmap Metabolites
  • Plot width: Plot width can be adjusted with this field.
  • Plot height: Plot height can be adjusted with this field.


Abbreviations



  • AA Amino acids
  • AC Acylcarnitines
  • ArAA Aromatic amino acids
  • BA Bile acids
  • BCAA Branched chain amino acids
  • Cer Ceramides
  • ChoE Cholesteryl esters
  • CMH Monohexosylceramides
  • DAG Diacylglycerols
  • FAA Fatty acid amides (Primary Fatty Amides)
  • FFA Free fatty acids (Non-esterified fatty acids)
  • FFAox Oxidized fatty acids
  • FSB Free sphingoid bases
  • MAG Monoacylglycerides
  • MUFA Monounsaturated fatty acids
  • NAE N-acyl ethanolamines
  • OPLS Orthogonal partial least-squares to latent structures
  • PC Phosphatidylcholines
  • PCA Principal Component Analysis
  • PE Phosphatidylethanolamines
  • PG Phosphatidylglycerols
  • PI Phosphatidylinositols
  • PUFA Polyunsaturated fatty acids
  • SFA Saturated fatty acids
  • SM Sphingomyelins
  • TAG Triacylglycerols
  • UFA Unsaturated fatty acids
  • UPLC®-MS Ultra performance liquid chromatography-mass spectrometry


News


1st July, 2015 | 11th International Conference of the Metabolomics Society


Poster presented at the 11th International Conference of the Metabolomics Society. San Francisco, California. June 29th 2015 to July 2nd 2015.

OWL Stat App - An interactive web-application for univariate and multivariate metabolomics data analysis
Ibon Martínez-Arranz | Maite Gutiérrez-Calzada | David Balgoma | Cristina Alonso
Metabolomics research has evolved considerably, particularly during the last decade. Over the course of this evolution, the interest in this omic discipline is now more evident than ever. However, the future of metabolomics will depend on its capability to find biomarkers. For that reason, data mining constitutes a challenging task in metabolomics workflow, being a time-consuming issue which usually requires detailed knowledge of bioinformatics, statistics and specialized software. We have developed OWL Stat App, an easy-to-use web application for metabolomics data analysis. It combines powerful univariate and multivariate data analysis with pathway mapping tools and visualization capacities to facilitate interpretation of the results.

8th May, 2015 | Science+


OWL Stat App, the web application for metabolomics data analysis, is presented at the Science+ meeting.





OWL metabolomics


OWL, formerly known as OWL GENOMICS, is a biotechnology company founded in 2002 with the mission to contribute to the monitoring and diagnosis of human or animal health. OWL was born upon the knowledge and scientific developments of Dr. Jose Maria Mato, founding partner and Scientific Director of the company.

The company's activity is centered on health, with pioneering applications on the international scientific scene. Its objective is to identify, validate, patent and commercialize diagnostic and/or prognostic systems, as well as therapeutic targets involved in the development of complex diseases.

OWL has developed a comprehensive set of Metabolomics tools, which makes possible the discovery and identification of biomarkers for either research or diagnostics purposes. It is a state of the art technology that combines ultra performance liquid chromatography with mass spectrometry (UPLC-MS), and that allows OWL to offer a novel metabolomics service, with potential clients in hospitals, research centers, and biotechnology and pharmaceutical industries.

OWL has thus established a leading position in personalized medicine for liver disease, specifically non-alcoholic steatohepatitis (NASH), and is currently introducing the first in vitro serum-based diagnostic for NAFLD and NASH, based on the studies of hepatic diseases carried out by Dr. Mato. The current clinical diagnosis of NASH relies on liver biopsy, an invasive and costly procedure. The company has developed a simple blood test for NASH diagnosis. Early diagnosis seems to be the best way to stop the progression of the disease, through changes in the patient's lifestyle and monitoring of the patient's evolution.

In addition, OWL has qualified professionals to carry out and develop present and future research lines in a wide range of pathologies, offering a variety of innovative products of excellence.

Since its foundation, OWL has had the full support of its main shareholder, Cross Road Biotech, which brings its management skills and biotechnology business knowledge to OWL.

Furthermore, OWL's leading position would not have been possible without an important network of collaborators, built through a number of strategic alliances with biotechnology and bioinformatics companies, hospitals, research centers and universities.


References



Barr, Jonathan, J. Caballería, I. Martínez-Arranz, A. Domínguez-Díez, C. Alonso, J. Muntané, M. Pérez-Cormenzana, et al. 2012. Obesity-Dependent Metabolic Signatures Associated with Nonalcoholic Fatty Liver Disease Progression. J Proteome Res 11 (4): 2521-32. doi:10.1021/pr201223p. http://dx.doi.org/10.1021/pr201223p.

Barr, Jonathan, Mercedes Vázquez-Chantada, Cristina Alonso, Miriam Pérez-Cormenzana, Rebeca Mayo, Asier Galán, Juan Caballería, et al. 2010. Liquid Chromatography-Mass Spectrometry-Based Parallel Metabolic Profiling of Human and Mouse Model Serum Reveals Putative Biomarkers Associated with the Progression of Nonalcoholic Fatty Liver Disease. J Proteome Res 9 (9): 4501-12. doi:10.1021/pr1002593. http://dx.doi.org/10.1021/pr1002593.

Martínez-Arranz, Ibon, Rebeca Mayo, Miriam Pérez-Cormenzana, Itziar Mincholé, Lorena Salazar, Cristina Alonso, and José M. Mato. 2015. Enhancing Metabolomics Research Through Data Mining. J Proteomics, doi:10.1016/j.jprot.2015.01.019. http://dx.doi.org/10.1016/j.jprot.2015.01.019.

Genz, Alan, and Frank Bretz. 2009. Computation of Multivariate Normal and T Probabilities. Lecture Notes in Statistics. Heidelberg: Springer-Verlag.

Genz, Alan, Frank Bretz, Tetsuhisa Miwa, Xuefei Mi, Friedrich Leisch, Fabian Scheipl, and Torsten Hothorn. 2014. mvtnorm: Multivariate Normal and T Distributions. http://CRAN.R-project.org/package=mvtnorm.

Gesmann, Markus, and Diego de Castillo. 2011. googleVis: Interface Between R and the Google Visualisation API. The R Journal 3 (2): 40-44. http://journal.r-project.org/archive/2011-2/RJournal_2011-2_Gesmann+de~Castillo.pdf.

Jarek, Slawomir. 2012. mvnormtest: Normality Test for Multivariate Variables. http://CRAN.R-project.org/package=mvnormtest.

Mevik, Bjørn-Helge, and Ron Wehrens. 2007. The Pls Package: Principal Component and Partial Least Squares Regression in R. Journal of Statistical Software 18 (2): 1-24. http://www.jstatsoft.org/v18/i02.

Mevik, Bjørn-Helge. 2006. The Pls Package. R News 6 (3): 12-17. http://CRAN.R-project.org/doc/Rnews/.

Mevik, Bjørn-Helge, Ron Wehrens, and Kristian Hovde Liland. 2013. pls: Partial Least Squares and Principal Component Regression. http://CRAN.R-project.org/package=pls.

R Core Team. 2014. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org/.

RStudio Team. 2012. RStudio: Integrated Development Environment for R. Boston, MA: RStudio, Inc. http://www.rstudio.com/.

RStudio, Inc. 2014. shiny: Web Application Framework for R. http://CRAN.R-project.org/package=shiny.

Xie, Yihui. 2013. Dynamic Documents with R and Knitr. Boca Raton, Florida: Chapman and Hall/CRC. http://yihui.name/knitr/.

Xie, Yihui. knitr: A General-Purpose Package for Dynamic Report Generation in R. http://yihui.name/knitr/.

Xie, Yihui. knitr: A Comprehensive Tool for Reproducible Research in R. In Implementing Reproducible Computational Research, edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng. Chapman and Hall/CRC. http://www.crcpress.com/product/isbn/9781466561595/.



Contact



Thank you for your interest in our application for metabolomics data analysis!

If you are looking for information about OWL Stat App you can contact us by email at owlstatapp@owlmetabolomics.com.

OWL is a trading name of
ONE WAY LIVER, S.L
Parque Tecnológico de Bizkaia
Edificio 502 - Planta 0
48160 Derio - Bizkaia - Spain
Phone: +34 94 431 85 40
Fax: +34 94 431 71 40


How to cite


Please cite OWL in your Materials and Methods as "Barr et al. J Proteome Res. 2012;11:2521-32" for sample preparation and UHPLC-MS analysis, and "Martínez-Arranz et al. J Proteomics. 2015;127(B):275-88" for data processing.

Volcano plot

In statistics, a volcano plot is a type of scatter-plot that is used to quickly identify changes in large datasets composed of replicate data. It plots significance versus fold-change on the y- and x-axes, respectively. These plots are increasingly common in omic experiments such as genomics, proteomics, and metabolomics, where one often has a list of many thousands of replicate datapoints between two conditions and wishes to quickly identify the most meaningful changes. A volcano plot combines a measure of statistical significance from a statistical test (e.g., a p-value from a t-test or ANOVA) with the magnitude of the change, enabling quick visual identification of those data-points (genes, etc.) that display large-magnitude changes that are also statistically significant.

The volcano plot is an effective and easy-to-interpret graph that summarizes both fold-change and t-test criteria. It is a scatter-plot of the negative log10-transformed p-values from the t test against the log2 fold change.

Metabolites with statistically significant differential levels according to the t-test will lie above a horizontal threshold line. Metabolites with large fold-change values will lie far from the vertical threshold line at log2 fold change=0, indicating also if the metabolite is up or down-regulated.
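
The coordinates described above can be sketched in a few lines. The function names and the default cut-offs (p = 0.05 and an absolute log2 fold change of 1) are illustrative assumptions, not values taken from the app.

```python
import math


def volcano_point(fold_change, p_value):
    """Return (x, y) = (log2 fold change, -log10 p-value) for one metabolite."""
    return math.log2(fold_change), -math.log10(p_value)


def is_highlighted(fold_change, p_value, p_cutoff=0.05, min_abs_log2_fc=1.0):
    """True if the point lies above the horizontal p-value line and far
    enough from the vertical line at log2 fold change = 0."""
    x, y = volcano_point(fold_change, p_value)
    return y > -math.log10(p_cutoff) and abs(x) >= min_abs_log2_fc
```

A metabolite with a fold change of 4 and p = 0.01 is plotted at roughly (2, 2). The sign of x indicates whether the metabolite is up-regulated (positive) or down-regulated (negative).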

When a metabolite is selected in this plot, its description, its distribution between the two groups (boxplot), its ROC analysis and the pathways in which it is involved are shown in the following windows.

Description of the selected metabolite in volcano plot



HMDB: The Human Metabolome Database

The Human Metabolome Database (HMDB) is a freely available electronic database containing detailed information about small molecule metabolites found in the human body. It is intended to be used for applications in metabolomics, clinical chemistry, biomarker discovery and general education.

KEGG: Kyoto Encyclopedia of Genes and Genomes

KEGG is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies.

Structure

SMPDB: The Small Molecule Pathway Database

SMPDB (The Small Molecule Pathway Database) is an interactive, visual database containing more than 618 small molecule pathways found in humans. More than 70% of these pathways (>433) are not found in any other pathway database. SMPDB is designed specifically to support pathway elucidation and pathway discovery in metabolomics, transcriptomics, proteomics and systems biology.


SMPDB is offered to the public as a freely available resource. Use and re-distribution of the data, in whole or in part, for commercial purposes requires explicit permission of the authors and explicit acknowledgment of the source material (SMPDB) and the original publication (see below). We ask that users who download significant portions of the database cite the SMPDB paper in any resulting publications.

1. Wishart DS, Frolkis A, Knox C, et al., SMPDB: The Small Molecule Pathway Database. Nucleic Acids Res. 2010 Jan;38(Database issue):D480-7.
2. Jewison T, Su Y, Disfany FM, et al., SMPDB 2.0: Big Improvements to the Small Molecule Pathway Database Nucleic Acids Res. 2013 Submitted.

Boxplot

In descriptive statistics, a box plot is a convenient way of graphically depicting groups of numerical data through their quartiles. Box plots may also have lines extending vertically from the boxes (whiskers) indicating variability outside the upper and lower quartiles, hence the terms box-and-whisker plot and box-and-whisker diagram. Outliers may be plotted as individual points.



Box plots display differences between populations without making any assumptions of the underlying statistical distribution: they are non-parametric. The spacings between the different parts of the box help indicate the degree of dispersion (spread) and skewness in the data, and identify outliers. In addition to the points themselves, they allow one to visually estimate various L-estimators, notably the interquartile range, midhinge, range, mid-range, and trimean.
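
To make the elements of a box plot concrete, the sketch below computes the quartiles, the whisker ends and the outliers for one group of values. It assumes the common convention that whiskers extend to the most extreme data points within 1.5 times the interquartile range of the box; the function name and the quantile method are illustrative choices, and the app may use different ones.

```python
import statistics


def box_stats(values, whisker=1.5):
    """Quartiles, whisker ends and outliers for one group (needs >= 2 values)."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),    # most extreme points still inside the fences
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }
```

For the values 1 to 9 plus a single value of 100, the box spans 3.25 to 7.75, the whiskers reach 1 and 9, and 100 is reported as an outlier.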

Histogram

In statistics, a histogram is a graphical representation of the distribution of data. It is an estimate of the probability distribution of a continuous variable and was first introduced by Karl Pearson. A histogram is a representation of tabulated frequencies, shown as adjacent rectangles, erected over discrete intervals (bins), with an area proportional to the frequency of the observations in the interval. The height of a rectangle is also equal to the frequency density of the interval, i.e., the frequency divided by the width of the interval. The total area of the histogram is equal to the number of data. A histogram may also be normalized displaying relative frequencies. It then shows the proportion of cases that fall into each of several categories, with the total area equaling 1. The categories are usually specified as consecutive, non-overlapping intervals of a variable. The categories (intervals) must be adjacent, and often are chosen to be of the same size. The rectangles of a histogram are drawn so that they touch each other to indicate that the original variable is continuous.

Density



Normal Q-Q Plot

In statistics, a Q-Q plot (Q stands for quantile) is a probability plot, which is a graphical method for comparing two probability distributions by plotting their quantiles against each other. First, the set of intervals for the quantiles is chosen. A point (x,y) on the plot corresponds to one of the quantiles of the second distribution (y-coordinate) plotted against the same quantile of the first distribution (x-coordinate). Thus the line is a parametric curve with the parameter which is the (number of the) interval for the quantile.

If the two distributions being compared are similar, the points in the Q-Q plot will approximately lie on the line y = x. If the distributions are linearly related, the points in the Q-Q plot will approximately lie on a line, but not necessarily on the line y = x. Q-Q plots can also be used as a graphical means of estimating parameters in a location-scale family of distributions.

A Q-Q plot is used to compare the shapes of distributions, providing a graphical view of how properties such as location, scale, and skewness are similar or different in the two distributions. Q-Q plots can be used to compare collections of data, or theoretical distributions. The use of Q-Q plots to compare two samples of data can be viewed as a non-parametric approach to comparing their underlying distributions.
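
For a normal Q-Q plot, the first distribution is the standard normal and the second is the sample. A minimal sketch of how the plotted points can be obtained follows. The plotting positions (i + 0.5)/n are one common choice and an assumption here; statistical packages differ slightly in this detail.

```python
from statistics import NormalDist


def normal_qq_points(sample):
    """Return (theoretical quantile, sample quantile) pairs for a normal Q-Q plot."""
    data = sorted(sample)
    n = len(data)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # One plotting position per ordered observation, strictly inside (0, 1).
    probs = [(i + 0.5) / n for i in range(n)]
    theoretical = [std_normal.inv_cdf(p) for p in probs]
    return list(zip(theoretical, data))
```

If the sample is approximately normal, the returned points fall close to a straight line whose intercept and slope estimate the mean and standard deviation of the sample.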



Shapiro-Wilk Normality Test

Performs the Shapiro-Wilk test of normality.


                  

F Test to Compare Two Variances

Performs an F test to compare the variances of two samples from normal populations.


                  

Student's t-Test

Performs one and two sample t-tests on vectors of data.
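
The statistics behind the F test and the two-sample t-test can be sketched as follows. This is only an illustration: the function names are hypothetical, the t statistic shown is the Welch (unequal variances) form, and the p-values are omitted because the F and t distributions are not available in the Python standard library.

```python
import math
import statistics


def f_statistic(x, y):
    """Ratio of the two sample variances, as used by an F test."""
    return statistics.variance(x) / statistics.variance(y)


def welch_t(x, y):
    """Welch two-sample t statistic and its approximate degrees of freedom."""
    vx = statistics.variance(x) / len(x)  # squared standard error of mean(x)
    vy = statistics.variance(y) / len(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df
```

An F statistic far from 1 suggests unequal variances, which is the situation the Welch form of the t-test is designed for.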


                

Outlier Analysis



Boxplot without outliers


Histogram


Density



Normal Q-Q Plot


Fold-change distribution

Fold-change

ROC Analysis

In signal detection theory, a receiver operating characteristic (ROC), or simply ROC curve, is a graphical plot which illustrates the performance of a binary classifier system as its discrimination threshold is varied. It is created by plotting the fraction of true positives out of the total actual positives (TPR = true positive rate) vs. the fraction of false positives out of the total actual negatives (FPR = false positive rate), at various threshold settings. TPR is also known as sensitivity or recall in machine learning. The FPR is also known as the fall-out and can be calculated as one minus the better-known specificity. The ROC curve is then the sensitivity as a function of fall-out. In general, if both of the probability distributions for detection and false alarm are known, the ROC curve can be generated by plotting the Cumulative Distribution Function (area under the probability distribution from -inf to the discrimination threshold) of the detection probability on the y-axis versus the Cumulative Distribution Function of the false alarm probability on the x-axis.
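
The construction of the curve can be sketched directly from this definition. The code below is a minimal illustration and not the ROCR implementation used by the app: it assumes labels coded as 1 (positive) and 0 (negative), at least one sample of each class, and that a higher score means "more likely positive".

```python
def roc_curve(scores, labels):
    """Return the ROC curve as a list of (FPR, TPR) points."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda sl: -sl[0])
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Lower the threshold past all samples that share this score.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

An area of 1.0 corresponds to a perfect separation of the two groups, while 0.5 corresponds to a classifier that performs no better than chance.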

Confusion matrix

Calculates a cross-tabulation of observed and predicted classes with associated statistics
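
As a sketch of what such a cross-tabulation contains for a two-class problem, the following code counts the four cells and derives a few of the usual statistics. The function names and the choice of statistics are illustrative, and the example assumes both classes are present among the observations.

```python
def confusion_matrix(observed, predicted, positive=1):
    """Count true/false positives and negatives for a two-class problem."""
    cells = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for obs, pred in zip(observed, predicted):
        if pred == positive:
            cells["TP" if obs == positive else "FP"] += 1
        else:
            cells["FN" if obs == positive else "TN"] += 1
    return cells


def summary(cells):
    """Sensitivity, specificity and accuracy derived from the four cells."""
    total = sum(cells.values())
    return {
        "sensitivity": cells["TP"] / (cells["TP"] + cells["FN"]),
        "specificity": cells["TN"] / (cells["TN"] + cells["FP"]),
        "accuracy": (cells["TP"] + cells["TN"]) / total,
    }
```

Each choice of threshold on a classifier score yields one such table, and therefore one (1 - specificity, sensitivity) point on the ROC curve.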


                
Lipid biosynthesis. Non-esterified fatty acids (NEFA), primary fatty amides (FAA), N-acyl ethanolamines (NAE), glycerol 3-phosphate (G3P), phosphatidic acids (PA), phosphatidylinositols (PI), lysophosphatidylinositols (LPI), monoacylglycerides (MAG), diacylglycerides (DAG), triacylglycerides (TAG), acyl carnitines (AC), unesterified cholesterol (UC), cholesterol sulfate (CS), cholesteryl esters (CE), steroids (ST), phosphatidylserines (PS), phosphatidylethanolamines (PE), lysophosphatidylethanolamines (LPE), phosphatidylcholines (PC), lysophosphatidylcholines (LPC), phosphatidylglycerols (PG), lysophosphatidylglycerols (LPG), cardiolipins (CL), ceramides (Cer), sphingomyelins (SM), S-adenosylmethionine (SAMe). Areas in orange represent processes carried out in the mitochondria.

De novo lipogenesis. Fatty acid synthase (FAS), long-chain elongase (LCE), stearoyl-CoA desaturase (SCD), delta-6 desaturase (Δ6D), delta-5 desaturase (Δ5D), elongase (ELOVL).

Biosynthetic pathway of n-3 and n-6 fatty acids. Preformed DHA (22:6n-3) and EPA (20:5n-3) can be obtained directly from the diet. COX-2 mediated metabolism of DHA and arachidonic acid yields anti-inflammatory docosanoids and proinflammatory eicosanoids, respectively.
Diacylglycerides (DAG), delta-6 desaturase (Δ6D), delta-5 desaturase (Δ5D), elongase (ELOVL), β-oxidation (β-ox), cyclooxygenase-2 (COX-2), phospholipases (PL).

Citric acid cycle and catabolism of proteinogenic amino acids. Amino acids have been classified according to the ability of their products to enter gluconeogenesis: glucogenic amino acids have this ability (in red), ketogenic amino acids do not (in orange). These products may still be used for ketogenesis or lipid synthesis. Some amino acids are catabolized into both glucogenic and ketogenic products (in purple). Fatty acid (FA), nicotinamide adenine dinucleotide (NAD+, NADH), flavin adenine dinucleotide (FAD, FADH2), guanosine diphosphate (GDP), guanosine triphosphate (GTP), phosphoenolpyruvate (PEP).

Methionine cycle. S-adenosylmethionine (SAMe), S-adenosylhomocysteine (SAH), homocysteine (Hcy), reduced or oxidized glutathione (GSH, GSSG), phosphatidylethanolamines (PE), phosphatidylcholines (PC), 5-methyltetrahydrofolate (5-MTHF), dimethylglycine (DMG). Methionine adenosyltransferase 1-alpha (MAT1A), glycine N-methyltransferase (GNMT), phosphatidylethanolamine methyltransferase (PEMT).

Glycolysis and Pentose Phosphate, Phosphate (P), bisphosphate (BP), glyceraldehyde 3-phosphate (G3P), dihydroxyacetone phosphate (DHAP), phosphoglycerate (PG), phosphoenolpyruvate (PEP), 6-phosphogluconolactone (6PGL), 6-phosphogluconate (6PG), nicotinamide adenine dinucleotide P (NAD+, NADH), D-ribulose 5-phosphate (Ru5P), D-ribose 5-phosphate (R5P), xylulose 5-phosphate (Xu5P), sedoheptulose 7-phosphate (Su7P), D-erythrose 4-phosphate (E4P).

Bile acid synthesis. Unesterified cholesterol (UC), cholic acid (CA), chenodeoxycholic acid (CDCA), lithocholic acid (LCA), glycine-conjugated bile acid (G-), taurine-conjugated bile acid (T-).

Biosynthesis of amino acids. This map presents a modular architecture of the biosynthesis pathways of twenty amino acids, which may be viewed as consisting of the core part and its extensions. The core part is the pathway module for conversion of three-carbon compounds from glyceraldehyde-3P to pyruvate [MD:M00002], together with the pathways around serine and glycine. This pathway module is the most conserved one in the KEGG MODULE database and is found in almost all completely sequenced genomes. The extensions are the pathways containing the reaction modules RM001 and RM002 for biosynthesis of branched-chain amino acids (left) and basic amino acids (bottom), and the pathways for biosynthesis of histidine and aromatic amino acids (top right). It is interesting to note that the so-called essential amino acids, which cannot be synthesized in humans and other organisms, generally appear in these extensions. Furthermore, the bottom extension of basic amino acids appears to be the most divergent, containing multiple pathways for lysine biosynthesis and multiple gene sets for arginine biosynthesis.

Principal component analysis (PCA)

Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. The number of principal components is less than or equal to the number of original variables. This transformation is defined in such a way that the first principal component has the largest possible variance (that is, accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to (i.e., uncorrelated with) the preceding components. Principal components are guaranteed to be independent if the data set is jointly normally distributed. PCA is sensitive to the relative scaling of the original variables.
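
The sensitivity to scaling can be illustrated with a small sketch that extracts the first principal component by power iteration on the covariance matrix. This is a teaching example under stated assumptions, not the app's implementation: the function names are hypothetical, the data must have non-zero variance, and power iteration is used only because it needs nothing beyond the standard library.

```python
import math


def covariance_matrix(rows, scale=False):
    """Sample covariance matrix of the columns; correlation matrix if scale=True."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    if scale:
        sds = [math.sqrt(sum(c[j] ** 2 for c in centered) / (n - 1)) for j in range(p)]
        centered = [[c[j] / sds[j] for j in range(p)] for c in centered]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(cov, iterations=200):
    """Loadings and variance of the first principal component (power iteration)."""
    p = len(cov)
    # Asymmetric start vector; it must not be orthogonal to the first component.
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    return v, variance
```

For two perfectly correlated variables where the second has twice the spread of the first, the unscaled loadings are proportional to (1, 2), whereas with scaling both variables contribute equally.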

Matrix of variable loadings

Standard deviations of the principal components