Chapter 11 KEGG Analysis

last updated: 2023-10-27

Package Install

As usual, make sure we have the right packages for this exercise

if (!require("pacman")) install.packages("pacman"); library(pacman)

# temporary install to fix bug in package
pacman::p_install_version("dbplyr", version = "2.3.4")

## 
## Version of dbplyr (v. 2.3.4) is suitable

# let's load all of the files we were using and want to have again today
p_load("tidyverse", "knitr", "readr",
       "pander", "BiocManager", 
       "dplyr", "stringr", "DOSE",
       "purrr", # for working with lists (beautify column names)
       "reactable", # for pretty tables.
       "BiocFileCache" # for saving downloaded data files
       ) 

# a package from github to install (using pacman library to install)
p_install_gh("noriakis/ggkegg")

## Skipping install of 'ggkegg' from a github remote, the SHA1 (0cece6db) has not changed since last install.
##   Use `force = TRUE` to force installation

## 
## The following packages were installed:
## ggkegg

# We also need these Bioconductor packages today.
p_load("edgeR", 
       "AnnotationDbi", "org.Sc.sgd.db", 
       "pathview","clusterProfiler", "ggupset",
       "KEGGgraph", "ggkegg", "patchwork",
       "igraph", "tidygraph", "ggfx")

11.1 Description

In this class exercise, we will explore the use of KEGG pathways in genomic data analysis. KEGG is a valuable resource for understanding biological pathways and functions associated with genomic data.

11.2 Learning Outcomes

At the end of this exercise, you should be able to:

Understand the basics of KEGG pathways.
Learn how to retrieve and analyze pathway information using R.
Apply KEGG analysis to genomic data.
Identify paralog differences within KEGG pathways

library(edgeR)
library(org.Sc.sgd.db)
# for ease of use, set max number of digits after decimal
options(digits=3)

11.3 Loading in the edgeR DE gene file output.

# Choose topTags destination
dir_output_edgeR <-
  path.expand("~/Desktop/Genomic_Data_Analysis/Analysis/edgeR/")

# for this, let's load the res objects as an R data object.
res <- read_rds(file = paste0(dir_output_edgeR, "yeast_res_edgeR.Rds"))

# the below code downloads the file from the internet if you don't have in on your computer
# comment out the above line of code if it gives an error.
if (!exists("res", envir = .GlobalEnv)) {
  # If variable doesn't exist, load from
  url <- "https://github.com/clstacy/GenomicDataAnalysis_Fa23/raw/main/data/yeast_res_all_edgeR.Rds"
  res <- read_rds(url)
  # assign("res", data, envir = .GlobalEnv)
  cat(paste("Loaded res from", url, "\n"))
  rm("url", envir = .GlobalEnv)
} else {
  url <- "https://github.com/clstacy/GenomicDataAnalysis_Fa23/raw/main/data/yeast_res_all_edgeR.Rds"
  cat(paste("res object is already loaded. Skipping loading from", url, "\n"))
  rm("url", envir = .GlobalEnv)
}

## res object is already loaded. Skipping loading from https://github.com/clstacy/GenomicDataAnalysis_Fa23/raw/main/data/yeast_res_all_edgeR.Rds

# let's also get the res object for all of the contrasts at once, (from GitHub)
res_all <- read_rds("https://github.com/clstacy/GenomicDataAnalysis_Fa23/raw/main/data/yeast_res_all_edgeR.Rds")

11.4 KEGG Analysis

Recall that the res object from the edgeR Differential Expression analysis was defined as glmQLFTest(fit, contrast = my.contrasts[,"EtOHvsWT.MSN24ddvsWT"]). In other words, we are looking at the difference in the ethanol stress response of the Msn2/4∆∆ mutant and the WT cells. Recall that positive log2 fold changes correspond to relatively higher expression of that gene in the Msn2/4∆∆ mutant cells in response stress compared to how WT cells change during stress.

11.4.1 Using `kegga()` from limma

One approach is to do a simple KEGG enrichment with the limma package, using the function kegga(). This approach is nice, because we can directly load in the res object we created in the edgeR workflow, set our FDR cutoff, and the process happens automatically. See https://www.kegg.jp/kegg/catalog/org_list.html or https://rest.kegg.jp/list/organism for list of organisms available.

# test for overrepresentation of KEGG pathways in given gene list
k <- kegga(res,
           species.KEGG = "sce",# three-letter KEGG species identifier.
           FDR = 0.01) #FDR cutoff we want in deciding which genes to include.

# we can see the p-value results for both up & down enrichment
k

##                                                         Pathway   N Up Down
## sce00010                           Glycolysis / Gluconeogenesis  51  5   11
## sce00020                              Citrate cycle (TCA cycle)  30  1   10
## sce00030                              Pentose phosphate pathway  27  3    6
## sce00040               Pentose and glucuronate interconversions  10  0    3
## sce00051                        Fructose and mannose metabolism  20  0    7
## sce00052                                   Galactose metabolism  22  0    7
## sce00053                      Ascorbate and aldarate metabolism  11  2    4
## sce00061                                Fatty acid biosynthesis  13  2    1
## sce00062                                  Fatty acid elongation   8  1    0
## sce00071                                 Fatty acid degradation  20  7    5
## sce00100                                   Steroid biosynthesis  18  6    3
## sce00130    Ubiquinone and other terpenoid-quinone biosynthesis  10  0    5
## sce00190                              Oxidative phosphorylation  71  0   27
## sce00220                                  Arginine biosynthesis  16  3    1
## sce00230                                      Purine metabolism  57  9    8
## sce00240                                  Pyrimidine metabolism  31  5    2
## sce00250            Alanine, aspartate and glutamate metabolism  27  5    5
## sce00260               Glycine, serine and threonine metabolism  30  4    5
## sce00261                                Monobactam biosynthesis   3  1    0
## sce00270                     Cysteine and methionine metabolism  43 12    3
## sce00280             Valine, leucine and isoleucine degradation  14  6    4
## sce00290            Valine, leucine and isoleucine biosynthesis  12  8    0
## sce00300                                    Lysine biosynthesis  12  6    0
## sce00310                                     Lysine degradation  16  4    5
## sce00330                        Arginine and proline metabolism  23  3    4
## sce00332                                Carbapenem biosynthesis   3  1    0
## sce00340                                   Histidine metabolism  14  2    3
## sce00350                                    Tyrosine metabolism  13  3    2
## sce00360                               Phenylalanine metabolism   7  2    0
## sce00380                                  Tryptophan metabolism  20  3    6
## sce00400    Phenylalanine, tyrosine and tryptophan biosynthesis  17  3    0
## sce00410                                beta-Alanine metabolism  13  3    6
## sce00430                     Taurine and hypotaurine metabolism   3  0    3
## sce00440                 Phosphonate and phosphinate metabolism   4  1    0
## sce00450                              Selenocompound metabolism  12  2    0
## sce00460                             Cyanoamino acid metabolism   5  0    1
## sce00470                                D-Amino acid metabolism   3  0    1
## sce00480                                 Glutathione metabolism  23  1    5
## sce00500                          Starch and sucrose metabolism  39  1   20
## sce00510                                  N-Glycan biosynthesis  30  2    1
## sce00511                               Other glycan degradation   1  0    1
## sce00513                 Various types of N-glycan biosynthesis  31  4    1
## sce00514                   Other types of O-glycan biosynthesis  14  2    0
## sce00515                     Mannose type O-glycan biosynthesis   7  0    0
## sce00520            Amino sugar and nucleotide sugar metabolism  31  0    7
## sce00561                                Glycerolipid metabolism  31  2   13
## sce00562                          Inositol phosphate metabolism  21  0    1
## sce00563 Glycosylphosphatidylinositol (GPI)-anchor biosynthesis  25  0    2
## sce00564                         Glycerophospholipid metabolism  40  2   10
## sce00565                                 Ether lipid metabolism   8  0    3
## sce00590                            Arachidonic acid metabolism   3  0    2
## sce00592                        alpha-Linolenic acid metabolism   4  0    1
## sce00600                                Sphingolipid metabolism  14  0    3
## sce00620                                    Pyruvate metabolism  48 12   12
## sce00630                Glyoxylate and dicarboxylate metabolism  27  4    6
## sce00640                                  Propanoate metabolism  13  2    1
## sce00650                                   Butanoate metabolism   9  3    4
## sce00660                    C5-Branched dibasic acid metabolism   3  1    0
## sce00670                              One carbon pool by folate  15  2    0
## sce00680                                     Methane metabolism  20  3    3
## sce00730                                    Thiamine metabolism  16  0    1
## sce00740                                  Riboflavin metabolism  16  0    1
## sce00750                                  Vitamin B6 metabolism  11  2    1
## sce00760                 Nicotinate and nicotinamide metabolism  20  2    3
## sce00770                      Pantothenate and CoA biosynthesis  23 10    4
## sce00780                                      Biotin metabolism   6  0    0
## sce00785                                 Lipoic acid metabolism  12  1    2
## sce00790                                    Folate biosynthesis  11  0    1
## sce00791                                   Atrazine degradation   2  0    0
## sce00860                                   Porphyrin metabolism  17  0    2
## sce00900                        Terpenoid backbone biosynthesis  19  5    1
## sce00909          Sesquiterpenoid and triterpenoid biosynthesis   2  0    0
## sce00910                                    Nitrogen metabolism   7  2    1
## sce00920                                      Sulfur metabolism  15  3    1
## sce00970                            Aminoacyl-tRNA biosynthesis  40  4    1
## sce00999    Biosynthesis of various plant secondary metabolites   3  2    0
## sce01040                Biosynthesis of unsaturated fatty acids  10  1    0
## sce01100                                     Metabolic pathways 766 86  132
## sce01110                  Biosynthesis of secondary metabolites 339 56   65
## sce01200                                      Carbon metabolism 104 10   24
## sce01210                        2-Oxocarboxylic acid metabolism  42 13    6
## sce01212                                  Fatty acid metabolism  23  4    1
## sce01230                            Biosynthesis of amino acids 122 32    7
## sce01232                                  Nucleotide metabolism  42  6    3
## sce01240                              Biosynthesis of cofactors 126 18   18
## sce01250                      Biosynthesis of nucleotide sugars  22  0    6
## sce02010                                       ABC transporters  18  1    3
## sce03008                      Ribosome biogenesis in eukaryotes  74 32    1
## sce03010                                               Ribosome 174 61    1
## sce03013                            Nucleocytoplasmic transport  65  8    2
## sce03015                              mRNA surveillance pathway  48  6    1
## sce03018                                        RNA degradation  61  6    3
## sce03020                                         RNA polymerase  31 14    0
## sce03022                            Basal transcription factors  32  2    0
## sce03030                                        DNA replication  31  4    0
## sce03040                                            Spliceosome  75  4    4
## sce03050                                             Proteasome  36  0    0
## sce03060                                         Protein export  22  1    1
## sce03082                     ATP-dependent chromatin remodeling  35  3    1
## sce03083                            Polycomb repressive complex  13  2    2
## sce03250                               Viral life cycle - HIV-1  15  0    1
## sce03410                                   Base excision repair  25  2    0
## sce03420                             Nucleotide excision repair  49  5    0
## sce03430                                        Mismatch repair  20  1    0
## sce03440                               Homologous recombination  21  0    0
## sce03450                             Non-homologous end-joining  10  0    0
## sce04011                         MAPK signaling pathway - yeast 108  7   12
## sce04070                  Phosphatidylinositol signaling system  23  0    1
## sce04111                                     Cell cycle - yeast 129 12    2
## sce04113                                        Meiosis - yeast 117  9   15
## sce04120                         Ubiquitin mediated proteolysis  50  1    1
## sce04122                                    Sulfur relay system   7  0    1
## sce04130              SNARE interactions in vesicular transport  20  0    0
## sce04136                                      Autophagy - other  23  0    1
## sce04138                                      Autophagy - yeast  90  2    8
## sce04139                                      Mitophagy - yeast  40  1    3
## sce04141            Protein processing in endoplasmic reticulum  92  1   17
## sce04144                                            Endocytosis  79  1   11
## sce04145                                              Phagosome  35  0    0
## sce04146                                             Peroxisome  39  2    8
## sce04148                                          Efferocytosis  24  0    4
## sce04213        Longevity regulating pathway - multiple species  38  1   17
## sce04392             Hippo signaling pathway - multiple species   8  2    0
## sce04814                                         Motor proteins  24  2    1
##              P.Up   P.Down
## sce00010 4.59e-01 1.13e-02
## sce00020 9.35e-01 4.71e-04
## sce00030 4.21e-01 4.86e-02
## sce00040 1.00e+00 7.17e-02
## sce00051 1.00e+00 2.47e-03
## sce00052 1.00e+00 4.55e-03
## sce00053 2.47e-01 1.90e-02
## sce00061 3.14e-01 7.50e-01
## sce00062 5.17e-01 1.00e+00
## sce00071 1.02e-03 4.45e-02
## sce00100 3.13e-03 2.71e-01
## sce00130 1.00e+00 1.69e-03
## sce00190 1.00e+00 2.93e-10
## sce00220 1.57e-01 8.18e-01
## sce00230 5.55e-02 2.13e-01
## sce00240 1.27e-01 8.35e-01
## sce00250 7.96e-02 1.30e-01
## sce00260 2.61e-01 1.80e-01
## sce00261 2.39e-01 1.00e+00
## sce00270 2.03e-04 8.24e-01
## sce00280 6.85e-04 4.53e-02
## sce00290 1.11e-06 1.00e+00
## sce00300 2.45e-04 1.00e+00
## sce00310 4.42e-02 1.75e-02
## sce00330 3.23e-01 1.97e-01
## sce00332 2.39e-01 1.00e+00
## sce00340 3.47e-01 1.62e-01
## sce00350 9.69e-02 3.84e-01
## sce00360 1.18e-01 1.00e+00
## sce00380 2.49e-01 1.16e-02
## sce00400 1.79e-01 1.00e+00
## sce00410 9.69e-02 9.52e-04
## sce00430 1.00e+00 1.02e-03
## sce00440 3.05e-01 1.00e+00
## sce00450 2.80e-01 1.00e+00
## sce00460 1.00e+00 4.13e-01
## sce00470 1.00e+00 2.73e-01
## sce00480 8.77e-01 7.52e-02
## sce00500 9.72e-01 9.65e-11
## sce00510 7.49e-01 9.59e-01
## sce00511 1.00e+00 1.01e-01
## sce00513 2.81e-01 9.63e-01
## sce00514 3.47e-01 1.00e+00
## sce00515 1.00e+00 1.00e+00
## sce00520 1.00e+00 3.16e-02
## sce00561 7.65e-01 3.67e-06
## sce00562 1.00e+00 8.94e-01
## sce00563 1.00e+00 7.35e-01
## sce00564 8.74e-01 5.25e-03
## sce00565 1.00e+00 3.90e-02
## sce00590 1.00e+00 2.85e-02
## sce00592 1.00e+00 3.47e-01
## sce00600 1.00e+00 1.62e-01
## sce00620 6.17e-04 2.31e-03
## sce00630 2.03e-01 4.86e-02
## sce00640 3.14e-01 7.50e-01
## sce00650 3.69e-02 8.57e-03
## sce00660 2.39e-01 1.00e+00
## sce00670 3.79e-01 1.00e+00
## sce00680 2.49e-01 3.29e-01
## sce00730 1.00e+00 8.18e-01
## sce00740 1.00e+00 8.18e-01
## sce00750 2.47e-01 6.90e-01
## sce00760 5.29e-01 3.29e-01
## sce00770 9.09e-06 1.97e-01
## sce00780 1.00e+00 1.00e+00
## sce00785 6.65e-01 3.46e-01
## sce00790 1.00e+00 6.90e-01
## sce00791 1.00e+00 1.00e+00
## sce00860 1.00e+00 5.24e-01
## sce00900 2.03e-02 8.68e-01
## sce00909 1.00e+00 1.00e+00
## sce00910 1.18e-01 5.26e-01
## sce00920 1.36e-01 7.98e-01
## sce00970 4.63e-01 9.86e-01
## sce00999 2.13e-02 1.00e+00
## sce01040 5.97e-01 1.00e+00
## sce01100 5.52e-03 3.24e-11
## sce01110 1.16e-06 1.37e-07
## sce01200 4.18e-01 7.56e-05
## sce01210 3.33e-05 2.46e-01
## sce01212 1.34e-01 9.14e-01
## sce01230 6.35e-09 9.70e-01
## sce01232 1.53e-01 8.11e-01
## sce01240 2.36e-02 8.12e-02
## sce01250 1.00e+00 1.88e-02
## sce02010 8.06e-01 2.71e-01
## sce03008 1.21e-15 1.00e+00
## sce03010 3.84e-23 1.00e+00
## sce03013 2.00e-01 9.92e-01
## sce03015 2.35e-01 9.94e-01
## sce03018 4.39e-01 9.54e-01
## sce03020 7.75e-08 1.00e+00
## sce03022 7.80e-01 1.00e+00
## sce03030 2.81e-01 1.00e+00
## sce03040 9.02e-01 9.53e-01
## sce03050 1.00e+00 1.00e+00
## sce03060 8.65e-01 9.04e-01
## sce03082 5.97e-01 9.76e-01
## sce03083 3.14e-01 3.84e-01
## sce03250 1.00e+00 7.98e-01
## sce03410 6.53e-01 1.00e+00
## sce03420 4.24e-01 1.00e+00
## sce03430 8.38e-01 1.00e+00
## sce03440 1.00e+00 1.00e+00
## sce03450 1.00e+00 1.00e+00
## sce04011 8.41e-01 4.09e-01
## sce04070 1.00e+00 9.14e-01
## sce04111 4.47e-01 1.00e+00
## sce04113 6.99e-01 1.99e-01
## sce04120 9.90e-01 9.95e-01
## sce04122 1.00e+00 5.26e-01
## sce04130 1.00e+00 1.00e+00
## sce04136 1.00e+00 9.14e-01
## sce04138 9.97e-01 7.01e-01
## sce04139 9.74e-01 7.84e-01
## sce04141 1.00e+00 9.67e-03
## sce04144 9.99e-01 1.69e-01
## sce04145 1.00e+00 1.00e+00
## sce04146 8.65e-01 3.79e-02
## sce04148 1.00e+00 2.19e-01
## sce04213 9.69e-01 3.55e-08
## sce04392 1.49e-01 1.00e+00
## sce04814 6.30e-01 9.23e-01

Interesting, but note those are in order of the pathway ID, which are the left-most rowname values above. If we want to sort the rows based on their significance, we can use the topKEGG function to do so, like this:

# extract the top KEGG pathways from kegg output
topKEGG(k, sort="down")

##                                                      Pathway   N Up Down
## sce01100                                  Metabolic pathways 766 86  132
## sce00500                       Starch and sucrose metabolism  39  1   20
## sce00190                           Oxidative phosphorylation  71  0   27
## sce04213     Longevity regulating pathway - multiple species  38  1   17
## sce01110               Biosynthesis of secondary metabolites 339 56   65
## sce00561                             Glycerolipid metabolism  31  2   13
## sce01200                                   Carbon metabolism 104 10   24
## sce00020                           Citrate cycle (TCA cycle)  30  1   10
## sce00410                             beta-Alanine metabolism  13  3    6
## sce00430                  Taurine and hypotaurine metabolism   3  0    3
## sce00130 Ubiquinone and other terpenoid-quinone biosynthesis  10  0    5
## sce00620                                 Pyruvate metabolism  48 12   12
## sce00051                     Fructose and mannose metabolism  20  0    7
## sce00052                                Galactose metabolism  22  0    7
## sce00564                      Glycerophospholipid metabolism  40  2   10
## sce00650                                Butanoate metabolism   9  3    4
## sce04141         Protein processing in endoplasmic reticulum  92  1   17
## sce00010                        Glycolysis / Gluconeogenesis  51  5   11
## sce00380                               Tryptophan metabolism  20  3    6
## sce00310                                  Lysine degradation  16  4    5
##              P.Up   P.Down
## sce01100 5.52e-03 3.24e-11
## sce00500 9.72e-01 9.65e-11
## sce00190 1.00e+00 2.93e-10
## sce04213 9.69e-01 3.55e-08
## sce01110 1.16e-06 1.37e-07
## sce00561 7.65e-01 3.67e-06
## sce01200 4.18e-01 7.56e-05
## sce00020 9.35e-01 4.71e-04
## sce00410 9.69e-02 9.52e-04
## sce00430 1.00e+00 1.02e-03
## sce00130 1.00e+00 1.69e-03
## sce00620 6.17e-04 2.31e-03
## sce00051 1.00e+00 2.47e-03
## sce00052 1.00e+00 4.55e-03
## sce00564 8.74e-01 5.25e-03
## sce00650 3.69e-02 8.57e-03
## sce04141 1.00e+00 9.67e-03
## sce00010 4.59e-01 1.13e-02
## sce00380 2.49e-01 1.16e-02
## sce00310 4.42e-02 1.75e-02

Notice I chose to sort on p-value for the “downregulated” genes (aka lower in the mutant). We could change “down” to “up” in the code above to sort on the other column.

11.4.2 Pathway Visualization

So, we’ve found out which KEGG pathways are enriched in our gene list for this comparison. A common thing we will want to do is visualize those enriched pathways. The Bioconductor package pathview allows us to visualize these pathways. If you prefer web based tools, it has a free web version at: https://pathview.uncc.edu

In addition to just the genes that are DE, we can also include information about their changes by coloring the corresponding parts of the graph accordingly.

# this makes sure you have pathview and tries to install it if you don't
if (!requireNamespace("pathview", quietly = TRUE))
    BiocManager::install("pathview")
library("pathview")

# save the logFC values from the res object to a new variable
gene_data_logFC <- res$table %>% dplyr::select(logFC)

# put the ORF names as names for each entry
fold_change_geneList <- setNames(object = data.frame(res)$logFC,
                                 nm = data.frame(res)$ORF)

# we can see what this look like with head()
head(fold_change_geneList)

##  YIL170W  YFL056C  YAR061W  YGR014W  YPR031W  YIL003W 
##  1.00593  0.82290 -0.63346 -0.00538 -0.10371  0.25097

# the pathview command saves the file to your current directory.
# let's create a place for that information to go.
path_to_kegg_images <- "~/Desktop/Genomic_Data_Analysis/Analysis/KEGG/"
if (!dir.exists(path_to_kegg_images)) {
  dir.create(path_to_kegg_images, recursive = TRUE)
}

# move to this place so images save there.
setwd(path_to_kegg_images)

# now, we can run the pathview command.
# you can try changing the pathway.id below to one of the shown examples
# this will let you see the different pathways.
test <- pathview(gene.data  = fold_change_geneList,
                     # pathway.id = "sce01100", #metabolic pathways
                     pathway.id = "sce00010", #glycolysis/gluconeogenesis
                     # pathway.id = "sce03050", # proteasome cycle
                     # pathway.id = "sce00020", # TCA cycle
                     # pathway.id = "sce00500", # starch & sucrose metabolism (trehalose)
                     # pathway.id = "sce00030", #PPP
                     species    = "sce",
                     gene.idtype="orf",
                     # expand.node = TRUE,
                     split.group=T,
                     map.symbol = T,
                     is.signal=F,
                     kegg.native = F, 
                     match.data = T, 
                     node.sum='max.abs', # this determines how to choose which gene to show.
                     multi.state = T, 
                     bins=20,
                     same.layer = F,
                     pdf.size=c(7,7),
                     limit = list(gene=5, cpd=1))

## 'select()' returned 1:1 mapping between keys and columns

## [1] "Note: 148 of 5615 unique input IDs unmapped."

## Info: Getting gene ID data from KEGG...

## Info: Done with data retrieval!

## Warning in .local(from, to, graph): edges replaced: '139|61', '139|62',
## '118|121'

## Info: Working in directory /Users/clstacy/Desktop/Genomic_Data_Analysis/Analysis/KEGG

## Info: Writing image file sce00010.pathview.pdf

Of course, you might want more control over the process. In that case, we can use another package. For example, we could use clusterProfiler that we’ve used before. If you want to learn more, this is a useful guide: https://yulab-smu.top/biomedical-knowledge-mining-book/clusterprofiler-kegg.html. In this analysis, we need to manually select the genes that we want to run KEGG enrichment on, and save that character vector of gene names.

# turn res object into a list of genes
# Recall that these genes are those that are DE between mutant stress response and
# the WT stress response.
DE_genes <- res %>% 
  data.frame() %>%
  filter(PValue < 0.01 & abs(logFC)>0.5
         ) %>%
  pull(ORF)

To run the enrichment, we can use the enrichKEGG() function from the clusterProfiler package. The argument “gene” for this function requires a vector of Entrez gene id’s. Right now we have a vector of gene IDs, but they are the ORF names instead of entrez IDs, so for most organisms we need to convert them.

# how can we find our organism & its code? try this clusterProfiler command:
search_kegg_organism('yeast', by='common_name')

##     kegg_code           scientific_name   common_name
## 708       sce  Saccharomyces cerevisiae budding yeast
## 820       spo Schizosaccharomyces pombe fission yeast

Thankfully, the clusterProfiler package comes with the function bitr (Biological Id TRanslator) to translate geneIDs.

Note: the sce genome database is coded differently than many genome databases, so it requests the ORF instead of the entrezID, so we can directly use the DE_genes vector instead of the entrez ID. If you don’t work with yeast, you’ll probably need to use the entrez ID list for analyses that you do on your own.

# convert gene IDs to entrez IDs
entrez_ids <- bitr(DE_genes, fromType = "ORF", toType = "ENTREZID", OrgDb = org.Sc.sgd.db)

## 'select()' returned 1:1 mapping between keys and columns

# Run our KEGG enrichment
kegg_results <- enrichKEGG(gene = DE_genes, #entrez_ids$ENTREZID, 
                           organism = 'sce' #options: https://www.genome.jp/kegg/catalog/org_list.html
                           )

## Reading KEGG annotation online: "https://rest.kegg.jp/link/sce/pathway"...

## Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway/sce"...

# take a peak at the results
head(kegg_results)

##                    category                          subcategory       ID
## sce01110         Metabolism             Global and overview maps sce01110
## sce00500         Metabolism              Carbohydrate metabolism sce00500
## sce04213 Organismal Systems                                Aging sce04213
## sce00620         Metabolism              Carbohydrate metabolism sce00620
## sce01200         Metabolism             Global and overview maps sce01200
## sce00770         Metabolism Metabolism of cofactors and vitamins sce00770
##                                                                                         Description
## sce01110           Biosynthesis of secondary metabolites - Saccharomyces cerevisiae (budding yeast)
## sce00500                   Starch and sucrose metabolism - Saccharomyces cerevisiae (budding yeast)
## sce04213 Longevity regulating pathway - multiple species - Saccharomyces cerevisiae (budding yeast)
## sce00620                             Pyruvate metabolism - Saccharomyces cerevisiae (budding yeast)
## sce01200                               Carbon metabolism - Saccharomyces cerevisiae (budding yeast)
## sce00770               Pantothenate and CoA biosynthesis - Saccharomyces cerevisiae (budding yeast)
##          GeneRatio  BgRatio       pvalue     p.adjust       qvalue
## sce01110   108/385 351/2467 4.483267e-15 4.259103e-13 3.114691e-13
## sce00500    20/385  41/2467 5.180046e-07 1.960553e-05 1.433756e-05
## sce04213    19/385  38/2467 6.191220e-07 1.960553e-05 1.433756e-05
## sce00620    22/385  52/2467 2.906810e-06 6.903673e-05 5.048669e-05
## sce01200    36/385 112/2467 5.890900e-06 1.092427e-04 7.988940e-05
## sce00770    13/385  23/2467 6.899539e-06 1.092427e-04 7.988940e-05
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     geneID
## sce01110 YPL028W/YER091C/YKR067W/YMR169C/YDR178W/YER026C/YKL085W/YGR088W/YNL274C/YIL145C/YJR148W/YBL011W/YJR016C/YHR137W/YLR355C/YBR052C/YPL061W/YFR015C/YML110C/YER073W/YGR155W/YGR060W/YOR347C/YOR374W/YMR108W/YOR108W/YDR256C/YLR258W/YLL041C/YLR432W/YKL035W/YDR074W/YOR176W/YDR300C/YBR001C/YJL200C/YNR001C/YHR063C/YLR372W/YLL062C/YOR184W/YMR105C/YER141W/YAL054C/YKL067W/YPR145W/YIL094C/YCR073W-A/YLR303W/YLR174W/YGR248W/YAL061W/YDR502C/YPR184W/YDL131W/YJR010W/YMR189W/YIL155C/YDR032C/YNR043W/YCR004C/YIL125W/YDR019C/YPR160W/YBR117C/YML126C/YDR001C/YEL011W/YJL052W/YGR255C/YNL280C/YGL009C/YDL166C/YML035C/YDL022W/YDR516C/YMR261C/YKL148C/YHR208W/YAL060W/YNR034W/YGR157W/YBR126C/YEL046C/YBR145W/YER081W/YLR180W/YMR278W/YML100W/YLR153C/YDL182W/YMR250W/YBL068W/YJR073C/YOL058W/YPR026W/YFL030W/YGL256W/YDR148C/YGR205W/YKL141W/YKR058W/YGR256W/YFR053C/YDR234W/YDL021W/YOR136W/YCL040W
## sce00500                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   YFR015C/YLR258W/YKL035W/YDR074W/YBR001C/YMR105C/YGR287C/YPR184W/YPR160W/YDR001C/YEL011W/YDR516C/YMR261C/YBR126C/YMR278W/YML100W/YPR026W/YKR058W/YFR053C/YCL040W
## sce04213                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           YGR088W/YPL203W/YJL164C/YHR008C/YDR256C/YLL024C/YER103W/YDR258C/YFL033C/YPL015C/YLL026W/YDR096W/YBL075C/YOR025W/YAL005C/YGL037C/YNL098C/YJL005W/YJR104C
## sce00620                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   YPL028W/YMR169C/YKL085W/YPL061W/YER073W/YOR347C/YOR374W/YOR108W/YGL062W/YML004C/YAL054C/YDL174C/YDL131W/YDR272W/YBR145W/YEL071W/YLR153C/YDL182W/YBL015W/YML054C/YGL256W/YDR533C
## sce01200                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 YPL028W/YDR178W/YKL085W/YGR088W/YML070W/YOR347C/YDR256C/YLL041C/YGL062W/YNR001C/YOR184W/YAL054C/YCR073W-A/YLR303W/YLR174W/YGR248W/YMR189W/YIL125W/YDR019C/YBR117C/YJL052W/YDR516C/YKL148C/YNR034W/YER081W/YLR153C/YBL068W/YFL030W/YDR148C/YGR205W/YKL141W/YGR256W/YFR053C/YDL021W/YOR136W/YCL040W
## sce00770                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           YMR169C/YIL145C/YJR148W/YJR016C/YLR355C/YPL061W/YER073W/YOR374W/YMR108W/YHR063C/YGL154C/YHR208W/YMR020W
##          Count
## sce01110   108
## sce00500    20
## sce04213    19
## sce00620    22
## sce01200    36
## sce00770    13

# create a table for the html file
data.frame(kegg_results) %>% reactable()

# Remove " - Saccharomyces cerevesiae" from each description entry
kegg_results@result$Description <- kegg_results@result$Description %>% print() %>% str_replace_all(., fixed(" - Saccharomyces cerevisiae"), "")

##  [1] "Biosynthesis of secondary metabolites - Saccharomyces cerevisiae (budding yeast)"                 
##  [2] "Starch and sucrose metabolism - Saccharomyces cerevisiae (budding yeast)"                         
##  [3] "Longevity regulating pathway - multiple species - Saccharomyces cerevisiae (budding yeast)"       
##  [4] "Pyruvate metabolism - Saccharomyces cerevisiae (budding yeast)"                                   
##  [5] "Carbon metabolism - Saccharomyces cerevisiae (budding yeast)"                                     
##  [6] "Pantothenate and CoA biosynthesis - Saccharomyces cerevisiae (budding yeast)"                     
##  [7] "2-Oxocarboxylic acid metabolism - Saccharomyces cerevisiae (budding yeast)"                       
##  [8] "Ribosome biogenesis in eukaryotes - Saccharomyces cerevisiae (budding yeast)"                     
##  [9] "Valine, leucine and isoleucine degradation - Saccharomyces cerevisiae (budding yeast)"            
## [10] "Glyoxylate and dicarboxylate metabolism - Saccharomyces cerevisiae (budding yeast)"               
## [11] "beta-Alanine metabolism - Saccharomyces cerevisiae (budding yeast)"                               
## [12] "Valine, leucine and isoleucine biosynthesis - Saccharomyces cerevisiae (budding yeast)"           
## [13] "Biosynthesis of amino acids - Saccharomyces cerevisiae (budding yeast)"                           
## [14] "RNA polymerase - Saccharomyces cerevisiae (budding yeast)"                                        
## [15] "Ribosome - Saccharomyces cerevisiae (budding yeast)"                                              
## [16] "Tryptophan metabolism - Saccharomyces cerevisiae (budding yeast)"                                 
## [17] "Glycerolipid metabolism - Saccharomyces cerevisiae (budding yeast)"                               
## [18] "Biosynthesis of cofactors - Saccharomyces cerevisiae (budding yeast)"                             
## [19] "Glycine, serine and threonine metabolism - Saccharomyces cerevisiae (budding yeast)"              
## [20] "Citrate cycle (TCA cycle) - Saccharomyces cerevisiae (budding yeast)"                             
## [21] "Lysine degradation - Saccharomyces cerevisiae (budding yeast)"                                    
## [22] "Fatty acid degradation - Saccharomyces cerevisiae (budding yeast)"                                
## [23] "Oxidative phosphorylation - Saccharomyces cerevisiae (budding yeast)"                             
## [24] "Glycolysis / Gluconeogenesis - Saccharomyces cerevisiae (budding yeast)"                          
## [25] "Cysteine and methionine metabolism - Saccharomyces cerevisiae (budding yeast)"                    
## [26] "Ubiquinone and other terpenoid-quinone biosynthesis - Saccharomyces cerevisiae (budding yeast)"   
## [27] "Ascorbate and aldarate metabolism - Saccharomyces cerevisiae (budding yeast)"                     
## [28] "Purine metabolism - Saccharomyces cerevisiae (budding yeast)"                                     
## [29] "Lysine biosynthesis - Saccharomyces cerevisiae (budding yeast)"                                   
## [30] "Alanine, aspartate and glutamate metabolism - Saccharomyces cerevisiae (budding yeast)"           
## [31] "Fructose and mannose metabolism - Saccharomyces cerevisiae (budding yeast)"                       
## [32] "Pentose phosphate pathway - Saccharomyces cerevisiae (budding yeast)"                             
## [33] "Galactose metabolism - Saccharomyces cerevisiae (budding yeast)"                                  
## [34] "Methane metabolism - Saccharomyces cerevisiae (budding yeast)"                                    
## [35] "Lipoic acid metabolism - Saccharomyces cerevisiae (budding yeast)"                                
## [36] "Tyrosine metabolism - Saccharomyces cerevisiae (budding yeast)"                                   
## [37] "Propanoate metabolism - Saccharomyces cerevisiae (budding yeast)"                                 
## [38] "Arginine and proline metabolism - Saccharomyces cerevisiae (budding yeast)"                       
## [39] "Biosynthesis of nucleotide sugars - Saccharomyces cerevisiae (budding yeast)"                     
## [40] "Glycerophospholipid metabolism - Saccharomyces cerevisiae (budding yeast)"                        
## [41] "Peroxisome - Saccharomyces cerevisiae (budding yeast)"                                            
## [42] "Histidine metabolism - Saccharomyces cerevisiae (budding yeast)"                                  
## [43] "Nicotinate and nicotinamide metabolism - Saccharomyces cerevisiae (budding yeast)"                
## [44] "Sulfur metabolism - Saccharomyces cerevisiae (budding yeast)"                                     
## [45] "Nucleotide metabolism - Saccharomyces cerevisiae (budding yeast)"                                 
## [46] "Arginine biosynthesis - Saccharomyces cerevisiae (budding yeast)"                                 
## [47] "Amino sugar and nucleotide sugar metabolism - Saccharomyces cerevisiae (budding yeast)"           
## [48] "Glutathione metabolism - Saccharomyces cerevisiae (budding yeast)"                                
## [49] "Fatty acid metabolism - Saccharomyces cerevisiae (budding yeast)"                                 
## [50] "Pentose and glucuronate interconversions - Saccharomyces cerevisiae (budding yeast)"              
## [51] "Fatty acid biosynthesis - Saccharomyces cerevisiae (budding yeast)"                               
## [52] "Vitamin B6 metabolism - Saccharomyces cerevisiae (budding yeast)"                                 
## [53] "Polycomb repressive complex - Saccharomyces cerevisiae (budding yeast)"                           
## [54] "Pyrimidine metabolism - Saccharomyces cerevisiae (budding yeast)"                                 
## [55] "Meiosis - yeast - Saccharomyces cerevisiae (budding yeast)"                                       
## [56] "MAPK signaling pathway - yeast - Saccharomyces cerevisiae (budding yeast)"                        
## [57] "Steroid biosynthesis - Saccharomyces cerevisiae (budding yeast)"                                  
## [58] "ABC transporters - Saccharomyces cerevisiae (budding yeast)"                                      
## [59] "Efferocytosis - Saccharomyces cerevisiae (budding yeast)"                                         
## [60] "Selenocompound metabolism - Saccharomyces cerevisiae (budding yeast)"                             
## [61] "Terpenoid backbone biosynthesis - Saccharomyces cerevisiae (budding yeast)"                       
## [62] "Other types of O-glycan biosynthesis - Saccharomyces cerevisiae (budding yeast)"                  
## [63] "Sphingolipid metabolism - Saccharomyces cerevisiae (budding yeast)"                               
## [64] "Protein processing in endoplasmic reticulum - Saccharomyces cerevisiae (budding yeast)"           
## [65] "One carbon pool by folate - Saccharomyces cerevisiae (budding yeast)"                             
## [66] "Porphyrin metabolism - Saccharomyces cerevisiae (budding yeast)"                                  
## [67] "RNA degradation - Saccharomyces cerevisiae (budding yeast)"                                       
## [68] "Endocytosis - Saccharomyces cerevisiae (budding yeast)"                                           
## [69] "Biosynthesis of unsaturated fatty acids - Saccharomyces cerevisiae (budding yeast)"               
## [70] "Folate biosynthesis - Saccharomyces cerevisiae (budding yeast)"                                   
## [71] "Various types of N-glycan biosynthesis - Saccharomyces cerevisiae (budding yeast)"                
## [72] "DNA replication - Saccharomyces cerevisiae (budding yeast)"                                       
## [73] "Mitophagy - yeast - Saccharomyces cerevisiae (budding yeast)"                                     
## [74] "Autophagy - other - Saccharomyces cerevisiae (budding yeast)"                                     
## [75] "Motor proteins - Saccharomyces cerevisiae (budding yeast)"                                        
## [76] "Glycosylphosphatidylinositol (GPI)-anchor biosynthesis - Saccharomyces cerevisiae (budding yeast)"
## [77] "Viral life cycle - HIV-1 - Saccharomyces cerevisiae (budding yeast)"                              
## [78] "Cell cycle - yeast - Saccharomyces cerevisiae (budding yeast)"                                    
## [79] "Riboflavin metabolism - Saccharomyces cerevisiae (budding yeast)"                                 
## [80] "Phenylalanine, tyrosine and tryptophan biosynthesis - Saccharomyces cerevisiae (budding yeast)"   
## [81] "Thiamine metabolism - Saccharomyces cerevisiae (budding yeast)"                                   
## [82] "Nucleocytoplasmic transport - Saccharomyces cerevisiae (budding yeast)"                           
## [83] "Autophagy - yeast - Saccharomyces cerevisiae (budding yeast)"                                     
## [84] "N-Glycan biosynthesis - Saccharomyces cerevisiae (budding yeast)"                                 
## [85] "Spliceosome - Saccharomyces cerevisiae (budding yeast)"                                           
## [86] "Inositol phosphate metabolism - Saccharomyces cerevisiae (budding yeast)"                         
## [87] "Homologous recombination - Saccharomyces cerevisiae (budding yeast)"                              
## [88] "Phosphatidylinositol signaling system - Saccharomyces cerevisiae (budding yeast)"                 
## [89] "Base excision repair - Saccharomyces cerevisiae (budding yeast)"                                  
## [90] "Basal transcription factors - Saccharomyces cerevisiae (budding yeast)"                           
## [91] "mRNA surveillance pathway - Saccharomyces cerevisiae (budding yeast)"                             
## [92] "ATP-dependent chromatin remodeling - Saccharomyces cerevisiae (budding yeast)"                    
## [93] "Nucleotide excision repair - Saccharomyces cerevisiae (budding yeast)"                            
## [94] "Ubiquitin mediated proteolysis - Saccharomyces cerevisiae (budding yeast)"                        
## [95] "Aminoacyl-tRNA biosynthesis - Saccharomyces cerevisiae (budding yeast)"

11.5 Visualize on the KEGG website

The first way we might want to visualize this plot is part of a KEGG pathway. We can open an html window with genes that are enriched highlighted, like this. Run one of these lines below and it will open a new window in your browser.

browseKEGG(kegg_results, 'sce00500')  # starch & sucrose metabolism
browseKEGG(kegg_results, 'sce04213')  # longevity

11.6 Comparing Paralogs in common pathways

Using the res_all object instead, we can compare logFC across contrasts. We can create a side by side of the WT EtOH response and the Msn2/4∆∆ EtOH response.

# choose the pathway we want to visualize:

pathway_to_graph <- "sce00010" # glycolysis/gluconeogenesis
# pathway_to_graph <- "sce00020" # TCA cycle
# pathway_to_graph <- "sce04213" # longevity
# pathway_to_graph <- "sce00500" # starch & sucrose metabolism
# pathway_to_graph <- "sce00620" # pyruvate metabolism

# get the ID for the pathway we want to see
pathway_number <- kegg_results %>%
  data.frame() %>%
  mutate(row_number = row_number()) %>%
  filter(ID == pathway_to_graph) %>%
  pull(row_number)


# this saves the kegg list reaction mappings to KEGG_reactions for plotting
# where these came from: "https://rest.kegg.jp/list/reaction/"
if (!exists("KEGG_reactions")) {
  # save the kegglist reaction infromation
  KEGG_reactions <- KEGGREST::keggList("reaction") %>%
    as.data.frame() 
  colnames(KEGG_reactions) <- "long_reaction_description"
  KEGG_reactions <- KEGG_reactions %>%
    tibble::rownames_to_column("reaction") %>%
    dplyr::mutate(reaction_description = str_split_i(
      long_reaction_description, ";", 1)
      ) %>%
    dplyr::select(reaction, reaction_description) %>%
    dplyr::mutate(reaction = paste0('rn:', reaction))
}

# save the kegg_data as a ggkegg object
KEGG_data <- kegg_results %>%
  ggkegg(
    convert_first = FALSE,
    convert_collapse = "\n",
    pathway_number = pathway_number, # changing this to change the pathway.
    convert_org = c("sce"),
    delete_zero_degree = TRUE,
    return_igraph = FALSE
  )

# process thsi data to visualize
graph.data <- KEGG_data$data %>%
  filter(type == "gene") %>%
  mutate(showname = strsplit(name, " ") %>% str_remove_all("sce:")) %>%
  mutate(showname = gsub('c\\(|\\)|"|"|,', '', showname)) %>%
  separate_rows(showname, sep = " ") %>%
  left_join(rownames_to_column(res_all$table),
            by = c("showname" = "rowname")) %>%
  left_join(KEGG_reactions, by = "reaction") %>%
  mutate(gene_name = AnnotationDbi::select(org.Sc.sgd.db, 
                                           keys = showname, 
                                           columns = "GENENAME")$GENENAME) %>%
  mutate(gene_name = coalesce(gene_name, showname))

## 'select()' returned many:1 mapping between keys and columns

# find our fc values needed for color scale
max_fc <-
  ceiling(max(c(
    abs(graph.data$logFC.EtOHvsMOCK.MSN24dd),
    abs(graph.data$logFC.EtOHvsMOCK.WT)
  ), na.rm = TRUE))

# create graph for WT stress response
WT.graph <- graph.data %>%
  ggplot(aes(x = x, y = y)) +
  overlay_raw_map(pathway_to_graph) +
  ggrepel::geom_label_repel(
    aes(label = gene_name, fill = logFC.EtOHvsMOCK.WT),
    box.padding = 0.05,
    label.padding = 0.05,
    direction = "y",
    size = 2,
    max.overlaps = 100,
    label.r = 0.002,
    seed = 123
  ) +
  theme_void() +
  scale_fill_gradientn(
    colours = rev(RColorBrewer::brewer.pal(n = 10, name = "RdYlBu")),
    limits = c(-max_fc, max_fc)
  ) +
  theme(plot.margin = unit(c(0, 0, 0, 0), "cm")) +
  labs(fill = "logFC", title = "      WT Response to Ethanol")

# create graph for mutant stress response
msn24.graph <- graph.data %>%
  ggplot(aes(x = x, y = y)) +
  overlay_raw_map(pathway_to_graph) +
  ggrepel::geom_label_repel(
    aes(label = gene_name, fill = logFC.EtOHvsMOCK.MSN24dd),
    box.padding = 0.05,
    label.padding = 0.05,
    direction = "y",
    size = 2,
    max.overlaps = 100,
    label.r = 0.002,
    seed = 123
  ) +
  theme_void() +
  guides(fill = FALSE) +
  scale_fill_gradientn(
    colours = rev(RColorBrewer::brewer.pal(n = 10, name = "RdYlBu")),
    limits = c(-max_fc, max_fc)
  ) +
  theme(plot.margin = unit(c(0, 0, 0, 0), "cm")) +
  labs(fill = "logFC", title = "      Msn2/4∆∆ Response to Ethanol")



# generate the graph
side_by_side_graph <- WT.graph + msn24.graph +
  patchwork::plot_annotation(tag_levels = 'A')

# display the graph
side_by_side_graph

Once we are happy with our graph. We might want to save it. We can save it like this

# command to save
ggsave(paste0(path_to_kegg_images,pathway_to_graph,".sidebyside.pdf"), 
       plot = side_by_side_graph,
       device = "pdf",
       width=10,
       height=6,
       dpi=300
       )

11.8 Dotplot

We can create a dotplot from this object as shown below. The dotplot shows KEGG categories that are enriched, the adjusted p.values, the number of genes in the KEGG category, and the proportion of genes in the KEGG pathway that are included in the DE gene list.

# Plot the KEGG pathway enrichment results
dotplot(kegg_results, showCategory = 10)

11.9 cnetplot

The dotplot() only shows the most significant or selected enriched terms, while we may want to know which genes are involved in these significant terms. In order to consider the potentially biological complexities in which a gene may belong to multiple annotation categories and provide log2FC information, we can use the cnetplot() function to extract the complex association. The cnetplot() depicts the linkages of genes and KEGG pathways as a network. We can project the network into 2D space, or we can create a circular version of the graph. See each below.

# cnetplot
cnetplot(kegg_results,  
         showCategory=10,
         node_label="category",
         color.params=list(foldChange = fold_change_geneList),
         cex.params=list(gene_label = 0.1,
                         category_label=0.5), 
         max.overlaps=100) +
  # change the color scale
  scale_colour_gradientn(colours = rev(RColorBrewer::brewer.pal(n = 10, name = "RdYlBu")), limits=c(-6, 6)) +
  labs(colour="logFC")

## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.

## Warning: Removed 167 rows containing missing values (`geom_text_repel()`).

# circular cnetplot
cnetplot(kegg_results,  
         showCategory=6,
         circular=TRUE, # this changes the output graphing
         node_label="category",
         color.params=list(foldChange = fold_change_geneList),
         cex.params=list(gene_label = 0.5,
                         category_label=0.5), 
         max.overlaps=100) +
  # change the color scale
  scale_colour_gradientn(colours = rev(RColorBrewer::brewer.pal(n = 10, name = "RdYlBu")), limits=c(-6, 6)) +
  labs(colour="logFC")

## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.

## Warning: Removed 137 rows containing missing values (`geom_text_repel()`).

## Warning: ggrepel: 4 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

11.10 heatplot

The heatplot is similar to cnetplot, while displaying the relationships as a heatmap. The gene-concept network may become too complicated if you want to show a large number significant terms. The heatplot can simplify the result and more easy to identify expression patterns.

# heatplot
heatplot(kegg_results, showCategory=10, foldChange = fold_change_geneList) +
  # the below code hides the messy overlapping gene names.
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank())

11.11 upsetplot

The upsetplot shows the the overlaps and unique contributions of KEGG pathways across multiple gene sets. It helps to identify which KEGG pathways are shared or distinct between different conditions or experimental groups, providing insights into the common and specific KEGG pathways enriched in each set of genes. Here, the bars show number of genes in the category.

enrichplot::upsetplot(kegg_results)

11.12 emapplot

The emmapplot enrichment map organizes enriched terms into a network with edges connecting overlapping gene sets. In this way, mutually overlapping gene sets are tend to cluster together, making it easy to identify functional modules.

The emapplot function supports results obtained from hypergeometric test and gene set enrichment analysis. We have to first use the pairwise_termsim function from the enrichplot package on the kegg_results object in order to get a similarity matrix.

x2 = enrichplot::pairwise_termsim(kegg_results)
emapplot(x2, 
         showCategory = 30,
         cex.params = list(category_label = 0.4))

11.13 GSEA

KEGG pathway gene set enrichment analysis allows us to identify whether a particular set of genes associated with a KEGG pathway is enriched in a given gene list compared to what would be expected by chance.

# For gseKEGG, we need a gene list with na removed and sorted. let's do that now.
# omit any NA values 
kegg_gene_list = na.omit(fold_change_geneList)

# sort the list in decreasing order by FOLD CHANGE (required for this analysis)
kegg_gene_list = sort(kegg_gene_list, decreasing = TRUE)

gse_kegg <- gseKEGG(geneList     = kegg_gene_list,
               organism     = 'sce',
               # minGSSize    = 120,
               pvalueCutoff = 0.05,
               verbose      = FALSE)

## Warning in fgseaMultilevel(pathways = pathways, stats = stats, minSize =
## minSize, : For some pathways, in reality P-values are less than 1e-10. You can
## set the `eps` argument to zero for better estimation.

head(gse_kegg)

##                ID
## sce03010 sce03010
## sce03008 sce03008
## sce00500 sce00500
## sce03020 sce03020
## sce04213 sce04213
## sce00520 sce00520
##                                                                                         Description
## sce03010                                        Ribosome - Saccharomyces cerevisiae (budding yeast)
## sce03008               Ribosome biogenesis in eukaryotes - Saccharomyces cerevisiae (budding yeast)
## sce00500                   Starch and sucrose metabolism - Saccharomyces cerevisiae (budding yeast)
## sce03020                                  RNA polymerase - Saccharomyces cerevisiae (budding yeast)
## sce04213 Longevity regulating pathway - multiple species - Saccharomyces cerevisiae (budding yeast)
## sce00520     Amino sugar and nucleotide sugar metabolism - Saccharomyces cerevisiae (budding yeast)
##          setSize enrichmentScore       NES       pvalue     p.adjust
## sce03010     174       0.6197685  2.654732 1.000000e-10 1.010000e-08
## sce03008      74       0.6595249  2.519163 3.984710e-10 2.012279e-08
## sce00500      39      -0.8569869 -2.240346 9.406530e-10 3.166865e-08
## sce03020      31       0.7228139  2.305050 3.938205e-06 9.943967e-05
## sce04213      38      -0.7695484 -2.008026 1.063327e-05 2.147920e-04
## sce00520      31      -0.7964672 -2.004683 1.740995e-05 2.930675e-04
##                qvalue rank                   leading_edge
## sce03010 7.894737e-09 1481 tags=68%, list=26%, signal=52%
## sce03008 1.572912e-08 1094 tags=68%, list=19%, signal=55%
## sce00500 2.475403e-08  319  tags=49%, list=6%, signal=46%
## sce03020 7.772772e-05 1091 tags=68%, list=19%, signal=55%
## sce04213 1.678937e-04  615 tags=50%, list=11%, signal=45%
## sce00520 2.290783e-04  166  tags=29%, list=3%, signal=28%
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  core_enrichment
## sce03010 YLR406C/YIL069C/YKL156W/YNL002C/YDR471W/YMR194W/YOR234C/YGR085C/YBR189W/YPL198W/YGR214W/YDL082W/YFL034C-A/YNL096C/YHL033C/YJL177W/YIL133C/YNL067W/YHL001W/YOL121C/YOR312C/YLR448W/YDR025W/YOR293W/YPL079W/YLR061W/YJL190C/YML063W/YDR447C/YGL076C/YDR500C/YLR344W/YLR048W/YER131W/YOR096W/YDR418W/YML026C/YDL083C/YNL301C/YEL054C/YBL027W/YGR148C/YDL136W/YBR181C/YER117W/YMR142C/YNL162W/YML073C/YER102W/YLR325C/YKR057W/YBR191W/YLR441C/YGR034W/YFR031C-A/YDL075W/YKR094C/YER074W/YCR003W/YHR010W/YLR009W/YPL090C/YHR141C/YGL123W/YLL045C/YER056C-A/YHR203C/YMR230W/YDR405W/YOR063W/YMR242C/YJL136C/YLR167W/YGR118W/YLR264W/YNL302C/YPR102C/YKL180W/YPR132W/YBR084C-A/YLR029C/YPL249C-A/YDR450W/YJL191W/YJR123W/YLR388W/YDL191W/YBL072C/YIL052C/YMR143W/YDR064W/YNL178W/YPL081W/YBL087C/YNL069C/YOL127W/YGL031C/YBL092W/YOL120C/YML024W/YPR043W/YJR145C/YOR182C/YDL133C-A/YLR333C/YBR048W/YFR032C-A/YLR340W/YHL004W/YPL143W/YIL018W/YLR367W/YGL030W/YDL202W/YML025C/YJR094W-A/YGL189C/YJL189W/YLR185W
## sce03008                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         YOR119C/YDR449C/YNL132W/YLR106C/YLL034C/YLR186W/YJL069C/YOR310C/YKL186C/YLR175W/YLR197W/YDL014W/YHR170W/YNL282W/YDL208W/YHR089C/YPL093W/YDL166C/YOR056C/YGR030C/YNL075W/YCR057C/YJL109C/YLR222C/YML093W/YHR148W/YGR128C/YOL010W/YLR129W/YLR409C/YLR022C/YHR196W/YDR339C/YPL217C/YMR239C/YBL018C/YOR048C/YDR324C/YLR397C/YGR090W/YBR167C/YEL026W/YPL043W/YNL207W/YBR257W/YDR398W/YJR002W/YGL099W/YIL035C/YNR053C
## sce00500                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 YBR001C/YOL157C/YKR058W/YPR026W/YCL040W/YMR261C/YPR184W/YDR074W/YDR001C/YBR126C/YKL035W/YDR516C/YEL011W/YLR258W/YFR015C/YMR105C/YPR160W/YML100W/YFR053C
## sce03020                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               YOR340C/YPR010C/YNL113W/YKL144C/YPR110C/YOR341W/YNL151C/YNL248C/YJL148W/YJR063W/YBR154C/YHR143W-A/YIL021W/YOR207C/YOR224C/YOR210W/YPR187W/YDL150W/YDR045C/YKR025W/YGL070C
## sce04213                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 YER103W/YFL033C/YDR256C/YPL203W/YDR096W/YLL024C/YJL005W/YHR008C/YMR037C/YKL062W/YNL098C/YPL015C/YJL164C/YBL075C/YAL005C/YLL026W/YGL037C/YDR258C/YGR088W
## sce00520                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 YML087C/YMR084W/YMR278W/YCL040W/YKL150W/YKL035W/YDR516C/YMR105C/YFR053C

# let's turn this into a searchable table for the knit file.
data.frame(gse_kegg) %>% reactable()

# shorten the gse_kegg descriptions that are too long
gse_kegg@result$Description <- gse_kegg@result$Description %>% 
  str_replace_all(., fixed(" - Saccharomyces cerevisiae"), "")

11.13.1 `gseKEGG` vs `enrichKEGG`

What is the difference, and when to use each?

Both gseKEGG and enrichKEGG functions are used for gene set enrichment analysis (GSEA) focusing on KEGG pathways, but they have different underlying methodologies and purposes.

gseKEGG:

Methodology: Gene Set Test (GSE): gseKEGG employs a gene set test approach. It calculates an enrichment score for each KEGG pathway based on the distribution of genes within the pathway in the ranked list of genes (usually by their differential expression values). Permutations: The method involves permutations to assess the statistical significance of the enrichment scores.

Output: The output is a GSEAResult object containing information about the enriched KEGG pathways, including their names, enrichment scores, and p-values.

Use Case: gseKEGG is suitable when you have a ranked list of genes based on some criteria (e.g., differential expression) and want to perform GSEA to identify KEGG pathways associated with these genes.
enrichKEGG:

Methodology: Over-Representation Analysis (ORA): enrichKEGG uses an over-representation analysis approach. It tests whether a predefined set of genes associated with a KEGG pathway is overrepresented in a given gene list compared to what would be expected by chance. Hypergeometric Test: The statistical significance is often assessed using a hypergeometric test.

Output: The output is a data frame containing information about the enriched KEGG pathways, including names, p-values, and other statistics.

Use Case: enrichKEGG is appropriate when you have a gene list and want to identify KEGG pathways that are significantly enriched in that list. It doesn’t require a ranked list of genes.

Which to Choose:

If you have a ranked list of genes (e.g., based on differential expression) and want to perform GSEA, use `gseKEGG`.

If you have a gene list and want to identify overrepresented pathways without ranking genes, use `enrichKEGG`.

In summary, the choice between gseKEGG and enrichKEGG depends on your data and the analysis you want to perform. Both methods can provide insights into the functional enrichment of gene sets (KEGG pathways, in this case) in the context of different experimental conditions.

11.13.2 GSEA visualization

We can also visualize the outputs of gseKEGG using the same functions as above.

dotplot(gse_kegg, showCategory = 3, title = "Enriched Pathways" , split=".sign") + facet_grid(.~.sign)

# enrichplots of the gseaResult object
enrichplot::upsetplot(gse_kegg)

There are also visualizations specific to the gseaResult output of gseKEGG, shown below. We can see where in the rankings the genes belonging to a given KEGG group are found relative to the entire ranked list. We can change the geneSetID value to get other KEGG pathways.

# Visualize the ranking of the genes by set
gseaplot(gse_kegg, geneSetID = 1, by = "preranked", title = gse_kegg$Description[1])

gseaplot(gse_kegg, geneSetID = 2, by = "preranked", title = gse_kegg$Description[2])

gseaplot(gse_kegg, geneSetID = 3, by = "preranked", title = gse_kegg$Description[3])

gseaplot(gse_kegg, geneSetID = 4, by = "preranked", title = gse_kegg$Description[4])

In addition to the plot above, we can create plots with more details by not included the by="preranked" argument. We can now see the calculated enrichment score across the list for this gene set.

# take a closer look at the RNA polymerase genes
gseaplot(gse_kegg, geneSetID = 4, title = gse_kegg$Description[4])

What if we want to look at multiple together KEGG pathways simultaneously? We can do that with gseaplot2 from the enrichplot package. Here we look at the first 3 KEGG pathways together.

# gseaplot2 multiple KEGG pathway enrichments
enrichplot::gseaplot2(gse_kegg, geneSetID = 1:3, pvalue_table = TRUE,
          color = c("#E495A5", "#86B875", "#7DB0DD"), ES_geom = "dot")

11.14 Questions

Can you explain the concept of gene set enrichment analysis and its relevance in functional genomics?
How would you interpret the results obtained from clusterProfiler, specifically focusing on enriched terms and associated p-values?
Compare and contrast KEGG enrichment with other tools or methods used for functional enrichment analysis.
Can you find the KEGG organism for the organism that you are interested in?

Be sure to knit this file into a pdf or html file once you’re finished.

System information for reproducibility:

pander::pander(sessionInfo())

R version 4.3.1 (2023-06-16)

Platform: aarch64-apple-darwin20 (64-bit)

locale: en_US.UTF-8||en_US.UTF-8||en_US.UTF-8||C||en_US.UTF-8||en_US.UTF-8

attached base packages: stats4, stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: NbClust(v.3.0.1), factoextra(v.1.0.7), ggfx(v.1.0.1), patchwork(v.1.1.3), ggkegg(v.0.99.4), tidygraph(v.1.2.3), XML(v.3.99-0.14), ggraph(v.2.1.0), KEGGgraph(v.1.62.0), ggupset(v.0.3.0), pathview(v.1.42.0), BiocFileCache(v.2.10.0), dbplyr(v.2.3.4), DOSE(v.3.28.0), ggrepel(v.0.9.4), viridis(v.0.6.4), viridisLite(v.0.4.2), scales(v.1.2.1), Glimma(v.2.12.0), DESeq2(v.1.41.12), edgeR(v.3.99.5), limma(v.3.58.0), reactable(v.0.4.4), webshot2(v.0.1.1), statmod(v.1.5.0), Rsubread(v.2.16.0), ShortRead(v.1.60.0), GenomicAlignments(v.1.38.0), SummarizedExperiment(v.1.32.0), MatrixGenerics(v.1.14.0), matrixStats(v.1.0.0), Rsamtools(v.2.18.0), GenomicRanges(v.1.54.0), Biostrings(v.2.70.1), GenomeInfoDb(v.1.38.0), XVector(v.0.42.0), BiocParallel(v.1.36.0), Rfastp(v.1.12.0), org.Sc.sgd.db(v.3.18.0), AnnotationDbi(v.1.64.0), IRanges(v.2.36.0), S4Vectors(v.0.40.0), Biobase(v.2.62.0), BiocGenerics(v.0.48.0), clusterProfiler(v.4.10.0), ggVennDiagram(v.1.2.3), tidytree(v.0.4.5), igraph(v.1.5.1), janitor(v.2.2.0), BiocManager(v.1.30.22), pander(v.0.6.5), knitr(v.1.44), here(v.1.0.1), lubridate(v.1.9.3), forcats(v.1.0.0), stringr(v.1.5.0), dplyr(v.1.1.3), purrr(v.1.0.2), readr(v.2.1.4), tidyr(v.1.3.0), tibble(v.3.2.1), ggplot2(v.3.4.4), tidyverse(v.2.0.0) and pacman(v.0.5.1)

loaded via a namespace (and not attached): fs(v.1.6.3), bitops(v.1.0-7), enrichplot(v.1.22.0), HDO.db(v.0.99.1), httr(v.1.4.7), RColorBrewer(v.1.1-3), Rgraphviz(v.2.46.0), tools(v.4.3.1), utf8(v.1.2.4), R6(v.2.5.1), lazyeval(v.0.2.2), GetoptLong(v.1.0.5), withr(v.2.5.1), gridExtra(v.2.3), textshaping(v.0.3.7), cli(v.3.6.1), Cairo(v.1.6-1), scatterpie(v.0.2.1), labeling(v.0.4.3), sass(v.0.4.7), systemfonts(v.1.0.5), yulab.utils(v.0.1.0), gson(v.0.1.0), rstudioapi(v.0.15.0), RSQLite(v.2.3.1), generics(v.0.1.3), gridGraphics(v.0.5-1), hwriter(v.1.3.2.1), crosstalk(v.1.2.0), GO.db(v.3.18.0), Matrix(v.1.6-1.1), interp(v.1.1-4), fansi(v.1.0.5), abind(v.1.4-5), lifecycle(v.1.0.3), yaml(v.2.3.7), snakecase(v.0.11.1), qvalue(v.2.34.0), SparseArray(v.1.2.0), grid(v.4.3.1), blob(v.1.2.4), promises(v.1.2.1), crayon(v.1.5.2), lattice(v.0.22-5), cowplot(v.1.1.1), chromote(v.0.1.2), KEGGREST(v.1.42.0), magick(v.2.8.1), pillar(v.1.9.0), fgsea(v.1.28.0), rjson(v.0.2.21), codetools(v.0.2-19), fastmatch(v.1.1-4), glue(v.1.6.2), ggfun(v.0.1.3), data.table(v.1.14.8), remotes(v.2.4.2.1), vctrs(v.0.6.4), png(v.0.1-8), treeio(v.1.26.0), gtable(v.0.3.4), reactR(v.0.5.0), cachem(v.1.0.8), xfun(v.0.40), S4Arrays(v.1.2.0), mime(v.0.12), RVenn(v.1.1.0), interactiveDisplayBase(v.1.40.0), ellipsis(v.0.3.2), nlme(v.3.1-163), ggtree(v.3.10.0), bit64(v.4.0.5), filelock(v.1.0.2), rprojroot(v.2.0.3), bslib(v.0.5.1), colorspace(v.2.1-0), DBI(v.1.1.3), tidyselect(v.1.2.0), processx(v.3.8.2), bit(v.4.0.5), compiler(v.4.3.1), curl(v.5.1.0), graph(v.1.80.0), DelayedArray(v.0.28.0), bookdown(v.0.36), shadowtext(v.0.1.2), rappdirs(v.0.3.3), digest(v.0.6.33), rmarkdown(v.2.25), htmltools(v.0.5.6.1), pkgconfig(v.2.0.3), jpeg(v.0.1-10), fastmap(v.1.1.1), rlang(v.1.1.1), GlobalOptions(v.0.1.2), htmlwidgets(v.1.6.2), shiny(v.1.7.5.1), farver(v.2.1.1), jquerylib(v.0.1.4), jsonlite(v.1.8.7), GOSemSim(v.2.28.0), RCurl(v.1.98-1.12), magrittr(v.2.0.3), GenomeInfoDbData(v.1.2.11), ggplotify(v.0.1.2), munsell(v.0.5.0), Rcpp(v.1.0.11), ggnewscale(v.0.4.9), ape(v.5.7-1), stringi(v.1.7.12), zlibbioc(v.1.48.0), MASS(v.7.3-60), AnnotationHub(v.3.10.0), plyr(v.1.8.9), org.Hs.eg.db(v.3.18.0), parallel(v.4.3.1), HPO.db(v.0.99.2), deldir(v.1.0-9), graphlayouts(v.1.0.1), splines(v.4.3.1), hms(v.1.1.3), locfit(v.1.5-9.8), ps(v.1.7.5), reshape2(v.1.4.4), BiocVersion(v.3.18.0), evaluate(v.0.22), latticeExtra(v.0.6-30), tzdb(v.0.4.0), tweenr(v.2.0.2), httpuv(v.1.6.12), polyclip(v.1.10-6), ggforce(v.0.4.1), xtable(v.1.8-4), MPO.db(v.0.99.7), later(v.1.3.1), ragg(v.1.2.6), websocket(v.1.4.1), aplot(v.0.2.2), memoise(v.2.0.1) and timechange(v.0.2.0)