Correlation analysis heat map: ggplot2 package draws environmental factors and species correlation heat map/bubble map

Hello everyone, today I will introduce how to use R language to analyze the correlation between environmental factors and species, and use the ggplot2 package to draw heat maps and bubble maps to visualize the results of correlation analysis.

The correlation heatmap style was inspired by an article in Microbiome magazine:

Step1: Correlation analysis between environmental factors and species data

rm(list=ls())
pacman::p_load(tidyverse, reshape2, psych)

# load environment/species data
env <- read.delim('env_table.txt', sep = '\t', row.names = 1)
taxonomy <- read.delim('spe_table.txt', sep = '\t', row.names = 1)
env <- env[rownames(taxonomy), ]

# Perform correlation analysis
r_result <- corr.test(taxonomy, env, method = 'spearman')

r <-
  r_result$r %>%
  melt() %>%
  set_names(c('tax', 'Index', 'r'))

p <-
  r_result$p %>%
  melt() %>%
  set_names(c('tax', 'Index', 'P_value')) %>%
  mutate(P_value_sig = case_when(P_value > 0.05 ~ " ",
                             P_value <= 0.05 & P_value > 0.01 ~ "*",
                             P_value <= 0.01 & P_value > 0.001 ~ "**",
                             P_value <= 0.001 ~ "***",
                             TRUE ~ NA_character_))

data <- cbind(r,p) %>% select(-(4:5))
head(data)

In the above code: we loaded the required R package and read the environmental factors and species data. We then performed a correlation analysis using the corr.test function of the psych package and saved the results in the r_result variable. Next, we sorted out the data of correlation and p-values, and constructed “***” and marked them according to the different ranges of P-values; after that, we used the melt function to reshape the data for subsequent visualization.

Step2: A simple heatmap display of the overall correlation of all samples

# Drawing theme
theme <- theme_bw(base_size = 7) +
  theme(panel. grid. major = element_blank(),
        text = element_text(face='bold'),
        legend.key.size = unit(5, "pt"),
        panel.grid.minor = element_blank(),
        axis.ticks.y = element_blank(),
        axis.text.x=element_text(angle=45, vjust=1, hjust=1))

ggplot(data, aes(x=tax, y=Index)) +
  geom_tile(aes(fill=r), color = 'white', alpha = 0.6) +
  geom_text(aes(label = P_value_sig), color = 'black', size = 2) +
  ggsci::scale_fill_gsea() +
  theme(legend. position="top") +
  theme+
  xlab('') +
  ylab('')
ggsave('heatmap_corr.png', height = 2, width = 2)

In this code, we create a basic plot object using the ggplot function and draw a heatmap using the geom_tile function. With the geom_text function, we also add the corresponding p-value significance markers. This heatmap will show correlations between environmental factors and species, with color fills to indicate the significance of the p-values.

Next, we define a function cal to calculate the correlation between species and environmental factors within different groups. code show as below:

Step3: Define group correlation calculation function

# Define a function to calculate the correlation between species and environmental factors in different groups
cal <- function(group){
  # Extract grouped sample names
  g <- grep(group, rownames(env), value = T)
  # Perform group correlation analysis
  r_result <- corr.test(taxonomy[g, ], env[g, ], method = 'spearman')
  # extract the r value
  r <-
    r_result$r %>%
    melt() %>%
    set_names(c('tax', 'Index', 'r'))
  # extract the p-value
  p <-
    r_result$p %>%
    melt() %>%
    set_names(c('tax', 'Index', 'P_value')) %>%
    mutate(P_value_sig = case_when(P_value > 0.05 ~ " ",
                                   P_value <= 0.05 & P_value > 0.01 ~ "*",
                                   P_value <= 0.01 & P_value > 0.001 ~ "**",
                                   P_value <= 0.001 ~ "***",
                                   TRUE ~ NA_character_))
  # merge data
  data <- cbind(r,p) %>%
    select(-(4:5)) %>%
    mutate(group)
  data
}

cal('A') %>% head()
cal('B') %>% head()
cal('C') %>% head()

Step4: Correlation heat map (“*” marks significance)

# data merge
data <- rbind(cal('A'), cal('B'), cal('C'))
# Specify cover order
data$group <- factor(data$group, levels = c('A', 'B', 'C'), ordered=TRUE)
# plot
ggplot(data,aes(x=Index, y=tax)) +
  geom_tile(aes(fill=r), color = 'white', alpha = 0.6) +
  geom_text(aes(label = P_value_sig), color = 'black', size = 2) +
  ggsci::scale_fill_gsea() +
  theme(legend. position="top") +
  theme+
  facet_grid( ~ group, scales = 'free') +
  labs(y = 'Taxonomy', x = 'Environment', title = 'Spearman Correlation Plot', size = '|rho|', color = 'rho')
ggsave('heatmap_corr1.png', height = 2, width = 6)

Step5: Correlation heat map (P value and “*” mark significance at the same time)

# data collation
data <- rbind(cal('A'), cal('B'), cal('C')) %>%
  mutate(P_value_sig2 = paste0(round(P_value, 2), ' ', P_value_sig))

theme_heatmap <-
  theme_bw(base_size = 7) +
  theme(panel. grid. major = element_blank(),
        panel.grid.minor = element_blank(),
        legend.key.size = unit(5, "pt"),
        panel. border = element_blank(),
        axis.ticks.x = element_blank(),
        axis.ticks.y = element_blank(),
        text = element_text(face='bold'),
        axis.text.x = element_text(angle=45, vjust=1, hjust=1))

ggplot(data,aes(x=Index, y=tax)) +
  geom_tile(aes(fill=r), alpha = 0.6) +
  geom_text(aes(label = P_value_sig2), color = 'black', size = 1) +
  ggsci::scale_fill_gsea() +
  theme_heatmap +
  coord_flip() +
  facet_grid( ~ group, scale = 'free') +
  labs(y = 'Taxonomy', x = 'Environment', title = 'Spearman Correlation Plot', size = '|rho|', color = 'rho')
ggsave('heatmap_corr2.png', height = 2.5, width = 6)

Finally, we can display the correlation analysis results by drawing a beautiful bubble chart:

Step6: The bubble chart displays the correlation analysis results

ggplot(data,aes(x=Index, y=tax)) +
  geom_point(aes(color = r, size = abs(r))) +
  geom_text(aes(label = P_value_sig2), color = 'black', size = 1) +
  scale_size_area(max_size = 5) +
  ggsci::scale_color_material('teal') +
  theme_heatmap +
  theme(legend. key. size = unit(6, "pt"),
        strip. background = element_rect(fill = '#99CCCC')) +
  facet_wrap(~group,strip. position = 'bottom') +
  labs(y = 'Taxonomy', x = 'Environment', title = 'Spearman Correlation Plot', size = '|rho|', color = 'rho')
ggsave('heatmap_corr3.png', height = 2, width = 6)

Keyword “Correlation Analysis Chart” to obtain code and data