Model diversification dynamics

This vignette presents the different options available to model diversification dynamics with BAMM within deepSTRAPP.

BAMM (Bayesian Analysis of Macroevolutionary Mixtures) is a fully independent C++ software developed by Daniel L. Rabosky. You need to install BAMM first on your machine to be able to run this section of deepSTRAPP.

Source: Rabosky, D. L. (2014). Automatic detection of key innovations, rate shifts, and diversity-dependence on phylogenetic trees. PloS one, 9(2), e89543.

DOI: 10.1371/journal.pone.0089543
BAMM website: http://bamm-project.org/

deepSTRAPP builds upon BAMM and functions from R package [BAMMtools] to offer a simplified framework to model and visualize diversification dynamics on a time-calibrated phylogeny.

## Goal: Map evolution of diversification rates and regime shifts on the time-calibrated phylogeny

# Run a BAMM (Bayesian Analysis of Macroevolutionary Mixtures)

# Step 1/ Set BAMM - Record BAMM settings and generate all input files needed for BAMM.
# Step 2/ Run BAMM - Run BAMM and move output files in dedicated directory.
# Step 3/ Evaluate BAMM - Produce evaluation plots and ESS data.
# Step 4/ Import BAMM outputs - Load `BAMM_object` in R and subset posterior samples.
# Step 5/ Clean BAMM files - Remove files generated during the BAMM run.

# All these actions are performed by a single function: deepSTRAPP::prepare_diversification_data()
?deepSTRAPP::prepare_diversification_data()

# This function perform a single BAMM run to infer diversification rates and regime shifts.
# Due to the stochastic nature of the exploration of the parameter space with MCMC process,
# best practice recommend to ran multiple runs and check for convergence of the MCMC traces,
# ensuring that the region of high probability has been reached by your MCMC runs.


## Parametrize BAMM

# BAMM accepts numerous arguments that control the modeling process.
# The main arguments are listed in the function deepSTRAPP::prepare_diversification_data().
# Additional arguments can be provided as a named list under 'additional_BAMM_settings'.
# To see all available settings, load and print the BAMM template file provided within deepSTRAPP.
data(BAMM_template_diversification)
print(BAMM_template_diversification)


## Load time-calibrated phylogeny
library(phytools)
data(whale.tree)

# Source: Steeman, M. E., M. B. Hebsgaard, R. E. Fordyce, S. Y. W. Ho, D. L. Rabosky,
# R. Nielsen, C. Rahbek, H. Glenner, M. V. Sorensen, and E. Willerslev (2009) 
# Radiation of extant cetaceans driven by restructuring of the oceans. Systematic Biology, 58, 573-585.


## Run BAMM workflow with deepSTRAPP
whale_BAMM_object <- prepare_diversification_data(
   BAMM_install_directory_path = "./software/bamm-2.5.0/", # To provide path to BAMM directory
   phylo = whale.tree,
   prefix_for_files = "whale",
   seed = 1234, # Set seed for reproducibility
   numberOfGenerations = 10^5, # Set low for the example (Default = 10^7)
   # Set the overall proportion of terminals in the phylogeny compared to
   # the estimated overall richness in the clade
   globalSamplingFraction = 1.0, 
   # The path to the `.txt` file used to provide clade-specific sampling fractions. 
   # Here, we use a global sampling fraction.
   sampleProbsFilename = NULL, 
   # Set the expected number of regime shifts. It acts as an hyperparameter controlling
   # the exponential prior distribution used to modulate reversible jumps across
   # model configurations in the rjMCMC run.
   # When set to 'NULL', an empirical rule is used to define this value: 1 regime shift expected
   # for every 100 tips in the phylogeny, with a minimum of 1.
   expectedNumberOfShifts = NULL, 
   # Set the frequency in which to write the event data to the output file = 
   # the sampling frequency of posterior samples.     
   # When set to `NULL`, the frequency is set such as 2000 posterior samples
   # are recorded (before removing the burn-in).
   eventDataWriteFreq = NULL, 
   # Proportion of posterior samples removed from the BAMM output to ensure that the remaining samples
   # where drawn once the equilibrium distribution was reached.                    
   burn_in = 0.25,
   # Number of posterior samples to extract, after removing the burn-in, 
   # in the final `BAMM_object` to use for downstream analyses.
   nb_posterior_samples = 1000, 
   # List of additional arguments as found in `BAMM_template_diversification`.
   additional_BAMM_settings = list(), 
   # Output directory used to store input/output files generated
   BAMM_output_directory_path = "./BAMM_outputs/", 
   keep_BAMM_outputs = TRUE, # To keep the BAMM_output directory after the run
   # Controls the definition of 'core-shifts' used to distinguish across configurations 
   # when identifying the most frequent regime shift configuration (MAP) across samples.
   MAP_odd_ratio_threshold = 5, 
   # To include (or not) the evaluation step and produce MCMC trace, ESS, 
   # and prior/posterior comparisons for expected number of shifts.
   skip_evaluations = FALSE, 
   plot_evaluations = TRUE, # To plot the three outputs of the evaluation step
   # To save in a table (ESS), and PDFs (MCMC trace, and prior/posterior comparisons
   # for expected number of shifts) the evaluation outputs.
   save_evaluations = TRUE) 

## Inspect evaluations

# This function produces two evaluation plots and one table to check the quality of the BAMM run.

# 1/ Plot the MCMC trace = evolution of logLik across MCMC generations. 
   # Output file = 'MCMC_trace_logLik.pdf'.

# 2/ Compute the Effective Sample Size (ESS) across posterior samples (after removing burn-in) 
   # using coda::effectiveSize().
   # This is a way to evaluate if your MCMC runs has enough generations to produce robust estimates.
   # Ideally, ESS should be higher than 200. Output file = 'ESS_df.csv'.

# 3/ Plot the comparison of prior and posterior distributions of the number of regime shifts
   # with BAMMtools::plotPrior. Output file = 'PP_nb_shifts_plot.pdf'. 
   # A good value for expectedNumberOfShifts is one with high similarities between the distributions
   # hinting that the information in the data coincides
   # with your expectations for the number of regime shifts. 
   # The best practice consists in trying several values to control if it affects or not the final output.

# An example of these plots and table is shown below:

#> Effective Sample Size (ESS) across posterior samples
#>     ESS_logLik   ESS_logPrior   ESS_N_shifts  ESS_eventRate ESS_acceptRate 
#>       277.0866       301.4833       138.8292      1158.0859       235.7291

### Explore diversity dynamics

# Load directly the result
data(whale_BAMM_object, package = "deepSTRAPP")

# Explore output
str(whale_BAMM_object, 1)

# Record the regime shift events and macroevolutionary regimes parameters across posterior samples
str(whale_BAMM_object$eventData, 1)
# Mean speciation rates at tips aggregated across all posterior samples
head(whale_BAMM_object$meanTipLambda) 
# Mean extinction rates at tips aggregated across all posterior samples
head(whale_BAMM_object$meanTipMu) 


### Explore posterior probabilities of regime shifts

# Each posterior sample has its own diversification history with 
# time-varying diversification rates and regime shift locations.
# To visualize the expected location of regime shifts, 
# we compute the Marginal posterior Shift Probability (MSP) of each branch,
# as the probability to observe a regime shift on each individual branch across all posterior samples.
# For more details, see `[BAMMtools::marginalShiftProbsTree()]`.
# This branch-specific information is stored as phylogenetic tree with branch scaled to their MSP score.
whale_BAMM_object$MSP_tree # Tree with branch scaled to MSP scores.

# Plot the MSP tree
plot(whale_BAMM_object$MSP_tree, cex = 0.25)
title(main = "Marginal posterior Shift Probabilities per branches")
# Please note that a series of long branch does not indicate that a regime shift
# is likely to occur on each branch, but rather that the location of the regime shift
# is uncertain and shared across closely related branches, which is illustrated by the next plots.

# This information can be used to adjust the size of the symbols showing the location 
# of regime shifts on the tree using the `adjust_size_to_prob = TRUE` argument
# in `deepSTRAPP::plot_BAMM_rates()`.


### Define regime shift location

# deepSTRAPP allows to plot the location of regime shifts over mean diversification rates mapped on a phylogeny
?deepSTRAPP::plot_BAMM_rates()

# Three options are available based on the regime shifts recorded across all BAMM posterior samples
 
 # `index` = Posterior sample index.
   # Used to select a unique posterior BAMM sample and map regime shifts as observed in this sample.
 
 # `MAP` = Maximum A Posteriori probability.
   # Shows location of regime shifts according to the samples exhibiting 
   # the Maximum A Posteriori probability (MAP) configuration =  the most frequent configuration
   # for the location of regime shifts across all BAMM posterior samples.
   # This only accounts for which branches the regimes occur on, not the location of shifts
   # along the branches.
   # For more details, see `[BAMMtools::credibleShiftSet()]`.
   # Only regime shifts with an odd-ratio of marginal posterior shift probability / prior shift probability
   # higher than the threshold provided in the `MAP_odd_ratio_threshold` argument (default = 5) 
   # are considered as core-shifts and used to identify the MAP configuration.
   # The location of the regimes along each branch is then averaged across all the identified MAP samples.

# Provide the indices of all MAP samples
whale_BAMM_object$MAP_indices 
# BAMM object summarizing diversification dynamics for MAP samples only. Used to plot the MAP mean rates.
whale_BAMM_object$MAP_BAMM_object 

 # `MSC` = Maximum Shift Credibility.
   # Shows location of regime shifts according to the samples exhibiting 
   # the Maximum Shift Credibility (MCS) configuration = the configuration with the highest Posterior 
   # probability to occur as computed from the product of the marginal branch-specific shift probabilities.
   # This only accounts for which branches the regimes occur on, not the location of shifts
   # along the branches.
   # For more details, see `[BAMMtools::maximumShiftCredibility()]`.
   # This option can be useful if multiple configurations are present in relatively close frequencies,
   # making it ambiguous to identify the MAP configuration.
   # The location of the regimes along each branch is then averaged across all the identified MSC samples.

# Provide the indices of all MSC samples
whale_BAMM_object$MSC_indices 
# BAMM object summarizing diversification dynamics for MSC samples only. Used to plot the MSC mean rates.
whale_BAMM_object$MSC_BAMM_object 


### Plot rates and regimes

## Plot two samples

# Plot configuration in BAMM posterior sample n°100
plot_BAMM_rates(whale_BAMM_object, 
                configuration_type = "index",
                sample_index = 100, 
                cex = 0.2, # Adjust tip label size
                labels = TRUE, legend = TRUE)
title(main = "Shift location: Sample n\u00B0100")

# Plot configuration in BAMM posterior sample n°700
plot_BAMM_rates(whale_BAMM_object, 
                configuration_type = "index",
                sample_index = 700,  
                cex = 0.2, # Adjust tip label size
                labels = TRUE, legend = TRUE)
title(main = "Shift location: Sample n\u00B0700")

## Plot the MAP configuration
plot_BAMM_rates(whale_BAMM_object, 
                configuration_type = "MAP",
                cex = 0.2, # Adjust tip label size
                labels = TRUE, legend = TRUE)
title(main = "Mean rates: Overall - Mean shift locations: MAP")

## Plot the MSC configuration
plot_BAMM_rates(whale_BAMM_object, 
                configuration_type = "MSC",
                regimes_pch = 23, # Use a diamond symbol
                regimes_fill = "purple", # Change fill color of symbols
                cex = 0.25, # Adjust tip label size
                labels = TRUE, legend = TRUE)
title(main = "Mean rates: Overall - Mean shift locations: MSC")


# Similarly to the two posterior samples used as examples, 
# the MAP and MSC configurations do not agree on the location of regime shifts in this case.
# However, they highlight the existence of a unique regime shift located on either of the two branches.

## Note on diversification models for time-calibrated phylogenies

# This function relies on BAMM to provide a reliable solution to map diversification rates 
# and regime shifts on a time-calibrated phylogeny and obtain the `BAMM_object` object needed 
# to run the deepSTRAPP workflow ([run_deepSTRAPP_for_focal_time], [run_deepSTRAPP_over_time]).
# However, it is one option among others for modeling diversification on phylogenies.
# You may wish to explore alternatives models such as LSBDS model in RevBayes (Höhna et al., 2016),
# the MTBD model (Barido-Sottani et al., 2020), or the ClaDS2 model (Maliet et al., 2019) for your own data.
# However, you will need Bayesian models that infer regime shifts
# to be able to perform STRAPP tests (Rabosky & Huang, 2016).
# Additionally, you need to format the model output such as in `BAMM_object`,
# so it can be used in a deepSTRAPP workflow.

Maël Doré

2025-10-22