
Compute STRAPP to test for a relationship between diversification rates and trait data
Source: R/compute_STRAPP_test_for_focal_time.R
Carries out the appropriate statistical method to test for a relationship between
diversification rates and trait data for a given point in the past (i.e. the focal_time).
Tests are based on block permutations: rate data are randomized across tips in blocks
defined by the diversification regimes identified for each tip (typically from a BAMM analysis).
Such tests are called STructured RAte Permutations on Phylogenies (STRAPP) as described in Rabosky, D. L., & Huang, H. (2016). A robust semi-parametric test for detecting trait-dependent diversification. Systematic biology, 65(2), 181-193. https://doi.org/10.1093/sysbio/syv066.
The function is an extension of the original BAMMtools::traitDependentBAMM() function used to
carry out STRAPP tests on extant time-calibrated phylogenies.
Tests can be carried out on speciation, extinction and net diversification rates.
deepSTRAPP::compute_STRAPP_test_for_focal_time() can handle three types of statistical tests depending on the type of trait data provided:
Continuous trait data
Tests for correlations between trait and rates carried out with deepSTRAPP::compute_STRAPP_test_for_continuous_data().
The associated test is the Spearman's rank correlation test (See stats::cor.test).
Binary trait data
For categorical and biogeographic trait data that have only two states (e.g., 'Nearctic' vs. 'Neotropics').
Tests for differences in rates between states are carried out with deepSTRAPP::compute_STRAPP_test_for_binary_data().
The associated test is the Mann-Whitney-Wilcoxon rank-sum test (See stats::wilcox.test).
Multinomial trait data
For categorical and biogeographic trait data with more than two states (e.g., 'No leg' vs. 'Two legs' vs. 'Four legs').
Tests for differences in rates between states are carried out with deepSTRAPP::compute_STRAPP_test_for_multinominal_data().
The associated test across all states is the Kruskal-Wallis H test (See stats::kruskal.test).
If posthoc_pairwise_tests = TRUE, post hoc pairwise tests between pairs of states are carried out as well.
The associated post hoc test is Dunn's pairwise rank-sum test (See dunn.test::dunn.test).
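The three underlying rank-based statistics can be illustrated with base R alone. The sketch below uses simulated toy data (the object names `rates`, `cont_trait`, `bin_trait` and `multi_trait` are hypothetical and are not deepSTRAPP outputs). In an actual STRAPP test, such statistics are computed within each posterior sample and compared against block-permuted data, so the p-values of these standalone calls are not STRAPP p-values.

```r
# Toy data only: 30 lineages with simulated rates and traits (not deepSTRAPP output)
set.seed(42)
rates <- rexp(30, rate = 5)   # stand-in for rates at the focal time
cont_trait <- rnorm(30)       # continuous trait
bin_trait <- factor(rep(c("Nearctic", "Neotropics"), each = 15))
multi_trait <- factor(rep(c("No leg", "Two legs", "Four legs"), each = 10))

# Continuous data: Spearman's rank correlation
spearman_res <- stats::cor.test(x = cont_trait, y = rates, method = "spearman")

# Binary data: Mann-Whitney-Wilcoxon rank-sum test
wilcox_res <- stats::wilcox.test(rates ~ bin_trait)

# Data with more than two states: Kruskal-Wallis H test
kruskal_res <- stats::kruskal.test(rates ~ multi_trait)

# Test statistics that a permutation scheme would compare between
# observed and permuted data
c(rho = unname(spearman_res$estimate),
  W = unname(wilcox_res$statistic),
  H = unname(kruskal_res$statistic))
```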
Usage
compute_STRAPP_test_for_focal_time(
  BAMM_object,
  trait_data_list,
  rate_type = "net_diversification",
  seed = NULL,
  nb_permutations = NULL,
  replace_samples = FALSE,
  alpha = 0.05,
  two_tailed = TRUE,
  one_tailed_hypothesis = NULL,
  posthoc_pairwise_tests = FALSE,
  p.adjust_method = "none",
  return_perm_data = FALSE,
  nthreads = 1,
  print_hypothesis = TRUE
)
Arguments
- BAMM_object
Object of class "bammdata", typically generated with update_rates_and_regimes_for_focal_time(), that contains a phylogenetic tree and associated diversification rates across selected posterior samples updated to a specific time in the past (i.e. the focal_time).
- trait_data_list
List obtained from extract_most_likely_trait_values_for_focal_time() that contains at least a $trait_data element, a $focal_time element, and a $trait_data_type element. $trait_data is a named vector with the trait data found on the phylogeny at focal_time. $focal_time gives the time in the past at which the trait and rates data will be tested. $trait_data_type gives the type of trait data: continuous, categorical, or biogeographic.
- rate_type
A character string specifying the type of diversification rates to use. Must be one of 'speciation', 'extinction' or 'net_diversification' (default).
- seed
Integer. Set the seed to ensure reproducibility. Default is NULL (a random seed is used).
- nb_permutations
Integer. To select the number of random permutations to perform during the tests. If NULL (default), all posterior samples will be used once.
- replace_samples
Logical. To specify whether to allow 'replacement' (i.e., multiple use) of a posterior sample when drawing samples used to carry out the test. Default is FALSE.
- alpha
Numerical. Significance level to use to compute the estimate corresponding to the values of the test statistic used to assess significance of the test. This does NOT affect p-values. Default is 0.05.
- two_tailed
Logical. To define the type of tests. If TRUE (default), tests for correlations/differences in rates will be carried out with a null hypothesis that rates are not correlated with trait values (continuous data) or are equal between trait states (categorical and biogeographic data). If FALSE, one-tailed tests are carried out.
For continuous data, this involves defining a one_tailed_hypothesis testing for either a "positive" or "negative" correlation under the alternative hypothesis.
For binary data (two states), this involves defining a one_tailed_hypothesis indicating which state has higher rates under the alternative hypothesis.
For multinomial data (more than two states), it defines the type of post hoc pairwise tests to carry out between pairs of states. If posthoc_pairwise_tests = TRUE, all two-tailed (if two_tailed = TRUE) or one-tailed (if two_tailed = FALSE) tests are automatically carried out.
- one_tailed_hypothesis
A character string specifying the alternative hypothesis in the one-tailed test. For continuous data, it is either "negative" or "positive" correlation. For binary data, it lists the two trait states separated by a greater-than sign, with the state expected to have the higher rates under the alternative hypothesis listed first, such as c('A > B').
- posthoc_pairwise_tests
Logical. Only for multinomial data (with more than two states). If TRUE, all possible post hoc pairwise (Dunn) tests will be computed across all pairs of states. This is a way to detect which pairs of states have significant differences in rates if the overall test (Kruskal-Wallis) is significant. Default is FALSE.
- p.adjust_method
A character string. Only for multinomial data (with more than two states). It specifies the type of correction to apply to the p-values in the post hoc pairwise tests to account for multiple comparisons. See stats::p.adjust() for the available methods. Default is "none".
- return_perm_data
Logical. Whether to return the stats data computed from the posterior samples for observed and permuted data in the output. This is needed to plot the histogram of the null distribution used to assess significance of the test with plot_histogram_STRAPP_test_for_focal_time(). Default is FALSE.
- nthreads
Integer. Number of threads to use for parallel computing of the tests across the permutations. The R package parallel must be loaded for nthreads > 1. Default is 1.
- print_hypothesis
Logical. Whether to print information on which test is carried out, detailing the null and alternative hypotheses, and which significance level is used to reject or not the null hypothesis. Default is TRUE.
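As an illustration of what p.adjust_method controls, the sketch below applies several stats::p.adjust() corrections to three hypothetical raw p-values, as could arise from pairwise comparisons among three states. The values and the comparison labels are invented for the example.

```r
# Hypothetical raw p-values from three post hoc pairwise comparisons
raw_p <- c("A-B" = 0.01, "A-C" = 0.02, "B-C" = 0.03)

# Apply several of the correction methods accepted by stats::p.adjust()
adjusted <- sapply(
  c("none", "holm", "bonferroni", "BH"),
  function(m) stats::p.adjust(raw_p, method = m)
)

# One row per comparison, one column per correction method
adjusted
```

With "none" the p-values are left untouched, while "bonferroni" multiplies each of them by the number of comparisons (capped at 1), which is the most conservative of the four options shown.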
Value
The function returns a list with at least eight elements.
Summary elements for the main test:
- $estimate
Named numerical. Value of the test statistic used to assess significance of the test according to the significance level provided (alpha). The test is significant if $estimate is higher than zero.
- $stats_median
Numerical. Median value of the distribution of test statistics across all selected posterior samples.
- $p-value
Numerical. P-value of the test. The test is considered significant if $p-value is lower than alpha.
- $method
Character string. The statistical method used to carry out the test.
- $rate_type
Character string. The type of diversification rates tested. One of 'speciation', 'extinction' or 'net_diversification'.
- $trait_data_type
Character string. The type of trait data as found in 'trait_data_list$trait_data_type'. One of 'continuous', 'categorical', or 'biogeographic'.
- $trait_data_type_for_stats
Character string. The type of trait data used to select the statistical method. One of 'continuous', 'binary', or 'multinominal'.
- $focal_time
The time in the past at which the trait and rates data were tested.
If using continuous or binary data:
- $two-tailed
Logical. Records the type of test used: two-tailed if TRUE, one-tailed if FALSE.
If one_tailed_hypothesis is provided (only for continuous and binary trait data):
- $one_tailed_hypothesis
Character string. Records the alternative hypothesis used for the one-tailed tests.
If posthoc_pairwise_tests = TRUE (only for multinomial trait data):
- $posthoc_pairwise_tests
List of at least 3 sub-elements:
  - $summary_df
  Data.frame of five variables providing the summary results of the post hoc pairwise tests.
  - $method
  Character string. The statistical method used to carry out the test. Here, "Dunn".
  - $two-tailed
  Logical. Records the type of post hoc pairwise tests used: two-tailed if TRUE, one-tailed if FALSE.
If return_perm_data = TRUE, the stats data computed from the posterior samples for observed and permuted data are provided.
This is needed to plot the histogram of the null distribution used to assess significance of the test with plot_histogram_STRAPP_test_for_focal_time().
- $perm_data_df
A data.frame with four variables summarizing the data generated during the STRAPP test:
  - $posterior_samples_random_ID
  Integer. ID of the posterior samples randomly drawn and used for the STRAPP test.
  - $*_obs
  Numerical. Test stats computed from the observed data in the posterior samples. Name depends on the test used.
  - $*_perm
  Numerical. Test stats computed from the permuted data in the posterior samples. Name depends on the test used.
  - $delta_* OR $abs_delta_*
  Numerical. Test stats computed for the STRAPP test comparing observed stats and permuted stats. Name depends on the test used and the type of test (two-tailed tests compare absolute values; one-tailed tests compare raw values).
Combined with posthoc_pairwise_tests = TRUE, the stats data are also provided for the post hoc pairwise tests:
- $posthoc_pairwise_tests$perm_data_array
A 3D array containing stats data for all post hoc pairwise tests in a format similar to $perm_data_df.
If no STRAPP test was performed in the case of categorical/biogeographic data with a single state/range at focal_time,
only the $trait_data_type, $trait_data_type_for_stats = "none", and $focal_time are returned.
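The relationship between $estimate, alpha and $p-value can be sketched with a toy vector of differences between observed and permuted statistics. This is a guess at the logic and not the package's code: it assumes that $estimate is the alpha-quantile of those differences, which would be consistent with the statement that the test is significant when $estimate is higher than zero. The vector `delta` and the helper `estimate_for()` are hypothetical.

```r
# Hypothetical differences (observed minus permuted statistic),
# one value per posterior sample
delta <- c(-0.10, 0.20, 0.30, 0.40, 0.50, 0.10, 0.25, 0.35, 0.45, 0.15)

# P-value: proportion of samples in which the observed statistic
# does not exceed the permuted one (does not depend on alpha)
p_value <- mean(delta <= 0)

# ASSUMPTION: the estimate is taken as the alpha-quantile of the differences
estimate_for <- function(alpha) unname(stats::quantile(delta, probs = alpha))

p_value             # 0.1
estimate_for(0.05)  # below zero: not significant at alpha = 0.05
estimate_for(0.20)  # above zero: significant at alpha = 0.20
```

Under this assumption, changing alpha moves the estimate but leaves the p-value untouched, which matches the description of the alpha argument.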
Details
This set of functions carries out the STructured RAte Permutations on Phylogenies (STRAPP) test as defined in Rabosky, D. L., & Huang, H. (2016). A robust semi-parametric test for detecting trait-dependent diversification. Systematic biology, 65(2), 181-193.
It is an extension of the original BAMMtools::traitDependentBAMM() function used to
carry out STRAPP test on extant time-calibrated phylogenies, but allowing here to test for
differences/correlations at any point in the past (i.e. the focal_time).
It takes an object of class "bammdata" (BAMM_object) that was updated so that
its diversification rates ($tipLambda and $tipMu) and regimes ($tipStates) reflect
the values observed at a specific time in the past (i.e. the $focal_time).
Similarly, it takes a list (trait_data_list) that provides $trait_data as observed on branches
at the same focal_time as the diversification rates and regimes.
A STRAPP test is carried out by drawing a random set of posterior samples from the BAMM_object, then randomly permuting rates
across blocks of tips defined by the macroevolutionary regimes. Test statistics are then computed across the initial observed data
and the permuted data for each sample.
In a two-tailed test, the p-value is the proportion of posterior samples in which the test statistic is as extreme or more extreme in the permuted data than in the observed data.
In a one-tailed test, the p-value is the proportion of posterior samples in which the test statistic is higher in the permuted data than in the observed data.
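The procedure described above can be sketched in base R for continuous data. This is a simplified toy version and not the deepSTRAPP implementation: tips are assigned to four fixed regimes, rates are simulated instead of being read from a "bammdata" object, and all object names are hypothetical. Counting ties towards the p-value (`>=`) is also an assumption.

```r
set.seed(1)
n_samples <- 200
trait <- rnorm(40)               # toy continuous trait at the focal time
regime <- rep(1:4, each = 10)    # toy regime membership of each tip

# Draw posterior samples without replacement (as with replace_samples = FALSE)
sample_ids <- sample(seq_len(1000), size = n_samples, replace = FALSE)

stats_df <- do.call(rbind, lapply(sample_ids, function(id) {
  regime_rates <- rexp(4, rate = 5)            # toy rate of each regime in this sample
  obs_rates <- regime_rates[regime]            # observed rates mapped to tips
  perm_rates <- sample(regime_rates)[regime]   # block permutation: shuffle rates among regimes
  data.frame(
    sample_id = id,
    rho_obs = cor(trait, obs_rates, method = "spearman"),
    rho_perm = cor(trait, perm_rates, method = "spearman")
  )
}))

# Two-tailed: compare absolute values of permuted and observed statistics
p_two_tailed <- mean(abs(stats_df$rho_perm) >= abs(stats_df$rho_obs))

# One-tailed ("positive" correlation): compare raw values
p_one_tailed <- mean(stats_df$rho_perm >= stats_df$rho_obs)

c(two_tailed = p_two_tailed, one_tailed = p_one_tailed)
```

Because tips within a regime always receive the same rate, the permutation preserves the block structure of the data, which is what distinguishes STRAPP from a naive shuffle of rates across individual tips.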
Major changes compared to BAMMtools::traitDependentBAMM():
- Allow choosing whether random sampling of posterior configurations is done with or without replacement, using replace_samples.
- Add post hoc pairwise tests (Dunn test) for multinomial data. Use posthoc_pairwise_tests = TRUE.
- Provide outputs tailored for histogram plots (plot_histogram_STRAPP_test_for_focal_time()) and p-value time-series plots (plot_STRAPP_pvalues_over_time()).
- Add printed messages detailing which test is carried out, what the null and alternative hypotheses are, and which significance level is used to reject or not the null hypothesis (enabled with print_hypothesis = TRUE).
- Split the function into multiple sub-functions according to the type of data ($trait_data_type).
- Prevent the use of Pearson's correlation tests and of log-transformation for continuous data. The rationale is that there is no reason to assume that tip rates are distributed normally or log-normally. Thus, a Spearman's rank correlation test is favored.
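One practical consequence of favoring Spearman's rank correlation is that log-transforming the rates becomes irrelevant: a monotonic transformation leaves ranks unchanged, whereas Pearson's coefficient generally changes. A small illustration on simulated toy data (all object names are hypothetical):

```r
set.seed(7)
trait <- rnorm(50)
rates <- rexp(50, rate = 5)   # positive, right-skewed toy rates

# Spearman's rho on raw and log-transformed rates
rho_raw <- cor(trait, rates, method = "spearman")
rho_log <- cor(trait, log(rates), method = "spearman")

# Pearson's r on raw and log-transformed rates
r_raw <- cor(trait, rates, method = "pearson")
r_log <- cor(trait, log(rates), method = "pearson")

all.equal(rho_raw, rho_log)   # TRUE: ranks are unchanged by log()
c(pearson_raw = r_raw, pearson_log = r_log)
```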
References
For STRAPP: Rabosky, D. L., & Huang, H. (2016). A robust semi-parametric test for detecting trait-dependent diversification. Systematic Biology, 65(2), 181-193. https://doi.org/10.1093/sysbio/syv066
For STRAPP in deep times: Doré, M., Borowiec, M. L., Branstetter, M. G., Camacho, G. P., Fisher, B. L., Longino, J. T., Ward, P. S., & Blaimer, B. B. (2025). Evolutionary history of ponerine ants highlights how the timing of dispersal events shapes modern biodiversity. Nature Communications. https://doi.org/10.1038/s41467-025-63709-3
See also
Associated functions in deepSTRAPP: extract_most_likely_trait_values_for_focal_time(), update_rates_and_regimes_for_focal_time()
Original function in BAMMtools: BAMMtools::traitDependentBAMM()
Statistical tests: stats::cor.test(), stats::wilcox.test(), stats::kruskal.test(), dunn.test::dunn.test()
For a guided tutorial, see this vignette: vignette("explore_STRAPP_test_types", package = "deepSTRAPP")