
Run deepSTRAPP to test for a relationship between diversification rates and trait data over multiple time steps
Source: R/run_deepSTRAPP_over_time.R
Wrapper function to run deepSTRAPP workflows over multiple time steps in the past.
It starts from traits mapped on a phylogeny (trait data) and BAMM output (diversification data)
and carries out the appropriate statistical method to test for a relationship between diversification rates and trait data.
The workflow is repeated over multiple points in time (i.e. the time_steps) and results are summarized in a data.frame.
The function can also provide summaries of trait values and diversification rates
extracted along branches over the different time_steps.
Statistical tests are based on block-permutations: rates data are randomized across tips following blocks defined by the diversification regimes identified on each tip (typically from a BAMM analysis). Such tests are called STructured RAte Permutations on Phylogenies (STRAPP) as described in Rabosky, D. L., & Huang, H. (2016). A robust semi-parametric test for detecting trait-dependent diversification. Systematic Biology, 65(2), 181-193. https://doi.org/10.1093/sysbio/syv066.
See the original BAMMtools::traitDependentBAMM() function used to
carry out STRAPP tests on extant time-calibrated phylogenies.
Tests can be carried out on speciation, extinction and net diversification rates.
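The block-permutation idea can be illustrated with a minimal base R sketch. It is not the package's implementation: the regime IDs, rates, and trait values are made up, the helper permute_rates_by_block() is hypothetical, Spearman's correlation is used only as an illustrative statistic for a continuous trait, and a single set of rates stands in for what would normally be one BAMM posterior sample among many.

```r
# Toy data (all values are made up for illustration)
set.seed(42)
regime_ID   <- c(1, 1, 1, 2, 2, 3, 3, 3)               # regime of each tip
regime_rate <- c("1" = 0.10, "2" = 0.25, "3" = 0.40)   # one rate per regime
trait_value <- c(1.2, 0.8, 1.0, 2.1, 1.9, 3.2, 2.8, 3.0)

# Observed tip rates: every tip inherits the rate of its regime
tip_rates <- unname(regime_rate[as.character(regime_ID)])
obs_stat  <- cor(tip_rates, trait_value, method = "spearman")

# Block permutation (hypothetical helper): shuffle rates among regimes,
# not among tips, so tips sharing a regime always keep a common rate
permute_rates_by_block <- function(regime_ID, regime_rate) {
  shuffled <- setNames(sample(unname(regime_rate)), names(regime_rate))
  unname(shuffled[as.character(regime_ID)])
}

# Null distribution of the statistic under block permutations
null_stats <- replicate(999, {
  perm_rates <- permute_rates_by_block(regime_ID, regime_rate)
  cor(perm_rates, trait_value, method = "spearman")
})

# Two-tailed p-value: share of permutations at least as extreme as observed
p_value <- mean(abs(null_stats) >= abs(obs_stat))
```

Shuffling rates among regimes rather than among tips preserves the block structure of the data: tips that share a regime are not treated as independent observations.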
Usage
run_deepSTRAPP_over_time(
  contMap = NULL,
  densityMaps = NULL,
  ace = NULL,
  tip_data = NULL,
  trait_data_type,
  BAMM_object,
  time_steps = NULL,
  time_range = NULL,
  nb_time_steps = NULL,
  time_step_duration = NULL,
  keep_tip_labels = TRUE,
  rate_type = "net_diversification",
  seed = NULL,
  nb_permutations = NULL,
  replace_samples = FALSE,
  alpha = 0.05,
  two_tailed = TRUE,
  one_tailed_hypothesis = NULL,
  posthoc_pairwise_tests = FALSE,
  p.adjust_method = "none",
  return_perm_data = FALSE,
  nthreads = 1,
  print_hypothesis = TRUE,
  extract_trait_data_melted_df = FALSE,
  extract_diversification_data_melted_df = FALSE,
  return_STRAPP_results = FALSE,
  return_updated_trait_data_with_Map = FALSE,
  return_updated_BAMM_object = FALSE,
  verbose = TRUE,
  verbose_extended = FALSE
)
Arguments
- contMap
For continuous trait data. Object of class "contMap", typically generated with prepare_trait_data() or phytools::contMap(), that contains a phylogenetic tree and associated continuous trait mapping. The phylogenetic tree must be rooted and fully resolved/dichotomous, but it does not need to be ultrametric (it can include fossils).
- densityMaps
For categorical trait or biogeographic data. List of objects of class "densityMap", typically generated with prepare_trait_data(), each containing a phylogenetic tree and the associated posterior probability of being in a given state/range along branches. Each object (i.e., densityMap) corresponds to a state/range. The phylogenetic tree must be rooted and fully resolved/dichotomous, but it does not need to be ultrametric (it can include fossils).
- ace
(Optional) Ancestral Character Estimates (ACE) at the internal nodes. Obtained with prepare_trait_data() as output in the $ace slot.
For continuous trait data: Named numerical vector typically generated with phytools::fastAnc(), phytools::anc.ML(), or ape::ace(). Names are nodes_ID of the internal nodes. Values are ACE of the trait.
For categorical trait or biogeographic data: Matrix that records the posterior probabilities of ancestral states/ranges. Rows are internal nodes_ID. Columns are states/ranges. Values are posterior probabilities of each state per node.
Needed in all cases to provide accurate estimates of trait values.
- tip_data
(Optional) Named vector of tip values of the trait.
For continuous trait data: Named numerical vector of trait values.
For categorical trait or biogeographic data: Character string vector of states/ranges. Names are nodes_ID of the internal nodes. Needed to provide accurate tip values.
For biogeographic data, ranges should follow the coding scheme of BioGeoBEARS with a unique CAPITAL letter per unique area (e.g., A, B), combined to form multi-area ranges (e.g., AB). Alternatively, you can provide tip_data as a matrix or data.frame of binary presence/absence in each area (coded as a unique CAPITAL letter). In this case, columns are unique areas, rows are taxa, and values are integers (0/1) signaling absence or presence of the taxon in the area.
- trait_data_type
Character string. Specify the type of trait data. Must be one of "continuous", "categorical", "biogeographic".
- BAMM_object
Object of class "bammdata", typically generated with prepare_diversification_data(), that contains a phylogenetic tree and associated diversification rate mapping across selected posterior samples. The phylogenetic tree must be the same as the one associated with the contMap, ace and tip_data.
- time_steps
Numerical vector. Time steps at which the STRAPP tests should be carried out. If NULL (the default), time_steps will be generated from a combination of two arguments among time_range, nb_time_steps, and/or time_step_duration.
- time_range
Vector of two numerical values. Time boundaries within which the time_steps must be defined if not provided. If NULL (the default), and time_range is needed to generate the time_steps, the depth of the tree is used by default: c(0, root_age). However, no time step will be generated for the 'root_age'.
- nb_time_steps, time_step_duration
Numerical. Number of time steps and duration of each time step used to generate time_steps if not provided. You must provide at least one of those two arguments to be able to generate time_steps.
- keep_tip_labels
Logical. Specify whether terminal branches with a single descendant tip must retain their initial tip.label on the updated phylogeny. Default is TRUE.
- rate_type
A character string specifying the type of diversification rates to use. Must be one of 'speciation', 'extinction' or 'net_diversification' (default).
- seed
Integer. Set the seed to ensure reproducibility. Default is NULL (a random seed is used).
- nb_permutations
Integer. To select the number of random permutations to perform during the tests. If NULL (default), all posterior samples will be used once.
- replace_samples
Logical. To specify whether to allow 'replacement' (i.e., multiple use) of a posterior sample when drawing samples used to carry out the STRAPP test. Default is FALSE.
- alpha
Numerical. Significance level to use to compute the estimate corresponding to the values of the test statistic used to assess significance of the test. This does NOT affect p-values. Default is 0.05.
- two_tailed
Logical. To define the type of tests. If TRUE (default), tests for correlations/differences in rates will be carried out with a null hypothesis that rates are not correlated with trait values (continuous data) or equal between trait states (categorical and biogeographic data). If FALSE, one-tailed tests are carried out.
For continuous data, it involves defining a one_tailed_hypothesis testing for either a "positive" or "negative" correlation under the alternative hypothesis.
For binary data (two states), it involves defining a one_tailed_hypothesis indicating which states have higher rates under the alternative hypothesis.
For multinominal data (more than two states), it defines the type of post hoc pairwise tests to carry out between pairs of states. If posthoc_pairwise_tests = TRUE, all two-tailed (if two_tailed = TRUE) or one-tailed (if two_tailed = FALSE) tests are automatically carried out.
- one_tailed_hypothesis
A character string specifying the alternative hypothesis in the one-tailed test. For continuous data, it is either "negative" or "positive" correlation. For binary data, it lists the trait states with states ordered in increasing rates under the alternative hypothesis, separated by a greater-than sign, such as c('A > B').
- posthoc_pairwise_tests
Logical. Only for multinominal data (with more than two states). If TRUE, all possible post hoc pairwise (Dunn) tests will be computed across all pairs of states. This is a way to detect which pairs of states have significant differences in rates if the overall test (Kruskal-Wallis) is significant. Default is FALSE.
- p.adjust_method
A character string. Only for multinominal data (with more than two states). It specifies the type of correction to apply to the p-values in the post hoc pairwise tests to account for multiple comparisons. See stats::p.adjust() for the available methods. Default is "none".
- return_perm_data
Logical. Whether to return the stats data computed from the posterior samples for observed and permuted data in the output. This is needed to plot the histograms of the null distribution used to assess significance of the tests with plot_histogram_STRAPP_test_for_focal_time() (for a single focal_time) and plot_histograms_STRAPP_tests_over_time() (for multiple time_steps). Default is FALSE.
- nthreads
Integer. Number of threads to use for parallel computing of the STRAPP tests across the permutations. The R package parallel must be loaded for nthreads > 1. Default is 1.
- print_hypothesis
Logical. Whether to print information on which test is carried out, detailing the null and alternative hypotheses, and which significance level is used to reject or not the null hypothesis. Default is TRUE.
- extract_trait_data_melted_df
Logical. Specify whether trait data must be extracted from the updated_contMap/updated_densityMaps objects at each time step and returned in a melted data.frame. Default is FALSE.
- extract_diversification_data_melted_df
Logical. Specify whether diversification data (regimes ID and tip rates) must be extracted from the updated_BAMM_object at each time step and returned in a melted data.frame. Default is FALSE.
- return_STRAPP_results
Logical. Specify whether the STRAPP_results objects summarizing the results of the STRAPP tests carried out at each time step should be returned among the outputs, in addition to the $pvalues_summary_df that already provides test stat estimates and p-values obtained across all time_steps. Default is FALSE.
- return_updated_trait_data_with_Map
Logical. Specify whether the trait_data extracted for the given focal_time and the updated version of the mapped phylogeny (contMap/densityMaps) provided as input should be returned among the outputs. The updated contMap/densityMaps are obtained by cutting off the branches and mappings that are younger than the focal_time. Default is FALSE.
- return_updated_BAMM_object
Logical. Specify whether the updated_BAMM_object with phylogeny and mapped diversification rates cut off at the focal_time should be returned among the outputs. Default is FALSE.
- verbose
Logical. Should progression per time_steps be displayed? Default is TRUE.
- verbose_extended
Logical. Should progression per time_steps AND within each deepSTRAPP workflow be displayed? In addition to printing progress along time_steps, a message will be printed at each step of the deepSTRAPP workflow, and for every batch of 100 BAMM posterior samples whose rates and regimes are updated. If extract_diversification_data_melted_df = TRUE, a message will also be printed when rates are extracted. Default is FALSE.
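As an illustration of how time_steps could be derived from time_range, nb_time_steps, and time_step_duration, here is a minimal base R sketch. The helper make_time_steps() is hypothetical and not part of deepSTRAPP. The spacing rule is an assumption, and the package may space or truncate steps differently. Only the default range c(0, root_age) and the exclusion of the root age come from the argument descriptions above.

```r
# Hypothetical helper (not a deepSTRAPP function): build time steps from
# a combination of time_range, nb_time_steps, and time_step_duration
make_time_steps <- function(time_range = NULL, nb_time_steps = NULL,
                            time_step_duration = NULL, root_age) {
  # Documented default: use the full depth of the tree
  if (is.null(time_range)) time_range <- c(0, root_age)

  if (!is.null(time_step_duration)) {
    # Regularly spaced steps of fixed duration (assumed rule)
    steps <- seq(from = time_range[1], to = time_range[2],
                 by = time_step_duration)
    # Optionally keep only the first nb_time_steps steps
    if (!is.null(nb_time_steps)) steps <- head(steps, nb_time_steps)
  } else if (!is.null(nb_time_steps)) {
    # Evenly spread nb_time_steps over the range (assumed rule)
    steps <- seq(from = time_range[1], to = time_range[2],
                 length.out = nb_time_steps)
  } else {
    stop("Provide nb_time_steps and/or time_step_duration.")
  }

  # Documented behavior: no time step is generated at the root age
  steps[steps < root_age]
}

steps_by_duration <- make_time_steps(time_step_duration = 10, root_age = 40)
steps_by_number   <- make_time_steps(time_range = c(0, 30), nb_time_steps = 4,
                                     root_age = 40)
steps_by_duration  # 0 10 20 30
steps_by_number    # 0 10 20 30
```

Passing an explicit time_steps vector avoids any ambiguity about how the steps are generated.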
Value
The function returns a list with at least five elements.
- $pvalues_summary_df: Data.frame with three columns providing test stat $estimate and $p_value obtained for each time step (i.e., $focal_time), that can be passed down to plot_STRAPP_pvalues_over_time() to generate a plot showing the evolution of the test results across time.
- $time_steps: Numerical vector. Time steps at which the STRAPP tests were carried out, in the same order as the objects returned in the output lists.
- $trait_data_type: Character string. Specify the type of trait data. Possible values are: "continuous", "categorical", "biogeographic".
- $trait_data_type_for_stats: Character string. The type of trait data used to select the statistical method. One of 'continuous', 'binary', or 'multinominal'.
- $rate_type: Character string. The type of diversification rates used in the tests: 'speciation', 'extinction' or 'net_diversification'.
Optional summary df for multinominal data, if posthoc_pairwise_tests = TRUE:
- $pvalues_summary_df_for_posthoc_pairwise_tests: Data.frame with four or five columns providing test stat $estimate, $p_value, and $p_value_adjusted (if the p.adjust_method used is not "none") for each $pair of states involved in post hoc Dunn's tests obtained for each time step (i.e., $focal_time). This data.frame can be passed down to plot_STRAPP_pvalues_over_time() to generate a plot showing the evolution of the post hoc test results across time.
Optional melted data.frames:
- $trait_data_df_over_time: Data.frame with three columns providing $trait_value associated with each $tip_ID found along each time step (i.e., $focal_time). Set extract_trait_data_melted_df = TRUE to include it in the output.
- $diversification_data_df_over_time: Data.frame with six columns providing diversification regimes ($regime_ID) and $rates sorted by $rate_type along tips ($tip_ID) found across all posterior samples ($BAMM_sample_ID) over each time step (i.e., $focal_time). Set extract_diversification_data_melted_df = TRUE to include it in the output.
Those data.frames can be passed down to plot_rates_through_time() to generate a plot showing the evolution of diversification rates across trait values over time.
Optional objects generated for each time step (i.e., focal_time) and ordered as in $time_steps:
- $STRAPP_results_over_time: List of objects summarizing the results of the STRAPP tests. See compute_STRAPP_test_for_focal_time() for a detailed description of the elements in each object. Set return_STRAPP_results = TRUE to include it in the output. Combined with return_perm_data = TRUE, it allows plotting the histograms of the null distributions used to assess significance of the tests with plot_histogram_STRAPP_test_for_focal_time() (for a single focal_time) and plot_histograms_STRAPP_tests_over_time() (for multiple time_steps).
- $updated_trait_data_with_Map_over_time: List of objects containing trait data and updated contMap/densityMaps. Updated contMap/densityMaps can be respectively plotted with plot_contMap() or plot_densityMaps_overlay(), to display a phylogeny mapped with trait values with branches cut at each focal_time.
- $updated_BAMM_objects_over_time: List of objects containing rates and regimes ID mapped on phylogeny. Updated BAMM_object can be plotted with plot_BAMM_rates() to display a phylogeny mapped with diversification rates with branches cut at each focal_time.
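To show how the summary output might be inspected, the sketch below builds a mock data.frame with the three documented columns ($focal_time, $estimate, $p_value) and keeps the time steps where the test is significant. All values are made up, and the object is a stand-in for the $pvalues_summary_df element of a real output rather than an actual result.

```r
# Mock of the documented three-column summary (values are made up)
pvalues_summary_df <- data.frame(
  focal_time = c(0, 10, 20, 30),
  estimate   = c(0.42, 0.35, 0.12, -0.03),
  p_value    = c(0.008, 0.031, 0.240, 0.710)
)

# Significance level, matching the default of the alpha argument
alpha <- 0.05

# Keep the time steps at which the null hypothesis is rejected
significant_df <- pvalues_summary_df[pvalues_summary_df$p_value < alpha, ]
significant_df$focal_time  # 0 10
```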
Details
The function is a wrapper of run_deepSTRAPP_for_focal_time() that runs the
deepSTRAPP workflow over multiple time_steps.
The deepSTRAPP workflow is described step by step in the run_deepSTRAPP_for_focal_time() documentation.
Its main output is the $pvalues_summary_df: a data.frame providing test stat estimates and p-values obtained across all time_steps,
that can be passed down to plot_STRAPP_pvalues_over_time() to generate a plot showing the evolution of the test results across time.
If using multinominal data (with more than two states) and posthoc_pairwise_tests = TRUE, the output will also contain
a data.frame providing test stat estimates and p-values for post hoc pairwise tests in $pvalues_summary_df_for_posthoc_pairwise_tests.
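The effect of p.adjust_method can be previewed with stats::p.adjust(), which the argument description points to. The sketch below uses a mock post hoc table for a single time step. The values and the format of the $pair labels are assumptions made for illustration, and whether the package applies the correction within each time step is not stated here.

```r
# Mock post hoc table for one time step (made-up values; the format of
# the pair labels is an assumption)
posthoc_df <- data.frame(
  focal_time = c(10, 10, 10),
  pair       = c("A vs B", "A vs C", "B vs C"),
  estimate   = c(1.8, 2.9, 0.7),
  p_value    = c(0.04, 0.01, 0.20)
)

# Holm correction for the three pairwise comparisons
posthoc_df$p_value_adjusted <- p.adjust(posthoc_df$p_value, method = "holm")
posthoc_df$p_value_adjusted  # 0.08 0.03 0.20
```

With "holm", only the "A vs C" comparison remains below 0.05 in this toy example, whereas "A vs B" was also below 0.05 before correction.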
The function offers options to generate summary data.frames of the data extracted across time_steps:
- If extract_trait_data_melted_df = TRUE, a data.frame of trait values found along branches at each time step is provided in $trait_data_df_over_time.
- If extract_diversification_data_melted_df = TRUE, a data.frame of diversification data (regimes ID and tip rates) found along branches at each time step is provided in $diversification_data_df_over_time.
Those data.frames can be passed down to plot_rates_through_time() to generate a plot showing the evolution of diversification rates across trait values over time.
The function also allows keeping records of the intermediate objects generated during the STRAPP workflow:
- If return_STRAPP_results = TRUE, a list of STRAPP test outputs is provided in $STRAPP_results_over_time. Combined with return_perm_data = TRUE, it allows plotting the histograms of the null distributions used to assess significance of the tests with plot_histogram_STRAPP_test_for_focal_time() (for a single focal_time) and plot_histograms_STRAPP_tests_over_time() (for multiple time_steps).
- If return_updated_trait_data_with_Map = TRUE, a list of objects containing trait data and updated contMap or densityMaps is provided in $updated_trait_data_with_Map_over_time. Updated contMap/densityMaps can be respectively plotted with plot_contMap() or plot_densityMaps_overlay(), to display a phylogeny mapped with trait values with branches cut at each focal_time.
- If return_updated_BAMM_object = TRUE, a list of updated BAMM_object of class "bammdata" that contain rates and regimes ID found at each focal_time is provided in $updated_BAMM_objects_over_time. Updated BAMM_object can be plotted with plot_BAMM_rates() to display a phylogeny mapped with diversification rates with branches cut at each focal_time.
See also
To run the deepSTRAPP workflow for a single focal_time: run_deepSTRAPP_for_focal_time()
extract_most_likely_trait_values_for_focal_time() update_rates_and_regimes_for_focal_time()
extract_diversification_data_melted_df_for_focal_time() compute_STRAPP_test_for_focal_time()
For a guided tutorial on the complete deepSTRAPP workflow, see the associated vignettes:
- For continuous trait data: vignette("deepSTRAPP_continuous_data", package = "deepSTRAPP")
- For categorical trait data: vignette("deepSTRAPP_categorical_3lvl_data", package = "deepSTRAPP")
- For biogeographic range data: vignette("deepSTRAPP_biogeographic_data", package = "deepSTRAPP")