--- title: "Explore STRAPP test options" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Explore STRAPP test options} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r set_options, include = FALSE} knitr::opts_chunk$set( eval = FALSE, # Chunks of codes will not be evaluated by default collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 5 # Set device size at rendering time (when plots are generated) ) ``` ```{r setup, eval = TRUE, include = FALSE} library(deepSTRAPP) is_dev_version <- function (pkg = "deepSTRAPP") { # # Check if ran on CRAN # not_cran <- identical(Sys.getenv("NOT_CRAN"), "true") # || interactive() # Version number check version <- tryCatch(as.character(utils::packageVersion(pkg)), error = function(e) "") dev_version <- grepl("\\.9000", version) # not_cran || dev_version return(dev_version) } ```
This vignette presents the different __types of STRAPP tests__ that can be carried out with deepSTRAPP across the different types of data (i.e., continuous, binary, and multinominal).
Two-tailed tests make no assumption on the direction of the effect tested. One-tailed tests make a hypothesis on the direction of the effect tested. * For continuous data, the direction of the correlation is either __"positive"__ or __"negative"__. * For binary data, the rate difference between state A and state B can be either __"A > B"__ or __"B > A"__. * For multinominal data, overall Kruskal-Wallis tests are one-tailed by definition. They test for a rate difference across all states/ranges at once. * Post hoc pairwise Dunn's tests carried out between pairs of states/ranges can be one-tailed or two-tailed, with all configurations tested by default.
Please note that the trait data and phylogeny calibration used in these examples are __NOT valid biological data__. They were modified in order to provide results illustrating the usefulness of deepSTRAPP.
```{r STRAPP_tests_cont_two_tailed} # ------ Example 1: Continuous trait data ------ # #### Step 1: Prepare data #### ## Load phylogeny with old time-calibration data("Ponerinae_tree_old_calib", package = "deepSTRAPP") ## Load trait df data("Ponerinae_trait_tip_data", package = "deepSTRAPP") ## Load BAMM diversification outputs data("Ponerinae_BAMM_object_old_calib", package = "deepSTRAPP") # This dataset is only available in development versions installed from GitHub. # It is not available in CRAN versions. # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version. # Extract continuous trait data as a named vector Ponerinae_cont_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cont_tip_data, nm = Ponerinae_trait_tip_data$Taxa) # This not valid biological data. For the sake of example, we will assume this is size data. # Reorder tip data as in phylogeny Ponerinae_cont_tip_data <- Ponerinae_cont_tip_data[Ponerinae_tree_old_calib$tip.label] # Run prepare_trait_data with default options # For continuous trait, a BM model is assumed by default. Ponerinae_trait_object <- prepare_trait_data(tip_data = Ponerinae_cont_tip_data, trait_data_type = "continuous", phylo = Ponerinae_tree_old_calib, evolutionary_model = "BM", seed = 1234) # Set seed for reproducibility #### Step 2: Run a two-tailed test #### # No assumption is made on the direction of the effect tested. # The null hypothesis is that no correlation can be found between rates and trait values. # The alternative hypothesis is that there is a correlation between rates and trait values, # in any direction (i.e., "positive" or "negative"). # Here, we run a two-tailed STRAPP test for t = 10 Mya. focal_time <- 10 deepSTRAPP_output_two_tailed <- run_deepSTRAPP_for_focal_time( contMap = Ponerinae_trait_object$contMap, ace = Ponerinae_trait_object$ace, tip_data = Ponerinae_cont_tip_data, trait_data_type = "continuous", BAMM_object = Ponerinae_BAMM_object_old_calib, focal_time = focal_time, two_tailed = TRUE, # Select a two-tailed test (Default) rate_type = "net_diversification", seed = 1234, # Set seed for reproducibility return_perm_data = TRUE, # Keep permutation data to be able to plot histograms of tests # Needed to access traits and rates data to evaluate the direction of the correlation extract_diversification_data_melted_df = TRUE, return_updated_trait_data_with_Map = TRUE) ## Explore output str(deepSTRAPP_output_two_tailed, max.level = 1) # Access deepSTRAPP results deepSTRAPP_output_two_tailed$STRAPP_results[1:5] # We obtain a p-value = 0.016 leading us to discard the null hypothesis (no correlation). # The estimate is the quantile 5% of the distribution of differences in # absolute Spearman rho-stats between observed and permuted data. # The test is significant as the estimate is higher than zero. # However, without plotting the rates vs. trait data, we do not know the direction of this correlation. # Plot histogram of Spearman sum-rank test stats plot_histogram_STRAPP_test_for_focal_time( deepSTRAPP_outputs = deepSTRAPP_output_two_tailed) # The test is significant as the red line (quantile 5% = 0.063) is above the null expectation # that Δ abs(Spearman rho-stats) = 0. ``` ```{r STRAPP_tests_cont_two_tailed_eval_dev, fig.width = 8.5, fig.height = 6, out.width = "100%", eval = is_dev_version(), echo = FALSE} ## Load phylogeny with old time-calibration data("Ponerinae_tree_old_calib", package = "deepSTRAPP") ## Load trait df data("Ponerinae_trait_tip_data", package = "deepSTRAPP") ## Load BAMM diversification outputs data("Ponerinae_BAMM_object_old_calib", package = "deepSTRAPP") # Extract continuous trait data as a named vector Ponerinae_cont_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cont_tip_data, nm = Ponerinae_trait_tip_data$Taxa) # Reorder tip data as in phylogeny Ponerinae_cont_tip_data <- Ponerinae_cont_tip_data[Ponerinae_tree_old_calib$tip.label] # Run prepare_trait_data with default options Ponerinae_trait_object <- suppressWarnings(prepare_trait_data( tip_data = Ponerinae_cont_tip_data, trait_data_type = "continuous", phylo = Ponerinae_tree_old_calib, evolutionary_model = "BM", seed = 1234,# Set seed for reproducibility plot_map = FALSE, verbose = FALSE)) # Here, we run a two-tailed STRAPP test for t = 10 Mya. deepSTRAPP_output_two_tailed <- suppressWarnings(run_deepSTRAPP_for_focal_time( contMap = Ponerinae_trait_object$contMap, ace = Ponerinae_trait_object$ace, tip_data = Ponerinae_cont_tip_data, trait_data_type = "continuous", BAMM_object = Ponerinae_BAMM_object_old_calib, focal_time = 10, two_tailed = TRUE, # Select a two-tailed test (Default) rate_type = "net_diversification", seed = 1234, # Set seed for reproducibility return_perm_data = TRUE, # Keep permutation data to be able to plot histograms of tests # Needed to access traits and rates data to evaluate the direction of the correlation extract_diversification_data_melted_df = TRUE, return_updated_trait_data_with_Map = TRUE, verbose = FALSE)) # Plot histogram of Spearman sum-rank test stats plot_histogram_STRAPP_test_for_focal_time( deepSTRAPP_outputs = deepSTRAPP_output_two_tailed) ``` ```{r STRAPP_tests_cont_two_tailed_eval_CRAN, fig.width = 8.5, fig.height = 6, out.width = "100%", eval = !is_dev_version(), echo = FALSE} # Plot pre-rendered graph knitr::include_graphics("figures/4_Explore_STRAPP_hypotheses_1.2_Example_continuous_two_tailed_1.PNG") ``` ```{r STRAPP_tests_cont_two_tailed_2} ## Plot rates vs. trait values # We can plot the mean rates vs. trait values inferred by deepSTRAPP for T = 10Mya # to unravel the direction of the correlation detected. ?plot_rates_vs_trait_data_for_focal_time() # Select a color scheme from lowest to highest values (i.e., small ants to large ants) color_scale = c("darkgreen", "limegreen", "orange", "red") # Plot mean rates vs. traits aggregated across all BAMM posterior samples plot_rates_vs_trait_data_for_focal_time(deepSTRAPP_outputs = deepSTRAPP_output_two_tailed, color_scale = color_scale) # The plot indicates that rates are globally negatively correlated with trait values as # "small" ants exhibit higher rates than "large" ants for T = 10 Mya. # The plot also displays summary statistics for the STRAPP test associated with the data shown: # * An observed statistic computed across the mean rates and trait values shown on the plot. # Here, rho obs = -0.743, indicating a negative correlation. # This is not the statistic of the STRAPP test itself, which is conducted across all BAMM posterior samples. # * The quantile of null statistic distribution at the significant threshold used to define test significance. # The test will be considered significant (i.e., the null hypothesis is rejected) # if this value is higher than zero, as shown on the histogram above. # Here, Q5% = 0.063, so the test is significant (according to this significance threshold). # * The p-value of the associated STRAPP test. Here, p = 0.016. # We could have directly tested for hypothesis with a one-tailed test (See Step 3 below). ``` ```{r STRAPP_tests_cont_two_tailed_eval_2_dev, fig.width = 8.5, fig.height = 6, out.width = "100%", eval = is_dev_version(), echo = FALSE} ## Plot rates vs. trait values # Select a color scheme from lowest to highest values (i.e., small ants to large ants) color_scale = c("darkgreen", "limegreen", "orange", "red") # Plot mean rates vs. traits aggregated across all BAMM posterior samples plot_rates_vs_trait_data_for_focal_time(deepSTRAPP_outputs = deepSTRAPP_output_two_tailed, color_scale = color_scale) ``` ```{r STRAPP_tests_cont_two_tailed_eval_2_CRAN, fig.width = 8.5, fig.height = 6, out.width = "100%", eval = !is_dev_version(), echo = FALSE} # Plot pre-rendered graph knitr::include_graphics("figures/4_Explore_STRAPP_hypotheses_1.2_Example_continuous_two_tailed_2.PNG") ``` ```{r STRAPP_tests_cont_one_tailed} #### Step 3: Run a one-tailed test #### # Here, we make an assumption on the direction of the effect tested. # The null hypothesis is that there is a positive or no correlation between rates and trait values. # The alternative hypothesis is that there is a negative correlation between rates and trait values. # We run a one-tailed STRAPP test for t = 10 Mya. focal_time <- 10 deepSTRAPP_output_one_tailed <- run_deepSTRAPP_for_focal_time( contMap = Ponerinae_trait_object$contMap, ace = Ponerinae_trait_object$ace, tip_data = Ponerinae_cont_tip_data, trait_data_type = "continuous", BAMM_object = Ponerinae_BAMM_object_old_calib, focal_time = focal_time, two_tailed = FALSE, # Select a one-tailed test one_tailed_hypothesis = "negative", # We select a "negative" correlation as the alternative hypothesis rate_type = "net_diversification", seed = 1234, # Set seed for reproducibility return_perm_data = TRUE) # Keep permutation data to be able to plot histograms of tests ## Explore output str(deepSTRAPP_output_one_tailed, max.level = 1) # Access deepSTRAPP results deepSTRAPP_output_one_tailed$STRAPP_results[1:5] # We obtain a p-value = 0.006 leading us to discard the null hypothesis # (no correlation or positive correlation). # Thus, the alternative hypothesis (negative correlation) is supported by the test. # The estimate is the quantile 95% of the distribution of differences in Spearman rho-stats # between observed and permuted data. # The test is significant as the estimate is lower than zero. # Plot histogram of Spearman sum-rank test stats plot_histogram_STRAPP_test_for_focal_time( deepSTRAPP_outputs = deepSTRAPP_output_one_tailed) # The test is significant as the red line (quantile 95% = -0.147) is below the null expectation # that Δ Spearman rho stat = 0. ``` ```{r STRAPP_tests_cont_one_tailed_eval_dev, fig.width = 8.5, fig.height = 6, out.width = "100%", eval = is_dev_version(), echo = FALSE} # We run a one-tailed STRAPP test for t = 10 Mya. deepSTRAPP_output_one_tailed <- suppressWarnings(run_deepSTRAPP_for_focal_time( contMap = Ponerinae_trait_object$contMap, ace = Ponerinae_trait_object$ace, tip_data = Ponerinae_cont_tip_data, trait_data_type = "continuous", BAMM_object = Ponerinae_BAMM_object_old_calib, focal_time = 10, two_tailed = FALSE, # Select a one-tailed test one_tailed_hypothesis = "negative", # We select a "negative" correlation as the alternative hypothesis rate_type = "net_diversification", seed = 1234, # Set seed for reproducibility return_perm_data = TRUE, # Keep permutation data to be able to plot histograms of tests verbose = FALSE)) # Plot histogram of Spearman sum-rank test stats plot_histogram_STRAPP_test_for_focal_time( deepSTRAPP_outputs = deepSTRAPP_output_one_tailed) ``` ```{r STRAPP_tests_cont_one_tailed_eval_CRAN, fig.width = 8.5, fig.height = 6, out.width = "100%", eval = !is_dev_version(), echo = FALSE} # Plot pre-rendered graph knitr::include_graphics("figures/4_Explore_STRAPP_hypotheses_1.3_Example_continuous_one_tailed.PNG") ``` ```{r STRAPP_tests_cat_2lvl_two_tailed} # ------ Example 2: Categorical binary trait data ------ # #### Step 1: Prepare data #### ## Load phylogeny with old time-calibration data("Ponerinae_tree_old_calib", package = "deepSTRAPP") ## Load trait df data("Ponerinae_trait_tip_data", package = "deepSTRAPP") ## Load BAMM diversification outputs data("Ponerinae_BAMM_object_old_calib", package = "deepSTRAPP") # This dataset is only available in development versions installed from GitHub. # It is not available in CRAN versions. # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version. # Extract continuous trait data as a named vector Ponerinae_cat_2lvl_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cat_2lvl_tip_data, nm = Ponerinae_trait_tip_data$Taxa) # Reorder tip data as in phylogeny Ponerinae_cat_2lvl_tip_data <- Ponerinae_cat_2lvl_tip_data[Ponerinae_tree_old_calib$tip.label] # Select color scheme for states colors_per_states <- c("darkblue", "lightblue") names(colors_per_states) <- c("large", "small") # Run prepare_trait_data with default options Ponerinae_trait_object <- prepare_trait_data(tip_data = Ponerinae_cat_2lvl_tip_data, trait_data_type = "categorical", phylo = Ponerinae_tree_old_calib, evolutionary_model = "ARD", # Select an Equal-Rates model colors_per_states = colors_per_states, nb_simulations = 100, seed = 1234) # Set seed for reproducibility # Load directly output to save time data(Ponerinae_cat_2lvl_data_old_calib, package = "deepSTRAPP") Ponerinae_trait_object <- Ponerinae_cat_2lvl_data_old_calib #### Step 2: Run a two-tailed test #### # No assumption is made about which states exhibit the highest rates. # The null hypothesis is that there is no difference in rates between the two states. # The alternative hypothesis is that there is a difference in rates between the two states, # in any direction (i.e., "small > large" or "large > small"). # Here, we run a two-tailed STRAPP test for t = 10 Mya. focal_time <- 10 deepSTRAPP_output_two_tailed <- run_deepSTRAPP_for_focal_time( densityMaps = Ponerinae_trait_object$densityMaps, ace = Ponerinae_trait_object$ace, tip_data = Ponerinae_cat_2lvl_tip_data, trait_data_type = "categorical", BAMM_object = Ponerinae_BAMM_object_old_calib, focal_time = focal_time, two_tailed = TRUE, # Select a two-tailed test (Default) rate_type = "net_diversification", seed = 1234, # Set seed for reproducibility return_perm_data = TRUE, # Keep permutation data to be able to plot histograms of tests # Needed to access traits and rates data to evaluate the direction of the correlation extract_diversification_data_melted_df = TRUE, return_updated_trait_data_with_Map = TRUE) ## Explore output str(deepSTRAPP_output_two_tailed, max.level = 1) # Access deepSTRAPP results deepSTRAPP_output_two_tailed$STRAPP_results[1:5] # We obtain a p-value = 0.037 leading us to discard the null hypothesis (no differences). # The estimate is the quantile 5% of the distribution of differences # in absolute Mann-Whitney-Wilcoxon U-stats between observed and permuted data. # The test is significant as the estimate is higher than zero. # However, without plotting the rates vs. states, we do not know the direction of this difference. # Plot histogram of Mann-Whitney-Wilcoxon test stats plot_histogram_STRAPP_test_for_focal_time( deepSTRAPP_outputs = deepSTRAPP_output_two_tailed) # The test is significant as the red line (quantile 5% = 463.6) is above the null expectation # that Δ abs(Mann-Whitney-Wilcoxon U-stats) = 0. ``` ```{r STRAPP_tests_cat_2lvl_two_tailed_eval_dev, fig.width = 8.5, fig.height = 6, out.width = "100%", eval = is_dev_version(), echo = FALSE} # Load phylogeny with old time-calibration data("Ponerinae_tree_old_calib", package = "deepSTRAPP") # Load trait df data("Ponerinae_trait_tip_data", package = "deepSTRAPP") ## Load BAMM diversification outputs data("Ponerinae_BAMM_object_old_calib", package = "deepSTRAPP") # Extract continuous trait data as a named vector Ponerinae_cat_2lvl_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cat_2lvl_tip_data, nm = Ponerinae_trait_tip_data$Taxa) # Reorder tip data as in phylogeny Ponerinae_cat_2lvl_tip_data <- Ponerinae_cat_2lvl_tip_data[Ponerinae_tree_old_calib$tip.label] # Load directly deepSTRAPP output to save time data(Ponerinae_cat_2lvl_data_old_calib, package = "deepSTRAPP") Ponerinae_trait_object <- Ponerinae_cat_2lvl_data_old_calib # Select color scheme for states colors_per_states <- c("darkblue", "lightblue") names(colors_per_states) <- c("large", "small") # Here, we run a two-tailed STRAPP test for t = 10 Mya. deepSTRAPP_output_two_tailed <- suppressWarnings(run_deepSTRAPP_for_focal_time( densityMaps = Ponerinae_trait_object$densityMaps, ace = Ponerinae_trait_object$ace, tip_data = Ponerinae_cat_2lvl_tip_data, trait_data_type = "categorical", BAMM_object = Ponerinae_BAMM_object_old_calib, focal_time = 10, two_tailed = TRUE, # Select a two-tailed test (Default) rate_type = "net_diversification", seed = 1234, # Set seed for reproducibility return_perm_data = TRUE, # Keep permutation data to be able to plot histograms of tests # Needed to access traits and rates data to evaluate the direction of the correlation extract_diversification_data_melted_df = TRUE, return_updated_trait_data_with_Map = TRUE, verbose = FALSE)) # Plot histogram of Mann-Whitney-Wilcoxon test stats plot_histogram_STRAPP_test_for_focal_time( deepSTRAPP_outputs = deepSTRAPP_output_two_tailed) ``` ```{r STRAPP_tests_cat_2lvl_two_tailed_eval_CRAN, fig.width = 8.5, fig.height = 6, out.width = "100%", eval = !is_dev_version(), echo = FALSE} # Plot pre-rendered graph knitr::include_graphics("figures/4_Explore_STRAPP_hypotheses_2.2_Example_cat_2lvl_two_tailed.PNG") ``` ```{r STRAPP_tests_cat_2lvl_two_tailed_2} ## Plot rates vs. trait values # We can plot the mean rates vs. states inferred by deepSTRAPP for T = 10Mya # to unravel the direction of the difference detected. ?plot_rates_vs_trait_data_for_focal_time() # Plot mean rates vs. trait states aggregated across all BAMM posterior samples plot_rates_vs_trait_data_for_focal_time(deepSTRAPP_outputs = deepSTRAPP_output_two_tailed, colors_per_levels = colors_per_states) # The plot highlights that "small" ants exhibited higher rates than "large" ants for T = 10 Mya. # The plot also displays summary statistics for the STRAPP test associated with the data shown: # * An observed statistic computed across the mean rates and trait states shown on the plot. # Here, U-stats obs = -23048, indicating the first group "large" ants have lower rates than "small" ants. # This is not the statistic of the STRAPP test itself, which is conducted across all BAMM posterior samples. # * The quantile of null statistic distribution at the significant threshold used to define test significance. # The test will be considered significant (i.e., the null hypothesis is rejected) # if this value is higher than zero, as shown on the histogram above. # Here, Q5% = 463.6, so the test is significant (according to this significance threshold). # * The p-value of the associated STRAPP test. Here, p = 0.037. # We could have directly tested for this hypothesis (i.e., "small" > "large") with a one-tailed test (See Step 3 below). ``` ```{r STRAPP_tests_cat_2lvl_two_tailed_eval_2_dev, fig.width = 8.5, fig.height = 6, out.width = "100%", eval = is_dev_version(), echo = FALSE} ## Plot rates vs. trait values # Plot mean rates vs. trait states aggregated across all BAMM posterior samples plot_rates_vs_trait_data_for_focal_time(deepSTRAPP_outputs = deepSTRAPP_output_two_tailed, colors_per_levels = colors_per_states) ``` ```{r STRAPP_tests_cat_2lvl_two_tailed_eval_2_CRAN, fig.width = 8.5, fig.height = 6, out.width = "100%", eval = !is_dev_version(), echo = FALSE} # Plot pre-rendered graph knitr::include_graphics("figures/4_Explore_STRAPP_hypotheses_2.2_Example_cat_2lvl_two_tailed_2.PNG") ``` ```{r STRAPP_tests_cat_2lvl_one_tailed} #### Step 3: Run a one-tailed test #### # Here, we make an assumption on the direction of the effect tested. # The null hypothesis is that diversification rates are lower or equal # for "small" ants compared to "large" ants. # The alternative hypothesis is that diversification rates are higher # for "small" ants than "large" ants. # We run a one-tailed STRAPP test for t = 10 Mya. focal_time <- 10 deepSTRAPP_output_one_tailed <- run_deepSTRAPP_for_focal_time( densityMaps = Ponerinae_trait_object$densityMaps, ace = Ponerinae_trait_object$ace, tip_data = Ponerinae_cat_2lvl_tip_data, trait_data_type = "categorical", BAMM_object = Ponerinae_BAMM_object_old_calib, focal_time = focal_time, two_tailed = FALSE, # Select a one-tailed test one_tailed_hypothesis = "small > large", # We select "small > large" as the alternative hypothesis rate_type = "net_diversification", seed = 1234, # Set seed for reproducibility return_perm_data = TRUE) # Keep permutation data to be able to plot histograms of tests ## Explore output str(deepSTRAPP_output_one_tailed, max.level = 1) # Access deepSTRAPP results deepSTRAPP_output_one_tailed$STRAPP_results[1:5] # We obtain a p-value = 0.019 leading us to discard the null hypothesis ("small <= large"). # Thus, the alternative hypothesis ("small > large") is supported by the test. # The estimate is the quantile 5% of the distribution of differences in Mann-Whitney-Wilcoxon U-stats # between observed and permuted data # The test is significant as the estimate is higher than zero. # Plot histogram of Mann-Whitney-Wilcoxon test stats plot_histogram_STRAPP_test_for_focal_time( deepSTRAPP_outputs = deepSTRAPP_output_one_tailed) # The test is significant as the red line (quantile 5% = 3899.8) is above the null expectation # that Δ Mann-Whitney-Wilcoxon U-stats = 0. ``` ```{r STRAPP_tests_cat_2lvl_one_tailed_eval_dev, fig.width = 8.5, fig.height = 6, out.width = "100%", eval = is_dev_version(), echo = FALSE} # We run a one-tailed STRAPP test for t = 10 Mya. deepSTRAPP_output_one_tailed <- suppressWarnings(run_deepSTRAPP_for_focal_time( densityMaps = Ponerinae_trait_object$densityMaps, ace = Ponerinae_trait_object$ace, tip_data = Ponerinae_cat_2lvl_tip_data, trait_data_type = "categorical", BAMM_object = Ponerinae_BAMM_object_old_calib, focal_time = 10, two_tailed = FALSE, # Select a one-tailed test one_tailed_hypothesis = "small > large", # We select "small > large" as the alternative hypothesis rate_type = "net_diversification", seed = 1234, # Set seed for reproducibility return_perm_data = TRUE, # Keep permutation data to be able to plot histograms of tests verbose = FALSE)) # Plot histogram of Mann-Whitney-Wilcoxon test stats plot_histogram_STRAPP_test_for_focal_time( deepSTRAPP_outputs = deepSTRAPP_output_one_tailed) ``` ```{r STRAPP_tests_cat_2lvl_one_tailed_eval_CRAN, fig.width = 8.5, fig.height = 6, out.width = "100%", eval = !is_dev_version(), echo = FALSE} # Plot pre-rendered graph knitr::include_graphics("figures/4_Explore_STRAPP_hypotheses_2.3_Example_cat_2lvl_one_tailed.PNG") ``` ```{r STRAPP_tests_cat_3lvl_overall} # ------ Example 3: Categorical multinominal (3-levels) trait data ------ # #### Step 1: Prepare data #### ## Load phylogeny with old time-calibration data("Ponerinae_tree_old_calib", package = "deepSTRAPP") ## Load trait df data("Ponerinae_trait_tip_data", package = "deepSTRAPP") ## Load BAMM diversification outputs data("Ponerinae_BAMM_object_old_calib", package = "deepSTRAPP") # This dataset is only available in development versions installed from GitHub. # It is not available in CRAN versions. # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version. # Extract continuous trait data as a named vector Ponerinae_cat_3lvl_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cat_3lvl_tip_data, nm = Ponerinae_trait_tip_data$Taxa) # Reorder tip data as in phylogeny Ponerinae_cat_3lvl_tip_data <- Ponerinae_cat_3lvl_tip_data[Ponerinae_tree_old_calib$tip.label] ## Select color scheme for states (i.e., habitats) colors_per_states <- c("forestgreen", "sienna", "goldenrod") names(colors_per_states) <- c("arboreal", "subterranean", "terricolous") # Run prepare_trait_data with default options Ponerinae_trait_object <- prepare_trait_data(tip_data = Ponerinae_cat_3lvl_tip_data, trait_data_type = "categorical", phylo = Ponerinae_tree_old_calib, evolutionary_model = "ARD", # Select an Equal-Rates model colors_per_states = colors_per_states, nb_simulations = 100, seed = 1234) # Set seed for reproducibility # Load directly output to save time data(Ponerinae_cat_3lvl_data_old_calib, package = "deepSTRAPP") Ponerinae_trait_object <- Ponerinae_cat_3lvl_data_old_calib #### Step 2: Run the overall Kruskal-Wallis test #### # For multinominal data (i.e., with more than two states/ranges), the overall test # is by definition a one-tailed Kruskal-Wallis test for rate differences across all states/ranges. # There is no two-tailed version of this test, thus deepSTRAPP assumes a one-tailed test by default. # The null hypothesis is that there is no difference in rates across all states/ranges. # The alternative hypothesis is that there is at least one pairs of states/ranges # with differences in diversification rates. # To test for differences between pairs of states/ranges, you need to run post hoc pairwise tests. # These Dunn's test can be one-tailed or two-tailed depending on the null hypotheses stated. # See the next Steps 3 & 4 for post hoc pairwise tests. # Here, we run an overall STRAPP test for t = 10 Mya. focal_time <- 10 deepSTRAPP_output_overall <- run_deepSTRAPP_for_focal_time( densityMaps = Ponerinae_trait_object$densityMaps, ace = Ponerinae_trait_object$ace, tip_data = Ponerinae_cat_3lvl_tip_data, trait_data_type = "categorical", BAMM_object = Ponerinae_BAMM_object_old_calib, focal_time = focal_time, alpha = 0.10, # Adjust the significance threshold posthoc_pairwise_tests = FALSE, # To run the overall rate_type = "net_diversification", seed = 1234, # Set seed for reproducibility return_perm_data = TRUE, # Keep permutation data to be able to plot histograms of tests # Needed to access traits and rates data to evaluate the direction of the correlation extract_diversification_data_melted_df = TRUE, return_updated_trait_data_with_Map = TRUE) ## Explore output str(deepSTRAPP_output_overall, max.level = 1) # Access deepSTRAPP results deepSTRAPP_output_overall$STRAPP_results[1:5] # We obtain a p-value = 0.071 leading us to discard the null hypothesis (no differences). # The estimate is the quantile 10% of the distribution of differences # in Kruskal-Wallis H-stats between observed and permuted data. # The test is significant as the estimate is higher than zero. # However, without plotting the rates vs. states, or running post hoc pairwise tests, # we do not know which pairs of states exhibit differences. # Plot histogram of Kruskall-Wallis one-way ANOVA test stats plot_histogram_STRAPP_test_for_focal_time( deepSTRAPP_outputs = deepSTRAPP_output_overall) # The test is significant as the red line (quantile 10% = 6.942) is above the null expectation # that Δ Kruskal-Wallis H-stats = 0. ``` ```{r STRAPP_tests_cat_3lvl_overall_eval_dev, fig.width = 8.5, fig.height = 6, out.width = "100%", eval = is_dev_version(), echo = FALSE} # Load phylogeny with old time-calibration data("Ponerinae_tree_old_calib", package = "deepSTRAPP") # Load trait df data("Ponerinae_trait_tip_data", package = "deepSTRAPP") ## Load BAMM diversification outputs data("Ponerinae_BAMM_object_old_calib", package = "deepSTRAPP") # Extract continuous trait data as a named vector Ponerinae_cat_3lvl_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cat_3lvl_tip_data, nm = Ponerinae_trait_tip_data$Taxa) # Reorder tip data as in phylogeny Ponerinae_cat_3lvl_tip_data <- Ponerinae_cat_3lvl_tip_data[Ponerinae_tree_old_calib$tip.label] # Load directly deepSTRAPP output to save time data(Ponerinae_cat_3lvl_data_old_calib, package = "deepSTRAPP") Ponerinae_trait_object <- Ponerinae_cat_3lvl_data_old_calib ## Select color scheme for states colors_per_states <- c("forestgreen", "sienna", "goldenrod") names(colors_per_states) <- c("arboreal", "subterranean", "terricolous") # Here, we run an overall STRAPP test for t = 10 Mya. deepSTRAPP_output_overall <- suppressWarnings(run_deepSTRAPP_for_focal_time( densityMaps = Ponerinae_trait_object$densityMaps, ace = Ponerinae_trait_object$ace, tip_data = Ponerinae_cat_3lvl_tip_data, trait_data_type = "categorical", BAMM_object = Ponerinae_BAMM_object_old_calib, focal_time = 10, alpha = 0.10, # Adjust the significance threshold posthoc_pairwise_tests = FALSE, # To run the overall rate_type = "net_diversification", seed = 1234, # Set seed for reproducibility return_perm_data = TRUE, # Keep permutation data to be able to plot histograms of tests # Needed to access traits and rates data to evaluate the direction of the correlation extract_diversification_data_melted_df = TRUE, return_updated_trait_data_with_Map = TRUE, verbose = FALSE)) # Plot histogram of Kruskall-Wallis one-way ANOVA test stats plot_histogram_STRAPP_test_for_focal_time( deepSTRAPP_outputs = deepSTRAPP_output_overall) ``` ```{r STRAPP_tests_cat_3lvl_overall_eval_CRAN, fig.width = 8.5, fig.height = 6, out.width = "100%", eval = !is_dev_version(), echo = FALSE} # Plot pre-rendered graph knitr::include_graphics("figures/4_Explore_STRAPP_hypotheses_3.2_Example_cat_3lvl_overall.PNG") ``` ```{r STRAPP_tests_cat_3lvl_overall_2} ## Plot rates vs. trait values # We can plot the mean rates vs. states inferred by deepSTRAPP for T = 10Mya # to unravel the direction of the differences detected. ?plot_rates_vs_trait_data_for_focal_time() # Plot mean rates vs. trait states aggregated across all BAMM posterior samples plot_rates_vs_trait_data_for_focal_time(deepSTRAPP_outputs = deepSTRAPP_output_overall, colors_per_levels = colors_per_states) # The plot indicates that net diversification rates are higher in "terricolous" ants, # than "subterranean" ants, and lower in "arboreal" ants, for T = 10 Mya. # The plot also displays summary statistics for the STRAPP test associated with the data shown: # * An observed statistic computed across the mean rates and trait states (i.e., habitats) shown on the plot. # Here, H-stat obs = 374.82. Please note that this is not the statistic of the STRAPP test itself, # which is conducted across all BAMM posterior samples. # * The quantile of null statistic distribution at the significant threshold used to define test significance. # The test will be considered significant (i.e., the null hypothesis is rejected) # if this value is higher than zero, as shown on the histogram above. # Here, Q10% = 6.942, so the test is significant (according to this significance threshold), # meaning we detected differences between habitats overall. # * The p-value of the associated STRAPP test. Here, p = 0.071. # We can explore those potential differences with post hoc pairwise tests, # with both one-tailed hypothesis tests or two-tailed tests (See Steps 3 & 4 below). ``` ```{r STRAPP_tests_cat_3lvl_overall_eval_2_dev, fig.width = 8.5, fig.height = 6, out.width = "100%", eval = is_dev_version(), echo = FALSE} ## Plot rates vs. trait values # Plot mean rates vs. trait states aggregated across all BAMM posterior samples plot_rates_vs_trait_data_for_focal_time(deepSTRAPP_outputs = deepSTRAPP_output_overall, colors_per_levels = colors_per_states) ``` ```{r STRAPP_tests_cat_3lvl_overall_eval_2_CRAN, fig.width = 8.5, fig.height = 6, out.width = "100%", eval = !is_dev_version(), echo = FALSE} # Plot pre-rendered graph knitr::include_graphics("figures/4_Explore_STRAPP_hypotheses_3.2_Example_cat_3lvl_overall_2.PNG") ``` ```{r STRAPP_tests_cat_3lvl_two_tailed_eval2} #### Step 3: Run all post hoc Dunn's two-tailed tests #### # Post hoc tests are pairwise tests for differences between pairs of states/ranges. # They can be two-tailed, without assumption regarding which states exhibit higher rates than the others. # or they can be one-tailed, with assumption on which states exhibit higher rates within each pair. # deepSTRAPP runs post hoc pairwise tests across all pairs of states. # You can choose which type of tests (i.e., "two-tailed" or "one-tailed") to run. # Here, the two-tailed version is presented. See Step 4 for the one-tailed version. # In the two-tailed tests, the null hypothesis for each pair is that there is # no difference in rates between the two states. # The alternative hypothesis is that there is a difference in rates between the two states, # in any direction (e.g., "arboreal > terricolous" or "terricolous > arboreal"). # You can also select a method to adjust p-values for multiple testing. # See stats::p.adjust() for the available methods. # Here, we select none. # We run a two-tailed STRAPP test for t = 10 Mya. focal_time <- 10 deepSTRAPP_output_two_tailed <- run_deepSTRAPP_for_focal_time( densityMaps = Ponerinae_trait_object$densityMaps, ace = Ponerinae_trait_object$ace, tip_data = Ponerinae_cat_3lvl_tip_data, trait_data_type = "categorical", BAMM_object = Ponerinae_BAMM_object_old_calib, focal_time = focal_time, two_tailed = TRUE, # Select two-tailed tests for the post hoc tests (Default) posthoc_pairwise_tests = TRUE, # Ask for post hoc tests to be carried out rate_type = "net_diversification", seed = 1234, # Set seed for reproducibility return_perm_data = TRUE) # Keep permutation data to be able to plot histograms of tests ## Explore output str(deepSTRAPP_output_two_tailed, max.level = 2) # Access deepSTRAPP results for post hoc tests deepSTRAPP_output_two_tailed$STRAPP_results$posthoc_pairwise_tests$summary_df # We obtain a p-value for each pair of states. Here there are 3 possible pairs. # Only the pair "arboreal != terricolous" yields to a significant result. # For this pair, we obtain a p-value = 0.025 leading us to discard the null hypothesis (no differences). # The estimate is the quantile 5% of the distribution of differences # in absolute Dunn's Z-stats between observed and permuted data. # The test is significant as the estimate is higher than zero (Q5% = 0.759). # However, without plotting the rates vs. states, we would have not know the direction of this difference. # Plot histogram of all post hoc two-tailed test stats plot_histogram_STRAPP_test_for_focal_time( deepSTRAPP_outputs = deepSTRAPP_output_two_tailed, plot_posthoc_tests = TRUE) # Only the test for the pair "arboreal =! terricolous" is significant as the red line # (quantile 5% = 0.759) is above the null expectation that Δ Dunn's Z-stats = 0. # The conclusion is that it is the difference in rates between "arboreal" ants # and "terricolous" ants that drives the differences in rates recorded overall # with the Kruskal-Wallis test. # Yet, with this test only, we do not know which states has the higher rates # We can look at the plot of rates vs. states above (See Step 2), # but we can also test directly for the two hypotheses (i.e., ("arboreal" > "terricolous") # or ("terricolous" > "arboreal") using one-tailed tests. # See Step 4 below for the one-tailed tests. ``` ```{r STRAPP_tests_cat_3lvl_two_tailed_eval_dev, fig.width = 8.5, fig.height = 7, out.width = "100%", eval = is_dev_version(), echo = FALSE} # Load phylogeny with old time-calibration data("Ponerinae_tree_old_calib", package = "deepSTRAPP") # Load trait df data("Ponerinae_trait_tip_data", package = "deepSTRAPP") # Extract continuous trait data as a named vector Ponerinae_cat_3lvl_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cat_3lvl_tip_data, nm = Ponerinae_trait_tip_data$Taxa) # Reorder tip data as in phylogeny Ponerinae_cat_3lvl_tip_data <- Ponerinae_cat_3lvl_tip_data[Ponerinae_tree_old_calib$tip.label] # Load directly deepSTRAPP output to save time data(Ponerinae_cat_3lvl_data_old_calib, package = "deepSTRAPP") Ponerinae_trait_object <- Ponerinae_cat_3lvl_data_old_calib ## Select color scheme for states colors_per_states <- c("forestgreen", "sienna", "goldenrod") names(colors_per_states) <- c("arboreal", "subterranean", "terricolous") # Here, we run a two-tailed STRAPP test for t = 10 Mya. deepSTRAPP_output_two_tailed <- suppressWarnings(run_deepSTRAPP_for_focal_time( densityMaps = Ponerinae_trait_object$densityMaps, ace = Ponerinae_trait_object$ace, tip_data = Ponerinae_cat_3lvl_tip_data, trait_data_type = "categorical", BAMM_object = Ponerinae_BAMM_object_old_calib, focal_time = 10, two_tailed = TRUE, # Select two-tailed tests for the post hoc tests (Default) posthoc_pairwise_tests = TRUE, # Ask for post hoc tests to be carried out rate_type = "net_diversification", seed = 1234, # Set seed for reproducibility return_perm_data = TRUE, verbose = FALSE)) # Keep permutation data to be able to plot histograms of tests # Plot histogram of all post hoc two-tailed test stats plot_histogram_STRAPP_test_for_focal_time( deepSTRAPP_outputs = deepSTRAPP_output_two_tailed, plot_posthoc_tests = TRUE) ``` ```{r STRAPP_tests_cat_3lvl_two_tailed_eval_CRAN, fig.width = 8.5, fig.height = 7, out.width = "100%", eval = !is_dev_version(), echo = FALSE} # Plot pre-rendered graph knitr::include_graphics("figures/4_Explore_STRAPP_hypotheses_3.3_Example_cat_3lvl_two_tailed.PNG") ``` ```{r STRAPP_tests_cat_3lvl_one_tailed} #### Step 4: Run all post hoc Dunn's one-tailed tests #### # Here, we make assumptions on the direction of the effects tested. # We conduct all possible one-tailed tests across pairs of states. # Thus, for three pairs, we conduct six tests, one for each possible alternative hypothesis # (i.e., "A" > "B" and "B" > "A"). # We run a one-tailed STRAPP test for t = 10 Mya. focal_time <- 10 deepSTRAPP_output_one_tailed <- run_deepSTRAPP_for_focal_time( densityMaps = Ponerinae_trait_object$densityMaps, ace = Ponerinae_trait_object$ace, tip_data = Ponerinae_cat_3lvl_tip_data, trait_data_type = "categorical", BAMM_object = Ponerinae_BAMM_object_old_calib, focal_time = focal_time, two_tailed = FALSE, # Select one-tailed tests for the post hoc tests posthoc_pairwise_tests = TRUE, # Ask for post hoc tests to be carried out rate_type = "net_diversification", seed = 1234, # Set seed for reproducibility return_perm_data = TRUE) # Keep permutation data to be able to plot histograms of tests ## Explore output str(deepSTRAPP_output_one_tailed, max.level = 2) # Access deepSTRAPP results for post hoc tests deepSTRAPP_output_one_tailed$STRAPP_results$posthoc_pairwise_tests$summary_df # We obtain a p-value for each pair of states in each direction. # Here there are 3 possible pairs = 6 one-tailed tests. # Only the alternative hypothesis "arboreal < terricolous" yields to a significant result. # For this test, we obtain a p-value = 0.012 leading us to discard the null hypothesis that # "arboreal" ants have equivalent or higher rates than "terricolous" ants. # The estimate is the quantile 5% of the distribution of differences # in Dunn's Z-stats between observed and permuted data. # The test is significant as the estimate is higher than zero (Q5% = 1.718). # We can conclude that our data support the idea that "arboreal" ants # have lower diversification rates than "terricolous" ants. # Plot histogram of all post hoc one-tailed test stats plot_histogram_STRAPP_test_for_focal_time( deepSTRAPP_outputs = deepSTRAPP_output_one_tailed, plot_posthoc_tests = TRUE) # Only the test for the alternative hypothesis "arboreal < terricolous" is significant as the red line # (quantile 5% = 1.718) is above the null expectation that Δ Dunn's Z-stats = 0. # The conclusion is that it is the higher rates in "terricolous" ants vs. lower rates in "arboreal" ants # that drives the differences in rates recorded overall with the Kruskal-Wallis test. ``` ```{r STRAPP_tests_cat_3lvl_one_tailed_eval_dev, fig.width = 8.5, fig.height = 6, out.width = "100%", eval = is_dev_version(), echo = FALSE} # We run a one-tailed STRAPP test for t = 10 Mya. deepSTRAPP_output_one_tailed <- suppressWarnings(run_deepSTRAPP_for_focal_time( densityMaps = Ponerinae_trait_object$densityMaps, ace = Ponerinae_trait_object$ace, tip_data = Ponerinae_cat_3lvl_tip_data, trait_data_type = "categorical", BAMM_object = Ponerinae_BAMM_object_old_calib, focal_time = 10, two_tailed = FALSE, # Select one-tailed tests for the post hoc tests posthoc_pairwise_tests = TRUE, # Ask for post hoc tests to be carried out rate_type = "net_diversification", seed = 1234, # Set seed for reproducibility return_perm_data = TRUE, # Keep permutation data to be able to plot histograms of tests verbose = FALSE)) # Plot histogram of all post hoc one-tailed test stats plot_histogram_STRAPP_test_for_focal_time( deepSTRAPP_outputs = deepSTRAPP_output_one_tailed, plot_posthoc_tests = TRUE) ``` ```{r STRAPP_tests_cat_3lvl_one_tailed_eval_CRAN, fig.width = 8.5, fig.height = 6, out.width = "100%", eval = !is_dev_version(), echo = FALSE} # Plot pre-rendered graph knitr::include_graphics("figures/4_Explore_STRAPP_hypotheses_3.4_Example_cat_3lvl_one_tailed.PNG") ```