Morphoscape
This guide walks through the workflow of an adaptive landscape analysis, from the philosophy of morphospaces and the types of data required to the details of how to use the functions in the Morphoscape package. It covers the following topics:
1. Defining a Morphospace
- Phenotype
- Performance data: Specimens or Warps
2. Using Morphoscape
- Creating Performance Surfaces
- Calculate Landscapes
Defining a morphospace
Phenotype
Adaptive landscape analyses require two types of data to be married together: phenotypic data and performance data. Phenotype data can take many forms, and the only requirement for constructing adaptive landscapes is that the phenotype data define a morphospace. The simplest type of morphospace can be a 2D plot with the axes defined by two numeric measurements of phenotype, such as the length and width of a skull, though any quantification of phenotype is valid.
Geometric morphometrics has become the quintessential method for quantifying detailed shape variation in bony structures, and the final outcome in many studies is the visualization of a PCA of shape variation - which is a morphospace. However, how one defines the morphospace can drastically impact the outcomes of these adaptive landscape analyses. PCA and between-group PCA (bgPCA) are both valid methods to ordinate morphological data, yet they will produce dramatically different morphospaces, and the choice between them depends on the goals of the analysis. PCA will produce a (largely) unbiased ordination of the major axes of variation, while bgPCA will find axes of variation that maximize differences between groups. Caution should be applied when using constrained ordinations like bgPCA, as they can create ecological patterns where none may actually exist (see Bookstein 2019, Cardini et al. 2019 and Cardini & Polly 2020 for the cautionary debate). However, in many ways bgPCA and other constrained ordinations are ideal for questions regarding functional and adaptive differences between ecological groups when actual morphological differences are present.
For now the adaptive landscape methods in this package are limited to 2D morphospaces, with two axes of phenotypic variation. This is largely because of the complexities involved in analyzing and visualizing multivariate covariance in more than 3 dimensions. However, if more than two axes of phenotypic variation need to be analyzed, this can be done in separate analyses.
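As a minimal sketch of the idea, a 2D morphospace can be taken from the first two axes of an ordinary PCA. The measurements below are simulated and the object names (measurements, morphospace) are hypothetical; only base R's prcomp() is used, not any Morphoscape function.

```r
# Simulated phenotype data: 30 specimens, 5 linear measurements (hypothetical)
set.seed(1)
measurements <- matrix(rnorm(30 * 5), nrow = 30, ncol = 5,
                       dimnames = list(NULL, paste0("trait", 1:5)))

# Ordinate with an unconstrained PCA
pca <- prcomp(measurements, center = TRUE, scale. = FALSE)

# Keep the first two axes as the x/y coordinates of a 2D morphospace
morphospace <- data.frame(x = pca$x[, 1], y = pca$x[, 2])
head(morphospace)
```

Any other ordination that yields two coordinate columns per specimen could be substituted here.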
Collecting performance data: Specimens or Warps
Once an ordination and 2D morphospace of the phenotypic data is defined, one must then collect data on the performance of these phenotypes. There are two approaches one can take to collecting performance data: collect data directly from specimens, or collect data from phenotypic warps in morphospace. The former is the simplest and most direct as long as your dataset is not too large, and allows you to know the actual performance of actual specimens. However, error can creep in if the performance traits do not strongly covary with the axes of your morphospace, which can produce uneven and inconsistent surfaces. In addition, specimens may not evenly occupy morphospace, leaving regions not defined by existing phenotypes; these regions will have no measured performance data and may produce erroneous interpolations or extrapolations of performance. Finally, collecting performance data can be time consuming, and may be impractical for extremely large datasets.
The alternative is to collect performance data from hypothetical warps across morphospace, which eliminates these issues. As long as the method used to define your morphospace is reversible (such as prcomp, geomorph::gm.prcomp, Morpho::prcompfast, Morpho::groupPCA, Morpho::CVA), it is possible to extract the phenotype at any location in morphospace. In fact, many of these ordination functions come with predict methods for this very task. Using prediction, it is therefore possible to define phenotypes, called warps, evenly across all of morphospace. These warps are also useful in that they can be defined to represent phenotypic variation for ONLY the axes of the morphospace, ignoring variation in other axes. Warps can, however, form biologically impossible phenotypes (such as the walls of a bone crossing over one another), which may or may not help your interpretations of why regions of morphospace might be occupied or not.
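The reversibility described above can be sketched in base R for an ordinary PCA: scores on an even grid of the first two axes are multiplied back through the rotation matrix and re-centered. This is only an illustration on simulated data with hypothetical object names; it is not how the warps dataset in this package was produced, and other ordination functions may need their own back-transformation.

```r
# Simulated phenotype data and PCA (hypothetical)
set.seed(1)
measurements <- matrix(rnorm(30 * 5), nrow = 30, ncol = 5)
pca <- prcomp(measurements)

# Evenly spaced 6 x 4 grid of positions across the first two axes
grid_scores <- expand.grid(
  PC1 = seq(min(pca$x[, 1]), max(pca$x[, 1]), length.out = 6),
  PC2 = seq(min(pca$x[, 2]), max(pca$x[, 2]), length.out = 4)
)

# Back-project each grid position to phenotype space.
# Only PC1 and PC2 are used, so variation on all other axes is held at zero.
warps_shape <- as.matrix(grid_scores) %*% t(pca$rotation[, 1:2])
warps_shape <- sweep(warps_shape, 2, pca$center, "+")

dim(warps_shape)  # one row per warp, one column per measurement
```

Each row of warps_shape is a hypothetical phenotype on which performance could then be measured.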
Both these approaches are valid as long as morphospace is reasonably well covered. For an in-depth analysis of morphospace sampling see Smith et al (2021).
This package comes with the turtle humerus dataset from Dickson and Pierce (2019), which uses performance data collected from warps.
library(Morphoscape)
data("turtles")
data("warps")
str(turtles)
#> 'data.frame': 40 obs. of 4 variables:
#> $ x : num 0.03486 -0.07419 -0.07846 0.00972 -0.00997 ...
#> $ y : num -0.019928 -0.015796 -0.010289 -0.000904 -0.029465 ...
#> $ Group : chr "freshwater" "softshell" "softshell" "freshwater" ...
#> $ Ecology: chr "S" "S" "S" "S" ...
str(warps)
#> 'data.frame': 24 obs. of 6 variables:
#> $ x : num -0.189 -0.189 -0.189 -0.189 -0.134 ...
#> $ y : num -0.05161 -0.00363 0.04435 0.09233 -0.05161 ...
#> $ hydro: num -1839 -1962 -2089 -2371 -1754 ...
#> $ curve: num 8.07 6.3 9.7 15.44 10.21 ...
#> $ mech : num 0.185 0.193 0.191 0.161 0.171 ...
#> $ fea : num -0.15516 -0.06215 -0.00435 0.14399 0.28171 ...
turtles is a dataset of coordinate data for 40 turtle humerus specimens that have been ordinated in a bgPCA morphospace, the first two axes maximizing the differences between three ecological groups: Marine, Freshwater and Terrestrial turtles. This dataset also includes these and other ecological groupings.
warps is a dataset of 4x6 evenly spaced warps predicted from this morphospace and 4 performance metrics.
Using Morphoscape
Once a morphospace is defined and performance data collected, the workflow of using Morphoscape is fairly straightforward. Using the warps and turtles datasets, the first step is to make a functional dataframe using as_fnc_df(). The input to this function is a dataframe containing both coordinate data and performance data (and also grouping factors if desired). The first two columns must be coordinates, while the other columns can be defined as performance data or as grouping factors. It is best to name your performance data at this point to keep track of them.
library(Morphoscape)
data("turtles")
data("warps")
# Create an fnc_df object for downstream use
warps_fnc <- as_fnc_df(warps, func.names = c("hydro", "curve", "mech", "fea"))
str(warps_fnc)
#> Classes 'fnc_df' and 'data.frame': 24 obs. of 6 variables:
#> $ x : num -0.189 -0.189 -0.189 -0.189 -0.134 ...
#> $ y : num -0.05161 -0.00363 0.04435 0.09233 -0.05161 ...
#> $ hydro: num 0.763 0.627 0.487 0.174 0.858 ...
#> $ curve: num 0.0544 0 0.1045 0.281 0.1202 ...
#> $ mech : num 0.359 0.473 0.446 0 0.149 ...
#> $ fea : num 0.372 0.458 0.512 0.65 0.777 ...
#> - attr(*, "func.names")= chr [1:4] "hydro" "curve" "mech" "fea"
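In the str() output above, the performance columns fall between 0 and 1 after as_fnc_df(), which suggests they were rescaled to unit range. The sketch below assumes a simple min-max rescaling; this is an assumption for illustration, and the package documentation should be consulted for the exact behaviour. The helper rescale01() is hypothetical, and the input values are the first four hydro and curve entries printed for warps.

```r
# Hypothetical helper: rescale a numeric vector to the range 0-1
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))

# First four rows of two performance columns, as printed by str(warps)
perf <- data.frame(hydro = c(-1839, -1962, -2089, -2371),
                   curve = c(8.07, 6.30, 9.70, 15.44))

# Rescale each performance column independently
scaled <- as.data.frame(lapply(perf, rescale01))
scaled
```

Because only four rows are used here, the rescaled values differ from those printed for warps_fnc, where the minimum and maximum are taken over all 24 warps.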
Creating Performance Surfaces
It is then a simple process to perform surface interpolation by automatic kriging using the krige_surf() function. This will autofit a kriging function to the data using the automap::autoKrige() function. For details on the fitting of variograms you should read the documentation of automap. All the autoKrige fitting data is kept in the kriged_surfaces object along with the output surface.
By default the krige_surf() function will interpolate within an alpha hull wrapped around the inputted datapoints. This is to avoid extrapolation beyond measured datapoints. The hull can be defined using the resample_grid() function, which will supply a grid object defining the points to interpolate, and optionally plot the area to be reconstructed. The strength of wrapping can be controlled using the alpha argument, with smaller values producing a stronger wrapping.
# Create alpha-hulled grid for kriging
grid <- resample_grid(warps, hull = "concaveman", alpha = 3, plot = TRUE)
kr_surf <- krige_surf(warps_fnc, grid = grid)
#> [using ordinary kriging]
#> [using ordinary kriging]
#> [using ordinary kriging]
#> [using ordinary kriging]
kr_surf
#> A kriged_surfaces object
#> - functional characteristics:
#> hydro, curve, mech, fea
#> - surface size:
#> 70 by 70
#> α-hull applied (α = 3)
#> - original data:
#> 24 rows
plot(kr_surf)
However, if one wishes to also extrapolate to the full extent of morphospace, set hull = NULL. Because the warps dataset evenly samples morphospace, we can set hull = NULL and reconstruct the full rectangle of morphospace. When hull = NULL, an amount of padding will also be applied to provide some space beyond the supplied datapoints; this can be controlled using the padding argument. Finally, we can also specify the density of interpolated points using the resample argument.
# Create a full rectangular grid (no hull) for kriging
grid <- resample_grid(warps, hull = NULL, padding = 1.1)
# Do the kriging on the grid
kr_surf <- krige_surf(warps_fnc, grid = grid)
#> [using ordinary kriging]
#> [using ordinary kriging]
#> [using ordinary kriging]
#> [using ordinary kriging]
kr_surf
#> A kriged_surfaces object
#> - functional characteristics:
#> hydro, curve, mech, fea
#> - surface size:
#> 100 by 100
#> - original data:
#> 24 rows
plot(kr_surf)
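To make the roles of padding and resample concrete, the following base R sketch builds a padded rectangular grid around a set of coordinates. It is not the implementation of resample_grid(): the helpers pad_range() and make_grid() are hypothetical, the coordinates are made up, and it is an assumption here that padding acts as a multiplier on each axis range about its midpoint.

```r
# Hypothetical helper: widen the range of v by a multiplicative padding factor
pad_range <- function(v, padding) {
  mid  <- mean(range(v))
  half <- diff(range(v)) / 2 * padding
  c(mid - half, mid + half)
}

# Hypothetical helper: resample x resample grid over the padded rectangle
make_grid <- function(x, y, resample = 100, padding = 1.1) {
  rx <- pad_range(x, padding)
  ry <- pad_range(y, padding)
  expand.grid(x = seq(rx[1], rx[2], length.out = resample),
              y = seq(ry[1], ry[2], length.out = resample))
}

# Made-up, evenly spaced coordinates standing in for warp positions
coords <- expand.grid(x = seq(-0.2, 0.1, length.out = 6),
                      y = seq(-0.05, 0.1, length.out = 4))

grid_sketch <- make_grid(coords$x, coords$y, resample = 100, padding = 1.1)
nrow(grid_sketch)  # 100 * 100 points to interpolate
```

A larger resample gives a denser surface at the cost of more kriging predictions.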
This reconstructed surface is missing actual specimen data points with associated ecological groupings, which are needed for later analyses. These can be added using the krige_new_data() function.
# Do kriging on the sample dataset
kr_surf <- krige_new_data(kr_surf, new_data = turtles)
#> [using ordinary kriging]
#> [using ordinary kriging]
#> [using ordinary kriging]
#> [using ordinary kriging]
kr_surf
#> A kriged_surfaces object
#> - functional characteristics:
#> hydro, curve, mech, fea
#> - surface size:
#> 100 by 100
#> - original data:
#> 24 rows
#> - new data:
#> 40 rows
plot(kr_surf)
This can, of course, all be done in one step. krige_surf() will automatically call resample_grid() if no grid argument is supplied, and new_data can be supplied as the data against which later group optima are calculated.
# Above steps all in one:
kr_surf <- krige_surf(warps_fnc, hull = NULL, padding = 1.1,
new_data = turtles)
#> [using ordinary kriging]
#> [using ordinary kriging]
#> [using ordinary kriging]
#> [using ordinary kriging]
#> [using ordinary kriging]
#> [using ordinary kriging]
#> [using ordinary kriging]
#> [using ordinary kriging]
kr_surf
#> A kriged_surfaces object
#> - functional characteristics:
#> hydro, curve, mech, fea
#> - surface size:
#> 100 by 100
#> - original data:
#> 24 rows
#> - new data:
#> 40 rows
plot(kr_surf)
Calculate Landscapes
The next step is to calculate a distribution of adaptive landscapes. Each adaptive landscape is constructed as the summation of the performance surfaces in differing magnitudes. Each performance surface is multiplied by a weight ranging from 0 to 1, and the weights sum to 1. For four equally weighted performance surfaces, the weights would be the vector c(0.25, 0.25, 0.25, 0.25). To generate a distribution of different combinations of these weights, use the generate_weights() function.
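The weighted summation can be sketched in a few lines of base R. The surfaces here are random numbers standing in for four kriged performance surfaces flattened to one value per grid cell; the object names are hypothetical and no Morphoscape function is called.

```r
# Hypothetical performance surfaces: 25 grid cells x 4 performance traits,
# each already scaled to the range 0-1
set.seed(42)
n_cells <- 25
surfaces <- cbind(hydro = runif(n_cells), curve = runif(n_cells),
                  mech  = runif(n_cells), fea   = runif(n_cells))

# One weight per performance surface; the weights must sum to 1
w <- c(hydro = 0.25, curve = 0.25, mech = 0.25, fea = 0.25)
stopifnot(isTRUE(all.equal(sum(w), 1)))

# The adaptive landscape is the weighted sum of the surfaces at every cell
landscape <- as.vector(surfaces %*% w)
head(landscape)
```

With equal weights the landscape is simply the mean of the four surfaces at each cell; unequal weights tilt it towards the more heavily weighted traits.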
generate_weights() must have either a step or n argument to determine how many allocations to generate, and either nvar, the number of variables, or the fnc_df or kr_surf object supplied to the data argument. step determines the step size between weight values: step = 0.1 will generate the vector c(0, 0.1, 0.2 ... 1). Alternatively, one can set the number of steps in the sequence with n: n = 10 will generate the same vector c(0, 0.1, 0.2 ... 1). The function will then generate all combinations of nvar variables that sum to 1.
For four variables, step = 0.1 will produce 286 combinations, and a step size of 0.05 will produce 1771 rows. As the step size gets smaller and the number of variables increases, the number of output rows grows very rapidly. It is recommended to start with a large step size, or a small n, to ensure things are working correctly.
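The row counts quoted above follow from a standard counting argument: splitting n equal steps among nvar variables can be done in choose(n + nvar - 1, nvar - 1) ways. The helper n_weight_rows() below is hypothetical and only reproduces the arithmetic in base R; it does not call generate_weights().

```r
# Hypothetical helper: number of weight combinations for n steps and nvar variables
n_weight_rows <- function(n, nvar) choose(n + nvar - 1, nvar - 1)

n_weight_rows(10, 4)  # step = 0.1  -> 286
n_weight_rows(20, 4)  # step = 0.05 -> 1771

# Brute-force check for n = 10: enumerate every allocation of 0..10 steps
# to four variables and keep those that use exactly 10 steps in total
steps  <- 0:10
combos <- expand.grid(rep(list(steps), 4))
n_brute <- sum(rowSums(combos) == 10)
n_brute               # 286
```

Working in integer steps rather than decimal weights avoids floating-point comparison problems when checking that a row sums to 1.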
# Generate weights to search for optimal landscapes
weights <- generate_weights(n = 10, nvar = 4)
#> 286 rows generated
weights <- generate_weights(step = 0.05 , data = kr_surf)
#> 1771 rows generated
This weights matrix is then provided to calc_all_lscps() along with the kr_surf object to calculate the landscapes for each set of weights. For calculations with a large number of weights, outputs can take some time and can be very large in size, so it is recommended to use the file argument to save the output to file.
# Calculate all landscapes; setting verbose = TRUE produces
# a progress bar
all_landscapes <- calc_all_lscps(kr_surf, grid_weights = weights)
With a distribution of landscapes, it is now possible to find landscapes that maximize ‘fitness’ for a given subset or group of your specimens in new_data using the calcWprimeBy() function. The by argument sets the grouping variable for the data provided in new_data from earlier. There are several ways by can be set: a one-sided formula containing the name of a column in new_data, or a vector containing a factor variable.
# Calculate optimal landscapes by Group
table(turtles$Ecology)
#>
#> M S T
#> 4 29 7
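# by as a one-sided formula naming a column of new_data
wprime_by_Group <- calcWprimeBy(all_landscapes, by = ~Ecology)
# by as a vector containing the grouping variable
wprime_by_Group <- calcWprimeBy(all_landscapes, by = turtles$Ecology)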
wprime_by_Group
#> - turtles$Ecology == "M"
#>
#> Optimal weights:
#> Weight SE SD Min. Max.
#> hydro 0.104167 0.029167 0.10104 0.0 0.30
#> curve 0.008333 0.005618 0.01946 0.0 0.05
#> mech 0.012500 0.006528 0.02261 0.0 0.05
#> fea 0.875000 0.025746 0.08919 0.7 1.00
#>
#> Average fitness value at optimal weights:
#> Value SE SD Min. Max.
#> Z 0.7612 0.004671 0.01618 0.7437 0.7927
#> -----------------------------------------
#> - turtles$Ecology == "S"
#>
#> Optimal weights:
#> Weight SE SD Min. Max.
#> hydro 0.85469 0.009878 0.05588 0.8 1.00
#> curve 0.05312 0.010025 0.05671 0.0 0.20
#> mech 0.05312 0.010025 0.05671 0.0 0.20
#> fea 0.03906 0.007691 0.04350 0.0 0.15
#>
#> Average fitness value at optimal weights:
#> Value SE SD Min. Max.
#> Z 0.7491 0.002325 0.01315 0.7326 0.7835
#> -----------------------------------------
#> - turtles$Ecology == "T"
#>
#> Optimal weights:
#> Weight SE SD Min. Max.
#> hydro 0.0125 0.00559 0.02236 0.00 0.05
#> curve 0.8500 0.02415 0.09661 0.65 1.00
#> mech 0.1250 0.02661 0.10646 0.00 0.35
#> fea 0.0125 0.00559 0.02236 0.00 0.05
#>
#> Average fitness value at optimal weights:
#> Value SE SD Min. Max.
#> Z 0.7538 0.003712 0.01485 0.736 0.7852
#>
#> - method: chi-squared, quantile = 0.05
summary(wprime_by_Group)
#> Optimal weights by turtles$Ecology:
#> W_hydro W_curve W_mech W_fea Z
#> M 0.104167 0.008333 0.012500 0.875000 0.761245
#> S 0.854688 0.053125 0.053125 0.039062 0.749055
#> T 0.012500 0.850000 0.125000 0.012500 0.753825
plot(wprime_by_Group, ncol = 2)
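The general idea of searching weight combinations for a group optimum can be sketched in base R. This is a deliberately simplified illustration: it keeps the top 5% of combinations by mean landscape height and averages their weights, whereas the output above reports a chi-squared method whose details are not reproduced here. All data and object names are hypothetical.

```r
set.seed(7)
# Hypothetical scaled performance values at 12 specimen locations
perf_at_specimens <- cbind(hydro = runif(12), curve = runif(12),
                           mech  = runif(12), fea   = runif(12))
group <- rep(c("A", "B"), each = 6)  # hypothetical grouping

# All weight combinations in steps of 0.25 that sum to 1
steps   <- 0:4
combos  <- expand.grid(hydro = steps, curve = steps, mech = steps, fea = steps)
weights <- as.matrix(combos[rowSums(combos) == 4, ]) / 4

# Landscape height of every specimen under every weight combination
# (rows = weight combinations, columns = specimens)
heights <- weights %*% t(perf_at_specimens)

# Mean height of group A under each combination
mean_A <- rowMeans(heights[, group == "A"])

# Keep the best 5% of combinations and average their weights
top <- mean_A >= quantile(mean_A, 0.95)
optimal_A <- colMeans(weights[top, , drop = FALSE])
optimal_A
```

Because every retained row of weights sums to 1, the averaged optimal weights also sum to 1.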
It is also possible to use calcGrpWprime() to calculate the optimal landscape for a single group or for the entire sample:
# Calculate landscapes for one Group at a time
i <- which(turtles$Ecology == "T")
wprime_T <- calcGrpWprime(all_landscapes, index = i)
wprime_T
#> Optimal weights:
#> Weight SE SD Min. Max.
#> hydro 0.0125 0.00559 0.02236 0.00 0.05
#> curve 0.8500 0.02415 0.09661 0.65 1.00
#> mech 0.1250 0.02661 0.10646 0.00 0.35
#> fea 0.0125 0.00559 0.02236 0.00 0.05
#>
#> Average fitness value at optimal weights:
#> Value SE SD Min. Max.
#> Z 0.7538 0.003712 0.01485 0.736 0.7852
#>
#> - method: chi-squared, quantile = 0.05
wprime_b <- calcGrpWprime(all_landscapes, Group == "box turtle")
wprime_b
#> Optimal weights:
#> Weight SE SD Min. Max.
#> hydro 0.01944 0.007162 0.03038 0.0 0.1
#> curve 0.86944 0.018140 0.07696 0.7 1.0
#> mech 0.09167 0.021862 0.09275 0.0 0.3
#> fea 0.01944 0.007162 0.03038 0.0 0.1
#>
#> Average fitness value at optimal weights:
#> Value SE SD Min. Max.
#> Z 0.7746 0.003492 0.01482 0.7565 0.8073
#>
#> - method: chi-squared, quantile = 0.05
plot(wprime_b)
wprime_all <- calcGrpWprime(all_landscapes)
wprime_all
#> Optimal weights:
#> Weight SE SD Min. Max.
#> hydro 0.74853 0.008797 0.08885 0.6 1.00
#> curve 0.10735 0.009984 0.10084 0.0 0.40
#> mech 0.08578 0.008342 0.08425 0.0 0.35
#> fea 0.05833 0.006156 0.06217 0.0 0.25
#>
#> Average fitness value at optimal weights:
#> Value SE SD Min. Max.
#> Z 0.6343 0.001179 0.01191 0.6193 0.67
#>
#> - method: chi-squared, quantile = 0.05
Finally, it is possible to compare the landscapes for each group using multi.lands.grp.test(). This is done by comparing the number of landscape combinations that are shared in the upper percentile between groups.
# Test for differences between Group landscapes
tests <- multi.lands.grp.test(wprime_by_Group)
tests
#> Pairwise landscape group tests
#> - method: chi-squared | quantile: 0.05
#>
#> Results:
#> M S T
#> M - 0 0
#> S 0 - 0
#> T 0 0 -
#> (lower triangle: p-values | upper triangle: number of matches)
# Calculate landscapes for one Group at a time
wprime_t <- calcGrpWprime(all_landscapes, Group == "tortoise")
wprime_t
#> Optimal weights:
#> Weight SE SD Min. Max.
#> hydro 0.01176 0.005302 0.02186 0.00 0.05
#> curve 0.85294 0.022877 0.09432 0.65 1.00
#> mech 0.11765 0.026059 0.10744 0.00 0.35
#> fea 0.01765 0.007353 0.03032 0.00 0.10
#>
#> Average fitness value at optimal weights:
#> Value SE SD Min. Max.
#> Z 0.7008 0.003571 0.01472 0.6829 0.7325
#>
#> - method: chi-squared, quantile = 0.05
# Test for differences between Group landscapes
lands.grp.test(wprime_b, wprime_t)
#> Landscape group test
#> - method: chi-squared | quantile: 0.05
#>
#> Number of matches: 16
#> P-value: 0.8889
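The "number of matches" reported above can be thought of as the overlap between two groups' sets of best-performing weight combinations. The sketch below counts that overlap in base R using random fitness values; the helper top_idx() and all data are hypothetical, and the p-value calculation used by the package is not reproduced.

```r
set.seed(3)
# Hypothetical mean fitness of two groups under each of 286 weight combinations
n_combos <- 286
fit_A <- runif(n_combos)
fit_B <- runif(n_combos)

# Hypothetical helper: indices of combinations in the top q fraction of fitness
top_idx <- function(fit, q = 0.05) which(fit >= quantile(fit, 1 - q))

top_A <- top_idx(fit_A)
top_B <- top_idx(fit_B)

# Combinations that are near-optimal for both groups
matches <- length(intersect(top_A, top_B))
matches
```

Few or no shared combinations suggests the two groups are optimized for different trade-offs among the performance traits, while many shared combinations suggests similar landscapes.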
References
Bookstein, F. L. (2019). Pathologies of between-groups principal components analysis in geometric morphometrics. Evolutionary Biology, 46(4), 271-302.
Cardini, A., O’Higgins, P., & Rohlf, F. J. (2019). Seeing distinct groups where there are none: spurious patterns from between-group PCA. Evolutionary Biology, 46(4), 303-316.
Cardini, A., & Polly, P. D. (2020). Cross-validated between group PCA scatterplots: A solution to spurious group separation?. Evolutionary Biology, 47(1), 85-95.
Dickson, B. V., & Pierce, S. E. (2019). Functional performance of turtle humerus shape across an ecological adaptive landscape. Evolution, 73(6), 1265-1277.
Smith, S. M., Stayton, C. T., & Angielczyk, K. D. (2021). How many trees to see the forest? Assessing the effects of morphospace coverage and sample size in performance surface analysis. Methods in Ecology and Evolution, 12(8), 1411-1424.