21. Principal Warps, Relative Warps, and Procrustes PCA
Written by Norm MacLeod - The Natural History Museum, London, UK (email: email@example.com). This article first appeared in the Nº 75 edition of Palaeontology Newletter.
First, let’s remind ourselves of the ultimate goals for any geometric morphometric analysis - to define a mathematical space in which we can compare sets of landmark configurations that (1) ordinates shapes on the basis of their similarity, (2) treats these configurations as a whole entity rather than an accumulation of independent parts, (3) respects the conventions of the Kendall shape space, (4) supports shape modelling, and (5) is stable in the face of minor changes to the sample and/or reference shape. If we think back many columns ago now, a classic principal components analysis (PCA) of linear distances between landmarks ordinates shapes on the basis of their mutual similarity and is reasonably stable in the face of minor changes to the sample. The spaces formed by classical PCA can also be modelled, albeit with difficulty (Gnanadesikan 1977, Everitt 1978). But sets of linear distance data do not comprise a geometric entity in their own right or conform to the strictures of the Kendall shape space (see MacLeod 2009a). Accordingly, this approach is not considered especially ‘geometric’ in its treatment of morphometric data.
The thin plate spline (TPS) is technique that creates models of shapes described by landmark configurations as unified deformations. As such, the TPS is not a shape ordination method at all. Rather, it’s a graphical tool that can be used to compare any two landmark-defined shapes.
Principal and partial warps ordinate landmark configuration-defined shapes on the basis of their mutual similarity and support shape modelling. In a sense though, these methods ignore the Kendall shape space entirely insofar as they are based on a single shape — the reference shape — that is used to define a series of hypothetical deformations based on the arrangement and spatial scale of that shape’s landmarks. These deformations are then used to create a set of spatially ordered deformational modes that can be used as shape-variation descriptors. While these descriptors (or warps) are consistent with the conventions of the Kendall shape space, they don’t exploit its power.
So, despite having spent the last four — arguably the last six — columns developing aspects of the tools we need to realize our goal of achieving a truly geometric description of shape variation, we don’t seem to be there just yet. Nevertheless, today, we’ll arrive at our destination.
Most presentations of relative warps follow on from a discussion of principle warps. While this is perfectly reasonable from a mathematical point of view, the convention has led to substantial and largely needless confusion over the nature of relative warps. I’ll try to clear this confusion up here and, at the end, provide an easy way to calculate the bit of a relative warps analysis most morphometricians are interested in.
Recall, principal warps are the principle components of the bending energy matrix (Lp-1). This the inverse of the matrix Lp that expresses the spatial pattern of proximities of the landmark configuration’s shape coordinates.
In equation 21.2, r2ij is the square of the distance between the shape coordinates of landmarks i and j in the reference configuration and ln is the natural logarithm function (base e).
If the Lp matrix expresses the proximity of landmarks to each other in the shape coordinate space, its inverse expresses the the reciprocal of proximity. Accordingly, in the bending energy matrix relatively large values are assigned to comparisons between landmarks that lie proximate to one another and relatively small values to landmarks that lie at a distance from one another.
Taking the inverse of the Lp matrix quantifies our metaphor of shape change as a deformation that minimizes the ‘energy’ required to map one configuration of landmarks into another when that mapping is expressed as an interpolation surface or plate. This mode of interpolation differs from the more widely used elastic interpolation model because elastic deformations do not attempt to achieve global minimization of overall amount of deformation specified by the interpolation. In quite a profound sense use of the thin-plate spline metaphor encompasses the philosophical stance of trying to explain the features of nature by invoking models of minimal change. However, it needs to emphasized that, while this is convenient underlying, logical assumption and an elegant mathematical constraint, it may not adequately express the manner in which shape changes actually came about from either mechanistic biological or evolutionary perspectives. It is also very important to remember that the bending energy matrix is derived solely from information supplied by a single shape - the reference shape.
An eigenanalysis of the bending energy matrix (Lp-1) defines the set of principal warps. These are a set of mutually uncorrelated, non-linear modes of shape variation ordered in terms of spatial scale. The eigenvalues (principal values) derived from this eigenanalysis represent the relative amount of bending energy subsumed by each deformation mode.
The eigenvectors (principal warps) represent the geometries of the deformation modes themselves. High-energy modes express deformations whose geometries are relatively localized. Low-energy modes express deformations whose geometries are relatively generalized. Regardless, all modes specify a pattern of deformation that encompasses all landmarks. Although the differences between these modes lie in the extent of their relative regionalization, none can be regarded as being strictly regionalized in the sense that they involve only a subset of the existing landmarks. Computationally, all landmarks are always included in — and must taken into consideration when interpreting — all principal warps.
The only advantage principal warps provide is a means whereby the configuration of a reference shape’s landmarks is transformed from a simple set of shape coordinate values to a complex series of spatially ordered deformational modes. When these modes are taken together they constitute a redescription of the original bending energy matrix. This is analogous to a standard covariance-based PCA of any data set. A PCA does nothing more (or less) than provide a redescription of the original data in terms of a series of variance-ordered vectors (components) formed from the original variables (see MacLeod 2005). Like PCA, principal warps can be used to form the axes of a high-dimensional coordinate system into which landmark configurations other than the reference shape can be projected and the set of projections viewed as an ordination plot (see MacLeod 2010a). Such plots provide a visual sense of the degree to which these non-reference landmark configurations are similar to, or differ from, the reference configuration in a manner that is weighted by the geometric mode of deformation being expressed by each principal warp.
The easiest way to achieve this projection is to simply multiply the matrix of eigenvectors of the bending energy matrix (E, the principal warps) by the matrix of deviations of the landmarks of the shapes you wish to project into the principal warps space in the x (X’) or y (Y’) directions from the reference shape. This yields the weight matrix (W).
Bookstein (1991) suggested that, prior to this multiplication, the principal warps matrix be scaled by the inverse of the square roots of the principal values. The geometric result of this weighting is to emphasize large-scale deformations when determining the final values of W. However, like all weighting schemes in data analysis, this re-weighting needs to be justified in the context of each analysis.
Operationally, this weighting is accomplished by the following equation.
where Λ is the diagonalized matrix of principal values and α is the weighting factor. For Bookstein’s (1991) weighting scheme α = 1. If α > 0 large-scale variations will be weighted more highly in the determination of W. If α < 0 smaller-scale variations will receive greater weight. If α = 0 variations at all scales will be accorded equal weight.
Because of the obligation to justify all weighting schemes, a prudent default value for any principal warps analysis is α = 0. Nevertheless, the ability to set the α parameter to any desired value does give the analyst scope (albeit limited and rather crude) to fine-tune their analysis by allowing it to be focused, to a greater or lesser extent, on a generalized category of spatial variation. While adjustment of α can make a dramatic difference to principal warp ordinations, user’s should avoid the temptation use this parameter to try to make any partial warps ordination fit any particular hypotheses. Usually such an exercise is futile owing to the non-linear character of the principal warps themselves; they just don’t behave in a regular, predictable manner. However, in all cases such post hoc adjustments are indefensible. [Note: Those interested in assessing the effect of adjusting the α parameter should consult the Palaeo-math 101-2: Principal-Partial Warps spreadsheet, which can be downloaded from the Palaeo-math 101-2 web page, (see below).]
Once we’ve re-expressed the shapes in our sample as a set of W-matrix scores the hard part of relative warps analysis is over — or so most textbooks would have you believe. A classic relative warps analysis takes the complete set of these scores for both Wx and Wy and uses these as input into a standard covarince-based PCA.
On first inspection you might wonder “What’s the point of that? After all, a covariance-based PCA of a complete set of PCA scores should yield the original set of PCA scores. Nothing is gained by doing a PCA of a PCA. But recall the basis for a principal warps analysis is not the sample of shapes you’re interested in, but only the spatial information supplied by the reference shape. Moreover, the bending energy matrix is not a complete representation of shape variation within the reference shape, only the non-linear (= non-uniform) part thereof.
This strict dependence of the principal warps on the reference shape is the source of its most interesting and seductive feature: the fact that the principal warps are sample independent. Because of this feature the principal warps can be used as a geometric reference system that is completely indepedent of any sample. Unfortunately, it also means that each system of principal warps is fundamentally tied to what is essentially an arbitrary choice of reference. This choice can be made a bit less arbitrary by adopting the standard convention of using the sample mean shape as the reference. But, while this convention has the very desirable property of ensuring the linear relative warp spaces defined as combinations of the non-linear principal warp deformation modes are placed at a reasonable location within the set of shapes of interest, it also means the analyst has sacrificed the sample independence of their principle warps analysis unless the (equally arbitrary) decision is made to stop computing or updating the mean shape for other samples or subsequent analyses.
Setting these issues aside, a PCA of the total W matrix will result in a summary of shape variation that’s been optimised for a particular sample. If pursued in the standard mode, this summary will focus strictly on the non-linear aspects of shape variation (e.g., those that have a bending energy). It will also encompass variation at all spatial scales, though these might be differentially weighted (see the discussion of the α parameter, above).
Going back to our original goals for a generalized shape analysis system, relative warps analysis provides the best fit in all categories: it ordinates landmark-based shape configurations on the basis of their mutual similarity, treats these configurations as a unified geometric entities, respects the conventions of the Kendall shape space, supports shape modelling across all aspects of the geometric spaces formed by the relative warps either through thin plate splines or through direct back-calculation to the modelled landmark positions (see MacLeod 2009b, 2010b, and below), and owing to its sample-based character does not display the instabilities that come from referencing the shape spaces calculated to a single real or hypothetical specimen. In addition, relative warps analysis is also quite flexible in terms of the data it will accept. For example, if it would be advantageous to add-in scores calculated on the basis of the uniform component of shape variation, this is easily accommodated under relative warps analysis. It’s also possible to use the α parameter to focus the secondary relative warps analysis on variation existing as higher or lower spatial scales, provided there’s a clear justification for doing so (e.g., a desire to investigate allometric relations among the shape variables). Although not usually recommended, it’s even conceivable, at least in principle, to envision a relative warps analysis conducted on a subset of the principle warps thereby achieving a more complete contrast between shape similarity patterns at higher and/or lower spatial scales. These and many other data analysis variations are all possible in the context of relative warps analysis.
To illustrate the calculations involved in, and the interpretations that can be made from, relative warps results, let’s take our trilobite cranidial landmark data through the procedure. To do this we’ll use the complete set of 2(2k-3) principal warps weights (= scores, where the number of landmarks [k] = 10 for the trilobite dataset) and the weights on the uniform component of shape change (see the Palaeo-math 101: Relative Warps spreadsheet). Table 1 shows the eigenvalue data table for the covariance-based decomposition of these data.
Unlike principal warps none of the calculated eigenvector axes have been forced to adopt a value of 0.0 by the Procrustes alignment, though some of the eigenvalues for the higher-level relative warps are quite small. For these data 95 percent of the observed shape variation in the plane tangent to the Procrustes shape hemisphere at the sample mean shape is represented by the first seven relative warps.
The complete table of eigenvectors for this relative warps decomposition is too large to list here (see the Palaeo-math 101: Relative Warps spreadsheet). The loading coefficients for the first three of these relative warps axes are listed in Table 2. Together these relative warps account for over 75 percent of the shape variation recorded in our data. Looking in detail at this subset of the complete relative warps result will suffice for the purposes of our discussion.
The relative warps eigenvectors represent a set of displacements at each landmark location across the form as mediated by the non-uniform and uniform shape deformations specified by the partial warps scores (= weights) that served as the variables in this analysis. These scores themselves denote a varying system of weights applied to each partial warp variable that, together, summarize all observed shape-based variations exhibited by the 18 trilobite specimens included in the sample, ordered by the amount of shape variance being summarized along each relative warp.
This loading table may be interpreted in a manner identical to that of a standard principal components loading table. For the trilobite data the first relative warp axis (RW-1) expresses a geometric contrast between partial warps 1y and (to a lesser extent) 6y with respect to the Uniformx warp and (to a lesser extent) partial warps 5x and 2x. Specimens projecting to positions high on the RW-1 axis represents shapes that exhibit high covariance with partial warps 1y & 6y, and low covariance with partial warps 5x and 2x and with the uniform component of shape change long the x-axis. All the other relative warp axes are be interpreted in a similar manner.
The scores along these relative warp axes represent covariances between the observed shapes and these sets of latent, non-linear shape variables. These scores are calculated in a manner identical to that of PCA scores and can be plotted in two or three dimensions to assemble a picture of the dominant patterns of shape similarity/difference existing within the sample (Fig. 1).
As we have seen before in other analyses, the dominant shape contrast in this dataset occurs between the landmark-defined shapes of Acaste-Ceraurus and Sphaerexochus with the most important subdominant contrast being that between Acaste-Ptychoparia-Sphaerexochus and Deiphon. On the basis of this analysis Deiphon and Sphaerexochus also can be seen to relative unique shapes within this sample — shape outliers in a sense — while the landmark-defined shapes for all other the other genera form a broad band of shape variation oriented at an angle to the two dominant shape-variation trends.
Take the time to note how different this representation of shape variation within the sample is from any of the principal warps scatterplots I included in of the last column (MacLeod 2010a, Fig. 4). The shape variation information present in each of those principal warps plots has been included in the construction of Figure 1 (above). In the same way as a scatterplot of a PCA scores from any data set will look very different from plots of any two included variables, Figure 1 represents of summary of the information included in all the partial warp plots. This is a primary reason why relative warps are preferred to principal warps for most morphometric applications.
More than this however, Figure 1 represents the projection — in a linear space — of the positions of the various trilobite landmark configurations that represent these genera in their geometrically correct places on the surface of the Procrustes shape hemisphere. As such, this plot represents a better summary of geometric shape variation in these data than any other available to us at this time. We can (and will) collect other sorts of data from these specimens and take a look at what alternative summaries of cranidium geometry might tell us in upcoming columns. But so far as these landmark data are concerned, we have, at last, reached the end of our data analysis journey. There is no better summary of the geometry of these data that I can show you or teach you how to calculate. The only thing that remains is for you to calculate these types of summaries for datasets of your own.
But I do have one last trick up my sleeve which you might fine interesting. As I’m certain you appreciate, taking the path to a relative warps analysis that leads through principal warps analysis is conceptually complex and computationally intensive. Software can ease the computational load, but not the conceptual intricacies of selecting reasonable options and interpreting the results. Is there no shorter, more direct route between our data and the relative warps results we need to to use to interpret those data? As it turns out, there is.
The more direct solution to the calculation of relative warp ordinations is implicit in prior published discussions of the relative warps technique and, indeed, implicit in the presentation you’ve just read. The problem is, unless you were already very familiar with the principal warps and/or very experienced in reading descriptions of mathematical procedures, you probably missed it.
Recall I said that principal warps constituted a “redescription of the original data in terms of a series of variance-ordered vectors (components) formed from the original variables”. Recall also that I said “a covariance-based PCA of a complete set of PCA scores should yield the original set of PCA scores’. A covariance-based PCA of the complete set of PCA scores obtained from any dataset will be precisely the same as a PCA of the original data, save for minor differences due to rounding error. Accordingly, one might suppose that, since the complete set of principal warp weights is a redescription of the original Procrustes suposed shape coordinate data, and since a standard relative warps analysis is a PCA of the complete principal warp weight matrix (W), the same result could be obtained directly from a PCA of the Procrustes suposed shape coordinates.
Table 3 lists the eigenvalue table the PCA decomposition of the original Procrustes aligned trilobite cranidium landmark data.
Note the close correspondence to the values listed in Table 1 (above). Similarly, Table 4 listed the eigenvectors of the PCA decomposition of the original Procrustes aligned trilobite cranidium landmark data.
These vectors are aligned differently than those derived from the principal warps data (compare with Table 2, above). After all, there are 20 variables in this system and only 16 in the principal warps system. Nevertheless, when the the original cranidial landmark data are projected into the space formed by the first two Procrustes principal components are plotted …
Figure 2. Scatterplot of trilobite cranidium scores in the plane of the first two principal components of the Procrustes superposed shape coordinate data.
… the resultant ordination is essentially identical to that of the formal relative warps ordination (compare with Fig. 1). Note this ordination is also identical to the one we generated in our discussion of Procrustes shape coordinates (see MacLeod 2009c).
The close link between relative warps analysis and a PCA of Procrustes-aligned shape coordinate data has been known, appreciated, and used by experienced morphometricians for many years, even to the extent that it is routinely alluded to in presentations of the method at technical meetings. But for some reason this useful equivalence has only rarely made it into published articles and textbook treatments, and even then the relation tends to described in obscure ways.
For example, in the Zelditch et al. (2004) morphometrics primer the term ‘relative warps’ is not included whereas a discussion of Procrustes PCA is. The fact that the former is absent from the text because the latter has been included is not mentioned. Similarly, Jim Rohlf’s tpsRelw program requires users to calculate the principal warps decomposition before they can produce a relative warps result. This reinforces the impression that principal warps analysis is a necessary praecursor to relative warps analysis. While taking the formal route through principal warps might be necessary if a user intended to take advantage of the ability to use the α parameter to alter the spatial focus of the resulting relative warps analysis, the default value of the α parameter in Jim’s programme is set to 0 making the ordination identical that which would be obtained more directly from a Procrustes PCA.
Add to this the fact that the Procrustes PCA alternative also produces a set of eigenvector loadings that can be interpreted more readily in terms of the original superposed shape coordinates, and that can be used to create thin plate spine models of the deformations characterizing any part of the ordination space in a straight-forward and computationally simpler (= more easily understood) manner than the method required by formal calculation from the principal warps weight matrix, and you can appreciate the clear advantages of performing this analysis via Procrustes PCA than by calculation from principal warps weights (= scores).
Finally, Table 5 illustrates the advantages of calculating the along-axis shape models when making interpretation of the Procrustes PCA/relative warps axes, with the models represented (in this case) as thin plate splines (see MacLeod 2010b; note the modelling method discussed in MacLeod 2009b could also have been used as an alternative). Comparing the geometry of these models down each shape space axis makes the geometric interpretation of each axis a quick and easy process.
For our trilobite data, the Procrustes PCA/relative warps Axis 1 represents a dominant shortening of the cranidium in the anterio-posterior direction and a subordinate asymmetrical twisting of the shape from right to left down the axis. This twisting is, in all likelihood not a biological signal but rather a preservational artefact present in the specimens used in this dataset and emphasized as an important shape-variation trend (primarily) due to the small number of specimens included in this dataset. Along the Procrustes PCA/relative warps Axis 2 the (likely) artifactual twisting is also present, but this time as the dominant mode of shape variation and oriented in the opposite sense (from left to right) as one moves down that axis. As a statement of the power of the Procrustes PCA/relative warps approach to shape analysis it’s worth noting here that none of the other shape analysis procedures we’ve submitted these trilobite cranidial data to have revealed the preservation issues existing within this set of trilobite specimens in so clear and obvious a manner.
All of the analyses performed for this essay were undertaken using Mathematica routines that I would be happy to supply to readers on request. Finally, all of the calculations needed to perform a Procrustes PCA could also be done in MS-Excel provided a plug-in module has been installed to allow MS-Excel to calculate an eigenanalysis (e.g., PopTools, http://www.cse.csiro.au/poptools/). Now you really have no excuse not to start using Procrustes PCA/relative warps analysis today.
Everitt, B. 1978. Graphical techniques for multivariate data. Amsterdam, North Holland, 117 pp.
Gnanadesikan, R. 1977. Methods for statistical data analysis of multivariate observations. New York, John Wiley & Sons Inc., 311 pp.
MacLEOD, N. 2005. Principal components analysis (eigenanalysis & regression 5). Palaeontological Association Newsletter, 59, 42–54.
MacLEOD, N. 2009a. Shape theory. Palaeontological Association Newsletter, 71, 34–47.
MacLEOD, N. 2009b. Form & shape models. Palaeontological Association Newsletter, 72, 14–27.
MacLEOD, N. 2009c. Who is Procrustes and what has he done with my data? Palaeontological Association Newsletter, 70, 21–36.
MacLEOD, N. 2010a. Principal & partial warps. Palaeontological Association Newsletter, 74, 35–45.
MacLEOD, N. 2010b. Shape models II: the thin plate spline. Palaeontological Association Newsletter, 73, 24–39.
Zelditch, M. L., D. L. Swiderski, H. D. Sheets, and W. L. Fink. 2004. Geometric morphometrics for biologists: a primer. Amsterdam, Elsevier/Academic Press, 443 pp.