In the last two articles of this series we have chosen ways to code characters, constructed a data
matrix, analysed the matrix under several options of parsimony analysis (exhaustive, branch &
bound, heuristic) and looked at the behaviour of characters on the resulting trees/cladograms.
This all came with a bewildering array of details about CI, HI, RI etc. This article delves into
what these details mean. They revolve around how the characters are fitted, or optimised,
onto trees.
When PAUP* analyses a data matrix it searches for the optimal network that minimises the
number of character changes that must be assumed that is the essence of cladistic analysis.
But it is a network and not a tree. To convert one into another requires that we specify a
root. A root may be several taxa, one that you specify, or the first taxon in the data matrix (the
default option in PAUP*). In pre-computer days folks used to get excited about what was the
plesiomorphic or apomorphic state of a particular character. But now, as soon as you choose the
outgroup, the plesiomorphic state is set as the state in that taxon (of course, if there are multiple
outgroup taxa that differ in the character state assignments then the issues are more complicated
see Maddison et al. 1984 for more details). Let us make life simple for now. Optimisation of
characters on trees depends on what assumptions of character evolution you wish to enforce.
With a binary character (0, 1 states) there is little choice (DELTRAN vs ACCTRAN see first article).
The issues occur with multistate characters that best illustrate how characters are optimised.
For the first way in which a character may be optimised let us assume that we wish the character
to change (evolve if you like) in an ordered manner. In the literature this is known as Wagner
optimisation because it is the character state behaviour that the botanist H. Wagner assumed
when he first devised the Wagner algorithm on which parsimony programs are based. It is
also known as additive or ordered for reasons explained below. Figure 1 explains
the optimisation procedure. Let us assume that there are six taxa sharing states 0, 1, 2, 4 of a
multistate character that has a total of five states (state 3 happens to be missing from the taxa
under consideration here). The states of the character possessed by each taxon are given in
parentheses. As a result of analysis an unrooted network is produced (Fig. 1A). We choose a root
in this case taxon A that automatically creates the tree topology (Fig. 1B). The nodes on the
cladogram/tree are designated by w z. The process of optimisation reconstructs the states at
the nodes that minimises the number of changes that have to be assumed.

We begin by passing down the tree from the terminal taxa, assigning likely states to the nodes.
For node (y) subtending taxon C (state 1) and D (state 2) there may be either a 1 or a 2 state.
For node (z) subtending taxon E (state 2) and F (state 4) there may be a 2, 3 or 4 state. With
a character behaving under Wagner optimisation then to get from state 2 to 4 we must pass
through the 3 state (even though it does not happen to be represented in these particular
taxa) and in so doing we must add two steps to this change. In other words the character states
are ordered hence another name for this optimisation. To express this another way, in order
to pass from states 2 4 we have to add the state three into consideration; hence the term
additive. This is explained in Figure 1D. Now we come down to node (x) that subtends the likely
states at nodes (y) and (z). The common element at these latter nodes is state 2; therefore the
simplest explanation is that state 2 exists at this point. Carrying on in the same manner then likely states at node (w) are going to be 0, 1 or 2 (02). Having done this we are still left with
alternatives at nodes (w), (z) and (y). To resolve the states we sweep up the tree choosing the
alternatives that minimise the changes. We know the starting point as state 1 since that is seen
in the root (taxon A). Therefore passing between taxon A and node (w) to which we assigned 0,
1 or 2 states the most parsimonious solution is to assign a state 1 here. To explain the 0 state
in taxon B and the 2 state at node (x), two changes, or steps, are needed. These changes are
shown as black horizontal bars in Figure 1E. Try mentally assigning other states to node (w). We
know that state 2 exists at node (x). If we assigned a state 0 at node (w) then three changes,
or steps, would be necessary to explain the 0 on taxon B and the 2 state at node (x). This is
less parsimonious than the first solution and therefore will be rejected. Sweeping further up
the tree we do the same exercise and resolve the internal node states as shown in figure 1E,
remembering that under this type of optimisation to get from the 2 state to the 4 state we must
add two steps, hence the two black bars between node (z) and Taxon F. The total number of
steps exhibited by this multistate character on this tree is 5. Figure 1F shows that there is another
solution that is also five steps long. When this happens the optimisation is said to be ambiguous
(remember those single lines in the character change lists). Cladistics does not choose between
these options; you may have your own biological reasons for so doing.
The next commonly used optimisation is Fitch optimisation (after W. Fitch), also known as nonadditive
or unordered for reasons that are obvious. Here the same network is used as the starting
point (Fig. 2A) and the same exercise of assigning likely states to the internal nodes is undertaken
(Fig. 2C). This time, however, we can assume that any state can transform into any other at equal
cost. Therefore, the node subtending taxa E and F will have a state 2 or a state 4, with state 3
not being a possibility. The costs between any of the states are the same (Fig. 2D). Continuing
back down the tree and then sweeping up to resolve the alternatives we end up with a solution
shown in Figure 2E. Note that there are only four steps shown by this character on this tree
when it is assumed that the character is unordered (five steps when it was ordered). Once again
there may be more than one solution (ambiguous optimisation). [Please remember that the
choice between ordered and unordered is a biological choice and not one simply made because
unordered characters usually result in shorter trees! Also, you can order or unorder individual
characters (figure 10 in previous article). You cannot set ACCTRAN or DELTRAN for individual
characters].

There are two further options that are available for constraining the evolution of a character
on a tree. Dollo optimisation is used where we may believe that characters cannot be acquired
more than once. It is named after the embryologist Anton Dollo (Dollos Law) who suggested that
once a complex character state has been gained it can never be regained, but that it can be lost
on many occasions (an example may be the vertebrate eye). This may be invoked using the Set
Character type submenu of the DATA menu (previous article fig. 10) where you have the option
of specifying which way you want the gain to be interpreted. In Figure 3 the 1 state is the gain,
and because we have to assume that it can only be lost three steps have to be added to the tree
in the positions shown, in addition to the basal gain step.

The opposite of this assumption is called Camin-Sokal parsimony (named after two numerical
taxonomists) or irreversible parsimony. This assumes that character states can be gained
many times but that, once gained, they can never be lost. I can never think of a good reason
for invoking this with morphological data but some biogeographers use it in relation to areas
occupied by species.

In the two previous examples given the imposition of Dollo and irreversible is actually no less
parsimonious than straightforward unordered optimisation. But usually there are more steps
involved. Because of this the more parsimonious trees are likely to be those that favour the
minimum assumptions of change under the optimising procedure. But remember the actual
result depends on the interaction of many characters, not just the one that you have constrained.
All of the above optimising procedures are carried out in the parsimony algorithm through
small matrices called step matrices. These specify the cost of assuming certain character state
transformations. In Figure 5 step matrices are given for the four types of optimisation above. For
ordered characters then the cost of passing from state 0 to 2 (in either direction) is two steps
and from state 3 to state 0 it is three steps etc. Under the unordered regime than all changes
occur with equal cost (one step). Under Dollo then M equals a very large number. Thus if the
optimisation favoured two independent changes of 0 to 2 the 2 x 2M steps would be added
to the tree and this would almost certainly make the algorithm reject such a tree as decidedly
suboptimal. (Note that the reversals e.g. 3 state to 2 state add very few steps to the tree). Under
the irreversible step matrix the i stands for infinity.

It is possible to write your own step matrix for any particular multistate character. For instance
you may believe that it is four times more likely that state 1 can transform to state 4 than to
state 3, etc. These are not the easiest things to do, and I do not really know how you would
support such actions with morphological data. But the options are there. Remember, the
justification is biological, not cladistic. There are perfectly good justifications in molecular data.
The laws of chemistry justify that we accept that some changes among the nucleic acids are more
likely than others, and molecular systematists regularly use transition/transversion parsimony
(Fig. 5 bottom) in this case imposing a x5 penalty to some changes.
We can now return to the statistics that are poured out when you ask for tree descriptions (see top
of fig. 16, previous article). Here you will find the reports of the length of the tree, the consistency
index (and a separate line reporting the consistency index excluding the uninformative
characters), the homoplasy index and retention index, and something called the rescaled
consistency index. All of these statistics relate to the behaviour of all characters on the one tree
that appears below the statistics paragraph. These are ensemble figures. But they relate back to
the statistics of individual characters. The statistics for individual characters can be obtained by
checking the character diagnostics box under the tree description menu (previous article, fig. 14).
The character diagnostics reports a number of facts. At the top of Figure 6 (over the page) there
are details of the behaviour of two characters, numbered 1 and 2 on the particular tree shown
immediately below. The character x taxon matrix is shown to the right of the tree. If the tree
were a different shape then the statistics for individual characters will probably be different (for
nine taxa there are 2,027,025 fully bifurcating possible trees).
Let us have a look at character 1. On the tree character 1 appears twice; once in taxon C and
again supporting a group G I. However, if the tree were a different shape (lower right tree
where taxon C is grouped with taxa G I) then the character may fit only once it is a perfect
synapomorphy and this tree is the best tree for this particular character. Thus, the minimum
number of steps the character could make is 1 (this tells you that it is a binary character). On
the particular tree in question it actually makes two steps. Therefore the character is not totally
consistent with this particular tree. The consistency index (lower case ci) tells you by how much
it is consistent and it is computed by dividing the minimum number of steps the character
can make on any of the 2,027,025 trees (= m), divided by the actual number of the steps that it
makes on this particular tree ( = s). This computes to 0.500 (Fig. 6 bottom). The homoplasy index
(lower case hi) is simply 1 ci. Not many people bother to report this.
the distribution of that character is retained as a shared derived character, potentially enabling
us to recognise a group G I. Part of the character distribution is retained as a potential
synapomorphy, hence the term retention index. How much is computed by the formula shown
at the bottom of Figure 6. m and s are the same as for the consistency index calculation. g is
the maximum or greatest number of steps the character may make on any of the 2,027,025
trees. In other words g is the measure of the worst fit. The worst fit actually means the worst
of the most parsimonious solutions for all trees (if that makes any sense!). The tree shown at
bottom left is such a tree where the character must be assumed to have arisen on four occasions
(in practice the g value will be the lesser figure of occurrence/non-occurrence in the data matrix).
Using the formula given, then the retention index for this character will be 0.666. The rescaled
consistency index (lower case rc) is ci x ri and is often used to reweight characters (below).
Now look at character 2. This character is only found in Taxon B, therefore it has no grouping
ability at all. Yet it fits the tree perfectly and ci = 1 (hi = 0). Because it has no grouping ability
the retention index must be 0 (1-1/1-1). Thus the retention index is often more informative that
the consistency index.
The values for the consistency index and retention index reported when you ask for tree
descriptions are ensemble values for all characters on a particular tree. Thus the ensemble
consistency index for the tree in question (Upper Case CI) is calculated as the ratio of the
minimum number of steps that all characters can make on any tree to the minimum number of
steps all the characters make on the tree/cladogram in question. The ensemble retention index
(Upper Case RI) is similarly calculated where G is the greatest number of steps that all characters
can exhibit on any cladogram. And RC is simply CI x RI.

You will also notice that the CI is given considering all characters as well as excluding those
characters that are uninformative (have no grouping ability) as is character 2 in the example
given in Figure 6.
When you report the results of an analysis it is customary to give the ensemble values such as
"the resulting tree was 124 steps long, CI = 0.435, RI = 0.723, RC = 0.314".
Of what use may these values be? Some people may choose to use the values to reweight the
characters and carry out a further analysis. This is a posteriori weighting and can be done under
the DATA menu, then reweight characters. A box will appear asking you by what criterion
to reweight (you have the options of using the ci, the ri or the rc most people choose the
last). It will also ask you what baseweight to assign. Let us say that you choose 100. All those
characters that have a value of 1.000 will be weighted by 100, those with a value of 0.666 will be
weighted by 66, etc. It would be better not to choose the ci as the parameter since, as we have
seen above, an autapomorphy may have a ci value of 1.000, yet have no grouping potential. You
then carry out another analysis, and continue the procedure until the tree topologies stabilise.
Reweighting can reduce the number of equally parsimonious trees but this is not the primary
function of such a procedure. What is happening is that, at each stage of reweighting, a new
data matrix is created such that any character that has a value of 100 will be inserted to the
new data matrix 100 times; those of value 66 entered 66 times etc. Of course, because there
are many more characters in the new matrix then the overall length of the tree will be much
greater. This procedure is basically selectively weighting some characters more than others and
can be problematic from a philosophical point of view. Instances of where reweighting is done
involve cases where, after initial analysis, the CI is low (say below 0.500) but the RI is high (say
0.800). You must also be aware that reweighting may result in tree topologies not found in the
initial run.
Reweighting is debatable but it is seen in the literature and its inclusion here is to explain what it
means, since it is not immediately obvious. Other parsimony programs also include comparable
techniques.
To this point we have now covered the entire analysis. In the next article we will investigate
methods of estimating how good the trees and individual nodes actually are.
Peter Forey
The Natural History Museum
Cromwell Road, London SW7 5BD
REFERENCES
Madison, W.P., Donoghue, M.J. and Madison, D.R. 1984. Outgroup analysis and
parsimony. Systematic Zoology, 33: 83103.