## 4. Optimisation

In the last two articles of this series we have chosen ways to code characters, constructed a data matrix, analysed the matrix under several options of parsimony analysis (exhaustive, branch & bound, heuristic) and looked at the behaviour of characters on the resulting trees/cladograms. This all came with a bewildering array of details about ‘CI’, ‘HI’, ‘RI’ etc. This article delves into what these details mean. They revolve around how the characters are fitted, or optimised, onto trees.

When PAUP* analyses a data matrix it searches for the optimal network that minimises the number of character changes that must be assumed – that is the essence of cladistic analysis. But it is a network and not a tree. To convert one into another requires that we specify a root. A root may be several taxa, one that you specify, or the first taxon in the data matrix (the default option in PAUP*). In pre-computer days folks used to get excited about what was the plesiomorphic or apomorphic state of a particular character. But now, as soon as you choose the outgroup, the plesiomorphic state is set as the state in that taxon (of course, if there are multiple outgroup taxa that differ in the character state assignments then the issues are more complicated – see Maddison et al. 1984 for more details). Let us make life simple for now. Optimisation of characters on trees depends on what assumptions of character evolution you wish to enforce. With a binary character (0, 1 states) there is little choice (DELTRAN vs ACCTRAN – see first article). The issues occur with multistate characters that best illustrate how characters are optimised.

For the first way in which a character may be optimised let us assume that we wish the character to change (evolve if you like) in an ordered manner. In the literature this is known as Wagner optimisation because it is the character state behaviour that the botanist H. Wagner assumed when he first devised the ‘Wagner algorithm’ on which parsimony programs are based. It is also known as additive or ordered for reasons explained below. Figure 1 explains the optimisation procedure. Let us assume that there are six taxa sharing states 0, 1, 2, 4 of a multistate character that has a total of five states (state ‘3’ happens to be missing from the taxa under consideration here). The states of the character possessed by each taxon are given in parentheses. As a result of analysis an unrooted network is produced (Fig. 1A). We choose a root – in this case taxon A that automatically creates the tree topology (Fig. 1B). The nodes on the cladogram/tree are designated by w – z. The process of optimisation reconstructs the states at the nodes that minimises the number of changes that have to be assumed.

We begin by passing down the tree from the terminal taxa, assigning likely states to the nodes. For node (y) subtending taxon C (state 1) and D (state 2) there may be either a ‘1’ or a ‘2’ state. For node (z) subtending taxon E (state 2) and F (state 4) there may be a ‘2’, ‘3’ or ‘4’ state. With a character behaving under Wagner optimisation then to get from state ‘2’ to ‘4’ we must pass through the ‘3’ state (even though it does not happen to be represented in these particular taxa) and in so doing we must add two steps to this change. In other words the character states are ordered – hence another name for this optimisation. To express this another way, in order to pass from states 2 – 4 we have to add the state three into consideration; hence the term ‘additive’. This is explained in Figure 1D. Now we come down to node (x) that subtends the likely states at nodes (y) and (z). The common element at these latter nodes is state ‘2’; therefore the simplest explanation is that state ‘2’ exists at this point. Carrying on in the same manner then likely states at node (w) are going to be ‘0’, ‘1’ or ‘2’ (0–2). Having done this we are still left with alternatives at nodes (w), (z) and (y). To resolve the states we sweep up the tree choosing the alternatives that minimise the changes. We know the starting point as state ‘1’ since that is seen in the root (taxon A). Therefore passing between taxon A and node (w) to which we assigned ‘0’, ‘1’ or ‘2’ states the most parsimonious solution is to assign a state ‘1’ here. To explain the ‘0’ state in taxon B and the ‘2’ state at node (x), two changes, or steps, are needed. These changes are shown as black horizontal bars in Figure 1E. Try mentally assigning other states to node (w). We know that state ‘2’ exists at node (x). If we assigned a state ‘0’ at node (w) then three changes, or steps, would be necessary to explain the ‘0’ on taxon B and the ‘2’ state at node (x). This is less parsimonious than the first solution and therefore will be rejected. Sweeping further up the tree we do the same exercise and resolve the internal node states as shown in figure 1E, remembering that under this type of optimisation to get from the ‘2’ state to the ‘4’ state we must add two steps, hence the two black bars between node (z) and Taxon F. The total number of steps exhibited by this multistate character on this tree is 5. Figure 1F shows that there is another solution that is also five steps long. When this happens the optimisation is said to be ambiguous (remember those single lines in the character change lists). Cladistics does not choose between these options; you may have your own biological reasons for so doing.

The next commonly used optimisation is Fitch optimisation (after W. Fitch), also known as nonadditive or unordered for reasons that are obvious. Here the same network is used as the starting point (Fig. 2A) and the same exercise of assigning likely states to the internal nodes is undertaken (Fig. 2C). This time, however, we can assume that any state can transform into any other at equal cost. Therefore, the node subtending taxa E and F will have a state ‘2’ or a state ‘4’, with state ‘3’ not being a possibility. The costs between any of the states are the same (Fig. 2D). Continuing back down the tree and then sweeping up to resolve the alternatives we end up with a solution shown in Figure 2E. Note that there are only four steps shown by this character on this tree when it is assumed that the character is unordered (five steps when it was ordered). Once again there may be more than one solution (ambiguous optimisation). [Please remember that the choice between ordered and unordered is a biological choice and not one simply made because unordered characters usually result in shorter trees! Also, you can order or unorder individual characters (figure 10 in previous article). You cannot set ACCTRAN or DELTRAN for individual characters].

There are two further options that are available for constraining the ‘evolution’ of a character on a tree. Dollo optimisation is used where we may believe that characters cannot be acquired more than once. It is named after the embryologist Anton Dollo (Dollo’s Law) who suggested that once a complex character state has been gained it can never be regained, but that it can be lost on many occasions (an example may be the vertebrate eye). This may be invoked using the ‘Set Character type’ submenu of the ‘DATA’ menu (previous article fig. 10) where you have the option of specifying which way you want the ‘gain’ to be interpreted. In Figure 3 the ‘1’ state is the gain, and because we have to assume that it can only be lost three steps have to be added to the tree in the positions shown, in addition to the basal ‘gain’ step.

The opposite of this assumption is called Camin-Sokal parsimony (named after two numerical taxonomists) or irreversible parsimony. This assumes that character states can be gained many times but that, once gained, they can never be lost. I can never think of a good reason for invoking this with morphological data but some biogeographers use it in relation to areas occupied by species.

In the two previous examples given the imposition of Dollo and irreversible is actually no less parsimonious than straightforward unordered optimisation. But usually there are more steps involved. Because of this the more parsimonious trees are likely to be those that favour the minimum assumptions of change under the optimising procedure. But remember the actual result depends on the interaction of many characters, not just the one that you have constrained.

All of the above optimising procedures are carried out in the parsimony algorithm through small matrices called ‘step matrices’. These specify the ‘cost’ of assuming certain character state transformations. In Figure 5 step matrices are given for the four types of optimisation above. For ‘ordered’ characters then the cost of passing from state ‘0’ to ‘2’ (in either direction) is two steps and from state ‘3’ to state ‘0’ it is three steps etc. Under the unordered regime than all changes occur with equal cost (one step). Under Dollo then M equals a very large number. Thus if the optimisation favoured two independent changes of ‘0’ to ‘2’ the 2 x 2M steps would be added to the tree and this would almost certainly make the algorithm reject such a tree as decidedly suboptimal. (Note that the reversals e.g. ‘3’ state to ‘2’ state add very few steps to the tree). Under the irreversible step matrix the ‘i’ stands for infinity.

It is possible to write your own step matrix for any particular multistate character. For instance you may believe that it is four times more likely that state 1 can transform to state 4 than to state 3, etc. These are not the easiest things to do, and I do not really know how you would support such actions with morphological data. But the options are there. Remember, the justification is biological, not cladistic. There are perfectly good justifications in molecular data. The laws of chemistry justify that we accept that some changes among the nucleic acids are more likely than others, and molecular systematists regularly use transition/transversion parsimony (Fig. 5 bottom) – in this case imposing a x5 penalty to some changes.

We can now return to the statistics that are poured out when you ask for tree descriptions (see top of fig. 16, previous article). Here you will find the reports of the length of the tree, the consistency index (and a separate line reporting the consistency index excluding the uninformative characters), the homoplasy index and retention index, and something called the rescaled consistency index. All of these statistics relate to the behaviour of all characters on the one tree that appears below the statistics paragraph. These are ensemble figures. But they relate back to the statistics of individual characters. The statistics for individual characters can be obtained by checking the ‘character diagnostics’ box under the ‘tree description’ menu (previous article, fig. 14). The character diagnostics reports a number of facts. At the top of Figure 6 (over the page) there are details of the behaviour of two characters, numbered 1 and 2 on the particular tree shown immediately below. The character x taxon matrix is shown to the right of the tree. If the tree were a different shape then the statistics for individual characters will probably be different (for nine taxa there are 2,027,025 fully bifurcating possible trees).

Let us have a look at character 1. On the tree character 1 appears twice; once in taxon C and again supporting a group G – I. However, if the tree were a different shape (lower right tree where taxon C is grouped with taxa G – I) then the character may fit only once – it is a perfect synapomorphy and this tree is the best tree for this particular character. Thus, the minimum number of steps the character could make is 1 (this tells you that it is a binary character). On the particular tree in question it actually makes two steps. Therefore the character is not totally consistent with this particular tree. The consistency index (lower case ci) tells you by how much it is ‘consistent’ and it is computed by dividing the minimum number of steps the character can make on any of the 2,027,025 trees (= m), divided by the actual number of the steps that it makes on this particular tree ( = s). This computes to 0.500 (Fig. 6 bottom). The homoplasy index (lower case hi) is simply 1 – ci. Not many people bother to report this.

The distribution of that character is retained as a shared derived character, potentially enabling us to recognise a group G – I. Part of the character distribution is retained as a potential synapomorphy, hence the term retention index. How much is computed by the formula shown at the bottom of Figure 6. ‘m’ and ‘s’ are the same as for the consistency index calculation. ‘g’ is the maximum or greatest number of steps the character may make on any of the 2,027,025 trees. In other words ‘g’ is the measure of the worst fit. The worst fit actually means the worst of the most parsimonious solutions for all trees (if that makes any sense!). The tree shown at bottom left is such a tree where the character must be assumed to have arisen on four occasions (in practice the ‘g’ value will be the lesser figure of occurrence/non-occurrence in the data matrix). Using the formula given, then the retention index for this character will be 0.666. The rescaled consistency index (lower case rc) is ci x ri and is often used to reweight characters (below).

Now look at character 2. This character is only found in Taxon B, therefore it has no grouping ability at all. Yet it fits the tree perfectly and ci = 1 (hi = 0). Because it has no grouping ability the retention index must be 0 (1-1/1-1). Thus the retention index is often more informative that the consistency index.

The values for the consistency index and retention index reported when you ask for tree descriptions are ensemble values for all characters on a particular tree. Thus the ensemble consistency index for the tree in question (Upper Case CI) is calculated as the ratio of the minimum number of steps that all characters can make on any tree to the minimum number of steps all the characters make on the tree/cladogram in question. The ensemble retention index (Upper Case RI) is similarly calculated where G is the greatest number of steps that all characters can exhibit on any cladogram. And RC is simply CI x RI.

You will also notice that the CI is given considering all characters as well as excluding those characters that are uninformative (have no grouping ability) as is character 2 in the example given in Figure 6.

When you report the results of an analysis it is customary to give the ensemble values such as "the resulting tree was 124 steps long, CI = 0.435, RI = 0.723, RC = 0.314".

Of what use may these values be? Some people may choose to use the values to reweight the characters and carry out a further analysis. This is a posteriori weighting and can be done under the DATA menu, then ‘reweight characters’. A box will appear asking you by what criterion to reweight (you have the options of using the ci, the ri or the rc – most people choose the last). It will also ask you what baseweight to assign. Let us say that you choose ‘100’. All those characters that have a value of 1.000 will be weighted by 100, those with a value of 0.666 will be weighted by 66, etc. It would be better not to choose the ci as the parameter since, as we have seen above, an autapomorphy may have a ci value of 1.000, yet have no grouping potential. You then carry out another analysis, and continue the procedure until the tree topologies stabilise. Reweighting can reduce the number of equally parsimonious trees but this is not the primary function of such a procedure. What is happening is that, at each stage of reweighting, a new data matrix is created such that any character that has a value of 100 will be inserted to the new data matrix 100 times; those of value 66 entered 66 times etc. Of course, because there are many more characters in the new matrix then the overall length of the tree will be much greater. This procedure is basically selectively weighting some characters more than others and can be problematic from a philosophical point of view. Instances of where reweighting is done involve cases where, after initial analysis, the CI is low (say below 0.500) but the RI is high (say 0.800). You must also be aware that reweighting may result in tree topologies not found in the initial run.

Reweighting is debatable but it is seen in the literature and its inclusion here is to explain what it means, since it is not immediately obvious. Other parsimony programs also include comparable techniques.

To this point we have now covered the entire analysis. In the next article we will investigate methods of estimating how ‘good’ the trees and individual nodes actually are.

## References

Madison, W.P., Donoghue, M.J. and Madison, D.R. 1984. Outgroup analysis and parsimony. Systematic Zoology, 33: 83–103.