Classification is hard, can network science help?

September 7th, 2004 | by ian

So I was inspired by a friend's post, Science is easier from the outside. Given my background in experimental evolutionary biology, I thought I would throw a few comments his way, but my few comments combined to form something that probably oversteps the bounds of what can be considered a comment.

Classification in biology, or phylogenetics, is fraught with issues that we typically do not face when creating our own systems of classification, such as organizing content on a website. Just look at the issues anthropologists have in studying human evolution, which, geologically speaking, happened yesterday.

When studying “trees of life”, the choice of a phylogenetic root is necessarily subjective, and it biases the analysis of the rest of the hierarchy in ways that are impossible to avoid. Instead, we often run many thousands of iterations of an analysis on a dataset, varying the choice of root, and the results often differ radically. Think about it. How would you go about choosing the root of the tree?
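To make the rooting problem concrete, here is a minimal sketch in Python. The four taxa (A to D), the two internal nodes (x and y) and the function name `clades` are all made up for illustration; real phylogenetic software is far more involved. The same unrooted tree is rooted at two different internal nodes, and the groupings it implies change with the choice.

```python
from collections import deque

# Hypothetical unrooted tree: four taxa (A-D) joined through two internal nodes.
UNROOTED = {
    "A": ["x"],
    "B": ["x"],
    "C": ["y"],
    "D": ["y"],
    "x": ["A", "B", "y"],
    "y": ["C", "D", "x"],
}
LEAVES = {"A", "B", "C", "D"}


def clades(tree, root):
    """Return the leaf groups implied by rooting `tree` at `root`."""
    # Orient every edge away from the chosen root with a breadth-first walk.
    parent = {root: None}
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in tree[node]:
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)

    # Collect the leaves below each node, visiting children before parents.
    below = {node: ({node} if node in LEAVES else set()) for node in tree}
    for node in reversed(order):
        if parent[node] is not None:
            below[parent[node]] |= below[node]

    # Keep only proper groups: more than one leaf, but not the whole tree.
    return {frozenset(s) for s in below.values() if 1 < len(s) < len(LEAVES)}


for root in ("x", "y"):
    groups = sorted(sorted(group) for group in clades(UNROOTED, root))
    print(f"rooted at {root}: {groups}")
```

Rooted at x, only C and D form a group; rooted at y, only A and B do. The underlying connections are identical in both cases, and only the choice of root differs.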

Mismatches between genetic, morphological and life history based phylogenies abound: what data will you favour? You might think genetics is the most objective form of classification data but this is often problematic:

  • it is likely you have much less genetic information to work with (morphology preserves more easily than genetic information)
  • genes can be transferred between species via mobile elements, especially in the microbial and plant worlds which make up the majority of life on earth
  • genes can converge to the point where they look like they may have diverged from a common ancestor

Convergence is a problem since it can happen at all levels, including genetic, morphological and life history: the traits being compared evolve separately, converge due to selective pressures, and so do not indicate a shared ancestor.

This is all further compounded by gaps in the fossil record:

  • Different body structures and environments determine the ease of fossilization so the fossil record is biased.
  • Speciation can happen in the blink of a geological eye, so to speak, both in terms of the generation of diversity and the subsequent sorting (selection). It is quite a detective story to determine who the suspects are…

Carl von Linné, the father of modern biological taxonomy, didn’t even have the benefit of understanding evolutionary processes, let alone genetics, when he developed his Systema Naturae. Instead, he thought he was revealing the divine order in God’s creations. As a result of this starting assumption, and of a very limited data set that didn’t include much in the way of non-morphological information, his original constructions, while logical given what he had to work with, often did not reflect the natural-historical order.

The wild endeavour of science is one of discovery, not invention, which we will leave to engineers. Scientists don’t have the luxury of constructing our world (and when they indulge in that luxury they often take us down the wrong path… not that that’s a bad thing!). It is a process of discovery fraught with accidental success, abject failure and Eureka moments.

Classification is such a fundamental aspect of science, but it is also a wholly human one. A classification system can be both wildly useful and fundamentally flawed. What happens when something needs to go on two branches that are far apart in the classification structure?

Maybe a tree with a root and branches is the wrong way to look at classification. Perhaps we need to navigate a network of organization instead, to find a happy home for everything, connected to all things related and far apart from that which is not. I admit that I am inspired here, having recently read the book Six Degrees: The Science of a Connected Age, which I believe to be the best account of why studying networks and their behaviours is relevant to all disciplines.
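As a rough illustration of the difference, the sketch below compares a strict tree with a network that allows one extra cross-link. Everything in it is hypothetical: the category names, the two posts and the helper functions `to_graph` and `hops` are invented for the example, borrowing the website case mentioned earlier rather than real taxa. In the tree, a post about phylogenetics can sit on only one branch, so it ends up far from a related post on another branch; one extra link in the network brings them close together.

```python
from collections import deque

# Hypothetical site hierarchy: each item has exactly one parent, as in a tree.
PARENT_OF = {
    "science": "site",
    "mathematics": "site",
    "biology": "science",
    "networks": "mathematics",
    "phylogenetics-post": "biology",
    "graph-theory-post": "networks",
}


def to_graph(parent_of, extra_links=()):
    """Build an undirected graph from child-parent pairs plus optional cross-links."""
    graph = {}
    for a, b in list(parent_of.items()) + list(extra_links):
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def hops(graph, start, goal):
    """Length of the shortest path between two nodes, or None if unconnected."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return distance[node]
        for neighbour in graph[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return None


tree = to_graph(PARENT_OF)
# The network keeps the tree's links and adds one cross-link (an assumption).
network = to_graph(PARENT_OF, extra_links=[("phylogenetics-post", "networks")])

print(hops(tree, "phylogenetics-post", "graph-theory-post"))     # 6
print(hops(network, "phylogenetics-post", "graph-theory-post"))  # 2
```

The tree forces the two related posts six steps apart, through the root of the site, while the single cross-link in the network puts them two steps apart without moving either post off its original branch.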

The likely problem is that conceptually and possibly even mathematically a network approach to classification might be too difficult for us!
