Multiple lines of evidence indicate that modern humans evolved within the last 200,000 years and spread out of Africa starting about 60,000 years ago. Before that, however, the details get a bit complicated. We’re still arguing about which ancestral population might have given rise to our lineage. Sometime around 600,000 years ago, that lineage split from the one that gave rise to Neanderthals and Denisovans, and both of those groups later interbred with modern humans after some of them left Africa.
Figuring out as much as we currently know has required a mix of fossils, ancient DNA, and modern genomes. A new study argues there is another complicating event in humanity’s past: a near-extinction period in which almost 99 percent of our ancestral population died off. However, the finding is based on a completely new approach to analyzing modern genomes, so it may be difficult to validate.
Tracing diversity
Unless a population is small and inbred, it will have genetic diversity: a collection of differences in its DNA, ranging from changes in individual bases up to large rearrangements of chromosomes. These differences are what testing services track when they estimate where your ancestors were likely to originate. Some genetic differences arose recently, while others have been floating around our lineage since before modern humans existed.
These differences form the foundation of the new work, which analyzed multiple human genomes based on several well-established principles.
The first of these is that, given enough genomes, it’s possible to work out the ancestral state of different regions of the chromosomes. For example, a variant that’s present only in a set of closely related individuals, and not in anyone else, probably arose in their common ancestor. That means the ancestral state of that stretch of chromosome lacked the variant.
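To see that logic in miniature, here’s a toy Python sketch of the parsimony-style reasoning (the function and data are invented for illustration; the study’s actual pipeline is far more elaborate):

```python
# Toy illustration of inferring an ancestral allele by parsimony.
# Everything here is invented for illustration; the study's actual
# method is far more sophisticated.

def infer_ancestral_allele(genotypes, clade):
    """genotypes: dict mapping individual -> allele at one site.
    clade: set of closely related individuals.
    Returns the inferred ancestral allele, or None if ambiguous."""
    inside = {genotypes[i] for i in clade}
    outside = {a for i, a in genotypes.items() if i not in clade}
    # If a variant is confined to the clade, the allele carried by
    # everyone else is the best guess for the ancestral state.
    derived = inside - outside
    if derived and len(outside) == 1:
        return outside.pop()
    return None  # pattern too messy for this simple rule

genotypes = {"s1": "T", "s2": "T", "s3": "C", "s4": "C", "s5": "C"}
print(infer_ancestral_allele(genotypes, clade={"s1", "s2"}))  # -> "C"
```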
Since we know the rate at which new mutations arise in modern humans, we can use these differences to create a molecular clock. In other words, we can take the number of mutations between the present and an ancestral state, compare that with the rate at which mutations occur, and estimate when that ancestral state was last present in the population.
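The arithmetic is simple enough to sketch. Here’s a back-of-the-envelope version in Python, using commonly cited ballpark figures for the human mutation rate and generation time rather than the paper’s actual parameters:

```python
# Back-of-the-envelope molecular clock. The rate and generation time
# here are commonly cited ballpark values, not the study's parameters.

MUTATION_RATE = 1.25e-8   # mutations per site per generation (approximate)
GENERATION_TIME = 28      # years per generation (approximate)

def years_since_ancestor(diffs_per_site):
    """Estimate how long ago an ancestral sequence existed, given the
    fraction of sites that have mutated along one line of descent."""
    generations = diffs_per_site / MUTATION_RATE
    return generations * GENERATION_TIME

# A sequence that differs from its inferred ancestor at 0.005 percent
# of sites implies roughly 112,000 years of divergence:
print(f"{years_since_ancestor(5e-5):,.0f} years")
```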
Finally, the number of variations present in a population is related to the population’s size. Smaller populations tend to become inbred because it becomes difficult to avoid mating with relatives, leading to the loss of genetic diversity. In addition, a small population simply contains fewer chromosomes in total, which limits the potential for diversity. The converse is also true: large populations can support more diversity.
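Population geneticists make this relationship precise with tools like Watterson’s estimator, which links the number of variable sites in a sample to the product of population size and mutation rate. A minimal sketch, with invented numbers:

```python
# Watterson's estimator links observed variation to population size:
# theta = S / a_n, where S is the number of variable (segregating)
# sites among n sampled sequences and a_n = 1 + 1/2 + ... + 1/(n-1).
# Under the standard neutral model, theta estimates 4 * N * mu,
# so N = theta / (4 * mu). All numbers below are invented.

def watterson_N(segregating_sites, num_sequences, total_sites, mu):
    a_n = sum(1.0 / i for i in range(1, num_sequences))
    theta_per_site = segregating_sites / (a_n * total_sites)
    return theta_per_site / (4 * mu)

# 12,000 variable sites among 50 sequences of 10 Mb each, with a
# mutation rate of 1.25e-8 per site per generation:
print(f"N ≈ {watterson_N(12_000, 50, 10_000_000, 1.25e-8):,.0f}")  # ~5,400
```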
Put these together, and you have the outlines of what the researchers behind the new work have done. They took the variations present in today’s genomes and used them to determine the existence of various ancestral states and when they were likely to have existed. By figuring out how many different ancestral states were present at a given time, they could also estimate the population’s size.
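The general flavor resembles classic “skyline” coalescent methods, though FitCoal’s actual machinery differs. Under the coalescent model, while k ancestral lineages remain, the expected wait before two of them merge is 4N/(k(k-1)) generations, so the length of each interval reveals the population size during that stretch of the past. A toy sketch:

```python
# Skyline-style logic (a simplification, not FitCoal itself): while k
# ancestral lineages remain, the expected wait until two of them merge
# is 4N / (k * (k - 1)) generations. Inverting that, each observed
# interval yields a population-size estimate for its slice of the past.

def skyline_estimates(intervals):
    """intervals: list of (k, t) pairs, where t is the time in
    generations spent with k distinct ancestral lineages.
    Returns (k, estimated N) pairs, most recent first."""
    return [(k, t * k * (k - 1) / 4.0) for k, t in intervals]

# Invented coalescence intervals for five sampled lineages:
toy = [(5, 2_000), (4, 3_500), (3, 7_000), (2, 20_000)]
for k, n_hat in skyline_estimates(toy):
    print(f"while {k} lineages remained: N ≈ {n_hat:,.0f}")
```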
Does this actually work?
All of this work is based on probabilities, so the results for any individual stretch of chromosome have a fairly high chance of being wrong. But those individual errors should be wrong in different ways, so given the entire genomes of enough individuals, a real signal should emerge from the noise. The big questions are whether the algorithm devised by the authors can recognize that signal and whether we have enough data to allow it to do so.
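The averaging argument, at least, is easy to demonstrate: give each locus a large, independent error, and the combined estimate still homes in on the true value as loci pile up. A minimal sketch with invented numbers:

```python
import random

# Demonstration that independent per-locus errors average out.
# Each "locus" returns a noisy estimate of a true population size:
# individually unreliable, collectively informative. Invented numbers.

random.seed(42)
TRUE_N = 10_000

def per_locus_estimate():
    # Each locus alone is typically off by about 50 percent.
    return TRUE_N + random.gauss(0, 5_000)

for n_loci in (1, 10, 1_000, 100_000):
    mean = sum(per_locus_estimate() for _ in range(n_loci)) / n_loci
    print(f"{n_loci:>7} loci: estimate ≈ {mean:,.0f}")
```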
The researchers make their case by creating several model populations that undergo different forms of change. (Examples include a constant population size, constant growth, stasis followed by growth, and so on.) Various algorithms were set loose on this data, including the researchers’ own software, FitCoal. Most of them made some significant errors, though some did better than others. And FitCoal consistently outperformed everything, producing population size estimates that were, in most cases, difficult to distinguish from the values actually used in the models.
Reassuringly, most of the other algorithms produced results that were similar to those of FitCoal, though their error ranges were significantly larger.
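That style of validation can be mimicked in miniature (this is not FitCoal or the paper’s actual benchmark): simulate coalescence intervals under a known, constant population size, then check whether inverting the coalescent formula recovers it.

```python
import random

# Miniature version of the validation idea (not FitCoal or the paper's
# actual benchmark): simulate coalescent waiting times under a known,
# constant population size, then check that inverting the coalescent
# formula recovers that size.

random.seed(1)
TRUE_N = 10_000
SAMPLES = 20        # lineages sampled in each simulated replicate
REPLICATES = 500

estimates = []
for _ in range(REPLICATES):
    for k in range(SAMPLES, 1, -1):
        # Time spent with k lineages is exponential, mean 4N / (k(k-1)).
        t = random.expovariate(k * (k - 1) / (4.0 * TRUE_N))
        estimates.append(t * k * (k - 1) / 4.0)

recovered = sum(estimates) / len(estimates)
print(f"true N = {TRUE_N:,}; recovered N ≈ {recovered:,.0f}")
```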
The algorithm’s accuracy is likely to be the most controversial aspect of this work going forward, though. Unless someone spots an error in the code, we’re likely to have to rely on comparisons with other software. Unfortunately, this sort of software is very computationally expensive. Adding more genomes to the analysis could also provide some clarity, since results should get more accurate with more data to work with. But additional genomes would make the computational challenge even worse.
https://arstechnica.com/?p=1965217