The Cromar-Robb line II: observations on the phenomenon of generational “drift”


As I continue to build the descendancy chart for Peter Cromar 1690, which focuses on the Cromar side of the Cromar-Robb line, I can’t help but notice oddities in the data as it arrives at a certain critical mass. Patterns of migration and settlement, the frequency with which certain family names enter the line, and the vast number of Cromars who are not in the direct line are all worthy topics of deeper investigation. However, one standout anomaly left me, an amateur family historian, a bit unsettled until I forced myself to get wonky with math and statistics. While I’m no statistician by any stretch of the imagination, as someone who works in data visualization I know enough to get by, or to get into trouble.

Unexpected variance

The deeper I delved into the study, the further apart the birth and death dates within each generation spread as the generations moved forward in time. By Generation 7 the gap became quite worrisome. My grandfather Charles Robb Cromar, a member of Generation 7, was born in 1907 and died in 1982. William Frederick Cromar, another member of Generation 7, was born in 1862, six years before Charles’ father Theodore was born in Generation 6!

At 2000-plus entries and counting, had something gone horribly wrong with this tree?

A thought experiment helps to clarify the mystery, which is not so mysterious once one understands it. I am calling this phenomenon generational drift, because I haven’t found anyone else who has identified it, and indeed to my surprise I haven’t found any really enlightening discussions about it. To fill that gap, we’ll construct an imaginary family using certain properties that are reasonable to assume, given the limits of human biology.

A generational drift model

Imagine a person born in 1680. This person, whom we will call the Progenitor, has a first child at age 20 and a last child at age 40. The earlier child repeats the pattern of having a child at 20, and so does each descendant in that branch, while the later child and that branch’s descendants each have a child at 40. Every person lives for 80 years. This pattern persists for several generations. By Generation 9, how do these two branches fare?

Diagram of generational drift, by the author.

There is quite a bit of generational drift in this family. In Generation 9, the earliest member in the early-birth branch is born in 1840 and dies in 1920. The latest child in the late-birth branch is born in the year 2000 and passes in 2080. This means, among other interesting wonky bits:

Extremes in generational drift
  • The earliest child in Generation 9 is born 80 years after the Progenitor dies.
  • The earliest child in Generation 9 dies 80 years before the latest child in Generation 9 is born.
  • The entire span from the birth of the Progenitor to the death of the earliest child in Generation 9 is 240 years.
  • The entire span of Generation 9 alone, from birth of the earliest to death of the latest member, is also 240 years.
  • This time period is also equivalent to the entire span of time for Generations 1-5 altogether.
  • The earliest child born in Generation 5 will not be alive to meet the last child in Generation 5.
  • The earliest child in Generation 9 is born in the same year as the last child in Generation 5.
  • The entire span from birth of the Progenitor to the death of the last child in Generation 9 is 400 years.
  • It’s not on the chart, but if we saw the pattern persist further, the last person in Generation 9 would be born in the same year as the first person in Generation 17. The early-birth line effectively laps the late-birth line at this point.
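For readers who want to check these figures, here is a minimal Python sketch of the two-branch model. The constant and function names are my own illustrative choices, and the code assumes exactly what the thought experiment does: a Progenitor born in 1680, one branch with every birth at age 20, one with every birth at age 40, and an 80-year lifespan for everyone.

```python
# Thought-experiment parameters from the text; the names are illustrative only.
PROGENITOR_BIRTH = 1680
EARLY_AGE = 20   # parent's age at each birth in the early-birth branch
LATE_AGE = 40    # parent's age at each birth in the late-birth branch
LIFESPAN = 80    # every person is assumed to live 80 years


def birth_year(generation: int, parent_age: int) -> int:
    """Birth year of a branch member; the Progenitor is Generation 1."""
    return PROGENITOR_BIRTH + (generation - 1) * parent_age


def death_year(generation: int, parent_age: int) -> int:
    """Death year under the fixed-lifespan assumption."""
    return birth_year(generation, parent_age) + LIFESPAN


for gen in range(1, 10):
    early = birth_year(gen, EARLY_AGE)
    late = birth_year(gen, LATE_AGE)
    print(f"Gen {gen}: early branch {early}-{early + LIFESPAN}, "
          f"late branch {late}-{late + LIFESPAN}")

# Span of Generation 9 alone, and of the whole chart.
gen9_span = death_year(9, LATE_AGE) - birth_year(9, EARLY_AGE)
full_span = death_year(9, LATE_AGE) - PROGENITOR_BIRTH
print("Generation 9 span:", gen9_span, "years")
print("Progenitor's birth to last Generation 9 death:", full_span, "years")
```

Calling `birth_year(17, EARLY_AGE)` extends the early branch past the chart and can be compared with `birth_year(9, LATE_AGE)` to see the point where the early line laps the late one.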

Though one may quite rightly view these as extreme outlier scenarios, in my own experience I do have an aunt who is younger than me, my mother was born when her mother was 19, and my own first child was born when I was 47, so who’s to say how improbable this chart really is? It demonstrates that a generation should not be understood as a cohort sharing the same span of time, that a wide variety of birth and death dates should be expected among members of a given generation in any descendancy chart, and that the disparity only expands with later generations.

But how much drift really happens?

Having said that, statistically speaking (and I am NOT a statistician, so experts out there can correct my spitballing), the average birth date for each generation advances by approximately thirty years under these assumptions. Children born to the Progenitor will “average” around 1710, their children will average around 1740, and so on. This pattern persists through the generations even though the extremes of birth and death dates spread further apart over time.

          Gen 2   Gen 3   Gen 4   Gen 5   Gen 6   Gen 7   Gen 8   Gen 9
Average   1710    1740    1770    1800    1830    1860    1890    1920
SD        14.1    16.3    18.5    20.7    22.7    24.7    26.6    28.6
Early SD  1703    1732    1761    1790    1819    1848    1877    1906
Late SD   1717    1748    1779    1810    1841    1872    1903    1934
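The table can be approximately reproduced with a few lines of Python. This is a sketch under assumptions of mine rather than a record of how the figures were first worked out: it assumes every person has one child at 20 and one at 40, uses the sample standard deviation (`statistics.stdev`), and reads the Early SD and Late SD rows as the average minus and plus half the SD. The function name `generation_births` is hypothetical. Run as written, it should agree with the table to rounding for Generations 2 through 8 and give an SD nearer 28.3 for Generation 9, which leaves the rounded 1906 and 1934 unchanged.

```python
from statistics import mean, stdev


def generation_births(generation, start=1680, ages=(20, 40)):
    """Every birth year in a generation, assuming each person has a child at each age."""
    births = [start]
    for _ in range(generation - 1):
        births = [year + age for year in births for age in ages]
    return births


for gen in range(2, 10):
    births = generation_births(gen)
    avg = mean(births)
    sd = stdev(births)  # sample standard deviation (divides by n - 1)
    print(f"Gen {gen}: average {avg:.0f}, SD {sd:.1f}, "
          f"band {avg - sd / 2:.0f} to {avg + sd / 2:.0f}")
```

Swapping `stdev` for `statistics.pstdev` gives the population standard deviation instead, which is slightly smaller in every generation.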

But how often can we really expect the outliers 1840 and 2000 to show up? Let’s broaden the parameters of this imaginary family timeline, so that each child of the Progenitor repeats the Progenitor’s pattern. That is, what happens if every child gives birth at age 20 and age 40? This beautifully fractal but intimidating chart emerges:

Expanded diagram of generational drift, including an early and late birth for each offspring, by the author.

If we count up all 256 births in Generation 9, the distribution of birth dates follows a classic bell curve:

Birth year  1840  1860  1880  1900  1920  1940  1960  1980  2000
Births      1     8     28    56    70    56    28    8     1
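The counts above can be checked by brute force. The sketch below assumes the expanded model, in which every person has one child at 20 and one at 40. It walks all 256 possible paths from the Progenitor to Generation 9 and tallies the birth year at the end of each; the variable names are mine. The counts turn out to be the binomial coefficients for choosing how many of the eight births along a path were the late ones, which is why the shape is a bell curve.

```python
from collections import Counter
from itertools import product
from math import comb

START = 1680      # Progenitor's birth year
AGES = (20, 40)   # each parent has one child at each of these ages
STEPS = 8         # eight births separate Generation 1 from Generation 9

# One path = one choice of early (20) or late (40) at each of the eight births.
tally = Counter(START + sum(path) for path in product(AGES, repeat=STEPS))

for year in sorted(tally):
    print(year, tally[year])

# The count for each year equals "8 choose k", where k is the number of late births.
binomial = [comb(STEPS, k) for k in range(STEPS + 1)]
print(binomial == [tally[year] for year in sorted(tally)])
```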

The table and the bar graph both show births clustering around the 1920 average. The standard deviation (SD in the chart) is nearly 29 years, and by the usual convention “within one standard deviation” means the average plus or minus one SD: roughly 1891 to 1949 for Generation 9. (The Early SD and Late SD rows in the earlier table mark a narrower band, the average plus or minus half an SD, or 1906 to 1934.) Because births in this model land only on 20-year steps, the 1891 to 1949 range takes in the 1900, 1920, and 1940 columns: 56 + 70 + 56 = 182 births, or about 71% of the total of 256. That leaves 37 births, about 14%, in 1880 or earlier and another 37 in 1960 or later. Widening the range to two standard deviations, roughly 1863 to 1977, takes in 238 births, or about 93%. Only 9 births on each side, under 4% apiece, fall in 1860 or earlier or in 1980 or later.
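A strict tally of how many Generation 9 births land inside a given range of years can be sketched as follows. It counts only the model’s exact birth years, with no smoothing between them, so the result is sensitive to where the range boundaries fall. The 1906 to 1934 and 1891 to 1949 ranges come from the discussion above; the 1863 to 1977 range is my assumption of the average plus or minus two SDs. The function name `births_between` is hypothetical.

```python
from math import comb

# Generation 9 birth counts in the expanded model: year -> number of births.
counts = {1840 + 20 * k: comb(8, k) for k in range(9)}
total = sum(counts.values())


def births_between(first_year, last_year):
    """Number of model births with first_year <= birth year <= last_year."""
    return sum(n for year, n in counts.items() if first_year <= year <= last_year)


for first, last in [(1906, 1934), (1891, 1949), (1863, 1977)]:
    n = births_between(first, last)
    print(f"{first}-{last}: {n} of {total} births ({n / total:.0%})")
```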

So, the outliers really are outliers, but that doesn’t mean they don’t exist! My grandfather Charles is among the later births in Generation 7, and my father Charles was the last child in his family; together these facts account for my place among the later outliers in Generation 9. The Progenitor in our family’s case was Peter Cromar, born in 1690, so adding 10 years to the figures in the chart above gives a better idea of just how much of an outlier I am.

The later end of the expected range for Peter’s Generation 7 would be around 1882 (the model’s 1872 plus 10), so Charles, born in 1907, was 25 years “late” in his cohort. His son Charles, born in 1937, was 24 years “late” against the corresponding 1913 for Generation 8. Since I was a first-born to younger parents, my lateness in Generation 9 doesn’t compound much on top of my father’s. The whole line looks like this, with the number in the bottom row giving the age of the parent when the child in the next generation’s column was born:

                     Gen 1  Gen 2   Gen 3  Gen 4   Gen 5  Gen 6    Gen 7    Gen 8    Gen 9
Name                 Peter  Robert  John   George  John   Thuddie  Charles  Charles  William
Born                 1690   1717    1760   1792    1823   1868     1907     1937     1959
Age at next birth    27     43      32     31      45     39       30       22       47

Peter was 27 when Robert was born, Robert 43 when John arrived, and so on. You can see how nearly all of these ages are at or above the average of 30, often well above. My oldest child, born when I was the ripe old age of 47, will be quite the late outlier in her Generation 10 as a result! Let’s see by just how much, and grab a random sample of some generational early birds to see where they land:

                     Gen 1  Gen 2   Gen 3   Gen 4   Gen 5  Gen 6    Gen 7    Gen 8   Gen 9
Name                 Peter  Robert  Robert  George  Alex   William  William  Jennie  Nora
Born                 1690   1717    1752    1772    1807   1839     1862     1886    1909
Age at next birth    27     35      20      35      32     23       24       23      23

Nora Ann Dearden, daughter of Jennie Cromar, is a member of my generation, but was born half a century before me! And what if we include Gen 10? Nora’s son Richard was born in 1932, while my daughter Alyson was born 75 years later in 2007: same generation, different century! This is a spectacular example of the generational drift our thought experiment predicted.
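The comparison between the two lines can be worked through in a short Python sketch. The birth years are transcribed from the two tables and the Generation 10 births mentioned above; the list and function names are my own. Parent ages here are simple differences between birth years, so they can be off by one from a true age depending on birthdays, which is why the sketch only computes them through Generation 9.

```python
from statistics import mean

# Birth years for Generations 1 through 10, transcribed from the text above.
my_line = [1690, 1717, 1760, 1792, 1823, 1868, 1907, 1937, 1959, 2007]
noras_line = [1690, 1717, 1752, 1772, 1807, 1839, 1862, 1886, 1909, 1932]


def parent_ages(birth_years):
    """Year gaps between each parent's birth and the next generation's birth."""
    return [child - parent for parent, child in zip(birth_years, birth_years[1:])]


# Drift: how many years later my line is born than Nora's, generation by generation.
drift = [mine - theirs for mine, theirs in zip(my_line, noras_line)]
for gen, years in enumerate(drift, start=1):
    print(f"Gen {gen}: {years} years apart")

# Average parental age through Generation 9 for each line.
print("My line:", mean(parent_ages(my_line[:9])))
print("Nora's line:", mean(parent_ages(noras_line[:9])))
```

The drift in any generation is just the running total of the differences in parental age up to that point, so a handful of late births early in a line carries forward to every generation after it.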

Visualizing the drift

The charts, graphs, and tables above give us a stab at understanding this interesting phenomenon, but they are not a particularly elegant or coherent way to express the data visually. In the last post in this series, I’ll be exploring more about visualization techniques. While we wait, and wade through other interesting diversions at the mid-point of this data collection exercise, let’s take a peek at one designer who has synthesized a genealogical circle or fan chart with the kind of timelines I’ve been toying with above.

Courtney Barr is the inventor of the Lifespan Tree, an excellent illustration of what Edward Tufte calls rich data. We see both generational and temporal relationships in these charts that can be custom-made to order using anyone’s family dataset. It will be a real challenge for me to find a better way to express this kind of information, and coming from me, this is a real endorsement of her work! Visit Lifespan Trees to find out more.

An example of Courtney Barr’s Lifespan Tree diagram, as illustrated at her website. | Copyright 2019 by Courtney Barr

In our next post in this series, I’ll take a look at what kind of study the Cromar-Robb line dataset is evolving into…
