What We Have Learned From a Y-DNA Surname Project
“Olympia Genealogical Society Quarterly,” Olympia, Washington,
Volume XXXVI, No. 3, July 2010, Pages 6-13.
By Lee Allyn James
In the last decade the use of DNA analysis to supplement conventional genealogical
research has skyrocketed. DNA analysis cannot replace the old-fashioned genealogical
methods of research, but it can be quite valuable in verifying what one has already
discovered or perhaps suggesting an alternative direction for the conventional research.
The two most common DNA analyses are associated with the mitochondrial DNA (mtDNA)
passed on from mothers to their children, and the Y-chromosome DNA (Y-DNA) passed
on from fathers to their sons. Y-DNA analyses are, at present, the more frequently
used of the two because they are usually associated with a stable surname. The
Y-chromosome is passed from
father to son and, except for occasional mutations, remains unchanged from generation
to generation. The other property that makes Y-DNA more useful in genealogy than
mtDNA is that the Y-DNA mutation rate is fast enough that we can see changes in genealogical
times, but not so fast that they happen all the time. On the other hand, mtDNA mutation
rates are so low that even if two individuals match, their common ancestor may have
lived many thousands of years ago.
Because of the above properties of the Y-chromosome and the probability of
stable surnames, a number of “surname projects” have been organized and numerous
databases have been made available on-line wherein one can compare his own Y-DNA
profile with that of others with the same surname. This article describes such a
project featuring men with the surname James who have had their Y-DNA tested and
entered into the database. The overall James Y-DNA Surname Project presently features
the Y-DNA profiles of over 115 James men, and the results for our group
may be viewed at http://www.familytreedna.com/public/james/default.aspx. The results
show that there are scores of unrelated James lines, and the author’s group is that
group of James men who are descended from the David James who was born about 1669
in Wales, emigrated to America, and died in Chester County (now Delaware County),
Pennsylvania in 1739. Many Y-DNA surname projects have the same objective, but the
present article shows how much more can be learned from such a study.
Descendants of David James
Eight men with the surname James have had their Y-DNA tested, and the results
show that they are descended from a David James who was born about 1669, possibly
in Radnorshire, Wales, emigrated to America, and died 27 June 1739 in Chester
County, Pennsylvania. Because this David James had descendants also named David,
he is referred to herein as “Ancestral David.” The eight participants are shown in
Figure 1. Seven of the eight had their Y-DNA characterized by FamilyTreeDNA® (FTDNA),
while the eighth, Robert, employed a different laboratory, Familybuilder®. The numbers
below each of the participants shown in Figure 1 are the kit numbers assigned by
FTDNA. Each of the participants had researched their ancestors employing conventional
genealogical methods. The remainder of this article discusses what has been learned
as a result of this project.
The individual Y-DNA results for 37 markers are tabulated in Figure 2. Not
all of the participants had the same number of markers tested: Robert, 16 markers;
James, 25 markers; Wilbur, Larry and Joe, 37 markers; and Warren, Lee, and Chris,
67 markers. Nevertheless, some very meaningful conclusions can be drawn from the
results shown in Figure 2. It will be noted in Figure 2 that a few mutations exist
between the different members; mutations are denoted by a shaded cell. For example,
Wilbur and Robert are brothers, and both show a value of 31 at DYS 389-2, whereas
the other James participants all show a value of 30. It was originally thought that
Wilbur and Robert had differing values at DYS H4 because their respective laboratories
reported different values there. However, the two laboratories report DYS H4 on
different bases and, after applying the appropriate correction, both brothers had
the same value at DYS H4.
Finally, Larry, Joe, and Chris show a value of 39 at DYS CDYb, whereas the other
participants all show a value of 38. However, as will be shown, these are not unexpected
variations. Note that the DYS markers in Figure 2 denoted with an asterisk generally
exhibit a faster mutation rate than the average.
Example 1: Determine if the Eight Participants Are Related
The results of Figure 2 show that the largest difference in the Y-DNA results
occurs between Warren (or Lee) and Larry, a difference of one at each of three markers
(i.e., a “genetic distance” of 3). FTDNA statistics predict that for a 34/37 match, there
is a 95% probability that the Most Recent Common Ancestor (MRCA) will be no more
than about fifteen or sixteen generations away, and in fact their MRCA (Ancestral David)
is only nine generations away. (Warren and Lee match 67/67, for which there is a 95%
probability that their MRCA is no more than six generations away; their actual MRCA,
Samuel, is five generations away.)
Therefore, it is clear that all of the eight James participants are indeed related
to one another. Although their research is incomplete, it is clear that Joe and Chris
fit into this James group. They either descend from Ancestral David or, at the very
least, share a common male James ancestor with Ancestral David. More about Joe and
Chris appears in Example 4.
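The genetic-distance comparison used above can be sketched in a few lines of Python. This is only an illustration, not FTDNA's method: the function name genetic_distance is hypothetical, it assumes the simple stepwise counting in which each unit of difference at a marker adds one to the distance, and it includes only the two marker values quoted in this article (DYS 389-2 and DYS CDYb), not the full 37-marker profiles of Figure 2.

```python
def genetic_distance(profile_a, profile_b):
    """Return (distance, markers_compared) for two Y-DNA profiles.

    Each profile maps a marker name to its repeat count. Only markers
    tested in BOTH profiles are compared, since the participants did not
    all test the same number of markers.
    """
    shared = sorted(profile_a.keys() & profile_b.keys())
    distance = sum(abs(profile_a[m] - profile_b[m]) for m in shared)
    return distance, len(shared)


# Partial profiles: only the values stated in the article text.
warren = {"DYS 389-2": 30, "DYS CDYb": 38}
wilbur = {"DYS 389-2": 31, "DYS CDYb": 38}
larry = {"DYS 389-2": 30, "DYS CDYb": 39}

for name, other in (("Wilbur", wilbur), ("Larry", larry)):
    distance, compared = genetic_distance(warren, other)
    print(f"Warren vs {name}: distance {distance} on {compared} markers")
```

On these two markers alone Warren and Larry differ by one. The full distance of 3 quoted above depends on two further markers whose values appear only in Figure 2.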
Example 2: Determine if Ancestral David Had a Father and/or a Brother in America
Older James genealogies, based upon unproved legends, have suggested that Ancestral
David had a brother Howell James and a father James James, both of whom also emigrated
to America and settled in Chester County, Pennsylvania. Fortunately, descendants
of those two men have participated in the overall James Surname Y-DNA Study: one
participant (Kit No. 49832) whose descent from Howell James is yet to be completely
proved, and two participants descended from James James (Kit Nos. 57434 & 67782).
Their results showed that our Ancestral David was not related to either Howell or
James. Kit No. 49832 exhibited a “genetic distance” of 17 on a 12-marker comparison
with Warren, Kit No. 57434 exhibited a genetic distance of 5 on a 12-marker comparison,
and Kit No. 67782 exhibited a genetic distance of 13 on a 37-marker comparison, all
indicating no relationship. Hence, the Y-DNA analysis disproved the earlier suggestions
that Ancestral David had a brother (Howell) and father (James) also living in Chester
County.
Example 3: Derive the 37-Marker Haplotype of Ancestral David
The following analysis is based on 37 markers and employs the results for the
six James men proved to be descended from Ancestral David. It is also stipulated
that “parallel mutations” are not allowed in this analysis. A parallel mutation is
defined as the same mutation at the same DYS in two different descendant lines. Although
rare, parallel mutations do occur but are not considered in this analysis.
Warren and Lee are both descended from Samuel, and they match 37/37 (actually
the full match is 67/67). In the 37 markers considered, Warren and Wilbur differ
by only one marker (DYS 389-2). Their common ancestor is Ancestral David, so Ancestral
David’s 37-marker haplotype is the same as Warren’s and Wilbur’s on the 36 markers
that they have in common. Ancestral David’s value at DYS 389-2 could either be 30
(Warren, James, and Lee) or it could be 31 (Wilbur and Robert). However, Larry, who like
Wilbur and Robert is descended from Thomas, has a value of 30 at DYS 389-2. Because
the value 30 appears in both Samuel’s line and Thomas’s line, and parallel mutations
are not allowed, Ancestral David’s value must have been 30. Therefore, Ancestral David
and his grandson Samuel have the same 37-marker haplotype and, for that matter, the
same 37-marker haplotype as Warren and Lee.
Looking now at Daniel, the son of Thomas, since Wilbur has all the same marker
values as Ancestral David except for DYS 389-2, then Daniel has to have the same
36 markers. Since parallel mutations are not allowed in this analysis, Daniel
must have a value of 30 at DYS 389-2 since Larry (who is descended from Daniel) has
a value of 30 there. Therefore, Ancestral David, his sons Evan and Thomas, and his
two grandsons Samuel and Daniel, all have the same 37-marker haplotype, and this
is shown in Figure 2. However, if more markers were to be tested, it is possible
that Ancestral David and his grandsons might eventually exhibit different haplotypes.
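The reasoning for a single marker can be sketched as follows. This is an illustrative Python sketch, not the procedure actually used to build Figure 2. The function name infer_ancestral_value is hypothetical, the grouping assumes that Samuel's line (Warren and Lee) and Thomas's line (Wilbur, Robert and Larry) descend from different sons of Ancestral David, and only the marker values quoted in the text are used.

```python
from collections import defaultdict


def infer_ancestral_value(values_by_line):
    """Infer an ancestor's value at one marker, assuming no parallel mutations.

    values_by_line maps each line of descent (one per son of the ancestor)
    to the values observed among its tested descendants. A value seen in
    two or more separate lines is taken to be ancestral; otherwise the same
    mutation would have had to arise independently in each of those lines.
    Returns None when this rule cannot decide.
    """
    lines_showing = defaultdict(set)
    for line, values in values_by_line.items():
        for value in values:
            lines_showing[value].add(line)

    if len(lines_showing) == 1:  # every tested descendant agrees
        return next(iter(lines_showing))

    shared = [v for v, lines in lines_showing.items() if len(lines) >= 2]
    return shared[0] if len(shared) == 1 else None


# Values quoted in the text (Robert's test is not assumed to include CDYb).
dys_389_2 = {"Samuel line": [30, 30], "Thomas line": [31, 31, 30]}
dys_cdyb = {"Samuel line": [38, 38], "Thomas line": [38, 39]}

print("DYS 389-2:", infer_ancestral_value(dys_389_2))
print("DYS CDYb:", infer_ancestral_value(dys_cdyb))
```

The sketch covers only the step that fixes Ancestral David's value. Daniel's value then follows because he lies on the line of descent between Ancestral David and Larry, who share the same value.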
The mutation at DYS 389-2 exhibited by Wilbur and Robert happened somewhere
between Daniel’s son Thomas and Wilbur and Robert’s father William. Three mutations
also occurred between Daniel’s son Jonathan and Larry, but all of Larry’s mutations
occur in the faster-mutating markers (denoted by asterisks in Figure 2). For example,
DYS CDYb mutates about fifteen times faster than the “average” marker.
Example 4: Where Do Joe and Chris Fit In?
Both Joe and Chris match Ancestral David’s 37-marker Y-DNA profile except for
a value of 39 at DYS CDYb instead of Ancestral David’s value of 38.
One possible hypothesis might be that Joe and Chris fit into the line from
Ancestral David through Thomas, Daniel and Jonathan. If this is true, then Joe and
Chris must descend from Jonathan to account for the CDYb mutation (Larry has the
same mutation) because Jonathan’s brother Thomas does not have the CDYb mutation.
However, Elias (Joe’s ancestor) was born circa 1770-1780 and died circa 1840-1850,
while Jonathan was born circa 1785 and died in 1843. Therefore, Elias and Jonathan
were contemporaries and Elias could not have descended from Jonathan. Although parallel
mutations are generally not allowed in this type of analysis, an exception might
be made in the case of Joe, Larry and Chris because CDYb mutates about fifteen times
faster than the “average” marker.
Chris is a 37/37 match with Joe, and like Joe his research is incomplete. Chris’s
research extends back only to Isaac, and Isaac might be a son of Elias.
This link is suggested by several factors: 1) Joe and Chris match 37/37, suggesting
a very close relationship. 2) Unlike most of the other James lines, both Joe’s
line and Chris’s line trace back to the deep South (Alabama, Georgia, Mississippi).
3) The full name of Isaac’s son is William Elias James, suggesting that the middle
name might be honoring a grandfather (Elias).
Besides Daniel, Ancestral David’s son Thomas had another son Jonathan (a full
brother to Daniel), a son Enoch who was a half-brother to Daniel, and Elias (a half-brother
to Daniel). Daniel’s sons and Elias’ descendants are fully accounted for and Jonathan
had only girls, but it is possible that Joe and/or Chris might descend from Enoch.
Enoch married Rachel Richards in Pennsylvania. We have lost track of Enoch after
his marriage to Rachel, and it is possible that he moved to the deep South where
Joe’s and Chris’ lines presently end. It is interesting that Elias (Joe’s ancestor)
was born 1770-1780, and thus is the right age to be a son of Enoch. Elias had a son
named Thomas (perhaps named after a great-grandfather?), and this Thomas had sons
named Enoch and David, and a daughter named Rachel. These are all relatively common
given names of the period, but they do suggest a possible relationship back to Enoch,
the grandson of Ancestral David.
It is also possible that Joe and/or Chris descend from Ancestral David’s son
Isaac. However, nothing is known about Isaac or his descendants and there is no evidence
to either prove or disprove this suggestion. Further research is necessary to determine
where Joe and Chris fit into this James group, but there is no question that they
do indeed fit.
Example 5: The Unhappy Case of a Ninth Participant
A participant in the overall James Surname Project (name and kit number withheld
for privacy reasons) had researched his James ancestry back to Ancestral David and,
in addition, had written a book on his family history. He had traced his ancestry
back through Ancestral David’s son Isaac, and the rest of our group was very pleased
because he would have been the only known descendant of Isaac. However, when his
12-marker Y-DNA test results came back he had only an 8/12 match (genetic distance
of 4) with the rest of the group, and clearly was not related to the rest of the
group. FTDNA states that with an 8/12 match, “the odds greatly favor that you have
not shared a common male ancestor with this person within thousands of years.”
There are two possible explanations for this difference: 1) the genealogical research
was incorrect, or 2) a “non-paternal event” occurred somewhere along his line (e.g.,
an adoption, an unregistered name change, or marital infidelity). Although an unhappy
outcome, this negative result at least showed the participant that he has a problem
that he might be able to solve.
Example 6: Determine Our James Group Haplogroup
FTDNA defines haplogroups this way: “If we look at the world population as
a huge genealogical tree, the haplogroups are the original branches of the tree,
which characterized the early human migrations. Therefore, haplogroups are normally
associated with geographical regions.” Knowledge of one’s haplogroup could provide
an idea about the migration routes of one’s “deep ancestors” (those ancestors who
lived much earlier than recorded history).
An approximate estimate of one’s haplogroup may be made using results such
as those shown in Figure 2. For example, such an estimate predicted that our little
James group belongs to Haplogroup R1b, a group of people who expanded throughout
Europe as humans re-colonized after the last Ice Age about 10,000 years ago. Oppenheimer
has suggested that the R1b ancestors of the Welsh spent the last Ice Age in refuge
from the ice close to the Bay of Biscay near the present France/Spain border, and
traveled by land to Wales because the sea levels were about 137 meters (450 feet)
lower than today, but this theory has been disputed by others. The R1b Modal Haplogroup
is shown in Figure 2. However, further refinement of one’s haplogroup beyond the
simple R1b designation requires a different type of DNA test - one that evaluates
Single Nucleotide Polymorphisms (SNPs).
Warren James in our James group has had a number of SNP tests to further refine
his (and the group’s) haplogroup. The results are generally portrayed as a “positive”
or a “negative” for a given SNP. Warren’s results, generated at two laboratories
(FTDNA and EthnoAncestry®), are:
Positive: M207, M173, M343, P25, M269, S129, S128, S116, and L21.
Negative: M18, M73, S21, S29, S26, M65, M153, SRY2627, S28, M126, M160, S68, M37,
M222, and P66.
Although the testing and much of the DNA analysis have been standardized worldwide,
the interpretation of haplogroups is still a subject of discussion between analysts.
Without going into further detail, the above results put Warren (and the rest of
the group) into Haplogroup R1b1b2a1a2f* (where the asterisk denotes that further
refinement is necessary but not possible at this time) according to the ISOGG,
and R1b1b2a1b5* according to FTDNA. The above difference is mainly an issue of timing
- ISOGG tends to declare a new haplogroup earlier than FTDNA which tends to review
the data longer. Additional SNPs will need to be discovered and tested for before
additional refinement in our haplogroup is possible. At this time additional research
(and much more data) is necessary in order to further refine “deep ancestry” migration
routes, but the hope is that someday this will be possible. It is also probable that
the above haplogroup designations will change as new SNPs are discovered and tested.
Example 7: Average Mutation Rates For Our James Group
Knowledge of the mutation rates for our James group was undoubtedly not high
on the priority list when the various participants joined our group. However, knowledge
of mutation rates is important to geneticists in understanding how (and when) Y-DNA
mutates, and this is the subject of considerable ongoing discussion and controversy.
Therefore, the Y-DNA results for our James group have been included in a genetic/genealogy
database maintained by Charles F. Kerchner, Jr.
It has long been recognized that some DYS markers mutate at a higher (or
lower) rate than the “average.” For example, estimates of multiples above the “average”
rate for some of the faster-mutating markers are: DYS 464c 2.5 times faster, DYS
449 3.6 times faster, DYS 576 4.4 times faster, and DYS CDYb 15.4 times faster. It
is also apparent that the average mutation rates in some haplogroups are higher than
in other haplogroups. For example, the average marker in Haplogroup R1a appears to
mutate about 1.8 times faster than the average marker in Haplogroup R1b.
Kerchner’s predictions for “average” mutation rates (μ) for our James group
are: 0.0031 mutations per transmission event for a 12-marker FTDNA test, 0.0030 for
a 25-marker test, and 0.0040 for a 37-marker test, where one transmission event (T)
is from a father to a son (one generation). This implies that, for a 37-marker (M)
profile, the average “lifetime” of a DYS haplotype in our James group is
1/[1 - (1 - μ)^M] = 1/[1 - (1 - 0.004)^37] = 7.3 generations.
In other words, on the average, a Y-DNA haplotype in our group retains the same value
for about 7.3 generations before a mutation occurs and the haplotype changes. The
average mutation rate (μ) can also be used to verify that the number of mutations
observed between two persons is reasonable. For example, Warren (or Lee) and Larry
differ by 3 mutations in 37 markers. The number of expected mutations can be estimated as
(μ)(M)(T) = (0.004)(37)(16) = 2.4 mutations
where T is the number of transmission events (16 events, 7 from Warren to Ancestral
David and 9 from Larry to Ancestral David). The prediction of 2.4 mutations compared
to the observed 3 mutations is very reasonable considering that all three of Larry’s
mutations are in the faster-mutating markers.
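The two calculations above are easy to reproduce. The following Python sketch simply restates the formulas in this section; the function names are hypothetical, and it assumes, as the formulas do, that every marker mutates independently at the same average rate.

```python
def haplotype_lifetime(mu, markers):
    """Average number of generations before a haplotype changes.

    (1 - mu) ** markers is the chance that none of the markers mutates
    in one father-to-son transmission; the lifetime is the reciprocal
    of the chance that at least one of them does.
    """
    return 1.0 / (1.0 - (1.0 - mu) ** markers)


def expected_mutations(mu, markers, transmissions):
    """Expected number of mutations separating two tested men."""
    return mu * markers * transmissions


MU_37 = 0.004  # average rate quoted above for a 37-marker test

lifetime = haplotype_lifetime(MU_37, 37)
# 7 transmissions from Warren to Ancestral David plus 9 from Larry.
expected = expected_mutations(MU_37, 37, 7 + 9)

print(f"Haplotype lifetime: {lifetime:.1f} generations")  # 7.3
print(f"Expected mutations: {expected:.1f}")              # 2.4
```

The same functions can be applied to the 12-marker and 25-marker rates quoted above by changing the arguments.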
Example 8: “Are We Related To Jesse James?”
Almost everyone with the surname James has, at one time or another, been asked
the question “Are you related to the outlaw Jesse James?” In the case of our little
James group, traditional genealogical research has already shown that it is quite
unlikely that we are related to the famous outlaw Jesse Woodson James (1847-1882).
However, it is always remotely possible that somewhere along our relatively long
James line an unknown James man escaped notice, or perhaps that there is an earlier
undiscovered James connection in the British Isles. The James Y-DNA surname project has
allowed us to completely foreclose this remote possibility. One participant in the overall
James Project (Kit No. 108303) is descended from the same male ancestor as the outlaw
Jesse James: William James (1754-1805), who was born in England. Hence, his Y-DNA
profile should be close to, if not exactly matching, that of Jesse James. Comparing
this participant’s (Kit No. 108303) 67-marker Y-DNA profile to that of Warren and/or
Lee shows a genetic distance of 29 in a 67-marker test, clearly an indication of
no relationship. Therefore, those in our little James group can state unequivocally
that we are completely unrelated to the outlaw Jesse James.
This Y-DNA surname project has been a worthwhile experience for the participants
in our little James group. Six of the participants have been able to validate their
research about their descent from Ancestral David. Although their research is not
yet complete, two other participants have learned that they either descend from Ancestral
David or, at the very least, share a common male James ancestor with Ancestral David.
Hence, they now have a genealogical target to aim for. The results of this project
have also allowed earlier suggestions to be disproved - that Ancestral David had
a brother and father also living in Chester County. There was an unhappy outcome
for a ninth participant who learned that he did not fit into this James group, but
at least he knows that he has a problem that he may be able to solve. Considerable
refinement has been made in characterizing the group’s haplogroup, although considerably
more data will be required before our group’s pre-historical ancestors can be further
defined. In addition, the average mutation rates for our group have been determined.
It was shown that our James group was not related to the outlaw Jesse James. Finally,
readers are reminded that caution should be exercised when comparing Y-DNA results
from different testing laboratories.
The logic arguments for Examples 3 and 4 were developed by Robert D. McLaren,
and were part of his paper “Managing a Large Surname DNA Project, With Some Interesting
Results,” presented at the Annual Meeting of the National Genealogical Society held
in Virginia in May 2007. The author thanks Bob for letting him use his logic arguments.
Thanks are also due to Charles F. Kerchner, Jr. for providing the average mutation
rates for our James group. Finally, the author is also indebted to Susan (James)
Rosine, Warren’s daughter, for providing the SNP and haplogroup information that
comprises Example 6, as well as several useful suggestions that improved this article.
This is an expanded written version of a talk delivered to the Olympia Genealogical
Society Spring Seminar held in Olympia, WA in 2008.
1. Thomas H. Roderick, PhD, “The Y Chromosome in Genealogical Research: ‘From
Their Ys a Father Knows His Own Son,’” National Genealogical Society Quarterly, Vol.88,
June 2000, pp. 122-143.
2. Although his origin in Radnorshire has not been definitively proved, it is
known that David James was Welsh because the inventory of his estate lists books
written in the Welsh language.
3. Robert also tested a seventeenth marker and had a value of 23 at DYS 635
(also called GATA-C4), a marker not tested for by the other James participants.
4. DNA Y-chromosome Segment. A nomenclature system established by international
consensus. The DYS is the “name” of each marker.
5. Although unusual, the observation of a mutation between brothers is not unheard
of. For example, Bennett C. Greenspan, the founder and President of FTDNA, relates
how he and his brother Jim differ by a value of one at DYS 385a (Bennett had the
mutation and passed it on to his son): Bennett Greenspan, “An Insider’s Look at the
Genealogy DNA Field,” New England Ancestors, Vol. 5, No. 3, Summer 2003, pp. 21-23.
6. Brothers Wilbur and Robert employed different DNA testing laboratories, and
not all laboratories report results for DYS H4 on the same basis. More information
on this issue may be found at the National Institute of Standards & Technology website:
http://www.cstl.nist.gov/biotech/strbase/YSTRs/H4_nomenclature.htm. FTDNA (Wilbur’s
test) measures GATA-H4, while Familybuilder (Robert’s test) measures TAGA-H4 and,
when the appropriate correction was applied, both brothers had the same value (reported
herein as GATA-H4).
7. A difference of one unit at one marker is a “genetic distance” of one. A difference
of one unit at each of two markers, or a difference of two units at one marker, is
a genetic distance of two, etc.
8. Haplotype. One person’s set of values for the markers that have been tested.
Two individuals that match on all markers but one have two distinct haplotypes.
9. Chris also matches Warren and Lee 66/67, again a very close relationship.
10. Stephen Oppenheimer, The Origins of the British - A Genetic Detective Story,
New York, NY: Carroll & Graf Publishers, 2006.
11. Single Nucleotide Polymorphism (SNP). A change in the DNA that happens when
a single nucleotide (A, T, G, or C) in the genome sequence is altered.
12. A person who tests “positive” at L21 has a “G” (Guanine) where almost everyone
else has a “C” (Cytosine); i.e., a SNP.
13. International Society of Genetic Genealogy, http://www.isogg.org.
14. T. Whit Athey, PhD, “Mutation Rates - Who’s Got the Right Values?,” Journal
of Genetic Genealogy, Vol. 3, No. 2, 2007, pp. i-iii.
16. Jesse James’s mtDNA has also been characterized: see Anne C. Stone, James E.
Starrs, and Mark Stoneking, “Mitochondrial DNA Analysis of the Presumptive Remains
of Jesse James,” Journal of Forensic Sciences, Vol. 46, No. 1, 2001, pp. 173-176.
Figure 2. Thirty-seven-marker results for the eight participants in this portion
of the James Surname Y-DNA Project.