How Software Errors Corrupt Our Trees

Infographic

The article I wrote about major problems with new Dutch records on Ancestry.com sparked some great discussions both here and on several Facebook groups. People were especially appalled by the error that caused ‘Burgerlijke Stand’ (Civil Registration) to be included in the place name. But this isn’t the biggest error caused by software.

I found almost 1 million records for people who lived in Reusel-De Mierden, Noord-Brabant, Netherlands in Ancestry Member Trees, many going back to the 16th or 17th century. That is remarkable for a tiny municipality that has only existed since 1997! I estimate that at least 900,000 of these records should say “Holland” instead of Reusel.

Another error causes Dutch place names that contained the abbreviation “NL” to be resolved to “Newfoundland and Labrador, Canada” instead of “Netherlands”. These errors are pretty obvious, since the place name will say something like “Amsterdam, Noord-Holland, Newfoundland and Labrador, Canada.” I searched for the names of all Dutch provinces followed by Newfoundland and found over 100,000 incorrect profiles.
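How the affected software actually expands abbreviations is not known. The following is only a minimal sketch of how such a mix-up could arise, assuming a lookup that expands “NL” without looking at the rest of the place name. All table and function names here are hypothetical.

```python
# Hypothetical lookup tables, for illustration only.
# "NL" is both the Canadian postal abbreviation for Newfoundland and Labrador
# and the ISO 3166-1 country code for the Netherlands.
REGION_CODES = {"NL": "Newfoundland and Labrador, Canada"}
COUNTRY_CODES = {"NL": "Netherlands"}

DUTCH_PROVINCES = {
    "Drenthe", "Flevoland", "Friesland", "Gelderland", "Groningen",
    "Limburg", "Noord-Brabant", "Noord-Holland", "Overijssel",
    "Utrecht", "Zeeland", "Zuid-Holland",
}


def expand_naive(place):
    """Expand a trailing abbreviation without looking at its context."""
    parts = [p.strip() for p in place.split(",")]
    if parts[-1] in REGION_CODES:
        parts[-1] = REGION_CODES[parts[-1]]
    return ", ".join(parts)


def expand_with_context(place):
    """Prefer the country reading when a Dutch province precedes the code."""
    parts = [p.strip() for p in place.split(",")]
    last = parts[-1]
    if last in COUNTRY_CODES and len(parts) >= 2 and parts[-2] in DUTCH_PROVINCES:
        parts[-1] = COUNTRY_CODES[last]
    elif last in REGION_CODES:
        parts[-1] = REGION_CODES[last]
    return ", ".join(parts)


print(expand_naive("Amsterdam, Noord-Holland, NL"))
print(expand_with_context("Amsterdam, Noord-Holland, NL"))
```

The naive version produces exactly the kind of place name quoted above, while the context-aware version keeps Amsterdam in the Netherlands and still expands a genuinely Canadian place such as “St. John's, NL” to Newfoundland and Labrador.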

The error that has had the least impact so far is the one where ‘Burgerlijke Stand’ (Civil Registration) is included in the place name in the index and found its way into member trees from there. So far I have found about 2,300 people whose birth places contain ‘Burgerlijke Stand’. But this record set has only been online at Ancestry.com since earlier this week, so unfortunately this error has serious potential for growth.

I chose to check Ancestry Member Trees because that is the largest collection of family trees that I know of. This isn’t meant to imply that the errors originated with Ancestry.com. But the friendly way in which Ancestry lets you accept hints from other trees probably caused these errors to spread fast once an error made its way to a Member Tree.

It’s easy to see how people would be tempted to replace a generic place of origin like ‘Holland’ with the specific town in Noord-Brabant where the immigrant ancestor supposedly came from. A place name as specific as ‘Reusel-De Mierden, Noord-Brabant, Netherlands’ doesn’t look like something somebody made up…

Check your trees!

I advise you all to take a moment to check your trees for any of these errors. Please leave a comment below if you find any. You won’t be alone, I promise!
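If your genealogy program can export a list of place names, a small script can do a first pass for you. The sketch below assumes you already have the places as plain strings. The pattern table and the function name are made up for this example, and a match only means “look at this one”, not “this is wrong”.

```python
import re

DUTCH_PROVINCES = (
    "Drenthe", "Flevoland", "Friesland", "Gelderland", "Groningen",
    "Limburg", "Noord-Brabant", "Noord-Holland", "Overijssel",
    "Utrecht", "Zeeland", "Zuid-Holland",
)

# One pattern per error described in this article.
SUSPECT_PATTERNS = {
    # Genuine events after 1997 can legitimately be in Reusel-De Mierden,
    # so treat these matches as candidates to review by date.
    "Reusel-De Mierden (possibly a mis-geocoded 'Holland')":
        re.compile(r"Reusel-De Mierden", re.IGNORECASE),
    "Dutch province placed in Newfoundland and Labrador":
        re.compile(
            r"\b(?:" + "|".join(map(re.escape, DUTCH_PROVINCES)) + r")\b.*Newfoundland",
            re.IGNORECASE,
        ),
    "'Burgerlijke Stand' inside a place name":
        re.compile(r"Burgerlijke\s+Stand", re.IGNORECASE),
}


def find_suspect_places(places):
    """Return (place, reason) pairs for every place that matches a pattern."""
    findings = []
    for place in places:
        for reason, pattern in SUSPECT_PATTERNS.items():
            if pattern.search(place):
                findings.append((place, reason))
    return findings


sample = [
    "Amsterdam, Noord-Holland, Newfoundland and Labrador, Canada",
    "Burgerlijke Stand, Arnhem, Gelderland, Netherlands",
    "Reusel-De Mierden, Noord-Brabant, Netherlands",
    "Winterswijk, Gelderland, Netherlands",
]
for place, reason in find_suspect_places(sample):
    print(f"{place}  <-  {reason}")
```

Anything the script flags still needs a human check against the original record before you change it.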

Embed the infographic on your own site

Feel free to share the infographic on your own blog, website or social media profile. A link back is appreciated but not required.

If you want to embed the infographic on your own site, just copy and paste the following code into the HTML-view of your blog:

<a href="http://www.dutchgenealogy.nl/corrupt-trees"><img src="https://s3.amazonaws.com/easel.ly/all_easels/288793/CorruptDutchfamilytrees/image.jpg" alt="How software errors corrupt our trees" title="How software errors corrupt our trees, by Dutchgenealogy.nl" /></a>

The infographic is available under a Creative Commons Attribution-NoDerivatives 4.0 International (CC BY-ND 4.0) license, which means you may copy and redistribute the infographic in any medium or format for any purpose, even commercially, as long as you credit me and don’t change the image.

About Yvette Hoitink

Yvette Hoitink is a professional genealogist in the Netherlands. She has been doing genealogy for 20 years. Her expertise is helping people from across the world find their ancestors in the Netherlands. Read about Yvette's professional genealogy services.

Comments

  1. Virgil Hoftiezer says:

    That is so bad, but may explain some of the major errors we see in some trees on Ancestry.com.
    Unfortunately it is impossible to remove these errors — once on the internet it must be absolutely true.
    Thanks for alerting us to these horrible errors.
    Virgil

  2. Good job Yvette!
    And for all researchers: please don’t copy data just like that. Check and double-check before you do!
    Irma

  3. Yvette, I disagree. It’s not software (errors) that corrupt trees, it’s the user!

    If software automatically changes your data, then you would have a point. But in the case of the hints, it’s the user who pushes the accept button without checking or correcting the data. The software functionality here is not to blame; it’s not in error. The hint might be incorrect because the software uses an incomplete geographical dataset. I hope users will help complete these datasets, for example via the Wikipedia-like http://www.geonames.org.

    The ‘Burgerlijke Stand’ issue has another dimension to it. You can blame FamilySearch, as their dataset contains these (rather easy to spot!) errors. But should Ancestry (and MyHeritage, as they have also partnered with FamilySearch) check and correct the data? I think they (and again, also the users) should inform FamilySearch, so that the source gets corrected and all parties using these datasets then also get corrected data.

    Finally, I’d like to repeat what Irma says: researchers, always check and double-check information you find and use, be it from an archive or library, offline or online publication.

    Bob

    • Hi Bob,

      Thanks for stopping by.

      The problem with Reusel-De Mierden is that it is the software that automatically replaced all references to ‘Holland’ with ‘Reusel-De Mierden, Noord-Brabant, Netherlands’ if you chose to geocode the place names. At least, that is what Family Tree Maker used to do. In batch mode, users were not presented with the new names and asked whether they wanted to accept them; the program just changed all the instances automatically.

      Same with “, NL” being replaced by Newfoundland and Labrador, Canada. I think that was a global search-and-replace gone horribly wrong at the index level. You can debate whether that is a software problem or a human error, but that error wasn’t made by the compiler of the tree but by the compiler of the data (or one of their database administrators). With ‘Burgerlijke Stand,’ I don’t know if it was a database design error, conversion error or data entry error, but whichever it was, the resulting database isn’t normalized, so information that should be in two separate fields is combined into one. Once again, that is the raw material that the users are presented with. And yes, people should be more careful about what they accept, but how is an American to know that there is no village ‘Burgerlijke Stand’ in Arnhem?

      As I tried to explain in the text underneath the infographic, I did not mean to blame Ancestry.com for these errors. Users blindly accepting hints aggravate the problem, but are not the root cause. They do make them more visible.

  4. If it wasn’t so sad one could laugh about these boo-boos. But I am grateful for the use of your infographic. Thanks!

  5. I have also re-posted the infographic in my blog. As Peter said, if it wasn’t so sad, it would be funny. And it is one reason why I don’t put up my tree at Ancestry.
