On 7 October 2014, a group of genealogy technologists gathered in Leiden, The Netherlands, for the first Gaenovium conference. Although small, with around 25 delegates, it was certainly forward-looking and showed promise of things to come. It seems fitting that open data and open standards for genealogy were expounded in a city whose symbol is a pair of crossed keys.
Unlocking the full potential of historical documents requires:
- practical, convenient and non-discriminatory access, or the researcher’s work can’t even get started
- unrestricted use, re-use and re-combination of data, so the researcher is free to follow any line of enquiry and can freely collaborate with others
The principles promoted by open movements such as Open Definition have found support in the academic and cultural domains. Gaenovium attendees included representatives of universities and of commercial digitisation and archival management companies, all of which exploit open data to their advantage. Independent developers and genealogy organisations, namely Nederlandse Genealogische Vereniging and Centraal Bureau voor Genealogie from the Netherlands and Verein für Computergenealogie e.V. from Germany, accounted for most of the other delegates.
Generally, historical data were not collected for the purpose of genealogy. Genealogists are masters of reusing and combining data, but sometimes forget that the data may also be used for other kinds of research. Marijn Schraagen of Leiden University spoke about algorithms for name matching, which are applicable beyond genealogy. He compared new and established algorithms on efficient use of computing resources and scalability as well as on functional capability. He commented that a new algorithm may not be better at matching names, but might do so more quickly. Over dinner, an attendee from Utrecht University described using compiled genealogies to investigate human life spans.
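To give a flavour of what name matching involves, here is a minimal sketch in Python. It is not one of the algorithms presented in the talk. It uses the classic Levenshtein edit distance as a stand-in for an established approach, and the function names and the threshold of 2 are my own assumptions for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, keeping only two rows in memory."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def names_match(a: str, b: str, max_distance: int = 2) -> bool:
    """Treat two names as a possible match if they differ by a few edits.

    The threshold is an arbitrary assumption, not a recommended value.
    """
    return edit_distance(a.casefold(), b.casefold()) <= max_distance
```

For example, `names_match("Jansen", "Janssen")` is true because the spellings differ by one letter. The cost of comparing every name against every other grows quickly with the size of the data set, which is why speed and scalability matter as much as matching quality.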
Digitisation and archive management companies Picturae, Mindbus, and DE REE archiefsystemen were represented. Dutch cadastral maps on HISGIS, WieWasWie and Archieven.nl are examples of their collaborative work that are well worth exploring. I am guilty of a common sin committed by native English speakers. I often pass over resources that are not in English, and just look what I missed!
Open data advocate Bob Coret convincingly demonstrated Open Archives, a platform that combines data from several Dutch heritage institutions. Use of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) in a genealogical context highlights the connection between archives and genealogy. The majority of genealogical sources are original documents housed in an archive, or some derivative such as a microfilm or digital image.
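For the technically curious, harvesting with OAI-PMH comes down to plain HTTP requests that carry a `verb` parameter, with long result lists split into pages linked by a resumption token. The sketch below is my own illustration rather than anything shown at the conference. The base URL `https://example.org/oai` is a placeholder, not a real endpoint, and the helper names are hypothetical.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL.

    A resumption token is an exclusive argument: when continuing a
    harvest it is sent instead of metadataPrefix.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def next_token(response_xml):
    """Return the resumption token from a response, or None on the last page."""
    root = ET.fromstring(response_xml)
    token = root.find(".//oai:resumptionToken", OAI_NS)
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()
```

A harvester would fetch the first URL, read the records, then keep requesting with each new token until `next_token` returns `None`.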
A discussion of genealogy data standards would be incomplete without mention of GEDCOM. Louis Kessler’s talk, ‘Reading wrong GEDCOM right’, set out pragmatic best practices for coping with an imperfect and poorly implemented standard.
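To illustrate the general idea of reading imperfect files forgivingly, here is a small sketch of a tolerant GEDCOM line parser. It does not reproduce the specific recommendations from the talk. A GEDCOM line has a level number, an optional cross-reference identifier, a tag and an optional value. The two tolerances shown (leading whitespace and lower-case tags) and the names `GedcomLine` and `parse_line` are my own assumptions.

```python
import re
from typing import NamedTuple, Optional


class GedcomLine(NamedTuple):
    level: int
    xref: Optional[str]  # e.g. "@I1@", or None when absent
    tag: str
    value: str


# level, optional @XREF@, tag, optional value; leading whitespace is tolerated
_LINE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?: (.*))?$")


def parse_line(raw: str) -> Optional[GedcomLine]:
    """Parse one GEDCOM line, returning None if it cannot be understood."""
    match = _LINE.match(raw.rstrip("\r\n"))
    if match is None:
        return None
    level, xref, tag, value = match.groups()
    # Upper-case the tag so that "name" and "NAME" are treated alike.
    return GedcomLine(int(level), xref, tag.upper(), value or "")
```

Returning `None` for unreadable lines, rather than stopping with an error, lets the calling program log the problem and carry on with the rest of the file.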
The panel discussion, moderated by Bob Coret, with Louis Kessler, myself and Phil Moir of DC Thomson Family History (aka findmypast, GenesReunited etc.), examined the way forward. The newly re-invigorated Family History Information Standards Organisation (FHISO) has now started to develop a new standard.
So, how did the big four genealogy companies appear at Gaenovium? FamilySearch were roundly criticised for their failure to engage and co-operate with others in standards development. Although I appreciate the records they make available, I find myself unable to defend them. They sent no representative, so remain disengaged. Ancestry also did not attend, and were not even mentioned. Even though I disagreed with Phil Moir of DC Thomson Family History during the panel discussion, I appreciated his presence. I hope the feedback helps the company to serve its customers better. MyHeritage demonstrated their engagement with innovation in the genealogy industry by sponsoring the conference. In addition, they sent two delegates who showed active interest in the opinions of others.
© Sue Adams 2014
When I first got my hands on this photograph it had no helpful annotation on the back to tell me who the people were. The photo is part of my grandmother’s collection, so I could recognise her and my grandfather. As they died many years ago, I could not ask them to identify the other people. Three people of the next generation, the baby in his mother’s arms, the toddler on her grandmother’s knee and the child bridesmaid are still alive and all recognise the groom as a favourite: Uncle Albert.
The marriage certificate provides the date and location, and names three people present at the wedding in addition to the couple:
Bride: Margaret Canning
Groom: Albert Adams
Date: 12 April 1941
Location: St Margaret’s Church, Ward End, Birmingham
Father of Bride: Arthur Solomon Canning
Father of Groom: Thomas Henry Adams (deceased)
Witnesses: A S Canning, J Adams
Although some of the people in this picture have been identified, others have not. First, I want to do the electronic equivalent of writing on the back, and second, I want to share the photo with relatives and let them add to the annotation.
Annotating Digital Images – Metadata
It is important that my annotations are embedded in the image file and that they are not lost when the file is copied or edited. Two commonly used file types that support embedded text are JPEG and TIFF, which also support information like camera settings, date and time of creation, and copyright. Rather than describe the positions of people, which can get cumbersome, I want to point to a face and label it with the person’s name.
Digital cameras, social media and image processing software now commonly boast ‘face recognition’ capabilities. Two processes are often conflated when people talk about face recognition. The first is the ability to determine that a face is part of the picture, rather than some other object with similar dimensions (e.g. a ball, balloon). This problem has been solved and successfully implemented in many cameras and software, which identify the part of the picture containing a face and highlight the region in a rectangle.
The second problem, the ability to compare two faces and determine whether the same person is depicted, is much more complex and difficult. Automatic comparison and identification requires multiple images to train the software to recognise a person. The training is done by a person. People are talented at recognising other people; computers aren’t.
All the embedded information, the file’s metadata, is needed for the person-labelling functions I want to work.
Picasa and Photoshop Elements – Metadata compatibility
Two image processing programs installed on my computer, Picasa 3.9 (free) and Photoshop Elements 8 (which came bundled with some hardware), are both capable of identifying face regions and labelling them. However, faces labelled in one program are not recognised by the other.
There are many ways labels, tags and definition of face regions can be implemented by software, so programs have developed a variety of different solutions. Incompatibility between programs is a consequence. Consumer dissatisfaction prompted a consortium of digital media companies, The Metadata Working Group, to publish technical guidelines in November 2010, aimed at overcoming the incompatibilities.
Photoshop Elements 8, released in 2009, does not seem to store face regions in the image file. Photoshop Elements is now on version 11, so it might have implemented the metadata guidelines. Picasa 3.9, the current version, does store face regions in the file metadata, but they are not recognised by Photoshop Elements 8.
Face regions are stored separately from tags. Tags are widely used to make files searchable by their labels. For example, photos depicting Albert that are tagged ‘Albert Adams’ can be found from my operating system or image software. Photoshop Elements created tags as I labelled face regions, but Picasa did not. It turns out I want both.
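The difference between the two kinds of annotation can be sketched in a few lines of Python. This is a hypothetical model of my own, not the storage format used by Picasa, Photoshop Elements or the Metadata Working Group guidelines. The class `FaceRegion`, its field names and the choice of fractions of the image size are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceRegion:
    """A named rectangle, stored as fractions of the image size (assumed layout)."""
    name: str      # empty string for a face not yet identified
    x: float       # left edge, 0.0 to 1.0
    y: float       # top edge, 0.0 to 1.0
    width: float
    height: float

    def to_pixels(self, image_width: int, image_height: int):
        """Convert to (left, top, width, height) in pixels for a given image size."""
        return (
            round(self.x * image_width),
            round(self.y * image_height),
            round(self.width * image_width),
            round(self.height * image_height),
        )


def tags_from_regions(regions):
    """Derive searchable tags from the names on labelled face regions."""
    return sorted({region.name for region in regions if region.name})
```

A region says where a face is, while a tag only says that a name applies to the whole file. Deriving the tags from the labelled regions is one way to get both without typing each name twice.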
For now, I prefer using Picasa for naming people as it is more user-friendly, but use Photoshop for other image editing tasks.
Sharing and online collaboration
I would like to share this photo online in a way that allows fellow genealogists or relatives to tag the unidentified people.
Social media sites such as Facebook and Google+ have face region and tagging capabilities. However, for reasons of online privacy and social etiquette, only people with whom you are associated on the website can be tagged. Most of the people in this photo are long dead and certainly not on social media!
Picasa has a facility to upload photos, which is in transition from the old ‘Picasa Web Albums’ to ‘Google+ Photos’. I uploaded the photo and viewed it online, but am not sure which service was in operation when I could see this in my browser:
So, dear relatives, can you identify any of the people not yet tagged?
© Sue Adams 2013