Hierarchies Help Source Organisation, Analysis and Citation
Posted: 21 Mar 2015
Jill Ball’s recent hangout on air entitled Let’s get organised caused me to pause and think. The question of how to organise the physical and digital ‘stuff’ we accumulate during genealogical research is a common one that elicits a wide variety of responses. This discussion revolved mainly around digital files.
The panellists broadly follow two patterns of file organisation, person oriented and source oriented. Person oriented systems typically arrange files in folders for surnames and individuals. Source oriented systems typically arrange files in folders for each source type. Some people also have place and project folders.
Retrieval strategies include using file naming conventions, tagging, assignment of unique file ids, indexes and spreadsheets. Panellists use Family Historian, Custodian, Evernote, Excel and coggle.it to help them keep track of the ‘stuff’.
In Provenance of a Personal Collection – Archival Accession, Arrangement and Description, I advocated recording source information in a hierarchical, archival-style catalogue. Archival catalogues typically arrange source items by the provenance and context of their creation and use, which is reflected in multiple levels of logical organisation. Where items are physically or digitally stored need not match this logical organisation.
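To illustrate that storage and logical organisation can differ, here is a minimal Python sketch. It is not a description of any particular genealogy program. Each catalogue entry records its logical path through the hierarchy (top level first) separately from where the file is actually stored, so files can sit in a single flat folder while the catalogue still presents them as a tree. The class and function names, collection names and file names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueEntry:
    """One item in a hypothetical archival-style catalogue."""
    logical_path: tuple[str, ...]  # hierarchy levels, top level first
    title: str
    storage: str                   # actual file location; may be a flat folder


def build_tree(entries):
    """Nest entries by logical path only; storage plays no part in the tree."""
    tree = {}
    for entry in entries:
        node = tree
        for level in entry.logical_path:
            node = node.setdefault(level, {})
        node.setdefault("_items", []).append(entry)
    return tree


def render(tree, indent=0):
    """Return an indented outline: each level, then its items, then sub-levels."""
    lines = []
    for key, child in tree.items():
        if key == "_items":
            continue
        lines.append("  " * indent + key)
        for entry in child.get("_items", []):
            lines.append("  " * (indent + 1) + f"- {entry.title} [{entry.storage}]")
        lines.extend(render(child, indent + 1))
    return lines


if __name__ == "__main__":
    # Hypothetical entries: three files stored flat, catalogued hierarchically.
    entries = [
        CatalogueEntry(("Personal collections", "Collection A", "Probate"), "Will", "0001.pdf"),
        CatalogueEntry(("Personal collections", "Collection A", "Photographs"), "Wedding photo", "0002.jpg"),
        CatalogueEntry(("Archives", "Parish registers"), "Marriage register", "0003.jpg"),
    ]
    print("\n".join(render(build_tree(entries))))
```

Renaming or moving a file only changes its `storage` value; its place in the logical hierarchy stays the same.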
Are Hierarchies Hard?
Genealogists create family trees. A family tree is a hierarchical branching structure where layers represent generations of ancestors. So, genealogists would readily adopt a source hierarchy, wouldn’t they? The discussion made it pretty plain that this is not the case.
Jill advocates a flat digital file structure. Among the panellists who use a hierarchy of physical or digital folders, the number of levels is restricted to no more than 2 or 3.
Why do people find hierarchies difficult?
Navigation through hierarchy levels is hard to get your head around. I have been looking for a tool that helps me draw hierarchical trees and visualise my catalogue structure. Thanks to Alex Daws’ suggestion I tried the mind mapping tool, coggle.it. My genealogical ‘stuff’ falls into the categories depicted:
For the interactive version follow this link: https://coggle.it/diagram/550aa664a7d032c23734a105/7e12c7b8433ffe43f86b5994f61abf9977f826ac312616349825bbda313db27a
I have included a selection of top level categories and expanded a few of them. On the right are the things I acquired from family and collaborators, and my personal documents. On the left are the things acquired from physical and digital archives.
The personal collections are organised by their provenance, the person from whom the items came. This visual representation makes it easy for me to see I have omitted a helpful relative, Pat (how ungrateful am I!). I have expanded the part of Raymond’s collection which I discussed in Provenance of a Personal Collection. The sub-categories reflect the use (e.g. probate) and history (e.g. belonged to Winnie) of the items.
Complete collections can be organised without taking account of possible future additions. The branches colour coded in yellow are complete. Raymond and Mabel are deceased, and personal study projects relate to courses completed.
The personal collection labelled Sue is my own. It includes records created in the course of my own life, the types of things discussed in Fresh Starts, my genealogy business records named Family Folk, and the results of my research such as blog posts. The category named Genealogical Research Collection is my personal sin bin. Its arrangement reflects my early attempts to organise things, and indirectly documents my development as a researcher. Rather than rearrange things, I have documented the existing arrangements.
Digital and Physical Archives
The left side of the source tree depicts my understanding of the arrangement of things I accessed through archives. I have expanded the top levels for just one record, the marriage of Joseph Wilson and Elizabeth Wilson at Claverley in 1808, that I discussed in Three Wilson-Wilson marriages and the Family History Library Experience.
The original marriage register is held by Shropshire Archives and the archive catalogue entry includes the hierarchy that shows how the marriage register fits into the archive’s collections:
Notice that the top level is missing from the archival catalogue. Parishes are collected together in a group denoted by P or XP, but there is no catalogue entry for this group. Many archive catalogues would be more user-friendly if they included top-level groups and a visual interface. This catalogue entry also refers to the microfiche copy of the registers.
I have followed the archive catalogue in my source tree, but added in the missing parish level and separated out the microfiche version. The Family History Library transcript and microfilm are arranged by call number and film number, a peculiarity of that institution.
Digital archives typically consist of an index or database that may reference a collection of digital images. Database entries are accessed via search functions. In this case, the arrangement of the digital image collection is similar, but not identical, to the arrangement of the physical archive. Some digital image collections differ substantially from their physical counterparts.
In addition to the original, there are 6 different versions of the marriage record. They were derived from the original either directly or indirectly via several different copying processes, but that is hard to show on my source tree.
Citations and Source Identification
Traditionally many academic disciplines cited published and unpublished works in the form of a bibliographic citation, but only included the data they collected in summary form. In many disciplines it is now recognised that the academic paper alone is no longer sufficient and the underlying data also needs to be shared. How to Cite Datasets and Link to Publications explores the issues and makes proposals for scientific data sets. Citing genealogical sources is more similar to citing scientific data than to citing finished works.
Genealogists typically want to pinpoint a single record or piece of data within a data set. For the marriage example, the following locate the record within each source item:
| Source item | Locator within the item |
|---|---|
| Original | page & record number |
| Microfiche | counted row and column numbers, record number |
| Transcript | page number, record number |
| Microfilm | item number, counted image number, record number |
| Digital image | browsing breadcrumb, image number |
| Database entries | search terms |
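Because each kind of source item needs a different locator, it can help to make the required parts explicit. The following is a minimal Python sketch, assuming hypothetical field names that mirror the locator table; it simply refuses to build a locator when a required part is missing. The function name and the values in the usage example are invented for illustration.

```python
# Required locator parts per source item, mirroring the locator table.
# Field names are hypothetical labels chosen for this sketch.
LOCATOR_FIELDS = {
    "original": ("page", "record"),
    "microfiche": ("row", "column", "record"),
    "transcript": ("page", "record"),
    "microfilm": ("item", "image", "record"),
    "digital image": ("breadcrumb", "image"),
    "database entry": ("search_terms",),
}


def format_locator(item_type: str, **fields: str) -> str:
    """Build a locator string in table order; raise if a required part is missing."""
    required = LOCATOR_FIELDS[item_type]
    missing = [name for name in required if name not in fields]
    if missing:
        raise ValueError(f"{item_type} locator missing: {', '.join(missing)}")
    return ", ".join(f"{name} {fields[name]}" for name in required)


if __name__ == "__main__":
    # Invented values, for illustration only.
    print(format_locator("microfilm", item="3", image="412", record="27"))
```

Recording which fields a locator needs, per item type, makes it harder to note a microfilm image number while forgetting the item number that gives it meaning.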
Genealogists need to know exactly which source item was used, because they differ in accuracy and reliability. My source tree distinguishes between the 7 source items, but does not make the relationships between them clear. Here is how I think each was derived:
Copying and processing potentially produces errors, so genealogists need to check against originals if possible. In the marriage example, I used the FamilySearch database to find the transcript and then checked the transcript against the microfiche copy of the original, because those were the versions available at the time. Now I would use the high quality digital image that has since been published. The archive quite rightly restricts access to the original so that it is preserved.
The complicated-looking citations in Evidence Explained identify the source, performing the same job as my source tree. Multi-level citations, indicated by “citing”, give the relationships between derived versions and the original.
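The “citing” idea amounts to following a chain of derived-from links back to the original. Here is a minimal Python sketch of that idea, under stated assumptions: the `SourceItem` class and `cite` function are hypothetical, the derivation chain in the example is invented for illustration (it is not my reconstruction of how the marriage versions were actually derived), and the output is only loosely modelled on the “citing” convention rather than following Evidence Explained’s citation format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceItem:
    """A source item plus a link to the item it was copied or derived from."""
    description: str
    locator: str = ""
    derived_from: Optional["SourceItem"] = None


def cite(item: SourceItem) -> str:
    """Walk derived_from links to the original, joining each level with 'citing'."""
    parts = []
    current = item
    while current is not None:
        text = current.description
        if current.locator:
            text += f", {current.locator}"
        parts.append(text)
        current = current.derived_from
    return "; citing ".join(parts) + "."


if __name__ == "__main__":
    # Hypothetical chain, for illustration only.
    original = SourceItem("Claverley parish register, Shropshire Archives",
                          "Wilson-Wilson marriage, 1808")
    transcript = SourceItem("Family History Library transcript", derived_from=original)
    entry = SourceItem("FamilySearch database entry", derived_from=transcript)
    print(cite(entry))
```

Storing the derivation as links, rather than only as citation text, keeps the relationships between versions explicit, which is exactly what a source tree alone does not show.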
I have tackled some quite complex ideas in this post. I hope you find some worth considering as your genealogy organisation systems develop. As Julie Goucher said, there is no one size fits all.
I thank Jill and all the panellists for challenging my assumptions, sharing their frustrations and confusion, and openly debating the issues. Conversations like this are valuable contributions that genealogy vendors and software developers need to hear. As a member of FHISO, I am listening.
© Sue Adams 2015