Jill Ball’s recent hangout on air entitled Let’s get organised caused me to pause and think. The question of how to organise the physical and digital ‘stuff’ we accumulate during genealogical research is a common one that elicits a wide variety of responses. This discussion revolved mainly around digital files.
The panellists broadly follow two patterns of file organisation, person oriented and source oriented. Person oriented systems typically arrange files in folders for surnames and individuals. Source oriented systems typically arrange files in folders for each source type. Some people also have place and project folders.
Retrieval strategies include using file naming conventions, tagging, assignment of unique file ids, indexes and spreadsheets. Panellists use Family Historian, Custodian, Evernote, Excel and coggle.it to help them keep track of the ‘stuff’.
In Provenance of a Personal Collection – Archival Accession, Arrangement and Description, I advocated recording source information in a hierarchical archival style catalogue. Archival catalogues typically arrange source items by the provenance and context of their creation and use, which is reflected in multiple levels of logical organisation. Storage is not necessarily the same as logical organisation.
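The distinction between logical organisation and storage can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the `CatalogueNode` class, its field names and the storage locations (`scans/0001.pdf`, `box-2/folder-1`) are hypothetical, while the category names are borrowed from the collection described later in this post.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class CatalogueNode:
    """One level in a hypothetical archival-style catalogue."""
    title: str
    stored_at: Optional[str] = None  # where the item physically or digitally lives
    children: list[CatalogueNode] = field(default_factory=list)

    def add(self, child: CatalogueNode) -> CatalogueNode:
        """Attach a child level and return it so calls can be chained."""
        self.children.append(child)
        return child

    def walk(self, trail: tuple[str, ...] = ()) -> Iterator[tuple[tuple[str, ...], CatalogueNode]]:
        """Yield (logical path, node) for this node and every descendant."""
        here = trail + (self.title,)
        yield here, self
        for child in self.children:
            yield from child.walk(here)


# Logical arrangement by provenance; storage locations are made up for illustration.
root = CatalogueNode("Personal collections")
raymond = root.add(CatalogueNode("Raymond"))
raymond.add(CatalogueNode("Probate", stored_at="scans/0001.pdf"))
raymond.add(CatalogueNode("Belonged to Winnie", stored_at="box-2/folder-1"))

# The catalogue path and the storage location are independent of each other.
paths = {" > ".join(path): node.stored_at for path, node in root.walk()}
for path, location in paths.items():
    print(path, "->", location)
```

Because the storage location is just an attribute of a catalogue entry, the files themselves could sit in a flat folder while the catalogue keeps as many levels as the provenance requires.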
Are Hierarchies Hard?
Genealogists create family trees. A family tree is a hierarchical branching structure where layers represent generations of ancestors. So, genealogists would readily adopt a source hierarchy, wouldn’t they? The discussion made it pretty plain that this is not the case.
Jill advocates a flat digital file structure. Among the panellists who use a hierarchy of physical or digital folders, the number of levels is restricted to no more than 2 or 3.
Why do people find hierarchies difficult?
Navigation through hierarchy levels is hard to get your head around. I have been looking for a tool that helps me draw hierarchical trees and visualise my catalogue structure. Thanks to Alex Daws’ suggestion I tried the mind mapping tool, coggle.it. My genealogical ‘stuff’ falls into the categories depicted:
For the interactive version follow this link: https://coggle.it/diagram/550aa664a7d032c23734a105/7e12c7b8433ffe43f86b5994f61abf9977f826ac312616349825bbda313db27a
I have included a selection of top level categories and expanded a few of them. On the right are the things I acquired from family and collaborators, and my personal documents. On the left are the things acquired from physical and digital archives.
The personal collections are organised by their provenance, the person from whom the items came. This visual representation makes it easy for me to see I have omitted a helpful relative, Pat (how ungrateful am I!). I have expanded the part of Raymond’s collection which I discussed in Provenance of a Personal Collection. The sub-categories reflect the use (e.g. probate) and history (e.g. belonged to Winnie) of the items.
Complete collections can be organised without taking account of possible future additions. The branches colour coded in yellow are complete. Raymond and Mabel are deceased, and personal study projects relate to courses completed.
The personal collection labelled Sue is my own. It includes collections that were created by my own life, the types of things discussed in Fresh Starts, my genealogy business records named Family Folk, and the results of my research such as blog posts. The category named Genealogical Research Collection is my personal sin bin. Its arrangement reflects my early attempts to organise things, and indirectly documents my development as a researcher. Rather than rearrange things I have documented the existing arrangements.
Digital and Physical Archives
The left side of the source tree depicts my understanding of the arrangement of things I accessed through archives. I have expanded the top levels for just one record, the marriage of Joseph Wilson and Elizabeth Wilson at Claverley in 1808, that I discussed in Three Wilson-Wilson marriages and the Family History Library Experience.
The original marriage register is held by Shropshire Archives and the archive catalogue entry includes the hierarchy that shows how the marriage register fits into the archive’s collections:
Notice that the top level is missing from the archival catalogue. Parishes are collected together in a group denoted by P or XP, but there is no catalogue entry for this group. Many archive catalogues could be made more user friendly by the inclusion of top level groups and a visual interface. This catalogue entry also refers to the microfiche copy of the registers.
I have followed the archive catalogue in my source tree, but added in the missing parish level and separated out the microfiche version. The Family History Library transcript and microfilm are arranged by call number and film number, a peculiarity of that institution.
Digital archives typically consist of an index or database that may reference a collection of digital images. Database entries are accessed via search functions. In this case, the arrangement of the digital image collection is similar, but not identical, to the arrangement of the physical archive. Some digital image collections differ substantially from their physical counterparts.
In addition to the original, there are 6 different versions of the marriage record. They were derived from the original either directly or indirectly via several different copying processes, but that is hard to show on my source tree.
Citations and Source Identification
Traditionally many academic disciplines cited published and unpublished works in the form of a bibliographic citation, but only included the data they collected in summary form. In many disciplines it is now recognised that the academic paper alone is no longer sufficient and the underlying data also needs to be shared. How to Cite Datasets and Link to Publications explores the issues and makes proposals for scientific data sets. Citing genealogical sources is more similar to citing scientific data than to citing finished works.
Genealogists typically want to pinpoint a single record or piece of data within a data set. For the marriage example, the following locate the record within each source item:
|Original||page & record number|
|Microfiche||counted row and column numbers, record number|
|Transcript||page number, record number|
|Microfilm||Item number, counted image number, record number|
|Digital image||browsing breadcrumb, image number|
|Database entries||search terms|
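The table above can also be read as a checklist: each kind of source item needs its own set of locator fields before a record can be found again. A minimal sketch in Python, assuming field names of my own choosing and made-up example values:

```python
# Locator fields needed for each kind of source item, taken from the table above.
# The field names are hypothetical labels, not an established standard.
LOCATOR_FIELDS = {
    "Original": ["page", "record number"],
    "Microfiche": ["row", "column", "record number"],
    "Transcript": ["page number", "record number"],
    "Microfilm": ["item number", "image number", "record number"],
    "Digital image": ["breadcrumb", "image number"],
    "Database entries": ["search terms"],
}


def missing_locator_fields(source_item: str, locator: dict) -> list:
    """Return the required fields that are absent or empty in a locator."""
    required = LOCATOR_FIELDS[source_item]
    return [name for name in required if not locator.get(name)]


# Example values are invented purely to show the check.
incomplete = missing_locator_fields(
    "Microfilm", {"item number": "2", "record number": "15"}
)
print(incomplete)
```

A check like this could flag notes that record a microfilm item number but forget the counted image number.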
Genealogists need to know exactly which source item was used, because they differ in accuracy and reliability. My source tree distinguishes between the 7 source items, but does not make the relationships between them clear. Here is how I think each was derived:
Copying and processing potentially produces errors, so genealogists need to check against originals if possible. In the marriage example, I used the FamilySearch database to find the transcript and then checked the transcript against the microfiche copy of the original, because they were available at the time. Now I would use the high quality digital image that has since been published. The archive quite rightly restricts access to the original so that it is preserved.
The complicated-looking citations in Evidence Explained identify the source, the equivalent of my source tree. Multi-level citations, indicated by “citing”, give the relationships between derived versions and the original.
I have tackled some quite complex ideas in this post. I hope you find some worth considering as your genealogy organisation systems develop. As Julie Goucher said, there is no one size fits all.
I thank Jill and all the panellists for challenging my assumptions, sharing their frustrations and confusion, and openly debating the issues. Conversations like this are valuable contributions that genealogy vendors and software developers need to hear. As a member of FHISO, I am listening.
© Sue Adams 2015
During this week’s Hangout-on-air, I publicly criticised Evidence Explained, widely regarded as an essential reference to genealogical citation. In particular, I find the examples for census and civil registration records in the United Kingdom confusing.
In this post I will examine one census example, a first reference note on page 304:
“1871 England Census”, database, Ancestry.com (www.ancestry.com : accessed 1 September 2006), entry for George Lucas (age 33), Bromley St Leonard, London; citing PRO RG 10/571, folio 27, p. 3; Poplar registration district; Bow subdistrict, ED 14, household 9.
The original of this record is held at The National Archives, Kew, and its catalogue entries are arranged in several levels:
|Level||Reference||Title, Creator, Date|
|1||RG||Records of the General Register Office, Government Social Survey Department, and Office of Population Censuses and Surveys|
|2||RG 10||General Register Office: 1871 Census Returns; Creator: General Register Office, 1836-1970; Date: 1871 April 2|
|3||Subseries within RG 10 – LONDON – MIDDLESEX|
|4||Sub-subseries within RG 10 – Registration District 20.POPLAR|
|5||RG 10/571||Registration Sub-District 1C Bow; Civil Parish, Township or Place: Bromley St Leonard (4)|
There is extensive documentation of the parliamentary Acts, instructions, forms and resulting publications at Histpop Online Historical Population Reports. The RG 10 subseries and sub-subseries follow the order of registration districts in the Registrar General’s Annual Reports. Sub-districts were further divided into Enumeration Districts, an area that could be covered by one enumerator. The enumerator collected household schedules (which generally were not preserved) and used them to fill in the Census Enumerators Books (CEB). The census record we have here is a page from a CEB. There could be one or more CEBs for each Enumeration District, and several EDs for each Registration sub-district.
Before the CEBs were microfilmed, each sheet of paper or folio was stamped on the top right corner of the front side, starting with no 1 on the first page of the first CEB and continuing the sequence through subsequent CEBs. Consequently, the combination of folio number and page number uniquely identifies each page within a sub-district.
To cite the original page following Thomas Jones’s “who; what; when; where in; where is.” format:
General Register Office; 1871 Census, England & Wales, Census Enumerators Book; 2 April 1871; Entry for George Lucas, line 1 [counted], schedule no. 9, folio 27 [stamped], p. 3, Bromley St Leonard, Enumeration district 14; Registration Sub-District 1C Bow, Registration District 20. Poplar, London – Middlesex, 1871 Census Returns, Records of the General Register Office, The National Archives, Kew.
The thing that is missing from the above is an archival reference, also known as a call number. The reference elements are included, but are scattered. The National Archives type of reference is a well established convention that is widely understood. If you were permitted access to the original, you would quote the reference RG 10/571 for the bundle of CEBs and RG 10/571/27/3 for the page, because that reflects the current archival arrangement and makes it easy for archive staff to retrieve.
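The way the reference is assembled from its scattered elements can be sketched as follows. This is a small illustration only: the function name and parameters are hypothetical, and the slash-separated pattern is simply the one shown in the two references quoted above.

```python
from typing import Optional


def tna_reference(series: str, piece: int,
                  folio: Optional[int] = None,
                  page: Optional[int] = None) -> str:
    """Join reference elements in the slash-separated form quoted in the text.

    series and piece identify the bundle (e.g. "RG 10" and 571);
    folio and page narrow the reference down to a single page.
    """
    if page is not None and folio is None:
        raise ValueError("a page number only makes sense together with a folio")
    reference = f"{series}/{piece}"
    if folio is not None:
        reference += f"/{folio}"
    if page is not None:
        reference += f"/{page}"
    return reference


bundle_ref = tna_reference("RG 10", 571)
page_ref = tna_reference("RG 10", 571, folio=27, page=3)
print(bundle_ref)
print(page_ref)
```

The folio is required before a page can be added, since a page number is only unique in combination with the stamped folio number.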
Now let’s take a look at the Ancestry version of this record. Ancestry often re-arranges records because the website deals with digital images rather than the original thing. Treating each image as a separate ‘thing’ within a series is sensible because each image is a separate file. This is different from the original CEB or bundle of CEBs that comprise RG 10/571. Ancestry’s card catalogue splits the census by year and country into separate collections. Within a collection, the breadcrumb trail shown above the image reveals the arrangement; the last element is the image number at the bottom of the image. A complication for this example is that the digital image was derived from microfilm, of which I have no details.
Citation of the Ancestry copy requires a layered citation, which gives details of both the digital image and original:
General Register Office; 1871 Census, England & Wales, Census Enumerators Book; 2 April 1871; Entry for George Lucas, line 1 [counted], schedule no. 9, folio 27 [stamped], p. 3, Bromley St Leonard, Enumeration district 14; Registration Sub-District 1C Bow, Registration District 20. Poplar, London – Middlesex, 1871 Census Returns, Records of the General Register Office, The National Archives, Kew; digital image from microfilm, Ancestry, “1871 England Census”, database, Ancestry (www.ancestry.co.uk : accessed 27 March 2014), London, Bromley St Leonard, District 14, image 4.
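The layering in this citation can be thought of as the original citation followed by one segment per derived version. Here is a rough sketch in Python; the `Layer` class and `layered_citation` function are hypothetical names of mine, and the semicolon-joined format is just the one used in the citation above, not a published standard.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """One derived version of a record (hypothetical structure)."""
    description: str  # what kind of copy this is
    details: str      # where that copy can be found


def layered_citation(original: str, derived: list) -> str:
    """Append each derived layer to the original citation, separated by '; '."""
    parts = [original.rstrip(".")]
    for layer in derived:
        parts.append(f"{layer.description}, {layer.details.rstrip('.')}")
    return "; ".join(parts) + "."


original = (
    "General Register Office; 1871 Census, England & Wales, Census Enumerators Book; "
    "2 April 1871; Entry for George Lucas, line 1 [counted], schedule no. 9, "
    "folio 27 [stamped], p. 3, Bromley St Leonard, Enumeration district 14; "
    "Registration Sub-District 1C Bow, Registration District 20. Poplar, "
    "London – Middlesex, 1871 Census Returns, Records of the General Register Office, "
    "The National Archives, Kew"
)
ancestry_image = Layer(
    description="digital image from microfilm",
    details=(
        "Ancestry, “1871 England Census”, database, Ancestry "
        "(www.ancestry.co.uk : accessed 27 March 2014), London, "
        "Bromley St Leonard, District 14, image 4"
    ),
)

citation = layered_citation(original, [ancestry_image])
print(citation)
```

Keeping the original and each derived copy as separate pieces means the same original citation can be reused when a better copy, such as a new digital image, becomes available.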
I much prefer this citation to the Evidence Explained version. With census records I am more interested in “what” rather than “who”, so I might change the order of citation elements or drop the creator. The title in the form ‘Census, 1871, England & Wales, CEB’ would make all my census records appear together in a source list. I could omit the first ‘Ancestry’ and abbreviate common terms. I did not give the full URL that takes you directly to the record because that could change, and it is rather long.
How would you cite this record?
I have previously pondered citations. What do you make of these examples:
Copies of Copies, Citation and Source Evaluation with FamilySearch
Citation and Verification or ‘Where the hell did I get this from?’
Thomas W. Jones, Mastering Genealogical Proof (Arlington, Virginia: National Genealogical Society, 2013)
Elizabeth Shown Mills, Evidence Explained: Citing History Sources from Artifacts to Cyberspace (Baltimore, Maryland: Genealogical Publishing Company, 2007)
© Sue Adams 2014
Bride: Tryphena Bull
Groom: James Valentine Wellstood
Date: 16 November 1874
This information came from some sporadic correspondence, between 2005 and 2008, with a contact made through Genes Reunited. According to this source, Tryphena was the daughter of John Bull (a grocer) and Mary Osborn of Brackley, Northamptonshire, making her my great grand aunt and my mum’s grand aunt. Is this true?
A quick check at FreeBMD confirms that a marriage did occur between these two people (Oct-Dec 1874 quarter, Pancras registration district, vol 1a, page 46). Pancras is in London, so there does not seem to be a connection to Brackley. From Tryphena’s age on census records (1881-1911) after her marriage, I conclude the likely date of birth is 1843, with a range between 1843 and 1846. These records contain no information about her parentage, but earlier censuses should. I found 3 Tryphena Bulls in 1871. One was married and one was too young, leaving just one possible Tryphena, born ca 1848 in Northamptonshire, a servant to George Knapp in St Clement, Oxford. Well, that is closer to Brackley, but still provides no information on Tryphena’s parentage. The 1861 census does contain evidence that Mary Bull, a retired grocer’s wife, was Tryphena’s mother, but it may not be simple to find this record.
A search on Ancestry gives this result:
What does Tryphena Robbins [Tryphena Bull] mean? Ancestry users can submit corrections and alternative interpretations that may be added to the index. Such additions are enclosed in square brackets. In this case, the addition is helpful and made it easy to find this record. The same search on findmypast yielded no results.
According to the transcript (follow the ‘View Record’ link) this household contains the following people:
|Mary Bull Robbins||47|
So, how does this compare with the census page?
The transcript incorrectly combines two households. The ‘No of schedule’ and ‘Inhabited house’ entries and building delimiters indicate that Mary Bull’s household is in a separate building from William Robbins’ household. The ‘Do’ next to the name Mary Bull indicates repetition of the surname above, so the index rendering of the name as Mary Bull Robbins is strictly correct. However, I believe this is an error made by the census enumerator, as dittos are usually restricted to within a household. It is important that indexes accurately reflect the original records including errors, but interpretive additions are useful.
Given the index issues with this census household, how should I cite the record so that you do not have to rely on searching an index, and can find the image on an alternative website, on microfilm and in the original?
Census. 1861. England, Northamptonshire, Brackley St Peter, Enumeration District 3, image no 32; schedule 214. Digital image. Ancestry (www.ancestry.co.uk : accessed 14 November 2012); citing The National Archives, Kew, RG 9/921, folio 51, p. 31.
The county, civil parish, enumeration district and image number are the information needed to access census pages by browsing the census collection in Ancestry’s Card Catalogue.
The National Archives reference reflects the archival arrangement of the original records at Kew. It is used for microfilm copies, so many citations from the pre-internet era are in this form. Findmypast’s Census Reference Search uses the TNA reference, providing a direct route to the census page.
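Because the TNA reference follows a regular pattern, its elements can be pulled back out of a citation for use in a reference search. A sketch using a regular expression; the pattern and function name are my own assumptions, written to fit the two citation styles quoted in these posts rather than every form in use.

```python
import re
from typing import Optional

# Matches text such as "RG 9/921, folio 51, p. 31" (an assumed pattern).
TNA_CITATION = re.compile(
    r"(?P<series>[A-Z]+ \d+)/(?P<piece>\d+), folio (?P<folio>\d+), p\. (?P<page>\d+)"
)


def parse_tna_citation(text: str) -> Optional[dict]:
    """Return the series, piece, folio and page found in a citation, if any."""
    match = TNA_CITATION.search(text)
    if match is None:
        return None
    return match.groupdict()


elements = parse_tna_citation(
    "citing The National Archives, Kew, RG 9/921, folio 51, p. 31."
)
print(elements)
```

The extracted values correspond to the fields a census reference search asks for, so they could be typed in directly without consulting an index of names.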
I like Ancestry’s browsing capabilities, because they allow me to examine the context easily, and sometimes no index can make sense of the records. I prefer Findmypast’s reference search because it is quicker and consistent with well established British citation practice. I would like all of the elements of the citation to be embedded in the digital images I download in a form that could be read by my genealogy software. That would make keeping track of my sources much easier.
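Until images arrive with such metadata embedded, one stand-in is a small machine-readable file saved next to each image. The sketch below is not true embedding and reflects no existing genealogy software format: the function names, the `.citation.json` naming and the image file name are all hypothetical, while the citation elements are those of the 1861 example above.

```python
import json
import tempfile
from pathlib import Path


def write_sidecar(image_path: Path, citation_elements: dict) -> Path:
    """Save citation elements as JSON beside the image and return the new path."""
    sidecar = image_path.with_suffix(".citation.json")
    sidecar.write_text(
        json.dumps(citation_elements, indent=2, ensure_ascii=False),
        encoding="utf-8",
    )
    return sidecar


def read_sidecar(image_path: Path) -> dict:
    """Load the citation elements saved beside an image."""
    sidecar = image_path.with_suffix(".citation.json")
    return json.loads(sidecar.read_text(encoding="utf-8"))


elements = {
    "census_year": 1861,
    "county": "Northamptonshire",
    "civil_parish": "Brackley St Peter",
    "enumeration_district": 3,
    "image_number": 32,
    "schedule": 214,
    "tna_reference": "RG 9/921",
    "folio": 51,
    "page": 31,
}

# A temporary folder stands in for wherever downloaded images are kept.
with tempfile.TemporaryDirectory() as folder:
    image = Path(folder) / "brackley-1861.jpg"  # hypothetical file name
    sidecar_name = write_sidecar(image, elements).name
    loaded = read_sidecar(image)

print(sidecar_name)
print(loaded["tna_reference"], loaded["folio"], loaded["page"])
```

A separate file can become detached from its image, which is exactly why embedding the same elements inside the image would be preferable.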