Claverley Property Document Analysis, Part 2: Semantic mark-up

In Claverley Property Document Analysis, Part 1: Transcript, I introduced the manorial court records of a property transaction and presented a transcript of the court session introduction and the two cases that dealt with the legal processes. Now that we have a readable copy, the next step is to extract the genealogical information. I set some homework for readers to check if they were getting all the information contained in the transcript. My answers are:

  1. How many people are mentioned? 12 plus Queen Victoria in a regnal date
  2. How many places are referred to? 7 particular locations are referred to, with differing accuracy: some very specifically e.g. a house, others within a farmstead, hamlet or town. Assuming Heathton and Sleathton are the same, there are 11 place names including names of jurisdictions like parish, manor and county.
  3. Who lived at Catstree? Samuel Nicholls and Sarah Ward Nicholls
  4. How many ‘facts’ (e.g. John Wilson was a Farmer on 25th April 1844) are contained in this transcript?  I came up with 88. You might have different ideas.  Read on.

When you, a human, read a piece of text, you recognise names, places, dates and relationships without much difficulty. A computer sees the transcript as a string of characters, mostly letters with a few numbers and punctuation marks. Before it can help you organise and analyse the information, you need to tell it what is a person’s name, a place name and so on. One method of doing this is semantic mark-up.
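To make the idea concrete, here is a minimal sketch in Python of what such mark-up amounts to underneath: a stretch of characters plus a label saying what kind of thing it is. This is only an illustration under my own assumptions, not Tony’s representation. The `Tag` class, the `tag_phrase` helper and the category labels are all hypothetical names, and the sample sentence is one of the example ‘facts’ rather than the transcript itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    start: int  # index of the first character of the tagged span
    end: int    # index one past the last character of the span
    kind: str   # hypothetical category label, e.g. "person", "place", "date"


def tag_phrase(text: str, phrase: str, kind: str) -> Tag:
    """Tag the first occurrence of phrase in text (ValueError if it is absent)."""
    start = text.index(phrase)
    return Tag(start, start + len(phrase), kind)


# Illustrative sentence only; it is not a line from the transcript.
text = "John Wilson was a Farmer on 25th April 1844"

tags = [
    tag_phrase(text, "John Wilson", "person"),
    tag_phrase(text, "Farmer", "occupation"),
    tag_phrase(text, "25th April 1844", "date"),
]


def spans_of(kind: str) -> list[str]:
    """Return the tagged text for every span of the given category."""
    return [text[t.start:t.end] for t in tags if t.kind == kind]


print(spans_of("person"))
print(spans_of("date"))
```

The text itself is left untouched; the tags sit alongside it, which is why a tool built this way can colour, hide or search by category without altering the transcript.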

Tony Proctor of Parallax Viewpoint has been working on this problem for some time, so I challenged him to show me how he would code such mark-up. His blog post Claverley Property Document Transcript explains the internal representation giving detailed code. The non-technical will likely glaze over when confronted with computer code. I have to admit doing so with some of Tony’s earlier posts, but seeing it applied to my own example has greatly helped my understanding. He commented that a fully developed tool using this approach would look like a fancy word processor.

The court session introduction, marked up with semantic tags, colour coded as red for people’s names, green for place names, blue for dates, orange for occupation/rank, manorial legalese in purple and property description shaded in grey, might look like this:

Claverley manor Court Baron Session

This is a basic level of coding, which could be refined further. For example, I have coded small places as pale green and larger places as darker green. Names could be similarly divided into forenames and surname, dates into day, month and year and so on. I did not code the reference to Queen Victoria as a person because this is an integral part of a regnal date. Unlike the other people mentioned she played no part in the proceedings and might otherwise appear to be directly involved in nearly every legal matter!

So far, this mark-up does not include relationships or actions. These links are harder to represent visually, so I have resorted to using a table.

Person/Event | Relationship/Action | Object | Relationship to date | Date
A Manor | existed at | Claverley | on | 25th April 1844
Court Baron | was held at | Kings Arms | on | 25th April 1844
Court Baron | was held in | Claverley | on | 25th April 1844
Thomas Whitman | was Lord of | Claverley manor | on | 25th April 1844
John Crowther | resided at | Kings Arms | on | 25th April 1844
John Crowther | resided in | Claverley | on | 25th April 1844
Francis Harrison | was | deputy Steward | on | 25th April 1844
Christopher Gabert | was a | copyholder | on | 25th April 1844
Edward Crowther | was a | copyholder | on | 25th April 1844
Francis Harrison | attended | Claverley Court Baron | on | 25th April 1844
Christopher Gabert | attended | Claverley Court Baron | on | 25th April 1844
Edward Crowther | attended | Claverley Court Baron | on | 25th April 1844
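Each row of the table above can be thought of as a five-part statement. The following is a rough sketch of how a program might hold such rows, assuming a simple Python record; the `Statement` class and its field names are my own hypothetical labels for the five column headings, not part of any existing genealogy tool.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    subject: str        # Person/Event column
    relationship: str   # Relationship/Action column
    obj: str            # Object column
    date_relation: str  # Relationship to date column ("on", "before", ...)
    date: str           # Date column, kept as written in the table


# Three rows copied from the table above.
rows = [
    Statement("Court Baron", "was held at", "Kings Arms", "on", "25th April 1844"),
    Statement("Thomas Whitman", "was Lord of", "Claverley manor", "on", "25th April 1844"),
    Statement("John Crowther", "resided at", "Kings Arms", "on", "25th April 1844"),
]


def about(name: str) -> list[Statement]:
    """Return every statement that mentions name as subject or object."""
    return [s for s in rows if name in (s.subject, s.obj)]


for s in about("Kings Arms"):
    print(s.subject, s.relationship, s.obj, s.date_relation, s.date)
```

Because the place is stored as an object in its own right, a question about the Kings Arms picks up both the court session and its resident, which a purely person-based layout would not do easily.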

Now the same treatment for the two cases:

Claverley manor Case x

Person/Event | Relationship/Action | Object | Relationship to date | Date
John Wilson | attended | Claverley Court Baron | on | 25th April 1844
Samuel Nicholls | attended | Claverley Court Baron | on | 25th April 1844
John Wilson | resided at | Aston | on | 25th April 1844
John Wilson | resided in | Claverley manor | on | 25th April 1844
John Wilson | was a | Farmer | on | 25th April 1844
Samuel Nicholls | formerly resided at | Catstree | on | 25th April 1844
Samuel Nicholls | formerly resided in | Worfield parish | on | 25th April 1844
Samuel Nicholls | resided at | Bridgnorth | on | 25th April 1844
Samuel Nicholls | resided in | Salop county | on | 25th April 1844
Samuel Nicholls | was a | Gentleman | on | 25th April 1844
Samuel Nicholls | was a Devisee in trust to | John Felton | on | 25th April 1844
John Wilson | was a Devisee in trust to | John Felton | on | 25th April 1844
John Felton | resided at | Draycott | before | 25th April 1844
John Felton | resided in | Claverley manor | before | 25th April 1844
John Felton | heretofore resided at | Hopstone | before | 25th April 1844
John Felton | was a | Yeoman | before | 25th April 1844
John Felton | died | | before | 25th April 1844
John Felton | was a | copyholder | before | 25th April 1844
Sarah Ward Nicholls | paid | three hundred and fifteen pounds seven shillings | on or before | 25th April 1844
Sarah Ward Nicholls | was a | Spinster | on | 25th April 1844
Sarah Ward Nicholls | paid to | John Wilson and Samuel Nicholls | on or before | 25th April 1844
Sarah Ward Nicholls | purchased | [property description] | on | 25th April 1844
Sarah Ward Nicholls | resided at | Catstree | on | 25th April 1844
Sarah Ward Nicholls | resided in | Worfield parish | on | 25th April 1844
Sarah Ward Nicholls | resided in | Salop county | on | 25th April 1844
[property] | was called | Mill Hill | on | 25th April 1844
[property] | was in | Sleathton township | on | 25th April 1844
[property] | was in | Salop county | on | 25th April 1844
[property] | consisted of | piece or parcel of land and all that newly erected messuage or dwelling house and outbuildings on the same piece of land or some part thereof with the appurtenances | on | 25th April 1844
Grosvenors | formerly owned | [property description] | before | 25th April 1844
Onions | previously owned | [property description] | before | 25th April 1844
John Felton | formerly occupied | [property description] | before | 25th April 1844
William Ferrington | occupied | [property description] | on | 25th April 1844
[property] | measured | three acres one rood and sixteen perches or thereabouts being by computation the half of one third part of a nook of land | on | 25th April 1844
John Wilson | surrendered | [property description] to Lord of Claverley manor | on | 25th April 1844
Samuel Nicholls | surrendered | [property description] to Lord of Claverley manor | on | 25th April 1844
John Wilson | surrendered | [property description] to the use of Sarah Ward Nicholls | on | 25th April 1844
Samuel Nicholls | surrendered | [property description] to the use of Sarah Ward Nicholls | on | 25th April 1844

Claverley manor Case y

Person/Event | Relationship/Action | Object | Relationship to date | Date
Sarah Ward Nicholls | resided at | Catstree | on | 25th April 1844
Sarah Ward Nicholls | resided in | Worfield parish | on | 25th April 1844
Sarah Ward Nicholls | resided in | Salop county | on | 25th April 1844
Sarah Ward Nicholls | was a | Spinster | on | 25th April 1844
Sarah Ward Nicholls | attended because of | a surrender to her use | on or before | 25th April 1844
John Wilson | surrendered | [property] to the use of Sarah Ward Nicholls | on or before | 25th April 1844
Samuel Nicholls | surrendered | [property] to the use of Sarah Ward Nicholls | on or before | 25th April 1844
John Wilson | resided at | Aston | on | 25th April 1844
John Wilson | resided in | Claverley manor | on | 25th April 1844
John Wilson | was a | Farmer | on | 25th April 1844
Samuel Nicholls | resided at | Catstree | before | 25th April 1844
Samuel Nicholls | resided in | Bridgnorth | on | 25th April 1844
Samuel Nicholls | resided in | Salop county | on | 25th April 1844
Samuel Nicholls | was a | Gentleman | on | 25th April 1844
John Wilson | was a Devisee in trust to | John Felton | on | 25th April 1844
Samuel Nicholls | was a Devisee in trust to | John Felton | on | 25th April 1844
John Felton | formerly resided at | Hopstone | before | 25th April 1844
John Felton | resided at | Draycott | before | 25th April 1844
John Felton | resided in | Claverley manor | before | 25th April 1844
John Felton | was a | Yeoman | before | 25th April 1844
John Felton | was a | copyholder | before | 25th April 1844
John Felton | died | | before | 25th April 1844
John Felton | named | devisees in trust in a will and testament | before | 25th April 1844
Sarah Ward Nicholls | desires to be admitted tenant to the Lord of this manor according to the custom of this manor of | [property description] | on | 25th April 1844
Grosvenors | formerly owned | [property description] | before | 25th April 1844
Onions | owned | [property description] | before | 25th April 1844
[property] | was called | Mill Hill | on | 25th April 1844
[property] | consisted of | piece or parcel of land and all that newly erected messuage or dwelling house and outbuildings on the same piece of land or some part thereof with the appurtenances | on | 25th April 1844
[property] | was situated in | Heathton township | on | 25th April 1844
[property] | was situated in | Claverley manor | on | 25th April 1844
[property] | was situated in | Salop county | on | 25th April 1844
[property] | measured | three acres one rood and sixteen perches or thereabouts being by computation the half of one third part of a nook of land | on | 25th April 1844
Sarah Ward Nicholls | was admitted tenant to | [property description] | on | 25th April 1844
Lord of Claverley manor by his deputy Steward | granted seizin for ever at the will of the Lord according to the custom of this manor by the rents and customary services therefore due and of right accustomed and for such estate and ingress[?] to | Sarah Ward Nicholls | on | 25th April 1844
Sarah Ward Nicholls | paid the Lord a fine of | six pence half penny and four sixth parts of a farthing | on | 25th April 1844
Sarah Ward Nicholls | was admitted tenant of | Manor of Claverley | on | 25th April 1844
Sarah Ward Nicholls | did fealty to | the Lord | on | 25th April 1844

Traditional person-based genealogy programs typically allow ‘facts’ to be entered for each person, but do not give easy access to events common to several people. The ‘facts’ typically have fields for the category, date, place and a general description. Some of the table entries above can be shoe-horned into that structure, but do not fit well.

Whilst it is useful to view all information about a person in a time line or other summary, I often want to know about several people at a common place and time. For instance, the answer to the question “Who attended the Court Baron session of 25 April 1844?” can easily be extracted from the combined tables. I summarised the phrases “Came to this court” and “in his/her/their own proper person(s)” as “attended”. Court officials such as the deputy steward and homage (the two copyholders present as a jury) also clearly attended. In summarising the act of court attendance, I have taken one step of analysis. I am no longer dealing with a faithful copy like a transcript, but have started to interpret the document in the light of historical and legal context.
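The attendance question can be sketched as a simple filter over the table rows. This is a minimal Python illustration under my own assumptions: rows are held as plain five-part tuples in column order, and the `attendees` function is a hypothetical helper, not a feature of any existing program. Only a handful of rows from the tables are included.

```python
# Rows copied from the tables, in column order:
# (person/event, relationship/action, object, relationship to date, date)
rows = [
    ("Francis Harrison", "attended", "Claverley Court Baron", "on", "25th April 1844"),
    ("Christopher Gabert", "attended", "Claverley Court Baron", "on", "25th April 1844"),
    ("Edward Crowther", "attended", "Claverley Court Baron", "on", "25th April 1844"),
    ("John Wilson", "attended", "Claverley Court Baron", "on", "25th April 1844"),
    ("Samuel Nicholls", "attended", "Claverley Court Baron", "on", "25th April 1844"),
    ("John Wilson", "was a", "Farmer", "on", "25th April 1844"),
    ("Sarah Ward Nicholls", "attended because of", "a surrender to her use",
     "on or before", "25th April 1844"),
]


def attendees(rows, event, date):
    """List each person recorded as having attended the event on the date."""
    found = []
    for subject, action, obj, date_relation, when in rows:
        is_match = (
            action == "attended"
            and obj == event
            and date_relation == "on"
            and when == date
        )
        if is_match and subject not in found:
            found.append(subject)
    return found


print(attendees(rows, "Claverley Court Baron", "25th April 1844"))
```

Note that this exact-match filter does not return Sarah Ward Nicholls, because her row is worded “attended because of” with a different object. Any real tool would need a consistent vocabulary for actions, or a looser way of matching them, before such a query could be trusted.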

I found it hard to summarise legal jargon and the property description. Legal terms like “by the rod”, “heirs and assigns forever” and “granted seizin” convey information about the legal process and the type of land tenure.

Where there is a sequence of residence, occupation or ownership, I found it hard to express the relationship to the known date of the court session without including time information in both the 4th and 2nd columns. Also, the references to prior owners and occupiers included in the property description may not form a complete sequence, and may refer hundreds of years back in time. These complexities are characteristic of this type of document. Other types of historical document have other characteristics.

There is repetition of entries in the tables, which reflects the original. Although the two cases are together in this example, they could be separated by months or even years. In an abstract of the transaction, it would be desirable to remove the duplication, but not in the underlying data.
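One way to get a de-duplicated abstract while leaving the underlying data alone is to group identical statements and remember which case each came from. The sketch below assumes, purely for illustration, that every row carries a label for its source case; the `abstract` function and the tuple layout are my own hypothetical choices.

```python
# Underlying data: each row keeps a label for the case it was extracted from,
# followed by the five table columns.
rows = [
    ("Case x", "John Wilson", "was a", "Farmer", "on", "25th April 1844"),
    ("Case y", "John Wilson", "was a", "Farmer", "on", "25th April 1844"),
    ("Case x", "William Ferrington", "occupied", "[property description]",
     "on", "25th April 1844"),
]


def abstract(rows):
    """Group identical statements, listing every case in which each appears."""
    merged = {}
    for case, *statement in rows:
        merged.setdefault(tuple(statement), []).append(case)
    return merged


summary = abstract(rows)
for statement, cases in summary.items():
    print(" ".join(statement), "(from " + ", ".join(cases) + ")")

# The original rows are untouched: the repetition is still there if needed.
print(len(rows), "underlying rows;", len(summary), "lines in the abstract")
```

The abstract is a view built from the rows rather than a replacement for them, so each summarised statement can still be traced back to the case, and ultimately the transcript, it came from.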

So far, I have stuck to the information contained within just the record of one land transaction. By itself, it raises many questions and answers few. It is important to be able to track just what this record says without conflating it with information from elsewhere. So, we need to link back to the transcript and forward to other records.

Next time I will work on the locations mentioned.

© Sue Adams 2013

15 Comments on “Claverley Property Document Analysis, Part 2: Semantic mark-up”

  1. Tony Proctor says:

    All very impressive Sue. It wouldn’t significantly change my STEMMA representation but I do need to explain why since it’s important rather than a trivial decision.

    The items you have identified could definitely be flagged with some sort of appropriate semantic tagging, although I’m not aware of any standard vocabulary that would accommodate them. The presentation you demonstrate is not unlike my own in terms of using a configurable “style gallery” to highlight content of different types. However, yours is much deeper and possibly more directed towards document analysis. It is true that STEMMA could represent the same information using its element but it actually has a separate mechanism that I would have applied in many of these instances.

    The majority of STEMMA’s semantic mark-up is to enable cross-linking of references to definitions of relevant people/places/events, integration into timelines, and general assisted searching. The cross-linking and the attaching of notes, alternative spelling/meanings, and other clarifications, all help to make the narrative readable and navigable. This navigation might go from narrative content into a timeline, or into a tree, or even onto the Internet, but the overall goal was to integrate the narrative rather than leave it as plain text, or worse still an image file.

    What you call “facts”, STEMMA refers to as “properties” (see http://www.parallaxview.co/familyhistorydata/home/document-structure/person/properties). I feel strongly that the word “facts” is misleading since the possibility of something being factual depends on the nature of the source. Just as bad is the software-designer’s term of PFACT, which stands for property, fact, attribute, characteristic, or trait. Hmm! Anyway, STEMMA allows ‘extracted items of evidence’ to be labelled as such properties. This includes all the normal stuff like name, age, occupation, place-of-birth, cause-of-death, role(s), status, etc. However, it also has a modern framework for defining as many additional properties as you like using URIs (see http://www.parallaxview.co/familyhistorydata/home/extensibility/extended-vocabularies). This doesn’t need any central registration, and yet clashes are avoided. STEMMA’s properties can also have different data-types, be multi-valued, and specify optional units (e.g. days, weeks, etc).

    I’m not suggesting that my approach is better than anyone else’s but it feels more natural to me, and it accommodates this tradition of recording digested items of information gleaned from a source. Those properties are automatically linked back to the relevant source but they can have notes attached to them as well. The example at http://www.parallaxview.co/familyhistorydata/data-model#CSMultiRoleEvents illustrates custom properties, custom census roles, and multi-valued properties (roles in this case).

    • Tony Proctor says:

      Oops! WordPress stripped out a reference to an XML element name that I tried to reference. Here’s the same sentence without the angle-brackets that confused it:

      “It is true that STEMMA could represent the same information using its *NoteRef* element but it actually has a separate mechanism that I would have applied in many of these instances.”

  2. I’m late to this exercise. The color coding is intriguing and I can see where it would be extremely helpful. Usually when I do this type of exercise I prepare a timeline. Your tables provide an intriguing and useful new dimension to the analysis. Thank you. I’m looking forward to trying this on some of my own work.

  3. I came upon this discussion via Randy Seaver at Genea-Musings and like the concept here, although the technical side is “beyond my pay grade.” I can, however, grasp the color coding and tabular format; both seem like useful tools for organizing the volume of data extracted from deeds or other estate papers. Thank you for the examples and notes; I look forward to using this technique with my current project.

  4. tonyproctor says:

    It looks like the word is finally getting out Sue, although there’s a distinct absence of comments from vendors and research groups.

    I wanted to add some constructive comments about your mark-up if I can. At first sight, it has an RDF-like nature to it. {Re-cap for non-technical readers} Tim Berners-Lee (inventor of the Web) had a vision of “linked data” for the Semantic Web where entities (also called “resources”) were linked together to help provide information rather than merely data. The essence was to represent arbitrary entities using URIs, which are extensible tags that look a lot like the URLs we enter into our browser. RDF is one of the enabling technologies for the Semantic Web and allows entities to be related together using subject-predicate-object “triples”. The Subject and Object are both represented by URIs (or sometimes a string for the Object) and so can represent anything in principle. However, so is the Predicate that relates them so any relationship can be represented too, again in principle. For example, (England)-(isPartOf)-(UK).

    I see the need for a large extensible ontology (set of entity types) if there’s a need to go down to this level of detail for the mark-up, and it’s worth reading http://semanticweb.org/wiki/Ontology.

    By far the biggest problem with your mark-up, though, is that the relationships are necessarily temporal ones (i.e. applicable at a given date, or range of dates). This is a huge problem for RDF, which otherwise models static relationships, and hence for the application of RDF to historical data in general. The above example, for instance, is only valid during the dates when the United Kingdom existed. There are attempts underway to define a Temporal RDF but nothing definite yet.

    • Umm, Time is a problem in other areas as well. Am I right in thinking we need a five part thing like (England)-(isPartOf)-(UK)-(some time qualifier e.g. on, before, between)-(Date&time)?

      • tonyproctor says:

        Off the top of my head, it probably only needs 4 rather than 5 Sue. It needs something that indicates the combined from/to date range, either one of which could be null. However, if the relationship then changes from one thing to another (as opposed to simply not being valid outside of that range) then it needs some sort of continuity over the transition to allow queries to work successfully. I think this is why Temporal RDF wasn’t obvious and doesn’t yet exist.

  5. […] Claverley Property Document Analysis, Part 2: Semantic mark-up → […]

  6. Ian Macdonald says:

    Trouble with this is that you do need to agree on a metamodel for the concepts and that, no doubt, would lead you into deciding on whether you want to use object modelling or entity relationship modelling or some other such.

    As it stands I suspect that the colour coding (great for illustration purposes) would need to be sharpened up and particularly needs to be restricted to types of things identified. You will run out of colours quickly if using colours for things like person name rather than just person. I’d suggest you want all the attributes of person to be the same colour.

    It really worries me when you suggest separate colours for places of different size. That is pretty fuzzy. First of all at what point in time do you decide size is relevant? Take Middlesbrough. It didn’t exist other than as a field or two in 1800 but was bought to allow railway expansion and progressively swallowed up a great array of villages along the Tees.

    I’m not sure that a semantic web approach will solve anything since it allows almost anything to be linked using any notion you have of relatedness. I suspect you want to be more structured than that and want to be able to analyse the results of your markup to provide deductions of genealogical significance so that you can link people together (and groups or communities?). For that to happen you have to define those factors that influence judgements on relatedness and focus on them (and how to weight them?)

    And, as Tony says, you need to nail the temporal element – something that many people have tried to do. Part of the problem there is that you are playing in an open systems world where actors have been free over many centuries to play by their own rules. You have to factor in recorded temporal data, reliability of recording, deduced temporal data, spread of estimate, level of discrimination (century, year, quarter, month, day or on Scottish birth records down to the minute); type of calendar. Fun I suppose, if you’re into that sort of thing.

    • Hi Ian

      I agree that there is much work to do on the data modeling, and that is at the core of my thinking. The colour coding is entirely illustrative as it is clear most people find such visualisation very helpful for understanding.

      The categories certainly do need further refinement. The two place categories were included as an example of possible sub-categories, which I further expanded in a later post Claverley Property Document Analysis, Part 3: Places. The precision and scale that place names indicate vary greatly. A place name might mean something clearly defined like a parish jurisdiction, or an undefined area. Fuzziness is a challenging characteristic of historical spatial information.

      I agree that temporal data is not well served by current encodings, another significant challenge. I disagree with your view that a semantic approach won’t solve the problems. I think its flexibility may be a part of the solution, but needs implementation by some whizzy developer.

      • Ian Macdonald says:

        Oddly enough I’m not sure that it is such a knotty problem. The main elements of a metamodel are obvious enough and they are already being used as you’ve pointed out. The very fact that something is being done successfully also suggests that it may not be quality that is the issue.

        I suspect that the issue you want to deal with (apologies for speculating on your behalf) may be more to do with the level of resolution. It is not hard to deal with concepts like Person, Place, Record Type and so on. However if you wanted to be able to tag people with a wide range of subtle attributes such that you could then use them to infer relationships and family linkages then that requires much greater precision in terms of both definition and application. That’s harder.

        I agree that developing this would benefit from an evolutionary mechanism using an approach that does not lock you in to some current technologies. Call it ‘agile’ if you like though that methodology buzzword has been around for a few years now and surely is in need of replacement. Certainly build something then keep improving it.

        I fear though that quantity is the real problem, unless you want to restrict usage to narrow specialist topics.

        I am astonished at how much material has been scanned in the past decade or so. It would not have been easy to predict a business case for doing so. Hallelujah though for the fact that it has and continues to happen. However, that works at the level of the document page. The effort required to tag everything on these pages that might have semantic relevance must be 50 times as great if not more. Where will the business case come from for that?

        Now as it happens I’ve just been discussing this with a whizzy developer (Dr Alex Macdonald, senior computer scientist at Adobe – the son and heir). He is of the view that search engines have or are making semantic web ideas irrelevant. So an alternative to the drudgery of tagging everything is to adopt an empirical approach that has more of an AI flavour. In the same way that we learn through constant exposure and that Google translate is constantly improving with each discovery of a word meaning that has a different nuance, we could envision a search engine that learns what is of genealogical significance and progressively refines its ability to trawl for meaningful data.

        The catch though is the business case. Can the genealogical community be persuaded to fund it? I doubt that it could be done directly even with crowd-sourcing. It might though be done by stealth by starting with some small value-added component that people will pay a little for then using that to fund more development and so on.

        Happy New Year

  7. Ian Macdonald says:

    It needs a specification rather than a developer. Not many developers are so good at that.

    Have a horrible feeling though that all this chat just illustrates the fact that this is a horrible problem.

    An even bigger problem is, if you find a satisfactory way to semantically tag stuff in documents, how to find sufficient resource to apply these ideas, given the prodigious number of documents being digitised and placed online.

    You need software that can scan the images and intelligently tag words and phrases that can be interpreted as relevant within the context of your metamodel. Human validation thereafter might just be do-able.

    Personally I want to be able to combine the spatial and temporal dimensions so I can study the dynamics of family migration around the country. I suppose some GIS systems might do that?

    • Yes, it is a big horrible knotty problem. A specification without an implemented example is just hot air. That is why I want to work with a talented developer (i.e. whizzy), not just any developer. I think an agile approach to the development of useful tools for historical research is more likely to succeed.

      Right now, I am concerned with quality rather than quantity. Proof of concept is needed before it can be scaled up. The digital humanities community is working on a myriad of technologies that are potentially relevant. Some of the results of this scholarship are already online, e.g. A Vision of Britain, Old Bailey Online.

  8. […] in this series I transcribed a court record of a land transaction that occurred on 25 April 1844; proposed semantic mark-up that identified people’s names, places, dates, and legal language; and validated the locations of […]

