Scanning the Bill Lawrence collection

The next stage in processing the Bill Lawrence collection, the subject of this series, is scanning.  Scanning creates digital copies of documents and photographs.  The purposes I envisage for the digital copies of Bill’s collection include:

  1. Sharing with family members
  2. Ready accessibility for research
  3. Examining close-up detail
  4. Improving legibility
  5. Reducing wear and tear on the originals

Making digital copies adequate for all these uses within time, cost and equipment constraints requires compromises.  So before I start there are some decisions to be made.

Scanning is for access not preservation

One decision I made while working out what I intend to use the digital copies for is that I am not producing copies for preservation.  I frequently see scanning documents and photos referred to as a means of preserving them.  Scanning does not preserve anything; it makes copies.  Paper and photographs only benefit from scanning if they are handled less, and so suffer less damage, because the digital copies are used instead.  I also see people suggesting that the original may be discarded once scanned.  Even a high quality digital copy is not the same as the original.  If it is worth scanning it is worth keeping.  Why would you throw a unique original away?  I made decisions about what was worth keeping when I catalogued the collection.  Paper documents and photographs can be expected to last decades if kept in moderately good conditions (e.g. a room that you live in).  In comparison, digital copies are very fragile indeed, and require constant care if they are to remain useable and accessible in the long term.  I can replace lost digital copies by re-scanning, but I can’t replace the originals.

My aim is to make working copies, while still paying attention to archival recommendations.  My research purposes require good quality images in an accessible format.  In the case of this collection, anything that fulfils my needs will likely exceed the expectations of family members, who would be happy with photocopies.

Image Quality

A good quality image for research purposes clearly shows all detail and is faithful to the appearance of the original.

Scanner or digital camera?

It is much easier to achieve consistent image quality using a flat-bed scanner than a digital camera.  Scanners have their own light source and a fixed distance between the sensor and target, and they collect the image data line by line by moving the sensor over the target.  Consequently, images are in focus, of consistent exposure and resolution, and may be very high resolution.

Although digital cameras feature autofocus, automatic exposure and white balance, the quality of the image will vary with the light source, camera support and distance from the target.  Camera sensors collect all the image data at once, so resolution is limited by the number and size of pixels on the sensor.  A handheld camera using ambient light is unlikely to achieve consistent images.  These difficulties can be overcome by using a camera stand or tripod to fix the distance to the target (standardising resolution) and ensure sharp focus, and studio style lighting to standardise exposure and white balance.  Some people claim they can digitise more quickly using a camera, but I think this is true only if some quality is sacrificed or the studio is already set up.  Claims that apps and mini-studio products convert your phone into a scanner are technically inaccurate.  They use the camera in optimal conditions, and may process the images afterward.

There are situations when using a camera is preferable, or the only option.  Archives in the UK generally do not allow the use of devices that require contact with the document, which rules out using a portable flat-bed scanner like the popular Flip-Pal or any of the wand or mouse devices.  In case you are wondering how that is justified, self-service photocopying is also not allowed.  Three-dimensional objects, very large documents that don’t fit on a scanner, and documents with features that are difficult to scan well (e.g. watermarks) often warrant careful photography.

Taking account of the equipment I own, my A4 flatbed scanner is the best option for Bill’s collection.  Only a few documents need to be scanned in more than two pieces.  I don’t own a mini-studio or have practical space to set one up.  I may use my digital camera to capture watermarks, and for some creative images.

Resolution

Digital images are made up of pixels, the tiny (usually square) elements that you can see if you zoom in enough.  Data for each pixel consists of values for its constituent colours (typically Red, Green and Blue); some formats store a brightness value alongside colour information instead.  Resolution is measured in pixels per inch (ppi), often loosely called dots per inch (dpi).  The higher the scanning resolution, the more fine detail is captured.

Archival institutions do not have universal recommendations, but their guidance is similar.  The National Archives (TNA, United Kingdom), British Library (BL) and National Archives and Records Administration (NARA, United States of America) all agree on a minimum of 300 ppi for documents.  For documents with significant elements of less than 1.5 mm, NARA and BL suggest 400 ppi or more.  Size matters when it comes to scanning photographs, with smaller photographs needing higher resolutions.  TNA suggests a minimum of 600 ppi, BL a range between 300 ppi and 1200 ppi, and NARA between 600 ppi and 2800 ppi.  Another way of expressing resolution is as the minimum number of pixels along the longest edge of the document.  Using this measure NARA suggests 4000 pixels for documents with fine detail and small photographs, and up to 8000 pixels for large photographs.  The Smithsonian keeps it simple with a recommendation of at least 6000 pixels along the longest edge for all artefacts.
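The two ways of expressing resolution can be converted into one another with a little arithmetic.  The following is a minimal sketch in Python, not taken from any of the institutions' guidance; the function names are my own illustration, and the A4 long edge (297 mm) and the 67 mm photograph edge are used only as example sizes.

```python
MM_PER_INCH = 25.4


def ppi_for_target_pixels(longest_edge_mm: float, target_pixels: int) -> float:
    """Scanning resolution (ppi) needed for the longest edge to span target_pixels."""
    if longest_edge_mm <= 0:
        raise ValueError("longest_edge_mm must be positive")
    return target_pixels / (longest_edge_mm / MM_PER_INCH)


def pixels_along_edge(edge_mm: float, ppi: float) -> int:
    """Number of pixels (rounded) along an edge of edge_mm scanned at ppi."""
    return round(edge_mm / MM_PER_INCH * ppi)


if __name__ == "__main__":
    # Example sizes only: an A4 page and a small passport-sized photograph.
    for label, edge_mm in [("A4 page", 297), ("small photograph", 67)]:
        needed = ppi_for_target_pixels(edge_mm, 4000)
        print(f"{label}: about {needed:.0f} ppi for 4000 pixels on the longest edge")
```

On these example sizes, an A4 page reaches 4000 pixels on its longest edge at roughly 342 ppi, whereas a photograph 67 mm long would need roughly 1516 ppi, which illustrates why smaller originals call for higher scanning resolutions.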

So, 300 ppi is adequate for most documents in Bill’s collection, and 600 ppi is a good starting point for the photographs.  At what resolution would you scan the passport?  The most informative pages contain personal details of Robert and Mary Spencer (Bill’s grandparents), small photographs (47 mm x 67 mm), ink and embossed stamps, signatures, and a detailed background pattern.  Although not strictly necessary for genealogical research, I am fascinated by the anti-forgery features.  For genealogical purposes, I need clear representation of stamps, signatures, handwriting and printed information.  So the smallest features of interest to me are the background pattern and letters in the embossed stamps (which contain the Royal Arms).  After some experimentation, I found 1200 ppi gave me the detail I wanted.

Note that passports are official documents that remain property of the government and parts of this passport, particularly the Royal Arms, might still be subject to Crown copyright.  The guidelines for making copies are aimed at current passports.  Even though the people are long dead and passports have changed considerably since this passport was issued in 1922, making it very unlikely that copies could now be misused, I am going to exercise caution and not publish images of whole pages or the Royal Arms.  The photograph of Robert Spencer has been over-stamped in ink.  Zooming in on his ear and part of the stamp, an area 10 mm high on the original, you can see the effect of resolutions of 300, 600 and 1200 ppi:

ear closeup

Close up of ear in passport photograph at resolutions of 300, 600 & 1200 ppi, shown at the same size. On the right, the relative print size of image extracts with increasing resolution.
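To put numbers on the comparison, here is a rough sketch of how many pixels cover a feature of a given size at each resolution.  The function name is hypothetical, and the figures follow purely from the definition of ppi rather than from measuring the actual scans.

```python
MM_PER_INCH = 25.4


def pixels_across(feature_mm: float, ppi: float) -> float:
    """Pixels spanning a feature of feature_mm when scanned at ppi."""
    return feature_mm / MM_PER_INCH * ppi


if __name__ == "__main__":
    base = pixels_across(10, 300)
    for ppi in (300, 600, 1200):
        across = pixels_across(10, ppi)
        # Pixel count for a square area grows with the square of the linear factor.
        area_factor = (across / base) ** 2
        print(f"{ppi:>4} ppi: 10 mm spans about {across:.0f} pixels "
              f"({area_factor:.0f}x the pixel count of 300 ppi)")
```

The 10 mm area spans about 118 pixels at 300 ppi, 236 at 600 ppi and 472 at 1200 ppi.  Each doubling of resolution doubles the pixels along an edge and quadruples the total pixel count, which is why the extracts grow so quickly when shown at the same pixel scale.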

Colour and Other Considerations

Depending on the model of scanner there may be a range of settings available.  A basic scanner such as the Flip-Pal only has settings for two resolutions (300 ppi and 600 ppi).  More advanced models have an array of settings including brightness, contrast, colour saturation and balance.  NARA offers some sage advice:

“Some people suggest it is best to save raw image files, because no ‘bad’ image processing has been applied. This assumes you can do a better job adjusting for the deficiencies of a scanner or digital camera than the manufacturer, and that you have a lot of time to adjust each image”.

Only if default settings fail to produce an acceptable result will I make any adjustments.

I choose to scan all documents in colour because an authentic copy makes it easier to see different inks in handwriting.  Some documents might be more legible scanned as greyscale images.  Assessing colour fidelity is difficult because each scanner and digital camera collects the data slightly differently and each computer screen and printer interprets the data differently.  Screens also differ in their capabilities; for example, my stand-alone monitor is capable of a much greater range in colour and contrast than my basic laptop screen.  Calibration of equipment can overcome the technical differences, but remember that no two people see colour the same way.  I limit my colour correction to collecting calibration information by scanning a colour reference card (e.g. QPcard) once during a scanning session.  Archives may include a colour reference and scale card in each image and calibrate the images after scanning.

Some adjustments are best done during scanning.  Printed materials like newspapers and magazines may require adjustment to the de-screening setting, which corrects the moiré pattern that may occur.

Effect of de-screening

Un-altered close up of a bathroom tap on left, de-screened on right.

Adjustments to contrast can enhance text legibility, especially if the original is faded.  Contrast can also be adjusted after scanning using image editing software.

Contrast example

Faded original extract of Bill Lawrence’s birth certificate on left, enhanced contrast on right.
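For readers curious about what a contrast adjustment does to the image data, the simplest form is a linear stretch: the darkest value present is mapped to black, the lightest to white, and everything else is spread in between.  The sketch below works on a plain list of greyscale values (0 to 255) and is only an illustration of the idea; it is an assumption on my part, not a description of what any particular scanner or image editor does internally.

```python
def stretch_contrast(pixels, low=None, high=None):
    """Linearly rescale greyscale values so low maps to 0 and high maps to 255.

    If low/high are not given, the darkest and lightest values present are used.
    """
    values = list(pixels)
    if not values:
        return []
    low = min(values) if low is None else low
    high = max(values) if high is None else high
    if high <= low:
        # A completely flat image has no contrast to stretch.
        return values
    scale = 255 / (high - low)
    # Clamp to the valid range in case low/high were set inside the data range.
    return [max(0, min(255, round((v - low) * scale))) for v in values]


if __name__ == "__main__":
    # Made-up values standing in for faded ink on discoloured paper.
    faded = [120, 135, 150, 180, 200]
    print(stretch_contrast(faded))  # [0, 48, 96, 191, 255]
```

In the made-up example the values originally occupy only the middle of the range (120 to 200); after stretching they use the full range, so dark strokes and light paper are further apart.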

Digital Image Format

Common image file formats produced by scanners include JPEG (Joint Photographic Experts Group, file extension jpg), TIFF (Tagged Image File Format, file extension tif), BMP (bitmap, file extension bmp) and PDF (Portable Document Format, file extension pdf).  Each has advantages and disadvantages.  Ease of use, quality of image data, preservation of image data, file size and support for image metadata are all considerations.

Many archival institutions use TIFF for archival master copies, because it preserves the image data without any loss.  JPEG files store the image data in a compressed form that discards some information (known as ‘lossy’ compression).  Consequently TIFF files are considerably larger than JPEG files.
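A rough sense of the file sizes involved can be had by counting pixels.  The sketch below estimates the size of the uncompressed image data only, assuming 24-bit colour (3 bytes per pixel) and ignoring file headers and metadata.  The function name is my own, and real files will differ: TIFF can also be saved with lossless compression, and JPEG sizes depend heavily on the image content and quality setting, so no JPEG estimate is attempted here.

```python
MM_PER_INCH = 25.4


def uncompressed_size_bytes(width_mm, height_mm, ppi, bytes_per_pixel=3):
    """Approximate size of raw image data (assumes 24-bit colour by default)."""
    width_px = round(width_mm / MM_PER_INCH * ppi)
    height_px = round(height_mm / MM_PER_INCH * ppi)
    return width_px * height_px * bytes_per_pixel


if __name__ == "__main__":
    examples = [
        ("A4 document at 300 ppi", 210, 297, 300),
        ("47 mm x 67 mm photograph at 1200 ppi", 47, 67, 1200),
    ]
    for label, w_mm, h_mm, ppi in examples:
        size_mb = uncompressed_size_bytes(w_mm, h_mm, ppi) / 1_000_000
        print(f"{label}: about {size_mb:.1f} MB uncompressed")
```

Under these assumptions an A4 page at 300 ppi comes to about 26 MB of raw image data, and the small passport photograph at 1200 ppi to about 21 MB, which shows how quickly high resolutions add up across a whole collection.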

For this project I choose to save files as JPEGs, as the small compromise in image quality is outweighed by the ease of use and support for metadata.

Preparation leads to consistency

Having decided on scanning protocols, the images I scan will be of consistent quality and suitable for all the purposes I envisage.  I also have a starting point for future projects.

My choices for scanning protocols are not the only way of completing a scanning project.  You could make different choices because you have different aims and constraints.  It is important that you know what decisions you made and why you came to those decisions.  Then you are in a good position to keep what worked and modify what didn’t in future projects.

© Sue Adams 2016

4 Comments on “Scanning the Bill Lawrence collection”

  1. Michael says:

    Thank you for walking us through a conversation on scanning. I appreciated reading the pros and cons of flatbed scanners vs cameras.

  2. Another advantage to scanning for the purpose of having a working copy is that you can harness OCR (optical character recognition). I have a ScanSnap scanner with software that will scan a document and create a searchable PDF. I can’t tell you how handy it is to be able to search a large document for every occurrence of a name or date – something you CAN’T do with an unindexed original.

    • Sue Adams says:

      Whilst Optical Character Recognition (OCR) can assist with converting printed documents to searchable text, it only copes well with modern fonts and a clear image. As the documents in this collection are handwritten manuscripts, printed forms filled in by hand and photographs, OCR is not very helpful for this project.

      Another disadvantage of relying on OCR is that the accuracy is often not good enough for genealogical purposes. Even at 99% accuracy, too many important bits of information are corrupted. Misreadings of people’s names, places and dates seriously hamper correct interpretation and can lead an unwary researcher in the wrong direction.

  3. Gigi H says:

    Great post with a lot of information included.
