Livingstone's 1871 Field Diary

A Multispectral Critical Edition

Spectral Image Processing
The spectral imaging of the 1871 Field Diary and associated documents produced raw image sets of 202 Livingstone folia in total, not including extraneous image sets and sets of documents imaged for the NLS. Fifty of these image sets (those of laminated folia) each contained 12 registered spectral image "digital negative" files (DNGs), while each of the remaining 152 image sets contained 16 DNGs because of the inclusion of "raking light" images. In all, the spectral imaging of Livingstone’s diary produced 3,032 "raw" image files totalling roughly 750 GB of data. This data required processing by the team’s imaging scientists to make Livingstone’s handwritten text accessible to specialists and the public at large.
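The file total follows directly from the per-set counts given above. The per-file average below is derived from the approximate 750 GB figure and is therefore itself only approximate:

```latex
\begin{aligned}
50 \times 12 + 152 \times 16 &= 600 + 2432 = 3032 \ \text{DNG files} \\
\frac{750\ \text{GB}}{3032\ \text{files}} &\approx 0.25\ \text{GB} \approx 250\ \text{MB per file}
\end{aligned}
```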
Figures 1, 2, 3, 4. Livingstone, 1871 Field Diary,
297b/155-140, detail in four versions, top to bottom:
color, pcar, adapThresh_multiply, and spectral ratio.
Spectral image processing uses tailored mathematical algorithms to manipulate and enhance raw spectral image data. In the case of Livingstone’s manuscripts, such processing relies on the fact that different ink types on a given page (for instance, Livingstone’s ink and the ink of the newsprint) behave differently under different wavelength bands of light. Imaging scientists use this differentiated behavior to create processed images (renderings of combinations of bands) that bring out what may otherwise be very subtle differences in color. The processing of the Livingstone data began while the team was still in Scotland in June 2010 and lasted well into the spring of 2011.
Principal Component Analysis (PCA)
Onsite processing in Scotland by Easton and Christens-Barry relied primarily on applying principal component analysis (PCA) to the raw image sets. Easton and his student Caroline Houston continued to experiment with this technique after the initial imaging phase. The PCA technique uses combinations of an original set of images to construct an equivalent set of images ordered by statistical variance. The first principal component is the combination of the original images with the largest variance; the second principal component is the combination of the original images with the next-largest variance that is orthogonal to the first; and so on. Images from the final PC bands ("high-order" bands) correspond to the smallest data variances. If the images contain objects that differ in color, those objects may be distinguished ("segmented") in the set of principal components.
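The variance ordering described above can be sketched in Python with NumPy. This is a minimal illustration, not the code Easton, Christens-Barry, or Houston used: it assumes the registered band images are already loaded as one array of shape (bands, height, width), and the function name `principal_components` is hypothetical. It computes PCA from the eigen-decomposition of the band covariance matrix, one standard way to do it.

```python
import numpy as np

def principal_components(stack):
    """Project a (bands, height, width) image stack onto its principal components.

    Returns (pcs, variances): pcs has the same shape as `stack`, ordered from
    largest to smallest variance; variances holds the matching eigenvalues.
    """
    bands, h, w = stack.shape
    # Treat every pixel as one observation with `bands` features.
    X = stack.reshape(bands, -1).T.astype(np.float64)
    X -= X.mean(axis=0)                      # center each band
    cov = np.cov(X, rowvar=False)            # bands x bands covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder: largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    pcs = X @ eigvecs                        # pixels x components
    return pcs.T.reshape(bands, h, w), eigvals
```

In the result, `pcs[0]` is the highest-variance component and `pcs[-1]` the lowest-variance ("high-order") one; each component image is uncorrelated with the others.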
Bonus: Download the nine PC bands produced for 297b/157-138.
Once the PC images are produced, the scientists examine them to identify those that show different inks and insert these images into the red, green, and blue (RGB) channels of a "pseudocolor" (false-color) image. If the handwritten ink appears "light" in one PC image and "dark" in another, the corresponding pixels may take on a color tone in the pseudocolor image that makes it easier to differentiate the handwritten and printed texts. If necessary, the scientists further manipulate the pseudocolor image with appropriate software, rotating its hue angle to generate new combinations of the principal components. Because these steps are manual, however, the process does not lend itself to automation and is most feasible for the study of exceptionally illegible portions of text.
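A minimal NumPy sketch of the compositing and hue-rotation steps, assuming the chosen PC images are 2-D arrays of equal size. The source does not say which software or formula the team used for hue rotation; here it is approximated as a rotation of each RGB vector about the gray (1, 1, 1) axis. The function names are hypothetical.

```python
import numpy as np

def stretch(img):
    """Linearly rescale an image to [0, 1] (min-max stretch)."""
    lo, hi = float(img.min()), float(img.max())
    if hi == lo:
        return np.zeros_like(img, dtype=np.float64)
    return (img - lo) / (hi - lo)

def pseudocolor(red, green, blue):
    """Stack three grayscale images (e.g. selected PC bands) into an RGB image."""
    return np.dstack([stretch(red), stretch(green), stretch(blue)])

def rotate_hue(rgb, degrees):
    """Approximate a hue rotation by rotating RGB vectors about the gray axis."""
    theta = np.deg2rad(degrees)
    u = np.ones(3) / np.sqrt(3.0)            # unit vector along the gray axis
    K = np.array([[0.0, -u[2], u[1]],        # cross-product matrix of u
                  [u[2], 0.0, -u[0]],
                  [-u[1], u[0], 0.0]])
    # Rodrigues' rotation formula.
    R = (np.cos(theta) * np.eye(3) + np.sin(theta) * K
         + (1.0 - np.cos(theta)) * np.outer(u, u))
    # Pixels are row vectors, so multiply by R transposed; clip back to [0, 1].
    return np.clip(rgb @ R.T, 0.0, 1.0)
```

For example, `rotate_hue(pseudocolor(pc_a, pc_b, pc_c), 60)`, with `pc_a`, `pc_b`, and `pc_c` standing for three PC images chosen by inspection, tries a new color mapping without recomputing the PCA. Grays are left unchanged by the rotation, so only pixels whose PC values differ across channels shift in color.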
Figure 5. Easton (left) and Houston while processing
a folio of Livingstone's diary, Rochester, New York.
Spectral Ratio and Pseudocolor Rendering
Knox, who remained in Hawaii during the onsite imaging phase, supplemented both initial and follow-up PCA processing with techniques previously developed and/or refined for the Letter from Bambarre (see that site’s Note on Images). In subsequent months, after the team and the data returned from Scotland, Knox also developed two additional techniques that proved instrumental in deciphering Livingstone’s writing.
Figure 6. Knox moments after a breakthrough
in processing Livingstone's diary, Maui, Hawaii.
The first of these additional techniques lent itself to automation and so could be applied across all the Livingstone data sets without additional cost or labor. In December 2010, Knox discovered that although Livingstone’s writing faded as the wavelength increased (disappearing completely at 940 nm), the printed text remained fairly constant across the spectrum. Knox therefore divided each of the 450 nm, 592 nm, and 850 nm images by the 940 nm image and placed the three resulting ratio images into, respectively, the red, green, and blue channels of a pseudocolor image. Because the printed text changes little from band to band, its ratios stay close to 1, whereas the handwriting, dark at shorter wavelengths but invisible at 940 nm, yields markedly lower ratios. The technique therefore effectively suppressed the printed text and made Livingstone’s handwriting highly visible.
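The ratio step can be sketched in NumPy as follows, assuming the four registered band images are loaded as equal-sized arrays. The small `eps` offset and the per-channel min-max stretch are assumptions added for numerical safety and display; the source specifies only the band ratios and their channel assignment. The function names are hypothetical.

```python
import numpy as np

def stretch(img):
    """Linearly rescale an image to [0, 1] (min-max stretch)."""
    lo, hi = float(img.min()), float(img.max())
    if hi == lo:
        return np.zeros_like(img, dtype=np.float64)
    return (img - lo) / (hi - lo)

def spectral_ratio_rgb(b450, b592, b850, b940, eps=1e-6):
    """Divide three bands by the 940 nm band; map the ratios to R, G, B."""
    ref = b940.astype(np.float64) + eps      # eps guards against division by zero
    ratios = [b.astype(np.float64) / ref for b in (b450, b592, b850)]
    return np.dstack([stretch(r) for r in ratios])
```

In a synthetic three-pixel example where paper and print each have flat reflectance across the bands, both get ratios near 1 and render alike (near white), while a "handwriting" pixel that is dark at 450 nm but bright at 940 nm drops to near black in all three channels.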
Figure 7. Livingstone, Letter to John Kirk, 13 Feb. 1871
(NLS MS.10701, fol. 154v), bleedthrough processing, detail.
A second technique, which could be partly automated, focused on minimizing ink bleedthrough by rendering pseudocolor images from combinations of raw spectral images. Knox selected two images of the recto of a given leaf (the 505 nm and 780 nm illuminations) and one image of the verso (505 nm), which he flipped horizontally so that it aligned with the recto. He performed a spatially local normalization of contrast and brightness on each image, then inserted the three images into, respectively, the red, green, and blue channels of a pseudocolor image. The resulting image displayed the recto text in cyan-blue and the verso text in green-yellow, a combination that significantly enhanced the legibility of the recto text.
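A minimal NumPy sketch of the bleedthrough composite, assuming registered, equal-sized arrays for the recto 505 nm, recto 780 nm, and verso 505 nm images. The source does not specify how the local normalization was done; here it is assumed to be subtraction of a local mean and division by a local standard deviation over a square window, computed with an integral image. The window radius and all function names are hypothetical.

```python
import numpy as np

def box_mean(img, radius):
    """Mean over a (2*radius+1)-pixel square window, with reflected borders."""
    k = 2 * radius + 1
    h, w = img.shape
    padded = np.pad(img.astype(np.float64), radius, mode="reflect")
    # Integral image with a leading row and column of zeros.
    ii = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    ii[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    window_sum = (ii[k:k + h, k:k + w] - ii[:h, k:k + w]
                  - ii[k:k + h, :w] + ii[:h, :w])
    return window_sum / (k * k)

def local_normalize(img, radius, eps=1e-6):
    """Equalize local brightness (mean) and contrast (standard deviation)."""
    img = img.astype(np.float64)
    mean = box_mean(img, radius)
    var = np.maximum(box_mean(img ** 2, radius) - mean ** 2, 0.0)
    return (img - mean) / (np.sqrt(var) + eps)

def stretch(img):
    """Linearly rescale an image to [0, 1] (min-max stretch)."""
    lo, hi = float(img.min()), float(img.max())
    if hi == lo:
        return np.zeros_like(img, dtype=np.float64)
    return (img - lo) / (hi - lo)

def bleedthrough_composite(recto_505, recto_780, verso_505, radius=15):
    """Recto 505 nm -> red, recto 780 nm -> green, mirrored verso 505 nm -> blue."""
    verso_mirrored = np.fliplr(verso_505)    # mirror the verso to line up with the recto
    channels = (recto_505, recto_780, verso_mirrored)
    return np.dstack([stretch(local_normalize(c, radius)) for c in channels])
```

The window radius controls how "local" the normalization is: it should be larger than a pen stroke so that strokes are not flattened into the background, but small enough to even out stains and uneven illumination across the leaf.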
Processing Objectives and Results
After the preliminary processing in June 2010 established the viability of recovering Livingstone’s text, the Livingstone team developed a list of leaves for processing. This list prioritized the most challenging images (those with the file prefixes DLC297b and DLC297c) while recommending only color rendering for the most legible folia; these color renderings would be produced by an executable program run from a text-file script. The subsequent months of processing entailed close collaboration between the imaging scientists and Wisnicki, who assessed whether processing results met scholarly needs and suggested further possibilities for experimentation.
Figure 8. Christens-Barry (background) and Easton processing
Livingstone's diary, National Library of Scotland, June 2010.
The scholar-scientist discussions also led to a comparative analysis of processing techniques and to the development of additional PCA and pseudocolor processing strategies, some of which could be automated. These included a pseudocolor method that suppresses all written and printed text and highlights paper topography (raking). During this period, Christens-Barry also developed the "Equipoise Toolbox" palette for ImageJ, an open-source image processing software package. This palette allowed Wisnicki and other scholars to fine-tune the spectral images batch-produced by the imaging scientists.
Figure 9. Livingstone, 1871 Field Diary, 10703/36r,
in two versions: color (top) and surface (bottom).
Ultimately, the imaging scientists developed a variety of spectral image processing techniques to recover the text of Livingstone’s 1871 Field Diary (see Notes on Spectral Images for the full list). They applied nearly all of these techniques to the diary and, where possible, used automation to extend their work to the other Livingstone manuscript leaves captured in Scotland.
Documents for Download
  1. Livingstone Processing Priority List, Wisnicki, July 2010
  2. Email: Knox-Easton-Wisnicki Processing Discussion, December 2010
  3. Processing Metadata Worksheet, Christens-Barry
  4. Processing Metadata Worksheet, Knox
  5. Processing Metadata Worksheet, Easton-Houston
  6. Equipoise Toolbox for ImageJ (zip, includes User Guide)
  7. Equipoise Toolbox Preliminary User Guide
  8. PCA-Pseudocolor Rendering (Interim Report), Easton, February 2011
  9. Conference Paper: "Recovery of handwritten text from the diaries and papers of David Livingstone," Knox et al., January 2011. Presented at the "Computer Vision and Image Analysis of Art II" conference, IS&T/SPIE Electronic Imaging Symposium, San Francisco, 23-27 January 2011.
  10. PowerPoint: "Recovery of handwritten text from the diaries and papers of David Livingstone," Knox et al., January 2011. Presented at the "Computer Vision and Image Analysis of Art II" conference, IS&T/SPIE Electronic Imaging Symposium, San Francisco, 23-27 January 2011.
  11. Email: Processing Summary, Knox, 27 August 2011
Data Management