David Livingstone Spectral Image Archive README Document

Authors: Doug Emery
Adrian S. Wisnicki
Date: March 30, 2012

1 David Livingstone Spectral Image Archive

This archive contains images and transcriptions of the folia that constitute the Manyema Diary (1870-71) of David Livingstone, the British explorer of Africa. The Manyema Diary consists of:

  1. The Bambarre Field Diary, also called the 1870 Field Diary, and
  2. The Nyangwe Field Diary, also called the 1871 Field Diary.

In addition, the archive includes a small selection of letters written by Livingstone in 1871.

The original manuscripts of these documents reside at the Scottish National Memorial to David Livingstone, the National Library of Scotland, the Bodleian Library (University of Oxford), and the British Library, and in the private collection of photographer Peter Beard.

The Manyema Diary, a major Scottish national treasure, documents the year leading up to Livingstone's famous meeting with Henry Morton Stanley in late 1871, a meeting memorialized by Stanley's introductory question, "Dr. Livingstone, I presume?" This data set brings together the pages of the diary for the first time since the nineteenth century.

Livingstone composed the Manyema Diary when short of writing materials. As a result, he improvised by writing crosswise over a series of printed texts and other odd scraps of paper. He also created ink from the seeds of a local plant called Zingifure as his supply of iron gall ink dwindled.

2 Rights and Conditions of Use

The David Livingstone Spectral Image Archive is released with license for use under the Creative Commons Attribution-Noncommercial 3.0 Unported License © 2012, except British Library images © British Library Board, 2012, and Bodleian Library, University of Oxford images published by permission, 2012.

3 Intended Audience and Consumers

The David Livingstone Spectral Image Archive is intended to serve any interested user or party. However, its content is focused on serving the following groups.

  1. Scholars of Victorian literature, British imperial history, African history, geography, and the medical humanities
  2. Imaging scientists, and scientists in other disciplines interested in the production of the images
  3. Libraries and archives
  4. Application providers

4 Digital Project Data Set Purpose

The David Livingstone Spectral Image Archive provides all the digital information available on the Manyema Diary (1870-71) and Select Letters (1871) of David Livingstone in a single digital data set, with a standard structure. Its purposes are twofold:

  1. Serve as the authoritative digital data set of images in a standardized format that meets the needs of users, information providers, archives and libraries.
  2. Offer a sustainable standard product to which current and future users can add further standardized information (e.g. alternate texts, image analyses or conservation information).

5 Data Set Contents

This data set consists of:

  1. a core content set of digital images of the Manyema Diary (1870-71) and select letters (1871) of David Livingstone, each with accompanying metadata and checksums, and XML transcriptions of the 1871 Field Diary.
  2. project-generated and third-party documentation of all included components
  3. supporting functional files, including XML schemas, and cascading style sheet files
  4. a directory for researcher-contributed content files, which are not part of the core data set; this directory also holds those folia from the Bodleian Library, University of Oxford and the British Library that were scanned rather than captured through spectral imaging.

5.1 Core Data Content

The core content of images and supporting metadata is the focus of the Digital Product. For each folio of Livingstone's manuscripts, a comprehensive set of registered images is provided. In addition, XML transcriptions of the folia that constitute the 1871 Field Diary are provided.

The core data includes:

  • Image data consisting of large 8-bit image files, including requantized raw images and processed pseudo-color images. Every image file contains embedded metadata and is accompanied by a separate metadata file.
  • XML transcriptions.

The following image types are provided:

Color Images

Type: color
  • File Suffix: color
  • Each color image is created using registered, 16-bit flattened TIFF images captured under five visible illuminant bands: 638 nm (red), 592 nm (amber), 535 nm (green), 505 nm (cyan), and 450 nm (royal blue). A set of linear formulae is used to calculate calibrated color values from the five bands at each pixel position, and each image is output in the CIE L*a*b* color space.

PCA Image Types

Type: PCA color
  • File Suffix: pca321r_pcolor, pca421r_pcolor, pca621r_pcolor, pca721r_pcolor
  • Pseudocolor image made up of principal component bands with the hue angle rotated
Type: PCA
  • File Suffix: pca321r, pca421r, pca621r, pca721r, pca321r_1, pca321r_2
  • Grayscale image that is extracted from a single channel of the corresponding pca###r_pcolor image (note: the ### indicates the principal component bands used)
Type: PCA multiplied threshold
  • File Suffix: pca321r_adapThresh_multiply, pca321r_1_adapThresh_multiply, pca321r_2_adapThresh_multiply, pca421r_adapThresh_multiply, pca621r_adapThresh_multiply, pca721r_adapThresh_multiply
  • Grayscale image that is the result of the multiplication of the thresholded grayscale image and the corresponding pca###r image (note: the ### indicates the principal component bands used)

Pseudocolor Image Types

Type: intercept
  • File Suffix: intercept
  • The infrared images (700 nm - 940 nm) were fit to a best straight line on a pixelwise basis. This generates "slope" and "intercept" images.
Type: packflat8
  • File Suffix: 0365_packflat8, 0450_packflat8, 0465_packflat8, 0505_packflat8, 0535_packflat8, 0592_packflat8, 0638_packflat8, 0700_packflat8, 0735_packflat8, 0780_packflat8, 0850_packflat8, 0940_packflat8, RABL_packflat8, RABR_packflat8, RAIL_packflat8, RAIR_packflat8
  • A linear contrast stretch is applied to the 16-bit single-wavelength images. The black and white values are set 3 standard deviations away from the average value, and values beyond 3 standard deviations are clipped to black or white.
Type: pseudo_0505-0780
  • File Suffix: pseudo_0505-0780
  • The 505 nm and 780 nm wavelengths are combined in a no-veil pseudocolor image with the 780 in the red separation and the 505 in the blue and green separations.
Type: pseudo_0780
  • File Suffix: pseudo_0780
  • The 505 nm and 780 nm wavelengths from one side are put into the red and green separations, respectively. The 505 nm wavelength image of the reverse side is reversed, aligned with the front side, and placed in the blue separation.
Type: pseudoratio_0505-0780
  • File Suffix: pseudoratio_0505-0780
  • The 505 nm and 780 nm wavelengths are divided by the 940 nm wavelength and then combined in a standard pseudocolor image.
Type: RAIPratio
  • File Suffix: RAIPratio
  • Left and right raking infrared images are divided by the non-raking 940 nm image and used in a standard pseudocolor image.
Type: raking_irdiff
  • File Suffix: raking_irdiff
  • The left and right raking images in infrared are differenced, divided by the non-raking 940 nm wavelength, then linearly stretched to fit 6 standard deviations from white to black.
Type: RAPRratio
  • File Suffix: RAPRratio
  • The right raking blue and infrared images are divided by the non-raking 940 nm image and used in a standard pseudocolor image.
Type: RARR
  • File Suffix: RARR
  • The right raking blue image is divided by the right raking infrared image and then linearly contrast stretched.
Type: ratio_by_0940
  • File Suffix: ratio_by_0940
  • The 450 nm, 592 nm and 850 nm wavelengths are divided by the 940 nm wavelength, stretched to fit 6 standard deviations from white to black and put into the red, green and blue separations respectively.
Type: RIRL
  • File Suffix: RIRL
  • Left and right raking infrared images are differenced and linearly contrast stretched.
Type: sharpie_0505-0780
  • File Suffix: sharpie_0505-0780
  • The 505 nm and 780 nm wavelengths are combined in a no-veil pseudocolor image with the 780 in the red separation and the 505 in the blue and green separations. The sharpie image is made by linearly stretching the difference of the red and blue separations of the pseudocolor image.
Type: sharpieratio_0505-0780
  • File Suffix: sharpieratio_0505-0780
  • The 505 nm and 780 nm wavelengths are divided by the 940 nm wavelength and then combined in a standard pseudocolor image. The sharpie image is made by linearly stretching the difference of the red and blue separations of the pseudocolor image.
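
Several of the operations described above can be illustrated with a short Python sketch. This is not the project's processing code (the Archie documentation is the authority on that); it only shows the arithmetic behind the intercept, packflat8, pseudo_0505-0780 and sharpie_0505-0780 descriptions. All function names are hypothetical, images are represented as flat lists of pixel values, the infrared fit is assumed to use the 700, 735, 780, 850 and 940 nm bands with wavelength as the x-axis, the population standard deviation is assumed, and the sharpie stretch is assumed to use the same 3-standard-deviation limits as packflat8, which the description does not state.

```python
from statistics import mean, pstdev


def stretch_to_8bit(pixels, n_sigma=3.0):
    """Linear contrast stretch to 0..255 (packflat8-style sketch).

    Black is set at mean - n_sigma * std and white at mean + n_sigma * std;
    values outside that range are clipped.
    """
    mu = mean(pixels)
    sigma = pstdev(pixels)  # assumption: population standard deviation
    if sigma == 0:
        return [128 for _ in pixels]  # flat image: return mid-gray
    low = mu - n_sigma * sigma
    high = mu + n_sigma * sigma
    out = []
    for value in pixels:
        scaled = (value - low) / (high - low) * 255.0
        out.append(int(round(min(255.0, max(0.0, scaled)))))
    return out


def fit_line_per_pixel(wavelengths, bands):
    """Least-squares straight line through each pixel's values across bands.

    `bands` holds one flat pixel list per wavelength, in the same order as
    `wavelengths`. Returns (slopes, intercepts), one value per pixel.
    """
    x_mean = mean(wavelengths)
    sxx = sum((x - x_mean) ** 2 for x in wavelengths)
    slopes, intercepts = [], []
    for values in zip(*bands):  # one tuple of band values per pixel
        y_mean = mean(values)
        sxy = sum((x - x_mean) * (y - y_mean)
                  for x, y in zip(wavelengths, values))
        slope = sxy / sxx
        slopes.append(slope)
        intercepts.append(y_mean - slope * x_mean)
    return slopes, intercepts


def pseudocolor(band_0505, band_0780):
    """780 nm in the red separation, 505 nm in green and blue."""
    return [(red, gb, gb) for red, gb in zip(band_0780, band_0505)]


def sharpie(rgb_pixels):
    """Stretch the red-minus-blue difference of a pseudocolor image."""
    difference = [red - blue for red, _green, blue in rgb_pixels]
    return stretch_to_8bit(difference)  # assumption: 3-sigma limits
```

For example, fit_line_per_pixel([700, 735, 780, 850, 940], bands) returns the per-pixel values that would populate "slope" and "intercept" images, and sharpie(pseudocolor(band_0505, band_0780)) chains the last two steps.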

For each folio, the data set provides:

  • All eight-bit raw and processed registered TIFF images for the directory's folio
  • A text metadata file for each of the TIFF files in the directory
  • An MD5 checksum file for each of the TIFF and XML content files

All file names follow strict naming conventions to facilitate easy identification of file type and content.

In addition to its images, each content directory provides preservation information in the form of:

  • Metadata embedded in image files
  • Text metadata files for each image
  • MD5 checksum data for all TIFF files to ensure their fixity
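
The fixity check that the MD5 files make possible can be sketched in Python as follows. The layout of the checksum files is not described in this ReadMe (see MD5_README.txt for the project's own instructions), so the sketch assumes that the first whitespace-separated token in an MD5 file is the hexadecimal digest. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(content_path, md5_path):
    """Compare a content file against the digest recorded in its MD5 file.

    Assumption: the recorded digest is the first token in the MD5 file.
    """
    recorded = Path(md5_path).read_text().split()[0].lower()
    return md5_of_file(content_path) == recorded
```

A result of False from matches_checksum means the content file differs from the one the checksum was generated for, or that the MD5 file uses a different layout from the one assumed here.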

The metadata for images complies with the Archimedes Palimpsest project metadata standard, which is provided with this set as documentation. The metadata provides investigative, data sharing and scientific information on the images and transcriptions.

Metadata are data elements about the content, quality, condition, and other characteristics of the data sets that make up the digital holdings. Metadata records are produced according to rules and definitions governing several subtypes:

  1. Identification Information
  2. Spatial Data Reference Information (images and spatial indexes, only)
  3. Imaging and Spectral Data Reference Information (images only)
  4. Data Type Information
  5. Data Content Information
  6. Metadata Reference Information

Finally, XML TEI P5 Transcriptions are provided for the folia that constitute the 1871 Field Diary and the 1871 letters:

  • DLC297b_133-162_005v
  • DLC297b_135-160_006v
  • DLC297b_137-158_007v
  • DLC297b_139-156_008v
  • DLC297b_141-154_009v
  • DLC297b_143-152_010v
  • DLC297b_145-150_011v
  • DLC297b_147-148_012v
  • DLC297b_149-146_012r
  • DLC297b_151-144_011r
  • DLC297b_153-142_010r
  • DLC297b_155-140_009r
  • DLC297b_157-138_008r
  • DLC297b_159-136_007r
  • DLC297b_161-134_006r
  • DLC297b_163-132_005r
  • DLC297c_103-132_001v
  • DLC297c_105-130_002v
  • DLC297c_107-128_003v
  • DLC297c_109-126_004v
  • DLC297c_111-124_005v
  • DLC297c_113-122_006v
  • DLC297c_115-120_007v
  • DLC297b_116_003r
  • DLC297b_117_003v
  • DLC297b_119_004r
  • DLC297b_118_004v
  • DLC297c_121-114_007r
  • DLC297c_123-112_006r
  • DLC297c_125-110_005r
  • DLC297c_127-108_004r
  • DLC297c_129-106_003r
  • DLC297c_131-104_002r
  • DLC297c_133-102_001r
  • DLC1120b_001_001r
  • DLC1120b_001_001v
  • NLS10703_001_036r
  • NLS10703_001_039v
  • NLS10703_002_037r
  • NLS10703_002_038v
  • RHOLAfrs16-1_001_172ar
  • RHOLAfrs16-1_001_172av
  • RHOLAfrs16-1_002_172br
  • RHOLAfrs16-1_002_172bv
  • NLS10701_001_154r
  • NLS10701_001_154v
  • NLS10707_001_085r
  • NLS10707_001_085v
  • NLS10768_001_080r
  • NLS10768_001_080v
  • NLS10768_001_081r
  • NLS10768_001_081v
  • PB_001_001r
  • PB_001_001v
  • PB_002_002r
  • PB_002_002v
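
The names listed above share a visible pattern: three fields separated by underscores, with the last field ending in "r" or "v". The following Python sketch splits a name on that pattern. It is based only on the names as they appear in this list; FileNamingConventions.txt is the authority on what each field means. The function name and the dictionary keys are hypothetical, and reading "r" and "v" as recto and verso is an assumption.

```python
def split_folio_name(name):
    """Split a name such as 'DLC297b_133-162_005v' into its three fields.

    Assumptions: exactly three underscore-separated fields, and a final
    'r' or 'v' marking recto or verso.
    """
    fields = name.split("_")
    if len(fields) != 3:
        raise ValueError("expected three underscore-separated fields: " + name)
    prefix, middle, leaf = fields
    sides = {"r": "recto", "v": "verso"}
    if leaf[-1] not in sides:
        raise ValueError("expected a trailing 'r' or 'v': " + name)
    return {
        "prefix": prefix,        # e.g. 'DLC297b', 'NLS10703', 'PB'
        "middle": middle,        # e.g. '133-162', '001'
        "leaf": leaf[:-1],       # e.g. '005', '172a'
        "side": sides[leaf[-1]],
    }
```

For instance, split_folio_name("RHOLAfrs16-1_001_172ar") keeps the hyphen inside the first field, because only underscores are treated as separators.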

5.2 Documentation

Documents are provided to fully describe the contents of the data set and facilitate their use. There are both external and internal documents. External documents detail data standards, file specifications, and technologies used by the project, such as the TIFF specification, MD5 checksum algorithm, and various XML-related technologies. Internal documents detail project data standards and practices, image processing algorithms, and information required to use the data set not detailed in the external documentation.

5.2.1 External Documentation

External documentation includes:

  • CSS 2
  • DCMI_Metadata_Terms
  • Dublin Core - rfc5013.txt
  • GIF89a
  • ITU Recommendation T81 (PDF)
  • HTML 4.0
  • MD5 hash - rfc1321.txt
  • PDF 1.7
  • PNG
  • ReadMe
  • SVG1.1
  • TIFF 6.0
  • XHTML 1.0
  • XML 1.0
  • XML Schema
  • XSL1.1
  • Unicode - Unicode Code charts - Unicode specifications and technical reports
  • ZIP file format specification 6.3.2

5.2.2 Internal Documentation

Internal documentation includes:

  • Archie Image Manipulation software documentation - Manual
  • File Naming Conventions
  • Folio Index
  • Image Metadata Standard
  • MD5 ReadMe
  • Metadata Data Dictionary

5.3 Supporting Functional Files

The data set provides supporting files needed to share or work with the Digital Product content data. Primarily these files are XML schema documents used to validate and process transcription, spatial index, and metadata files in XML format. The following supporting file collections are included.

  • Cascading Style Sheets (CSS) for the HTML files
  • The XML RELAX NG schema used to validate the TEI transcriptions of the Livingstone texts
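
As a starting point for working with the TEI transcriptions, the Python sketch below reads the running text out of a TEI P5 document using only the standard library. It does not validate against the schema listed above, since the standard library has no RELAX NG validator and a separate tool would be needed for that. The function name is hypothetical, and the sketch assumes the transcription text sits under the usual TEI text/body elements in the standard TEI namespace.

```python
import xml.etree.ElementTree as ET

# The standard TEI P5 namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def transcription_text(xml_source):
    """Return the text inside <text>/<body>, with whitespace collapsed.

    `xml_source` is the XML document as a string. Markup inside the body
    is dropped and only the character data is kept.
    """
    root = ET.fromstring(xml_source)
    body = root.find(".//tei:text/tei:body", TEI_NS)
    if body is None:
        return ""
    return " ".join("".join(body.itertext()).split())
```

To read a file from disk instead, pass the result of open(path, encoding="utf-8").read() to transcription_text; a document that begins with an XML encoding declaration should be read in binary mode and passed as bytes.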

5.4 Supplemental Files

The purpose of the Supplemental material is to provide alternate presentations of source material used to generate text and other content supplied with the core data. The following supplemental files are included.

  • Text file versions of line-by-line spatial mappings of the images

5.5 Contributed Research Files

The Contributed Research directory is initially intended to hold useful and specialized images contributed to the project by image scientists. It also includes folia from the Bodleian Library, University of Oxford and the British Library that were scanned rather than captured through spectral imaging. These images are useful to scholars but are not integrated into the core data set because, for example, they are not registered to core image dimensions or are not accompanied by complete metadata.

Over the life of the data set, this directory may also be used to hold carefully vetted contributions that add critical information to the data set, such as conservation, codicological, and other information.

6 How to Use This Data Set

This data set contains supporting documentation to enable discovery of the data and available access tools. The files named below may be located by using the file 1_FileList.txt which accompanies this ReadMe file.

6.1 General Orientation

For General Orientation to the data set, see

  • 0_ReadMe.txt, 0_ReadMe.html: this file
  • 1_FileIndex.txt, 1_FileList.html: list of files in the data set
  • FileNamingConventions.txt: a description of naming conventions for image, XML, and MD5 files
  • FolioIndex.txt: a list of the David Livingstone Spectral Image Archive folios by over-text folio
  • MD5_README.txt: a brief how-to on using MD5 files to confirm the integrity of content files

6.2 Metadata

Metadata information for the images and transcriptions is described in several supporting documents.

  • Image_Metadata_Standard.pdf: The project's imaging metadata standard document
  • MetadataDataDictionary.txt: A complete dictionary of the metadata elements used in all contexts
  • rfc5013.txt: Dublin Core metadata elements
  • DCMI_Metadata_Terms: Dublin Core metadata term specification

6.3 Computer Access Tools

For machine access to the files in this data set the following files can be used.

  • XML schemas and DTDs for working with content TEI XML files

6.4 Scientific Information

The included scientific texts provide descriptions of image capture and processing techniques used to create the data set.

  • Archie_1.0.pdf: Documentation of the Archie 1.0 image manipulation software suite