Dos and Don'ts for digitisation workflows

A collection of typical questions and problems that occur during digitisation projects, together with possible answers.

Scanning and photographing

Do: scan the color chart with each page as well

Ideally, each scan should include a color chart. This does not usually cause problems for subsequent processing or for displaying the image, because the chart can be detected automatically and hidden by automated cropping.

Note that color charts do not last forever: they age with use and exposure to light and should be replaced periodically.

Do: think twice before deciding to use a scanning robot for your digitisation project

A scanning robot is not the right choice for every type of source material, and not all books are equally suitable for scanning by a book robot. Depending on the model, preparing the books, loading and unloading them, and making fine adjustments for each book format can take a considerable amount of time. Experience has also shown that current scanning robots are not yet reliable or autonomous enough to operate entirely without supervision. As a result, the staff time needed to load the device, make fine adjustments for each book and monitor the robot can be as great as the time needed to scan the same books manually.
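To judge whether a robot actually saves staff time, it can help to work through the per-book arithmetic for your own collection. The sketch below is only an illustration: the `Workflow` class, its parameters and every figure in it are hypothetical placeholders to be replaced with timings measured in your own project. It models staff time per book as a fixed handling overhead (preparation, loading, unloading, adjustments) plus the share of the scanning time during which someone has to be present.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    """Hypothetical model of staff time per book; all parameters are placeholders."""
    handling_min: float  # per-book preparation, loading/unloading, fine adjustments (minutes)
    sec_per_page: float  # scanning time per page (seconds)
    attendance: float    # share of scanning time a person must be present (0.0 to 1.0)

    def staff_minutes(self, pages: int) -> float:
        """Staff minutes for one book: fixed handling plus attended scanning time."""
        return self.handling_min + self.attendance * pages * self.sec_per_page / 60


# Invented example figures, not measurements.
manual = Workflow(handling_min=2, sec_per_page=12, attendance=1.0)
robot = Workflow(handling_min=15, sec_per_page=6, attendance=0.8)

for pages in (50, 300):
    m, r = manual.staff_minutes(pages), robot.staff_minutes(pages)
    print(f"{pages:>4} pages: manual {m:.0f} min, robot {r:.0f} min")
```

With these invented figures, the robot needs more staff time than manual scanning for a thin 50-page volume (19 vs. 12 minutes) but less for a 300-page volume (39 vs. 62 minutes), because the fixed per-book overhead dominates for short books.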

Do: validate your images automatically, ideally straight after scanning

Wherever technically possible, image files should be validated automatically as soon as they have been scanned, i.e. before any further steps in the workflow are performed. The free software JHOVE has proven effective for common validation tasks (e.g. checking TIFF files). It can be run automatically from the command line and can return a machine-readable validation result as an XML file. It may also be worth validating file names at an early stage in the workflow. A check script based on regular expressions can enforce the established naming scheme, verify that numerical sequences in file names are unbroken, and confirm that file names include the relevant metadata.
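Both checks can be combined in a small script. The Python sketch below assumes that the `jhove` launcher is on the PATH and calls it with the TIFF module (`-m TIFF-hul`) and the XML output handler (`-h xml`). The helper names and the naming scheme `<shelfmark>_<four-digit page>.tif` are hypothetical and should be adapted to your project. The report is parsed without relying on a specific XML namespace, since the namespace may differ between JHOVE versions.

```python
import re
import subprocess
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip an XML namespace: '{ns}status' -> 'status'."""
    return tag.rsplit("}", 1)[-1]


def run_jhove(path: str) -> str:
    """Validate one TIFF with JHOVE's TIFF module and return the XML report from stdout."""
    result = subprocess.run(
        ["jhove", "-m", "TIFF-hul", "-h", "xml", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def jhove_statuses(xml_report: str) -> dict[str, str]:
    """Map each file URI in a JHOVE XML report to its status string."""
    statuses = {}
    for elem in ET.fromstring(xml_report).iter():
        if local(elem.tag) == "repInfo":
            status = next((c.text for c in elem if local(c.tag) == "status"), None)
            statuses[elem.get("uri", "")] = (status or "").strip()
    return statuses


# Hypothetical naming scheme: <shelfmark>_<4-digit page>.tif, e.g. "Ms123_0001.tif".
NAME_RE = re.compile(r"^(?P<shelfmark>[A-Za-z0-9]+)_(?P<page>\d{4})\.tif$")


def check_names(names: list[str]) -> list[str]:
    """List names that break the scheme and gaps/duplicates in the page sequence.
    Assumes all names belong to one volume (one shelfmark)."""
    problems, pages = [], []
    for name in names:
        m = NAME_RE.match(name)
        if m is None:
            problems.append(f"bad name: {name}")
        else:
            pages.append(int(m.group("page")))
    pages.sort()
    for prev, cur in zip(pages, pages[1:]):
        if cur != prev + 1:
            problems.append(f"gap or duplicate after page {prev:04d}")
    return problems
```

In a workflow script, a batch would only be passed on if every status returned by `jhove_statuses(run_jhove(path))` is `Well-Formed and valid` and `check_names` returns an empty list.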

Don’t: allow the scanning software to perform automatic cropping

Letting the scanning software crop images automatically may sound like an attractive way of reducing your workload, but the benefits and disadvantages should be carefully weighed up for each project. If images are cropped too tightly, it may later prove impossible to extract information from them that was not anticipated at the time of scanning. For this reason, especially for valuable materials that should not be exposed to repeated scanning, the master image should be produced with a generous margin on all sides. The scanning program should never be set to perform tight automatic cropping of master images. Instead, automated cropping can be built into a later step of the workflow and applied to a derivative image using selected parameters. This step can be repeated as often as required without altering the master image.
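The principle of cropping a derivative rather than the master can be sketched without an imaging library. In the illustrative Python below, an image is simplified to a list of grayscale rows, the page is assumed to lie on a dark background, and `background_max` and `margin` stand in for the selected parameters; all names and values are hypothetical. Because the function only slices the master and returns new lists, it can be rerun with different parameters while the master stays unchanged.

```python
Image = list[list[int]]  # simplified stand-in: grayscale rows, 0 = black, 255 = white


def content_box(img: Image, background_max: int = 40) -> tuple[int, int, int, int]:
    """Bounding box (top, left, bottom, right; bottom/right exclusive) of pixels
    brighter than the assumed dark background."""
    rows = [y for y, row in enumerate(img) if any(p > background_max for p in row)]
    cols = [x for x in range(len(img[0])) if any(row[x] > background_max for row in img)]
    if not rows:  # nothing detected: keep the full frame
        return 0, 0, len(img), len(img[0])
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def crop_derivative(master: Image, margin: int = 2, background_max: int = 40) -> Image:
    """Return a new, cropped derivative with a safety margin; the master is never modified."""
    top, left, bottom, right = content_box(master, background_max)
    top, left = max(0, top - margin), max(0, left - margin)
    bottom, right = min(len(master), bottom + margin), min(len(master[0]), right + margin)
    return [row[left:right] for row in master[top:bottom]]
```

In a real workflow, the same idea would be applied with an imaging library to a copy of the master file, and the derivative regenerated whenever the cropping parameters change.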

Don’t: allow uneven lighting conditions in the scanning room to spoil your images

The brightness of the images produced by most scanners and cameras varies with the ambient lighting conditions (summer/winter, morning/afternoon). Although these devices have their own integrated lighting, their design means they are still affected by other light sources in the room. Ideally, this should be checked thoroughly at least once for each device. Even hardware advertised as independent of ambient light can give surprising results when two photographs taken under different lighting conditions are compared closely. All images of the same object should therefore be taken under constant lighting conditions.
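One way to carry out the per-device check is to scan the same target twice under different ambient conditions (e.g. morning and afternoon) and compare the brightness of a fixed region, such as a grey patch of the color chart. The Python sketch below assumes both captures are available as lists of grayscale rows and that the patch sits at the same position in both; the function names and the 2% tolerance are hypothetical choices, not a standard.

```python
from statistics import mean

Box = tuple[int, int, int, int]  # top, left, bottom, right (bottom/right exclusive)


def patch_mean(pixels: list[list[int]], box: Box) -> float:
    """Mean grayscale value inside a rectangular region, e.g. a grey chart patch."""
    top, left, bottom, right = box
    return mean(p for row in pixels[top:bottom] for p in row[left:right])


def lighting_drift(reference, comparison, box: Box, tolerance: float = 0.02):
    """Relative brightness change of the patch and whether it exceeds the tolerance."""
    ref, cmp_ = patch_mean(reference, box), patch_mean(comparison, box)
    drift = (cmp_ - ref) / ref
    return drift, abs(drift) > tolerance


# Synthetic example: the same patch, 4% brighter in the afternoon capture.
morning = [[100] * 4 for _ in range(4)]
afternoon = [[104] * 4 for _ in range(4)]
print(lighting_drift(morning, afternoon, (0, 0, 4, 4)))
```

A drift that is flagged for one device but not another suggests that the flagged device is more sensitive to the room's lighting and may need shielding or fixed scanning times.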