Archaeological Journal digitisation

In 2011, the Royal Archaeological Institute commissioned Heritage Technology to digitise 120 volumes of the Archaeological Journal, its annual archaeological review publication. First published in 1843, the Archaeological Journal presents the results of archaeological and architectural survey and fieldwork on sites and monuments of all periods, in addition to syntheses and overviews of archaeology in the British Isles.

The project took scanned copies of the journal, provided by the University of Southampton, and used Optical Character Recognition (OCR) software to create searchable, text-recognised documents. These documents were then packaged into article and volume PDFs and delivered to the Archaeology Data Service, along with detailed metadata recording each article's title, author, citation information, etc.

Technical details

The project was large in scale: over 50,000 pages of scanned material had to be carefully organised and documented.

The age of the publications also presented technical problems. Roughly half of the supplied material was published in the 19th century, so the typefaces and print quality were often poor. OCR software works best on clear, high-quality printed material with little variation in print quality. Many of the earlier volumes of the Archaeological Journal (e.g. 1844 - c.1880) contained poorly printed or damaged pages that affected the OCR process. In these cases, however, errors were corrected manually.

Across a selection of sample pages drawn from the 120 volumes, OCR accuracy averaged roughly 99.7%.
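
The project description does not say how the accuracy figure was measured. As a rough illustration only, one common way to score OCR output is character-level accuracy: compare the OCR text of a sample page against a manually corrected transcription, count the character edits needed to turn one into the other, and average the result over the sampled pages. The sketch below assumes that approach; the function names and the sample strings are hypothetical and are not taken from the project.

```python
from typing import Iterable, Tuple


def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum single-character insertions,
    deletions and substitutions needed to turn hypothesis into reference."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # character missing from the OCR text
                current[j - 1] + 1,      # extra character in the OCR text
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Share of reference characters recognised correctly (0.0 to 1.0)."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    errors = edit_distance(reference, hypothesis)
    return max(0.0, 1.0 - errors / len(reference))


def average_accuracy(pages: Iterable[Tuple[str, str]]) -> float:
    """Mean character accuracy over (corrected text, OCR text) pairs."""
    scores = [character_accuracy(ref, hyp) for ref, hyp in pages]
    if not scores:
        raise ValueError("at least one sample page is required")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Invented sample strings, for illustration only (not project data).
    sample_pages = [
        ("Roman villa", "Roman villa"),
        ("Norman keep", "Norrnan keep"),  # 'm' misread as 'rn'
    ]
    print(f"Average accuracy: {average_accuracy(sample_pages):.1%}")
```

Averaging per page, as here, weights every sampled page equally regardless of length; pooling all characters across pages before dividing is an equally reasonable choice and would give a slightly different figure.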