Archive digitisation and Optical Character Recognition (OCR)

Heritage Technology has experience in transforming physical archives and publications into digital versions suitable for online storage and presentation. Digitisation is useful for many kinds of hard-copy data, such as archaeological excavation archives or journal back catalogues. Once in a digital format, these archives are searchable, straightforward to back up, and reduce the need for physical storage.

This digitisation process can involve a number of stages:

  • scanning and photography of hard-copy data and materials - we can provide these services in-house for small-scale projects, or source low-cost external alternatives for large-scale digitisation projects (e.g. journal back catalogues)
  • optical character recognition (OCR), which recognises text within scanned versions of printed materials with high accuracy, allowing digital copies to become searchable - see below
  • file optimisation, to ensure digital copies retain a high level of quality whilst minimising storage requirements
  • archive creation, via appropriate use of metadata and adherence to standardised file formats for long-term preservation
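
As a rough illustration of the archive-creation stage, the Python sketch below records the size and a SHA-256 checksum of every file in a folder of scans, alongside a couple of descriptive fields. The folder layout, the field names (title, creator) and the JSON manifest format are assumptions made for this example only; they do not describe Heritage Technology's own workflow or any particular metadata standard.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(scan_dir: Path, title: str, creator: str) -> dict:
    """Describe every file under scan_dir (a hypothetical folder of scans)."""
    files = []
    for path in sorted(p for p in scan_dir.rglob("*") if p.is_file()):
        files.append({
            "path": path.relative_to(scan_dir).as_posix(),
            "bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        })
    # "title" and "creator" are illustrative fields, not a formal schema.
    return {"title": title, "creator": creator, "files": files}


def write_manifest(manifest: dict, destination: Path) -> None:
    """Save the manifest as human-readable JSON."""
    destination.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
```

Storing a checksum with each file means a later copy or backup can be compared against the manifest to confirm that nothing has changed or been corrupted.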

Our services can be tailored to meet budgets and timescales. For large-scale digitisation projects, Heritage Technology can outsource scanning to specialist companies that provide a quicker, more cost-effective service.

Optical character recognition (OCR)

OCR software allows printed textual content to be 'recognised', essentially turning a scanned image into a digital document in which text can be searched and selected. Once OCR has been applied, digital documents can be made available online privately, or through recognised content providers such as JSTOR or Ingenta.
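
To show what 'searchable' can mean in practice, the Python sketch below builds a simple word index over text that an OCR engine has already extracted, page by page, and looks up which pages mention a given word. The sample page text and the function names are invented for illustration; the OCR step itself is assumed to have been done elsewhere.

```python
import re
from collections import defaultdict


def build_index(pages: dict[int, str]) -> dict[str, set[int]]:
    """Map each lower-cased word to the set of page numbers it appears on."""
    index = defaultdict(set)
    for page_number, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(page_number)
    return dict(index)


def pages_containing(index: dict[str, set[int]], term: str) -> list[int]:
    """Return the sorted page numbers on which a single word occurs."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical OCR output, keyed by page number.
ocr_pages = {
    1: "Excavation of the Roman villa",
    2: "Pottery finds from the villa trench",
}
index = build_index(ocr_pages)
print(pages_containing(index, "villa"))    # [1, 2]
print(pages_containing(index, "Pottery"))  # [2]
```

The lookup is case-insensitive and matches whole words only, so it is a starting point rather than a full-text search engine.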

Slide and other image collections

Heritage Technology also offers services for the digitisation of slide and other image-based collections.

Have materials that require digitisation?

Heritage Technology offers options to suit any timescale or budget.


