Archive Creation with Metadata and Searchable PDFs

This PDF conversion solution can be deployed enterprise-wide. It helps you create searchable PDF archives from documents of various types, each with its own attributes.

Creating archives that hold a large number of documents and require multi-attribute search and content search:

This archive creation process is intended for archives of various sizes that need attribute search and content search functionality. Such archives usually require an enterprise database such as Oracle or MS SQL.

A high-performance scanner (multifunctional device) is used to produce digitized document images. Such scanners save the scanned document images (in image format or PDF format) to your local or network data storage.

To set up a scanning process in which digitized documents are attributed as they are scanned, the source documents must be properly prepared. This preparation includes inserting so-called split sheets: special pages that carry barcodes. Each barcode marks the beginning of a new document and encodes information such as document type, author name, and creation date.

An external OCR system handles barcode recognition and separates one document from another.
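
As a rough illustration of how split sheets can separate a scan flow, the following Python sketch groups scanned pages into documents. The page representation (a `barcode` field that is `None` for ordinary pages) and the `key=value;key=value` payload format are assumptions made for this example only; they are not the format used by PDF Render Center or by any particular OCR system.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Page:
    image_name: str
    barcode: Optional[str] = None  # decoded barcode text; None for ordinary pages


@dataclass
class Document:
    attributes: dict
    pages: list = field(default_factory=list)


def parse_payload(payload: str) -> dict:
    """Parse a hypothetical 'key=value;key=value' barcode payload."""
    attributes = {}
    for item in payload.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attributes[key.strip()] = value.strip()
    return attributes


def split_scan_flow(pages):
    """Start a new document at every split sheet and collect the pages after it."""
    documents = []
    current = None
    for page in pages:
        if page.barcode is not None:
            # A split sheet: it opens a new document and is not kept as content.
            current = Document(attributes=parse_payload(page.barcode))
            documents.append(current)
        elif current is not None:
            current.pages.append(page.image_name)
        # Pages scanned before the first split sheet are skipped in this sketch.
    return documents
```

In this sketch the split sheets themselves are dropped from the output, so each resulting document contains only the pages of the source document.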

In this case, OCR integration allows PDF Render Center to perform the following functions:

  • Separation of individual documents from the general scan flow (by detecting the split sheets in the flow and processing them)
  • PDF document attribution via identification of the information on the split sheets and creation of a separate document with metadata (usually XML) for further processing
  • Text recognition in the image source files
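
The attribution step can be pictured as writing the attributes read from a split sheet into a small XML file that accompanies the PDF. The sketch below uses Python's standard `xml.etree.ElementTree` module; the element and attribute names (`document`, `attribute`, `file`, `name`) are hypothetical and do not reflect the actual schema produced by PDF Render Center.

```python
import xml.etree.ElementTree as ET


def build_metadata_xml(pdf_name, attributes):
    """Return an XML string describing one document (illustrative schema)."""
    root = ET.Element("document", {"file": pdf_name})
    # Sort the attributes so the output is stable from run to run.
    for name, value in sorted(attributes.items()):
        ET.SubElement(root, "attribute", {"name": name}).text = value
    return ET.tostring(root, encoding="unicode")
```

A downstream system could then read this file to pick up the document type, author, and date without opening the PDF itself.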

As a result, PDF Render Center creates a multilayer PDF document with two layers: a bottom layer that holds a copy of the digitized document (image) and an invisible top text layer that holds the recognition results. This file structure allows content search within the PDF document while preserving the source document (digitized image).

Metadata can be stored either as a separate file (XML) or as an integrated invisible stream within the PDF document. It is also possible to transfer attributes directly into storage by using PDF Render Center integration tools (web services).
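
To give a feel for how stored attributes and recognized text support both kinds of search, here is a minimal sketch that uses an in-memory SQLite database in place of an enterprise database. The table layout, the function names, and the use of `LIKE` instead of a real full-text index are all assumptions made for illustration; a production archive would rely on the indexing features of the chosen database.

```python
import sqlite3


def create_archive(conn):
    """Create two illustrative tables: documents and their attributes."""
    conn.execute(
        "CREATE TABLE documents (id INTEGER PRIMARY KEY, file TEXT, text TEXT)"
    )
    conn.execute(
        "CREATE TABLE attributes ("
        "doc_id INTEGER REFERENCES documents(id), name TEXT, value TEXT)"
    )


def add_document(conn, file, text, attributes):
    """Store the recognized text and the attributes of one PDF."""
    cur = conn.execute(
        "INSERT INTO documents (file, text) VALUES (?, ?)", (file, text)
    )
    doc_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO attributes (doc_id, name, value) VALUES (?, ?, ?)",
        [(doc_id, name, value) for name, value in attributes.items()],
    )
    return doc_id


def search(conn, attributes=None, phrase=None):
    """Return file names matching every given attribute and the text phrase."""
    sql = "SELECT d.file FROM documents d WHERE 1=1"
    params = []
    for name, value in (attributes or {}).items():
        sql += (
            " AND EXISTS (SELECT 1 FROM attributes a"
            " WHERE a.doc_id = d.id AND a.name = ? AND a.value = ?)"
        )
        params += [name, value]
    if phrase:
        # Simple substring match; '%' and '_' in the phrase act as wildcards.
        sql += " AND d.text LIKE ?"
        params.append(f"%{phrase}%")
    sql += " ORDER BY d.file"
    return [row[0] for row in conn.execute(sql, params)]
```

Combining several attributes with a text phrase in a single query corresponds to the multi-attribute and content search described above.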

Main advantages:

  • Quick implementation
  • Various ways of receiving input documents
  • Easy to manage
  • Plenty of electronic formats supported
  • Low cost of converting files

Thinking about such a solution?

Find out the costs! No obligation: describe your needs to us and we will give you an idea of the expected cost range.
