
Setting up an Adobe Catalog search solution for PDF files generated by ELAN Capture

ELAN Capture can publish only one job at a time, which limits the scope of its output. If the PDF files come from more than one Capture job, or from other sources, you can use Adobe Acrobat's Catalog full-text search solution to search across all of them.

Dependencies:


ELAN Capture Pro 1.6+
Adobe Acrobat Pro 6.0+

Workflow Steps


  1. Set up ELAN Capture for creating searchable PDF files with embedded metadata
  2. Collect the PDF files into a release folder
  3. Create an Acrobat Catalog index file with custom field support
  4. Designate a Table of Contents PDF file and attach the index file to it
  5. Educate your users on how to search by metadata only, by full text, or by a combination of both

1. Set up ELAN Capture for creating searchable PDF files with embedded metadata


Setting up ELAN Capture for metadata harvesting is a standard procedure. At a basic level, it involves either using the built-in metadata fields (Title, Author, Subject, Keywords) or setting up special fields for the project. Either way, it is a simple task, and it is done from the Index Setup screen of ELAN Capture. Here is a screen shot of a typical scenario:



After the setup, the metadata fields can be populated with values. This is usually done by collecting the information from the screen with the "OCR Selected Area" tool and then pasting the value into the field. Here is a screen shot of the work in progress:



After all the metadata is collected, the PDF conversion can be performed. Create as many PDF files as the metadata requirements dictate. The important point is to set up ELAN Capture so that it correctly transfers the generated metadata to the PDF files. The next dialog shows the important "click" points:




... and



At this point we are done with ELAN Capture. We now have PDF files that are searchable (hidden text via OCR) and that carry embedded metadata fields and field values.


2. Collect the PDF files into a release folder


This operation is a simple copy of the PDF files into a folder on the file system. There is no need to handle XML, CSV or TXT files, because all of this information is now present in the generated PDF files. It is, however, a good idea to create an index PDF file that serves as the Table of Contents (TOC) of the project. This file can be one of the files in the collection, or one that you generate yourself. The PDF files in the release folder can be arranged in subfolders for better organization.
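The copy step can also be scripted. The following is a minimal Python sketch, not part of ELAN Capture or Acrobat: the function name `collect_pdfs` and the folder names in the usage note are assumptions made for illustration. It copies every PDF found under one or more source folders into a release folder and keeps the subfolder layout of each source.

```python
import shutil
from pathlib import Path


def collect_pdfs(source_dirs, release_dir):
    """Copy every PDF under each source directory into release_dir.

    The subfolder layout below each source directory is preserved
    under a folder named after that source directory.
    Returns the list of destination paths that were written.
    """
    release = Path(release_dir)
    copied = []
    for src in map(Path, source_dirs):
        # sorted() materializes the listing before any file is copied
        for pdf in sorted(src.rglob("*")):
            if not pdf.is_file() or pdf.suffix.lower() != ".pdf":
                continue  # skip folders and non-PDF files (XML, CSV, TXT, ...)
            target = release / src.name / pdf.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(pdf, target)  # copy2 also keeps the file timestamps
            copied.append(target)
    return copied
```

A call such as `collect_pdfs(["jobs/job1", "jobs/job2"], "release")` (hypothetical folder names) would place the files under `release/job1/...` and `release/job2/...`. Source folders that share the same last path component would be merged into one subfolder, so give the job folders distinct names.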

3. Create an Acrobat Catalog index file with custom field support


From here on you will need Adobe Acrobat Pro, which includes the "Catalog" feature. In this presentation we are using Acrobat 8.0 Pro, but this functionality has not changed substantially since 6.0. Screen shots might look different from version to version.

Open your TOC PDF file (which will also be used as the "Autorun" file on a CD or DVD) and create the index. This option is available from the "Advanced" menu:



Create a new index and fill in the available fields in a logical fashion. Be sure to add the folder where the PDF files are collected. Then pay attention to the "Options" button: this is where you set up Adobe Catalog to recognize the metadata fields embedded in the PDF files.



Now add all the fields that were set up in ELAN Capture's Index settings. It can be convenient to open Capture and copy and paste the field names straight from the setup dialogs. When you have finished, the Custom Properties dialog will look like this:



The next step is to press the "Build" button. Building the catalog should take only a moment, and with that the index building operation is finished.

Note: whenever new files are added (presumably with the same metadata structure), the index must be rebuilt. Just press the "Rebuild" button instead.
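To decide whether a rebuild is due, the modification times of the PDF files can be compared with that of the index file. The sketch below is an assumption-laden helper, not an Acrobat feature: the function name `pdfs_newer_than_index` is hypothetical, and the check relies only on file timestamps, so a file copied in with its original, older timestamp will not be flagged.

```python
from pathlib import Path


def pdfs_newer_than_index(release_dir, index_file):
    """Return the PDFs under release_dir modified after index_file was written.

    A non-empty result suggests that the index should be rebuilt.
    """
    index_mtime = Path(index_file).stat().st_mtime
    return sorted(
        p
        for p in Path(release_dir).rglob("*")
        if p.is_file()
        and p.suffix.lower() == ".pdf"
        and p.stat().st_mtime > index_mtime
    )
```

For example, `pdfs_newer_than_index("release", "release/index.pdx")` (hypothetical paths) lists the files that were added or changed since the last build.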

4. Designate a Table of Contents PDF file and attach the index file to it


There is one more task left: attach the generated index to your TOC PDF file. This option is available from "File" -> "Properties" (Ctrl+D) -> Advanced tab -> PDF Settings -> Search Index -> Browse. Navigate to the index file, confirm the dialog, and save the PDF file.

With this in place, the user can search across all of the PDF files in the folder structure, because the index is already attached to the TOC file. The last step shows why this matters. You can move the TOC file together with the index file without breaking anything, because the path to the index is stored internally as a relative path.
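The effect of a relative path can be illustrated with a few lines of Python. This is only a sketch of the concept, not a description of how Acrobat stores the link internally; the function name `relative_index_path` and the example paths are hypothetical.

```python
import os.path


def relative_index_path(toc_pdf, index_file):
    """Return the path of index_file relative to the folder holding toc_pdf."""
    return os.path.relpath(index_file, start=os.path.dirname(toc_pdf))


# The same relative link results before and after the whole tree is moved,
# as long as the TOC file and the index keep their positions relative to
# each other.
before = relative_index_path("/work/release/toc.pdf", "/work/release/index/all.pdx")
after = relative_index_path("/media/cd/toc.pdf", "/media/cd/index/all.pdx")
```

Both `before` and `after` hold the same value, which is why copying the complete release folder to a CD or DVD keeps the attached index usable.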


5. Educate your users how to search.


Unfortunately, the advanced search interface is not very intuitive, so you will need to give your users some help with it. Here are the steps:


Use the menu Edit -> Search (Ctrl+Shift+F), which opens this or a similar dialog:



You can also reach this dialog through the "Search" (not the "Find") icon or control on the Adobe Acrobat Reader toolbar.

Next step: Click on the "Use Advanced Search Options":



This is the search screen that your users should arrive at. Study the screen and experiment: select a metadata field and search for just that value, search for multiple values, or combine full-text search with metadata search. From this point on, searching is powerful, fast and easy.