What is PDE?
Elan GMK PDF Drawing Extractor (PDE) is a software application used to systematically extract images/illustrations from PDF files. The output is used by document assembly applications to re-create certain content, usually technical manuals and books. The XML-based workflow makes automatic document assembly possible.
Why PDE?
The main users of PDE are companies that re-publish existing documents that exist as scanned or regular PDF files. There are plenty of tools to convert text (OCR, re-typing), but handling images is a more challenging task. Typically, PDE customers face one or more of the following challenges:
- Missed images
- Bad size and quality of images
- Unorganized, hard-to-track file naming
- Missing captions, or captions disassociated from their figures
The main consumers of the output from PDE are XML-based publishing systems such as Siemens Teamcenter. Because each output image is accompanied by an XML file that contains the metadata, document hierarchy and caption information, the images are easy to integrate into the newly created XML-based publication or manual.
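As an illustration of how a downstream publishing system might consume such a companion XML file, here is a minimal Python sketch. The element and attribute names (`image`, `metadata`, `field`, `hierarchy`, `level`, `caption`), the file name, and the sample values are all hypothetical, chosen only for this example; they are not the actual PDE output schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical companion XML for one extracted image.
# The structure and names are assumptions for illustration,
# not the real PDE schema.
SIDECAR_XML = """\
<image file="manual_p012_fig03.png">
  <metadata>
    <field name="DocumentNumber">TM-1001</field>
    <field name="Revision">B</field>
  </metadata>
  <hierarchy>
    <level>Chapter 2</level>
    <level>Section 2.3</level>
  </hierarchy>
  <caption>Figure 3. Pump assembly, exploded view</caption>
</image>
"""


def read_sidecar(xml_text):
    """Parse the (assumed) companion XML into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return {
        # Image file the XML describes
        "file": root.get("file"),
        # Metadata (index) values keyed by field name
        "metadata": {f.get("name"): f.text for f in root.iterfind("metadata/field")},
        # Position of the image in the document hierarchy, outermost first
        "hierarchy": [lvl.text for lvl in root.iterfind("hierarchy/level")],
        # Caption text associated with the image
        "caption": root.findtext("caption"),
    }


record = read_sidecar(SIDECAR_XML)
print(record["file"], "->", record["caption"])
```

Keeping the image, its position in the document and its caption in one record is what lets an assembly step place each figure automatically, whatever the real schema looks like.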
Features
PDE has some unique features that make it ideal for those who need to re-create technical manuals or books.
Suitable for operations at scale.
Every time the user opens a file, a job will be created for that particular extraction task.
PDF input; output as PDF or as image files (BMP, PNG, JPG, TIFF).
Enables multiple users to open existing jobs and lock them for the duration of editing.
Images, illustrations or drawings that are present in the PDF file can be detected automatically and/or marked up using the mark-up tool.
The image captions can also be extracted using user-friendly interface elements.
Imaging operations can be performed on all of the extracted images, including clean-up, resizing, rotation, margin adjustment and image editing.
Metadata sets can be defined, saved and loaded for a particular job, or created as defaults. Users can enter metadata (index) values associated with the job or with a particular image, illustration or drawing. Default and dynamic values can be set up, and a screen scraping (OCR) tool is also provided.
All marked-up image areas will be extracted from the PDF file. The extracted images can be saved as PDF files or image files in configurable subdirectories.