PageTech – How to convert PCL to PDF and Extract Text
Posted on October 19, 2012
The new PageTech PCL Magic Printer Driver does more than convert PCL to PDF: it also lets you extract all of the text found in the PCL file before it’s converted.
Just as the name suggests, the new PageTech PCL Magic Printer Driver captures the text in the PCL print stream before it’s scrambled and replaces it with Unicode and UTF-8 text. This gives developers the ability to capture text in any language and extract it later downstream for document splitting, auto-indexing, address block extraction, data migration, or conversion to fully text-searchable PDFs.
The following steps will show you how to use the new PageTech PCLMagic Driver to convert PCL to PDF and extract all of the text before it’s converted.
- When installing the PCL Tool SDK Live Evaluation, or if you purchased Option I, III, IV, V or VI, remember to let the Setup program install both of our printer drivers. After the install, open a text document in any Windows application and print it to the “PageTech PCL2PDF Driver”.
The default output directory of the PCL2PDF driver depends on the version of Windows you are running:
Program Data Folders:
32-bit XP – .\Documents and Settings\All Users\Application Data\PageTech\<product_VVv>\out
Vista/Win 7/Win 2008 – .\ProgramData\PageTech\<product_VVv>\out
Vista64/Win7 64/Win 2008 64 – .\ProgramData\PageTech\<product64_VVv>\out
- Use Windows Explorer to open the following output files created by the “PageTech PCL2PDF Driver” in the appropriate Program Data “.\out” folder:
filename.pdf – The text searchable PDF created by the PCLMagic Driver
filename.txt – The Unicode/UTF-8/ASCII text dump of the PCL (9 reporting types available)
filename.idx – The extracted metadata inserted into the PCL
- A sample of the .idx data:
USERNAME = BP
DOCNAME = runcode.pgt
DRIVER = PCLMagic Driver
MACHINE = \\BP2
NOTIFY = BP
PRINTER = PageTech PCL2PDF Driver
PROCESSOR = WinPrint
TIMESTAMP = 20120823150024.252
JOBID = 62
TIMESUBMITTED = 20120823150023.909
DATATYPE = NT EMF 1.008
DRIVERVERSION = 3
COLOR = 1
DUPLEX = 1
COPIES = 1
QUALITY = 300
FONTOPTION = 3
PAPERSIZE = 1
PORTNAME = PMON1:
F1 = 111 (You can create your own field names and use our PCLXForm script language to decide what to do with the data the user enters at print time.
For example, “F1” could be “EMAILADDRESS”, and the email address entered could be extracted by a custom script to create an external “mailto:<emailaddress>” file that tells you where to send the PDF that was created.)
Other optional fields:
F2 = 222
F3 = 333
F4 = 444
F5 = 555
F6 = 666
F7 = 777
F8 = 888
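Because the .idx sample above is a plain list of NAME = value lines, it can also be read by a short script after the driver has written it. The Python sketch below is one possible way to do that outside of PCLXForm; it is not PageTech code. It assumes the file holds only lines of that shape, that TIMESTAMP uses a YYYYMMDDhhmmss.mmm layout (inferred from the sample, not documented here), and that a user-defined field named EMAILADDRESS holds an address. The function names parse_idx, job_outputs, parse_timestamp and mailto_target are hypothetical.

```python
import re
from datetime import datetime
from pathlib import Path
from urllib.parse import quote

# Matches "NAME = value"; NAME is a single token such as USERNAME or F1.
_FIELD = re.compile(r"^\s*([A-Za-z0-9_]+)\s*=\s*(.*?)\s*$")


def parse_idx(text):
    """Return the NAME = value lines of an .idx dump as a dict of strings."""
    fields = {}
    for line in text.splitlines():
        match = _FIELD.match(line)
        if match:  # lines without a NAME = value shape are skipped
            fields[match.group(1)] = match.group(2)
    return fields


def job_outputs(out_dir, stem):
    """Paths of the three files the article lists for one print job."""
    out = Path(out_dir)
    return {ext: out / f"{stem}.{ext}" for ext in ("pdf", "txt", "idx")}


def parse_timestamp(value):
    """Assumed layout YYYYMMDDhhmmss.mmm, inferred from the sample only."""
    return datetime.strptime(value, "%Y%m%d%H%M%S.%f")


def mailto_target(fields, name="EMAILADDRESS"):
    """Build a mailto: target from a user-defined field, or None if absent."""
    address = fields.get(name)
    return "mailto:" + quote(address, safe="@") if address else None


if __name__ == "__main__":
    # Inline demo data; a real run would read the .idx file from the out folder.
    demo = "USERNAME = BP\nJOBID = 62\nEMAILADDRESS = user@example.com\n"
    fields = parse_idx(demo)
    print(fields["JOBID"], mailto_target(fields))
```

In practice the text would come from the .idx file itself, for example job_outputs(out_dir, "filename")["idx"].read_text(encoding="utf-8"), where out_dir is the Program Data “.\out” folder for your Windows version. The UTF-8 encoding is an assumption and may need adjusting for your installation.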
Please bear in mind that this is just a small sample of the PCL Tool SDK’s capabilities. Most of our clients are retrofitting our tools into an existing legacy application workflow that cannot be changed, so our SDK provides the programming flexibility to integrate our tools into any workflow.
A fully functional evaluation copy of PCL Tool SDK (32-bit or 64-bit) including the PCL Magic Text Driver is available for download from the PageTech PCLTools website.
PageTech