PageTech – How to convert PCL to PDF and Extract Text

Posted on October 19, 2012

The new PageTech PCL Magic Printer Driver not only converts PCL to PDF, it also lets you extract all of the text found in the PCL file before it’s converted.

Just as the name suggests, the new PageTech PCL Magic Printer Driver captures the text in the PCL print stream before it’s scrambled and replaces it with Unicode and UTF-8 text. This gives developers the ability to capture text in any language and extract it later downstream for document splitting, auto-indexing, address block extraction, data migration, or conversion to fully text-searchable PDFs.

The following steps will show you how to use the new PageTech PCLMagic Driver to convert PCL to PDF and extract all of the text before it’s converted.

  1. When installing the PCL Tool SDK Live Evaluation, or if you purchased Option I, III, IV, V or VI, remember to let the Setup program install both of our printer drivers. After the install, open a text document in any Windows application and print it to the “PageTech PCL2PDF Driver”.

    The default output directory of the PCL2PDF driver depends on the version of Windows you are running:

    Program Data Folders:

    32-bit XP – .\Documents and Settings\All Users\Application Data\PageTech\<product_VVv>\out

    Vista/Win 7/Win 2008 – .\ProgramData\PageTech\<product_VVv>\out

    Vista64/Win7 64/Win 2008 64 – .\ProgramData\PageTech\<product64_VVv>\out
  2. Use Windows Explorer to open the following output files created by the “PageTech PCL2PDF Driver” in the appropriate Program Data “.\out” folder:

    filename.pdf – The text-searchable PDF created by the PCLMagic Driver

    filename.txt – The Unicode/UTF-8/ASCII text dump of the PCL (9 reporting types available)

    filename.idx – The extracted metadata inserted into the PCL
  3. A sample of the .idx data:

    USERNAME = BP
    DOCNAME = runcode.pgt
    DRIVER = PCLMagic Driver
    MACHINE = \\BP2
    NOTIFY = BP
    PRINTER = PageTech PCL2PDF Driver
    PROCESSOR = WinPrint
    TIMESTAMP = 20120823150024.252
    JOBID = 62
    TIMESUBMITTED = 20120823150023.909
    DATATYPE = NT EMF 1.008
    DRIVERVERSION = 3
    COLOR = 1
    DUPLEX = 1
    COPIES = 1
    QUALITY = 300
    FONTOPTION = 3
    PAPERSIZE = 1
    PORTNAME = PMON1:
    F1 = 111

    (You can create your own field names and use our PCLXForm script language to determine what to do with the data the user enters at print time. For example, “F1” could be “EMAILADDRESS”, and a custom script could extract the email address entered and create an external “mailto:<emailaddress>” file indicating where to send the PDF that was created.)


    Other optional fields:
    F2 = 222
    F3 = 333
    F4 = 444
    F5 = 555
    F6 = 666
    F7 = 777
    F8 = 888
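
To make the .idx sample above more concrete, here is a minimal sketch of reading those fields downstream and building the “mailto:” target described for “F1”. It is written in Python rather than PCLXForm, purely as an illustration. It assumes the .idx file is plain text with one “KEY = value” pair per line, as in the sample. The function names `parse_idx` and `mailto_for` and the address `someone@example.com` are hypothetical and not part of the PCL Tool SDK.

```python
from urllib.parse import quote


def parse_idx(text):
    """Parse 'KEY = value' lines into a dict, skipping anything else."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # no '=' on this line, so it is not a field
        key = key.strip()
        if key:
            fields[key] = value.strip()
    return fields


def mailto_for(fields, field_name="F1"):
    """Build a mailto: URI from a field assumed to hold an email address."""
    address = fields.get(field_name, "")
    if "@" not in address:
        return None  # the field does not look like an email address
    return "mailto:" + quote(address, safe="@")


# A shortened copy of the sample above; the F1 value is a made-up address.
sample = """\
USERNAME = BP
DOCNAME = runcode.pgt
MACHINE = \\\\BP2
JOBID = 62
QUALITY = 300
F1 = someone@example.com
"""

fields = parse_idx(sample)
print(fields["DOCNAME"], fields["JOBID"])
print(mailto_for(fields))
```

All values come back as strings, so a field such as JOBID or QUALITY would need an explicit conversion before being used as a number.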

Please bear in mind that this is just a small sample of PCLTool SDK capabilities. Most of our clients are retrofitting our tools into an existing legacy application workflow that cannot be changed, so our SDK provides the programming flexibility to integrate our tools into any workflow.
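
As one illustration of slotting the driver's output into an existing workflow, a downstream script could collect the .pdf, .txt and .idx files for each print job from the “.\out” folder before handing them on. The sketch below is Python, not part of the SDK. It assumes the three files for a job share the same base name, as the “filename.*” listing in step 2 suggests. The function name `collect_jobs` is hypothetical, and a temporary folder stands in for the real output directory.

```python
import tempfile
from pathlib import Path

EXPECTED_SUFFIXES = (".pdf", ".txt", ".idx")


def collect_jobs(out_dir):
    """Group files in out_dir by base name, keeping only complete sets."""
    jobs = {}
    for path in Path(out_dir).iterdir():
        suffix = path.suffix.lower()
        if path.is_file() and suffix in EXPECTED_SUFFIXES:
            jobs.setdefault(path.stem, {})[suffix] = path
    # Skip jobs that do not yet have all three files.
    return {
        stem: files
        for stem, files in jobs.items()
        if len(files) == len(EXPECTED_SUFFIXES)
    }


# Demonstration against a temporary folder standing in for ".\out".
with tempfile.TemporaryDirectory() as tmp:
    for name in ("report.pdf", "report.txt", "report.idx", "partial.pdf"):
        (Path(tmp) / name).write_text("")
    complete = collect_jobs(tmp)
    print(sorted(complete))
```

Skipping incomplete sets is a deliberate choice here: it keeps a script that runs while a job is still printing from picking up a PDF before its text dump and metadata are available.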

A fully functional evaluation copy of PCL Tool SDK (32-bit or 64-bit) including the PCL Magic Text Driver is available for download from the PageTech PCLTools website.

PageTech
