
Document Processing Source Code

This is the PHP console program service that Anthony and I programmed at work. We finally got the owner of the company to sign off on a release form, making the code available under the GPL.

Please note that this code will probably not work for you as a drop-in system; it will require tweaking and a lot of setup on your part. Please do not ask me for assistance.

Use this code at your own risk. You will need knowledge of Linux, PHP, and shell scripting, plus some general programming experience.

Folder structure for this program is as follows:
/tmp – Used for temporary storage of files for FDF merge into interactive PDF.
/usr/local/bin – Storage for all scripts this program uses. (dps.php, forge_fdf.php, file2pdf, pollerctl, dps_poller, hud-process.php)
/usr/local/dpsdocs – Root storage for the following folders:
dpsdocs/merged – If save mode is on, saves raw ODT and PDF files in named folders.
dpsdocs/failed – Saves documents that failed to merge for some reason.
dpsdocs/monitor – If user chooses fake printer, will save the PDF here, instead of really printing it. Good for NFS/SMB share.
dpsdocs/templates – Live templates that the system pulls.
dpsdocs/test_templates – Test templates. You can share this folder over SMB/NFS so users can add templates, then sync them to live later.
dpsdocs/xml_request – Where all the XML files get dumped from whichever system you have generating them. Good to NFS/SMB share.
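
Since setup is on you, here is a minimal sketch of creating the dpsdocs subfolders listed above. The function name `dps_create_layout` and the 0775 permissions are my own assumptions, not part of the released code.

```php
<?php
// Hypothetical helper: create the dpsdocs subfolders under a given root.
// Folder names come from the list above; the mode 0775 is an assumption.
function dps_create_layout(string $root): array
{
    $subfolders = ['merged', 'failed', 'monitor', 'templates', 'test_templates', 'xml_request'];
    $created = [];
    foreach ($subfolders as $name) {
        $path = rtrim($root, '/') . '/' . $name;
        // mkdir(..., true) also creates the root itself if it is missing.
        if (!is_dir($path) && !mkdir($path, 0775, true)) {
            throw new RuntimeException("Could not create $path");
        }
        $created[] = $path;
    }
    return $created;
}
```

On a real server you would call it as `dps_create_layout('/usr/local/dpsdocs')`, most likely as root.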

Files used and purpose:
dps.php – Main PHP program; does the merging and OpenOffice.org (OOo) operations. Takes the XML file as its first parameter, plus some others. Open the file and see for yourself.
dps_poller – The program that loops continuously looking for newly dumped XML files. XML files that name specific printers are SCP’ed over the network to other servers running the program, so they build and print locally at that location.
file2pdf – This shell script gets called by dps.php and finishes the job, runs pdftk operations, and reads the sorting.txt file to decide where to print.
hud-process.php – Example PHP file that takes in FDF data and merges with specific PDF files. Called from dps.php.
pollerctl – Service script that makes sure dps_poller is always running and has not died for some reason.
tbs_class.php – TinyButStrong PHP class file, called from tbsooo_class.php.
tbsooo_class.php – TinyButStrong OpenOffice.org class file, called from dps.php.
zint – Console program that takes in data and creates a barcode. dps.php calls this to create a PDF417 barcode.
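
To give an idea of what the polling described for dps_poller looks like, here is a rough sketch of one pass over the xml_request folder. This is not the actual dps_poller: the function names, the `.done` rename, and the sleep interval are all assumptions made for illustration.

```php
<?php
// Hypothetical sketch of one polling pass; the real dps_poller may work differently.
function dps_poll_once(string $requestDir, callable $handler): int
{
    // glob() can return false on error, so fall back to an empty list.
    $files = glob(rtrim($requestDir, '/') . '/*.xml') ?: [];
    sort($files); // process in a stable, name-based order
    $handled = 0;
    foreach ($files as $file) {
        $handler($file);
        // Assumption: rename the file so the next pass does not pick it up again.
        rename($file, $file . '.done');
        $handled++;
    }
    return $handled;
}

// Assumed outer loop: poll forever, sleeping between passes.
function dps_poll_forever(string $requestDir, callable $handler, int $sleepSeconds = 5): void
{
    while (true) {
        dps_poll_once($requestDir, $handler);
        sleep($sleepSeconds);
    }
}
```

A handler here might shell out to dps.php with the XML path as its first argument (wrapped in `escapeshellarg`); the remaining dps.php parameters are not described in this post, so they are left out.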

Document Processing PHP System

You want more?!

A while back (a while being a few years now), Anthony and I wrote a document processing system in PHP for the company. It takes in XML data and merges this with OpenOffice documents as templates to create final dynamic merged PDF documents for printing, email, and storage on disk. It has been running ever since, processing a few hundred documents a week.
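
As a rough illustration of the "takes in XML data" step, here is a sketch that turns a flat XML request into a field array ready to be merged into a template. The XML layout (one child element per field) and the function name `dps_xml_to_fields` are assumptions; the real request format is not shown in this post.

```php
<?php
// Hypothetical sketch: flatten a simple XML request into name => value pairs.
// Assumes one child element per merge field; the real schema may differ.
function dps_xml_to_fields(string $xml): array
{
    libxml_use_internal_errors(true); // collect parse errors instead of emitting warnings
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        throw new InvalidArgumentException('Could not parse XML request');
    }
    $fields = [];
    foreach ($doc->children() as $child) {
        $fields[$child->getName()] = trim((string) $child);
    }
    return $fields;
}
```

For example, `<request><template>letter.odt</template><name>Jane Doe</name></request>` would come back as an array with the keys `template` and `name`.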

This system actually replaced another one that did close to the same thing, but required Microsoft Word, licenses for Office, and multiple servers and Windows installations and .NET programming. It was horrible, slow, and unstable, which led us to create the new system.

A few days ago I was asked to add another template to the server, which happens a few times a month. But this time the document to be added was already a PDF file, and a very complex one, with multiple form fields and such. I tried to recreate / convert it in OpenOffice, but it always came out ugly. So instead of working around this, I went ahead and added some more code to the program so that it takes in the XML data as usual and, for this one document (or more in the future), merges FDF data into the PDF.

Searching around for where to start, I found a good site: http://www.mactech.com/articles/mactech/Vol.20/20.11/FillOnlinePDFFormsUsingHTML/index.html which gave some very useful information. What I really needed was the PHP function he offers here: http://www.pdfhacks.com/forge_fdf/. Take a look, it’s kinda neat.

It is in production now, and working very well!
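
For the curious, here is a simplified sketch of the FDF merge idea. It is a stand-in, not forge_fdf and not the code running in production: `build_fdf`, `fdf_escape`, and `pdftk_fill_command` are hypothetical names, and it only handles plain text fields. The one real piece it relies on is pdftk's `fill_form` operation.

```php
<?php
// Escape the characters that are special inside a PDF literal string.
function fdf_escape(string $s): string
{
    return strtr($s, ['\\' => '\\\\', '(' => '\\(', ')' => '\\)']);
}

// Hypothetical minimal FDF builder: field name => value, text fields only.
function build_fdf(array $fields): string
{
    $out = "%FDF-1.2\n1 0 obj\n<< /FDF << /Fields [\n";
    foreach ($fields as $name => $value) {
        $out .= '<< /T (' . fdf_escape((string) $name) . ') /V ('
              . fdf_escape((string) $value) . ") >>\n";
    }
    $out .= "] >> >>\nendobj\ntrailer\n<< /Root 1 0 R >>\n%%EOF\n";
    return $out;
}

// Build (but do not run) the pdftk command that merges an FDF file into a PDF form.
function pdftk_fill_command(string $pdf, string $fdf, string $out): string
{
    return 'pdftk ' . escapeshellarg($pdf)
         . ' fill_form ' . escapeshellarg($fdf)
         . ' output ' . escapeshellarg($out);
}
```

In practice you would write the `build_fdf()` result to a file under /tmp, then pass the command string to `exec()`. Appending `flatten` to the pdftk command makes the filled form non-editable, if that is what you want.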