Peter Jamieson
I am using the following script to parse a collection of
supplied PDF files. Most of the files parsed as expected, but
with some the script fell over: no output was produced,
as though the file were invisible. The files contain only
alphanumeric text, no images or graphics.
On looking at the file Properties, I noticed that the successfully
parsed files had
PDF Producer: doPDF Ver 6.0 Build 262
PDF Version: 1.4
whilst the failures had
PDF Producer: GPL Ghostscript 8.15
PDF Version: 1.3
Does anyone have a clue as to how I could get these errant files parsed?
Any suggestions appreciated! Cheers, Peter
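#!/usr/bin/perl -w
use strict;
use warnings;
use CAM::PDF;

my $file = 'C:/test.pdf';

# new() returns undef on failure; report the reason instead of failing silently
my $pdf = CAM::PDF->new($file)
    or die "Cannot open $file: $CAM::PDF::errstr\n";

for my $page (1 .. $pdf->numPages()) {
    my $text = $pdf->getPageText($page);
    # skip pages for which no text came back
    next unless defined $text;
    my @lines = split /\n/, $text;
    foreach my $line (@lines) {
        # parse out useful information
    }
}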