blivori
I am trying to create a Python script using the pyPdf module. The script
takes the 'Root' folder, merges all the PDFs in it, and writes the merged
PDF to an 'Output' folder, naming it 'Root.pdf' after the folder that
contains the split PDFs. It then does the same for each sub-directory,
naming each merged output after its sub-directory.
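A minimal sketch of the folder-walking half of this, assuming Python 3. The traceback shows Python 2.7's `os.path.walk`, which was removed in Python 3; `os.walk` exists in both. The hypothetical `plan_merges` only works out which PDFs go into which output file and leaves the merging itself to pyPdf. It also assumes the 'Output' folder sits inside 'Root' and skips it so earlier results are not merged again.

```python
import os


def plan_merges(root):
    """Map each output name (e.g. 'Root.pdf') to the sorted PDFs in that folder.

    Hypothetical helper: it only decides *what* to merge; the actual merge
    would be done with pyPdf afterwards.
    """
    plan = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip the assumed 'Output' folder (os.walk is top-down by default,
        # so pruning dirnames in place stops it from descending there).
        dirnames[:] = [d for d in dirnames if d != "Output"]
        pdfs = sorted(f for f in filenames if f.lower().endswith(".pdf"))
        if pdfs:
            # Name the merged file after the folder the PDFs came from.
            name = os.path.basename(os.path.normpath(dirpath)) + ".pdf"
            plan[name] = [os.path.join(dirpath, f) for f in pdfs]
    return plan
```

Two sub-directories with the same name in different branches would map to the same output name here; prefixing the name with the parent folder is one way around that.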
I'm stuck when it comes to processing the sub-directories: I get an
error related to hex values (it seems to be reading a null value, which
is not valid hex).
Please note that this happens only with certain PDF files. None of them
are corrupted, and all of them open fine in any PDF viewer.
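Since only some files fail, one option is to catch the error per file so that a single unreadable PDF does not abort the whole walk, then report the failures at the end. This is a sketch: `read_all` and the `open_pdf` callable are hypothetical names, and with pyPdf `open_pdf` would be something like `lambda p: PdfFileReader(open(p, "rb"))`. Only `ValueError` (the exception in the traceback) is caught; pyPdf can raise other exceptions for other problems.

```python
def read_all(paths, open_pdf):
    """Open every PDF in `paths`, collecting failures instead of crashing.

    `open_pdf` is a hypothetical callable that opens one path, e.g. with
    pyPdf: lambda p: PdfFileReader(open(p, "rb")).
    Returns (readers, failures), where failures is a list of (path, message).
    """
    readers, failures = [], []
    for path in paths:
        try:
            readers.append(open_pdf(path))
        except ValueError as exc:
            # The exception type in the question's traceback; other
            # parser errors would need their own except clauses.
            failures.append((path, str(exc)))
    return readers, failures
```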
This is the error I get:
Traceback (most recent call last):
  File "C:\Documents and Settings\student3\Desktop\Test\pdfMerger… line 76, in <module>
    files_recursively(path)
  File "C:\Documents and Settings\student3\Desktop\Test\pdfMerger… line 74, in files_recursively
    os.path.walk(path, process_file, ())
  File "C:\Python27\lib\ntpath.py", line 263, in walk
    walk(name, func, arg)
  File "C:\Python27\lib\ntpath.py", line 259, in walk
    func(arg, top, names)
  File "C:\Documents and Settings\student3\Desktop\Test\pdfMerger… line 38, in process_file
    pdf = PdfFileReader(file( filename, "rb"))
  File "C:\Python27\lib\site-packages\pyPdf\pdf… line 374, in __init__
    self.read(stream)
  File "C:\Python27\lib\site-packages\pyPdf\pdf… line 775, in read
    newTrailer = readObject(stream, self)
  File "C:\Python27\lib\site-packages\pyPdf\gen… line 67, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "C:\Python27\lib\site-packages\pyPdf\gen… line 531, in readFromStream
    value = readObject(stream, pdf)
  File "C:\Python27\lib\site-packages\pyPdf\gen… line 58, in readObject
    return ArrayObject.readFromStream(stream, pdf)
  File "C:\Python27\lib\site-packages\pyPdf\gen… line 153, in readFromStream
    arr.append(readObject(stream, pdf))
  File "C:\Python27\lib\site-packages\pyPdf\gen… line 69, in readObject
    return readHexStringFromStream(stream)
  File "C:\Python27\lib\site-packages\pyPdf\gen… line 276, in readHexStringFromStream
    txt += chr(int(x, base=16))
ValueError: invalid literal for int() with base 16: '\x00\x00'
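The last frame shows pyPdf converting each pair of characters of a hexadecimal string (`<...>`) with `int(x, base=16)`. It fails because the pair it picked up is two NUL bytes. The frames above it show this happens while reading the trailer dictionary, inside an array of hex strings; in a trailer that is typically the `/ID` entry. The PDF specification counts NUL (0x00) as a white-space character and says white-space inside a hex string shall be ignored. That may explain why viewers open these files without complaint. The sketch below is a hypothetical stand-alone decoder, not pyPdf's `readHexStringFromStream`. It shows the spec's rules: skip white-space (including NUL) and treat a missing final digit as 0.

```python
# PDF white-space characters (PDF 32000-1:2008, Table 1): NUL, HT, LF, FF, CR, SP
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def decode_hex_string(body):
    """Decode the bytes between '<' and '>' of a PDF hexadecimal string.

    Hypothetical helper (not part of pyPdf): white-space, including NUL,
    is skipped, and an odd number of digits is padded with a trailing 0.
    """
    # Iterating over bytes yields ints; keep only non-white-space bytes.
    digits = bytes(b for b in body if b not in PDF_WHITESPACE)
    if len(digits) % 2:
        digits += b"0"
    # Anything left that is not a hex digit still raises ValueError.
    return bytes.fromhex(digits.decode("ascii"))
```

For example, `decode_hex_string(b"48 65\x00\x006c6c6F")` gives `b"Hello"`, where pyPdf's pairwise loop would stop at the NUL bytes.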