Axel Etzold
Dear all,
I have many scanned pages which I'd like to cut to prepare them
for OCR.
There are two things I'd like to do:
1.) Cut off a header of each page containing the page number,
2.) Find the largest horizontal blanks in a page (which are supposed
to separate chapters) like this:
Chapter1's text Chapter1's text
Chapter1's text Chapter1's text
Chapter1's text Chapter1's text
Chapter1's text Chapter1's text
<---- cut here, at this blank
Chapter2's text Chapter2's text
Chapter2's text Chapter2's text
Chapter2's text Chapter2's text
Chapter2's text Chapter2's text
^
|
--- (Then cut vertically)
I have tried converting my pages, which are A4 at 600 dpi, to pixel arrays,
but this is quite slow. Is there a better method, e.g. using to_blob?
Thank you very much,
Axel
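
To make the second task concrete, here is a minimal sketch in plain Ruby of one
possible way to find the largest horizontal blank: count the dark pixels in each
row, then look for the longest run of rows with no ink. The names
row_ink_profile and largest_blank_run, the threshold of 128, and the tiny
synthetic page are all assumptions made for illustration; how the grayscale rows
are obtained from the scan (RMagick or otherwise) is left out.

```ruby
# Count dark pixels in each row of a 2D array of grayscale values
# (0 = black, 255 = white). `dark_below` is a hypothetical threshold.
def row_ink_profile(pixels, dark_below = 128)
  pixels.map { |row| row.count { |v| v < dark_below } }
end

# Return [start_index, length] of the longest run of rows whose ink count
# is at most `max_ink`, or nil if no row qualifies.
def largest_blank_run(profile, max_ink = 0)
  best = nil
  start = nil
  profile.each_with_index do |ink, i|
    if ink <= max_ink
      start ||= i
      length = i - start + 1
      best = [start, length] if best.nil? || length > best[1]
    else
      start = nil
    end
  end
  best
end

# Tiny synthetic "page" (assumed data): 0 = ink, 255 = paper.
page = [
  [0, 255, 0, 0],        # text
  [255, 255, 255, 255],  # blank
  [0, 0, 255, 0],        # text
  [255, 255, 255, 255],  # blank
  [255, 255, 255, 255],  # blank
  [255, 255, 255, 255],  # blank
  [0, 255, 255, 0]       # text
]

profile = row_ink_profile(page)
start, length = largest_blank_run(profile)
cut_row = start + length / 2  # cut in the middle of the widest gap
```

The same two functions could be applied to page.transpose to look for the
widest vertical gap. On real scans, a small non-zero max_ink would probably be
needed to tolerate specks and noise.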