Pythonise this algorithm?

news

Don't you hate *.ps/*.pdf texts that are arranged in columns
as if they were a newspaper? Especially when you want to email
a section after running 'pdftotext'.

I'm guessing that an algorithm to extract columns could work
like this [assume 2 columns, but 3, 4, ... should be similar; remember
that the RHS column of page N continues in the LHS column of page N+1]:

Initialise;
Repeat (* NextBlok or exit DO *)
  BeginBloks:-
    Mark the TopLeftCorner -> get(StartRow,StartColm);
    Mark the BotmRightCorner -> get(EndRow,EndColm);
  Extract the Blok's text :-
    For Row = StartRow to EndRow;
      For Colm = StartColm to EndColm
        PutCharToBufr;
      DoLineTerminator;
Until ExitBloks.

Obviously the nesting is: Bloks > Rows > Colms.
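
For what it's worth, here is a rough Python sketch of that Bloks > Rows > Colms nesting. It assumes the page text keeps its column layout (for example from 'pdftotext -layout') and that the block corners are already known. The names extract_block and extract_columns, and the (start_row, start_col, end_row, end_col) tuples, are made up for illustration. Finding the corners automatically is not attempted.

```python
def extract_block(lines, start_row, start_col, end_row, end_col):
    """Return the text of one rectangular block; both corners are inclusive."""
    out = []
    for row in range(start_row, end_row + 1):
        # Rows past the end of the page are treated as empty lines.
        line = lines[row] if row < len(lines) else ""
        # A slice replaces the inner per-character loop over columns.
        out.append(line[start_col:end_col + 1].rstrip())
    return "\n".join(out)


def extract_columns(pages, blocks):
    """Read every block of every page in order.

    pages  -- list of pages, each page a list of text lines
    blocks -- hypothetical list of (start_row, start_col, end_row, end_col)
              tuples in reading order (left column first), reused per page
    """
    chunks = []
    for lines in pages:
        for block in blocks:
            chunks.append(extract_block(lines, *block))
    return "\n".join(chunks)


if __name__ == "__main__":
    page = ["aaa  bbb",
            "ccc  ddd"]
    print(extract_columns([page], [(0, 0, 1, 2), (0, 5, 1, 7)]))
```

Because the blocks are visited left column first on each page, the right column of page N is followed directly by the left column of page N+1.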

Could the same idea then be adapted to clean up the ">>>>" quoting in
newsgroup threads, where lines get too long once each reply adds another ">"?
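
One possible way to do the quote clean-up, sketched with the standard textwrap module rather than the block extraction above: strip the ">" prefix, join consecutive lines at the same quote depth, and rewrap them with the prefix put back. The names split_quote and rewrap_quoted and the width of 72 are assumptions for illustration.

```python
import re
import textwrap
from itertools import groupby

# Leading run of ">" markers, each optionally followed by one whitespace char.
QUOTE_PREFIX = re.compile(r"^(?:>\s?)*")


def split_quote(line):
    """Return (quote depth, text without the quote prefix)."""
    prefix = QUOTE_PREFIX.match(line).group(0)
    return prefix.count(">"), line[len(prefix):].strip()


def rewrap_quoted(text, width=72):
    """Rewrap quoted text so every line fits in width, prefix included."""
    parsed = [split_quote(line) for line in text.splitlines()]
    out = []
    # Group consecutive lines that share a depth; blank lines stay separate
    # so that paragraph breaks inside a quote survive.
    for (depth, blank), group in groupby(parsed, key=lambda p: (p[0], p[1] == "")):
        prefix = ">" * depth + (" " if depth else "")
        if blank:
            out.extend(prefix.rstrip() for _ in group)
        else:
            joined = " ".join(body for _, body in group)
            out.extend(textwrap.wrap(joined, width=width,
                                     initial_indent=prefix,
                                     subsequent_indent=prefix))
    return "\n".join(out)
```

Note that this joins all consecutive lines of the same depth into one paragraph, so quoted code or tables would be mangled. It is only meant for ordinary prose.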

Thanks for any input,

== Chris Glur.