Searching backward in files

Patrick Gundlach

Dear ruby hackers,

I wonder if there is a simple method for searching backward in
files. I want to read a PDF file, so I have to do the following:

1) go to the end of the file (pdffile.seek(0,IO::SEEK_END))
2) search backward for the string "%%EOF" within the last 1024 bytes
3) search backward for "startxref"
4) read the next bytes,.....


Do I have to write my own backward search routine, or is there some
built-in functionality for it?

Best regards,

Patrick
 
Simon Strandgaard

Patrick said:
I wonder if there is a simple method for searching backward in
files. I want to read a PDF file, so I have to do the following:

1) go to the end of the file (pdffile.seek(0,IO::SEEK_END))
2) search backward for the string "%%EOF" within the last 1024 bytes
3) search backward for "startxref"
4) read the next bytes,.....


Do I have to write my own backward search routine, or is there some
built-in functionality for it?

Ruby's native regexp engine cannot go backwards... that's why I
am working on my own regexp engine. However, I don't have a
FileInputStream class yet (but I can easily add one).

In the pattern below, the text is spelled backwards because the
iterator walks the file from the end toward the start.

# pseudo code
iterator = file.create_iterator_end.reverse
re = NewRegexp.new('(?xm) .{0,1024} FOE%% .{0,4000} ferxtrats')
matchdata = re.match(iterator)

The engine is written entirely in Ruby, so speed isn't impressive.
It's fairly compatible with Ruby 1.8's builtin GNU regexp, and it's
carefully tested (more than 2000 tests).
http://raa.ruby-lang.org/list.rhtml?name=regexp
 
Carlos

You can load the last 1024 characters into a string and use String#rindex for
(2). Then use String#rindex again to find (3). If you don't find it, load the
second-to-last 1024 characters, use String#rindex again, etc...
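
A minimal sketch of that idea, for illustration only. The names rindex_in_file and pdf_xref_offset, the `before:` and `chunk_size:` parameters, and the overlap handling are assumptions made for this example and are not from the thread. The overlap is there so that a string split across two chunks is still found. Unlike step (2) in the question, this sketch does not stop after the last 1024 bytes.

```ruby
# Hypothetical helper: byte offset of the last occurrence of `needle`
# that lies entirely before byte `before`, or nil if there is none.
# Expects an IO that supports #size, #seek and #read (File, StringIO).
def rindex_in_file(io, needle, before: io.size, chunk_size: 1024)
  needle  = needle.b                 # compare as raw bytes
  overlap = needle.bytesize - 1      # extra bytes read past each chunk
  stop    = before
  while stop > 0
    start = [stop - chunk_size, 0].max
    io.seek(start, IO::SEEK_SET)
    # Read the chunk plus a small overlap, but never past `before`.
    len   = [stop - start + overlap, before - start].min
    chunk = io.read(len)
    idx   = chunk.rindex(needle)
    return start + idx if idx
    stop = start                     # move one chunk toward the start
  end
  nil
end

# Hypothetical usage for the PDF case: find "%%EOF", then the
# "startxref" before it, then read the number between the two.
def pdf_xref_offset(io)
  eof = rindex_in_file(io, "%%EOF")
  return nil unless eof
  sx = rindex_in_file(io, "startxref", before: eof)
  return nil unless sx
  io.seek(sx + "startxref".bytesize, IO::SEEK_SET)
  digits = io.read(eof - sx - "startxref".bytesize)[/\d+/]
  digits && digits.to_i
end
```

With a file opened in binary mode, e.g. File.open(path, "rb") { |f| pdf_xref_offset(f) }, the whole file is never loaded into memory; only one chunk at a time is.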

Good luck.

 
Ara.T.Howard

if you are on linux you might want to simply

tac = IO.popen "tac #{ path }"

# process file in 'normal' fashion - only backwards...

See: man 1 tac
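
A small sketch of how the tac approach might be wrapped, assuming GNU tac may or may not be installed. The name each_line_reversed and the pure-Ruby fallback are assumptions added for this example, not part of the suggestion above. Note that tac reverses the order of lines, not the bytes inside each line, so a marker such as "%%EOF" still reads forward on its own line.

```ruby
# Hypothetical helper: yield the lines of a file from last to first.
# Must be called with a block.
def each_line_reversed(path, &block)
  # The array form runs tac directly, without a shell, so spaces or
  # shell metacharacters in `path` are passed through untouched.
  IO.popen(["tac", path]) { |tac| tac.each_line(&block) }
rescue Errno::ENOENT
  # tac is not on PATH: fall back to reading the whole file into
  # memory and walking the lines in reverse (fine for small files).
  File.readlines(path).reverse_each(&block)
end
```

The fallback trades memory for portability; for large files on systems without tac, reading chunks from the end of the file is likely the better choice.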

cheers.

-a
 
Patrick Gundlach

Hi,

Carlos said:
You can load the last 1024 characters in a string and use String#rindex for
(2). Then, use String#rindex again to find (3). If you don't find, load the
second-to-last 1024 characters, use String#rindex again, etc...

Yes, this is cool. It works like a charm. Thanks for pointing this out.


Patrick
 
Patrick Gundlach

Hi,

your solution would probably be the cleanest one, but it would require an
additional library. I think I'll go for IO->String, String#rindex.


Thank you for your answer,


Patrick
 
Patrick Gundlach

Ara.T.Howard said:
if you are on linux you might want to simply

tac = IO.popen "tac #{ path }"

# process file in 'normal' fashion - only backwards...

Oh, right, I forgot about tac. But this could be slow on large files
(and non-portable).

Thank you for your answer,

Patrick
 
Simon Strandgaard

Patrick Gundlach said:
your solution would probably be the cleanest one, but it would require an
additional library. I think I'll go for IO->String, String#rindex.

Good luck.

BTW: I have just made an experimental File iterator.
 
