Need help to find byte offsets for regexps in a file

Robert Dodier

Hello,

I am hoping to find byte offsets of regular expressions in a file.

I'm working on the built-in doc system for Maxima, an open-source
computer algebra system. The doc text is a Texinfo output file. I want
to find the strings " -- Function: FOO (x, y, z) ...." and print their
byte offsets, and the number of bytes from one such string to the end
of the corresponding documentation item (which might be the next
" -- Function: " item or a different regex).

Here is some pseudocode to illustrate what I am attempting --

let re1 = " --Function: <some name>"
let re2 = FOO (not sure what to put here yet)
slurp file into string S (this is OK, texinfo limits file to 300 k)
byte_offset_1 = 0
while search for re1 beginning from byte_offset_1 succeeds
    extract <some name> from re1 match
    search for re2 beginning from byte_offset_1
    let byte_offset_2 = byte offset of re2 match
    print <some name>, byte_offset_1, byte_offset_2
    let byte_offset_1 = byte_offset_2


I'm planning to slurp the resulting output into another program
that will then carry out matching on the list of <some name> strings
and use file seek to grab the corresponding texts. That program
will be written in another programming language so let's not worry
about that now.

If anyone has some advice about making a workable Perl
program from this pseudocode, I'll be very grateful.
Thanks in advance & all the best.

Robert Dodier
 
Xicheng Jia

You can use *closures* and a subroutine; see a similar problem
in this group:

http://groups.google.com/group/comp...1ff2f39de4d?q=&rnum=14&hl=en#2c0f61ff2f39de4d

The detailed solution will differ, but the approach is quite similar.
As I understand it, the thing you want to change is to count the
characters, rather than the newlines, before the function-definition
point. So replace the tr/\n// count with a character count, such as
length() of the text before the match (an empty tr/// counts nothing).
Also change $pattern and the s/// expression to suit your problem.

You might also try the 'c' and 'g' modifiers of m// together with the
'\G' anchor; they let each search resume where the previous one stopped.
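
To illustrate the 'g', 'c' and '\G' suggestion, here is a minimal sketch, not a finished solution. The sample string is made up, and treating "the next ' -- Function: ' line" as the end of an item is only an assumption, since re2 was left open in the question.

```perl
use strict;
use warnings;

# Made-up sample standing in for the slurped Texinfo output.
my $s = " -- Function: foo (x)\nfoo text\n -- Function: bar (y)\nbar text\n";

# A scalar-context m//g match records where it stopped in pos($s).
$s =~ /^ -- Function: (\S+)/mg or die "no item found";
my ($name, $start) = ($1, $-[0]);

# \G anchors at pos($s): consume the rest of the header line from there.
# With /c, a failed match leaves pos($s) alone instead of resetting it.
$s =~ /\G[^\n]*\n/gc;
my $body_start = pos($s);

# Find the start of the next item (assumed end marker) without consuming it;
# if there is none, the item runs to the end of the string.
my $end = $s =~ /^(?= -- Function: )/mgc ? $-[0] : length $s;

print "$name: header at $start, body at $body_start, ends at $end\n";
```

Because the last pattern is a lookahead, pos($s) is left at the start of the next header, so the same steps could be repeated in a loop.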

Good luck,
Xicheng
 
Tad McClellan

Robert Dodier said:
I am hoping to find byte offsets of regular expressions in a file.


perldoc -f pos
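
A small sketch of what pos reports, using a made-up sample string: after a successful m//g match in scalar context, pos($s) is the offset just past the match, and $-[0] is the offset where the match began. These are byte offsets only if the string holds raw bytes (for example, a file read through the :raw layer). The name pattern \w+ is an assumption about what a function name looks like.

```perl
use strict;
use warnings;

# Made-up sample standing in for the slurped Texinfo output.
my $s = "intro\n -- Function: foo (x)\nbody\n -- Function: bar (y)\n";

my @hits;
while ($s =~ /^ -- Function: (\w+)/mg) {
    # $-[0] is the start offset of the match, pos($s) the offset just past it.
    push @hits, [$1, $-[0], pos($s)];
}

printf "%s: match starts at %d, ends at %d\n", @$_ for @hits;
```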

Here is some pseudocode to illustrate what I am attempting --

let re1 = " --Function: <some name>"


Why _pseudo_ when making it Real Perl is so darn easy?


my $re1 = " --Function: <some name>";
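
Carrying that through, one way the whole pseudocode might look in Perl is sketched below. Several things here are assumptions rather than anything stated in the thread: the header pattern (a line starting with " -- Function: " followed by a name with no whitespace), the rule that an item ends where the next header starts or at the end of the text (re2 was left open), and the helper name index_items, which is purely illustrative. The file is read with the :raw layer so that string offsets are byte offsets.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one [name, start_offset, length] triple per item.
# Assumes $text holds raw bytes, so string offsets equal byte offsets.
sub index_items {
    my ($text) = @_;
    my $re1 = qr/^ -- Function: (\S+)/m;    # assumed header pattern
    my (@items, $prev);
    while ($text =~ /$re1/g) {
        my $start = $-[0];
        # The previous item ends where this header starts (assumed re2).
        push @$prev, $start - $prev->[1] if $prev;
        $prev = [$1, $start];
        push @items, $prev;
    }
    # The last item runs to the end of the text.
    push @$prev, length($text) - $prev->[1] if $prev;
    return @items;
}

# Usage: slurp the file named on the command line and print the index.
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $text = do { local $/; <$fh> };
    print join("\t", @$_), "\n" for index_items($text);
}
```

Each output line is name, start offset and length separated by tabs, which another program could use with seek and read to pull out the item text.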
 
