Need help to find byte offsets for regexps in a file

Robert Dodier · Jul 8, 2006

Hello,

I am hoping to find byte offsets of regular expressions in a file.

I'm working on the built-in doc system for Maxima, an open-source
computer algebra system. The doc text is a Texinfo output file.
I want to find the strings " -- Function: FOO (x, y, z) ...."
and print their byte offsets, and the number of bytes from one such
string to the end of the corresponding documentation item
(which might be the next " -- Function: " item or a different regex).

Here is some pseudocode to illustrate what I am attempting --

let re1 = " -- Function: <some name>"
let re2 = FOO (not sure what to put here yet)
slurp file into string S (this is OK, texinfo limits file to 300 k)
byte_offset_1 = 0
while search for re1 beginning from byte_offset_1 succeeds
    let byte_offset_1 = byte offset of re1 match
    extract <some name> from re1 match
    search for re2 beginning from end of re1 match
    let byte_offset_2 = byte offset of re2 match
    print <some name>, byte_offset_1, byte_offset_2
    let byte_offset_1 = byte_offset_2

I'm planning to slurp the resulting output into another program
that will then carry out matching on the list of <some name> strings
and use file seek to grab the corresponding texts. That program
will be written in another programming language so let's not worry
about that now.

If anyone has some advice about making a workable Perl
program from this pseudocode, I'll be very grateful.
Thanks in advance & all the best.

Robert Dodier

Xicheng Jia · Jul 8, 2006

You can use closures and a subroutine; see a similar problem in this
group:

http://groups.google.com/group/comp...1ff2f39de4d?q=&rnum=14&hl=en#2c0f61ff2f39de4d

The detailed solution will be different, but the approach is quite
similar. The thing you want to change, as I understand it, is to
count the characters instead of the newlines before the
function-definition point, so change tr/\n// to something that counts
every character, such as length (an empty tr/// counts nothing). Also
change the $pattern and the s/// expression to suit your problem.

You might also try the 'c' and 'g' modifiers of the m// operator and
the '\G' anchor; they may also be helpful.
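
To illustrate the m//g, /c, and \G suggestion, here is a minimal sketch. The sub name item_spans, the sample text, the use of \S+ for a name, and the choice of "next ' -- Function: ' line or end of text" as the end-of-item pattern are all assumptions for illustration; nothing in this thread settles what re2 should be.

```perl
use strict;
use warnings;

# Hypothetical helper: return [name, start, length] for each
# " -- Function: NAME" item in $text. An item is assumed to end at the
# next " -- Function: " line or at the end of the text.
sub item_spans {
    my ($text) = @_;
    my @spans;
    # m//g in scalar context resumes from pos($text) on every iteration.
    while ($text =~ /^ -- Function: (\S+)/mg) {
        my ($name, $start) = ($1, $-[0]);   # $-[0] is the offset of the match start
        # \G anchors at pos($text), just past the re1 match; /c would keep
        # pos() unchanged if this second match failed.
        $text =~ /\G.*?(?=^ -- Function: |\z)/msgc;
        my $end = pos($text);               # offset where this item ends
        push @spans, [$name, $start, $end - $start];
    }
    return @spans;
}

my $sample = "Intro\n -- Function: foo (x)\n     Doc for foo.\n\n"
           . " -- Function: bar (x, y)\n     Doc for bar.\n";
printf "%s %d %d\n", @$_ for item_spans($sample);
```

Because the second match leaves pos() at the start of the next item, the outer loop picks up from there without any extra bookkeeping.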

Good luck,
Xicheng

Tad McClellan · Jul 9, 2006

Robert Dodier said:
I am hoping to find byte offsets of regular expressions in a file.

perldoc -f pos
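
For reference, pos() returns the offset at which the last m//g match on a string ended, and $-[0] holds the offset at which it started. A minimal sketch follows; the sub name match_offsets and the sample text are hypothetical, and the offsets are byte offsets only under the assumption that the text was read as raw bytes (for example through the ':raw' layer).

```perl
use strict;
use warnings;

# Hypothetical helper: list [start, end] offsets for every match of $re in $text.
sub match_offsets {
    my ($text, $re) = @_;
    my @offsets;
    while ($text =~ /$re/g) {
        # $-[0] is where the match starts; pos() is where it ends.
        push @offsets, [$-[0], pos($text)];
    }
    return @offsets;
}

# Assumption: $text holds undecoded bytes, so these offsets can be
# handed to seek() by another program.
my $text = "ab -- Function: f\ncd -- Function: g\n";
print "$_->[0]..$_->[1]\n" for match_offsets($text, qr/ -- Function: /);
```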

Here is some pseudocode to illustrate what I am attempting --

let re1 = " -- Function: <some name>"

Why _pseudo_ when making it Real Perl is so darn easy?

my $re1 = " -- Function: <some name>";
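
A double-quoted string treats <some name> as literal text, so one possible next step is to write re1 as a compiled pattern with a capture group. This is only a sketch: taking a name to be a run of non-whitespace (\S+) is an assumption, and first_function_name is a hypothetical helper.

```perl
use strict;
use warnings;

# Assumed shape of re1: a name is taken to be a run of non-whitespace.
my $re1 = qr/^ -- Function: (\S+)/m;

# Hypothetical helper: return the first function name in $text, or undef.
sub first_function_name {
    my ($text) = @_;
    return $text =~ $re1 ? $1 : undef;
}

print first_function_name(" -- Function: expand (expr)\n"), "\n";
```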
