Velocity Reviews - Computer Hardware Reviews

Need help to find byte offsets for regexps in a file

 
 
Robert Dodier
      07-08-2006
Hello,

I am hoping to find byte offsets of regular expressions in a file.

I'm working on the built-in doc system for Maxima, an open-
source computer algebra system. The doc text is a Texinfo
output file. I want to find the strings " -- Function: FOO (x, y, z) ..."
and print their byte offsets, and the number of bytes from one such
string to the end of the corresponding documentation item
(which might be the next " -- Function: " item or a different regex).

Here is some pseudocode to illustrate what I am attempting --

let re1 = " --Function: <some name>"
let re2 = FOO (not sure what to put here yet)
slurp file into string S (this is OK, texinfo limits file to 300 k)
byte_offset_1 = 0
while search for re1 beginning from byte_offset_1 succeeds
    extract <some name> from re1 match
    search for re2 beginning from byte_offset_1
    let byte_offset_2 = byte offset of re2 match
    print <some name>, byte_offset_1, byte_offset_2
    let byte_offset_1 = byte_offset_2


I'm planning to slurp the resulting output into another program
that will then carry out matching on the list of <some name> strings
and use file seek to grab the corresponding texts. That program
will be written in another programming language so let's not worry
about that now.

If anyone has some advice about making a workable Perl
program from this pseudocode, I'll be very grateful.
Thanks in advance & all the best.

Robert Dodier

 
 
 
 
 
Xicheng Jia
      07-08-2006
Robert Dodier wrote:
> Hello,
>
> I am hoping to find byte offsets of regular expressions in a file.
>
> I'm working on the built-in doc system for Maxima, an open-
> source computer algebra system. The doc text is a Texinfo
> output file. I want to find the strings " -- Function: FOO (x, y, z)
> ..."
> and print their byte offsets, and the number of bytes from one such
> string to the end of the corresponding documentation item
> (which might be the next " -- Function: " item or a different regex).
>
> Here is some pseudocode to illustrate what I am attempting --
>
> let re1 = " --Function: <some name>"
> let re2 = FOO (not sure what to put here yet)
> slurp file into string S (this is OK, texinfo limits file to 300 k)
> byte_offset_1 = 0
> while search for re1 beginning from byte_offset_1 succeeds
>     extract <some name> from re1 match
>     search for re2 beginning from byte_offset_1
>     let byte_offset_2 = byte offset of re2 match
>     print <some name>, byte_offset_1, byte_offset_2
>     let byte_offset_1 = byte_offset_2
>
>
> I'm planning to slurp the resulting output into another program
> that will then carry out matching on the list of <some name> strings
> and use file seek to grab the corresponding texts. That program
> will be written in another programming language so let's not worry
> about that now.
>
> If anyone has some advice about making a workable Perl
> program from this pseudocode, I'll be very grateful.
> Thanks in advance & all the best.
>
> Robert Dodier


You can use *closures* and a subroutine. See a similar problem
discussed in this group:

http://groups.google.com/group/comp....0f61ff2f39de4d

The detailed solution will differ, but the approach is quite
similar. As I understand it, what you need to change is to count
characters instead of newlines before the function-definition point,
so replace the tr/\n// count with a character count, for example
length() of the text before the match (an empty tr/// counts nothing).
Also change $pattern and the s/// expression to suit your problem.

You might also try the 'c' and 'g' modifiers of the m// operator and
the '\G' anchor; those may help as well.
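
For what it's worth, here is a rough sketch of how the /g and /c modifiers and the \G anchor could fit the pseudocode above. It is only one way to do it, under some assumptions: the sub name scan_items and the sample text are made up for illustration, the header pattern is a guess at " -- Function: " starting a line, and "end of item" is taken to be the start of the next such header (or end of text), since re2 was left open.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [name, start, end] for each item.
# Offsets are byte offsets only if $text holds raw bytes (e.g. read via '<:raw').
sub scan_items {
    my ($text) = @_;
    my $header = qr/^ -- Function: /m;   # assumed header pattern (re1)
    my @items;
    while ($text =~ /$header(\S+)/g) {
        # $-[0] is the offset where the whole match started.
        my ($name, $start) = ($1, $-[0]);
        # Continue from the current position (\G) up to the next header.
        # /c keeps pos() in place if this match fails, so the outer
        # loop can carry on (and then finish) normally.
        my $end = $text =~ /\G.*?(?=$header)/gcs ? pos($text) : length $text;
        push @items, [$name, $start, $end];
    }
    return @items;
}

my $sample = "intro\n -- Function: foo (x)\n     Doc for foo.\n"
           . " -- Function: bar (y, z)\n     Doc for bar.\n";
printf "%s %d %d\n", @$_ for scan_items($sample);
```

Because the lookahead is zero-width, pos() ends up at the start of the next header, so the outer loop picks that header up on its next pass. If items can end at something other than a function header, the lookahead is the place to add the alternative pattern.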

Good luck,
Xicheng

 
 
 
 
 
Tad McClellan
      07-09-2006
Robert Dodier wrote:

> I am hoping to find byte offsets of regular expressions in a file.



perldoc -f pos
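
A small sketch of what pos() gives you, next to the @- array from perlvar, may help here. It assumes the file is read with the ':raw' layer so that string offsets are byte offsets; the temporary file and its contents are made up for the illustration, and the seek/read at the end only mimics what the consuming program is described as doing.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small sample file (a stand-in for the Texinfo output).
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} "intro\n -- Function: foo (x)\n     Doc for foo.\n";
close $fh or die "close: $!";

# Slurp as raw bytes so that string offsets are byte offsets.
open my $in, '<:raw', $path or die "open $path: $!";
my $text = do { local $/; <$in> };

my ($name, $start, $end);
if ($text =~ /^ -- Function: (\S+)/mg) {
    $name  = $1;
    $start = $-[0];        # offset of the first byte of the match
    $end   = pos($text);   # offset just past the last byte of the match
}

# Check the offsets the way a consumer would: seek, then read.
seek $in, $start, 0 or die "seek: $!";
read $in, my $chunk, $end - $start;
close $in;
print "$name $start $end [$chunk]\n";
```

Note that pos() reports where the match ended, not where it began, so the start offset has to come from $-[0] (or from pos() minus the match length).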


> Here is some pseudocode to illustrate what I am attempting --
>
> let re1 = " --Function: <some name>"



Why _pseudo_ when making it Real Perl is so darn easy?


my $re1 = " --Function: <some name>";


--
Tad McClellan SGML consulting
Perl programming
Fort Worth, Texas
 