Velocity Reviews - Computer Hardware Reviews

How to extract part of the text (htm) file after start word until end word?

 
 
Guest
05-12-2006
How do I extract the part of a text (HTML) file that lies after a start word and before an end word?

The start word is <! start >
The end word is <! end>
E.g. some.html:
-----------------------------------
not interested part of file...
not interested part of file... <! start >Interested
part
123456789<! end> not interested part..
------------------------------------


"Interested
part
123456789"

Thanks


 
Guest
05-12-2006
(E-Mail Removed) wrote:
: How to extract part of the text (htm) file after start word until end word?

: start word is <! start >
: end word is <! end>
: Eg: some. html
: -----------------------------------
: not interested part of file...
: not interested part of file... <! start >Interested
: part
: 123456789<! end> not interested part..
: ------------------------------------

Slurp your whole file (search perldoc perlvar for $/) by saying

local $/;
local $_ = <FH>;
if ( /<! start>(.*)<! end>/ ) {
    $text = $1;
}
print $text;

Build a loop around this construct if you have more than one start..end
segment per file.
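
For reference, a bare "local $/;" leaves $/ undefined, which perlvar describes as slurp mode: one read returns the rest of the file. Paragraph mode is the separate setting $/ = "". The sketch below contrasts the two. The in-memory filehandle and the sample string are assumptions made so that it runs without a file on disk.

```perl
use strict;
use warnings;

# Sample data held in a string so the sketch needs no file on disk.
my $data = "first line\nsecond line\n\nnext paragraph\n";

# Slurp mode: $/ is undef, so a single read returns everything.
my $whole = do {
    open my $fh, '<', \$data or die "cannot open in-memory file: $!";
    local $/;
    <$fh>;
};

# Paragraph mode: $/ is the empty string, so records end at blank lines.
my @paragraphs = do {
    open my $fh, '<', \$data or die "cannot open in-memory file: $!";
    local $/ = '';
    <$fh>;
};

printf "slurp mode read %d characters in one record\n", length $whole;
printf "paragraph mode read %d records\n", scalar @paragraphs;
```

A start..end segment that spans a blank line would be split across records in paragraph mode, so slurp mode is probably the safer choice for this task.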

Oliver.

--
Dr. Oliver Corff e-mail: (E-Mail Removed)-berlin.de
 
Tad McClellan
Guest
05-12-2006
<(E-Mail Removed)-berlin.de> wrote:
> (E-Mail Removed) wrote:
>: How to extract part of the text (htm) file after start word until end word?
>
>: start word is <! start >
>: end word is <! end>
>: Eg: some. html
>: -----------------------------------
>: not interested part of file...
>: not interested part of file... <! start >Interested
>: part
>: 123456789<! end> not interested part..
>: ------------------------------------
>
> Slurp your whole file (search perldoc perlvar for $/) by saying
>
> local $/;
> local $_ = <FH>;
> if ( /<! start>(.*)<! end>/ ) {



if ( /<! start>(.*)<! end>/s ) { # interesting part contains newlines


> $text=$1;
> }
> print $text;
>
> Build a loop around this construct if you have more than one start..end
> segment per file.



If there is more than one, then you'd better make that:

if ( /<! start>(.*?)<! end>/s ) { # non-greedy
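
Putting the /s and non-greedy fixes together, a minimal sketch of the whole extraction might look like the following. The helper name extract_segments is hypothetical, and the \s* in the pattern is an assumption added here so that it also matches the "<! start >" spelling (with a space before the ">") from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every segment found between the markers.
# Call it in list context to get all captures.
sub extract_segments {
    my ($html) = @_;
    # \s* tolerates "<! start >" as well as "<! start>".
    # /s lets . match newlines; .*? stops at the nearest end marker;
    # /g in list context returns the captured text of every match.
    return $html =~ /<!\s*start\s*>(.*?)<!\s*end\s*>/gs;
}

# Sample input copied from the original question.
my $sample = <<'END_HTML';
not interested part of file...
not interested part of file... <! start >Interested
part
123456789<! end> not interested part..
END_HTML

print "[$_]\n" for extract_segments($sample);
```

With /g in list context the match itself does the looping, so no explicit loop is needed when a file holds several start..end segments.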


--
Tad McClellan SGML consulting
(E-Mail Removed) Perl programming
Fort Worth, Texas
 
Tad McClellan
Guest
05-12-2006
Aukjan van Belkum <(E-Mail Removed)> wrote:

> if ( m/<\! start \!>/ .. m/<\! end \!>/){



There is no upside to gratuitous backslashing.

Exclamation marks are not special in regular expressions, so there
is no need to backslash them.

(and your patterns do not match the strings the OP posted.)
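
As an illustration only, here is a sketch of the range-operator approach with unescaped patterns that do match the markers as the OP posted them. The in-memory filehandle and the extra trailing line in the sample are assumptions added for the demonstration.

```perl
use strict;
use warnings;

# Sample from the original question, plus one extra trailing line
# (added here) to show that the range switches off again.
my $sample = <<'END_HTML';
not interested part of file...
not interested part of file... <! start >Interested
part
123456789<! end> not interested part..
trailing line
END_HTML

open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my @kept;
while ( my $line = <$fh> ) {
    # An unescaped "!" matches a literal exclamation mark.
    # In scalar context ".." is the flip-flop: true from the line that
    # matches the left pattern through the line that matches the right.
    push @kept, $line if $line =~ /<! start >/ .. $line =~ /<! end>/;
}
close $fh;

print @kept;
```

Note that the range operator selects whole lines, so the unwanted text before the start marker and after the end marker is still in @kept and would need trimming afterwards.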


--
Tad McClellan SGML consulting
(E-Mail Removed) Perl programming
Fort Worth, Texas
 
Guest
05-12-2006
Tad McClellan <(E-Mail Removed)> wrote:

: if ( /<! start>(.*)<! end>/s ) { # interesting part contains newlines

Thanks for the correction, I felt I was missing something.

: If there is more than one, then you'd better make that:

: if ( /<! start>(.*?)<! end>/s ) { # non-greedy

And thank you for that, too.

Oliver.
--
Dr. Oliver Corff e-mail: (E-Mail Removed)-berlin.de
 