Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc

Please help: what is the easiest way to extract text between some variable text?

 
 
Mladen
      02-20-2011

Please help: what is the easiest way to extract text that sits between some variable text?



Original text:

<TH class=name width=100>New name</TH>     need to extract: New name
<TH class=name width=50>Test name </TH>    need to extract: Test name
<TH class=name width=65>Name 2</TH>        need to extract: Name 2



Thanks in advance
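For the three sample lines above, a single non-greedy regex is often enough. The following is a minimal sketch, not taken from any of the replies: `extract_th` is a hypothetical helper name, and it assumes there are no nested TH elements and no `>` inside attribute values (see the replies below for why a real parser is safer).

```perl
use strict;
use warnings;

# Hypothetical helper: return the trimmed text between each <TH ...> and </TH>.
# Assumes no nested TH elements and no '>' inside attribute values.
sub extract_th {
    my ($html) = @_;
    # Non-greedy (.*?) stops at the first </TH>; /i ignores tag case,
    # /s lets the content span lines, /g collects every match.
    my @names = $html =~ m{<TH\b[^>]*>(.*?)</TH>}gis;
    s/^\s+|\s+$//g for @names;    # trim, so 'Test name ' becomes 'Test name'
    return @names;
}

my $html = <<'HTML';
<TH class=name width=100>New name</TH>
<TH class=name width=50>Test name </TH>
<TH class=name width=65>Name 2</TH>
HTML

print "$_\n" for extract_th($html);
```

This prints New name, Test name and Name 2 on separate lines.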


 
Jürgen Exner
      02-21-2011
"Mladen" <(E-Mail Removed)> wrote:
>Please help me how is easiest way to extract text between some variable text
>
>Original text
><TH class=name width=100>New name</TH> need to
>extract: New name
>
><TH class=name width=50>Test name </TH> need to
>extract: Test name
>
><TH class=name width=65>Name 2</TH> need
>to extract: Name 2


You have a well-defined data structure. Treating it and analysing it as
if it were plain text would be foolish. Instead take advantage of the
existing structure and use a parser that can parse this data structure.

jue
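Jürgen's suggestion can be sketched with HTML::TokeParser, a CPAN module from the HTML-Parser distribution. It is not core Perl, so it may need installing, and the thread does not name a specific parser; it is used here only as one possible choice.

```perl
use strict;
use warnings;
use HTML::TokeParser;    # from the HTML-Parser distribution on CPAN

my $html = <<'HTML';
<TH class=name width=100>New name</TH>
<TH class=name width=50>Test name </TH>
<TH class=name width=65>Name 2</TH>
HTML

# Let the parser tokenize the markup instead of matching raw text.
my $p = HTML::TokeParser->new(\$html);
my @names;
while (my $tag = $p->get_tag('th')) {           # next <th> start tag (names are lower-cased)
    push @names, $p->get_trimmed_text('/th');   # text up to </th>, whitespace trimmed
}
print "$_\n" for @names;
```

Each token returned by `get_tag` also carries the attributes, so `$tag->[1]{width}` would give 100, 50 and 65 here if they were needed.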
 
sharma__r@hotmail.com
      02-21-2011
On Feb 20, 11:33 pm, "Mladen" <(E-Mail Removed)> wrote:
> Please help me how is easiest way to extract text between some variable text
>
> Original text
>
> <TH class=name width=100>New name</TH>     need to
> extract: New name
>
> <TH class=name width=50>Test name </TH>    need to
> extract: Test name
>
> <TH class=name width=65>Name 2</TH>        need
> to extract: Name 2
>
> Thanks in advance




#!/usr/local/bin/perl
use strict;
use warnings;
local $\ = qq{\n};
my $np;
$np =
  qr{
      [<]
      (?:
          (?> [^<>]+ )
        |
          (??{ $np })
      )*
      [>]
  }xms
  ;

my $var ='
original text
<TH class=name width=100>New name</TH>
<TH class=name width=50>Test name </TH>
need to
<TH class=name width=65>Name 2</TH>
need
Thanks in advance
';
while ($var =~ m/ $np /xmsg) {
    print $1 if $var =~ m/\G(.*?)<\/TH>/xmscg;
}
__END__
 
ccc31807
      02-21-2011
On Feb 20, 1:33 pm, "Mladen" <(E-Mail Removed)> wrote:
> Please help me how is easiest way to extract text between some variable text
> <TH class=name width=100>New name</TH>    need to extract: New name


A couple of weeks back, hymie! posted a thread entitled 'table -->
pre'. He wanted to extract the content of an HTML table in order to
preformat it. I posted the following script and output.

Perl gives you a number of ways to do what you want, many of them
simple-minded and primitive, others pretty sophisticated. I generally
prefer the former: the more simple-minded and primitive, the better.
You should probably approach a problem like this incrementally: first
match the smallest possible piece of what you want, then add to it
little by little until you get everything you want. You don't need to
use a regular expression; index() and substr() will do the same kind
of thing.

Other tools will do the same kind of thing. I routinely do this in vi
(vim) when I want to move some content from one function to another,
for instance when converting a SQL query into a hash declaration.

CC.

SCRIPT
#! perl
use strict;
use warnings;

my $content = '';
while (<DATA>)
{
    next unless /\w/;
    chomp;
    if ($_ =~ m!<(\/?)table!)
    {
        # <table> becomes <pre>, </table> becomes </pre>
        $content .= "<$1pre>";
        next;
    }
    elsif ($_ =~ m!<\/?tr!)
    {
        # every <tr> and </tr> ends the current output line
        $content .= "\n";
        next;
    }
    elsif ($_ =~ m!<t[dh]>([^<]*)<\/t[dh]>!)
    {
        # pad each cell to a fixed 20-character column
        $content .= sprintf("%-20s", $1);
        next;
    }
    else
    {
        warn "ERROR: $_\n";
    }
}

print $content;

exit(0);

__DATA__
<table>
<tr>
<td>George</td>
<td>Washington</td>
<td>Virginia</td>
<td>1788</td>
</tr>
<tr>
<td>George</td>
<td>Washington</td>
<td>Virginia</td>
<td>1792</td>
</tr>
<tr>
<td>John</td>
<td>Adams</td>
<td>Massachusetts</td>
<td>1796</td>
</tr>
<tr>
<td>Thomas</td>
<td>Jefferson</td>
<td>Virginia</td>
<td>1800</td>
</tr>
<tr>
<td>Thomas</td>
<td>Jefferson</td>
<td>Virginia</td>
<td>1804</td>
</tr>
</table>

OUTPUT
<pre>
George              Washington          Virginia            1788

George              Washington          Virginia            1792

John                Adams               Massachusetts       1796

Thomas              Jefferson           Virginia            1800

Thomas              Jefferson           Virginia            1804
</pre>
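The remark that index() and substr() can do the job without a regular expression can be illustrated with a minimal sketch. `extract_between` is a hypothetical helper; it assumes upper-case tags as in Mladen's sample and no `>` inside attribute values.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the text between each $open start tag and
# the next $close tag using only index() and substr(), no regex.
# Assumes no '>' inside attribute values and case-exact tag names.
sub extract_between {
    my ($text, $open, $close) = @_;
    my @found;
    my $pos = 0;
    while ((my $start = index($text, $open, $pos)) >= 0) {
        my $gt = index($text, '>', $start);       # end of the start tag
        last if $gt < 0;
        my $end = index($text, $close, $gt + 1);  # next closing tag
        last if $end < 0;
        push @found, substr($text, $gt + 1, $end - $gt - 1);
        $pos = $end + length $close;              # continue after </TH>
    }
    return @found;
}

my $text = <<'HTML';
<TH class=name width=100>New name</TH>
<TH class=name width=50>Test name </TH>
<TH class=name width=65>Name 2</TH>
HTML

print "'$_'\n" for extract_between($text, '<TH', '</TH>');
```

Because nothing is trimmed, the second result keeps its trailing space ('Test name '); wrapping the result in a trim step is the next small increment.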
 
sln@netherlands.com
      02-21-2011
On Sun, 20 Feb 2011 19:33:18 +0100, "Mladen" <(E-Mail Removed)> wrote:

>Please help me how is easiest way to extract text between some variable text


Output:
'New name'
'Test name '
'Name 2'

If you wish to run the @Contents elements through a sub-container to extract
more, you must set up a sub that redefines the 'Container Expression' regex
for each sub-container you need. There are variations on the theme of
container expressions, but this should superficially get you started.

-sln
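The idea of redefining the Container Expression for each sub-container can be sketched as a function that builds the regex for any tag name. This is a simplified, hypothetical variant (the names `container_rx` and `contents_of` are made up): it returns the compiled regex instead of setting globals, does not handle comments or CDATA, and assumes no `>` inside attribute values. It needs Perl 5.10 or later for `(?1)` and `++`.

```perl
use strict;
use warnings;

# Hypothetical helper: build a recursive container expression for one tag.
# Group 1 is the whole container (used for recursion), group 2 its contents.
sub container_rx {
    my ($tag) = @_;
    my $open  = qr{<$tag\b[^>]*>}i;
    my $close = qr{</$tag\s*>}i;
    return qr{
        (                                         # group 1: the container
            $open
            (                                     # group 2: the contents
                (?: (?: (?!$open|$close) . )++    # text and other tags
                  | (?1)                          # a nested container
                )*
            )
            $close
        )
    }xs;
}

# Return the contents of every top-level $tag container in $text.
sub contents_of {
    my ($tag, $text) = @_;
    my $rx = container_rx($tag);
    my @found;
    while ($text =~ /$rx/g) {
        push @found, $2;
    }
    return @found;
}

my $html = '<TR><TH class=name width=100>New name</TH>'
         . '<TH class=name width=50>Test name </TH></TR>';

# Outer container first, then run each result through a sub-container.
for my $row (contents_of('TR', $html)) {
    print "'$_'\n" for contents_of('TH', $row);
}
```

Returning the regex rather than assigning to shared `$open`, `$close` and `$rx` variables avoids any state being shared between the outer and inner containers.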

-------------
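e.g.:
my ($open, $close, $rx);
my $comment = qr{ ... };    # see $comment below
my $attrib  = qr{ ... };    # see $attrib below
...
defineContainer( '(?i:TH)' );
...
defineContainer( '(?i:TR)' );
...
sub defineContainer {
    my $tag = shift;
    $open  = qr{ ... };     # as $open below, but <$tag instead of <TH
    $close = qr{ ... };     # as $close below, but </$tag instead of </TH
    $rx    = qr{ ... };     # the Container Expression below, rebuilt
}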
-------------------------

use strict;
use warnings;

# Primitive Definitions
#
my $comment = qr{(?xs)
    <! (?:\[CDATA\[.*?\]\]|--.*?--|\[[A-Z][A-Z\ ]*\[.*?\]\]) >
};

my $attrib = qr{(?x)
    (?:\s+ (?: [^>"'\/]* (?:"[^"]*"|'[^']*'|["']|(?:\/(?!>))?))++)
};

my $open  = qr{(?x) <TH (?: \s*|$attrib ) > };
my $close = qr{(?x) </TH \s*> };

# Container Expression
#
my $rx = qr{(?xs)
    $comment
  | (                   # Recursion group, the 'container'
      $open
      (                 # Container 'contents' to capture
        (?:
            $comment
          | (?:(?!$open|$close|$comment).)++
          | (?1)
        )*
      )
      $close
    )
};

# Parse Code
#
my $tog;
my $text = join '', <DATA>;

# In list context /$rx/g returns ($1, $2) for every match; keep only the
# second element of each pair (the contents) when it is defined.
my @Contents = map { !($tog=!$tog) && defined() ? $_ : () } $text =~ /$rx/g;

for (@Contents) {
    print "'$_'\n";
}


__DATA__
<TH class=name width=100>New name</TH> need to
extract: New name

<TH class=name width=50>Test name </TH> need to
extract: Test name

<TH class=name width=65>Name 2</TH> need
to extract: Name 2

 
Peter Scott
      02-22-2011
On Mon, 21 Feb 2011 07:38:33 -0600, Tad McClellan wrote:
> (E-Mail Removed) <(E-Mail Removed)> wrote:
>> }xms
>    ^^
>    ^^
>    ^^
>> ;
>
> The "m" modifier affects only the "^" and "$" anchors. It is a no-op if
> your pattern does not contain those anchors.
>
> The "s" modifier affects only the "." metacharacter. It is a no-op if
> your pattern does not contain that character.
>
> You should not enable special treatment if you are not going to make use
> of that special treatment.


The poster is following the principles in Damian Conway's "Perl Best
Practices," which state: "Use the /xms flags on every regular expression
you ever write [...] It takes about a week to accustom your fingers to
automatically typing /xms on every [regex]..."

--
Peter Scott
http://www.perlmedic.com/ http://www.perldebugged.com/
http://www.informit.com/store/produc...sbn=0137001274
http://www.oreillyschool.com/courses/perl3/
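Tad's point about /m and /s can be checked directly. A minimal sketch, using a made-up sample string with a newline inside the first TH element, that counts matches of the same pattern with and without each flag:

```perl
use strict;
use warnings;

# Sample with a newline inside the first <TH> element and between elements.
my $text = "<TH>New\nname</TH>\n<TH>Name 2</TH>\n";

# Count matches with the "() = ..." idiom: a list assignment in scalar
# context returns the number of elements on its right-hand side.
my @rows = (
    [ 'dot, no /s'      => scalar( () = $text =~ /<TH>.*?<\/TH>/g  ) ],
    [ 'dot, /s'         => scalar( () = $text =~ /<TH>.*?<\/TH>/gs ) ],
    [ 'caret, no /m'    => scalar( () = $text =~ /^<TH>/g  ) ],
    [ 'caret, /m'       => scalar( () = $text =~ /^<TH>/gm ) ],
    [ 'plain, no flags' => scalar( () = $text =~ /<TH>/g   ) ],
    [ 'plain, /ms'      => scalar( () = $text =~ /<TH>/gms ) ],
);

printf "%-15s %d\n", @$_ for @rows;
```

With this sample, adding /s or /m changes the count from 1 to 2 only for the patterns that contain `.` or `^`; the plain pattern matches twice either way.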
 