split a multiple lines text

 
 
bingfeng
 
      10-20-2008
Hello,
Assume I have the following string:
my $cmds = <<DOC
__begin {
abc;
def;
{foo;bar}
} __end;
__begin {
cde;
} __end;
abc;
bad;
DOC
;

I want to split it into an array, the first item is "__begin {
abc;
def;
{foo;bar}
} __end", the second item is "__begin {
cde;
} __end", and the third is "abc" and the fourth is "bad".

split obviously cannot be used here, so I tried the following regex:
my @lines = ($cmds =~ /__begin.*?__end|[^;]+/sg);

but it does not work at all. How can I do this?

regards,
bingfeng
 
 
 
 
 
John W. Krahn
 
      10-20-2008

my @lines = $cmds =~ /^\s*__begin(?s:.*?)__end;$|^\s*\S+;$/mg;
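The same pattern spelled out with the /x flag may be easier to follow. This is only an annotated sketch: the indented sample string is made up for illustration, and the name $item is not from the post. On this sample each returned item keeps its leading whitespace and its trailing ';'.

```perl
use strict;
use warnings;

# Made-up sample input (an assumption, modelled on the string in the question).
my $cmds = "  __begin {\n    abc;\n    {foo;bar}\n  } __end;\n  abc;\n  bad;\n";

my $item = qr{
    ^ \s* __begin (?s: .*? ) __end ; $    # a whole block; (?s:...) lets . cross newlines
  |
    ^ \s* \S+ ; $                         # or one whitespace-free word ending in ';'
}mx;

my @lines = $cmds =~ /$item/g;

# Each item keeps its leading whitespace and its trailing ';'.
print "[$_]\n" for @lines;
```

Because the second alternative uses \S+, a statement that contains a space, such as "abc kkk;", would not be matched by it.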



John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall
 
 
 
 
 
Dr.Ruud
 
      10-20-2008


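my @blocks;

# Pass 1 blanks out the __begin ... __end blocks, pass 2 the remaining
# single statements. Each match is overwritten with spaces of the same
# length, so the offsets recorded below stay valid.
for my $re ( qr/__begin.*?__end/s, qr/^[^;]+/m ) {
    while ($cmds =~ s/\s*($re)\s*;/" " x ($+[0] - $-[0])/es) {
        # $-[1] is where the captured text starts; $-[0] would include the
        # leading \s*, which also swallows the text blanked out earlier.
        push @blocks, [ $-[1], $1 ];
    }
}

# Print the pieces in their original order.
for (sort { $a->[0] <=> $b->[0] } @blocks) {
    print "<--\n", $_->[1], "\n-->\n";
}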


In the end, $cmds will still have the same length, but will contain only
whitespace (and any unmatched content).

--
Affijn, Ruud

"Gewoon is een tijger."

 
 
Jürgen Exner
 
      10-20-2008


This sounds suspiciously like an X-Y problem to me. Are you reading this
text from a file? If so, then whenever a block starts with '__begin {' you
can keep reading until the '__end;' token is reached.
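
If the text does come from a file, the block-wise reading described above could look roughly like this. It is only a sketch: read_statements is a made-up name, and it assumes that '__begin' opens a line, that '__end;' closes one, and that every other statement sits on a line of its own.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: collect statements from a filehandle.
# Assumes '__begin' opens a line, '__end;' closes one, and every other
# statement sits on its own line.
sub read_statements {
    my ($fh) = @_;
    my @items;
    my $block;    # undef while we are outside a __begin ... __end block

    while ( my $line = <$fh> ) {
        if ( !defined $block && $line =~ /^\s*__begin\b/ ) {
            $line =~ s/^\s+//;    # drop the indentation in front of __begin
            $block = '';
        }
        if ( defined $block ) {
            $block .= $line;
            # The block is complete once it ends in '__end;'.
            if ( $block =~ s/__end\s*;\s*\z/__end/ ) {
                push @items, $block;
                undef $block;
            }
            next;
        }
        # A plain statement: strip indentation and the trailing ';'.
        $line =~ s/^\s+//;
        $line =~ s/\s*;?\s*$//;
        push @items, $line if length $line;
    }
    return @items;
}

# Usage with an in-memory filehandle instead of a real file.
my $text = "  __begin {\n    cde;\n  } __end;\n  abc kkk;\n  bad fde;\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
print "<$_>\n" for read_statements($fh);
close $fh;
```

Reading line by line keeps memory use flat for large files, at the price of the layout assumptions listed above.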

And yes, you can use split() if you do it in two steps: first split on
'__end;'. In the second step, restore the now-missing '__end;' on those
items that have a leading '__begin {', and split the others again at ';'.
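
The two steps could be sketched as follows. split_commands is a made-up name, and the code assumes every '__begin' has a matching '__end;' and that blocks are not nested. It restores '__end' without the ';', as the question asks, and it also copes with plain statements in front of a block, which goes slightly beyond the description above.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: the two-step split() idea.
# Assumes every '__begin' has a matching '__end;' and blocks are not nested.
sub split_commands {
    my ($text) = @_;
    my @items;

    # Step 1: cut the text at every '__end;'.
    for my $piece ( split /__end;/, $text ) {
        my $start = index $piece, '__begin';
        my $plain = $start < 0 ? $piece : substr( $piece, 0, $start );

        # Step 2a: whatever is not part of a block is split again at ';'.
        for my $stmt ( split /;/, $plain ) {
            $stmt =~ s/^\s+|\s+$//g;
            push @items, $stmt if length $stmt;
        }

        # Step 2b: a block gets back the '__end' that step 1 removed.
        push @items, substr( $piece, $start ) . '__end' if $start >= 0;
    }
    return @items;
}

my $cmds = "__begin {\n  abc sss;\n  {foo;bar}\n} __end;\nabc kkk;\nbad fde;\n";
print "<$_>\n" for split_commands($cmds);
```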

jue
 
 
bingfeng
 
      10-20-2008

Thank you, John, it works very well. You helped me save some hours!
 
 
sln@netherlands.com
 
      10-20-2008
On Mon, 20 Oct 2008 02:42:03 -0700 (PDT), bingfeng <> wrote:

>split obviously cannot be used here, so I use following regex:
>my @lines = ($cmds =~ /__begin.*?__end|[^;]+/sg);

                                         ^^
my @lines = ($cmds =~ /__begin.*?__end|[^\s;]+/sg);

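You were on the right track. The trouble is that [^;]+ also matches
whitespace, so it starts matching in the whitespace in front of '__begin'
and takes everything before the next ';'. That means it grabs
' __begin { .. abc', then the next chunk, then the next, and
'__begin.*?__end' is never matched. With whitespace excluded from the
character class, [^\s;], a match can only start at a non-space character,
so begin and end have a chance.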
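
The difference can be checked on a small input. This is only an illustrative sketch: the shortened, indented sample string and the names @original and @fixed are assumptions, not part of the post.

```perl
use strict;
use warnings;

# Made-up, shortened sample with indentation (an assumption for illustration).
my $cmds = "  __begin {\n    abc;\n  } __end;\n  bad;\n";

# Original attempt: [^;]+ may start in the whitespace before __begin.
my @original = $cmds =~ /__begin.*?__end|[^;]+/sg;

# With \s excluded, a match can only start at a non-space character,
# so the first alternative is tried right at '__begin'.
my @fixed = $cmds =~ /__begin.*?__end|[^\s;]+/sg;

print scalar(@original), " pieces from the original pattern\n";
print scalar(@fixed),    " pieces from the fixed pattern\n";
```

On this sample the original pattern cuts the block apart at every ';', while the fixed one returns the block and "bad".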

sln

--------------------

use strict;
use warnings;

my $cmds = <<DOC
__begin {
abc;
def;
{foo;bar}
} __end;
__begin {
cde;
} __end;
abc;
bad;
DOC
;

my @lines = ($cmds =~ /__begin.*?__end|[^\s;]+/sg);

for (my $i = 0; $i < @lines; $i++) {
    print "\n\$lines[$i] = \n\n\"$lines[$i]\"\n";
}

__END__

output:

$lines[0] =

"__begin {
abc;
def;
{foo;bar}
} __end"

$lines[1] =

"__begin {
cde;
} __end"

$lines[2] =

"abc"

$lines[3] =

"bad"


 
 
bingfeng
 
      10-21-2008

You are right. Thanks for your explanation. My sample was
oversimplified: a standalone statement may contain several words
separated by spaces, as in the following test input:
my $cmds = <<DOC
__begin {
abc sss;
def;
{foo;bar}
} __end;
__begin {
cde;
} __end;
abc kkk;
bad fde;
DOC
;

Your solution gives the following Dumper result:
$VAR1 = '__begin {
abc sss;
def;
{foo;bar}
} __end';
$VAR2 = '__begin {
cde;
} __end';
$VAR3 = 'abc';
$VAR4 = 'kkk';
$VAR5 = 'bad';
$VAR6 = 'fde';

That's not what I want. Apart from John's solution, I have no other
solution. Thank you.


 
 
sln@netherlands.com
 
      10-21-2008

That's too bad. You made a good attempt and I gave
you credit by saying you almost had it right the first time.
And the regex was altered only slightly from what you yourself tried.

I didn't write a regex for you. Because if I did that, you could
always come back and say for example:

#>You are right. Thanks for your explanation. My sample was
#>oversimplified: a standalone statement may contain several words
#>separated by spaces, as in the following test input ...

But you didn't say that in the first place.

>that's not what I want. Apart from John's solution, I have no other
>solution.
>

^^^^^^^^^^^^^
Think again ... you just invalidated his regex.

my @lines = $str =~ /^\s*__begin(?s:.*?)__end;$|^\s*\S+;$/mg;

$lines[0] =
" __begin {
abc sss;
def;
{foo;bar}
} __end;"

$lines[1] =
" __begin {
cde;
} __end;"

What are you going to do now?
We're still in the extremely simple stage.
In fact, the more you add, the simpler it gets.

sln

-------------------------------

Version 2

#################
# Misc Parse 2
#################

use strict;
use warnings;

# the old
my $cmd1 = <<DOC1
__begin {
abc;
def;
{foo;bar}
} __end;
__begin {
cde;
} __end;
abc;
bad;
DOC1
;

# the new
my $cmds2 = <<DOC2
__begin {
abc sss;
def;
{foo;bar}
} __end;
__begin {
cde;
} __end;
abc kkk;
bad fde;
DOC2
;

my $str = $cmds2;

my @lines = ($str =~ /\s*(__begin.*?__end|.*?);/sg);

for (my $i = 0; $i < @lines; $i++) {
    print "\n\$lines[$i] = \n\n\"$lines[$i]\"\n";
}

__END__

output:

$lines[0] =

"__begin {
abc sss;
def;
{foo;bar}
} __end"

$lines[1] =

"__begin {
cde;
} __end"

$lines[2] =

"abc kkk"

$lines[3] =

"bad fde"

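For readers who want to see why the Version 2 pattern yields these items, here is the same regex written with /x and comments. It is an annotated sketch only: the indented sample string is made up, and the comments describe how the pattern behaves on that sample.

```perl
use strict;
use warnings;

# Made-up sample input (an assumption, modelled on the second test string).
my $str = "  __begin {\n    abc sss;\n    {foo;bar}\n  } __end;\n  abc kkk;\n  bad fde;\n";

my @lines = $str =~ m{
    \s*                      # skip whitespace in front of an item
    (                        # capture the item itself, without the ';'
        __begin .*? __end    #   either a whole block (tried first) ...
      |
        .*?                  #   ... or the shortest run up to the next ';'
    )
    ;                        # the terminating ';' is matched but not captured
}sxg;

print "[$_]\n" for @lines;
```

In list context with /g and a single capture group, the match returns the captured text of every match, which is why neither the leading whitespace nor the ';' shows up in the items.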

 
 
bingfeng
 
      10-21-2008

Wow, I have to admit your regex is simpler, easier to understand, and
more elegant. I'll study what you said carefully. Anyway, thank you very
much.
 