On Oct 21, 1:46 pm, s...@netherlands.com wrote:
> On Mon, 20 Oct 2008 20:59:35 -0700 (PDT), bingfeng <bfz...@gmail.com> wrote:
> >On Oct 21, 1:27 am, s...@netherlands.com wrote:
> >> On Mon, 20 Oct 2008 02:42:03 -0700 (PDT), bingfeng <bfz...@gmail.com> wrote:
> >> >Hello,
> >> >Assume I have the following string:
> >> >my $cmds = <<DOC
> >> >  __begin {
> >> >     abc;
> >> >     def;
> >> >     {foo;bar}
> >> >  } __end;
> >> >  __begin {
> >> >     cde;
> >> >  } __end;
> >> >  abc;
> >> >  bad;
> >> >DOC
> >> >;
>
> >> >I want to split it into an array: the first item is "__begin {
> >> >     abc;
> >> >     def;
> >> >     {foo;bar}
> >> >  } __end", the second item is "__begin {
> >> >     cde;
> >> >  } __end", the third is "abc", and the fourth is "bad".
>
> >> >split obviously cannot be used here, so I use the following regex:
> >> >my @lines = ($cmds =~ /__begin.*?__end|[^;]+/sg);
>
> >>                                          ^^
> >> my @lines = ($cmds =~ /__begin.*?__end|[^\s;]+/sg);
>
> >> You were on the right track. However, [^;]+ also matches whitespace, so at
> >> the start of the string it is the first alternative that can match. It grabs
> >> '  __begin { ... abc', then the next piece, then the next, and
> >> '__begin.*?__end' is never matched. By also excluding whitespace, [^\s;],
> >> in the character class, begin and end have a chance.
>
> >You are right. Thanks for your explanation. My sample was
> >oversimplified: a standalone statement may contain other words and
> >spaces, as in the following test message:
> >my $cmds = <<DOC
> >  __begin {
> >     abc sss;
> >     def;
> >     {foo;bar}
> >  } __end;
> >  __begin {
> >     cde;
> >  } __end;
> >  abc kkk;
> >  bad fde;
> >DOC
> >;
>
> >Your solution gives the following Dumper result:
> >$VAR1 = '__begin {
> >     abc sss;
> >     def;
> >     {foo;bar}
> >  } __end';
> >$VAR2 = '__begin {
> >     cde;
> >  } __end';
> >$VAR3 = 'abc';
> >$VAR4 = 'kkk';
> >$VAR5 = 'bad';
> >$VAR6 = 'fde';
>
> That's too bad. You made a good attempt, and I gave
> you credit by saying you almost had it right the first time.
> The regex was only altered slightly from what you yourself tried.
>
> I didn't write a regex for you, because if I did, you could
> always come back and say, for example:
>
>   #>You are right. Thanks for your explanation. My sample was
>   #>oversimplified: a standalone statement may contain other words and
>   #>spaces, as in the following test message ...
>
> But you didn't say that in the first place.
>
> >that's not what I want. Apart from John's solution, I have no other
> >solution.
>
>     ^^^^^^^^^^^^^
> Think again ... you just invalidated his regex.
>
> my @lines = $str =~ /^\s*__begin(?s:.*?)__end;$|^\s*\S+;$/mg;
>
> $lines[0] =
> "  __begin {
>      abc sss;
>      def;
>      {foo;bar}
>   } __end;"
>
> $lines[1] =
> "  __begin {
>      cde;
>   } __end;"
>
> What are you going to do now?
> We're still in the extremely simple stage.
> In fact, the more you add, the simpler it gets.
>
> sln
>
> -------------------------------
>
> Version 2
>
> #################
> # Misc Parse 2
> #################
>
> use strict;
> use warnings;
>
> # the old
> my $cmd1 = <<DOC1
>   __begin {
>      abc;
>      def;
>      {foo;bar}
>   } __end;
>   __begin {
>      cde;
>   } __end;
>   abc;
>   bad;
> DOC1
> ;
>
> # the new
> my $cmds2 = <<DOC2
>   __begin {
>      abc sss;
>      def;
>      {foo;bar}
>   } __end;
>   __begin {
>      cde;
>   } __end;
>   abc kkk;
>   bad fde;
> DOC2
> ;
>
> my $str = $cmds2;
>
> my @lines = ($str =~ /\s*(__begin.*?__end|.*?);/sg);
>
> for (my $i = 0; $i < @lines; $i++) {
>     print "\n\$lines[$i] = \n\n\"$lines[$i]\"\n";
>
> }
>
> __END__
>
> output:
>
> $lines[0] =
>
> "__begin {
>      abc sss;
>      def;
>      {foo;bar}
>   } __end"
>
> $lines[1] =
>
> "__begin {
>      cde;
>   } __end"
>
> $lines[2] =
>
> "abc kkk"
>
> $lines[3] =
>
> "bad fde"
Wow, I have to admit your regex is simpler, easier to understand, and
elegant. I'll study what you said carefully. Anyway, thank you very
much.
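
For anyone reading this thread later, here is a minimal sketch that puts the Version 2 regex next to the first attempt so the difference can be checked directly. The sub names split_commands and split_naive and the variable $input are made up for illustration; the two regexes are the ones quoted above, and the here-doc is modelled on the second test message.

```perl
use strict;
use warnings;

# Sample input modelled on the second test message in the thread.
my $input = <<'DOC';
  __begin {
    abc sss;
    def;
    {foo;bar}
  } __end;
  __begin {
    cde;
  } __end;
  abc kkk;
  bad fde;
DOC

# Hypothetical helper name. Applies the "Version 2" regex: skip leading
# whitespace, then capture either a whole __begin ... __end block or the
# shortest run of characters up to the next ';'. In list context, a /g match
# with one capture group returns the list of captured strings.
sub split_commands {
    my ($text) = @_;
    return $text =~ /\s*(__begin.*?__end|.*?);/sg;
}

# Hypothetical helper name. The first attempt, for comparison: the text
# starts with spaces, so at offset 0 the __begin branch cannot match and
# [^;]+ wins, cutting the text at every ';'.
sub split_naive {
    my ($text) = @_;
    return $text =~ /__begin.*?__end|[^;]+/sg;
}

my @good = split_commands($input);
my @bad  = split_naive($input);

printf "version 2: %d items, first attempt: %d items\n",
    scalar @good, scalar @bad;
print "[$_]\n" for @good;
```

Because .*? stops at the first __end it finds, this sketch should not be expected to handle nested __begin blocks; that case would need a different approach.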