Split, variable delimiter

 
 
Heath
 
      02-21-2006
Hello,

I'm using perl v5.8.7.

When $_ is set to something like "\t\t\t4.34\t4.65\t3.25\t9.54\n",
I do the following:

my @line = split ' ';
print "$#line, $line[0]\n";

And get what I expect:

3, 4.34

But when I do this:

my $delim = ' ';
my @line = split $delim;
print "$#line, $line[0]\n";
if ($line[0] eq $_) {print "Equal\n"; }

I get this:

0, 4.34 4.65
3.25 9.54

Equal

Which is not exactly what I'm after. So, why are these two
snippets behaving differently? What can I do to make them
equivalent?

I'm probably missing something obvious. If so, please be nice.

 
usenet@DavidFilmer.com
      02-21-2006
Heath wrote:
> my $delim = ' ';
> my @line = split $delim;


You still need delims around your delim. Try something like /$delim/

--
http://DavidFilmer.com

 
it_says_BALLS_on_your forehead
      02-21-2006

Heath wrote:
> Hello,
>
> I'm using perl v5.8.7.
>
> When $_ is set to something like "\t\t\t4.34\t4.65\t3.25\t9.54\n",
> I do the following:
>
> my @line = split ' ';
> print "$#line, $line[0]\n";
>
> And get what I expect:
>
> 3, 4.34
>
> But when I do this:
>
> my $delim = ' ';
> my @line = split $delim;
> print "$#line, $line[0]\n";
> if ($line[0] eq $_) {print "Equal\n"; }
>
> I get this:
>
> 0, 4.34 4.65
> 3.25 9.54
>
> Equal
>
> Which is not exactly what I'm after. So, why are these two
> snippets behaving differently? What can I do to make them
> equivalent?


first, look at: perldoc -f split | more
or, browser-friendly:
http://www.physiol.ox.ac.uk/Computin...ons/split.html

see the part about the special case of split ' '

i would say the best thing to do would be not to use split ' ' at all.
==============
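use strict; use warnings;

$_ = "\t\t\t4.34\t4.65\t3.25\t9.54\n";

# strip leading whitespace so the split doesn't produce a null first field
$_ =~ s/^\s+//;

# \s+ so that a run of whitespace counts as one separator
my @line = split /\s+/;
print "$#line, $line[0]\n";

# same pattern, held in a variable
my $delim = '\s+';
my @line2 = split /$delim/;
print "$#line2, $line2[0]\n";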

 
Heath
      02-21-2006
I've tried using $delim, /$delim/, "$delim", '$delim', and even
"\'$delim\'". All do the same as when using $delim.

 
Heath
      02-21-2006

it_says_BALLS_on_your forehead wrote:
> first, look at: perldoc -f split | more
> or, browser-friendly:
> http://www.physiol.ox.ac.uk/Computin...ons/split.html
>
> see the part about the special case of split ' '


Yes, I read through that before I ever posted. The behavior I'm
after is that of [split ' ']. I don't get that behavior when I pass
the space char to split via a variable. I would simply just like to
know why that is and how I can get that behavior by passing a variable,
if it is possible at all.

> i would say the best thing to do would be not to use split ' ' at all.
> ==============
> use strict; use warnings;
>
> $_ = "\t\t\t4.34\t4.65\t3.25\t9.54\n";
>
> $_ =~ s/^\s+//;
>
> my @line = split /\s+/;
> print "$#line, $line[0]\n";
>
> my $delim = '\s+';
> my @line2 = split /$delim/;
> print "$#line2, $line2[0]\n";


This works great, but it accomplishes the exact same thing as:

==============
use strict; use warnings;

$_ = "\t\t\t4.34\t4.65\t3.25\t9.54\n";

my @line = split;
print "$#line, $line[0]\n";
==============

All I need is a value to assign to $delim such that a [split $delim]
will give me the same behavior as a [split].

 
Uri Guttman
      02-21-2006
>>>>> "H" == Heath <(E-Mail Removed)> writes:

H> Yes, I read through that before I ever posted. The behavior I'm
H> after is that of [split ' ']. I don't get that behavior when I pass
H> the space char to split via a variable. I would simply just like to
H> know why that is and how I can get that behavior by passing a variable,
H> if it is possible at all.

rtfm some more:

As a special case, specifying a PATTERN of space
(' ') will split on white space just as "split" with
no arguments does. Thus, "split(' ')" can be used
to emulate awk's default behavior, ...

note that PATTERN is the actual literal passed to split so it can't be a
variable. otherwise how could it tell / / from ' ' from $foo = ' '? this
is a very odd way to get that special behavior as it is inband and very
special cased. if you must have that vs other splits on demand, use a
sub to handle your cases and do 2 different splits based on $foo eq ' '.
or use code refs for each split type. many ways to handle it.
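
The code-ref idea could be sketched roughly like this. It is only an illustration: the %splitter table and its keys (awk, tab, comma) are made-up names, not anything from the thread or from perl itself. Each entry holds its own literal pattern, so the ' ' special case stays visible to the parser.

```perl
use strict;
use warnings;

# Hypothetical dispatch table: the keys are invented for this sketch.
# Every sub carries a literal pattern, so split ' ' keeps its awk-like behavior.
my %splitter = (
    awk   => sub { split ' ',  $_[0] },   # skip leading whitespace, split on runs
    tab   => sub { split /\t/, $_[0] },   # every tab is a separator
    comma => sub { split /,/,  $_[0] },
);

my $line = "\t\t\t4.34\t4.65\t3.25\t9.54\n";
my $mode = 'awk';                         # chosen at run time
my @fields = $splitter{$mode}->($line);
print "$#fields, $fields[0]\n";           # prints "3, 4.34"
```

Switching $mode to 'tab' on the same line would bring back the three leading empty fields, since a plain /\t/ gets no special treatment.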

H> All I need is a value to assign to $delim such that a [split $delim]
H> will give me the same behavior as a [split].

can't be done. so choose another solution.

uri

--
Uri Guttman ------ (E-Mail Removed) -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
 
robic0
      02-21-2006
On 21 Feb 2006 10:11:56 -0800, "Heath" <(E-Mail Removed)> wrote:

>Hello,
>
> I'm using perl v5.8.7.
>
> When $_ is set to something like "\t\t\t4.34\t4.65\t3.25\t9.54\n",
> I do the following:
>
> my @line = split ' ';
> print "$#line, $line[0]\n";
>
> And get what I expect:
>
> 3, 4.34
>
> But when I do this:
>
> my $delim = ' ';
> my @line = split $delim;
> print "$#line, $line[0]\n";
> if ($line[0] eq $_) {print "Equal\n"; }
>
> I get this:
>
> 0, 4.34 4.65
> 3.25 9.54
>
> Equal
>
> Which is not exactly what I'm after. So, why are these two
> snippets behaving differently? What can I do to make them
> equivalent?
>
> I'm probably missing something obvious. If so, please be nice.



It is a bug (I mean a feature) of split. According to the docs
the Perl parser seems to look for the single quoted space ' ' and that
differentiates it from a space " " as a pattern. So obviously the single
quote is as significant as the space. However, the 3 character string is
parsed as a first level parse. So you can't even assign $delim = "' '" or
$delim = "" (null string) ... which is a special case as well as the ' '.

It would appear to be a very useful case since every array element ends up
holding a non-whitespace field. If you want to run dynamic patterns, the ' ' case
will have to be an exception, tested for and hardcoded.

ie: if ($delim ne ' ') {split $delim;} else {split ' '}
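
Wrapped in a sub, that test might look like the sketch below. The name split_on is hypothetical, and the sketch assumes any delimiter other than a single space is meant as a regex pattern. As far as I know, perl 5.18 and later give a variable holding a single space the same treatment as the literal, so the explicit check matters mainly on older perls such as the 5.8.7 in this thread.

```perl
use strict;
use warnings;

# split_on() is a made-up helper name, not part of perl.
# A single space selects the literal split ' ' form; anything else is
# interpolated as a regex pattern.
sub split_on {
    my ($delim, $text) = @_;
    $text = $_ unless defined $text;          # default to $_ like split does
    return split ' ', $text if $delim eq ' ';
    return split /$delim/, $text;
}

$_ = "\t\t\t4.34\t4.65\t3.25\t9.54\n";
for my $delim (' ', '\t') {
    my @line = split_on($delim);
    print scalar(@line), " fields\n";         # 4 fields, then 7 fields
}
```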

Quote:
As a special case, specifying a PATTERN of space (' ') will split on white space
just as split with no arguments does. Thus, split(' ') can be used to emulate awk's
default behavior, whereas split(/ /) will give you as many null initial fields as
there are leading spaces. A split on /\s+/ is like a split(' ') except that any
leading whitespace produces a null first field. A split with no arguments really
does a split(' ', $_) internally.
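
A small sketch of the three forms from that quote, using only literal patterns. The sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = "  a b";                     # two leading spaces

my @awk    = split ' ',   $text;        # ('a', 'b')
my @single = split / /,   $text;        # ('', '', 'a', 'b')
my @runs   = split /\s+/, $text;        # ('', 'a', 'b')

printf "%-14s %d fields\n", "split ' '",   scalar @awk;     # 2 fields
printf "%-14s %d fields\n", "split / /",   scalar @single;  # 4 fields
printf "%-14s %d fields\n", "split /\\s+/", scalar @runs;    # 3 fields
```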

Some code:

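use strict;
use warnings;


$_ = "\t\t\t4.34\t4.65\t3.25\t9.54\n";

# regex for quote-space-quote: no match in this input, so one field comes back
my @line = split /' '/;
print "$#line", @line, "\n";
if ($line[0] eq $_) {print "Equal\n"; }

# a single space interpolated into a regex: no match on tabs either
my $dlim = ' ';
@line = split /$dlim/;
print "$#line", @line, "\n";
if ($line[0] eq $_) {print "Equal\n"; }

print "-------------\n";

# literal ' ' is the special case: leading whitespace skipped, 4 fields
@line = split ' ';
print "$#line", @line, "\n";
if ($line[0] eq $_) {print "Equal\n"; }

# /\s+/ also splits on whitespace, but leaves a null first field
$dlim = '\s+';
@line = split /$dlim/;
print "$#line", @line, "\n";
if ($line[0] eq $_) {print "Equal\n"; }

print "---------\nbut\n";
# back to a single space inside a regex: one field again
$dlim = ' ';
@line = split /$dlim/;
print "$#line", @line, "\n";
if ($line[0] eq $_) {print "Equal\n"; }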

__END__
Output:
0 4.34 4.65 3.25 9.54

Equal
0 4.34 4.65 3.25 9.54

Equal
-------------
34.344.653.259.54
44.344.653.259.54
---------
but
0 4.34 4.65 3.25 9.54

Equal
 
robic0
      02-21-2006
On Tue, 21 Feb 2006 15:58:06 -0500, Uri Guttman <(E-Mail Removed)> wrote:

>>>>>> "H" == Heath <(E-Mail Removed)> writes:

>
> H> Yes, I read through that before I ever posted. The behavior I'm
> H> after is that of [split ' ']. I don't get that behavior when I pass
> H> the space char to split via a variable. I would simply just like to
> H> know why that is and how I can get that behavior by passing a variable,
> H> if it is possible at all.
>
>rtfm some more:
>
> As a special case, specifying a PATTERN of space
> (' ') will split on white space just as "split" with
> no arguments does. Thus, "split(' ')" can be used
> to emulate awk's default behavior, ...
>
>note that PATTERN is the actual literal passed to split so it can't be a
>variable. otherwise how could it tell / / from ' ' from $foo = ' '? this


I don't know, I would consider this a bug, aka, left out check. Within split, if
the $foo name is passed as a literal name, the contents have to be obtained.
So if $foo = "' '", it should be fairly obvious what the meaning is.

But I don't think some intrinsics work that way. I think as far as the Pattern
in split, the parser looks for a split ' ' or split pattern and internally changes
the call to a different function with different parameters, than any other form of split.
There may be several internal split functions.
Since it has to be parsed anyway, it's easier to redirect different "forms" to predefined
functions that handle specific ones. Thereby speeding up the processor.

>is a very odd way to get that special behavior as it is inband and very
>special cased. if you must have that vs other splits on demand, use a
>sub to handle your cases and do 2 different splits based on $foo eq ' '.
>or use code refs for each split type. many ways to handle it.
>
> H> All I need is a value to assign to $delim such that a [split $delim]
> H> will give me the same behavior as a [split].
>
>can't be done. so choose another solution.
>
>uri


 
Uri Guttman
      02-21-2006
>>>>> "r" == robic0 <robic0> writes:


r> I don't know, I would consider this a bug, aka, left out
r> check. Within split, if the $foo name is passed as a literal name,
r> the contents have to be obtained. So if $foo = "' '", it should be
r> fairly obvious what the meaning is.

i consider you a genomic bug.

r> But I don't think some intrinsics work that way. I think as far as
r> the Pattern in split, the parser looks for a split ' ' or split
r> pattern and internally changes the call to a different function
r> with different parameters, than any other form of split. There may
r> be several internal split functions. Since it has to be parsed
r> anyway, it's easier to redirect different "forms" to predefined
r> functions that handle specific ones. Thereby speeding up the
r> processor.

speeding up the processor? what kind of crack are you smoking? this
whole discussion has nothing to do with the speed of split. the various
special behaviors of split do not need separate implementations.

just another useless reply to a troll,

uri

--
Uri Guttman ------ (E-Mail Removed) -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
 
robic0
      02-22-2006
On Tue, 21 Feb 2006 18:52:51 -0500, Uri Guttman <(E-Mail Removed)> wrote:

>>>>>> "r" == robic0 <robic0> writes:

>
>
> r> I don't know, I would consider this a bug, aka, left out
> r> check. Within split, if the $foo name is passed as a literal name,
> r> the contents have to be obtained. So if $foo = "' '", it should be
> r> fairly obvious what the meaning is.
>
>i consider you a genomic bug.
>
> r> But I don't think some intrinsics work that way. I think as far as
> r> the Pattern in split, the parser looks for a split ' ' or split
> r> pattern and internally changes the call to a different function
> r> with different parameters, than any other form of split. There may
> r> be several internal split functions. Since it has to be parsed
> r> anyway, it's easier to redirect different "forms" to predefined
> r> functions that handle specific ones. Thereby speeding up the
> r> processor.
>
>speeding up the processor? what kind of crack are you smoking? this
>whole discussion has nothing to do with the speed of split. the various
>special behaviors of split do not need separate implementations.
>

I'm speechless.. You just discounted all compiled and semi-compiled (fixup)
languages. You must think Perl core is written in Perl.

The "Processor" is commonly known as the "engine", the core. Perl follows
that and has multiple core implementations of intrinsics, it does a modified
compile at loadtime and further compiles at runtime. Try compiling
C/C++ code with standard library calls, then look at the assembly.

>just another useless reply to a troll,
>
>uri


 