Velocity Reviews


Simon Strandgaard 05-31-2004 07:44 AM

[bug] String#split returns extra empty string
 
While extending my own regexp-engine with a split method,
I discovered something odd about Ruby's split.

irb(main):001:0> 'ab1ab'.split(/\D+/)
=> ["", "1"]

It's asymmetric: it has a special case for eliminating
the last empty string, but apparently not the first empty string.

I would have expected the above to be symmetric, and to output:
=> ["1"]

--
Simon Strandgaard



Simon Strandgaard 05-31-2004 08:46 AM

Re: [bug] String#split returns extra empty string
 
Simon Strandgaard wrote:
> While extending my own regexp-engine with a split method,
> I discovered something odd about Ruby's split.
>
> irb(main):001:0> 'ab1ab'.split(/\D+/)
> => ["", "1"]
>
> Its asymmetric, it has a special case for eliminating
> the last empty string.. but apparently not the first empty string.
>
> I would have expected above to be symmetric, and output:
> => ["1"]
>


[10 minutes of experimenting later]
I wasn't aware that Ruby inserts subcaptures this way.

irb(main):001:0> "ab2cd3".split(/(\D+)/, 2)
=> ["", "ab", "2cd3"]

Because of subcapture insertion, it makes sense to keep the
first empty string.

I withdraw this bug-report.
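
A small sketch of the subcapture behaviour described above, using only String#split; the sample string is the one from the post, here without a limit. When the pattern contains a capture group, each matched separator is inserted after the field it terminates, so the leading empty string marks the empty field in front of the first separator.

```ruby
# Without a capture group, only the fields between separators are returned.
plain = "ab2cd3".split(/\D+/)

# With a capture group, each matched separator is inserted into the result.
captured = "ab2cd3".split(/(\D+)/)

p plain          # ["", "2", "3"]
p captured       # ["", "ab", "2", "cd", "3"]

# Fields and separators together rebuild the original string here.
p captured.join  # "ab2cd3"
```

If the leading empty string were dropped, the positions of fields and separators in the result would no longer alternate predictably.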

--
Simon Strandgaard



Robert Klemme 05-31-2004 11:59 AM

Re: [bug] String#split returns extra empty string
 

"Simon Strandgaard" <neoneye@adslhome.dk> wrote in message
news:20040531104155.074a42b0.neoneye@adslhome.dk...
> Simon Strandgaard wrote:
> > While extending my own regexp-engine with a split method,
> > I discovered something odd about Ruby's split.
> >
> > irb(main):001:0> 'ab1ab'.split(/\D+/)
> > => ["", "1"]
> >
> > Its asymmetric, it has a special case for eliminating
> > the last empty string.. but apparently not the first empty string.
> >
> > I would have expected above to be symmetric, and output:
> > => ["1"]
> >

>
> [10 minutes of experimenting later]
> I wasn't aware that Ruby inserts subcaptures this way.
>
> irb(main):001:0> "ab2cd3".split(/(\D+)/, 2)
> => ["", "ab", "2cd3"]
>
> Because of subcapture insertion, it make sense to keep the
> first empty string.
>
> I withdraw this bug-report.


But what about:

>> 'ab'.split(/\D+/)

=> []

You would at least expect one empty string in the result since there is at
least one separator. This strikes me as odd.

robert


Simon Strandgaard 05-31-2004 12:09 PM

Re: [bug] String#split returns extra empty string
 
"Robert Klemme" <bob.news@gmx.net> wrote:
> "Simon Strandgaard" <neoneye@adslhome.dk> wrote in message
> news:20040531104155.074a42b0.neoneye@adslhome.dk...
> > Simon Strandgaard wrote:
> > > While extending my own regexp-engine with a split method,
> > > I discovered something odd about Ruby's split.
> > >
> > > irb(main):001:0> 'ab1ab'.split(/\D+/)
> > > => ["", "1"]
> > >
> > > Its asymmetric, it has a special case for eliminating
> > > the last empty string.. but apparently not the first empty string.
> > >
> > > I would have expected above to be symmetric, and output:
> > > => ["1"]
> > >

> >
> > [10 minutes of experimenting later]
> > I wasn't aware that Ruby inserts subcaptures this way.
> >
> > irb(main):001:0> "ab2cd3".split(/(\D+)/, 2)
> > => ["", "ab", "2cd3"]
> >
> > Because of subcapture insertion, it make sense to keep the
> > first empty string.
> >
> > I withdraw this bug-report.

>
> But what about:
>
> >> 'ab'.split(/\D+/)

> => []
>
> You would at least expect one empty string in the result since there is at
> least one separator. This strikes me as odd.
>


Guy Decoux very recently explained that to me.

When split is given no limit, it wipes trailing empty strings.

In your case you would have expected it to output [""], but
because it's an empty string in the tail, it gets wiped.

def split(pattern, limit=0)
  ...
  # Wipe trailing elements which are empty. Note that 0 is truthy in
  # Ruby, so the test has to compare against 0 explicitly.
  if limit == 0
    result.pop while result.size > 0 and result.last.empty?
  end
  result
end
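
The stripping step above can be sketched as a standalone helper. The name strip_trailing_empties is hypothetical (it is not part of Ruby), and the sketch assumes that a negative limit makes String#split keep every field, which is Ruby's documented behaviour rather than something stated in this thread.

```ruby
# Hypothetical helper (not part of Ruby): drop trailing empty strings,
# mimicking what split does by itself when no limit is given.
def strip_trailing_empties(fields)
  result = fields.dup
  result.pop while !result.empty? && result.last.empty?
  result
end

# A negative limit keeps every field, including trailing empty ones.
all_fields = 'ab'.split(/\D+/, -1)

p all_fields                          # ["", ""]
p strip_trailing_empties(all_fields)  # []
p 'ab'.split(/\D+/)                   # []
```

Because every field is empty in this case, the loop keeps popping until nothing is left, which is why the leading empty string disappears as well.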

--
Simon Strandgaard



Robert Klemme 05-31-2004 08:54 PM

Re: [bug] String#split returns extra empty string
 

"Simon Strandgaard" <neoneye@adslhome.dk> wrote in message
news:20040531140451.3abb4fb2.neoneye@adslhome.dk...
> "Robert Klemme" <bob.news@gmx.net> wrote:
> > "Simon Strandgaard" <neoneye@adslhome.dk> wrote in message
> > news:20040531104155.074a42b0.neoneye@adslhome.dk...
> > > Simon Strandgaard wrote:
> > > > While extending my own regexp-engine with a split method,
> > > > I discovered something odd about Ruby's split.
> > > >
> > > > irb(main):001:0> 'ab1ab'.split(/\D+/)
> > > > => ["", "1"]
> > > >
> > > > Its asymmetric, it has a special case for eliminating
> > > > the last empty string.. but apparently not the first empty string.
> > > >
> > > > I would have expected above to be symmetric, and output:
> > > > => ["1"]
> > > >
> > >
> > > [10 minutes of experimenting later]
> > > I wasn't aware that Ruby inserts subcaptures this way.
> > >
> > > irb(main):001:0> "ab2cd3".split(/(\D+)/, 2)
> > > => ["", "ab", "2cd3"]
> > >
> > > Because of subcapture insertion, it make sense to keep the
> > > first empty string.
> > >
> > > I withdraw this bug-report.

> >
> > But what about:
> >
> > >> 'ab'.split(/\D+/)

> > => []
> >
> > You would at least expect one empty string in the result since there is at
> > least one separator. This strikes me as odd.
> >

>
> Guy Decoux very recently explained that to me.
>
> When split has no limit, it wipes empty strings.
>
> In your case you would have expected it to output [""].. but
> because its an empty-string in the tail.. it gets wiped.
>
> def split(pattern, limit=0)
> ...
> unless limit # lets wipe tailing elements which are empty
> result.pop while result.size > 0 and result.last.empty?
> end
> result
> end


But I thought it would strip only trailing empty strings. What about the
leading empty string in my example? I'd expect that to be preserved.

Hm...

robert


Simon Strandgaard 05-31-2004 10:48 PM

Re: [bug] String#split returns extra empty string
 
Robert Klemme wrote:
> But I though it will strip trailing empty strings - what about the leading
> empty string in my example? I'd expect that to be preserved.
>


Let's take another example with both leading and trailing empty strings.

irb(main):005:0> '34ab34'.split(/\d+/, 10)
=> ["", "ab", ""]
irb(main):006:0> '34ab34'.split(/\d+/)
=> ["", "ab"]


When no limit is specified, Ruby wipes the trailing empty strings
until it reaches a non-empty string.


In your case there are zero non-empty strings, so Ruby wipes everything.

irb(main):001:0> 'ab'.split(/\D+/)
=> []
irb(main):002:0> 'ab'.split(/\D+/, 10)
=> ["", ""]
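
For comparison, a short sketch of the three kinds of limit on the earlier sample string. The negative limit is an addition not used in this thread; per Ruby's documentation it places no cap on the number of fields and does not suppress trailing empty strings.

```ruby
str = '34ab34'

# No limit: trailing empty strings are removed.
p str.split(/\d+/)      # ["", "ab"]

# Positive limit: at most that many fields; the last one holds the rest.
p str.split(/\d+/, 2)   # ["", "ab34"]

# Negative limit: every field is kept, including trailing empty ones.
p str.split(/\d+/, -1)  # ["", "ab", ""]
```

A negative limit avoids having to guess a "large enough" number such as 10.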


FYI: I have no idea when this wiping of empty tail elements is useful.
Any ideas?

--
Simon Strandgaard



David Alan Black 06-01-2004 03:21 AM

Re: [bug] String#split returns extra empty string
 
Hi --

Simon Strandgaard <neoneye@adslhome.dk> writes:

> FYI: I have no idea when this wiping empty tail elements are useful.
> Any ideas ?


Maybe a case like:

irb(main):006:0> "one two three ".split(" ")
=> ["one", "two", "three"]

(though there you don't need an argument to split at all I guess) or
something like:

irb(main):016:0> "one!two!three!".split("!")
=> ["one", "two", "three"]


David

--
David A. Black
dblack@wobblini.net

Florian Gross 06-01-2004 01:28 PM

Re: [bug] String#split returns extra empty string
 
David Alan Black wrote:
> Hi --


Moin!

>>FYI: I have no idea when this wiping empty tail elements are useful.
>>Any ideas ?

>
> Maybe a case like:
>
> irb(main):006:0> "one two three ".split(" ")
> => ["one", "two", "three"]
>
> (though there you don't need an argument to split at all I guess) or
> something like:
>
> irb(main):016:0> "one!two!three!".split("!")
> => ["one", "two", "three"]


Hm, I think that it causes more trouble than it's worth. It's very easy
to remove empty elements anyway:

"one!two!three!".split("!").reject { |item| item.empty? }

Maybe it would be better to create a reject_at_end/at_start or something
similar?
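
One way such helpers might look, as a rough sketch only: reject_at_start and reject_at_end are hypothetical names taken from the suggestion above, neither exists in Ruby, and they are written here as plain methods taking an array and a block.

```ruby
# Hypothetical helpers; neither is part of Ruby's core library.

# Drop leading items for which the block returns true.
def reject_at_start(items)
  items.drop_while { |item| yield(item) }
end

# Drop trailing items for which the block returns true.
def reject_at_end(items)
  items.reverse.drop_while { |item| yield(item) }.reverse
end

# A negative limit makes split keep all empty fields.
fields = "!one!!two!".split("!", -1)

p fields                                      # ["", "one", "", "two", ""]
p reject_at_start(fields) { |s| s.empty? }    # ["one", "", "two", ""]
p reject_at_end(fields) { |s| s.empty? }      # ["", "one", "", "two"]
```

Unlike reject, both helpers leave the empty string in the middle alone.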

Regards,
Florian Gross

David A. Black 06-01-2004 01:52 PM

Re: [bug] String#split returns extra empty string
 
Hi --

On Tue, 1 Jun 2004, Florian Gross wrote:

> David Alan Black wrote:
> > Hi --

>
> Moin!
>
> >>FYI: I have no idea when this wiping empty tail elements are useful.
> >>Any ideas ?

> >
> > Maybe a case like:
> >
> > irb(main):006:0> "one two three ".split(" ")
> > => ["one", "two", "three"]
> >
> > (though there you don't need an argument to split at all I guess) or
> > something like:
> >
> > irb(main):016:0> "one!two!three!".split("!")
> > => ["one", "two", "three"]

>
> Hm, I think that it causes more trouble than it's worth.


I'm not sure what you mean; what trouble does it cause?

> It's very easy to remove empty elements anyway:
>
> "one!two!three!".split("!").reject { |item| item.empty? }


It's even easier than that :-)

"one!two!three!".split("!").grep(/\S/)

though I'm still not sure what's undesirable about having split do
different things.
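
A small caveat, sketched with a made-up array: the two one-liners agree on the example above but are not equivalent in general, because grep(/\S/) also drops strings that contain only whitespace, and both of them remove empty strings anywhere in the array, not just at the end.

```ruby
# Illustrative data, not taken from the thread.
fields = ["one", "", " ", "two", ""]

# reject removes only strings of length zero.
p fields.reject { |item| item.empty? }  # ["one", " ", "two"]

# grep(/\S/) keeps only strings with at least one non-whitespace character.
p fields.grep(/\S/)                     # ["one", "two"]
```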

> Maybe it would be better to create a reject_at_end/at_start or something
> similar?


That seems like an awfully specific case for a whole separate method.
(I admit, though, that I'm somewhat conservative about proliferation
of methods :-)


David

--
David A. Black
dblack@wobblini.net


