[bug] String#split returns extra empty string
While extending my own regexp engine with a split method,
I discovered something odd about Ruby's split.

    irb(main):001:0> 'ab1ab'.split(/\D+/)
    => ["", "1"]

It's asymmetric: it has a special case for eliminating
the last empty string, but apparently not the first empty string.

I would have expected the above to be symmetric, and to output:

    => ["1"]

--
Simon Strandgaard
Re: [bug] String#split returns extra empty string
Simon Strandgaard wrote:
> While extending my own regexp engine with a split method,
> I discovered something odd about Ruby's split.
>
> irb(main):001:0> 'ab1ab'.split(/\D+/)
> => ["", "1"]
>
> It's asymmetric: it has a special case for eliminating
> the last empty string, but apparently not the first empty string.
>
> I would have expected the above to be symmetric, and to output:
> => ["1"]

[10 minutes of experimenting later]

I wasn't aware that Ruby inserts subcaptures this way.

    irb(main):001:0> "ab2cd3".split(/(\D+)/, 2)
    => ["", "ab", "2cd3"]

Because of subcapture insertion, it makes sense to keep the
first empty string.

I withdraw this bug report.

--
Simon Strandgaard
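To illustrate the subcapture point above, here is a small sketch (the variable names are only for illustration). When the pattern contains a capture group, the matched separators are returned between the fields, so the leading empty string keeps fields and separators in alternating positions:

```ruby
# Without a capture group, only the text between separators is returned.
plain = "ab2cd3".split(/\D+/)        # => ["", "2", "3"]

# With a capture group, each matched separator is inserted as well,
# so the result alternates: field, separator, field, separator, ...
captured = "ab2cd3".split(/(\D+)/)   # => ["", "ab", "2", "cd", "3"]

# Because nothing is dropped here, joining the pieces gives back the input.
rebuilt = captured.join              # => "ab2cd3"
```

If the leading empty string were removed, the first element would be a separator instead of a field, and the even/odd positions would no longer tell the two apart.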
Re: [bug] String#split returns extra empty string
"Simon Strandgaard" <neoneye@adslhome.dk> wrote in message
news:20040531104155.074a42b0.neoneye@adslhome.dk...

> [10 minutes of experimenting later]
> I wasn't aware that Ruby inserts subcaptures this way.
>
> irb(main):001:0> "ab2cd3".split(/(\D+)/, 2)
> => ["", "ab", "2cd3"]
>
> Because of subcapture insertion, it makes sense to keep the
> first empty string.
>
> I withdraw this bug report.

But what about:

    >> 'ab'.split(/\D+/)
    => []

You would expect at least one empty string in the result, since there is at
least one separator. This strikes me as odd.

robert
Re: [bug] String#split returns extra empty string
"Robert Klemme" <bob.news@gmx.net> wrote:

> But what about:
>
> >> 'ab'.split(/\D+/)
> => []
>
> You would expect at least one empty string in the result, since there is at
> least one separator. This strikes me as odd.

Guy Decoux very recently explained that to me.

When split has no limit, it wipes trailing empty strings.

In your case you would have expected it to output [""], but
because it's an empty string in the tail, it gets wiped.

    def split(pattern, limit=0)
      ...
      if limit == 0  # no limit given: wipe trailing elements which are empty
        result.pop while result.size > 0 and result.last.empty?
      end
      result
    end

--
Simon Strandgaard
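The outline above can be turned into something runnable. This is only a sketch under stated assumptions, not Ruby's actual implementation: `simple_split` is a hypothetical helper that takes the string as an explicit argument, ignores capture groups, and gives up on zero-width matches.

```ruby
# Hypothetical helper for illustration only (not how String#split is
# implemented): splits str on a Regexp, ignoring capture groups.
def simple_split(str, pattern, limit = 0)
  result = []
  pos = 0
  while (m = pattern.match(str, pos))
    # With a positive limit, stop once limit - 1 fields are collected;
    # the remainder of the string becomes the final field.
    break if limit > 0 && result.size == limit - 1
    # Zero-width matches are out of scope for this sketch.
    break if m.end(0) == m.begin(0)
    result << str[pos...m.begin(0)]
    pos = m.end(0)
  end
  result << str[pos..-1]
  # Only when no limit was given (limit == 0) are trailing empties removed.
  result.pop while limit == 0 && !result.empty? && result.last.empty?
  result
end

simple_split('ab1ab', /\D+/)    # => ["", "1"]
simple_split('ab', /\D+/)       # => []
simple_split('ab', /\D+/, 10)   # => ["", ""]
```

For `'ab'`, the intermediate result is `["", ""]`: one empty field before the separator and one after it. Both count as trailing because nothing non-empty follows them, so both are popped.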
Re: [bug] String#split returns extra empty string
"Simon Strandgaard" <neoneye@adslhome.dk> wrote in message
news:20040531140451.3abb4fb2.neoneye@adslhome.dk...

> Guy Decoux very recently explained that to me.
>
> When split has no limit, it wipes trailing empty strings.
>
> In your case you would have expected it to output [""], but
> because it's an empty string in the tail, it gets wiped.

But I thought it would strip only trailing empty strings. What about the leading
empty string in my example? I'd expect that to be preserved. Hm...

robert
Re: [bug] String#split returns extra empty string
Robert Klemme wrote:
> But I thought it would strip only trailing empty strings. What about the leading
> empty string in my example? I'd expect that to be preserved.

Let's take another example, with both leading and trailing empty strings.

    irb(main):005:0> '34ab34'.split(/\d+/, 10)
    => ["", "ab", ""]
    irb(main):006:0> '34ab34'.split(/\d+/)
    => ["", "ab"]

When no limit is specified, Ruby wipes the trailing empty strings
until it reaches a non-empty string.

In your case there are zero non-empty strings, so Ruby wipes everything.

    irb(main):001:0> 'ab'.split(/\D+/)
    => []
    irb(main):002:0> 'ab'.split(/\D+/, 10)
    => ["", ""]

FYI: I have no idea when this wiping of empty tail elements is useful.
Any ideas?

--
Simon Strandgaard
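The examples above use an arbitrary limit of 10 to switch the wiping off. A negative limit does the same without capping the number of fields. A short sketch of the three cases (variable names are illustrative):

```ruby
s = "34ab34"

# No limit: trailing empty strings are removed.
no_limit = s.split(/\d+/)        # => ["", "ab"]

# Negative limit: no cap on the number of fields, trailing empties are kept.
negative = s.split(/\d+/, -1)    # => ["", "ab", ""]

# Positive limit: at most that many fields; the last one holds the remainder.
capped = s.split(/\d+/, 2)       # => ["", "ab34"]

# A case where the difference matters: a record with empty trailing columns.
row = "a,b,,"
short_row = row.split(",")       # => ["a", "b"]
full_row  = row.split(",", -1)   # => ["a", "b", "", ""]
```

When the number of columns matters, as in the last example, `-1` is the safer choice because the result length no longer depends on which fields happen to be empty.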
Re: [bug] String#split returns extra empty string
Hi --
Simon Strandgaard <neoneye@adslhome.dk> writes:

> FYI: I have no idea when this wiping of empty tail elements is useful.
> Any ideas?

Maybe a case like:

    irb(main):006:0> "one two three ".split(" ")
    => ["one", "two", "three"]

(though there you don't need an argument to split at all, I guess) or
something like:

    irb(main):016:0> "one!two!three!".split("!")
    => ["one", "two", "three"]

David

--
David A. Black
dblack@wobblini.net
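As a side note on the first example: a single-space string argument is a special case in which leading whitespace and runs of whitespace are ignored, which is not the same as splitting on the regexp `/ /`. A brief sketch of the difference, assuming the global field separator `$;` is left at its default of nil:

```ruby
s = " one  two three "

# A single space as the pattern: leading whitespace and runs of
# whitespace are ignored.
awk_style = s.split(" ")     # => ["one", "two", "three"]

# No argument behaves the same way while $; is nil (the default).
no_arg = s.split             # => ["one", "two", "three"]

# A regexp matching one space: the leading empty string and the empty
# string between the two adjacent spaces are kept; only the trailing
# empty string is removed.
regex_only = s.split(/ /)    # => ["", "one", "", "two", "three"]
```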
Re: [bug] String#split returns extra empty string
David Alan Black wrote:
> Hi --

Moin!

>> FYI: I have no idea when this wiping of empty tail elements is useful.
>> Any ideas?
>
> Maybe a case like:
>
> irb(main):006:0> "one two three ".split(" ")
> => ["one", "two", "three"]
>
> (though there you don't need an argument to split at all, I guess) or
> something like:
>
> irb(main):016:0> "one!two!three!".split("!")
> => ["one", "two", "three"]

Hm, I think that it causes more trouble than it's worth. It's very easy to
remove empty elements anyway:

    "one!two!three!".split("!").reject { |item| item.empty? }

Maybe it would be better to create a reject_at_end/at_start or something
similar?

Regards,
Florian Gross
Re: [bug] String#split returns extra empty string
Hi --
On Tue, 1 Jun 2004, Florian Gross wrote:

> Hm, I think that it causes more trouble than it's worth.

I'm not sure what you mean; what trouble does it cause?

> It's very easy to remove empty elements anyway:
>
> "one!two!three!".split("!").reject { |item| item.empty? }

It's even easier than that :-)

    "one!two!three!".split("!").grep(/\S/)

though I'm still not sure what's undesirable about having split do
different things.

> Maybe it would be better to create a reject_at_end/at_start or something
> similar?

That seems like an awfully specific case for a whole separate method.
(I admit, though, that I'm somewhat conservative about proliferation
of methods :-)

David

--
David A. Black
dblack@wobblini.net
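The two one-liners above are not quite interchangeable, and neither is limited to the ends of the array. A small sketch comparing them on an input chosen for illustration, plus one way to drop only the leading empty strings with `Array#drop_while`, which exists in current Ruby versions:

```ruby
# Input with a leading empty field, an inner empty field and a blank field.
fields = "!one!! !two!".split("!")          # => ["", "one", "", " ", "two"]

# reject removes every empty string, wherever it occurs.
without_empty = fields.reject { |item| item.empty? }   # => ["one", " ", "two"]

# grep(/\S/) also removes whitespace-only strings.
non_blank = fields.grep(/\S/)               # => ["one", "two"]

# Dropping only the leading empties, roughly the "at_start" idea above.
leading_trimmed = fields.drop_while { |item| item.empty? }
# => ["one", "", " ", "two"]
```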