Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > mixed cmp operator for sorting

Reply
Thread Tools

mixed cmp operator for sorting

 
 
Marc Girod
Guest
Posts: n/a
 
      09-22-2013
Hi,

I often have to sort strings with embedded numbers, which gets into the problem that neither cmp nor <=> is appropriate to deal with them.
Making ad-hoc sort functions is nearly trivial, but cumbersome e.g. for one-liners.

I tried thus to write a sort function which would parse out the string parts from the numeric ones.
Here is what I came up with.

-8<---------------
package UCmp;

require Exporter;
@ISA = qw(Exporter);
@EXPORT = qw(ucmp);

sub ucmp {
my $a = eval '$' . caller . '::a';
my $b = eval '$' . caller . '::b';
my @t = $a =~ /(\D*)(\d*)/g; pop @t; pop @t;
my @s = $b =~ /(\D*)(\d*)/g; pop @s; pop @s;
my $l = ((@t <= @s)? @t : @s) / 2;
my $ret;
while ($l--) {
$ret = ((shift @t) cmp (shift @s) or (shift @t or 0) <=> (shift @s or 0));
$l = 0 if $ret;
}
return ($ret or @t <=> @s);
}

1;
-8<-----------------

And a test example:

$ cat bar
#!/usr/bin/perl -w

use feature qw(say);
use UCmp;

say for sort ucmp qw(
a12b34
a2c
b23
a7
a7b
23
);
$ ./bar
23
a2c
a7
a7b
a12b34
b23

Would you care to criticize it?
Thanks,
Marc
 
Reply With Quote
 
 
 
 
Marc Girod
Guest
Posts: n/a
 
      09-22-2013
On Sunday, 22 September 2013 20:07:34 UTC+1, Ben Morrow wrote:

> You would be better off prototyping ucmp ($$), which makes sort pass the
> values to be sorted in @_. This is slightly slower than using $a and $b,
> which is why it isn't the default, but using eval will be *much* slower,
> as well as rather less safe.


OK. I knew of this option, but I wanted to learn about the pros and cons...

> If you must access your caller's variables, you should use symrefs in
> preference to eval:
>
> my ($a, $b) = do {
> no strict "refs";
> my $pkg = caller;
> ${"$pkg\::a"}, ${"$pkg\::b"};
> };


Thank you! You remind me of an advice you already gave me previously in another context!
Now, how would this compare with prototyping the function?

> > my @t = $a =~ /(\D*)(\d*)/g; pop @t; pop @t;
> > my @s = $b =~ /(\D*)(\d*)/g; pop @s; pop @s;

>
> Why the pops?


I found that since my regexp matches an empty string, the 'g' flag will stop after matching one (which is very nice of it), leaving me with one pair of empty strings in every case...
I am just removing them.

> > my $l = ((@t <= @s)? @t : @s) / 2;
> > my $ret;
> > while ($l--) {
> > $ret = ((shift @t) cmp (shift @s) or (shift @t or 0) <=> (shift @s
> > or 0));


> You try string comparison first; this means numerical sections will be
> sorted in string order and the numerical comparison is pointless.


No? I try it on the string token, possibly empty.

> You need to consider what to do if the sections you are comparing are of
> different types (a numeric vs a non-numeric section).


My regexp matching has taken care of this...

> You should use || for or-as-an-expression; 'or' is for flow control.


OK. The only result there would be to change the precedence, allowing me to drop one set of parentheses...?

> > $l = 0 if $ret;
> > }
> > return ($ret or @t <=> @s);


> I don't understand what $l is doing for you here. AFAICS these two
> strings


> a1b2z9
> a1c3z9


> will compare equal; is that what you want?


$l is initialised to the length of the shorter string (in token pairs).
Your two strings do sort right...

$ perl -MUCmp -lE 'print for sort ucmp qw(a1b2z9 a1c3z9)'
a1b2z9
a1c3z9

The first pass of the loop will handle the two (a, 1); they compare doubly equal, but since $l is not down to 0 yet, the next pass will handle (b, 2) and (c, 3), and there the comparison of b and c will suffice:
$l will be forcibly set to 0, and since $ret is non null, the original lengths will not need to be compared.

Marc
 
Reply With Quote
 
 
 
 
Marc Girod
Guest
Posts: n/a
 
      09-22-2013
On Sunday, 22 September 2013 22:17:34 UTC+1, Ben Morrow wrote:

> You'd have to benchmark it to see, but I would expect that all using $a
> and $b saves is the cost of pushing the values on the stack and popping
> them off again, which is exactly what that 'do' does. So I would expect
> this to be (slightly) slower than the prototype.


Fair enough, and understood.

> Ah yes; I missed that this could match the empty string. I might rather
> modify the pattern so it doesn't, though that's not terribly
> straightforward; something like
>
> /(?=.)(\D*)(\d*)/gs
>
> is probably as simple as it will go.


OK. I see this is better.
Why do you use the 's' modifier?
To sort multiline items?...

> In that case: there's no need to count the loop, since Perl arrays know
> how long they are:
>
> while (@s && @t) {
> $ret = ...
> and last;
> }


OK. This one I get.

> or


> my $ret = ...;
> $ret and return $ret;


But this one I don't...
I have now:

return $ret || @t <=> @s;

> (once again I ask myself why 'my' variables aren't introduced until the
> end of the statement...)


And neither this comment...
Thanks!
Marc
 
Reply With Quote
 
Uri Guttman
Guest
Posts: n/a
 
      09-23-2013
>>>>> "MG" == Marc Girod <(E-Mail Removed)> writes:

MG> Hi, I often have to sort strings with embedded numbers, which gets
MG> into the problem that neither cmp nor <=> is appropriate to deal
MG> with them. Making ad-hoc sort functions is nearly trivial, but
MG> cumbersome e.g. for one-liners.

have you looked at Sort::Maker on cpan? you can make the sort elsewhere
and call the single code ref to sort stuff. you can even run it offline
and get the source for the sorter to use in your code.

MG> sub ucmp {
MG> my $a = eval '$' . caller . '::a';
MG> my $b = eval '$' . caller . '::b';
MG> my @t = $a =~ /(\D*)(\d*)/g; pop @t; pop @t;
MG> my @s = $b =~ /(\D*)(\d*)/g; pop @s; pop @s;

that is redundant and prone to error. that is one area sort::maker
helps. you only need to describe how to extract each key from the
elements to be sorted. the module does the rest.

uri
 
Reply With Quote
 
$Bill
Guest
Posts: n/a
 
      09-23-2013
On 9/22/2013 11:15, Marc Girod wrote:
> Hi,
>
> I often have to sort strings with embedded numbers, which gets into the problem that neither cmp nor <=> is appropriate to deal with them.
> Making ad-hoc sort functions is nearly trivial, but cumbersome e.g. for one-liners.
>
> I tried thus to write a sort function which would parse out the string parts from the numeric ones.
> Here is what I came up with.

....
> sub ucmp {


> my $a = eval '$' . caller . '::a';
> my $b = eval '$' . caller . '::b';
> my @t = $a =~ /(\D*)(\d*)/g; pop @t; pop @t;
> my @s = $b =~ /(\D*)(\d*)/g; pop @s; pop @s;
> my $l = ((@t <= @s)? @t : @s) / 2;


> my $ret;
> while ($l--) {
> $ret = ((shift @t) cmp (shift @s) or (shift @t or 0) <=> (shift @s or 0));


I might use:
$ret = ((shift @t || '') cmp (shift @s || '') or (shift @t || 0) <=> (shift @s || 0));
or:
$ret = (shift @t || '') cmp (shift @s || '');
$ret = (shift @t || 0) <=> (shift @s || 0) if not $ret;

> $l = 0 if $ret;


or:
last if $ret;

> }
> return ($ret or @t <=> @s);
> }


Interesting sub - not sure where it would come in handy though
with that specific kind of data string(s) [mixed alpha/numeric].






 
Reply With Quote
 
Tim McDaniel
Guest
Posts: n/a
 
      09-23-2013
In article <(E-Mail Removed)>,
Marc Girod <(E-Mail Removed)> wrote:
>say for sort ucmp qw(
> a12b34
> a2c
> b23
> a7
> a7b
> 23
> );
>$ ./bar
>23
>a2c
>a7
>a7b
>a12b34
>b23
>
>Would you care to criticize it?


You don't document what the result should be. I can make a guess from
that sample data, but I'd rather not guess; I'd rather have a spec
to verify results.

--
Tim McDaniel, http://www.velocityreviews.com/forums/(E-Mail Removed)
 
Reply With Quote
 
John W. Krahn
Guest
Posts: n/a
 
      09-23-2013
Marc Girod wrote:
> Hi,
>
> I often have to sort strings with embedded numbers, which gets into the problem that neither cmp nor<=> is appropriate to deal with them.
> Making ad-hoc sort functions is nearly trivial, but cumbersome e.g. for one-liners.
>
> I tried thus to write a sort function which would parse out the string parts from the numeric ones.
> Here is what I came up with.
>
> -8<---------------
> package UCmp;
>
> require Exporter;
> @ISA = qw(Exporter);
> @EXPORT = qw(ucmp);
>
> sub ucmp {
> my $a = eval '$' . caller . '::a';
> my $b = eval '$' . caller . '::b';
> my @t = $a =~ /(\D*)(\d*)/g; pop @t; pop @t;
> my @s = $b =~ /(\D*)(\d*)/g; pop @s; pop @s;


Better as:

my @t = $a =~ /\D+|\d+/g;
my @s = $b =~ /\D+|\d+/g;

Avoids zero length strings.



John
--
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction. -- Albert Einstein
 
Reply With Quote
 
Charles DeRykus
Guest
Posts: n/a
 
      09-23-2013
On 9/22/2013 3:38 PM, Ben Morrow wrote:
> ...
>
> I would like to be able to write
>
> my $ret = ...
> and return $ret;
>
> but I can't, because Perl does not let you refer to a variable in the
> same statement as it was declared. This restriction is entirely
> artificial (it would be easier for perl to bring $ret into scope
> immediately) and I consider it unnecessary and unhelpful, for reasons
> I've explained before and so won't bore people by explaining again.
>


Of course it currently works fine with a previous $ret in scope.
There's no error but the sub returns "foo" instead of "bar":

{ my $ret="foo"; sub {my $ret="bar" and return $ret} }

A side-benefit would be this scope surprise would be eliminated because
the later $ret would be seen.

But, there's always some catch: you'd still need: no warnings 'syntax'
though in order to quiet: "Found = in conditional, should be ==".

--
Charles DeRykus

p.s. A global would eliminate the problem but has a considerable yuck
factor..

 
Reply With Quote
 
Rainer Weikusat
Guest
Posts: n/a
 
      09-23-2013
Ben Morrow <(E-Mail Removed)> writes:

[...]

> Maybe I should write the whole thing out:
>
> while (@s && @t) {
> my $ret = ...;
> $ret and return $ret;
> }
> return @t <=> @s;
>
> Rather than using 'last' to get out of the loop and using $ret to carry
> the return value out of the loop, simply return immediately from inside
> the loop. Then, if the loop ends and the sub hasn't returned yet, do the
> final comparison. This avoids having to have $ret's scope extend over
> every iteration of the loop and beyond, which I consider a Good Thing.
> (Limiting the scope of variables is always a Good Thing.)


Subroutines which are so complicated that people start to loose track of
which variables were being used in what places are 'always a bad thing'
and this solution is not to create more variables but the split the
sequential tapeworm into multiple subroutines (with individual sets of
variables)[*].
[*] I keep being amazed at this idea that the solution to "It is too
complicated already!" must be "Make it more complicated!" (and maybe,
add some neat warnings to help the poor guy who can stop out-clevering
himself in code ...).

>> > (once again I ask myself why 'my' variables aren't introduced until the
>> > end of the statement...)

>>
>> And neither this comment...

>
> I would like to be able to write
>
> my $ret = ...
> and return $ret;
>
> but I can't, because Perl does not let you refer to a variable in the
> same statement as it was declared. This restriction is entirely
> artificial (it would be easier for perl to bring $ret into scope
> immediately) and I consider it unnecessary and unhelpful, for reasons
> I've explained before and so won't bore people by explaining again.


The reason for this is that code like this

my $v = 12;
{
my $v = $v;
print("$v\n");
}

works as intended, ie, the inner $v gets initialized to the value of the
outer $v. Granted, this is a silly example, but I've already used
similar constructs in real code.
 
Reply With Quote
 
Marc Girod
Guest
Posts: n/a
 
      09-23-2013
On Sunday, 22 September 2013 23:38:35 UTC+1, Ben Morrow wrote:

> /s makes . match newline. Without it the pattern will fail to match if a
> non-digit section starts with a newline.


Indeed.

> Maybe I should write the whole thing out:


Thanks.

> I would like to be able to write

....
> I've explained before and so won't bore people by explaining again.


Thanks again.

Marc
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
GCC warnings for comma operator W Karas C Programming 16 04-12-2013 08:03 AM
Python3: sorting image objects according to a cmp function Poor Yorick Python 0 12-03-2008 02:06 PM
Replacing cmp with key for sorting George Sakkis Python 8 11-03-2008 09:40 PM
cmp and sorting non-symmetric types Adam Olsen Python 5 11-14-2007 09:21 PM
no cmp field defined in cmp ejb Andrea Sansottera Java 0 07-16-2004 02:24 PM



Advertisments