Velocity Reviews > Multiple "cmp"s chained one after another

Multiple "cmp"s chained one after another

Volker Grabsch
Guest
Posts: n/a

 05-14-2005
Hello!

Ich just found a very nice 'pythonic' solution for an often appearing
problem. I didn't find it documented anywhere, so I'm posting it here.
If this isn't new in any way, I'd really like to get to know it.

Example problem:
I have some "datetime" objects and want to sort them, as here:

birthdays = [d1,d2,d3,d4]
birthdays.sort()

However, I don't want to sort them the default way. These are birthdays,
so only the month and day do matter, not the year. E.g.:

2003-01-01 should be smaller than 1984-05-01

So I have to write the comparison on my own, e.g.

def cmp_birthdays(d1,d2):
if d1.month > d2.month: return 1
if d1.month < d2.month: return -1
if d1.day > d2.day: return 1
if d1.day < d2.day: return -1
return 0

...
birthdays.sort(cmp_birthdays)

This implementation of cmp_birthdays is very ugly. Image you want to
chain more than 2 values in that "cmp_birthdays". I also want to use the
builtin "cmp" function, not ">" and "<".

After thinking some minutes about it, I found a very nice solution:
I have some "cmp"s one aftter another. If one if them return 1 oder -1,
it sould be returned. If it returns 0, the next "cmp" is used. In other
words: I have a sequence of numbers, and want to get the first one that
is not 0. (or return 0, if all numbers were 0)

But this is exactly what the "or" operator does, due to short-circuit
evaluation. In this example, that means:

def cmp_bithdays(d1,d2):
return cmp(d1.month,d2.month) or cmp(d1.day,d2.day)

The generic pattern is:

return cmp(...) or cmp (...) or cmp(...) or ...

I'm not sure whether this pattern is already a "common recipe", but
I found it to be a very nice idea.

Any opinions?

Greets,

Volker

--
Volker Grabsch
---<<(())>>---
\frac{\left|\vartheta_0\times\{\ell,\kappa\in\Re\} \right|}{\sqrt
[G]{-\Gamma(\alpha)\cdot\mathcal{B}^{\left[\oint\!c_\hbar\right]}}}

vincent wehren
Guest
Posts: n/a

 05-14-2005

"Volker Grabsch" <(E-Mail Removed)> schrieb im
Newsbeitrag news:d64io9$s6o$05$(E-Mail Removed)-online.com... | Hello! | | Ich just found a very nice 'pythonic' solution for an often appearing | problem. I didn't find it documented anywhere, so I'm posting it here. | If this isn't new in any way, I'd really like to get to know it. | | Example problem: | I have some "datetime" objects and want to sort them, as here: | | birthdays = [d1,d2,d3,d4] | birthdays.sort() | | However, I don't want to sort them the default way. These are birthdays, | so only the month and day do matter, not the year. E.g.: | | 2003-01-01 should be smaller than 1984-05-01 | | So I have to write the comparison on my own, e.g. | | def cmp_birthdays(d1,d2): | if d1.month > d2.month: return 1 | if d1.month < d2.month: return -1 | if d1.day > d2.day: return 1 | if d1.day < d2.day: return -1 | return 0 | | ... | birthdays.sort(cmp_birthdays) If you don't care about the year, why not just "normalize" the year to all be the same using the replace method of the date instance? Something like: d1 = datetime.date(2004, 12, 2) d2 = datetime.date(2001, 12, 3) d3 = datetime.date(2002, 12, 6) d4 = datetime.date(1977, 12, 7) dates =[d1,d2,d3,d4] datesNorm = [obj.replace(year=1900) for obj in (dates)] datesNorm.sort() print datesNorm # etcetera HTH, --- Vincent | | This implementation of cmp_birthdays is very ugly. Image you want to | chain more than 2 values in that "cmp_birthdays". I also want to use the | builtin "cmp" function, not ">" and "<". | | After thinking some minutes about it, I found a very nice solution: | I have some "cmp"s one aftter another. If one if them return 1 oder -1, | it sould be returned. If it returns 0, the next "cmp" is used. In other | words: I have a sequence of numbers, and want to get the first one that | is not 0. (or return 0, if all numbers were 0) | | But this is exactly what the "or" operator does, due to short-circuit | evaluation. In this example, that means: | | def cmp_bithdays(d1,d2): | return cmp(d1.month,d2.month) or cmp(d1.day,d2.day) | | The generic pattern is: | | return cmp(...) or cmp (...) or cmp(...) or ... | | I'm not sure whether this pattern is already a "common recipe", but | I found it to be a very nice idea. | | Any opinions? | | | Greets, | | Volker | | -- | Volker Grabsch | ---<<(())>>--- | \frac{\left|\vartheta_0\times\{\ell,\kappa\in\Re\} \right|}{\sqrt | [G]{-\Gamma(\alpha)\cdot\mathcal{B}^{\left[\oint\!c_\hbar\right]}}} Robert Kern Guest Posts: n/a  05-14-2005 Volker Grabsch wrote: > Hello! > > Ich just found a very nice 'pythonic' solution for an often appearing > problem. I didn't find it documented anywhere, so I'm posting it here. > If this isn't new in any way, I'd really like to get to know it. > > Example problem: > I have some "datetime" objects and want to sort them, as here: > > birthdays = [d1,d2,d3,d4] > birthdays.sort() > > However, I don't want to sort them the default way. These are birthdays, > so only the month and day do matter, not the year. E.g.: > > 2003-01-01 should be smaller than 1984-05-01 [snip] > Any opinions? I find that using the "key" argument to sort is much nicer than "cmp" for these tasks. In [5]:L = [datetime.date(2005,5,2), datetime.date(1984,12,15), datetime.date(1954,1,1)] In [7]:L.sort(key=lambda x: (x.month, x.day)) In [8]:L Out[8]: [datetime.date(1954, 1, 1), datetime.date(2005, 5, 2), datetime.date(1984, 12, 15)] -- Robert Kern http://www.velocityreviews.com/forums/(E-Mail Removed) "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter Peter Hansen Guest Posts: n/a  05-14-2005 vincent wehren wrote: > "Volker Grabsch" <(E-Mail Removed)> schrieb im > Newsbeitrag news:d64io9$s6o$05$(E-Mail Removed)-online.com...
> | However, I don't want to sort them the default way. These are birthdays,
> | so only the month and day do matter, not the year. E.g.:
> |

....
> If you don't care about the year, why not just "normalize" the year
> to all be the same using the replace method of the date instance?
> Something like:
>
> d1 = datetime.date(2004, 12, 2)
> d2 = datetime.date(2001, 12, 3)
> d3 = datetime.date(2002, 12, 6)
> d4 = datetime.date(1977, 12, 7)
> dates =[d1,d2,d3,d4]
> datesNorm = [obj.replace(year=1900) for obj in (dates)]
> datesNorm.sort()
> print datesNorm # etcetera

Or just use the .timetuple() method on datetime objects and sort on the
8th element of the 9-element tuple, which is the day-of-the-year.

-Peter

Steven Bethard
Guest
Posts: n/a

 05-14-2005
Robert Kern wrote:
> I find that using the "key" argument to sort is much nicer than "cmp"
>
> In [5]:L = [datetime.date(2005,5,2), datetime.date(1984,12,15),
> datetime.date(1954,1,1)]
>
> In [7]:L.sort(key=lambda x: (x.month, x.day))
>
> In [8]:L
> Out[8]:
> [datetime.date(1954, 1, 1),
> datetime.date(2005, 5, 2),
> datetime.date(1984, 12, 15)]

Yes, definitely. Also worth noting in Robert Kern's solution is that

def mycmp(d1, d2):
return cmp(d1.month,d2.month) or cmp(d1.day,d2.day)

you can write:

def mycmp(d1, d2):
return cmp((d1.month, d1.day), (d2.month, d2.day))

or if you're using the key= argument (like you probably should):

def mykey(d):
return (d.month, d.day)

The point here is that rather than chaining cmp() calls with ors, you
should just use a tuple -- the standard comparison order in tuples is
exactly what you're looking for.

STeVe

Volker Grabsch
Guest
Posts: n/a

 05-14-2005
Steven Bethard wrote:
> Robert Kern wrote:
> def mykey(d):
> return (d.month, d.day)
>
> The point here is that rather than chaining cmp() calls with ors, you
> should just use a tuple -- the standard comparison order in tuples is
> exactly what you're looking for.

That's an excellent idea! Thanks a lot. I really didn't think of the "key="
argument.

Greets,

--
Volker Grabsch
---<<(())>>---
\frac{\left|\vartheta_0\times\{\ell,\kappa\in\Re\} \right|}{\sqrt
[G]{-\Gamma(\alpha)\cdot\mathcal{B}^{\left[\oint\!c_\hbar\right]}}}

Volker Grabsch
Guest
Posts: n/a

 05-14-2005
vincent wehren wrote:
>
> If you don't care about the year, why not just "normalize" the year
> to all be the same using the replace method of the date instance?

That's a very bad idea. In my example, this would work, but in "reality"
I don't sort datetime objects, of course! (Is there any real application
where you want to do that?)

Instead, I'm sorting "Person" objects using a "birthday" attribute.
Since I use these Person objects also in other places, they should never
be modified without just to be sorted. In general, the number side effects
should always be minimized.

> datesNorm = [obj.replace(year=1900) for obj in (dates)]
> datesNorm.sort()

This code would go bad especially in my situation, where my "Person"
objects are SQLObjects, thus the "normalisation" would be written into
the database. Okay, one could use transactions and rollback", but I
think, my point is clear now.

Nevertheless, I think you idea is very interesting. Is there any "real"
application where normalizing just for sorting would be reasonable?

Greets,

--
Volker Grabsch
---<<(())>>---
\frac{\left|\vartheta_0\times\{\ell,\kappa\in\Re\} \right|}{\sqrt
[G]{-\Gamma(\alpha)\cdot\mathcal{B}^{\left[\oint\!c_\hbar\right]}}}

Volker Grabsch
Guest
Posts: n/a

 05-14-2005
Peter Hansen wrote:
>
> Or just use the .timetuple() method on datetime objects and sort on the
> 8th element of the 9-element tuple, which is the day-of-the-year.

An interesting idea. But wouldn't sorting by (dd.month,dd.day) be more
effective?

In other words: Does .timetuple() create a tuple, or does it just return
a tuple which is present anyway?

Greets,

--
Volker Grabsch
---<<(())>>---
\frac{\left|\vartheta_0\times\{\ell,\kappa\in\Re\} \right|}{\sqrt
[G]{-\Gamma(\alpha)\cdot\mathcal{B}^{\left[\oint\!c_\hbar\right]}}}

vincent wehren
Guest
Posts: n/a

 05-14-2005

"Volker Grabsch" <(E-Mail Removed)> schrieb im
Newsbeitrag news:d65h73$fgr$02\$(E-Mail Removed)-online.com...
| vincent wehren wrote:
| >
| > If you don't care about the year, why not just "normalize" the year
| > to all be the same using the replace method of the date instance?
|
| That's a very bad idea. In my example, this would work, but in "reality"
| I don't sort datetime objects, of course! (Is there any real application
| where you want to do that?)
|
| Instead, I'm sorting "Person" objects using a "birthday" attribute.
| Since I use these Person objects also in other places, they should never
| be modified without just to be sorted. In general, the number side effects
| should always be minimized.

Can you explain where you see a modification to the orginal object
happening?
(or in any of the other solutions proposed for that matter...)

Not here I hope:

|
| > datesNorm = [obj.replace(year=1900) for obj in (dates)]
| > datesNorm.sort()

If you print datesNorm, you'll see:

[datetime.date(1900, 12, 2), datetime.date(1900, 12, 3), datetime.date(1900,
12, 6), datetime.date(1900, 12, 7)]

However, dates is still the same:

[datetime.date(2004, 12, 2), datetime.date(2001, 12, 3), datetime.date(2002,
12, 6), datetime.date(1977, 12, 7)]

|
| This code would go bad especially in my situation, where my "Person"
| objects are SQLObjects, thus the "normalisation" would be written into
| the database. Okay, one could use transactions and rollback", but I
| think, my point is clear now.

Since you wouldn't need to change the attribute object to
perform a sort (on the instances) using it (or portions of it) as key, it's
not.
Or is there something fundamental I am missing about your particular use
case?

| Nevertheless, I think you idea is very interesting. Is there any "real"
| application where normalizing just for sorting would be reasonable?

How about a case-insensitive sort of strings? (uppering being the
normalization step)
Or getting rid of accented / special characters before sorting.
These sound like fairly straight-forward use cases to me

Anyway, I was doing some SQL this morning where getting rid of portions of
'datetime' fields in
a WHERE clause is sometimes just the ticket - hence the connection.

--

Vincent

|
| --
| Volker Grabsch
| ---<<(())>>---
| \frac{\left|\vartheta_0\times\{\ell,\kappa\in\Re\} \right|}{\sqrt
| [G]{-\Gamma(\alpha)\cdot\mathcal{B}^{\left[\oint\!c_\hbar\right]}}}

Volker Grabsch
Guest
Posts: n/a

 05-14-2005
vincent wehren wrote:
>
>| > If you don't care about the year, why not just "normalize" the year
>| > to all be the same using the replace method of the date instance?
>|
>| That's a very bad idea. In my example, this would work, but in "reality"
>| I don't sort datetime objects, of course! (Is there any real application
>| where you want to do that?)
>|
>| Instead, I'm sorting "Person" objects using a "birthday" attribute.
>| Since I use these Person objects also in other places, they should never
>| be modified without just to be sorted. In general, the number side effects
>| should always be minimized.
>
> Can you explain where you see a modification to the orginal object
> happening?
> (or in any of the other solutions proposed for that matter...)

Sorry, my fault. I didn't read carefully enough. X-)

> Not here I hope:
>
>| > datesNorm = [obj.replace(year=1900) for obj in (dates)]
>| > datesNorm.sort()

While you don't change the original objects, there's still a problem since
you're sorting the normalized values. However, I want to sort the original
list (i.e. the list of "Person" objects).

But that's not a real problem if one normalizes in a key function:

def key_birthday(d):
return d.replace(year=1900)
...
dates.sort(key=key_birthday)

...as suggested in other followups of my posting.

>| Nevertheless, I think you idea is very interesting. Is there any "real"
>| application where normalizing just for sorting would be reasonable?
>
> How about a case-insensitive sort of strings? (uppering being the
> normalization step)
> Or getting rid of accented / special characters before sorting.
> These sound like fairly straight-forward use cases to me

For your solution these are good examples. But my question was, whether
normalizing first, and just sorting the normalized values (not the original
values) is reasonable.

I.e., when I sort some strings case-insensitive, I don't want my resulting
(sorted) list to contain only lowercase string. But that's what I would
get if I used the algorithm you described above.

Greets,

--
Volker Grabsch
---<<(())>>---
\frac{\left|\vartheta_0\times\{\ell,\kappa\in\Re\} \right|}{\sqrt
[G]{-\Gamma(\alpha)\cdot\mathcal{B}^{\left[\oint\!c_\hbar\right]}}}