Re: Confused compare function :)

 
 
Steven D'Aprano
12-07-2012
On Thu, 06 Dec 2012 13:51:29 +0000, Neil Cerutti wrote:

> On 2012-12-06, Steven D'Aprano
> <(E-Mail Removed)> wrote:
>> total = 0
>> for s in list_of_strings:
>>     try:
>>         total += int(s)
>>     except ValueError:
>>         pass  # Not a number, ignore it.

>
> If it's internal data, perhaps. Of course, that would mean I had the
> option of *not* creating that stupid list_of_strings.



Not necessarily, it depends on the application.

If you have a spreadsheet, and create a formula =SUM(A1:A50) the expected
behaviour is to just skip anything that is not a valid number, not raise
an error. Sometimes correct application-level behaviour is to just ignore
input that it doesn't care about.

One of the things that I used to despise with a passion about MYOB is
that if you clicked on the window outside of a button or field, it would
scream at you "ERROR ERROR ERROR - that was not a button or field!!!!"
That is to say, it would beep. I mean, *who fscking cares* that the user
clicks in the window background? It's a harmless tick, like tapping your
desk, just ignore it.

As a general rule, library functions should be strict about reporting
errors, while applications may be more forgiving about errors that they
don't care about. The "don't care about" part is important though -- your
word processor shouldn't care about low level networking errors, but it
should care if it can't save a file to a network drive.
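
As a tiny sketch of that division of labour (the function names below
are invented just for this example, not taken from any real library):
the low-level parser is strict and raises, while the application
decides which failures it is happy to swallow.

def parse_amount(text):
    """Library level: be strict, let the caller see the failure."""
    return float(text.replace(",", "."))

def sum_cells(cells):
    """Application level: like =SUM(), quietly skip non-numeric cells."""
    total = 0.0
    for cell in cells:
        try:
            total += parse_amount(cell)
        except (ValueError, AttributeError):
            pass  # not a number; this application doesn't care
    return total

print(sum_cells(["1,50", "abc", "2.25", ""]))  # 3.75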


--
Steven
 
Anatoli Hristov
12-07-2012
>> Calling it 'found' is misleading, because it's True only if it updated.
>> If it found a match but didn't update, 'found' will still be False.
>> Using a loop within a loop like this could be the cause of your
>> problem. It's certainly not the most efficient way of doing it.

>
> I will keep you posted THANK YOU


And here is my final code -- I hope you will like it a little more


def Change_price():  # Changes the price in the DB if the price in the CSV is changed
    TotalUpdated = 0       # Counter for total updated
    TotalSKUNotFound = 0   # Counter for DB SKUs with no match in the CSV
    TotalSKUinDB = 0

    for row in PRODUCTSDB:
        TotalSKUinDB += 1
        db_sku = row["sku"].lower()
        db_price = float(row["price"])
        if CompareWithCSV(db_sku, db_price):
            TotalUpdated += 1
        else:
            TotalSKUNotFound += 1
            Update_SQL_stock(db_sku)
            WriteLog(db_sku, row["product_id"])

    print "Total updated: %s" % TotalUpdated
    print "Total not found within the distributor: %s" % TotalSKUNotFound
    print "Total products in the DB: %s" % TotalSKUinDB


def CompareWithCSV(db_sku, db_price):
    try:
        for x in pricelist:
            try:
                csv_price = x[6]
                csv_price = csv_price.replace(",", ".")
                csv_price = float(csv_price)
                csv_new_price = csv_price * 1.10
                csv_sku = x[4].lower()
                csv_stock = int(x[7])  # I use stock in the condition below
                if len(db_sku) != 0 and db_sku == csv_sku and csv_stock > 0:
                    print db_sku, csv_price, db_price, csv_new_price
                    Update_SQL(csv_new_price, db_sku)
                    return True
            except (IndexError, ValueError, TypeError):
                # The CSV has a lot of empty fields, so the loop raises
                # IndexError for them; I don't care about those rows.
                WriteErrorLog("Error with CSV file loop: ", db_sku)
    except IndexError:
        WriteErrorLog("Error with CSV file loop: ", db_sku)
    return False
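
As a possible refinement (just a sketch -- the helper name parse_row
and the tuple it returns are invented for illustration, not part of
the code above): keeping the try/except tightly around the conversions
that can actually fail means an unexpected error elsewhere is not
silently swallowed.

def parse_row(x):
    """Return (sku, price, stock) from one CSV row, or None if the row
    is too short or the numeric fields are malformed."""
    try:
        sku = x[4].lower()
        price = float(x[6].replace(",", "."))
        stock = int(x[7])
    except (IndexError, ValueError):
        return None  # short/empty rows are expected in this CSV; skip them
    return sku, price, stock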
 
Neil Cerutti
12-07-2012
On 2012-12-07, Steven D'Aprano <(E-Mail Removed)> wrote:
> On Thu, 06 Dec 2012 13:51:29 +0000, Neil Cerutti wrote:
>
>> On 2012-12-06, Steven D'Aprano
>> <(E-Mail Removed)> wrote:
>>> total = 0
>>> for s in list_of_strings:
>>>     try:
>>>         total += int(s)
>>>     except ValueError:
>>>         pass  # Not a number, ignore it.

>>
>> If it's internal data, perhaps. Of course, that would mean I
>> had the option of *not* creating that stupid list_of_strings.

>
> Not necessarily, it depends on the application.


I agree. I personally wouldn't want, e.g., 12.0 to get silently
skipped, so I've never done anything like that myself, but I can
imagine it.

> If you have a spreadsheet, and create a formula =SUM(A1:A50)
> the expected behaviour is to just skip anything that is not a
> valid number, not raise an error. Sometimes correct
> application-level behaviour is to just ignore input that it
> doesn't care about.
>
> One of the things that I used to despise with a passion about
> MYOB is that if you clicked on the window outside of a button
> or field, it would scream at you "ERROR ERROR ERROR - that was
> not a button or field!!!!" That is to say, it would beep. I
> mean, *who fscking cares* that the user clicks in the window
> background? It's a harmless tick, like tapping your desk, just
> ignore it.


What happens in Word during a Mail Merge if an invalid field is
in the data file, one you don't even care about? You get to click
on a modal dialog box for every record you're merging, just to
say IGNORE. And you can't quit.

> As a general rule, library functions should be strict about
> reporting errors, while applications may be more forgiving
> about errors that they don't care about. The "don't care about"
> part is important though -- your word processor shouldn't care
> about low level networking errors, but it should care if it
> can't save a file to a network drive.


You have to draw the line somewhere in any case, and drawing it
all the way over to IGNORE is bound to be right some of the time.
It would be, I guess, Cargo Cult programming to never ignore errors.

--
Neil Cerutti
 
Steven D'Aprano
12-07-2012
On Thu, 06 Dec 2012 23:14:17 +1100, Chris Angelico wrote:

> Setting up the try/except is a constant time cost,


It's not just constant time, it's constant time and *cheap*. Doing
nothing inside a try block takes about twice as long as doing nothing:

[steve@ando ~]$ python2.7 -m timeit "try: pass
> except: pass"

10000000 loops, best of 3: 0.062 usec per loop

[steve@ando ~]$ python2.7 -m timeit "pass"
10000000 loops, best of 3: 0.0317 usec per loop
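
For comparison, the membership test can be timed the same way (the
one-entry dict below is just an arbitrary thing to look up; the actual
numbers will vary by machine and Python version):

python2.7 -m timeit -s "d = {'k': 1}" "'k' in d"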


> while the duplicated
> search for k inside the dictionary might depend on various other
> factors.


It depends on the type, size and even the history of the dict, as well as
the number, type and values of the keys. Assuming a built-in dict, we can
say that in the absence of many collisions, key lookup can be amortized
over many lookups as constant time.


> In the specific case of a Python dictionary, the membership
> check is fairly cheap (assuming you're not the subject of a hash
> collision attack - Py3.3 makes that a safe assumption),


Don't be so sure -- the hash randomization algorithm for Python 3.3 is
trivially beaten by an attacker.

http://bugs.python.org/issue14621#msg173455

but in general, yes, key lookup in dicts is fast. But not as fast as
setting up a try block.

Keep in mind too that the "Look Before You Leap" strategy is
fundamentally unsound if you are using threads:

# in main thread:
if key in mydict:    # returns True
    x = mydict[key]  # fails with KeyError

How could this happen? In the fraction of a second between checking
whether the key exists and actually looking up the key, another thread
could delete it! This is a classic race condition, also known as a Time
Of Check To Time Of Use bug.
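
Sketched minimally with the same mydict and key, the exception-based
version doesn't have that window, because the lookup and the "check"
are a single operation:

# in main thread:
try:
    x = mydict[key]
except KeyError:
    pass  # another thread deleted it; decide here what to do about that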


--
Steven
 
Terry Reedy
12-08-2012
On 12/7/2012 5:16 PM, Steven D'Aprano wrote:
> On Thu, 06 Dec 2012 23:14:17 +1100, Chris Angelico wrote:
>
>> Setting up the try/except is a constant time cost,

>
> It's not just constant time, it's constant time and *cheap*. Doing
> nothing inside a try block takes about twice as long as doing nothing:
>
> [steve@ando ~]$ python2.7 -m timeit "try: pass
>> except: pass"

> 10000000 loops, best of 3: 0.062 usec per loop
>
> [steve@ando ~]$ python2.7 -m timeit "pass"
> 10000000 loops, best of 3: 0.0317 usec per loop
>
>
>> while the duplicated
>> search for k inside the dictionary might depend on various other
>> factors.

>
> It depends on the type, size and even the history of the dict, as well as
> the number, type and values of the keys. Assuming a built-in dict, we can
> say that in the absence of many collisions, key lookup can be amortized
> over many lookups as constant time.
>
>
>> In the specific case of a Python dictionary, the membership
>> check is fairly cheap (assuming you're not the subject of a hash
>> collision attack - Py3.3 makes that a safe assumption),

>
> Don't be so sure -- the hash randomization algorithm for Python 3.3 is
> trivially beaten by an attacker.
>
> http://bugs.python.org/issue14621#msg173455
>
> but in general, yes, key lookup in dicts is fast. But not as fast as
> setting up a try block.
>
> Keep in mind too that the "Look Before You Leap" strategy is
> fundamentally unsound if you are using threads:
>
> # in main thread:
> if key in mydict:    # returns True
>     x = mydict[key]  # fails with KeyError
>
> How could this happen? In the fraction of a second between checking
> whether the key exists and actually looking up the key, another thread
> could delete it! This is a classic race condition, also known as a Time
> Of Check To Time Of Use bug.


I generally agree with everything Steven has said here and in previous
responses and add the following.

There are two reasons to not execute a block of code.

1. It could and would run, but we do not want it to run because a) we do
not want an answer, even if correct; b) it would return a wrong answer
(which of course we do not want); or c) it would run forever and never
give any answer. To not run code, for any of these reasons, requires an
if statement.

2. It will not run but will raise an exception instead. In this case, we
can always use try-except. Sometimes we can detect that it would not run
before running it, and can use an if statement instead. (But as Steven
points out, this is sometimes trickier than it might seem.) However,
even if we can reliably detect that code would either run or raise an
exception, this often or even usually requires doing redundant calculation.

For example, 'key in mydict' must hash the key, mod the hash according
to the size of the dict, find the corresponding slot in the dict, and do
an equality comparison with the existing key in the dict. If not equal,
repeat according to the collision algorithm for inserting keys.

In other words, 'key in mydict' does everything done by 'mydict[key]'
except to actually fetch the value when the right slot is found or raise
an exception if there is no right slot.

So why ever use a redundant condition check? A. esthetics. B.
practicality. Unfortunately, catching exceptions may be and often is as
slow as the redundant check and even multiple redundant checks.
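
To make the redundancy concrete (the dict and key below are invented
just for illustration): the check-first version probes the dict twice
on the common path, the exception version only once.

d = {"spam": 1}

# Look Before You Leap: two lookups of the same key when it is present.
if "spam" in d:
    value = d["spam"]

# Easier to Ask Forgiveness: one lookup; the exception machinery only
# costs anything when the key is actually missing.
try:
    value = d["spam"]
except KeyError:
    value = None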

--
Terry Jan Reedy

 
Chris Angelico
12-08-2012
On Sat, Dec 8, 2012 at 6:01 PM, Terry Reedy <(E-Mail Removed)> wrote:
> Unfortunately, catching exceptions may be and often is as slow as the
> redundant check and even multiple redundant checks.


It depends on how often you're going to catch and how often just flow
through. In Python, as in most other modern languages, exceptions only
cost you when they get thrown. The extra check, though, costs you in
the normal case.

ChrisA
 
MRAB
12-08-2012
On 2012-12-08 07:17, Chris Angelico wrote:
> On Sat, Dec 8, 2012 at 6:01 PM, Terry Reedy <(E-Mail Removed)> wrote:
>> Unfortunately, catching exceptions may be and often is as slow as the
>> redundant check and even multiple redundant checks.

>
> It depends on how often you're going to catch and how often just flow
> through. In Python, as in most other modern languages, exceptions only
> cost you when they get thrown. The extra check, though, costs you in
> the normal case.
>

That's where the .get method comes in handy:

MISSING = object()
...
value = my_dict.get(key, MISSING)
if value is not MISSING:
    ...

It could be faster if the dict often doesn't contain the key.
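
A short usage note (my_dict below is invented just for illustration):
the private sentinel matters when None could itself be a stored value,
because a plain my_dict.get(key) could not then distinguish "missing"
from "stored None".

my_dict = {"a": None}

MISSING = object()
print(my_dict.get("a", MISSING) is MISSING)  # False: "a" exists, its value just happens to be None
print(my_dict.get("b", MISSING) is MISSING)  # True: "b" really is missing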
 
Ramchandra Apte
12-09-2012
On Thursday, 6 December 2012 17:44:17 UTC+5:30, Chris Angelico wrote:
> On Thu, Dec 6, 2012 at 10:47 PM, Steven D'Aprano
> <(E-Mail Removed)> wrote:
> > Not so. Which one is faster will depend on how often you expect to fail.
> >
> > If the keys are nearly always present, then:
> >
> > try:
> >     do_stuff(mydict[k])
> > except KeyError:
> >     pass
> >
> > will be faster. Setting up a try block is very fast, about as fast as
> > "pass", and faster than "if k in mydict".
> >
> > But if the key is often missing, then catching the exception will be
> > slow, and the "if k in mydict" version may be faster. It depends on how
> > often the key is missing.
>
> Setting up the try/except is a constant time cost, while the
> duplicated search for k inside the dictionary might depend on various
> other factors. In the specific case of a Python dictionary, the
> membership check is fairly cheap (assuming you're not the subject of a
> hash collision attack - Py3.3 makes that a safe assumption), but if
> you were about to execute a program and wanted to first find out if it
> existed, that extra check could be ridiculously expensive, eg if the
> path takes you on a network drive - or, worse, on multiple network
> drives, which I have had occasion to do!
>
> ChrisA

Not really. I remember a bug report saying that only 256 hashes of known texts were required before the randomization becomes useless.
 
Chris Angelico
12-09-2012
On Sun, Dec 9, 2012 at 2:07 PM, Ramchandra Apte <(E-Mail Removed)> wrote:
> Not really. I remember a bug report saying that only 256 hashes of known texts were required before the randomization becomes useless.


That requires that someone be able to get you to hash some text and
give back the hash. In any case, even if you _are_ dealing with the
worst-case hash collision attack, all it does is stop a Python
dictionary from being an exception to the general principle. If you're
doing a lookup in, say, a tree, then checking if the element exists
and then retrieving it means walking the tree twice - O(log n) if the
tree's perfectly balanced, though a splay tree would be potentially
quite efficient at that particular case. But there's still extra cost
to the check.
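
To illustrate the double walk with something from the standard library
(a toy sorted-list mapping built on bisect, standing in for the tree;
the class below is invented just for this sketch):

import bisect

class SortedMap(object):
    """Toy mapping over a sorted list: every lookup is one binary search."""
    def __init__(self, pairs):
        pairs = sorted(pairs)
        self._keys = [k for k, v in pairs]
        self._values = [v for k, v in pairs]

    def __contains__(self, key):
        i = bisect.bisect_left(self._keys, key)
        return i < len(self._keys) and self._keys[i] == key

    def __getitem__(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        raise KeyError(key)

m = SortedMap([("spam", 1), ("eggs", 2)])

# Check-then-fetch: two O(log n) searches when the key is present.
if "spam" in m:
    x = m["spam"]

# Catch the exception instead: one search, and the exception only
# costs anything when the key is absent.
try:
    x = m["eggs"]
except KeyError:
    x = None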

ChrisA
 