Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > Sqlite3. Substitution of names in query.

Reply
Thread Tools

Sqlite3. Substitution of names in query.

 
 
Dennis Lee Bieber
Guest
Posts: n/a
 
      11-01-2009
On Sun, 01 Nov 2009 19:08 +1300, Lawrence D'Oliveiro
<(E-Mail Removed)_zealand> declaimed the following in
gmane.comp.python.general:

> On the grounds that Python has more general and powerful string parameter-
> substitution mechanisms than anything built into any database API.


Really? In the case of MySQLdb, the DB-API /uses/ Pythons string
interpolation -- but first does all needed escaping of special
characters followed by putting quotes around the data. Of course, MySQL
didn't have parameterized/prepared queries until recently, so the API
has to submit complete SQL statements for every operation (this may
change in the future, but then means one can not run against a v4.x
MySQL engine)

Since SQLite uses prepared/parameterized queries, the parameters are
passed in a totally different means, and the engine itself ensures the
data can't be jiggered to look like a set of statements.

In both cases, the only thing that changes in the SQL is placeholder
-- for SQLite, the API uses "?", and for MySQL the API uses "%s"... But
the spec for the API also states that one can retrieve the parameter
marker dynamically. So one could moderately easily write code that is
engine agnostic... Something not that easy to do for data items if one
is going to ignore the DB-API parameter passing scheme totally.
--
Wulfraed Dennis Lee Bieber KD6MOG
http://www.velocityreviews.com/forums/(E-Mail Removed) HTTP://wlfraed.home.netcom.com/

 
Reply With Quote
 
 
 
 
Carsten Haese
Guest
Posts: n/a
 
      11-01-2009
Lawrence D'Oliveiro wrote:
>> On what grounds are you asserting that it's not necessary to mix the
>> two? Please elaborate your point.

>
> On the grounds that Python has more general and powerful string parameter-
> substitution mechanisms than anything built into any database API.


That statement is fundamentally flawed. You are assuming that the
preferred way of getting a value into a query is by substituting a
literal into the query string. That is, in general, not true, because
that would be horribly inefficient. This is also why I despise the term
"parameter substitution", since it implies incorrectly that passing
parameters to a query is merely a string formatting exercise. The
correct term is "parameter binding."

Most databases actually provide an API for supplying parameters
separately from the query string. This is more efficient, because it
eliminates the need to render the parameter value into a literal form on
the client side and to parse the literal form on the server side. Also,
it allows the engine to perform the same query multiple times with
different values without having to re-parse the query.

Finally, you're assuming that every value that can be supplied to a
query actually HAS a literal form. That is not true. For example, in
Informix databases, there are no literals for BYTE-type values. (You'd
probably call them blobs.) So, if vomiting literals into the query
string were your only way of conveying values to the database, you'd
never be able to populate a BYTE column on an Informix database. The
only way to pass a BYTE value to an Informix database is by parameter
binding.

Since parameter binding is in general much more than string
substitution, it is indeed necessary to mix the two.

--
Carsten Haese
http://informixdb.sourceforge.net

 
Reply With Quote
 
 
 
 
Lawrence D'Oliveiro
Guest
Posts: n/a
 
      11-01-2009
In message <(E-Mail Removed)>, Carsten
Haese wrote:

> Lawrence D'Oliveiro wrote:
>
>> In message <(E-Mail Removed)>,
>> Carsten Haese wrote:

>
>>> On what grounds are you asserting that it's not necessary to mix the
>>> two? Please elaborate your point.

>>
>> On the grounds that Python has more general and powerful string
>> parameter- substitution mechanisms than anything built into any database
>> API.

>
> That statement is fundamentally flawed. You are assuming that the
> preferred way of getting a value into a query is by substituting a
> literal into the query string. That is, in general, not true, because
> that would be horribly inefficient.


Says someone who hasn't realized where the real inefficiencies are. Remember
what Tony Hoare told us: "premature optimization is the root of all evil".
These are databases we're talking about. Real-world databases are large, and
reside on disk, which is several orders of magnitude slower than RAM. And
RAM is where string parameter substitutions take place. So a few hundred
extra RAM accesses isn't going to make any significant difference to the
speed of database queries.

> Finally, you're assuming that every value that can be supplied to a
> query actually HAS a literal form. That is not true. For example, in
> Informix databases, there are no literals for BYTE-type values.


Probably why I don't use Informix. What use is a binary data type if you
can't insert and retrieve binary data values?

 
Reply With Quote
 
Carsten Haese
Guest
Posts: n/a
 
      11-01-2009
Lawrence D'Oliveiro wrote:
> Says someone who hasn't realized where the real inefficiencies are. Remember
> what Tony Hoare told us: "premature optimization is the root of all evil".
> These are databases we're talking about. Real-world databases are large, and
> reside on disk, which is several orders of magnitude slower than RAM. And
> RAM is where string parameter substitutions take place. So a few hundred
> extra RAM accesses isn't going to make any significant difference to the
> speed of database queries.


You're just not getting it. The cost is not in performing the parameter
substitutions themselves. The cost is in parsing what's essentially the
same query one million times over when it could have been parsed only
once. You might find an increase of seven orders of magnitude
insignificant, but I don't.

> Probably why I don't use Informix. What use is a binary data type if you
> can't insert and retrieve binary data values?


You CAN insert and retrieve binary data values. You just have to use the
right tool for the job, and that is parameter binding.

--
Carsten Haese
http://informixdb.sourceforge.net

 
Reply With Quote
 
Lawrence D'Oliveiro
Guest
Posts: n/a
 
      11-02-2009
In message <(E-Mail Removed)>, Carsten
Haese wrote:

> Lawrence D'Oliveiro wrote:
>
>> Says someone who hasn't realized where the real inefficiencies are.
>> Remember what Tony Hoare told us: "premature optimization is the root of
>> all evil". These are databases we're talking about. Real-world databases
>> are large, and reside on disk, which is several orders of magnitude
>> slower than RAM. And RAM is where string parameter substitutions take
>> place. So a few hundred extra RAM accesses isn't going to make any
>> significant difference to the speed of database queries.

>
> You're just not getting it. The cost is not in performing the parameter
> substitutions themselves. The cost is in parsing what's essentially the
> same query one million times over when it could have been parsed only
> once. You might find an increase of seven orders of magnitude
> insignificant, but I don't.


There is no such parsing overhead. I speak from experience.

Look at the BulkInserter class here
<http://www.codecodex.com/wiki/Useful_MySQL_Routines>. I have successfully
used that to insert tens of thousands of records in just a few seconds. Yet
it makes no use of the parameter-substitution provided by MySQLdb; it is all
just straight Python, including the SQLString routine (also on that page),
which goes through every single character of each string value to decide
what needs escaping. Python can do all that, and do it fast.

You don't get to figure out what's efficient and what's not by mere hand-
waving; you have to do actual real-world tests.
 
Reply With Quote
 
Lawrence D'Oliveiro
Guest
Posts: n/a
 
      11-02-2009
In message <(E-Mail Removed)>, Dennis Lee Bieber wrote:

> On Sun, 01 Nov 2009 19:08 +1300, Lawrence D'Oliveiro
> <(E-Mail Removed)_zealand> declaimed the following in
> gmane.comp.python.general:
>
>> On the grounds that Python has more general and powerful string
>> parameter- substitution mechanisms than anything built into any database
>> API.

>
> Really? In the case of MySQLdb, the DB-API /uses/ Pythons string
> interpolation ...


Only a limited subset thereof. For instance, I'm not aware of any database
API that lets me do this:

sql.cursor.execute \
(
"update numbers set flags = flags | %(setflags)u where projectid = %(projectid)s"
"%(match_listid)s and number = %(number)s"
%
{
"projectid" : SQLString(ProjectID),
"match_listid" :
("", " and listid = %s" % SQLString(ListID))[ListID != None],
"number" : SQLString(number),
"setflags" : flags,
}
)

 
Reply With Quote
 
Robert Kern
Guest
Posts: n/a
 
      11-02-2009
On 2009-11-02 14:47 PM, Lawrence D'Oliveiro wrote:
> In message<(E-Mail Removed)>, Dennis Lee Bieber wrote:
>
>> On Sun, 01 Nov 2009 19:08 +1300, Lawrence D'Oliveiro
>> <(E-Mail Removed)_zealand> declaimed the following in
>> gmane.comp.python.general:
>>
>>> On the grounds that Python has more general and powerful string
>>> parameter- substitution mechanisms than anything built into any database
>>> API.

>>
>> Really? In the case of MySQLdb, the DB-API /uses/ Pythons string
>> interpolation ...

>
> Only a limited subset thereof. For instance, I'm not aware of any database
> API that lets me do this:
>
> sql.cursor.execute \
> (
> "update numbers set flags = flags | %(setflags)u where projectid = %(projectid)s"
> "%(match_listid)s and number = %(number)s"
> %
> {
> "projectid" : SQLString(ProjectID),
> "match_listid" :
> ("", " and listid = %s" % SQLString(ListID))[ListID != None],
> "number" : SQLString(number),
> "setflags" : flags,
> }
> )


When programmatically generating SQL, I like to use SQLAlchemy. This use case is
handled with .where() on Update expression objects. Personally, I find
manipulating the SQLAlchemy expression objects clearer, safer, and more portable
than building the raw SQL through string interpolation.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

 
Reply With Quote
 
Lawrence D'Oliveiro
Guest
Posts: n/a
 
      11-03-2009
In message <(E-Mail Removed)>, Robert
Kern wrote:

> On 2009-11-02 14:47 PM, Lawrence D'Oliveiro wrote:
>
>> For instance, I'm not aware of any database API that lets me do this:
>>
>> sql.cursor.execute \
>> (
>> "update numbers set flags = flags | %(setflags)u where
>> projectid = %(projectid)s" "%(match_listid)s and number =
>> %(number)s"
>> %
>> {
>> "projectid" : SQLString(ProjectID),
>> "match_listid" :
>> ("", " and listid = %s" % SQLString(ListID))[ListID
>> != None],
>> "number" : SQLString(number),
>> "setflags" : flags,
>> }
>> )

>
> When programmatically generating SQL, I like to use SQLAlchemy. This use
> case is handled with .where() on Update expression objects. Personally, I
> find manipulating the SQLAlchemy expression objects clearer, safer, and
> more portable than building the raw SQL through string interpolation.


Doesn't seem to support bulk insertions * la this
<http://www.codecodex.com/wiki/Useful_MySQL_Routines#Bulk_Insertion>. Or
even literal lists as per the SQLStringList routine on the same page.

 
Reply With Quote
 
Carsten Haese
Guest
Posts: n/a
 
      11-03-2009
Lawrence D'Oliveiro wrote:
> There is no such parsing overhead. I speak from experience.
>
> Look at the BulkInserter class here
> <http://www.codecodex.com/wiki/Useful_MySQL_Routines>. I have successfully
> used that to insert tens of thousands of records in just a few seconds. Yet
> it makes no use of the parameter-substitution provided by MySQLdb; it is all
> just straight Python, including the SQLString routine (also on that page),
> which goes through every single character of each string value to decide
> what needs escaping. Python can do all that, and do it fast.


With all due respect, but if your experience is exclusive to
MySQL/MySQLdb, your experience means very little for database
programming practices in general. Throughout most of its history, MySQL
did not support prepared statements and parameter binding, and MySQLdb
doesn't use any parameter binding API that might be available, so you're
comparing your own implementation of string interpolation to MySQLdb's
implementation of string interpolation. Your experience says nothing
about databases that have an *actual* parameter binding API.

> You don't get to figure out what's efficient and what's not by mere hand-
> waving;


I'm not handwaving.

> you have to do actual real-world tests.


I have.

See for example the timing test in
http://informixdb.blogspot.com/2007/...in-blanks.html . If you'd
like to try it for yourself, here is a version of the test for SQLite:

================================================== ===============
# querytest.py
class Tester(object):
def __init__(self):
import sqlite3
conn = sqlite3.connect(":memory:")
self.cur = conn.cursor()
self.cur.execute("create temp table t1(a int, b int)")
self.counter = 0

def with_params(self):
self.counter += 1
self.cur.execute("insert into t1 values(?,?)",
(self.counter,self.counter*2) )

def without_params(self):
self.counter += 1
self.cur.execute("insert into t1 values(%s,%s)" %
(self.counter,self.counter*2) )
================================================== ===============

And here are the corresponding results on my laptop:
$ python -mtimeit -s "from querytest import Tester; t=Tester()"
't.with_params()'
10000 loops, best of 3: 20.9 usec per loop
$ python -mtimeit -s "from querytest import Tester; t=Tester()"
't.without_params()'
10000 loops, best of 3: 36.2 usec per loop

So, you can say whatever you want, but you will never convince me that
string interpolation is better than parameter binding for getting
variable values into a query. Even if you don't accept my proof that it
is more efficient, you have not proved that parameter binding is less
efficient.

In addition to the efficiency factor, parameter binding is inherently
secure, whereas string interpolation is too easy to use insecurely.
Finally, parameter binding is the standard method, as defined by the SQL
standard, of getting variable values into a query.

You may call it "premature optimization", but I call it "choosing the
right tool for the job."

I assume that none of this will convince you, but that's fine. We'll
just agree to disagree on this.

--
Carsten Haese
http://informixdb.sourceforge.net

 
Reply With Quote
 
Dennis Lee Bieber
Guest
Posts: n/a
 
      11-03-2009
On Tue, 03 Nov 2009 09:41:10 +1300, Lawrence D'Oliveiro
<(E-Mail Removed)_zealand> declaimed the following in
gmane.comp.python.general:

> There is no such parsing overhead. I speak from experience.
>

<snort>

> Look at the BulkInserter class here
> <http://www.codecodex.com/wiki/Useful_MySQL_Routines>. I have successfully
> used that to insert tens of thousands of records in just a few seconds. Yet
> it makes no use of the parameter-substitution provided by MySQLdb; it is all


You picked the wrong database to use for your argument.

Prior to MySQL version 5, MySQL ITSELF did not support prepared
(that is, preparsed and parameterized) SQL with parameters passed
separately.

MySQLdb, being compatible with MySQL 3.x and 4.x (along with 5.x),
therefore, ITSELF creates full SQL statements using Python string
interpolation operations (the main reason it uses %s as the
placeholder).

So your main claim turns into:

"My Python string interpolation is just as fast as MySQLdb's Python
string interpolation"

Don't believe me? Just take a look at the MySQLdb Python code -- the
execute() method summarizes to (not the real names):

resultSQL = SQL_with_placeholders % escape_arguments(argument_tuple)
submit_to_MySQL(resultSQL)

where escape_arguments() generates a string representation of each
argument, escapes quotes and other dangerous characters, and then wraps
the stringified arguments with MySQL quotes (which is why you can't use
%f to specify a placeholder for floating numerics -- the numeric was
converted to a quoted string before getting fed to the %f!)
--
Wulfraed Dennis Lee Bieber KD6MOG
(E-Mail Removed) HTTP://wlfraed.home.netcom.com/

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Converting 'flat' gate level names to hierarchical names Paddy McCarthy VHDL 3 09-24-2004 05:34 PM
table field names vs. display names Bob ASP .Net 1 07-30-2004 05:06 PM
Matching attribute names to element names in a different path Carl XML 0 04-01-2004 01:15 PM
WSDL- Mapping Application Defined Names to XML Names Craig XML 0 02-09-2004 04:14 PM
XSL rules applying to XSD (XML schema) defined type names (as opposed to node names) Lewis G. Pringle, Jr. XML 0 09-30-2003 10:34 PM



Advertisments