Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > SQL and CSV

Reply
Thread Tools

SQL and CSV

 
 
Nick
Guest
Posts: n/a
 
      05-05-2009
I have a requirement to read a CSV file. Normally, no problem, just
import CSV and slurp the file up.

However, in this case I want to filter out lines that have fields set
to particular values.

It would be neat to be able to do something like this.

select * from test.csv where status <> "Canceled"

Using adodb I can do this, so long as I don't have the where clause. :-
(

Is there a reasonable lightweight way of doing this in Python?

I could write some python code that is used to filter rows, and inport
that from config, but it's not quite as elegant as an SQL route.

Thanks

Nick
 
Reply With Quote
 
 
 
 
Tim Golden
Guest
Posts: n/a
 
      05-05-2009
Nick wrote:
> I have a requirement to read a CSV file. Normally, no problem, just
> import CSV and slurp the file up.
>
> However, in this case I want to filter out lines that have fields set
> to particular values.
>
> It would be neat to be able to do something like this.
>
> select * from test.csv where status <> "Canceled"
>
> Using adodb I can do this, so long as I don't have the where clause. :-
> (
>
> Is there a reasonable lightweight way of doing this in Python?
>
> I could write some python code that is used to filter rows, and inport
> that from config, but it's not quite as elegant as an SQL route.



Not entirely clear what you are and aren't prepared to try here, but...
the most obvious Python-based way to do this is treating the csv reader
as an iterator and filtering there. Your last line suggests that's not
what you want but just in case I've misunderstood:

<test.csv>
id,code,status
1,"ONE","Active"
2,"TWO","Cancelled"
3,"THREE","Active"
</test.csv>

<code>
import csv

for row in csv.DictReader (open ("c:/temp/test.csv", "rb")):
if row['status'] != 'Cancelled':
print row

</code>

Doesn't seem too onerous, and could obviously be wrapped in
some useful class/module.

But if you really want to go the SQL route, I believe there are
ODBC adapters for CSV which, combined with PyODBC or CeODBC,
would probably take you where you want to go.

TJG
 
Reply With Quote
 
 
 
 
Nick
Guest
Posts: n/a
 
      05-05-2009
On May 5, 5:19*pm, Tim Golden <(E-Mail Removed)> wrote:
> Nick wrote:
> > I have a requirement to read a CSV file. Normally, no problem, just
> > import CSV and slurp the file up.

>
> > However, in this case I want to filter out lines that have fields set
> > to particular values.

>
> > It would be neat to be able to do something like this.

>
> > select * from test.csv where status <> "Canceled"

>
> > Using adodb I can do this, so long as I don't have the where clause. :-
> > (

>
> > Is there a reasonable lightweight way of doing this in Python?

>
> > I could write some python code that is used to filter rows, and inport
> > that from config, but it's not quite as elegant as an SQL route.

>
> Not entirely clear what you are and aren't prepared to try here, but...
> the most obvious Python-based way to do this is treating the csv reader
> as an iterator and filtering there. Your last line suggests that's not
> what you want but just in case I've misunderstood:
>
> <test.csv>
> id,code,status
> 1,"ONE","Active"
> 2,"TWO","Cancelled"
> 3,"THREE","Active"
> </test.csv>
>
> <code>
> import csv
>
> for row in csv.DictReader (open ("c:/temp/test.csv", "rb")):
> * if row['status'] != 'Cancelled':
> * * print row
>
> </code>
>
> Doesn't seem too onerous, and could obviously be wrapped in
> some useful class/module.
>
> But if you really want to go the SQL route, I believe there are
> ODBC adapters for CSV which, combined with PyODBC or CeODBC,
> would probably take you where you want to go.
>
> TJG


Part of the problem is that the 'selection' needs to be in a config
file. I can put the if row['status'] != 'Cancelled': return True into
a config, read it and eval it, but its not quite as clean as an sql
route.

Nick
 
Reply With Quote
 
Matimus
Guest
Posts: n/a
 
      05-05-2009
On May 5, 9:25*am, Nick <(E-Mail Removed)> wrote:
> On May 5, 5:19*pm, Tim Golden <(E-Mail Removed)> wrote:
>
>
>
> > Nick wrote:
> > > I have a requirement to read a CSV file. Normally, no problem, just
> > > import CSV and slurp the file up.

>
> > > However, in this case I want to filter out lines that have fields set
> > > to particular values.

>
> > > It would be neat to be able to do something like this.

>
> > > select * from test.csv where status <> "Canceled"

>
> > > Using adodb I can do this, so long as I don't have the where clause. :-
> > > (

>
> > > Is there a reasonable lightweight way of doing this in Python?

>
> > > I could write some python code that is used to filter rows, and inport
> > > that from config, but it's not quite as elegant as an SQL route.

>
> > Not entirely clear what you are and aren't prepared to try here, but...
> > the most obvious Python-based way to do this is treating the csv reader
> > as an iterator and filtering there. Your last line suggests that's not
> > what you want but just in case I've misunderstood:

>
> > <test.csv>
> > id,code,status
> > 1,"ONE","Active"
> > 2,"TWO","Cancelled"
> > 3,"THREE","Active"
> > </test.csv>

>
> > <code>
> > import csv

>
> > for row in csv.DictReader (open ("c:/temp/test.csv", "rb")):
> > * if row['status'] != 'Cancelled':
> > * * print row

>
> > </code>

>
> > Doesn't seem too onerous, and could obviously be wrapped in
> > some useful class/module.

>
> > But if you really want to go the SQL route, I believe there are
> > ODBC adapters for CSV which, combined with PyODBC or CeODBC,
> > would probably take you where you want to go.

>
> > TJG

>
> Part of the problem is that the 'selection' needs to be in a config
> file. I can put the if row['status'] != 'Cancelled': return True into
> a config, read it and eval it, but its not quite as clean as an sql
> route.
>
> Nick


Well, if you are using 2.5.x you could always stuff it into a sqlite
in-memory database, and then execute a SQL query. Heck, you don't even
_need_ 2.5, but in 2.5 sqlite is part of the distribution.

Matt
 
Reply With Quote
 
Tim Golden
Guest
Posts: n/a
 
      05-05-2009
Nick wrote:
> Part of the problem is that the 'selection' needs to be in a config
> file. I can put the if row['status'] != 'Cancelled': return True into
> a config, read it and eval it, but its not quite as clean as an sql
> route.



Still not clear what the restriction is. If you were writing
SQL you'd have to read *something* from your config file,
unless you're suggesting that the "config file" is in fact
a SQL file. Which is one way of doing it, but then you might
just as well have your config file as a Python file and
import it.

Have I missed the point somewhere here? Can you give an
example -- even a fictional one -- of what you couldn't
do using, say, the example I gave earlier?

TJG
 
Reply With Quote
 
Nick
Guest
Posts: n/a
 
      05-07-2009
On May 5, 8:27*pm, Tim Golden <(E-Mail Removed)> wrote:
> Nick wrote:
> > Part of the problem is that the 'selection' needs to be in a config
> > file. I can put the if row['status'] != 'Cancelled': return True into
> > a config, read it and eval it, but its not quite as clean as ansql
> > route.

>
> Still not clear what the restriction is. If you were writingSQLyou'd have to read *something* from your config file,
> unless you're suggesting that the "config file" is in fact
> aSQLfile. Which is one way of doing it, but then you might
> just as well have your config file as a Python file and
> import it.
>
> Have I missed the point somewhere here? Can you give an
> example -- even a fictional one -- of what you couldn't
> do using, say, the example I gave earlier?
>
> TJG


Solution found. In the end I used SQLite to read from a csv file, and
now I can query the CSV file. The file is read using the csv module

First create a function

def fraction(p, denom):
num, frac = p.split ('-')
return float (num) + float (frac) / denom

for use within queries.

Now build the class.

self.filename = filename
self.dialect = dialect
self.query = query
reader = csv.reader (open (filename, 'r'))
self.connection = sqlite.connect(":memory:")
self.connection.create_function("fraction", 2, fraction) #
Adds in function
self.cursor = self.connection.cursor()
first = True
for row in reader:
if first:
headers = []
for r in row:
n = r.strip().replace (' ', '_').replace ('-','_')
headers.append (n)
command = 'create table csv (%s)' % ','.join (headers)
self.cursor.execute (command)
first = False
else:
command = 'insert into csv values ("%s")' % '","'.join
(row)
self.cursor.execute (command)

and then I can use this

self.cursor.execute (self.query)
rows = self.cursor.fetchall()
headers = []
for r in self.cursor.description:
headers.append (r[0])
results = Results.Results (headers, self.name, {})
i = 0
for row in rows:
results.add (row, i)
i = i + 1
return results

to query the results.

Results.Results is one of my classes that's reused in lots of places.

The query then looks somethign like this

select
Client_Reference_Number as TrdNbr,
Asset_Number as ISIN,
Quantity as Qty,
status
from
csv
where status in ("CANCELLED")

union

select
Client_Reference_Number as TrdNbr,
Asset_Number as ISIN,
Quantity as Qty,
status
from
csv
where status not in ("CANCELLED")


All incredibly neat and the first time I've used SQLite.

nick
 
Reply With Quote
 
John Machin
Guest
Posts: n/a
 
      05-08-2009
On May 8, 1:45*am, Nick <(E-Mail Removed)> wrote:
> On May 5, 8:27*pm, Tim Golden <(E-Mail Removed)> wrote:
>
>
>
> > Nick wrote:
> > > Part of the problem is that the 'selection' needs to be in a config
> > > file. I can put the if row['status'] != 'Cancelled': return True into
> > > a config, read it and eval it, but its not quite as clean as ansql
> > > route.

>
> > Still not clear what the restriction is. If you were writingSQLyou'd have to read *something* from your config file,
> > unless you're suggesting that the "config file" is in fact
> > aSQLfile. Which is one way of doing it, but then you might
> > just as well have your config file as a Python file and
> > import it.

>
> > Have I missed the point somewhere here? Can you give an
> > example -- even a fictional one -- of what you couldn't
> > do using, say, the example I gave earlier?

>
> > TJG

>
> Solution found. In the end I used SQLite to read from a csv file, and
> now I can query the CSV file. The file is read using the csv module
>
> First create a function
>
> def fraction(p, denom):
> * * num, frac = p.split ('-')
> * * return float (num) + float (frac) / denom
>
> for use within queries.
>
> Now build the class.
>
> * * * * self.filename *= filename
> * * * * self.dialect * = dialect
> * * * * self.query * * = query
> * * * * reader = csv.reader (open (filename, 'r'))
> * * * * self.connection = sqlite.connect(":memory:")
> * * * * self.connection.create_function("fraction", 2, fraction) #
> Adds in function
> * * * * self.cursor = self.connection.cursor()
> * * * * first = True
> * * * * for row in reader:
> * * * * * * if first:
> * * * * * * * * headers = []
> * * * * * * * * for r in row:
> * * * * * * * * * * n = r.strip().replace (' ', '_').replace ('-','_')
> * * * * * * * * * * headers.append (n)
> * * * * * * * * command = 'create table csv (%s)' % ','..join (headers)
> * * * * * * * * self.cursor.execute (command)
> * * * * * * * * first = False
> * * * * * * else:
> * * * * * * * * command = 'insert into csv values ("%s")' % '","'.join
> (row)
> * * * * * * * * self.cursor.execute (command)
>
> and then I can use this
>
> * * * * self.cursor.execute (self.query)
> * * * * rows = self.cursor.fetchall()
> * * * * headers = []
> * * * * for r in self.cursor.description:
> * * * * * * headers.append (r[0])
> * * * * results = Results.Results (headers, self.name, {})
> * * * * i = 0
> * * * * for row in rows:
> * * * * * * results.add (row, i)
> * * * * * * i = i + 1
> * * * * return results
>
> to query the results.
>
> Results.Results is one of my classes that's reused in lots of places.
>
> The query then looks somethign like this
>
> * * * * * * * * select
> * * * * * * * * * * Client_Reference_Number as TrdNbr,
> * * * * * * * * * * Asset_Number as ISIN,
> * * * * * * * * * * Quantity as Qty,
> * * * * * * * * * * status
> * * * * * * * * from
> * * * * * * * * * * csv


The remaining lines of your SELECT statement are incredibly redundant
AFAICT. It seems you have pushed the contents of your csv file into a
data base and pulled them ALL out again ... not what I'd call a
"query". What's the point?


> * * * * * * * * where status in ("CANCELLED")
>
> * * * * * * * * union
>
> * * * * * * * * select
> * * * * * * * * * * Client_Reference_Number as TrdNbr,
> * * * * * * * * * * Asset_Number as ISIN,
> * * * * * * * * * * Quantity as Qty,
> * * * * * * * * * * status
> * * * * * * * * from
> * * * * * * * * * * csv
> * * * * * * * * where status not in ("CANCELLED")
>
> All incredibly neat and the first time I've used SQLite.
>
> nick


 
Reply With Quote
 
Dennis Lee Bieber
Guest
Posts: n/a
 
      05-08-2009
On Thu, 7 May 2009 08:45:06 -0700 (PDT), Nick <(E-Mail Removed)>
declaimed the following in gmane.comp.python.general:


> The query then looks somethign like this
>
> select
> Client_Reference_Number as TrdNbr,
> Asset_Number as ISIN,
> Quantity as Qty,
> status
> from
> csv
> where status in ("CANCELLED")
>
> union
>
> select
> Client_Reference_Number as TrdNbr,
> Asset_Number as ISIN,
> Quantity as Qty,
> status
> from
> csv
> where status not in ("CANCELLED")
>

Pardon? The union of "in ('CANCELLED')" and "not in ('CANCELLED')"
will be ALL records... -- possibly ordered with the "cancelled" first...

I'd probably just use

select ... from csv
order by status;

to group the values.
--
Wulfraed Dennis Lee Bieber KD6MOG
http://www.velocityreviews.com/forums/(E-Mail Removed) (E-Mail Removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (E-Mail Removed))
HTTP://www.bestiaria.com/

 
Reply With Quote
 
Peter Otten
Guest
Posts: n/a
 
      05-08-2009
Nick wrote:

> self.cursor = self.connection.cursor()
> first = True
> for row in reader:
> if first:
> headers = []
> for r in row:
> n = r.strip().replace (' ', '_').replace ('-','_')
> headers.append (n)
> command = 'create table csv (%s)' % ','.join (headers)
> self.cursor.execute (command)
> first = False
> else:
> command = 'insert into csv values ("%s")' % '","'.join
> (row)
> self.cursor.execute (command)
>


You can simplify that a bit:

cursor = self.cursor = self.connection.cursor()

first_row = next(reader)
headers = [column.strip().replace(" ", "_").replace("-", "_") for column in
first_row]
cursor.execute("create table csv (%s)" % ", ".join(headers))

placeholders = ", ".join("?"*len(headers))
command = "insert into csv values (%s)" % placeholders
cursor.executemany(command, reader)

While it may not matter here using placeholders instead of manually escaping
user-provided values is a good habit to get into.

> self.cursor.execute (self.query)
> rows = self.cursor.fetchall()


rows = self.cursor.execute(self.query)

doesn't build an intermediate list.

> i = 0
> for row in rows:
> results.add (row, i)
> i = i + 1


This is written

for i, row in enumerate(rows):
results.add(row, i)

in idiomatic Python.

Peter

 
Reply With Quote
 
Lawrence D'Oliveiro
Guest
Posts: n/a
 
      05-08-2009
In message <gu0ofm$oj9$00$(E-Mail Removed)-online.com>, Peter Otten wrote:

> While it may not matter here using placeholders instead of manually
> escaping user-provided values is a good habit to get into.


Until you hit things it can't deal with.

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
csv read clean up and write out to csv Sacha Rook Python 2 11-02-2012 07:51 PM
read and write csv file using csv module jliu66 Python 0 10-19-2007 03:12 PM
How to move data from a CSV file to a JTable, and from a JTable to a CSV file ? Tintin92 Java 1 02-14-2007 06:51 PM
Re: csv writerow creates double spaced excel csv files Skip Montanaro Python 0 02-13-2004 08:50 PM
csv writerow creates double spaced excel csv files Michal Mikolajczyk Python 0 02-13-2004 08:38 PM



Advertisments