Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > sync databse table based on current directory data without losignprevious values

Reply
Thread Tools

sync databse table based on current directory data without losignprevious values

 
 
33
Guest
Posts: n/a
 
      03-06-2013
I'am using this snipper to read a current directory and insert all filenames into a databse and then display them.

But what happens when files are get removed form the directory?
The inserted records into databse remain.
How can i update the databse to only contain the existing filenames without losing the previous stored data?

Here is what i ahve so far:

==================================
path = "/home/nikos/public_html/data/files/"

#read the containing folder and insert new filenames
for result in os.walk(path):
for filename in result[2]:
try:
#find the needed counter for the page URL
cur.execute('''SELECT URL FROM files WHERE URL = %s''', (filename,) )
data = cur.fetchone() #URL is unique, so should only be one

if not data:
#first time for file; primary key is automatic, hit is defaulted
cur.execute('''INSERT INTO files (URL, host, lastvisit) VALUES (%s, %s, %s)''', (filename, host, date) )
except MySQLdb.Error, e:
print ( "Query Error: ", sys.exc_info()[1].excepinfo()[2] )
======================

Thank you.
 
Reply With Quote
 
 
 
 
Lele Gaifax
Guest
Posts: n/a
 
      03-06-2013
Νίκος Γκρ33κ <> writes:

> How can i update the databse to only contain the existing filenames without losing the previous stored data?


Basically you need to keep a list (or better, a set) containing all
current filenames that you are going to insert, and finally do another
"inverse" loop where you scan all the records and delete those that are
not present anymore.

Of course, this assume you have a "bidirectional" identity between the
filenames you are loading and the records you are inserting, which is
not the case in the code you show:

> #read the containing folder and insert new filenames
> for result in os.walk(path):
> for filename in result[2]:


Here "filename" is just that, not the full path: this could result in
collisions, if your are actually loading a *tree* instead of a flat
directory, that is multiple source files are squeezed into a single
record in your database (imagine "/foo/index.html" and
"/foo/subdir/index.html").

With that in mind, I would do something like the following:

# Compute a set of current fullpaths
current_fullpaths = set()
for root, dirs, files in os.walk(path):
for fullpath in files:
current_fullpaths.add(os.path.join(root, file))

# Load'em
for fullpath in current_fullpaths:

try:
#find the needed counter for the page URL
cur.execute('''SELECT URL FROM files WHERE URL = %s''', (fullpath,) )
data = cur.fetchone() #URL is unique, so should only be one

if not data:
#first time for file; primary key is automatic, hit is defaulted
cur.execute('''INSERT INTO files (URL, host, lastvisit) VALUES (%s, %s, %s)''', (fullpath, host, date) )
except MySQLdb.Error, e:
print ( "Query Error: ", sys.exc_info()[1].excepinfo()[2] )

# Delete spurious
cur.execute('''SELECT url FROM files''')
for rec in cur:
fullpath = rec[0]
if fullpath not in current_fullpaths:
other_cur.execute('''DELETE FROM files WHERE url = %s''', (fullpath,))

Of course here I am assuming a lot (a typical thing we do to answer your
questions , in particular that the "url" field content matches the
filesystem layout, which may not be the case. Adapt it to your usecase.

hope this helps,
ciao, lele.
--
nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
| -- Fortunato Depero, 1929.

 
Reply With Quote
 
 
 
 
33
Guest
Posts: n/a
 
      03-06-2013
Τη Τετάρτη, 6 Μαρτίου 2013 10:19:06 π.μ. UTC+2, ο χρήστης Lele Gaifax *γραψε:
> Νίκος Γκρ33κ <> writes:
>
>
>
> > How can i update the databse to only contain the existing filenames without losing the previous stored data?

>
>
>
> Basically you need to keep a list (or better, a set) containing all
>
> current filenames that you are going to insert, and finally do another
>
> "inverse" loop where you scan all the records and delete those that are
>
> not present anymore.
>
>
>
> Of course, this assume you have a "bidirectional" identity between the
>
> filenames you are loading and the records you are inserting, which is
>
> not the case in the code you show:
>
>
>
> > #read the containing folder and insert new filenames

>
> > for result in os.walk(path):

>
> > for filename in result[2]:

>
>
>
> Here "filename" is just that, not the full path: this could result in
>
> collisions, if your are actually loading a *tree* instead of a flat
>
> directory, that is multiple source files are squeezed into a single
>
> record in your database (imagine "/foo/index.html" and
>
> "/foo/subdir/index.html").
>
>
>
> With that in mind, I would do something like the following:
>
>
>
> # Compute a set of current fullpaths
>
> current_fullpaths = set()
>
> for root, dirs, files in os.walk(path):
>
> for fullpath in files:
>
> current_fullpaths.add(os.path.join(root, file))
>
>
>
> # Load'em
>
> for fullpath in current_fullpaths:
>
>
>
> try:
>
> #find the needed counter for the page URL
>
> cur.execute('''SELECT URL FROM files WHERE URL = %s''', (fullpath,) )
>
> data = cur.fetchone() #URL is unique, so should only be one
>
>
>
> if not data:
>
> #first time for file; primary key is automatic, hit is defaulted
>
> cur.execute('''INSERT INTO files (URL, host, lastvisit) VALUES (%s, %s, %s)''', (fullpath, host, date) )
>
> except MySQLdb.Error, e:
>
> print ( "Query Error: ", sys.exc_info()[1].excepinfo()[2] )
>
>
>
> # Delete spurious
>
> cur.execute('''SELECT url FROM files''')
>
> for rec in cur:
>
> fullpath = rec[0]
>
> if fullpath not in current_fullpaths:
>
> other_cur.execute('''DELETE FROM files WHERE url = %s''', (fullpath,))
>
>
>
> Of course here I am assuming a lot (a typical thing we do to answer your
>
> questions , in particular that the "url" field content matches the
>
> filesystem layout, which may not be the case. Adapt it to your usecase.
>
>
>
> hope this helps,
>
> ciao, lele.
>
> --
>
> nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
>
> real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
>
> | -- Fortunato Depero, 1929.


You are fantastic! Your straightforward logic amazes me!

Thank you very much for making things clear to me!!

But there is a slight problem when iam trying to run the code iam presenting this error ehre you can see its output here:

http://superhost.gr/cgi-bin/files.py
 
Reply With Quote
 
33
Guest
Posts: n/a
 
      03-06-2013
Τη Τετάρτη, 6 Μαρτίου 2013 10:19:06 π.μ. UTC+2, ο χρήστης Lele Gaifax *γραψε:
> Νίκος Γκρ33κ <> writes:
>
>
>
> > How can i update the databse to only contain the existing filenames without losing the previous stored data?

>
>
>
> Basically you need to keep a list (or better, a set) containing all
>
> current filenames that you are going to insert, and finally do another
>
> "inverse" loop where you scan all the records and delete those that are
>
> not present anymore.
>
>
>
> Of course, this assume you have a "bidirectional" identity between the
>
> filenames you are loading and the records you are inserting, which is
>
> not the case in the code you show:
>
>
>
> > #read the containing folder and insert new filenames

>
> > for result in os.walk(path):

>
> > for filename in result[2]:

>
>
>
> Here "filename" is just that, not the full path: this could result in
>
> collisions, if your are actually loading a *tree* instead of a flat
>
> directory, that is multiple source files are squeezed into a single
>
> record in your database (imagine "/foo/index.html" and
>
> "/foo/subdir/index.html").
>
>
>
> With that in mind, I would do something like the following:
>
>
>
> # Compute a set of current fullpaths
>
> current_fullpaths = set()
>
> for root, dirs, files in os.walk(path):
>
> for fullpath in files:
>
> current_fullpaths.add(os.path.join(root, file))
>
>
>
> # Load'em
>
> for fullpath in current_fullpaths:
>
>
>
> try:
>
> #find the needed counter for the page URL
>
> cur.execute('''SELECT URL FROM files WHERE URL = %s''', (fullpath,) )
>
> data = cur.fetchone() #URL is unique, so should only be one
>
>
>
> if not data:
>
> #first time for file; primary key is automatic, hit is defaulted
>
> cur.execute('''INSERT INTO files (URL, host, lastvisit) VALUES (%s, %s, %s)''', (fullpath, host, date) )
>
> except MySQLdb.Error, e:
>
> print ( "Query Error: ", sys.exc_info()[1].excepinfo()[2] )
>
>
>
> # Delete spurious
>
> cur.execute('''SELECT url FROM files''')
>
> for rec in cur:
>
> fullpath = rec[0]
>
> if fullpath not in current_fullpaths:
>
> other_cur.execute('''DELETE FROM files WHERE url = %s''', (fullpath,))
>
>
>
> Of course here I am assuming a lot (a typical thing we do to answer your
>
> questions , in particular that the "url" field content matches the
>
> filesystem layout, which may not be the case. Adapt it to your usecase.
>
>
>
> hope this helps,
>
> ciao, lele.
>
> --
>
> nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
>
> real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
>
> | -- Fortunato Depero, 1929.


You are fantastic! Your straightforward logic amazes me!

Thank you very much for making things clear to me!!

But there is a slight problem when iam trying to run the code iam presenting this error ehre you can see its output here:

http://superhost.gr/cgi-bin/files.py
 
Reply With Quote
 
Lele Gaifax
Guest
Posts: n/a
 
      03-06-2013
Νίκος Γκρ33κ <> writes:

> Thank you very much for making things clear to me!!


You're welcome, even more if you spend 1 second to trim your answers
removing unneeded citation

>
> But there is a slight problem when iam trying to run the code iam presenting this error ehre you can see its output here:
>
> http://superhost.gr/cgi-bin/files.py


Sorry, this seems completely unrelated, and from the little snippet that
appear on that page I cannot understand what's going on there.

ciao, lele.
--
nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
| -- Fortunato Depero, 1929.

 
Reply With Quote
 
33
Guest
Posts: n/a
 
      03-06-2013
Its about the following line of code:

current_fullpaths.add( os.path.join(root, files) )


that presents the following error:

<type 'exceptions.AttributeError'>: 'list' object has no attribute 'startswith'
args = ("'list' object has no attribute 'startswith'",)
message = "'list' object has no attribute 'startswith'"

join calls some module that find difficulty when parsing its line:

/usr/lib64/python2.6/posixpath.py in join(a='/home/nikos/public_html/data/files/', *p=(['\xce\x9a\xcf\x8d\xcf\x81\xce\xb9\xce\xb5 \xce\x99\xce\xb7\xcf\x83\xce\xbf\xcf\x8d \xce\xa7\xcf\x81\xce\xb9\xcf\x83\xcf\x84\xce\xad \xce\x95\xce\xbb\xce\xad\xce\xb7\xcf\x83\xce\xbf\x ce\xbd \xce\x9c\xce\xb5.mp3', '\xce\xa0\xce\xb5\xcf\x81\xce\xaf \xcf\x84\xcf\x89\xce\xbd \xce\x9b\xce\xbf\xce\xb3\xce\xb9\xcf\x83\xce\xbc\x cf\x8e\xce\xbd.mp3'],))
63 path = a
64 for b in p:
65 if b.startswith('/'):
 
Reply With Quote
 
33
Guest
Posts: n/a
 
      03-06-2013
Its about the following line of code:

current_fullpaths.add( os.path.join(root, files) )


that presents the following error:

<type 'exceptions.AttributeError'>: 'list' object has no attribute 'startswith'
args = ("'list' object has no attribute 'startswith'",)
message = "'list' object has no attribute 'startswith'"

join calls some module that find difficulty when parsing its line:

/usr/lib64/python2.6/posixpath.py in join(a='/home/nikos/public_html/data/files/', *p=(['\xce\x9a\xcf\x8d\xcf\x81\xce\xb9\xce\xb5 \xce\x99\xce\xb7\xcf\x83\xce\xbf\xcf\x8d \xce\xa7\xcf\x81\xce\xb9\xcf\x83\xcf\x84\xce\xad \xce\x95\xce\xbb\xce\xad\xce\xb7\xcf\x83\xce\xbf\x ce\xbd \xce\x9c\xce\xb5.mp3', '\xce\xa0\xce\xb5\xcf\x81\xce\xaf \xcf\x84\xcf\x89\xce\xbd \xce\x9b\xce\xbf\xce\xb3\xce\xb9\xcf\x83\xce\xbc\x cf\x8e\xce\xbd.mp3'],))
63 path = a
64 for b in p:
65 if b.startswith('/'):
 
Reply With Quote
 
33
Guest
Posts: n/a
 
      03-06-2013
Perhaps because my filenames is in greek letters that thsi error is presented but i'am not sure.....

Maybe we can join root+files and store it to the set() someway differenyl....
 
Reply With Quote
 
33
Guest
Posts: n/a
 
      03-06-2013
Perhaps because my filenames is in greek letters that thsi error is presented but i'am not sure.....

Maybe we can join root+files and store it to the set() someway differenyl....
 
Reply With Quote
 
Wong Wah Meng-R32813
Guest
Posts: n/a
 
      03-06-2013
Hello there,

I am using python 2.7.1 built on HP-11.23 a Itanium 64 bit box.

I discovered following behavior whereby the python process doesn't seem to release memory utilized even after a variable is set to None, and "deleted". I use glance tool to monitor the memory utilized by this process. Obviously after the for loop is executed, the memory used by this process has hiked to a few MB. However, after "del" is executed to both I and str variables, the memory of that process still stays at where it was.

Any idea why?


>>> for i in range(100000L):

... str=str+"%s"%(i,)
...

>>> i=None
>>> str=None
>>> del i
>>> del str


 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
page data retriview -databse or xml (ASP.NET2) =?Utf-8?B?cGVsZWdrMQ==?= ASP .Net 0 07-16-2007 09:14 PM
insert values into databse using sqlserverstoredprocedure with asp.net Sirisha ASP .Net 1 02-20-2007 09:09 AM
Set a bgcolor in a repeater based on databse value asp newbie Jim Florence ASP .Net 4 06-24-2006 12:18 PM
sync.rb difference between Sync::UN, Sync::EX and Sync::SH Trans Ruby 2 12-12-2005 02:43 PM
non-form based databse application sreenivasan alakappan C++ 6 04-08-2004 02:24 AM



Advertisments
 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57