randomly write to a file

 
 
rohit
 
      05-07-2007
Hi,
I am developing a desktop search tool. For the index of the files I have
developed an algorithm that requires
reading and writing a line given its line
number.
I can read a specified line using the linecache module,
but I am stuck on how to write to the nth line of a
file EFFICIENTLY,
that is, without traversing the file sequentially to reach
the nth line.

Please help.
Regards
Rohit

 
 
 
 
 
kyosohma@gmail.com
 
      05-07-2007
On May 7, 2:51 pm, rohit <(E-Mail Removed)> wrote:
> hi,
> i am developing a desktop search.For the index of the files i have
> developed an algorithm with which
> i should be able to read and write to a line if i know its line
> number.
> i can read a specified line by using the module linecache
> but i am struck as to how to implement writing to the n(th) line in a
> file EFFICIENTLY
> which means i don't want to traverse the file sequentially to reach
> the n(th) line
>
> Please help.
> Regards
> Rohit


Hi,

Looking through the archives, it looks like some recommend reading the
file into a list and working on it that way. And if the file is too big,
then use a database. See links below:

http://mail.python.org/pipermail/tut...ch/045571.html
http://mail.python.org/pipermail/tut...ch/045572.html

I also found this interesting idea that explains what would be needed
to accomplish this task:

http://mail.python.org/pipermail/pyt...il/076890.html

Have fun!

Mike

 
 
 
 
 
Gabriel Genellina
 
      05-07-2007
En Mon, 07 May 2007 16:51:37 -0300, rohit <(E-Mail Removed)>
escribió:

> i am developing a desktop search.For the index of the files i have
> developed an algorithm with which
> i should be able to read and write to a line if i know its line
> number.
> i can read a specified line by using the module linecache
> but i am struck as to how to implement writing to the n(th) line in a
> file EFFICIENTLY
> which means i don't want to traverse the file sequentially to reach
> the n(th) line


You can only replace a line in-place with another of exactly the same
length. If the lengths differ, you have to write the modified line and all
the following ones.
If all your lines are of fixed length, you have a "record". To read record
N (counting from 0):

    def read_record(a_file, N, record_length):
        a_file.seek(N * record_length)
        return a_file.read(record_length)

And then you are reinventing ISAM.
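
To make the fixed-length idea concrete, here is a minimal sketch that both reads and overwrites a record in place. The 32-byte record size, the space padding, and the helper names (`pack`, `write_record`, `demo`) are assumptions made up for illustration; they are not part of the original suggestion.

```python
import os
import tempfile

RECORD_LENGTH = 32  # assumed fixed record size in bytes, chosen for illustration


def pack(text, record_length=RECORD_LENGTH):
    """Encode text and pad it with spaces to exactly record_length bytes."""
    data = text.encode("utf-8")
    if len(data) > record_length:
        raise ValueError("text does not fit in one record")
    return data.ljust(record_length, b" ")


def read_record(a_file, n, record_length=RECORD_LENGTH):
    """Read record n (counting from 0) and strip the space padding."""
    a_file.seek(n * record_length)
    return a_file.read(record_length).rstrip(b" ").decode("utf-8")


def write_record(a_file, n, text, record_length=RECORD_LENGTH):
    """Overwrite record n in place; no other record has to move."""
    a_file.seek(n * record_length)
    a_file.write(pack(text, record_length))


def demo():
    path = os.path.join(tempfile.mkdtemp(), "records.dat")
    with open(path, "wb") as f:
        for name in ("/home/a.txt", "/home/b.txt", "/home/c.txt"):
            f.write(pack(name))
    with open(path, "r+b") as f:  # "r+b" allows reading and in-place writing
        write_record(f, 1, "/home/renamed.txt")
        return [read_record(f, i) for i in range(3)]
```

The cost is the one the thread goes on to discuss: every record occupies the full record length, and a value longer than that length cannot be stored. Stripping the padding also drops any trailing spaces that were part of the stored text.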

--
Gabriel Genellina

 
 
Nick Vatamaniuc
 
      05-07-2007
Rohit,

Consider using an SQLite database. It comes with Python 2.5 and
higher. SQLite will do a nice job keeping track of the index. You can
easily find the line you need with a SQL query and you can write to
it as well. When you have a file and you write to one line of the
file, all of the rest of the lines have to be shifted to
accommodate the potentially larger new line.
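
As a rough sketch of what this could look like with the standard-library sqlite3 module: each "line" becomes a row keyed by its line number. The table name `files`, the columns `line_no` and `path`, and the helper functions are hypothetical names invented for this example, and an in-memory database is used only to keep it self-contained.

```python
import sqlite3


def make_index(paths):
    """Create a table with one row per line, keyed by 0-based line number."""
    conn = sqlite3.connect(":memory:")  # a real index would use a file path
    conn.execute(
        "CREATE TABLE files (line_no INTEGER PRIMARY KEY, path TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO files (line_no, path) VALUES (?, ?)",
        list(enumerate(paths)),
    )
    conn.commit()
    return conn


def read_line(conn, n):
    """Return the path stored for line n, or None if there is no such line."""
    row = conn.execute(
        "SELECT path FROM files WHERE line_no = ?", (n,)
    ).fetchone()
    return row[0] if row else None


def write_line(conn, n, path):
    """Replace line n; the new value may be longer or shorter than the old."""
    conn.execute("UPDATE files SET path = ? WHERE line_no = ?", (path, n))
    conn.commit()
```

For example, `write_line(conn, 1, "/a/much/longer/path")` changes only that row, and the database handles the storage layout.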

-Nick Vatamaniuc


On May 7, 3:51 pm, rohit <(E-Mail Removed)> wrote:
> hi,
> i am developing a desktop search.For the index of the files i have
> developed an algorithm with which
> i should be able to read and write to a line if i know its line
> number.
> i can read a specified line by using the module linecache
> but i am struck as to how to implement writing to the n(th) line in a
> file EFFICIENTLY
> which means i don't want to traverse the file sequentially to reach
> the n(th) line
>
> Please help.
> Regards
> Rohit



 
 
rohit
 
      05-07-2007
Nick,
I just wanted to ask: for time-constrained applications like searching,
won't SQLite be an expensive approach?
I mean, searching and editing the files directly costs less
time.
So I need an approach that lets me write randomly to a line
in a file without using a database.
On May 8, 2:41 am, Nick Vatamaniuc <(E-Mail Removed)> wrote:
> Rohit,
>
> Consider using an SQLite database. It comes with Python 2.5 and
> higher. SQLite will do a nice job keeping track of the index. You can
> easily find the line you need with a SQL query and your can write to
> it as well. When you have a file and you write to one line of the
> file, all of the rest of the lines will have to be shifted to
> accommodate, the potentially larger new line.
>
> -Nick Vatamaniuc
>


 
 
rohit
 
      05-07-2007
Hi Gabriel,
I am storing file names and their paths, each written to the file
on a single line.
If I use records, that would waste too much space, as there is
no upper limit on the number of characters in a path.
The next best approach I can think of is reading the file into memory,
editing it, and writing back the portion that has just been altered and the
following lines.
But is there a better approach you can suggest?
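
For reference, the fallback described above (rewrite the changed line and everything after it) might be sketched as follows. The function name `replace_line` and the `demo` helper are made up for illustration, and this version still scans sequentially to find line n, so it only saves rewriting the part of the file before the change.

```python
import os
import tempfile


def replace_line(path, n, new_text):
    """Replace line n (0-based), rewriting only from that line to the end."""
    with open(path, "r+b") as f:
        # Skip the first n lines to find where line n starts.
        for _ in range(n):
            if not f.readline():
                raise IndexError("line number out of range")
        start = f.tell()
        if not f.readline():          # consume the old line n
            raise IndexError("line number out of range")
        tail = f.read()               # everything after the old line
        f.seek(start)
        f.write(new_text.encode("utf-8") + b"\n" + tail)
        f.truncate()                  # drop leftover bytes if the file shrank


def demo():
    path = os.path.join(tempfile.mkdtemp(), "index.txt")
    with open(path, "wb") as f:
        f.write(b"one\ntwo\nthree\n")
    replace_line(path, 1, "a much longer second line")
    with open(path, "rb") as f:
        return f.read()
```

The tail is held in memory here, so for a very large file it would have to be copied in chunks instead.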

> You can only replace a line in-place with another of exactly the same
> length. If the lengths differ, you have to write the modified line and all
> the following ones.
> If all your lines are of fixed length, you have a "record". To read record
> N (counting from 0):
> a_file.seek(N*record_length)
> return a_file.read(record_length)
> And then you are reinventing ISAM.
>
> --
> Gabriel Genellina



 
 
Steven D'Aprano
 
      05-08-2007
On Mon, 07 May 2007 12:51:37 -0700, rohit wrote:

> i can read a specified line by using the module linecache but i am
> struck as to how to implement writing to the n(th) line in a file
> EFFICIENTLY
> which means i don't want to traverse the file sequentially to reach the
> n(th) line


Unless you are lucky enough to be using an OS that natively supports
random access to the lines of a text file, if such a thing even exists, you
can't, because you don't know how long each line will be.

If you can guarantee fixed-length lines, then you can use file.seek() to
jump to the appropriate byte position.

If the lines are random lengths, but you can control access to the files
so other applications can't write to them, you can keep an index table,
which you update as needed.

Otherwise, if the files are small enough, say up to 20 or 40MB each, just
read them entirely into memory.

Otherwise, you're out of luck.
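
A minimal sketch of the index-table option mentioned above, assuming nothing else writes to the file between building the index and using it. The names `build_index`, `read_line`, and `demo` are invented for this example.

```python
import os
import tempfile


def build_index(path):
    """Scan the file once; return the byte offset where each line starts."""
    offsets = []
    position = 0
    with open(path, "rb") as f:
        for line in f:
            offsets.append(position)
            position += len(line)
    return offsets


def read_line(path, offsets, n):
    """Jump straight to line n (0-based) using the precomputed offsets."""
    with open(path, "rb") as f:
        f.seek(offsets[n])
        return f.readline().rstrip(b"\r\n").decode("utf-8")


def demo():
    path = os.path.join(tempfile.mkdtemp(), "index.txt")
    with open(path, "wb") as f:
        f.write(b"alpha\nbe\ngamma\n")
    offsets = build_index(path)
    return offsets, read_line(path, offsets, 2)
```

The index makes reads cheap, but any write that changes a line's length shifts every later offset, so the table has to be updated after such a write.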


--
Steven.
 
 
Steven D'Aprano
 
      05-08-2007
On Mon, 07 May 2007 14:41:02 -0700, Nick Vatamaniuc wrote:

> Rohit,
>
> Consider using an SQLite database. It comes with Python 2.5 and higher.
> SQLite will do a nice job keeping track of the index. You can easily
> find the line you need with a SQL query and your can write to it as
> well. When you have a file and you write to one line of the file, all of
> the rest of the lines will have to be shifted to accommodate, the
> potentially larger new line.



Using a database for tracking line number and byte position -- isn't
that a bit overkill?

I would have thought something as simple as a list of line lengths would
do:

offsets = [35,   # first line is 35 bytes long
           19,   # second line is 19 bytes long...
           45, 12, 108, 67]


To get to the nth line, you have to seek to byte position:

sum(offsets[:n])
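
Summing the list on every lookup is linear in n. One possible refinement, sketched here with the same example numbers, is to precompute the running totals once with `itertools.accumulate`. The names `line_lengths`, `starts`, and `line_start` are chosen for this example only.

```python
from itertools import accumulate

line_lengths = [35, 19, 45, 12, 108, 67]  # the example lengths from above

# starts[n] is the byte position where line n begins (n counted from 0).
# accumulate() yields the running totals; the last total is the file size,
# which is not the start of any line, so it is dropped.
starts = [0] + list(accumulate(line_lengths))[:-1]


def line_start(n):
    """Return the byte position of line n without re-summing the list."""
    return starts[n]
```

If a line's length changes, every later entry in `starts` has to be recomputed.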



--
Steven.
 
 
Alex Martelli
Guest
Posts: n/a
 
      05-08-2007
Steven D'Aprano <(E-Mail Removed)> wrote:

> On Mon, 07 May 2007 14:41:02 -0700, Nick Vatamaniuc wrote:
>
> > Rohit,
> >
> > Consider using an SQLite database. It comes with Python 2.5 and higher.
> > SQLite will do a nice job keeping track of the index. You can easily
> > find the line you need with a SQL query and your can write to it as
> > well. When you have a file and you write to one line of the file, all of
> > the rest of the lines will have to be shifted to accommodate, the
> > potentially larger new line.

>
>
> Using an database for tracking line number and byte position -- isn't
> that a bit overkill?
>
> I would have thought something as simple as a list of line lengths would
> do:
>
> offsets = [35, # first line is 35 bytes long
> 19, # second line is 19 bytes long...
> 45, 12, 108, 67]
>
>
> To get to the nth line, you have to seek to byte position:
>
> sum(offsets[:n])


...and then you STILL can't write there (without reading and rewriting
all the succeeding part of the file) unless the line you're writing is
always the same length as the one you're overwriting, which doesn't seem
to be part of the constraints in the OP's original application. I'm
with Nick in recommending SQLite for the purpose -- it _IS_ quite
"lite", as its name suggests. BSD-DB (a DB that's much more complicated
to use, being far lower-level, but by the same token affords you
extremely fine-grained control of operations) might be an alternative
IF, after first having coded the application with SQLite, you can indeed
prove, profiler in hand, that it's a serious bottleneck. However,
premature optimization is the root of all evil in programming.


Alex
 
 
Steven D'Aprano
 
      05-08-2007
On Mon, 07 May 2007 20:00:57 -0700, Alex Martelli wrote:

> Steven D'Aprano <(E-Mail Removed)> wrote:
>
>> On Mon, 07 May 2007 14:41:02 -0700, Nick Vatamaniuc wrote:
>>
>> > Rohit,
>> >
>> > Consider using an SQLite database. It comes with Python 2.5 and
>> > higher. SQLite will do a nice job keeping track of the index. You can
>> > easily find the line you need with a SQL query and your can write to
>> > it as well. When you have a file and you write to one line of the
>> > file, all of the rest of the lines will have to be shifted to
>> > accommodate, the potentially larger new line.

>>
>>
>> Using an database for tracking line number and byte position -- isn't
>> that a bit overkill?
>>
>> I would have thought something as simple as a list of line lengths
>> would do:
>>
>> offsets = [35, # first line is 35 bytes long
>> 19, # second line is 19 bytes long...
>> 45, 12, 108, 67]
>>
>>
>> To get to the nth line, you have to seek to byte position:
>>
>> sum(offsets[:n])

>
> ...and then you STILL can't write there (without reading and rewriting
> all the succeeding part of the file) unless the line you're writing is
> always the same length as the one you're overwriting, which doesn't seem
> to be part of the constraints in the OP's original application. I'm
> with Nick in recommending SQlite for the purpose -- it _IS_ quite
> "lite", as its name suggests.



Hang on, as I understand it, Nick was just suggesting using SQLite for
holding indexes into the file! That's why I said it was overkill. So
whether the indexes are in a list or a database, you've _still_ got to
deal with writing to the file.

If I've misunderstood Nick's suggestion, if he actually meant to read the
entire text file into the database, well, that's just a heavier version
of reading the file into a list of strings, isn't it? If the database
gives you more and/or better functionality than file.readlines(), then I
have no problem with using the right tool for the job.


--
Steven.
 