Re: Manipulate Large Binary Files

 
 
Derek Martin
 
      04-02-2008
On Wed, Apr 02, 2008 at 02:09:45PM -0400, Derek Tracy wrote:
> Both are clocking in at the same time (1m 5sec for 2.6Gb), are there
> any ways I can optimize either solution?


Buy faster disks? How long do you expect it to take? At 65s, you're
already reading and writing 2.6 GB at a sustained transfer rate of
about 40 MB/s (2.6 GB / 65 s). That's nothing to sneeze at... Unless
you have reason to believe your hardware should be significantly
faster, your disks, and not your program, are almost certainly the
real bottleneck.
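
For anyone who wants to redo the arithmetic, here is a small sketch. It assumes "2.6 GB" means decimal gigabytes (10^9 bytes); the function name is made up for illustration. Since the data is both read and written, the disks are moving twice the one-way figure.

```python
def throughput_mb_per_s(num_bytes, seconds):
    """Return sustained throughput in decimal megabytes per second."""
    return num_bytes / seconds / 1_000_000

# Figures from the thread: 2.6 GB handled in 1m 5s.
size_bytes = 2_600_000_000  # assumption: decimal gigabytes
elapsed = 65.0              # seconds

rate = throughput_mb_per_s(size_bytes, elapsed)
print(f"{rate:.1f} MB/s one way, {2 * rate:.1f} MB/s counting read + write")
```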

That said, because normal I/O generally involves double-buffering, you
might be able to speed things up noticeably by using memory-mapped I/O
(MMIO). It depends on whether the Python facilities you're using
already use MMIO under the hood, and on whether MMIO happens to be
broken in your OS.
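
A minimal sketch of what the memory-mapped variant could look like with Python's standard mmap module. The function name and chunk size are assumptions for illustration, not the original poster's code, and whether it is actually faster depends on the OS and the disks.

```python
import mmap
import os

def copy_via_mmap(src_path, dst_path, chunk_size=16 * 1024 * 1024):
    """Copy src_path to dst_path, reading the source through a memory map."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        if os.fstat(src.fileno()).st_size == 0:
            return  # mmap refuses to map an empty file; dst is already empty
        with mmap.mmap(src.fileno(), 0, access=mmap.ACCESS_READ) as mm, \
             memoryview(mm) as view:
            # Slicing a memoryview does not copy; the kernel pages data in
            # as each slice is handed to write().
            for start in range(0, len(view), chunk_size):
                with view[start:start + chunk_size] as piece:
                    dst.write(piece)
```

Only the read side is mapped here; the output still goes through an ordinary buffered write.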

> Would turning off the read/write buff increase speed?


No...
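
If you would rather measure than take that on faith, a rough timing harness along these lines would do. It is only a sketch: the function name, chunk size, and the idea of comparing buffering=0 against the default are assumptions for illustration, and the OS page cache will skew repeated runs on the same file.

```python
import time

def timed_copy(src_path, dst_path, chunk_size=1024 * 1024, buffering=-1):
    """Copy a file in chunks and return the elapsed wall-clock seconds.

    buffering=-1 keeps Python's default buffering; buffering=0 turns it
    off (only allowed for binary-mode files).
    """
    start = time.perf_counter()
    with open(src_path, "rb", buffering=buffering) as src, \
         open(dst_path, "wb", buffering=buffering) as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            # An unbuffered write may accept only part of the data,
            # so keep writing until the whole chunk is out.
            view = memoryview(chunk)
            while view:
                written = dst.write(view)
                view = view[written:]
    return time.perf_counter() - start
```

Run it once with buffering=0 and once with the default on the same large file and compare the two times.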

--
Derek D. Martin
http://www.pizzashack.org/
GPG Key ID: 0x81CFE75D


Paul Rubin
 
      04-03-2008
Derek Martin writes:
> > Both are clocking in at the same time (1m 5sec for 2.6Gb), are there
> > any ways I can optimize either solution?


Getting 40+ MB/sec through a file system is pretty impressive.
Sounds like a RAID?

> That said, due to normal I/O generally involving double-buffering, you
> might be able to speed things up noticeably by using Memory-Mapped I/O
> (MMIO). It depends on whether or not the implementation of the Python
> things you're using already use MMIO under the hood, and whether or
> not MMIO happens to be broken in your OS.


Python has the mmap module and I use it sometimes, but it's not
necessarily the right thing for something like this. Each page you
try to read incurs its own delay while the resulting page fault is
serviced, so any overlapped I/O you get comes from the OS being nice
enough to do some predictive readahead for you on sequential access,
if it does that at all. By coincidence there are a couple of other
threads mentioning AIO, which is a somewhat more powerful mechanism.
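
To show what "overlapped" means here, below is a sketch that gets a similar effect with a background reader thread and a bounded queue. This is a stand-in, not real AIO, and the function name, chunk size, and queue depth are assumptions for illustration.

```python
import queue
import threading

def overlapped_copy(src_path, dst_path, chunk_size=4 * 1024 * 1024, depth=4):
    """Copy a file while a background thread reads ahead of the writer."""
    chunks = queue.Queue(maxsize=depth)  # bounded, so memory use stays capped
    errors = []

    def reader():
        try:
            with open(src_path, "rb") as src:
                while True:
                    chunk = src.read(chunk_size)
                    chunks.put(chunk)  # an empty bytes object signals EOF
                    if not chunk:
                        break
        except Exception as exc:
            errors.append(exc)
            chunks.put(b"")  # unblock the writer so it can stop

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()
    with open(dst_path, "wb") as dst:
        while True:
            chunk = chunks.get()
            if not chunk:
                break
            dst.write(chunk)
    thread.join()
    if errors:
        raise errors[0]
```

File reads and writes release the GIL in CPython, so the next read can be in flight while the current chunk is being written. It will not help much if both files sit on the same saturated disks.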

 
Derek Tracy
 
      04-03-2008

On Apr 3, 2008, at 3:03 AM, Paul Rubin <"http://phr.cx"@NOSPAM.invalid> wrote:

> Derek Martin writes:
>>> Both are clocking in at the same time (1m 5sec for 2.6Gb), are there
>>> any ways I can optimize either solution?

>
> Getting 40+ MB/sec through a file system is pretty impressive.
> Sounds like a RAID?
>
>> That said, due to normal I/O generally involving double-buffering, you
>> might be able to speed things up noticeably by using Memory-Mapped I/O
>> (MMIO). It depends on whether or not the implementation of the Python
>> things you're using already use MMIO under the hood, and whether or
>> not MMIO happens to be broken in your OS.

>
> Python has the mmap module and I use it sometimes, but it's not
> necessarily the right thing for something like this. Each page you
> try to read from results in its own delay while the resulting page fault
> is serviced, so any overlapped i/o you get comes from the OS being
> nice enough to do some predictive readahead for you on sequential
> access if it does that. By coincidence there are a couple other
> threads mentioning AIO which is a somewhat more powerful mechanism.


I am running it on a RAID (striped RAID 5 using Fibre Channel), but I
was expecting better performance.

I will have to check into AIO, thanks for the bone.
 
 
Derek Martin
 
      04-03-2008
On Thu, Apr 03, 2008 at 02:36:02PM -0400, Derek Tracy wrote:
> I am running it on a RAID (striped RAID 5 using Fibre Channel), but I
> was expecting better performance.


Don't forget that you're reading from and writing to the same
spindles. Writes are slower on RAID 5, and you have to read the data
before you can write it...

--
Derek D. Martin
http://www.pizzashack.org/
GPG Key ID: 0x81CFE75D


