Velocity Reviews - Computer Hardware Reviews


RE: Python does not take up available physical memory

 
 
Pradipto Banerjee
      10-19-2012
Thanks, I tried that. Still got MemoryError, but at least this time python tried to use the physical memory. What I noticed is that before it gave me the error it used up to 1.5GB (of the 2.23 GB originally shown as available) - so in general, python takes up more memory than the size of the file itself.

-----Original Message-----
From: Python-list [mailto:python-list-bounces+pradipto.banerjee=adainvestments.com@python.org] On Behalf Of Emile van Sebille
Sent: Saturday, October 20, 2012 2:46 AM
To: (E-Mail Removed)
Subject: Re: Python does not take up available physical memory

On 10/19/2012 10:08 AM, Pradipto Banerjee wrote:
> Hi,
>
> I am trying to read a file into memory. The size of the file is around 1
> GB. I have a 3GB memory PC and the Windows Task Manager shows 2.3 GB
> available physical memory when I was trying to read the file. I tried to
> read the file as follows:
>
>>>> fdata = open(filename, 'r').read()

>
> I got a "MemoryError". I was watching the Windows Task Manager while I
> ran the python command, and it appears that python **perhaps** never
> even attempted to use more memory but gave me this error.
>
> Is there any reason why python can't read a 1GB file in memory even when
> a 2.3 GB physical memory is available?


The real issue is likely that there is more than one copy of the file in
memory somewhere. I had a similar issue years back that I resolved by
using numeric (now numpy?) as it had a more efficient method of
importing content from disk.
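Emile's point about extra copies can be illustrated with a minimal Python 3 sketch (this is not his Numeric approach): size a single `bytearray` from the file's length and fill it in place with `readinto()`, which writes into the existing buffer instead of returning a new bytes object. It assumes the data can stay as raw bytes, since decoding to text would create a second, larger object. The function name `read_whole_file` is hypothetical.

```python
import os


def read_whole_file(path):
    """Read a whole file into one preallocated bytearray (illustrative sketch)."""
    size = os.path.getsize(path)
    buf = bytearray(size)  # one allocation, sized up front
    filled = 0
    with open(path, "rb") as f, memoryview(buf) as view:
        while filled < size:
            # readinto() fills the existing buffer rather than returning a new bytes object
            n = f.readinto(view[filled:])
            if not n:  # file turned out shorter than os.path.getsize() reported
                break
            filled += n
    del buf[filled:]  # trim any unused tail (a no-op when the file was read fully)
    return buf
```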

Also realize that Windows may not make all of the memory available to user
space. I'm not sure exactly what the restrictions are, but a 4GB Windows box
doesn't always give you 4GB of memory.
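A related quick check is whether the interpreter itself is a 32-bit or 64-bit build. A 32-bit process has at most 4 GB of virtual address space, and on Windows its user-mode portion is typically 2 GB by default, regardless of how much RAM is installed. A small sketch using only the standard library (the function name `interpreter_bits` is hypothetical):

```python
import struct
import sys


def interpreter_bits():
    """Return the pointer width (32 or 64) of the running Python interpreter."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    bits = interpreter_bits()
    print(f"{bits}-bit Python; sys.maxsize > 2**32: {sys.maxsize > 2**32}")
```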

Emile


--
http://mail.python.org/mailman/listinfo/python-list

This communication is for informational purposes only. It is not intended to be, nor should it be construed or used as, financial, legal, tax or investment advice or an offer to sell, or a solicitation of any offer to buy, an interest in any fund advised by Ada Investment Management LP, the Investment advisor. Any offer or solicitation of an investment in any of the Funds may be made only by delivery of such Funds' confidential offering materials to authorized prospective investors. An investment in any of the Funds is not suitable for all investors. No representation is made that the Funds will or are likely to achieve their objectives, or that any investor will or is likely to achieve results comparable to those shown, or will make any profit at all or will be able to avoid incurring substantial losses. Performance results are net of applicable fees, are unaudited and reflect reinvestment of income and profits. Past performance is no guarantee of future results. All financial data and other information are not warranted as to completeness or accuracy and are subject to change without notice.

Any comments or statements made herein do not necessarily reflect those of Ada Investment Management LP and its affiliates. This transmission may contain information that is confidential, legally privileged, and/or exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution, or use of the information contained herein (including any reliance thereon) is strictly prohibited. If you received this transmission in error, please immediately contact the sender and destroy the material in its entirety, whether in electronic or hard copy format.
 
Thomas Rachel
      10-19-2012
On 19.10.2012 21:03, Pradipto Banerjee wrote:

> Thanks, I tried that.


What is "that"? It would be helpful to quote in a reasonable way. Look
how others do it.


> Still got MemoryError, but at least this time python tried to use the
> physical memory. What I noticed is that before it gave me the error
> it used up to 1.5GB (of the 2.23 GB originally shown as available) -
> so in general, python takes up more memory than the size of the file
> itself.


Of course - the file is not the only thing to be held by the process.

I see several approaches here:

* Process the file part by part - as the others already suggested,
line-wise, but if you have e.g. a binary file format, other ways of splitting may
be suitable as well - e.g. fixed block size, or parts given by the file
format.

* If you absolutely have to keep the whole file data in memory, split it
up into several strings. Why? Well, the free space in virtual memory is
not necessarily contiguous. So even if you have 1.5G free, you might not
be able to read 1.5G at once, but you might succeed in reading 3*0.5G.
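Both of Thomas's suggestions come down to reading fixed-size blocks instead of one huge string. A minimal Python 3 sketch, assuming binary data and an arbitrarily chosen 64 MiB block size (both function names are hypothetical): `iter_blocks` yields blocks for part-by-part processing, and `read_in_blocks` keeps the whole file as a list of smaller `bytes` objects, so no single allocation needs one large contiguous region.

```python
from functools import partial


def iter_blocks(path, block_size=64 * 1024 * 1024):
    """Yield the file's contents as successive bytes objects of at most block_size bytes."""
    with open(path, "rb") as f:
        # iter(callable, sentinel) keeps calling f.read(block_size) until it returns b""
        yield from iter(partial(f.read, block_size), b"")


def read_in_blocks(path, block_size=64 * 1024 * 1024):
    """Keep the whole file in memory, but split across several smaller objects."""
    return list(iter_blocks(path, block_size))
```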



Thomas
 
Steven D'Aprano
      10-19-2012
On Fri, 19 Oct 2012 14:03:37 -0500, Pradipto Banerjee wrote:

> Thanks, I tried that. Still got MemoryError, but at least this time
> python tried to use the physical memory. What I noticed is that before
> it gave me the error it used up to 1.5GB (of the 2.23 GB originally
> shown as available) - so in general, python takes up more memory than
> the size of the file itself.


Well of course it does. Once you read the data into memory, it has its
own overhead for the object structure.

You haven't told us what the file is or how you are reading it. I'm going
to assume it is ASCII text and you are using Python 2.

py> import os, sys
py> open("test file", "w").write("abcde")
py> os.stat("test file").st_size
5L
py> text = open("test file", "r").read()
py> len(text)
5
py> sys.getsizeof(text)
26

So that confirms that a five-byte ASCII string takes up five bytes on
disk but 26 bytes in memory as an object.

That overhead will depend on what sort of object, whether Unicode or not,
the version of Python, and how you read the data.
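To see that per-object overhead on your own interpreter, `sys.getsizeof` can be compared with the payload length. This is a Python 3 sketch; the exact numbers differ between versions and 32-/64-bit builds (Steven's 26 came from a Python 2 build), so no outputs are shown. In CPython 3, ASCII-only and non-ASCII `str` objects use different internal layouts, so their overhead differs too. The helper name `overhead_report` is hypothetical.

```python
import sys


def overhead_report(samples):
    """Map each label to (len, sys.getsizeof result, overhead in bytes)."""
    report = {}
    for label, obj in samples.items():
        size = sys.getsizeof(obj)
        # len() counts bytes for bytes objects and characters for str objects
        report[label] = (len(obj), size, size - len(obj))
    return report


samples = {
    "bytes": b"abcde",
    "ASCII str": "abcde",
    "non-ASCII str": "\u00e9bcde",  # also 5 characters, but not pure ASCII
}
for label, (length, size, extra) in overhead_report(samples).items():
    print(f"{label:14} len={length} getsizeof={size} overhead={extra}")
```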

In general, if you have a huge amount of data to work with, you should
try to work with it one line at a time:

for line in open("some file"):
    process(line)  # process() stands for whatever you do with each line


rather than reading the whole file into memory at once:

lines = open("some file").readlines()  # builds a list of every line first
for line in lines:
    process(line)



--
Steven
 
Pradipto Banerjee
      10-19-2012
Thanks for the illustration. This seems to be one of the biggest shortcomings of Python vs. Matlab. A number of people told me to read one line at a time, but I need to run processes on the whole data, e.g. compare one line with another. So that option doesn't work.
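Comparing lines does not necessarily require holding all of the text in memory. One common alternative, sketched here under the assumption that random access by line number is what's needed (the function names are hypothetical): record only each line's starting byte offset in a compact `array`, then `seek()` to any two lines and compare them.

```python
from array import array


def build_line_index(path):
    """Return an array of the byte offset at which each line starts."""
    offsets = array("q")  # 8-byte ints; usually far smaller than a list of line strings
    pos = 0
    with open(path, "rb") as f:
        for line in f:
            offsets.append(pos)
            pos += len(line)
    return offsets


def read_line(f, offsets, n):
    """Return line n (0-based) from an already-open binary file object."""
    f.seek(offsets[n])
    return f.readline()
```

For example, with `f` opened in `"rb"` mode, `read_line(f, idx, 0) == read_line(f, idx, 2)` compares the first and third lines while only the offsets stay resident.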

-----Original Message-----
From: Python-list [mailto:python-list-bounces+pradipto.banerjee=adainvestments.com@python.org] On Behalf Of Steven D'Aprano
Sent: Friday, October 19, 2012 6:12 PM
To: (E-Mail Removed)
Subject: Re: Python does not take up available physical memory

Dennis Lee Bieber
      10-20-2012
On Fri, 19 Oct 2012 17:20:23 -0500, Pradipto Banerjee
<(E-Mail Removed)> declaimed the following in
gmane.comp.python.general:

> Thanks for the illustration. This seems to be one of the biggest shortcomings of Python vs. Matlab. A number of people told me to read one line at a time, but I need to run processes on the whole data, e.g. compare one line with another. So that option doesn't work.


And that requirement already suggests that reading the file en masse
is inappropriate... Reading a 1GB mass and THEN splitting it into lines
means you have 2GB (not counting overhead) in memory for some period of
time (assuming the OS found a 1GB contiguous chunk of memory).

I suspect Matlab's read parses the input line by line internally. You don't
show the related Matlab read statement but...
http://www.mathworks.com/help/matlab/ref/fscanf.html does both the read
AND the conversion to the binary array format -- it doesn't read the
file as a chunk and THEN convert it to an array; it only reads enough to
fulfill one "format" string, saves that conversion, then reads the next
amount.
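The same read-and-convert-as-you-go pattern described for `fscanf` can be written in Python. A minimal sketch, assuming a hypothetical text file of whitespace-separated numbers: each line is parsed immediately into an `array('d')` of 8-byte doubles, so the raw text is never held in memory all at once. The function name `load_floats` is hypothetical.

```python
from array import array


def load_floats(path):
    """Parse whitespace-separated numbers line by line into a compact array of doubles."""
    values = array("d")
    with open(path) as f:
        for line in f:  # only one line of text is held at a time
            values.extend(float(tok) for tok in line.split())
    return values
```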

Large data DIFF and SORT are seldom run as in-memory operations --
they work line-by-line using files (in the case of some SORT algorithms,
many files: load 50-100 lines from source, sort in memory, write to
file-1; repeat for file-2, -3, ... -n; when you have written to "n"
files, start back with the first file... Then do an n-way merge into
another n files... Repeat until there is only one output file)
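An external SORT of this kind can be sketched in Python with `heapq.merge`. This is a simplified variant, not the exact multi-pass scheme described above: it writes sorted runs to temporary files and then does a single k-way merge of all runs, which assumes the number of runs stays small enough to keep every run file open at once. The function name and the default run size are illustrative.

```python
import contextlib
import heapq
import os
import tempfile
from itertools import islice


def external_sort(in_path, out_path, lines_per_run=100_000):
    """Sort a text file line by line without loading it all into memory."""
    run_paths = []
    try:
        # Pass 1: cut the input into runs, sort each run in memory, write it out.
        with open(in_path) as src:
            while True:
                run = list(islice(src, lines_per_run))
                if not run:
                    break
                # Make sure every line ends in "\n" so merged lines never run together.
                run = [line if line.endswith("\n") else line + "\n" for line in run]
                run.sort()
                fd, run_path = tempfile.mkstemp(suffix=".run", text=True)
                run_paths.append(run_path)
                with os.fdopen(fd, "w") as out:
                    out.writelines(run)
        # Pass 2: k-way merge of all sorted runs; heapq.merge reads them lazily.
        with contextlib.ExitStack() as stack:
            runs = [stack.enter_context(open(p)) for p in run_paths]
            with open(out_path, "w") as dst:
                dst.writelines(heapq.merge(*runs))
    finally:
        for p in run_paths:
            os.remove(p)
```

Memory use is bounded by `lines_per_run` lines plus one buffered line per open run.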
--
Wulfraed Dennis Lee Bieber AF6VN
(E-Mail Removed) HTTP://wlfraed.home.netcom.com/

 
Alain Ketterlin
      10-20-2012
Thomas Rachel
<(E-Mail Removed)>
writes:

> On 19.10.2012 21:03, Pradipto Banerjee wrote:


[...]
>> Still got MemoryError, but at least this time python tried to use the
>> physical memory. What I noticed is that before it gave me the error
>> it used up to 1.5GB (of the 2.23 GB originally shown as available) -
>> so in general, python takes up more memory than the size of the file
>> itself.

>
> Of course - the file is not the only thing to be held by the process.
>
> I see several approaches here:
>
> * Process the file part by part - as the others already suggested,
> line-wise, but if you have e.g. a binary file format, other ways of splitting
> may be suitable as well - e.g. fixed block size, or parts given by the
> file format.
>
> * If you absolutely have to keep the whole file data in memory, split
> it up into several strings. Why? Well, the free space in virtual memory
> is not necessarily contiguous. So even if you have 1.5G free, you
> might not be able to read 1.5G at once, but you might succeed in
> reading 3*0.5G.


* Try mmap; if you're lucky, it will give you access to your data.
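A minimal sketch of the mmap suggestion, assuming Python 3 and a hypothetical helper name: the operating system pages the file in on demand, so the data is not copied into a Python string up front. The whole mapping still needs one contiguous range of virtual address space, so on a 32-bit build a 1 GB file can fail for the same fragmentation reason given above; that is the "if you're lucky" part.

```python
import mmap
import os


def iter_mmap_lines(path):
    """Yield the lines of a file (as bytes) through a read-only memory map."""
    if os.path.getsize(path) == 0:
        return  # mmap refuses to map an empty file
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            while True:
                line = mm.readline()  # returns b"" at the end of the mapping
                if not line:
                    break
                yield line
```

An `mmap` object also supports slicing and `find()`, so two distant regions can be compared without reading everything in between.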

(Note that it is completely unreasonable to load several GB of data into a
32-bit address space, especially if this is text.) So my real advice
would be:

* Read the file line by line and pack the contents of every line into
a list of objects; once you have all your stuff, process it.
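Parsing each line as it is read and keeping only compact objects might look like this. The `key,value` line format and the `Record` class are hypothetical stand-ins for whatever the real file contains; `__slots__` drops the per-instance `__dict__`, which typically makes each object noticeably smaller.

```python
class Record:
    """One parsed line; __slots__ avoids a per-instance __dict__."""

    __slots__ = ("key", "value")

    def __init__(self, key, value):
        self.key = key
        self.value = value


def load_records(path):
    """Read a hypothetical 'key,value' file line by line into a list of Records."""
    records = []
    with open(path) as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            key, value = line.split(",", 1)
            records.append(Record(key, float(value)))
    return records
```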

-- Alain.
 
Pradipto Banerjee
      10-21-2012
I tried this on a different PC with 12 GB RAM. As expected, this time reading the data was no issue. I noticed that for large files, Python takes up 2.5x as much memory as the file's size on disk when each line in the file is retained as a string within a Python list. As an anecdote, for MATLAB the similar overhead is 2x, slightly lower than Python, with each line in the file retained as a string within a MATLAB cell. I'm curious: has anyone compared the in-memory overhead of data in other languages, for instance Ruby?
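A ratio like the 2.5x above can be estimated directly in Python. This sketch, with a hypothetical function name, adds `sys.getsizeof` for the list and for every line string and divides by the file's size on disk. It ignores allocator padding and interpreter-wide overhead, so treat the result as an approximation that varies with Python version, build, and average line length.

```python
import os
import sys


def list_of_lines_ratio(path):
    """Approximate in-memory size of a list of the file's lines, relative to its size on disk."""
    with open(path) as f:
        lines = f.readlines()
    # The list object holds pointers; each line string carries its own header.
    in_memory = sys.getsizeof(lines) + sum(sys.getsizeof(line) for line in lines)
    on_disk = os.path.getsize(path)
    return in_memory / on_disk if on_disk else float("nan")
```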


-----Original Message-----
From: Python-list [mailto:python-list-bounces+pradipto.banerjee=adainvestments.com@python.org] On Behalf Of Steven D'Aprano
Sent: Friday, October 19, 2012 6:12 PM
To: (E-Mail Removed)
Subject: Re: Python does not take up available physical memory


Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Re: Python does not take up available physical memory | Emile van Sebille | Python | 0 | 10-20-2012 06:46 AM
RE: Python does not take up available physical memory | Pradipto Banerjee | Python | 1 | 10-19-2012 09:37 PM
RE: Python does not take up available physical memory | Prasad, Ramit | Python | 0 | 10-19-2012 07:17 PM
Re: Python does not take up available physical memory | MRAB | Python | 0 | 10-19-2012 07:13 PM
Re: Python does not take up available physical memory | Ian Kelly | Python | 0 | 10-19-2012 06:48 PM


