Velocity Reviews - Computer Hardware Reviews

Memory Usage of Strings

 
 
Amit Dev
      03-16-2011
I'm observing a strange memory usage pattern with strings. Consider
the following session. The idea is to create a list of strings whose
combined length is about 100MB.

>>> l = []
>>> for i in xrange(100000):
...     l.append(str(i) * (1000/len(str(i))))

This uses around 100MB of memory as expected and 'del l' will clear that.


>>> for i in xrange(20000):
...     l.append(str(i) * (5000/len(str(i))))

This is using 165MB of memory. I really don't understand where the
additional memory usage is coming from.

If I reduce the string size, usage remains high until the size gets down to
around 1000, at which point it is back to 100MB.

Python 2.6.4 on FreeBSD.

Regards,
Amit
 
John Gordon
      03-16-2011
Amit Dev writes:

> I'm observing a strange memory usage pattern with strings. Consider
> the following session. The idea is to create a list of strings whose
> combined length is about 100MB.


> >>> l = []
> >>> for i in xrange(100000):
> ...     l.append(str(i) * (1000/len(str(i))))


> This uses around 100MB of memory as expected and 'del l' will clear that.


> >>> for i in xrange(20000):
> ...     l.append(str(i) * (5000/len(str(i))))


> This is using 165MB of memory. I really don't understand where the
> additional memory usage is coming from.


> If I reduce the string size, usage remains high until the size gets down to
> around 1000, at which point it is back to 100MB.


I don't know anything about the internals of Python's storage (overhead,
possible merging of like strings, etc.), but some simple character counting
shows that these two loops do not produce the same number of characters.

The first loop produces:

Ten single-digit values of i which are repeated 1000 times for a total of
10000 characters;

Ninety two-digit values of i which are repeated 500 times for a total of
45000 characters;

Nine hundred three-digit values of i which are repeated 333 times for a
total of 299700 characters;

Nine thousand four-digit values of i which are repeated 250 times for a
total of 2250000 characters;

Ninety thousand five-digit values of i which are repeated 200 times for
a total of 18000000 characters.

All that adds up to a grand total of 20604700 characters.

Or, to condense the above long-winded text in table form:

range        num    digits  1000/len(str(i))  total chars
0-9          10     1       1000                    10000
10-99        90     2       500                     45000
100-999      900    3       333                    299700
1000-9999    9000   4       250                   2250000
10000-99999  90000  5       200                  18000000
                                                 ========
                            grand total chars    20604700

The second loop yields this table:

range        num    digits  5000/len(str(i))  total bytes
0-9          10     1       5000                    50000
10-99        90     2       2500                   225000
100-999      900    3       1666                  1499400
1000-9999    9000   4       1250                 11250000
10000-19999  10000  5       1000                 10000000
                                                 ========
                            grand total chars    23024400

The two loops do not produce the same numbers of characters, so I'm not
surprised they do not consume the same amount of storage.

P.S.: Please forgive me if I've made some basic math error somewhere.

--
John Gordon                   A is for Amy, who fell down the stairs
(E-Mail Removed)              B is for Basil, assaulted by bears
                                -- Edward Gorey, "The Gashlycrumb Tinies"

 
Amit Dev
      03-16-2011
sum(map(len, l)) => 99999100 for 1st case and 99998200 for 2nd case.
Roughly 100MB as I mentioned.
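
The character totals can be checked without building the lists at all. The following is a minimal sketch, assuming Python 3 (so `//` and `range` stand in for the thread's Python 2 `/` and `xrange`); `total_chars` is a hypothetical helper name, not something from the thread. Each string is the digits of `i` repeated as many whole times as fit in the target length, so its length is `len(str(i))` times the repeat count, and both totals come out just under 100,000,000.

```python
def total_chars(n, target):
    """Total length of str(i) * (target // len(str(i))) over i in range(n)."""
    total = 0
    for i in range(n):
        digits = len(str(i))
        repeats = target // digits   # integer division, as `/` behaves on ints in Python 2
        total += digits * repeats    # length of the repeated string
    return total

first = total_chars(100000, 1000)    # first loop in the thread
second = total_chars(20000, 5000)    # second loop in the thread
print(first, second)
```

Multiplying the repeat count by the digit count is the step that a per-range tally needs in order to match `sum(map(len, l))`.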

 
Terry Reedy
      03-16-2011
On 3/16/2011 3:51 PM, Santoso Wijaya wrote:
> ??
>
> Python 2.7.1 (r271:86832, Nov 27 2010, 17:19:03) [MSC v.1500 64 bit (AMD64)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import sys
> >>> L = []
> >>> for i in xrange(100000):
> ...     L.append(str(i) * (1000 / len(str(i))))
> ...
> >>> sys.getsizeof(L)
> 824464


This is only the size of the list object and does not include the sum of
the sizes of the string objects. With 8-byte pointers, 824464 == 8*100000 +
(a small bit of overhead) + extra space (so the list can grow without
reallocating and copying).

> >>> L = []
> >>> for i in xrange(20000):
> ...     L.append(str(i) * (5000 / len(str(i))))
> ...
> >>> sys.getsizeof(L)
> 178024


== 8*20000 + extra
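
The gap between the list's own size and the size of everything it holds can be sketched as follows. This assumes CPython 3, where `sys.getsizeof` reports only the memory directly attributed to an object; `build_strings` is a hypothetical helper, and the small `n` is used here only to keep the example light (the thread used 100000 and 20000 entries). Exact byte counts vary by Python version and platform, and this still says nothing about what the system allocator reserves for the process.

```python
import sys

def build_strings(n, target):
    # Same construction as in the thread, with Python 3 integer division.
    return [str(i) * (target // len(str(i))) for i in range(n)]

# Assumption: a reduced size for illustration; substitute (100000, 1000)
# or (20000, 5000) to mirror the sessions in the thread.
strings = build_strings(2000, 1000)

# Size of the list object alone: its header plus one pointer per element
# (and any spare capacity), but none of the string data.
list_size = sys.getsizeof(strings)

# Adding each string's own size gives a closer estimate of the total.
# Every element here is a distinct object, so nothing is counted twice.
total_size = list_size + sum(sys.getsizeof(s) for s in strings)

print("list object only:", list_size)
print("list plus strings:", total_size)
```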

--
Terry Jan Reedy

 