Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C++ > Memory allocation failure in map container

Reply
Thread Tools

Memory allocation failure in map container

 
 
Saeed Amrollahi
Guest
Posts: n/a
 
      01-04-2011
Dear friends
Hi

I am working on an application to manage Tehran Securities Exchange
organization.
In this program, I try to manipulate a lot of institutional investors
(about 1,500,000). The Investors are a struct like this:

struct Investor {
int ID;
std::wstring NID;
std::wstring Name;
std::wstring RegNum;
std::wstring RegDate;
std::wstring RegProvince;
std::wstring RegCity;
std::wstring Type;
std::wstring HQAddr;
// the deblanked or squeezed info
std::wstring sq_Name;
std::wstring sq_RegNum;
std::wstring sq_RegProvince;
std::wstring sq_RegCity;
};
The investors are stored in an Access Database and I read them into
memory at the program loading time. Because, each investor is
registered in a City, I tried
to use a map container: map a city to the collection of investors:
map<wstring, vector<Investor>*> g_Investor; // map city name to the
investors located in
For the sake of efficiency, I used pointer to vector as a value of
the map.
the following code fills the map:

void InvestorSearchEngine::FillInvestors(const vector<Investor>& m)
{
try {
for (vector<Investor>::const_iterator cit = m.begin(); cit !=
m.end(); ++cit) {
if (g_Investor.find(cit->second.RegCity()) != g_Investor.end()) {
// so, there is a node with reg. place already
map<wstring, vector<Investor>*>::iterator it =
g_Investor.find(cit->second.RegCity());
it->second->push_back(cit->second);
}
else { // the registered city is new
vector<Investor>* vp = new vector<Investor>();
vp->push_back(cit->second);
g_Investor[cit->second.RegCity] = vp;
}
}
}
catch(const std::bad_alloc&)
{
wofstream LogFile("D:/log.txt", std::ios_base::app);
LogFile << "Memory exhausted ... " << L'\n';
LogFile << g_Investor.max_size() << L'\n';
LogFile.close();
}

Unfortunately, sometimes, the memory allocation is failed and the
"Memory exhauseted ..."
message was written to log file.
FYI, the max_size() function returns 119304647 as of largest possible
map.
What's wrong with my code?
I use Pentium 4 CPU 3.00 GHz, 4 GB RAM and Windows XP, Visual Studio
2008.

Please throw some light.
-- Saeed Amrollahi
 
Reply With Quote
 
 
 
 
antred antred is offline
Junior Member
Join Date: Jan 2011
Posts: 6
 
      01-04-2011
So you have 1.5 million instances of your Investor struct, plus storage for your maps and vectors and you're surprised that you've run out of memory?

Have you thought about using a proper database for this stuff?


P.S. Oops, I see you are using a database for this, but still ... loading ALL of it into memory doesn't seem like a pratical solution. Your database probably provides an API offering all the functionality you need. Don't just suck the entire thing into memory, use the database API to query it or modify the data.
 

Last edited by antred; 01-04-2011 at 09:49 PM..
Reply With Quote
 
 
 
 
Goran
Guest
Posts: n/a
 
      01-05-2011
On Jan 4, 5:07*pm, Saeed Amrollahi <(E-Mail Removed)> wrote:
> Dear friends
> Hi
>
> I am working on an application to manage Tehran Securities Exchange
> organization.
> In this program, I try to manipulate a lot of institutional investors
> (about 1,500,000). The Investors are a struct like this:...
> The investors are stored in an Access Database and I read them into
> memory at the program loading time.
> I use Pentium 4 CPU 3.00 GHz, 4 GB RAM and Windows XP, Visual Studio
> 2008.


Address space for any program in 32-bit XP is, by default, is 2GB (you
can raise this to 3, but you really shouldn't). It does not matter
that you have 4GB of memory or whatever big swap file, your program
data must fit in 2GB. It looks like you exhausted that.

You must redesign the program not to load that much data into memory.
You already have a database, and a database is made exactly for that
(among other things): manipulating amounts of data that do not fit
into memory. So use it.

Goran.
 
Reply With Quote
 
James Kanze
Guest
Posts: n/a
 
      01-05-2011
On Jan 4, 5:31 pm, Paavo Helde <(E-Mail Removed)> wrote:
> Saeed Amrollahi <(E-Mail Removed)> wrote in news:fe8f1a4b-1315-
> (E-Mail Removed):


> > I am working on an application to manage Tehran Securities
> > Exchange organization. In this program, I try to manipulate
> > a lot of institutional investors (about 1,500,000). The
> > Investors are a struct like this:


> > struct Investor {
> > int ID;
> > std::wstring NID;
> > std::wstring Name;
> > std::wstring RegNum;
> > std::wstring RegDate;
> > std::wstring RegProvince;
> > std::wstring RegCity;
> > std::wstring Type;
> > std::wstring HQAddr;
> > // the deblanked or squeezed info
> > std::wstring sq_Name;
> > std::wstring sq_RegNum;
> > std::wstring sq_RegProvince;
> > std::wstring sq_RegCity;
> > };


> Size of this struct is 584 bytes with my compiler. 1500000*584 is ca 835
> MB. You are holding them twice (both in vector and map), which makes it
> ca 1.7 GB. This brings you already quite close to the maximum limits for
> 32-bit applications (2G in Windows by default), hence the memory problems
> are expected.


Independantly of sizeof(Investor), all those strings are likely
to require additional memory (unless they are empty, or unless
the compiler uses the small string optimization, and they are
small enough---but things like names, provinces and cities
generally aren't). This could easily double the necessary
memory (with small string optimization) or multiply it by 5 to
10, or more (without small string optimization, but then,
sizeof(Investor) would probably be around 64).

> The correct solution would be to redesign the application so that you
> don't need to load all the data in the memory at once. Normally database
> applications are happy to load only the data they need at the moment.


It's hard to say what the correct solution is without more
context, but there's a very good chance you're right, at least
if the actual data is stored in a relational data base.

> Another solution would be to use a 64-bit machine, OS and compilation,
> this lifts the limits a bit, but does not help if your database happens
> to grow 100 times.


If he's near the limits on a 32 bit machine, a 64 bit machine
should multiply the number he can handle by significantly more
than a 100. Moving to 64 bits is probably the simplest
solution, *IF* the program really needs to keep all of the data
virtually in memory.

> Yet another solution is to reduce the size of
> Investor, for example putting all strings together in a single
> std::wstring, separated by some delimiter like a zero byte. This would
> make finding, accessing and changing the individual fields slower and
> more cumbersome of course.


I'm not sure that that would gain that much. What is probably,
on the other hand, is that there are a lot of duplicates in the
fields with province and city; using a string pool for these
could help. And it may be possible to generate the sq_...
values algorithmically from the non sq_ value; in that case,
they could be eliminated completely from the structure. But
none of these optimizations will last for long, if the table
significantly increases in size.

--
James Kanze
 
Reply With Quote
 
Saeed Amrollahi
Guest
Posts: n/a
 
      01-05-2011
Hi Paavo.
Thank you for your feedback, and I'm sorry for a delay.

On Jan 4, 8:31*pm, Paavo Helde <(E-Mail Removed)> wrote:
> Saeed Amrollahi <(E-Mail Removed)> wrote in news:fe8f1a4b-1315-
> (E-Mail Removed):
>
>
>
> > Dear friends
> > Hi

>
> > I am working on an application to manage Tehran Securities Exchange
> > organization.
> > In this program, I try to manipulate a lot of institutional investors
> > (about 1,500,000). The Investors are a struct like this:

>
> > struct Investor {
> > * int ID;
> > * std::wstring NID;
> > * std::wstring Name;
> > * std::wstring RegNum;
> > * std::wstring RegDate;
> > * std::wstring RegProvince;
> > * std::wstring RegCity;
> > * std::wstring Type;
> > * std::wstring HQAddr;
> > * // the deblanked or squeezed info
> > * std::wstring sq_Name;
> > * std::wstring sq_RegNum;
> > * std::wstring sq_RegProvince;
> > * std::wstring sq_RegCity;
> > };

>
> Size of this struct is 584 bytes with my compiler. 1500000*584 is ca 835
> MB. You are holding them twice (both in vector and map), which makes it
> ca 1.7 GB. This brings you already quite close to the maximum limits for
> 32-bit applications (2G in Windows by default), hence the memory problems
> are expected.
>

In general, you are right, but the sizeof of Investor on my machine is
388 bytes
(4 + 7 * 32). and the map and the vector take a lot of memory. please
note I have
other data structures too. I reconsider the Investor, and I really
don't need to
some data members: RegProvince, RegDate, Type, HQAddr and
sq_RegProvince.
So I don't load these fields and the size of Investor decreases by
%40.
At the time being, my program is OK, but It's not the final and good
solution.
> The correct solution would be to redesign the application so that you
> don't need to load all the data in the memory at once. Normally database
> applications are happy to load only the data they need at the moment.
>

Yes. At the moment, the redesign of my program is somehow expensive
but in future I have to think about pre-loading strategy.

> Another solution would be to use a 64-bit machine, OS and compilation,
> this lifts the limits a bit, but does not help if your database happens
> to grow 100 times. Yet another solution is to reduce the size of
> Investor, for example putting all strings together in a single
> std::wstring, separated by some delimiter like a zero byte. This would
> make finding, accessing and changing the individual fields slower and
> more cumbersome of course.
>

At the moment, I can't migrate to Windows 7 or 64 bits versions of
visual
studio. BTW, As you noted, I don't like a bad design with good
hardware.
Your final solution was good and it was a clue, so I removed the
redundant
data members.
> hth
> Paavo

Thanks again.
-- Saeed Amrollahi

 
Reply With Quote
 
Balog Pal
Guest
Posts: n/a
 
      01-05-2011
"Saeed Amrollahi"
>> > I am working on an application to manage Tehran Securities Exchange
>> > organization.
>> > In this program, I try to manipulate a lot of institutional investors
>> > (about 1,500,000). The Investors are a struct like this:

>>
> > struct Investor {
> > int ID;
> > std::wstring NID;
> > std::wstring Name;
> > std::wstring RegNum;
> > std::wstring RegDate;
> > std::wstring RegProvince;
> > std::wstring RegCity;
> > std::wstring Type;
> > std::wstring HQAddr;
> > // the deblanked or squeezed info
> > std::wstring sq_Name;
> > std::wstring sq_RegNum;
> > std::wstring sq_RegProvince;
> > std::wstring sq_RegCity;
> > };

>
>> Size of this struct is 584 bytes with my compiler. 1500000*584 is ca 835
>> MB. You are holding them twice (both in vector and map), which makes it
>> ca 1.7 GB. This brings you already quite close to the maximum limits for
>> 32-bit applications (2G in Windows by default), hence the memory
>> >problems

>> are expected.
>>

>In general, you are right, but the sizeof of Investor on my machine is
>388 bytes (4 + 7 * 32). and the map and the vector take a lot of memory.
> >please note I have other data structures too.


> I reconsider the Investor, and I really
>don't need to
>some data members: RegProvince, RegDate, Type, HQAddr and
>sq_RegProvince.


If you need that amount of records, you better drop using std::wstring.
(unless the strings you store happen to be very close to 31 bytes (guess ~14
letters if you use some UCS2 variant and 16 bit wchar_t) long. The
essential sice of a String is just 4 bytes on 32bit platform (certainly you
need to calculate the size of actual data plus heap blocks overhead). I
bet you can find string implementations tuned for different purposes,
including low memory footprint.

 
Reply With Quote
 
Puppet_Sock
Guest
Posts: n/a
 
      01-05-2011
On Jan 4, 12:31*pm, Paavo Helde <(E-Mail Removed)> wrote:
[snip]
> Yet another solution is to reduce the size of
> Investor,

[snip]

Heh heh. I'm imagining a lot of people with
money and shortened legs, hobbling around.

http://en.wikipedia.org/wiki/Procrustes
Socks
 
Reply With Quote
 
Saeed Amrollahi
Guest
Posts: n/a
 
      01-08-2011
On Jan 5, 3:21*am, "Daniel T." <(E-Mail Removed)> wrote:
> Saeed Amrollahi <(E-Mail Removed)> wrote:
>
> > I am working on an application to manage Tehran Securities Exchange
> > organization.
> > In this program, I try to manipulate a lot of institutional investors
> > (about 1,500,000). The Investors are a struct like this:

>
> > struct Investor {
> > * int ID;
> > * std::wstring NID;
> > * std::wstring Name;
> > * std::wstring RegNum;
> > * std::wstring RegDate;
> > * std::wstring RegProvince;
> > * std::wstring RegCity;
> > * std::wstring Type;
> > * std::wstring HQAddr;
> > * // the deblanked or squeezed info
> > * std::wstring sq_Name;
> > * std::wstring sq_RegNum;
> > * std::wstring sq_RegProvince;
> > * std::wstring sq_RegCity;
> > };
> > The investors are stored in an Access Database and I read them into
> > memory at the program loading time. Because, each investor is
> > registered in a City, I tried
> > to use a map container: map a city to the collection of investors:
> > * map<wstring, vector<Investor>*> g_Investor; // map city name to the
> > investors located in
> > For the sake of efficiency, *I used *pointer to vector as a value of
> > the map.

>
> Using a pointer as you do above is not inherently any more efficient
> than map<wstring, vector<Investor> > would be.
>
> > the following code fills the map:

>
> > void InvestorSearchEngine::FillInvestors(const vector<Investor>& m)
> > {

>
> So at this point, 'm' has 1,500,000 members? I can only assume that it
> isn't referring to some temporary structure. If it isn't, then one thing
> that may help is to have the map hold a vector of Investor* instead of
> Investor. That way you won't have two copies of each Investor object.
> However, if the vector that 'm' refers to changes, you will have to
> rebuild g_Investor all over again. There is probably a better solution.
>
> I see that your class is called "InvestorSearchEngine." If the goal of
> this class is to help clients retrive Investors that meet specific
> criteria, then keeping a huge map of all the Investor objects by city
> seems inappropriate.
>
>
>
> > * try {
> > * * for (vector<Investor>::const_iterator cit = m.begin(); cit !=
> > m.end(); ++cit) {
> > *if (g_Investor.find(cit->second.RegCity()) != g_Investor.end()) {
> > * * * * * * // so, there is a node with reg. place already
> > * * map<wstring, vector<Investor>*>::iterator it =
> > g_Investor.find(cit->second.RegCity());
> > * * * * * *it->second->push_back(cit->second);
> > *}
> > *else { // the registered city is new
> > * * vector<Investor>* vp = new vector<Investor>();
> > * * vp->push_back(cit->second);
> > * * g_Investor[cit->second.RegCity] = vp;
> > *}
> > * * }
> > *}
> > catch(const std::bad_alloc&)
> > *{
> > * * wofstream LogFile("D:/log.txt", std::ios_base::app);
> > *LogFile << "Memory exhausted ... " << L'\n';
> > * * LogFile << g_Investor.max_size() << L'\n';
> > *LogFile.close();
> > }

>
> > Unfortunately, sometimes, the memory allocation is failed and the
> > "Memory exhauseted ..."
> > message was written to log file.

>
> It sounds like you need to sacrifice some of this preloading in order to
> save space.


HiDaniel
Thank you for your answer.
I tried to make smaller the Investor class by removing unnecessary
data members.

Regards,
-- Saeed Amrollahi
 
Reply With Quote
 
Saeed Amrollahi
Guest
Posts: n/a
 
      01-08-2011
On Jan 5, 10:39*am, Goran <(E-Mail Removed)> wrote:
> On Jan 4, 5:07*pm, Saeed Amrollahi <(E-Mail Removed)> wrote:
>
> > Dear friends
> > Hi

>
> > I am working on an application to manage Tehran Securities Exchange
> > organization.
> > In this program, I try to manipulate a lot of institutional investors
> > (about 1,500,000). The Investors are a struct like this:...
> > The investors are stored in an Access Database and I read them into
> > memory at the program loading time.
> > I use Pentium 4 CPU 3.00 GHz, 4 GB RAM and Windows XP, Visual Studio
> > 2008.

>
> Address space for any program in 32-bit XP is, by default, is 2GB (you
> can raise this to 3, but you really shouldn't). It does not matter
> that you have 4GB of memory or whatever big swap file, your program
> data must fit in 2GB. It looks like you exhausted that.
>
> You must redesign the program not to load that much data into memory.
> You already have a database, and a database is made exactly for that
> (among other things): manipulating amounts of data that do not fit
> into memory. So use it.
>
> Goran.


It's wise. Thank you for your answer.
I consider it.

Regards,
-- Saeed Amrollahi
 
Reply With Quote
 
James Kanze
Guest
Posts: n/a
 
      01-08-2011
On Jan 5, 7:51 pm, Paavo Helde <(E-Mail Removed)> wrote:
> James Kanze <(E-Mail Removed)> wrote innews:(E-Mail Removed):
> > On Jan 4, 5:31 pm, Paavo Helde <(E-Mail Removed)> wrote:
> >> Saeed Amrollahi <(E-Mail Removed)> wrote in
> >> news:fe8f1a4b-1315-
> >> (E-Mail Removed):


> >> > I am working on an application to manage Tehran Securities
> >> > Exchange organization. In this program, I try to manipulate
> >> > a lot of institutional investors (about 1,500,000). The
> >> > Investors are a struct like this:


> >> > struct Investor {
> >> > int ID;
> >> > std::wstring NID;
> >> > std::wstring Name;
> >> > std::wstring RegNum;
> >> > std::wstring RegDate;
> >> > std::wstring RegProvince;
> >> > std::wstring RegCity;
> >> > std::wstring Type;
> >> > std::wstring HQAddr;
> >> > // the deblanked or squeezed info
> >> > std::wstring sq_Name;
> >> > std::wstring sq_RegNum;
> >> > std::wstring sq_RegProvince;
> >> > std::wstring sq_RegCity;
> >> > };


> >> Size of this struct is 584 bytes with my compiler. 1500000*584 is ca
> >> 835 MB. You are holding them twice (both in vector and map), which
> >> makes it ca 1.7 GB. This brings you already quite close to the
> >> maximum limits for 32-bit applications (2G in Windows by default),
> >> hence the memory problems are expected.


> > Independantly of sizeof(Investor), all those strings are likely
> > to require additional memory (unless they are empty, or unless
> > the compiler uses the small string optimization, and they are
> > small enough---but things like names, provinces and cities
> > generally aren't). This could easily double the necessary
> > memory (with small string optimization) or multiply it by 5 to
> > 10, or more (without small string optimization, but then,
> > sizeof(Investor) would probably be around 64).


> You are right of course, I was assuming most of these strings
> are short and fit in the small string optimization buffer.


What are usual values for the small string cutoff? I seem to
remember 8 being used somewhere, but if sizeof this structure is
584, and wchar_t is 2 bytes (Windows), then I'd guess more
likely 16 (or 8 if wchar_t is 4 bytes, as is the case on most
Unix systems). With 16, your almost certainly right---most of
these fields would fit into 16 characters (supposing the field
names mean what they seem to mean, and judging from the names I
see on maps, etc.). That does change things.

> If he used UTF-8 encoded
> std::string instead of std::wstring, this assumption might be true more
> often (assuming the texts are mostly ASCII of course).


He said that this was for the Teheran securities exchange. The
local language in Teheran uses the Farsi alphabet, a modified
(or extended) form of the Arabic alphabet. In UTF-8, these are
all two byte characters, in the range D8 80--DB BF. If wchar_t
uses UTF-16, he gains absolutely nothing. (If it uses UTF-32,
on the other hand, he divides his memory use by 2. Maybe.)

As to whether it fits into the small string optimization or not,
all depends on how that is implemented. If the optimization
limit depends on the number of code points (char or wchar_t, and
the sizeof you mention would seem to suggest that), then he will
miss the small string optimization more often using UTF-8 than
using UTF-16 or UTF-32. (If the limit depends on the number of
bytes, then UTF-16 and UTF-8 are roughly equal, supposing most
text written in Farsi.)

> >> The correct solution would be to redesign the application so that you
> >> don't need to load all the data in the memory at once. Normally
> >> database applications are happy to load only the data they need at
> >> the moment.


> > It's hard to say what the correct solution is without more
> > context, but there's a very good chance you're right, at least
> > if the actual data is stored in a relational data base.


> >> Another solution would be to use a 64-bit machine, OS and
> >> compilation, this lifts the limits a bit, but does not help if your
> >> database happens to grow 100 times.


> > If he's near the limits on a 32 bit machine, a 64 bit machine
> > should multiply the number he can handle by significantly more
> > than a 100. Moving to 64 bits is probably the simplest
> > solution, *IF* the program really needs to keep all of the data
> > virtually in memory.


> The address space limits would be gone in 64-bit machine, but the new
> limits will be set by the physical RAM and disk space present. A normal
> desktop PC would have e.g. 8GB of RAM, which is only 4 times larger than
> the 2GB limit for Windows. The disks are maybe 200GB or something, if one
> (ab)uses that all for the pagefile, then one could hold theoretically
> hold ca 100 times more data in process memory than for a 32-bit process
> (assuming one has enough patience to wait for such process to complete
> anything!).


Terabyte disks and larger are readily available, and not
expensive. On a 64 bit system, he should easily be able to get
rid of the bad_alloc. With in return so much swapping that the
system becomes unusable.

> >> Yet another solution is to reduce the size of
> >> Investor, for example putting all strings together in a single
> >> std::wstring, separated by some delimiter like a zero byte. This
> >> would make finding, accessing and changing the individual fields
> >> slower and more cumbersome of course.


> > I'm not sure that that would gain that much.


> There are lots of space overhead associated with each std::wstring. In my
> implementation it seems the size of std::wstring is 32 bytes in 32-bit
> compilations, from which only 16 bytes is used for the small string
> buffer.


That seems like a lot, although... It's not too difficult to
implement the small string optimization with an overhead of only
two pointers. On the other hand, any quality implementation
will have a debugging option, which will require extra pointers
to keep a list of iterators, etc. From what I've seen, I have
the impression that VC++ tries to make it possible to link
debugging and non-debugging versions (although at least in VC8,
this doesn't actually work), and so uses the extra space even
when debugging is turned off.

> For example, if all strings are very short and fit in the small
> string buffers, then one can easily reduce the overall memory
> consumptions by half by packing them all in a single string. If the
> strings are large then this would not help of course.


If the strings are large, you would only have one allocation,
instead of many. And each allocation has significant overhead
as well (maybe 16 bytes), without mentionning the small string
buffer which isn't used.

No, you're right. Using a single string with a separator will
represent a significant gain (unless the strings are really,
really long, which I doubt is the case). Regardless of the
implementation of std::basic_string, and whether the strings fit
into the small string buffer or not.

> > What is probably,
> > on the other hand, is that there are a lot of duplicates in the
> > fields with province and city; using a string pool for these
> > could help. And it may be possible to generate the sq_...
> > values algorithmically from the non sq_ value; in that case,
> > they could be eliminated completely from the structure. But
> > none of these optimizations will last for long, if the table
> > significantly increases in size.


> Right


Even without many duplicates: using string pooling would allow
putting all of the actual character data in a single string,
which as you correctly point out, is a win in itself. Each
string in his structure would have type PooledString, which
would consist of two size_t (indexes to the begin and end of the
string in the pool). On a 32 bit machine, that's 8 bytes,
instead of 584; there's still the overhead for the pool, but it
wouldn't surprise me if this halved the total memory use.

--
James Kanze
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
container within a container issue: set in the map puzzlecracker C++ 8 09-21-2008 11:08 PM
STL container class memory allocation Neo C++ 2 08-11-2007 02:15 PM
static memory allocation versus dynamic memory allocation Ken C Programming 24 11-30-2006 12:37 AM
What is the difference between dynamic memory allocation,and stack allocation ? chris C++ 6 10-28-2005 05:27 AM
Cross-compiler way to handle memory allocation failure ? Koen C++ 1 06-25-2003 10:50 AM



Advertisments