Go Back   Velocity Reviews > Newsgroups > C Programming
User Name
Password
Register FAQ Members List Calendar Search Today's Posts Mark Forums Read

Reply

C Programming - The problem with size_t

 
Thread Tools Search this Thread
Old 10-15-2009, 04:33 PM   #11
Default Re: The problem with size_t


Ben Bacarisse a écrit :

> If the implementation has decided that size_t should be 64 bits, then
> I'd want containers that can be that large. I don't see that as a
> problem but I don't use ever use low-memory systems with 64-bit
> size_t. It soulds like you are worrying about a corner case.


Maybe, but I am not sure about it.
If you want to use bitstrings (as opposed to
a container of unsigned char with 1 and 0 in each position)
it means you do care about memory usage.

Besides this problem is pervasive in 64 bit systems.
All programmers start thinking like that, "we have a lot of
memory", etc) and the end result is that the programs bloat
so much that the benefits of having more memory are completely
nullified by bloated software.

Pointers are now 64 bit, when in fact we could surely use
32 bit pointers for most applications. The code size increases
because of longer 64 bit instructions, etc.



jacob navia
  Reply With Quote
Old 10-15-2009, 04:50 PM   #12
Keith Thompson
 
Posts: n/a
Default Re: The problem with size_t
jacob navia <> writes:
> I am implementing the bitstring type in the container library, and
> obviously I store the number of bits in a size_t...
>
> Problem is, in 64 bit versions, size_t grows to 8 bytes, what
> is an absolute overkill for a number that in most cases will
> fit in 16 bits, or, at most 32.
>
> And this happens in all containers. I do not see most applications
> use containers with more than 4G elements... In a 64 bit system
> size_t is just too much waste.


My advice: Don't impose arbitrary limits in general-purpose code
unless there's a *very* good reason to do so.

How do you know that nobody will ever want a bitstring with more than
2**32 elements? What if I'm working with a large dataset (say, 10
billion elements) and I want a bitstring with one bit per element?
No, I can't do that because the person who designed this container
library 10 years ago was certain I'd never want to.

We're talking about 4 bytes of overhead per bitstring. That could
be a problem if you happen to have a huge number of small bitstrings,
but IMHO that's not too high a price to pay for generality.

> Now I see the problems Malcom was pointing at when he ranted at size_t.
>
> What would be the alternatives?
>
> uin32_t?


(Typo: uint32_t)

> That one looks better. Any problem with that?


Yes: It's not size_t, and it would make your bitstring container
gratuitously inconsistent with your other containers.

Just use size_t; that's what it's for.

My only real concern is that, since you can potentially have
objects up to SIZE_MAX bytes, and a bitstring presumably stores
CHAR_BIT bits per byte, so size_t potentially isn't big enough.
That's not likely to be a problem any time soon for 64-bit systems,
but it could for 32-bit where where, assuming CHAR_BIT==8, you
couldn't have a bitstring of more than 2**29 bytes (1/8 of the
addressing range). But I'd still advocate using size_t here rather
than some potentially larger type.

--
Keith Thompson (The_Other_Keith) kst- <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"


Keith Thompson
  Reply With Quote
Old 10-15-2009, 07:51 PM   #13
Seebs
 
Posts: n/a
Default Re: The problem with size_t
On 2009-10-15, Keith Thompson <kst-> wrote:
> How do you know that nobody will ever want a bitstring with more than
> 2**32 elements? What if I'm working with a large dataset (say, 10
> billion elements) and I want a bitstring with one bit per element?
> No, I can't do that because the person who designed this container
> library 10 years ago was certain I'd never want to.


Let's go for the obvious case:

I have plenty of machines with over 1GB of memory.

I would like to implement a system which has an index of IP addresses,
and as a first pass, before I do a huge binary search, I just want to do
a linear check as to whether an address is known to me.

The bitstring for IPv4 space has precisely 2**32 elements.

If you can't think of a use for a string of 2**32 bits, or possibly 2**32
sets of three or four bits...

> My only real concern is that, since you can potentially have
> objects up to SIZE_MAX bytes, and a bitstring presumably stores
> CHAR_BIT bits per byte, so size_t potentially isn't big enough.
> That's not likely to be a problem any time soon for 64-bit systems,
> but it could for 32-bit where where, assuming CHAR_BIT==8, you
> couldn't have a bitstring of more than 2**29 bytes (1/8 of the
> addressing range). But I'd still advocate using size_t here rather
> than some potentially larger type.


Yes.

And there's a real issue here, because, while we currently don't have
machines that could handle a 2**128 bit bitstring, we have an obvious
application for one... (More realistically, though, I actually know
someone doing some initial data structures work on questions to do with
mapping IPv6, and he is talking about data sets that could, in the
reasonably forseeable future, exceed 2^32 items.)

-s
--
Copyright 2009, all wrongs reversed. Peter Seebach / usenet-
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!


Seebs
  Reply With Quote
Old 10-15-2009, 07:56 PM   #14
Keith Thompson
 
Posts: n/a
Default Re: The problem with size_t
jacob navia <> writes:
> Ben Bacarisse a écrit :
>
>> If the implementation has decided that size_t should be 64 bits, then
>> I'd want containers that can be that large. I don't see that as a
>> problem but I don't use ever use low-memory systems with 64-bit
>> size_t. It soulds like you are worrying about a corner case.

>
> Maybe, but I am not sure about it.
> If you want to use bitstrings (as opposed to
> a container of unsigned char with 1 and 0 in each position)
> it means you do care about memory usage.


Perhaps I care about memory usage in the sense that an unpacked
bitstring of the length I need won't fit in my computer's memory.

If you really think it's important to be able to use a 32-bit, or
even 16-bit, length for small bitstrings, I suppose you could vary
the representation depending on the length. For example, start with
a 16-bit length field. For values from 0 to 0xFFFD, the bitstring
data immediately follows the length. 0xFFFE indicates that the
actual length is in the following 4 bytes, followed by the data.
0xFFFF indicates that the acutal length is in the following 8 bytes,
followed by the data. You could even start with a single byte.

I don't necessarily think this is a good idea (every reference
would have to check for special cases), but it would save space for
small bitstrings while allowing huge bitstrings to be supported.
And as long as the details are hidden from the user, the complexity
is isolated to the library code that implements it. (The assumption
of 8-bit bytes is also isolated to the library code.)

I still recommend just storing the length as a size_t, but this is
an alternative.

--
Keith Thompson (The_Other_Keith) kst- <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"


Keith Thompson
  Reply With Quote
Old 10-15-2009, 08:00 PM   #15
Flash Gordon
 
Posts: n/a
Default Re: The problem with size_t
jacob navia wrote:
> Ben Bacarisse a écrit :
>
>> If the implementation has decided that size_t should be 64 bits, then
>> I'd want containers that can be that large. I don't see that as a
>> problem but I don't use ever use low-memory systems with 64-bit
>> size_t. It soulds like you are worrying about a corner case.

>
> Maybe, but I am not sure about it.
> If you want to use bitstrings (as opposed to
> a container of unsigned char with 1 and 0 in each position)
> it means you do care about memory usage.


The only time I have used things like that I was dealing with large
enough structures that the overhead of a 64 bit size_t would be dwarfed
by the size of the structure.

> Besides this problem is pervasive in 64 bit systems.
> All programmers start thinking like that, "we have a lot of
> memory", etc) and the end result is that the programs bloat
> so much that the benefits of having more memory are completely
> nullified by bloated software.
>
> Pointers are now 64 bit, when in fact we could surely use
> 32 bit pointers for most applications. The code size increases
> because of longer 64 bit instructions, etc.


You don't have to make size_t 64 bit if you don't want to. You could
make it 32 bits and limit the size of objects appropriately. I don't
know enough about the 64 bit processors to know if you can use 32 bit
pointers. You could, of course, provide compiler options to control this
if you wanted, just like the old compiler options for small/medium/large
memory models etc.
--
Flash Gordon


Flash Gordon
  Reply With Quote
Old 10-15-2009, 09:18 PM   #16
Keith Thompson
 
Posts: n/a
Default Re: The problem with size_t
"Scott Fluhrer" <> writes:
> "Flash Gordon" <> wrote in message
> news:...

[...]
>> You could, of course, provide compiler options to control this if you
>> wanted, just like the old compiler options for small/medium/large memory
>> models etc.

>
> True. It strikes me as overkill in this case (if we have a system with
> multiple Gigabytes of memory, is it that serious of a problem if size_t
> wastes an additional 4 bytes), but it is certainly possible.


A 64-bit system doesn't necessarily have gigabytes of memory.

Historically, issues like this become important when the sizes (of
physical memory, of virtual memory, of the largest supported object)
approach some kind of boundary. We ran into it some time ago when
sizes around 64 KiB (addressible with 16 bits) were common. We're
seeing it again now with sizes around 4 GiB.

--
Keith Thompson (The_Other_Keith) kst- <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"


Keith Thompson
  Reply With Quote
Old 10-15-2009, 10:56 PM   #17
robertwessel2@yahoo.com
 
Posts: n/a
Default Re: The problem with size_t
On Oct 15, 4:41*am, jacob navia <ja...@nospam.org> wrote:
> I am implementing the bitstring type in the container library, and
> obviously I store the number of bits in a size_t...
>
> Problem is, in 64 bit versions, size_t grows to 8 bytes, what
> is an absolute overkill for a number that in most cases will
> fit in 16 bits, or, at most 32.
>
> And this happens in all containers. I do not see most applications
> use containers with more than 4G elements... In a 64 bit system
> size_t is just too much waste.
>
> Now I see the problems Malcom was pointing at when he ranted at size_t.
>
> What would be the alternatives?
>
> uin32_t?
>
> That one looks better. Any problem with that?



It would also be worth considering that the application may be diverse
enough to justify two types - a large and small bitstring. The small
bitstring bit be limited to a short's worth of bits, and internally be
implemented with the struct hack for the bit storage. The large
bitstring might use some sort of tree or other layer of indirection to
make expansion reasonable as the bitstring gets really large (realloc
()’ing a growing 20KB bitstring is reasonable, but I’d have issues
growing a 1GB bit string that way).

Another option is that you change representations as the size grows
past a certain point. If you do that, could you not just alter the
vtable pointer to go "native" for the changed implementation? This
would require that enough space be allocated for the largest version
of the fixed part of the container (or allow a "small" base part to be
specified, which would not allow the container to grow past a certain
point).


robertwessel2@yahoo.com
  Reply With Quote
Old 10-15-2009, 11:02 PM   #18
Stephen Sprunk
 
Posts: n/a
Default Re: The problem with size_t
jacob navia wrote:
> I am implementing the bitstring type in the container library, and
> obviously I store the number of bits in a size_t...
>
> Problem is, in 64 bit versions, size_t grows to 8 bytes, what
> is an absolute overkill for a number that in most cases will
> fit in 16 bits, or, at most 32.
>
> And this happens in all containers. I do not see most applications
> use containers with more than 4G elements... In a 64 bit system
> size_t is just too much waste.


If the user has a 64-bit system, it is extremely unlikely that saving
four bytes of memory will matter.

Also, on many implementations, 64-bit integers are just as fast as (or
faster than) 32-bit integers, and this is likely to become more common,
not less.

Finally, if I had a dollar for every time some numbskull programmer said
to himself "nobody will _ever_ send me more than N bytes of data" and
then was proven wrong a few years later, I'd be a rich man.

It's best to use size_t for sizes of things, even if that seems like
overkill at times.

(BTW, isn't the human genome around fourteen billion bits? That seems
like the kind of thing I'd store in a bitstring...)

S

--
Stephen Sprunk "God does not play dice." --Albert Einstein
CCIE #3723 "God is an inveterate gambler, and He throws the
K5SSS dice at every possible opportunity." --Stephen Hawking


Stephen Sprunk
  Reply With Quote
Old 10-15-2009, 11:11 PM   #19
jacob navia
 
Posts: n/a
Default Re: The problem with size_t
a écrit :
> On Oct 15, 4:41 am, jacob navia <ja...@nospam.org> wrote:
>> I am implementing the bitstring type in the container library, and
>> obviously I store the number of bits in a size_t...
>>
>> Problem is, in 64 bit versions, size_t grows to 8 bytes, what
>> is an absolute overkill for a number that in most cases will
>> fit in 16 bits, or, at most 32.
>>
>> And this happens in all containers. I do not see most applications
>> use containers with more than 4G elements... In a 64 bit system
>> size_t is just too much waste.
>>
>> Now I see the problems Malcom was pointing at when he ranted at size_t.
>>
>> What would be the alternatives?
>>
>> uin32_t?
>>
>> That one looks better. Any problem with that?

>
>
> It would also be worth considering that the application may be diverse
> enough to justify two types - a large and small bitstring. The small
> bitstring bit be limited to a short's worth of bits, and internally be
> implemented with the struct hack for the bit storage. The large
> bitstring might use some sort of tree or other layer of indirection to
> make expansion reasonable as the bitstring gets really large (realloc
> ()’ing a growing 20KB bitstring is reasonable, but I’d have issues
> growing a 1GB bit string that way).
>


Yes, this is a problem with the heap manager, for instance. I have
a schema of having a vector of pointers to blocks each block containing
N elements. That gives 4G (2^32) elements using a couple
of 16 bit indexes.

Most containers will be happy with several thousand elements, what
would fit in two unsigned char indexes, etc.

There are several growth strategies, and it is difficult to generalize.
We have a linear growth (20K as you proposed above) or maybe exponential
growth: we double the amount allocated at each step.

Any strategy based on exponential growth can be dangerous in memory
usage. To use with care...

If we want to keep a record of the number of resizing being done,
that takes place too, etc.

A possible solution is a dynamic growth strategy that starts small
assuming most containers will have a few hundred elements at most,
and then dynamically change the vtable for bigger containers as
soon as some threshold is passed.

This can be complicated if the user has replaced some of the
functions in the vtable: care should be taken to copy the functions
that the user replaced...

> Another option is that you change representations as the size grows
> past a certain point. If you do that, could you not just alter the
> vtable pointer to go "native" for the changed implementation? This
> would require that enough space be allocated for the largest version
> of the fixed part of the container (or allow a "small" base part to be
> specified, which would not allow the container to grow past a certain
> point).


I did not fully understand the above paragraph. Maybe you can expand?
what would it mean to "go native" ?

Thanks for your ideas.



jacob navia
  Reply With Quote
Old 10-15-2009, 11:21 PM   #20
Keith Thompson
 
Posts: n/a
Default Re: The problem with size_t
Stephen Sprunk <> writes:
[...]
> (BTW, isn't the human genome around fourteen billion bits? That seems
> like the kind of thing I'd store in a bitstring...)


Wikipedia says it's just over 3 billion DNA base pairs, at 2 bits each
for over 6 billion bits.

You'd be able to store that in less than 1 gigabyte, but if you want
to store any additional information things are going to get tight.

--
Keith Thompson (The_Other_Keith) kst- <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"


Keith Thompson
  Reply With Quote
Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are Off
Pingbacks are Off
Refbacks are Off

Similar Threads
Thread Thread Starter Forum Replies Last Post
Comcast + Wireless Internet Problem shadoweloc General Help Related Topics 1 07-01-2008 06:19 PM
Dial Up Problem smackedass A+ Certification 3 02-02-2007 11:59 PM
Re: Virus Problem ** Help!** David BlandIII A+ Certification 1 03-02-2004 06:00 PM
Re: Serious Computer Problem hootnholler A+ Certification 1 11-24-2003 12:18 PM
Re: Serious Computer Problem Bret A+ Certification 0 11-19-2003 12:51 AM




SEO by vBSEO 3.3.2 ©2009, Crawlability, Inc.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46