![]() |
|
|
|
#11 |
|
Ben Bacarisse a écrit :
> If the implementation has decided that size_t should be 64 bits, then > I'd want containers that can be that large. I don't see that as a > problem but I don't use ever use low-memory systems with 64-bit > size_t. It soulds like you are worrying about a corner case. Maybe, but I am not sure about it. If you want to use bitstrings (as opposed to a container of unsigned char with 1 and 0 in each position) it means you do care about memory usage. Besides this problem is pervasive in 64 bit systems. All programmers start thinking like that, "we have a lot of memory", etc) and the end result is that the programs bloat so much that the benefits of having more memory are completely nullified by bloated software. Pointers are now 64 bit, when in fact we could surely use 32 bit pointers for most applications. The code size increases because of longer 64 bit instructions, etc. jacob navia |
|
|
|
|
#12 |
|
Posts: n/a
|
jacob navia <> writes:
> I am implementing the bitstring type in the container library, and > obviously I store the number of bits in a size_t... > > Problem is, in 64 bit versions, size_t grows to 8 bytes, what > is an absolute overkill for a number that in most cases will > fit in 16 bits, or, at most 32. > > And this happens in all containers. I do not see most applications > use containers with more than 4G elements... In a 64 bit system > size_t is just too much waste. My advice: Don't impose arbitrary limits in general-purpose code unless there's a *very* good reason to do so. How do you know that nobody will ever want a bitstring with more than 2**32 elements? What if I'm working with a large dataset (say, 10 billion elements) and I want a bitstring with one bit per element? No, I can't do that because the person who designed this container library 10 years ago was certain I'd never want to. We're talking about 4 bytes of overhead per bitstring. That could be a problem if you happen to have a huge number of small bitstrings, but IMHO that's not too high a price to pay for generality. > Now I see the problems Malcom was pointing at when he ranted at size_t. > > What would be the alternatives? > > uin32_t? (Typo: uint32_t) > That one looks better. Any problem with that? Yes: It's not size_t, and it would make your bitstring container gratuitously inconsistent with your other containers. Just use size_t; that's what it's for. My only real concern is that, since you can potentially have objects up to SIZE_MAX bytes, and a bitstring presumably stores CHAR_BIT bits per byte, so size_t potentially isn't big enough. That's not likely to be a problem any time soon for 64-bit systems, but it could for 32-bit where where, assuming CHAR_BIT==8, you couldn't have a bitstring of more than 2**29 bytes (1/8 of the addressing range). But I'd still advocate using size_t here rather than some potentially larger type. -- Keith Thompson (The_Other_Keith) kst- <http://www.ghoti.net/~kst> Nokia "We must do something. This is something. Therefore, we must do this." -- Antony Jay and Jonathan Lynn, "Yes Minister" Keith Thompson |
|
|
|
#13 |
|
Posts: n/a
|
On 2009-10-15, Keith Thompson <kst-> wrote:
> How do you know that nobody will ever want a bitstring with more than > 2**32 elements? What if I'm working with a large dataset (say, 10 > billion elements) and I want a bitstring with one bit per element? > No, I can't do that because the person who designed this container > library 10 years ago was certain I'd never want to. Let's go for the obvious case: I have plenty of machines with over 1GB of memory. I would like to implement a system which has an index of IP addresses, and as a first pass, before I do a huge binary search, I just want to do a linear check as to whether an address is known to me. The bitstring for IPv4 space has precisely 2**32 elements. If you can't think of a use for a string of 2**32 bits, or possibly 2**32 sets of three or four bits... > My only real concern is that, since you can potentially have > objects up to SIZE_MAX bytes, and a bitstring presumably stores > CHAR_BIT bits per byte, so size_t potentially isn't big enough. > That's not likely to be a problem any time soon for 64-bit systems, > but it could for 32-bit where where, assuming CHAR_BIT==8, you > couldn't have a bitstring of more than 2**29 bytes (1/8 of the > addressing range). But I'd still advocate using size_t here rather > than some potentially larger type. Yes. And there's a real issue here, because, while we currently don't have machines that could handle a 2**128 bit bitstring, we have an obvious application for one... (More realistically, though, I actually know someone doing some initial data structures work on questions to do with mapping IPv6, and he is talking about data sets that could, in the reasonably forseeable future, exceed 2^32 items.) -s -- Copyright 2009, all wrongs reversed. Peter Seebach / usenet- http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated! Seebs |
|
|
|
#14 |
|
Posts: n/a
|
jacob navia <> writes:
> Ben Bacarisse a écrit : > >> If the implementation has decided that size_t should be 64 bits, then >> I'd want containers that can be that large. I don't see that as a >> problem but I don't use ever use low-memory systems with 64-bit >> size_t. It soulds like you are worrying about a corner case. > > Maybe, but I am not sure about it. > If you want to use bitstrings (as opposed to > a container of unsigned char with 1 and 0 in each position) > it means you do care about memory usage. Perhaps I care about memory usage in the sense that an unpacked bitstring of the length I need won't fit in my computer's memory. If you really think it's important to be able to use a 32-bit, or even 16-bit, length for small bitstrings, I suppose you could vary the representation depending on the length. For example, start with a 16-bit length field. For values from 0 to 0xFFFD, the bitstring data immediately follows the length. 0xFFFE indicates that the actual length is in the following 4 bytes, followed by the data. 0xFFFF indicates that the acutal length is in the following 8 bytes, followed by the data. You could even start with a single byte. I don't necessarily think this is a good idea (every reference would have to check for special cases), but it would save space for small bitstrings while allowing huge bitstrings to be supported. And as long as the details are hidden from the user, the complexity is isolated to the library code that implements it. (The assumption of 8-bit bytes is also isolated to the library code.) I still recommend just storing the length as a size_t, but this is an alternative. -- Keith Thompson (The_Other_Keith) kst- <http://www.ghoti.net/~kst> Nokia "We must do something. This is something. Therefore, we must do this." -- Antony Jay and Jonathan Lynn, "Yes Minister" Keith Thompson |
|
|
|
#15 |
|
Posts: n/a
|
jacob navia wrote:
> Ben Bacarisse a écrit : > >> If the implementation has decided that size_t should be 64 bits, then >> I'd want containers that can be that large. I don't see that as a >> problem but I don't use ever use low-memory systems with 64-bit >> size_t. It soulds like you are worrying about a corner case. > > Maybe, but I am not sure about it. > If you want to use bitstrings (as opposed to > a container of unsigned char with 1 and 0 in each position) > it means you do care about memory usage. The only time I have used things like that I was dealing with large enough structures that the overhead of a 64 bit size_t would be dwarfed by the size of the structure. > Besides this problem is pervasive in 64 bit systems. > All programmers start thinking like that, "we have a lot of > memory", etc) and the end result is that the programs bloat > so much that the benefits of having more memory are completely > nullified by bloated software. > > Pointers are now 64 bit, when in fact we could surely use > 32 bit pointers for most applications. The code size increases > because of longer 64 bit instructions, etc. You don't have to make size_t 64 bit if you don't want to. You could make it 32 bits and limit the size of objects appropriately. I don't know enough about the 64 bit processors to know if you can use 32 bit pointers. You could, of course, provide compiler options to control this if you wanted, just like the old compiler options for small/medium/large memory models etc. -- Flash Gordon Flash Gordon |
|
|
|
#16 |
|
Posts: n/a
|
"Scott Fluhrer" <> writes:
> "Flash Gordon" <> wrote in message > news:... [...] >> You could, of course, provide compiler options to control this if you >> wanted, just like the old compiler options for small/medium/large memory >> models etc. > > True. It strikes me as overkill in this case (if we have a system with > multiple Gigabytes of memory, is it that serious of a problem if size_t > wastes an additional 4 bytes), but it is certainly possible. A 64-bit system doesn't necessarily have gigabytes of memory. Historically, issues like this become important when the sizes (of physical memory, of virtual memory, of the largest supported object) approach some kind of boundary. We ran into it some time ago when sizes around 64 KiB (addressible with 16 bits) were common. We're seeing it again now with sizes around 4 GiB. -- Keith Thompson (The_Other_Keith) kst- <http://www.ghoti.net/~kst> Nokia "We must do something. This is something. Therefore, we must do this." -- Antony Jay and Jonathan Lynn, "Yes Minister" Keith Thompson |
|
|
|
#17 |
|
Posts: n/a
|
On Oct 15, 4:41*am, jacob navia <ja...@nospam.org> wrote:
> I am implementing the bitstring type in the container library, and > obviously I store the number of bits in a size_t... > > Problem is, in 64 bit versions, size_t grows to 8 bytes, what > is an absolute overkill for a number that in most cases will > fit in 16 bits, or, at most 32. > > And this happens in all containers. I do not see most applications > use containers with more than 4G elements... In a 64 bit system > size_t is just too much waste. > > Now I see the problems Malcom was pointing at when he ranted at size_t. > > What would be the alternatives? > > uin32_t? > > That one looks better. Any problem with that? It would also be worth considering that the application may be diverse enough to justify two types - a large and small bitstring. The small bitstring bit be limited to a short's worth of bits, and internally be implemented with the struct hack for the bit storage. The large bitstring might use some sort of tree or other layer of indirection to make expansion reasonable as the bitstring gets really large (realloc ()’ing a growing 20KB bitstring is reasonable, but I’d have issues growing a 1GB bit string that way). Another option is that you change representations as the size grows past a certain point. If you do that, could you not just alter the vtable pointer to go "native" for the changed implementation? This would require that enough space be allocated for the largest version of the fixed part of the container (or allow a "small" base part to be specified, which would not allow the container to grow past a certain point). robertwessel2@yahoo.com |
|
|
|
#18 |
|
Posts: n/a
|
jacob navia wrote:
> I am implementing the bitstring type in the container library, and > obviously I store the number of bits in a size_t... > > Problem is, in 64 bit versions, size_t grows to 8 bytes, what > is an absolute overkill for a number that in most cases will > fit in 16 bits, or, at most 32. > > And this happens in all containers. I do not see most applications > use containers with more than 4G elements... In a 64 bit system > size_t is just too much waste. If the user has a 64-bit system, it is extremely unlikely that saving four bytes of memory will matter. Also, on many implementations, 64-bit integers are just as fast as (or faster than) 32-bit integers, and this is likely to become more common, not less. Finally, if I had a dollar for every time some numbskull programmer said to himself "nobody will _ever_ send me more than N bytes of data" and then was proven wrong a few years later, I'd be a rich man. It's best to use size_t for sizes of things, even if that seems like overkill at times. (BTW, isn't the human genome around fourteen billion bits? That seems like the kind of thing I'd store in a bitstring...) S -- Stephen Sprunk "God does not play dice." --Albert Einstein CCIE #3723 "God is an inveterate gambler, and He throws the K5SSS dice at every possible opportunity." --Stephen Hawking Stephen Sprunk |
|
|
|
#19 |
|
Posts: n/a
|
a écrit :
> On Oct 15, 4:41 am, jacob navia <ja...@nospam.org> wrote: >> I am implementing the bitstring type in the container library, and >> obviously I store the number of bits in a size_t... >> >> Problem is, in 64 bit versions, size_t grows to 8 bytes, what >> is an absolute overkill for a number that in most cases will >> fit in 16 bits, or, at most 32. >> >> And this happens in all containers. I do not see most applications >> use containers with more than 4G elements... In a 64 bit system >> size_t is just too much waste. >> >> Now I see the problems Malcom was pointing at when he ranted at size_t. >> >> What would be the alternatives? >> >> uin32_t? >> >> That one looks better. Any problem with that? > > > It would also be worth considering that the application may be diverse > enough to justify two types - a large and small bitstring. The small > bitstring bit be limited to a short's worth of bits, and internally be > implemented with the struct hack for the bit storage. The large > bitstring might use some sort of tree or other layer of indirection to > make expansion reasonable as the bitstring gets really large (realloc > ()’ing a growing 20KB bitstring is reasonable, but I’d have issues > growing a 1GB bit string that way). > Yes, this is a problem with the heap manager, for instance. I have a schema of having a vector of pointers to blocks each block containing N elements. That gives 4G (2^32) elements using a couple of 16 bit indexes. Most containers will be happy with several thousand elements, what would fit in two unsigned char indexes, etc. There are several growth strategies, and it is difficult to generalize. We have a linear growth (20K as you proposed above) or maybe exponential growth: we double the amount allocated at each step. Any strategy based on exponential growth can be dangerous in memory usage. To use with care... If we want to keep a record of the number of resizing being done, that takes place too, etc. A possible solution is a dynamic growth strategy that starts small assuming most containers will have a few hundred elements at most, and then dynamically change the vtable for bigger containers as soon as some threshold is passed. This can be complicated if the user has replaced some of the functions in the vtable: care should be taken to copy the functions that the user replaced... > Another option is that you change representations as the size grows > past a certain point. If you do that, could you not just alter the > vtable pointer to go "native" for the changed implementation? This > would require that enough space be allocated for the largest version > of the fixed part of the container (or allow a "small" base part to be > specified, which would not allow the container to grow past a certain > point). I did not fully understand the above paragraph. Maybe you can expand? what would it mean to "go native" ? Thanks for your ideas. jacob navia |
|
|
|
#20 |
|
Posts: n/a
|
Stephen Sprunk <> writes:
[...] > (BTW, isn't the human genome around fourteen billion bits? That seems > like the kind of thing I'd store in a bitstring...) Wikipedia says it's just over 3 billion DNA base pairs, at 2 bits each for over 6 billion bits. You'd be able to store that in less than 1 gigabyte, but if you want to store any additional information things are going to get tight. -- Keith Thompson (The_Other_Keith) kst- <http://www.ghoti.net/~kst> Nokia "We must do something. This is something. Therefore, we must do this." -- Antony Jay and Jonathan Lynn, "Yes Minister" Keith Thompson |
|
![]() |
| Thread Tools | Search this Thread |
|
|
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Comcast + Wireless Internet Problem | shadoweloc | General Help Related Topics | 1 | 07-01-2008 06:19 PM |
| Dial Up Problem | smackedass | A+ Certification | 3 | 02-02-2007 11:59 PM |
| Re: Virus Problem ** Help!** | David BlandIII | A+ Certification | 1 | 03-02-2004 06:00 PM |
| Re: Serious Computer Problem | hootnholler | A+ Certification | 1 | 11-24-2003 12:18 PM |
| Re: Serious Computer Problem | Bret | A+ Certification | 0 | 11-19-2003 12:51 AM |