size of a pointer on 4-bit system

Ben Bacarisse
Guest
Posts: n/a

 02-04-2013
Tim Rentsch <(E-Mail Removed)> writes:

> Ben Bacarisse <(E-Mail Removed)> writes:
>
>> Richard Damon <(E-Mail Removed)> writes:
>>
>>> On 2/1/13 5:41 AM, BartC wrote:
>>>> "Tim Rentsch" <(E-Mail Removed)> wrote in message
>>>> news:(E-Mail Removed)...
>>>>> "BartC" <(E-Mail Removed)> writes:
>>>>>
>>>>>> "Tim Rentsch" <(E-Mail Removed)> wrote in message
>>>>
>>>>>>> The sizeof operator can be defined in a way that satisfies all
>>>>>>> the Standard's requirements but still allows this example to
>>>>>>> allocate only 100 nibbles.
>>>>>>
>>>>>> So what would be the value of sizeof(*x)? It can only really
>>>>>> be 1 or 0. [snip elaboration]
>>>>>
>>>>> Actually that isn't right. Using 'sizeof' on a non-standard
>>>>> datatype may behave in unexpected ways. Here's a hint: sizeof
>>>>> yields a result of type size_t; size_t is an (unsigned) integer
>>>>> type; integer types larger than character types may have padding
>>>>> bits. Can you fill in the rest?
>>>>
>>>> What, that extra information is stored in those padding bits? How is
>>>> that going to help?
>>>>
>>>> What bit-pattern could be returned by sizeof(*x) that would make
>>>> 100*sizeof(*x) yield 50 instead of 100?
>>>
>>> one simple solution is to have the sizeof operator not return a size_t
>>> for the nybble type, but some special type that does math "funny" to get
>>> the right value, for example a fixed point type with 1 fractional bit.
>>> When doing arithmetic on this type, the compiler can get the "right"
>>> answer, and also do the right thing when converting it to a standard
>>> type.

>>
>> I don't see how that can be permitted, at least without twisting
>> things some more! The type of a sizeof expression is size_t (an
>> unsigned integer type) and the result must be "an integer".

>
> Here's another way of looking at it that may help. Using sizeof
> is supposed to give the size of its operand in bytes. For a
> four-bit data type, that should be a number strictly between
> zero and one. Under 6.5 p5, such a circumstance qualifies as an
> exceptional condition and therefore is undefined behavior.

[The thread has become split and I don't want to say essentially the
same things in two places so I'll just reply here.]

The source of the UB is immaterial to my complaint, which should really
be directed at myself: I should have known better! Given the nature of
UB, the first part of your teaser:

"The sizeof operator can be defined in a way that satisfies all the
Standard's requirements"

is essentially meaningless. Of course we can satisfy all the standard
requirements for undefined behaviour -- there are none!

I thought you meant more than that. I thought you meant more than "it
can be made to work".

>> If (and this is a moot point) a value can have padding bits,
>> the result could be a trap representation and then all bets are
>> off and the implementation could do what it likes, but I don't
>> think that's how values work.
>>
>>> It would say that
>>>
>>> malloc(100 * sizeof(*x));
>>>
>>> and
>>>
>>> size_t size = sizeof(*x);
>>> malloc(100 * size);
>>>
>>> might return different sized allocations, but as long as the "nybble"
>>> type is given a name in the implementation reserved name space (like
>>> _Nybble), the use of that name leads to "undefined behavior" by the
>>> standard which is then defined by the implementation to be useful.

>>
>> Are you imagining a sort of nuclear option for UB? I.e. that
>> just using some non-standard type anywhere in a program makes
>> all of it undefined? That's probably defensible, but it's not
>> how most people like to talk about such things.

>
> I was meaning to say something stronger, or at least how I think
> of it is stronger.

Yes, I know. But to be clear my remark was to Richard Damon who might
have been suggesting (I was not sure -- hence the interrogative) a more
global kind of undefined behaviour as a way of dealing with what he saw
as a problematic case (the two examples above). It seems from other
posts that he was not going there, but was, like you, imagining a more
localised UB.

> There is local undefined behavior at sizeof,
> because of the exceptional condition, and another local undefined
> behavior for a size_t with non-zero fraction bits. However, once
> these two local undefined behaviors are defined, everything else
> proceeds definedly (not counting things like pointers to the new
> type, etc, which also have to be defined, but I think you get the
> idea). The presence of undefined behavior is repaired purely
> locally by defining the semantics for these two specific cases, and
> otherwise has no effect (again assuming that other aspects have
> been defined suitably).

Absolutely. I saw no problem with making it work, and having extra bits
available allows the problem of returning a non-standard value to be
contained, as it were. (My thought was to imagine a type-tagged memory
architecture which is just fancy padding bits by another name.)
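
Richard Damon's two malloc calls quoted above can be made concrete with a small simulation. This is only a sketch and not how any real compiler behaves: `half_size`, `half_size_mul` and `half_size_to_bytes` are hypothetical names, and the size is assumed to be kept in half-byte units (one fraction bit), with conversion to whole bytes rounding up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-point size: counts half-bytes, i.e. one fraction bit. */
typedef struct { size_t halves; } half_size;

/* What sizeof(*x) might yield for a 4-bit type under this assumption: 0.5 bytes. */
static const half_size NYBBLE_SIZE = { 1 };

/* Multiply a count by a fixed-point size, keeping the fraction bit. */
static half_size half_size_mul(size_t count, half_size s) {
    half_size r;
    assert(s.halves == 0 || count <= SIZE_MAX / s.halves);  /* no wraparound */
    r.halves = count * s.halves;
    return r;
}

/* Convert to whole bytes, rounding up so the storage always fits. */
static size_t half_size_to_bytes(half_size s) {
    return s.halves / 2 + s.halves % 2;
}

/* Models malloc(100 * sizeof(*x)): multiply first, convert last. */
static size_t bytes_multiply_first(size_t count) {
    return half_size_to_bytes(half_size_mul(count, NYBBLE_SIZE));
}

/* Models size_t size = sizeof(*x); malloc(100 * size): convert first. */
static size_t bytes_convert_first(size_t count) {
    size_t size = half_size_to_bytes(NYBBLE_SIZE);
    return count * size;
}
```

Under these assumptions `bytes_multiply_first(100)` is 50 and `bytes_convert_first(100)` is 100, which is the "different sized allocations" discrepancy Richard describes.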

--
Ben.

Tim Rentsch
Guest
Posts: n/a

 02-04-2013
Ben Bacarisse <(E-Mail Removed)> writes:

> Tim Rentsch <(E-Mail Removed)> writes:
>
>> Ben Bacarisse <(E-Mail Removed)> writes:
>>
>>> Richard Damon <(E-Mail Removed)> writes:
>>>
>>>> On 2/1/13 5:41 AM, BartC wrote:
>>>>> "Tim Rentsch" <(E-Mail Removed)> wrote in message
>>>>> news:(E-Mail Removed)...
>>>>>> "BartC" <(E-Mail Removed)> writes:
>>>>>>
>>>>>>> "Tim Rentsch" <(E-Mail Removed)> wrote in message
>>>>>
>>>>>>>> The sizeof operator can be defined in a way that satisfies all
>>>>>>>> the Standard's requirements but still allows this example to
>>>>>>>> allocate only 100 nibbles.
>>>>>>>
>>>>>>> So what would be the value of sizeof(*x)? It can only really
>>>>>>> be 1 or 0. [snip elaboration]
>>>>>>
>>>>>> Actually that isn't right. Using 'sizeof' on a non-standard
>>>>>> datatype may behave in unexpected ways. Here's a hint: sizeof
>>>>>> yields a result of type size_t; size_t is an (unsigned) integer
>>>>>> type; integer types larger than character types may have padding
>>>>>> bits. Can you fill in the rest?
>>>>>
>>>>> What, that extra information is stored in those padding bits? How is
>>>>> that going to help?
>>>>>
>>>>> What bit-pattern could be returned by sizeof(*x) that would make
>>>>> 100*sizeof(*x) yield 50 instead of 100?
>>>>
>>>> one simple solution is to have the sizeof operator not return a size_t
>>>> for the nybble type, but some special type that does math "funny" to get
>>>> the right value, for example a fixed point type with 1 fractional bit.
>>>> When doing arithmetic on this type, the compiler can get the "right"
>>>> answer, and also do the right thing when converting it to a standard
>>>> type.
>>>
>>> I don't see how that can be permitted, at least without twisting
>>> things some more! The type of a sizeof expression is size_t (an
>>> unsigned integer type) and the result must be "an integer".

>>
>> Here's another way of looking at it that may help. Using sizeof
>> is supposed to give the size of its operand in bytes. For a
>> four-bit data type, that should be a number strictly between
>> zero and one. Under 6.5 p5, such a circumstance qualifies as an
>> exceptional condition and therefore is undefined behavior.

>
> [The thread has become split and I don't want to say essentially the
> same things in two places so I'll just reply here.]
>
> The source of the UB is immaterial to my complaint, which should really
> be directed at myself: I should have known better! Given the nature of
> UB, the first part of your teaser:
>
> "The sizeof operator can be defined in a way that satisfies all the
> Standard's requirements"
>
> is essentially meaningless. Of course we can satisfy all the standard
> requirements for undefined behaviour -- there are none!
>
> I thought you meant more than that. I thought you meant more than
> "it can be made to work". [snip subsequent parts]

What I found confusing was you not understanding what it was I
was trying to say. Or, perhaps better, the context in which my
remarks were expressed. I thought for a long time in writing
just one paragraph in my response to your other posting, trying
to figure out what your meta-model is for how you read and what
you wrote. It was baffling. Still is, I am sorry to say.

As for my statement being meaningless, I don't think it is. If I
may offer an analogy, consider the relationship between factorial
and the gamma function (shifted by one so they align on integers).
The gamma function can be seen as an extension of factorial.
Saying that means more than just the two functions matching where
factorial is defined. Similarly, the extended definition of
sizeof satisfies both what sizeof must do on standard types, and
smoothly extends that to cover types like nibble, without changing
the interpretation at all for how standard types work. How size_t
is represented, or what kind of type it is, or what type is
produced by sizeof -- none of those things change when types like
nibble enter the picture. That there is a simple, single
definition of sizeof and size_t that does both jobs -- conforms to
the letter of the Standard for standard types, and behaves as
people may expect for non-standard types -- is a noteworthy
statement. Based on the responses I think it's fair to say at
least some other people saw this meaning in my earlier statement.
Is your interpretation right and theirs wrong? Or is there more
than one reasonable interpretation?

glen herrmannsfeldt
Guest
Posts: n/a

 02-04-2013
Richard Damon <(E-Mail Removed)> wrote:

(snip)
> I am not sure that a fractional type meets the requirements for size_t.
> The problem is that size_t is defined as an "unsigned integral type",
> and math on such types is well defined and does not allow for fractional
> bits. This is why I was proposing that sizeof needs to return something
> besides size_t for the _Nybble type.

>>> It would say that

>>> malloc(100 * sizeof(*x));

>>> and

>>> size_t size = sizeof(*x);
>>> malloc(100 * size);

(snip)

> Making the conversion of the _Size_t type to size_t, when the conversion
> isn't exact, generate a warning could at least help locate problematical
> cases, and perhaps let you change the declaration to _Size_t. Note that
> your second case, while it allocates too much space, will at least run
> properly, even if it will be wasteful.

> I think that it is better that the way this is implemented keeps
> conforming code correct. Since sizeof(char)/2 MUST be 0 by the rules of
> Standard C, size_t can not hold fractional bits.

It seems to be well understood that high bits of pointers can be
used on word addressable machines to indicate bytes within words.

I believe it was mentioned earlier that high bits of sizeof()
and size_t could be used. I don't know that it is possible to
make it work, but that might be one of the things suggested.

Seems to me that if all the usual cases work, and if the new
cases (with nybble) also work, then it should be fine.
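
The word-addressable pointer trick mentioned above can be sketched in portable C as a simulation. The layout here is an assumption chosen for illustration (a 32-bit value, four bytes per word, byte index in the top two bits), and `byte_ptr`, `make_byte_ptr`, `ptr_word`, `ptr_byte` and `ptr_next` are hypothetical names rather than anything from a real implementation.

```c
#include <assert.h>
#include <stdint.h>

#define BYTE_SHIFT 30u                                  /* byte index lives in the top 2 bits */
#define WORD_MASK  ((UINT32_C(1) << BYTE_SHIFT) - 1u)   /* low 30 bits: word address */

typedef uint32_t byte_ptr;   /* hypothetical simulated byte pointer */

/* Build a byte pointer from a word address and a byte index (0..3). */
static byte_ptr make_byte_ptr(uint32_t word, uint32_t byte) {
    assert(byte < 4u);
    return (byte << BYTE_SHIFT) | (word & WORD_MASK);
}

static uint32_t ptr_word(byte_ptr p) { return p & WORD_MASK; }
static uint32_t ptr_byte(byte_ptr p) { return p >> BYTE_SHIFT; }

/* Advance by one byte: bump the byte index and carry into the word address. */
static byte_ptr ptr_next(byte_ptr p) {
    uint32_t byte = ptr_byte(p) + 1u;
    uint32_t word = ptr_word(p);
    if (byte == 4u) {
        byte = 0u;
        word = (word + 1u) & WORD_MASK;
    }
    return make_byte_ptr(word, byte);
}
```

Note that pointer increment is no longer a plain integer add, which is the kind of extra work the compiler has to do behind the scenes on such machines.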

-- glen

David Thompson
Guest
Posts: n/a

 02-04-2013
On Wed, 30 Jan 2013 15:47:35 -0500, Roberto Waltman
<(E-Mail Removed)> wrote:

> glen herrmannsfeldt wrote:

> >Are there C implementations for any Harvard (separate program and
> >data space) machines? Maybe the 8048 series?

>
> Many, starting with the PDP-11's, today's Atmel AVR's, etc.

Nit: *some* PDP-11s. Higher-end models had separate "instruction" and
"data" spaces, and 3 privilege levels; middle had combined-I&D and 2
levels; original and later cheapest models had no memory management
and only 1 (implicit) privilege level.

BartC
Guest
Posts: n/a

 02-04-2013
"glen herrmannsfeldt" <(E-Mail Removed)> wrote in message
news:kenkuo$2vv$(E-Mail Removed)...
> Richard Damon <(E-Mail Removed)> wrote:

>> I think that it is better that the way this is implemented keeps
>> conforming code correct. Since sizeof(char)/2 MUST be 0 by the rules of
>> Standard C, size_t can not hold fractional bits.

>
> It seems to be well understood that high bits of pointers can be
> used on word addressable machines to indicate bytes within words.

Sometimes, unless the hardware reserves them or requires them to be a
certain pattern.

But with 32-bits, there *are* no spare bits! Maybe with 64-bit pointers (and
the manipulation done by software), but remember this thread started by

> I believe it was mentioned earlier that high bits of sizeof()
> and size_t could be used. I don't know that it is possible to
> make it work, but that might be one of the things suggested.

size_t is a standard integer format. To make use of the high bits, you would
lose unacceptable range in a 32-bit type. And both 32/64-bit integers would
need lots of bit-twiddling to do the simplest operation. While having
what in reality would be 29/61-bit integers would just be too odd! (For
4-bits, you could have 31/63-bit integers, but you're not going to go to all
this trouble, and not have a Bit type as well.)

> Seems to me that if all the usual cases work, and if the new
> cases (with nybble) also work, then it should be fine.

Someone should try it then! I've implemented 1-, 2- and 4-bit types in a
language, and they were enough different that they needed their own
dedicated handling compared with byte-multiple/byte-aligned scalars, rather
than try and shoe-horn them in by introducing new tagged integer types,
which are even more trouble to deal with than the bit-types!

--
Bartc

James Kuyper
Guest
Posts: n/a

 02-04-2013
On 02/04/2013 08:30 AM, BartC wrote:
> "glen herrmannsfeldt" <(E-Mail Removed)> wrote in message
> news:kenkuo$2vv$(E-Mail Removed)...
>> Richard Damon <(E-Mail Removed)> wrote:

....
>> It seems to be well understood that high bits of pointers can be
>> used on word addressable machines to indicate bytes within words.

>
> Sometimes, unless the hardware reserves them or requires them to be a
> certain pattern.
>
> But with 32-bits, there *are* no spare bits!

Whether or not there are any spare bits depends upon both the size of
the pointer and the size of the memory addressed by them; you cannot
determine that there are no spare bits, just by looking at the number of
bits.

>> I believe it was mentioned earlier that high bits of sizeof()
>> and size_t could be used. I don't know that it is possible to
>> make it work, but that might be one of the things suggested.

>
> size_t is a standard integer format. To make use of the high bits, you would
> lose unacceptable range in a 32-bit type.

Again, whether or not it's unacceptable depends upon the context. On a
machine where no object can have as many as 2^30 bytes, the loss of
range caused by using the two highest bits for something else would be
perfectly acceptable.
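
The range being traded away is easy to work out. The helper below is a hypothetical illustration (the name `max_size_with_reserved_bits` is made up here) that assumes a 32-bit size field with some number of high bits set aside.

```c
#include <assert.h>
#include <stdint.h>

/* Largest byte count a 32-bit size field can hold once `reserved`
   of its high bits are set aside for other purposes. */
static uint32_t max_size_with_reserved_bits(unsigned reserved) {
    assert(reserved < 32u);   /* shifting by 32 or more would be undefined */
    return UINT32_MAX >> reserved;
}
```

With two reserved bits the largest representable size is 2^30 - 1 bytes, so every object on a machine where no object reaches 2^30 bytes still fits. With three reserved bits (the 29-bit case mentioned earlier) the limit drops to 2^29 - 1.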
--
James Kuyper

BartC
Guest
Posts: n/a

 02-04-2013
"James Kuyper" <(E-Mail Removed)> wrote in message
news:keoe69$o4o$(E-Mail Removed)...
> On 02/04/2013 08:30 AM, BartC wrote:

>> size_t is a standard integer format. To make use of the high bits, you
>> would
>> lose unacceptable range in a 32-bit type.

>
> Again, whether or not it's unacceptable depends upon the context. On a
> machine where no object can have as many as 2^30 bytes, the loss of
> range caused by using the two highest bits for something else would be
> perfectly acceptable.

Any machine with that amount of memory would quite likely want to be able to
address 4GB rather than be limited to 1 or 2GB. I think people would want a
better reason for limiting scalability of a system than just the need to
have nibble types in a language! (Which can be reasonably handled by
other means without too much trouble, even without language extensions.)

So I'd prefer a solution which doesn't impact the normal capabilities of the
hardware and language.
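
As one possible reading of "other means", here is a minimal sketch of nibble storage in standard C using only bitwise operators. The names `nibble_bytes`, `nibble_get` and `nibble_set` are hypothetical, and the layout (even index in the low four bits, odd index in the high four bits of each byte) is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed to hold n nibbles, two per byte, rounded up. */
static size_t nibble_bytes(size_t n) {
    return n / 2 + n % 2;
}

/* Read nibble i: even indices use the low half of a byte, odd the high half. */
static unsigned nibble_get(const unsigned char *buf, size_t i) {
    unsigned b = buf[i / 2];
    return (i % 2) ? ((b >> 4) & 0x0Fu) : (b & 0x0Fu);
}

/* Write nibble i without disturbing its neighbour in the same byte. */
static void nibble_set(unsigned char *buf, size_t i, unsigned v) {
    unsigned char *b = &buf[i / 2];
    assert(v <= 0x0Fu);            /* caller should pass a 4-bit value */
    v &= 0x0Fu;
    if (i % 2)
        *b = (unsigned char)((*b & 0x0Fu) | (v << 4));
    else
        *b = (unsigned char)((*b & 0xF0u) | v);
}
```

An array of 100 nibbles then needs `nibble_bytes(100)`, that is 50, bytes from malloc, at the cost of going through accessor functions instead of ordinary indexing.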

--
Bartc

Keith Thompson
Guest
Posts: n/a

 02-04-2013
"BartC" <(E-Mail Removed)> writes:
> "glen herrmannsfeldt" <(E-Mail Removed)> wrote in message
> news:kenkuo$2vv$(E-Mail Removed)...
>> Richard Damon <(E-Mail Removed)> wrote:
>>> I think that it is better that the way this is implemented keeps
>>> conforming code correct. Since sizeof(char)/2 MUST be 0 by the rules of
>>> Standard C, size_t can not hold fractional bits.

>>
>> It seems to be well understood that high bits of pointers can be
>> used on word addressable machines to indicate bytes within words.

>
> Sometimes, unless the hardware reserves them or requires them to be a
> certain pattern.
>
> But with 32-bits, there *are* no spare bits! Maybe with 64-bit pointers (and
> the manipulation done by software), but remember this thread started by

It's unlikely that a 4-bit system would be able to address 4 gigabytes
(or even 4 giganybbles) of either virtual or physical memory. If you're
designing a system with that big an address space, you're probably going
to use at least a 16-bit processor.

> size_t is a standard integer format. To make use of the high bits, you would
> lose unacceptable range in a 32-bit type. And both 32/64-bit integers would
> need lots of bit-twiddling to do the simplest operation. While having
> what in reality would be 29/61-bit integers would just be too odd! (For
> 4-bits, you could have 31/63-bit integers, but you're not going to go to all
> this trouble, and not have a Bit type as well.)

It's plausible that our 4-bit system could easily address 4-bit nybbles,
but not individual bits. For bits, you could use bit fields or bitwise
operators, just as we do on less exotic systems.

size_t is (a typedef for) some predefined integer type. The standard
gives considerable latitude for how it's represented. It needn't be any
of the language-defined types; those could have more conventional
representations, and whatever size_t is defined to be could have

[...]

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Keith Thompson
Guest
Posts: n/a

 02-04-2013
Richard Damon <(E-Mail Removed)> writes:
> On 2/3/13 8:12 PM, Tim Rentsch wrote:
>> Richard Damon <(E-Mail Removed)> writes:
>>> one simple solution is to have the sizeof operator not return a
>>> size_t for the nybble type, but some special type that does math
>>> "funny" to get the right value, for example a fixed point type
>>> with 1 fractional bit. When doing arithmetic on this type, the
>>> compiler can get the "right" answer, and also do the right thing
>>> when converting it to a standard type.

>>
>> This is basically the idea, except the result isn't a new type
>> but is always a size_t. The key insight is that size_t can be
>> what is in effect a fixed-point type (with three fraction bits,
>> for example), but still satisfy the requirements for being an
>> integer type by designating the fraction bits as "padding bits".
>> Any combination of fraction bits other than all zeroes would be
>> a trap representation, allowing both standard behavior and
>> extended behavior in the same data type (ie, size_t).

>
> I am not sure that a fractional type meets the requirements for size_t.
> The problem is that size_t is defined as an "unsigned integral type",
> and math on such types is well defined and does not allow for fractional
> bits. This is why I was proposing that sizeof needs to return something
> besides size_t for the _Nybble type.

The extra bits would be padding bits as far as any language-defined
operations are concerned. Arithmetic operations would happen to set
those bits consistently, but you wouldn't be able to access them other
than by performing undefined, or at least unspecified, operations.

For example, `(size_t)1 / 2` would yield a result that would compare
equal to 0, but if you look at the padding bits of the result and
interpret them as fractional bits, you can interpret it as 1/2
(a value midway between 0 and 1).
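
This can be simulated in ordinary C. The sketch below is an illustration under stated assumptions, not an actual size_t: `xsize` and its helper functions are hypothetical names, the representation is assumed to be 32 bits wide, and the low three bits play the role of the fraction ("padding") bits from Tim Rentsch's example.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 3u   /* three fraction bits: units of 1/8 byte */

/* Hypothetical stand-in for the extended size_t. */
typedef struct { uint32_t raw; } xsize;

/* A whole number of bytes has all fraction bits zero. */
static xsize xsize_from_bytes(uint32_t bytes) {
    xsize s = { bytes << FRAC_BITS };
    return s;
}

/* Arithmetic acts on the whole representation, so a remainder
   lands in the fraction bits instead of being discarded. */
static xsize xsize_div(xsize a, uint32_t d) {
    xsize s;
    assert(d != 0u);
    s.raw = a.raw / d;
    return s;
}

static xsize xsize_mul(xsize a, uint32_t n) {
    xsize s = { a.raw * n };
    return s;
}

/* The value as the language sees it: value bits only. */
static uint32_t xsize_value(xsize a) { return a.raw >> FRAC_BITS; }

/* The fraction hidden in the padding bits, in eighths of a byte. */
static uint32_t xsize_eighths(xsize a) { return a.raw & ((1u << FRAC_BITS) - 1u); }

/* Language-level equality ignores the padding bits. */
static int xsize_equal(xsize a, xsize b) { return xsize_value(a) == xsize_value(b); }
```

Dividing one byte by two gives a result whose value bits are 0 and whose fraction bits read 4/8, and multiplying that result by 100 gives value bits of 50, which is one way to get the bit pattern BartC asked about.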

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

James Kuyper
Guest
Posts: n/a

 02-04-2013
On 02/04/2013 09:21 AM, BartC wrote:
> "James Kuyper" <(E-Mail Removed)> wrote in message
> news:keoe69$o4o$(E-Mail Removed)...
>> On 02/04/2013 08:30 AM, BartC wrote:

>
>>> size_t is a standard integer format. To make use of the high bits, you
>>> would
>>> lose unacceptable range in a 32-bit type.

>>
>> Again, whether or not it's unacceptable depends upon the context. On a
>> machine where no object can have as many as 2^30 bytes, the loss of
>> range caused by using the two highest bits for something else would be
>> perfectly acceptable.

>
> Any machine with that amount of memory would quite likely want to be able to
> address 4GB rather than be limited to 1 or 2GB. ...

A system might use 32-bit pointers even if it has as few as 65536 bytes of
addressable memory (though it's far more plausible if it has somewhat
more than that, say 4 MB). It needn't have anywhere near 1GB of memory.