Unsigned types are DANGEROUS??

 
 
MikeP, 03-12-2011
If you investigate the tcmalloc code (by Google), you will find the
following warning:

// NOTE: unsigned types are DANGEROUS in loops and other arithmetical
// places. Use the signed types unless your variable represents a bit
// pattern (eg a hash value) or you really need the extra bit. Do NOT
// use 'unsigned' to express "this value should always be positive";
// use assertions for this.

Is it just their idiom? What's the problem with using unsigned ints in
loops (it seems natural to do so)? Are C++ unsigned ints "broken"
somehow?


 
Öö Tiib, 03-13-2011
On Mar 12, 11:35 pm, "MikeP" <(E-Mail Removed)> wrote:
> If you investigate the tcmalloc code (by Google), you will find the
> following warning:
>
> // NOTE: unsigned types are DANGEROUS in loops and other arithmetical
> // places. Use the signed types unless your variable represents a bit
> // pattern (eg a hash value) or you really need the extra bit. Do NOT
> // use 'unsigned' to express "this value should always be positive";
> // use assertions for this.
>
> Is it just their idiom? What's the problem with using unsigned ints in
> loops (it seems natural to do so)? Are C++ unsigned ints "broken"
> somehow?


Unsigned int is not broken; it is well-defined. It also does not spend a
bit on a sign, so it can use that extra bit for representing the value.
What is most annoying about it all is that some people take the question
sort of religiously, one way or the other.

One problem is that unsigned int is a type whose behavior most novices
intuitively misinterpret as "positive integer". It is neither positive
nor negative: it is a modular-arithmetic value that does not and cannot
have a sign. For example, if you subtract 8U from 4U you do not get -4
but a large unsigned value (dependent on the width of unsigned int) that
in the majority of cases you did not want. When you multiply 4U by -4
you likewise get some large unsigned value ... and so on.
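
For illustration, a minimal sketch of that surprise (the printed values
assume a 32-bit unsigned int, which is an assumption, not a guarantee):

#include <iostream>

int main() {
    unsigned a = 4U - 8U;  // wraps modulo 2^32: prints 4294967292, not -4
    unsigned b = 4U * -4;  // -4 converts to 4294967292 first; product wraps to 4294967280
    std::cout << a << "\n" << b << "\n";
}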

Unsigned types are very good for algorithms that use bitwise
arithmetic. Unsigned int is also excellent for an algorithm that needs
modular arithmetic with modulus 2^32 (on typical platforms), for
example in cryptography.

In general ... a good software developer should be careful. C++ is a
good language for training careful developers.
 
Alf P. Steinbach /Usenet, 03-13-2011
* MikeP, on 12.03.2011 22:35:
> If you investigate the tcmalloc code (by Google), you will find the
> following warning:
>
> // NOTE: unsigned types are DANGEROUS in loops and other arithmetical
> // places. Use the signed types unless your variable represents a bit
> // pattern (eg a hash value) or you really need the extra bit. Do NOT
> // use 'unsigned' to express "this value should always be positive";
> // use assertions for this.
>
> Is it just their idiom?


No.


> What's the problem with using unsigned ints in
> loops (it seems natural to do so)?


"and other arithmetical places". It's not about loops, it's about any place
where unsigned values might be treated as numbers (instead of as bitpatterns),
and implicit conversions can kick in. For example, ...

assert( std::string( "blah blah" ).length() < -5 );

... is technically unproblematic, but you have to think twice about it.

Having to think twice about it means that you can easily write something incorrect.

And Murphy then guarantees that you will.
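
For illustration, a short sketch of what the implicit conversion does
there (the huge value assumes a 64-bit size_t):

#include <iostream>
#include <string>

int main() {
    // length() returns std::size_t, so -5 is converted to unsigned:
    // with a 64-bit size_t it becomes 18446744073709551611, and the
    // comparison is true even though it looks absurd.
    std::cout << (std::string("blah blah").length() < -5) << "\n";  // prints 1
}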


> Are C++ unsigned ints "broken" somehow?


Not in the sense that you're apparently asking about; that is, there is not
anything broken about e.g. 'unsigned' itself. But as part of a willy-nilly
broken type hodge-podge inherited from C, yes, it's broken. That's because
implicit conversions that lose information are all over the place.
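
A classic example of how this bites in a loop (an illustration, not from
the post itself): counting down with an unsigned index never terminates,
because the index wraps instead of going negative.

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v(3, 7);  // three elements
    // BUG (sketched in the comment): with an unsigned index, i >= 0 is
    // always true; when i reaches 0, --i wraps to a huge value and v[i]
    // is out of bounds.
    //   for (std::vector<int>::size_type i = v.size() - 1; i >= 0; --i) ...
    // A signed index avoids the wrap:
    for (int i = static_cast<int>(v.size()) - 1; i >= 0; --i)
        std::cout << v[i] << "\n";
}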


Cheers & hth.,

- Alf

--
blog at <url: http://alfps.wordpress.com>
 
Andre Kaufmann, 03-13-2011
On 13.03.2011 03:01, William Ahern wrote:
> Alf P. Steinbach /Usenet<(E-Mail Removed)> wrote:
>
> Using unsigned arithmetic means that underflows become overflows, which
> means you only need to check for a single class of invalid values instead of
> two when you use common vector coding patterns: base + length. It also means


Is there really always a difference between signed and unsigned?
I think it depends on the code/algorithm and how overflows are handled
by the CPU.

e.g. (only an illustrative sample; I wouldn't write code like this)

// Assuming the code runs on a
// >>> 32-bit CPU <<< !!!!

char buf[4000];

void foo1(unsigned int len, unsigned int appendLen, char* append)
{
    if ((len + appendLen) < sizeof(buf))
    {
        memcpy(buf + len, append, appendLen);
    }
}

foo1(2000, 0xFFFFFFFF - 0x2, ....)

Most of the CPUs and C++ compilers I know would simply copy (32-bit
CPU!) beyond the buffer's boundary, because:

2000 + 0xFFFFFFFD -> 1997 < sizeof(buf) == 4000

Effectively, due to wraparound, the unsigned addition acts as a
subtraction of 3.

Using signed values wouldn't change that much:

void foo2(int len, int appendLen, char* append)
{
    if ((len + appendLen) < sizeof(buf))
    {
        memcpy(buf + len, append, appendLen);
    }
}

foo2(2000, 0xFFFFFFFF - 0x2, ....)

2000 + -3 -> 1997

What would help is either checking each value individually or using
64-bit integer arithmetic for 32-bit operands, so the addition cannot
wrap. Another option is a language that raises exceptions on overflow
(some do that by default) or saturating arithmetic, which does not wrap.
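
For illustration, a sketch of the widening idea applied to foo1 (the
name foo1_checked is invented for the sketch; it assumes unsigned int is
32 bits and unsigned long long is 64, as in the scenario above):

#include <cstring>

char buf[4000];

void foo1_checked(unsigned int len, unsigned int appendLen, char* append)
{
    // Do the addition in 64 bits so that 32-bit operands cannot wrap.
    unsigned long long total = (unsigned long long)len + appendLen;
    if (total < sizeof(buf))
    {
        std::memcpy(buf + len, append, appendLen);
    }
}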

> you only need to check the result, instead of each intermediate arithmetic
> operation.


IMHO it depends on the CPU, the language, and the algorithm used. In the
sample above, checking the result wouldn't help.

Andre
 
Lasse Reichstein Nielsen, 03-13-2011
Andre Kaufmann <akinet#remove#@t-online.de> writes:

> On 13.03.2011 03:01, William Ahern wrote:
>> Alf P. Steinbach /Usenet<(E-Mail Removed)> wrote:
>>
>> Using unsigned arithmetic means that underflows become overflows, which
>> means you only need to check for a single class of invalid values instead of
>> two when you use common vector coding patterns: base + length. It also means

>
> Is there really always a difference between signed and unsigned ?
> I think it depends on the code/algorithm and how overflows are handled
> by the CPU.
>
> e.g. (only an illustrative sample, wouldn't write code like this)
>
>
> // Assuming code runs under a
> // >>> 32 bit CPU <<< !!!!
>
> char buf[4000];
>
> void foo1(unsigned int len, unsigned int appendLen, char* append)
> {
>     if ((len + appendLen) < sizeof(buf))
>     {
>         memcpy(buf + len, append, appendLen);
>     }
> }

[...]
> What would help either check each value individually or use 64 bit
> integer arithmetic for 32 bit operands to prevent overflows.
> Or to use either exceptions on overflows (some languages do that by
> default or use saturation registers which don't overflow).


Or write your code to do non-overflowing computations only. In this
case we are doing arithmetic on an untrusted value (appendLen) before
validating it, which means that we might mask an invalid value.

If we can assume that len is within sizeof(buf) (otherwise we have already
overflowed the buffer), then in this case it should be:
if (sizeof(buf) - len >= appendLen) {
    memcpy(buf + len, append, appendLen);
}
because sizeof(buf) - len is guaranteed to give a valid value, and
we compare that directly to the untrusted value.

Of course, this only works like this if the values are unsigned. If
signed, we should also bail out if appendLen is negative (or,
preferably, just cast both sides to size_t or unsigned).

If there is more than one untrusted value, then we will probably need
to do individual validation, because two wrongs might seem to make a
right.
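
Putting the whole pattern together as a sketch (the function name is
invented; buf is carried over from the earlier example, and the check on
len makes the assumed invariant explicit):

#include <cstddef>
#include <cstring>

char buf[4000];

void appendChecked(std::size_t len, std::size_t appendLen, const char* append)
{
    // Validate the invariant on len first; then sizeof(buf) - len cannot
    // wrap, and the untrusted appendLen is compared against it directly.
    if (len <= sizeof(buf) && sizeof(buf) - len >= appendLen)
    {
        std::memcpy(buf + len, append, appendLen);
    }
}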

/L
--
Lasse Reichstein Holst Nielsen
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'
 
Andre Kaufmann, 03-13-2011
On 13.03.2011 13:26, Lasse Reichstein Nielsen wrote:
> Andre Kaufmann<akinet#remove#@t-online.de> writes:


> [...]
>
> Or write your code to do non-overflowing computations only. In this
> case we are doing arithmetic on an untrusted value (appendLen) before
> validating it, which means that we might mask an invalid value.
>
> If we can assume that len is within sizeof(buf) (otherwise we have already
> overflowed the buffer), then in this case it should be:
> if (sizeof(buf) - len >= appendLen) {
>     memcpy(buf + len, append, appendLen);
> }


Agreed - a good and safer idea.

> Of course, this only works like this if the values are unsigned. If
> signed, we should also bail out if appendLen is negative (or,
> preferably, just cast both sides to size_t or unsigned).
>
> If there is more than one untrusted value, then we will probably need
> to do individual validation, because two wrongs might seem to make a
> right


Yes: negative * negative = positive.

The only problem is: which value passed as a parameter can be trusted?

C++ strings would be safer, but even there it depends on the
implementation of the string class itself.

Checking integer operations for correct values is quite complex. For
example, the code of a safe integer class is quite long: Microsoft's
SafeInt class runs to more than 6000 lines of code.

But who wants, or can afford, such overhead for each integer :-/
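
The core of what such a class has to do can be seen in a single checked
addition (a minimal sketch of the idea, not Microsoft's actual SafeInt
code):

#include <limits>

// Returns false instead of wrapping (or invoking undefined behavior)
// when a + b would overflow an int.
bool checkedAddInt(int a, int b, int& result)
{
    if (b > 0 && a > std::numeric_limits<int>::max() - b) return false;
    if (b < 0 && a < std::numeric_limits<int>::min() - b) return false;
    result = a + b;
    return true;
}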



Andre

 
Alf P. Steinbach /Usenet, 03-13-2011
* William Ahern, on 13.03.2011 03:01:
> Alf P. Steinbach /Usenet<(E-Mail Removed)> wrote:
> <snip>
>> Not in the sense that you're apparently asking about, that is, there is not
>> anything broken about e.g 'unsigned' itself. But as part of a willy-nilly broken
>> type hodge-podge inherited from C, yes, it's broken. That's because implicit
>> conversions that lose information are all over the place.

>
> The danger here with implicit conversions occurs when you mix signed and
> unsigned.


Yes.


> If you don't mix, and you stick to (unsigned int) or wider, then
> you're fine.


Those are two separate claims.

"If you don't mix ... you're fine" is generally true. There has to be some
mixing at some level because of the (with 20-20 hindsight) unfortunate choice of
unsigned sizes in the standard library. To *contain* this mixing it's a good
idea to define the triad of helper functions countOf, startOf and endOf.
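
Alf does not show the triad here; one plausible shape for raw arrays (a
sketch, where the signed return type is the point) confines the
unsigned-to-signed conversion to these three functions:

#include <cstddef>

template<class T, std::size_t N>
std::ptrdiff_t countOf(T (&)[N]) { return static_cast<std::ptrdiff_t>(N); }

template<class T, std::size_t N>
T* startOf(T (&a)[N]) { return a; }

template<class T, std::size_t N>
T* endOf(T (&a)[N]) { return a + N; }

With these, loops can use a plain signed index and never touch size_t
directly; overloads for containers would follow the same pattern.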

"If you stick to (unsigned int) or wider, then you're fine" is generally false.
Use hammer for nails, screwdriver for screws. In short, use the right tool for
the job, or at least don't use a clearly inappropriate tool: don't use signed
types for bitlevel stuff, and don't use unsigned types for numerical stuff.


> unsigned types can be safer because everything about the
> arithmetic is well defined, including over- and underflows which occur
> modulo 2^N; as opposed to signed, where those scenarios are undefined.


The well-definedness of operations that you're talking about is this: that the
language *guarantees* that range errors for unsigned types will not be caught.

A guarantee that errors won't be caught does not mean "safer".

That's plain idiocy, sorry.


> If you don't need negative numbers, why use a type that can produce them?


To catch errors and to not inadvertently produce them in the first place.


> If
> they are produced (from an oversight or a bug), really unexpected things can
> happen.


Right.


> Using unsigned arithmetic means that underflows become overflows, which
> means you only need to check for a single class of invalid values instead of
> two when you use common vector coding patterns: base + length.


I am aware that a similar piece of reasoning is in the FAQ. It is technically
correct. But adopted as a guideline it's like throwing the baby out with the
bathwater: the single check has no real advantage, it is limited to a very
special context, and the cost of providing that null advantage is very high.

In short, basing your default selection of types on that is lunacy.


> It also means
> you only need to check the result, instead of each intermediate arithmetic
> operation.


Sorry, that's also incorrect.


Cheers & hth.,

- Alf (replying because you replied to me, possibly not following up on this)

--
blog at <url: http://alfps.wordpress.com>
 
Öö Tiib, 03-13-2011
On Mar 13, 7:59 pm, William Ahern <will...@wilbur.25thandClement.com> wrote:
> Andre Kaufmann <akinet#(E-Mail Removed)> wrote:
> > IMHO depends on the CPU and language and the algorithm used. In the
> > sample above checking the result wouldn't help.

>
> Indeed, it would not. But the following is an example of what I had in mind.


[...]

>
> #include <iostream>
>
> int main(void) {
>         const char s[] = "\x7f\xff\xff\xffSome object";
> #if 0
>         int limit = sizeof s;
>         int offset = 0;
>         int count;
> #else
>         unsigned limit = sizeof s;
>         unsigned offset = 0;
>         unsigned count;
> #endif
>
>         /* note that I do (limit - 4), not (offset + 4). and earlier i would
>            need to ensure that limit >= 4 */
>         while (offset < limit - 4) {
>                 count = ((0x7fU & s[offset++]) << 24U);
>                 count |= ((0xffU & s[offset++]) << 16U);
>                 count |= ((0xffU & s[offset++]) << 8U);
>                 count |= ((0xffU & s[offset++]) << 0U);
>
>                 offset += count;
>
>                 std::cout << "count:" << count
>                           << " limit:" << limit
>                           << " offset:" << offset << "\n";
>         }
>
>         return 0;
> }


It is exactly what everyone agrees on: unsigned is good as a dummy bag
of bits for bitwise algorithms. If you are constantly doing such
bit-crunching, then it is no wonder that you prefer unsigned.

BTW: since your example is platform-dependent code anyway ... why
don't you use memcpy? The whole cryptic bit-shift block would go away.
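
For illustration, a sketch of that memcpy variant (it assumes a
POSIX-style ntohl() is available and that the stream stores the count
big-endian, as the shifts above imply):

#include <cstdint>
#include <cstring>
#include <arpa/inet.h>  // ntohl (POSIX; an assumption, not portable C++)

unsigned readCount(const char* s, unsigned offset)
{
    std::uint32_t raw;
    std::memcpy(&raw, s + offset, sizeof raw);  // copy the 4 length bytes
    return ntohl(raw) & 0x7fffffffU;            // host order, top bit masked as in the loop
}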
 
Andre Kaufmann, 03-13-2011
On 13.03.2011 18:59, William Ahern wrote:
> Andre Kaufmann<akinet#remove#@t-online.de> wrote:
>
> You're deriving a new base, so of course this is bad code. There are an
> infinite ways to write bad code. This would be equally bad as signed. My
> contention wasn't that unsigned is always better; just as worse or at the


Yes, agreed.

>
> Switching to a wider type is usually the wrong thing to do. 5 or 10 years
> later they can easily become broken. When audio/video protocols switched to
> unsigned 64-bit types, all of a sudden signed 64-bit integers became too
> short because they weren't just capturing operations over 32-bit operands.


I think it's nearly impossible to switch generally to a larger type
without any problems. Either you lose binary compatibility (when data is
sent over IP to other systems or when old binary data is loaded) or you
have other incompatibilities. That is one of the reasons why most
compilers use a data model where an [int] has the same size on 32-bit
and 64-bit systems. And to be honest, I don't care that much about
128-bit systems today ;-)

>> Or to use either exceptions on overflows (some languages do that by
>> default or use saturation registers which don't overflow).

>
>>> you only need to check the result, instead of each intermediate arithmetic
>>> operation.

>
>> IMHO depends on the CPU and language and the algorithm used. In the
>> sample above checking the result wouldn't help.

>
> Indeed, it would not. But the following is an example of what I had in mind.
> You get completely different results depending on whether one is using
> signed or unsigned. And, at least for the line of work I do (on-the-fly
> media transcoding), this is extremely common.


I too write code for media transcoding. Generally I don't care about
integer overflows either, apart from overflows caused by integer values
that are passed to our services (e.g. via http/tcp).

> In the work I do, it doesn't
> matter whether I detect that there has been an overflow, or I just fail
> after performing the next jump and there's junk data.


Yes, but I had typical buffer overflows in mind, which are used to
attack and compromise systems.

> What matters is that I
> program in a way which mitigates my mistakes (because none of us can write
> perfect code all the time) and which fails gracefully. The less error
> checking I need to do, the less possibility of mistake. I could do
> additional error checking using signed types, but that just means that I (a)
> have to write more code more susceptible to mistake, and (b) mistakes have
> worse consequences.


Yes, I don't think that signed values help much either - perhaps they
would even make the situation worse.

>
> #include<iostream>


> You get completely different results depending on whether one is using
> signed or unsigned.


Yep, signed integers don't help, unless you use a 64-bit (long long)
signed integer on a 32-bit system.

But for most algorithms that would be overkill, and at least for most
media codecs it would reduce performance (if one generally used 64-bit
integers on 32-bit platforms).

Andre
 
Paul, 03-13-2011

"Leigh Johnston" <(E-Mail Removed)> wrote in message
news(E-Mail Removed)...
> On 13/03/2011 19:29, Alf P. Steinbach /Usenet wrote:
>> * William Ahern, on 13.03.2011 03:01:
>>
>> "If you stick to (unsigned int) or wider, then you're fine" is generally
>> false. Use hammer for nails, screwdriver for screws. In short, use the
>> right tool for the job, or at least don't use a clearly inappropriate
>> tool: don't use signed types for bitlevel stuff, and don't use unsigned
>> types for numerical stuff.
>>

>
> Bullshit. Using unsigned integral types to represent values that are
> never negative is perfectly fine. std::size_t ensures that the C++ horse
> has already bolted as far as trying to avoid them is concerned.
>
>>
>>> unsigned types can be safer because everything about the
>>> arithmetic is well defined, including over- and underflows which occur
>>> modulo 2^N; as opposed to signed, where those scenarios are undefined.

>>
>> The well-definedness of operations that you're talking about is this:
>> that the language *guarantees* that range errors for unsigned types will
>> not be caught.
>>
>> A guarantee that errors won't be caught does not mean "safer".
>>
>> That's plain idiocy, sorry.

>
> Plain idiocy is eschewing the unsigned integral types in C++. Perhaps you
> would prefer being a Java programmer? Java has less types to "play with"
> which perhaps would suit you better if you cannot cope with C++'s richer
> set of types.
>

As Java, like C++, supports UDTs, I don't think it's correct to say that C++
supports a richer set of types:
class anyTypeYouLike{};

There is a reason Java doesn't bother with a built-in unsigned numeric type.
I think the people who created Java know more about programming than you do,
and it is not a case of Java being inadequate. This is just your misguided
interpretation, in an attempt to reinforce your idea that std::size_t is the
only type people should use in many circumstances. You obviously think
std::size_t is the best thing since sliced bread and the way forward in C++,
and, as per usual, your opinion is wrong.

In the message you replied to, Alf said to use the correct tool for the job,
which seems like a reasonable opinion. You replied saying this was bullshit
and implied Alf had said something about never using unsigned; your post
looks like a deliberate attempt to start a flame war.

You also make the point that using unsigned for values that are never
negative is fine, but how do you know a value is never going to be negative?
Your idea of "never negative" is different from others': you think array
indexes cannot be negative, but most other people know they can be.






 