size_t problems

 
 
Richard Tobin
 
      08-29-2007
In article <(E-Mail Removed)>,
user923005 <(E-Mail Removed)> wrote:

>I doubt that the chance a string is longer than 2GB is always
>negligible.


"Always negligible" is irrelevant. Of course it's not negligible in
programs chosen to demonstrate the problem.

>Consider the characters 'C', 'T', 'A', 'G' in various combinations in
>a long sequence of (say) 3 billion.
>That's the human genome.


The chance of a given program being one that stores the complete human
genome in a string is negligible. People with such programs can set the
option I suggested.

-- Richard
--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.
 
 
 
 
 
user923005
 
      08-29-2007
On Aug 29, 3:05 pm, (E-Mail Removed) (Richard Tobin) wrote:
> In article <46d5c46d$0$5108$(E-Mail Removed)>,
> jacob navia <(E-Mail Removed)> wrote:
>
> > s = strlen(str) ;

>
> >Since strlen returns a size_t, we have a 64 bit result being
> >assigned to a 32 bit int.

>
> >This can be correct, and in 99.9999999999999999999999999%
> >of the cases the string will be smaller than 2GB...

>
> Clearly with strlen() the chance of it being an error is negligible.
> And I think this is true of other size_t->int assignments. For example,
> int s = sizeof(whatever) is almost never a problem.
>
> Ideally, I would suggest not generating a warning unless some option
> is set for it. (There should always be a "maximally paranoid" option
> to help track down obscure errors.) But that only applies to
> size_t->int assignments. Other 64->32 assignments may be more likely to be
> in error. At the point you generate the warning, can you still tell
> that it's a size_t rather than some other 64-bit int type?


I doubt that the chance a string is longer than 2GB is always
negligible.

Consider the characters 'C', 'T', 'A', 'G' in various combinations in
a long sequence of (say) 3 billion.
That's the human genome.

The Chrysanthemum genome is much bigger.

I know of people using database systems to do genetics research. The
probability of long character sequences on those systems is not
negligible.

If the machine is capable of handling large data, right away people
will start to do it.
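
To put a number on it: 3 billion is larger than the largest value of a 32-bit signed int (2,147,483,647). A minimal sketch, assuming an int of exactly 32 bits as on the platforms discussed here. GENOME_LENGTH and fits_in_int32 are illustrative names, and the length is the rough figure quoted above rather than an exact count:

```c
#include <stdint.h>   /* uint64_t, INT32_MAX, UINT64_C */

/* Rough sequence length from the post: about 3 billion characters. */
#define GENOME_LENGTH UINT64_C(3000000000)

/* Returns 1 if 'length' is representable in a 32-bit signed int,
   0 otherwise. The comparison is done in 64 bits so it cannot wrap. */
int fits_in_int32(uint64_t length)
{
    return length <= (uint64_t)INT32_MAX;
}
```

fits_in_int32(GENOME_LENGTH) is 0, so storing that strlen() result in a 32-bit int would lose information.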

 
 
 
 
 
Malcolm McLean
 
      08-29-2007

"Richard Tobin" <(E-Mail Removed)> wrote in message
news:fb4r2u$2uhn$(E-Mail Removed)...
> The chance of a given program being one that stores the complete human
> genome in a string is negligible. People with such programs can set the
> option I suggested.
>

I work in that field.
Whilst generally you'd want a "rope" type-structure to handle such a long
sequence, there might well be reasons for storing the whole genome as a flat
string. Certainly if I had a 64-bit machine with enough memory installed, I
would expect to have the option, and I'd expect to be able to write the
program in regular C.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm


 
 
jacob navia
 
      08-29-2007
Richard Tobin wrote:
> In article <(E-Mail Removed)>,
> user923005 <(E-Mail Removed)> wrote:
>
>> I doubt that the chance a string is longer than 2GB is always
>> negligible.

>
> "Always negligible" is irrelevant. Of course it's not negligible in
> programs chosen to demonstrate the problem.
>
>> Consider the characters 'C', 'T', 'A', 'G' in various combinations in
>> a long sequence of (say) 3 billion.
>> That's the human genome.

>
> The chance of a given program being one that stores the complete human
> genome in a string is negligible. People with such programs can set the
> option I suggested.
>
> -- Richard


The program has strings of at most a few K. It is an IDE (Integrated
development environment, debugger, etc)

An int can hold string lengths of more than 2 billion... MORE than
enough for this environment. This program has been running under 32-bit
Windows, where the user address space is at most 2GB.
 
 
Keith Thompson
 
      08-29-2007
jacob navia <(E-Mail Removed)> writes:
> Keith Thompson wrote:
>> Why didn't you get the same warnings in 32-bit mode? If int and
>> size_t are both 32 bits, INT_MAX < SIZE_MAX, and there are values of
>> size_t that cannot be stored in an int. If the "narrowing conversion"
>> warning is based on the sizes of the type rather than the ranges, I'd
>> say you've just discovered a compiler bug.

>
> 2GB strings are the most you can get under the windows schema in 32 bits.


Ok. Does your compiler know that?

Assigning an arbitrary size_t value to an object of type int, if both
types are 32 bits, could potentially overflow. Your compiler
apparently doesn't issue a warning in that case. Is it because it
knows that the value returned by strlen() can't exceed INT_MAX (if so,
well done, especially since it seems to be smart enough not to make
that assumption on a 64-bit system), or is it because it doesn't issue
a warning when both types are the same size?

For example:

size_t s = func(-1);
/* Assume func() takes a size_t argument and returns it.
Assume func() is defined in another translation unit,
so the compiler can't analyze its definition. In other
words, 's' is initialized to SIZE_MAX, but the compiler
can't make any assumptions about its value. */

signed char c = s;
/* Presumably this produces a warning. */

int i = s;
/* This is a potential overflow. Does this produce
a warning? Should it? */

If your compiler warns about the initialization of 'c' but not about
the initialization of 'i', then IMHO it's being inconsistent. This
doesn't address your original question, but it's related.
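
For anyone who wants to try this, here is a self-contained sketch of the example above. It assumes a typical implementation where size_t is at least as wide as int. func() is defined locally only so the file compiles on its own (in the example it lives in another translation unit), and fits_in_schar() and fits_in_int() are hypothetical helpers that check the range explicitly instead of relying on a compiler warning:

```c
#include <limits.h>   /* INT_MAX, SCHAR_MAX */
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* SIZE_MAX (C99) */

/* Stand-in for func(): takes a size_t argument and returns it. */
size_t func(size_t n)
{
    return n;
}

/* Returns 1 if 'n' is representable as a signed char, 0 otherwise. */
int fits_in_schar(size_t n)
{
    return n <= (size_t)SCHAR_MAX;
}

/* Returns 1 if 'n' is representable as an int, 0 otherwise. */
int fits_in_int(size_t n)
{
    return n <= (size_t)INT_MAX;
}
```

Here func(-1) returns SIZE_MAX, because the prototype converts -1 to size_t, and neither helper accepts that value, whether or not int and size_t have the same width.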

[...]

> There isn't any string longer than a few K in this program!
> Of course it is a potential bug, but it is practically impossible!


You know that, and I know that, but what matters is what the compiler
knows.

Is it conceivable that a bug in the program and/or some unexpected
input could cause it to create a string longer than 2GB?

You asked how to suppress the bogus warnings without losing any valid
warnings. To do that, your compiler, or some other tool, has to be
able to tell the difference. Telling me that none of the strings are
longer than 2GB doesn't address that concern, unless you can convey
that knowledge to the compiler.

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
 
jacob navia
 
      08-29-2007
Richard Tobin wrote:
> In article <(E-Mail Removed)>,
> Keith Thompson <(E-Mail Removed)> wrote:
>
>> It's better to fix the code. It's even better to write it correctly
>> in the first place.

>
> But int s = sizeof(char *) is not broken, even though sizeof() returns
> a size_t.
>
> -- Richard


If we use size_t everywhere, it is an UNSIGNED quantity.
This means that comparisons with signed quantities will provoke
other warnings, etc etc.

int s = strlen(str) is NOT broken.
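
A minimal sketch of the signed/unsigned issue, assuming a typical implementation where size_t is at least as wide as int. When an int is compared with a size_t, the int is converted to unsigned first, so a negative value turns into a very large one. The function names are illustrative only:

```c
#include <stddef.h>   /* size_t */

/* Naive "is i less than len?" test. The explicit cast mirrors the
   implicit conversion the compiler applies to 'i < len': a negative
   i wraps around to a huge unsigned value. */
int less_than_naive(int i, size_t len)
{
    return (size_t)i < len;
}

/* Guarded version: handle negative values before converting, so the
   result matches the mathematical comparison. */
int less_than_checked(int i, size_t len)
{
    return i < 0 || (size_t)i < len;
}
```

With i == -1 and len == 5, the naive version returns 0 while the guarded version returns 1. This is the kind of comparison the additional warnings are about.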
 
 
jacob navia
 
      08-29-2007
Malcolm McLean wrote:
>
> "Richard Tobin" <(E-Mail Removed)> wrote in message
> news:fb4r2u$2uhn$(E-Mail Removed)...
>> The chance of a given program being one that stores the complete human
>> genome in a string is negligible. People with such programs can set the
>> option I suggested.
>>

> I work in that field.
> Whilst generally you'd want a "rope" type-structure to handle such a
> long sequence, there might well be reasons for storing the whole genome
> as a flat string. Certainly if I had a 64-bit machine with enough memory
> installed, I would expect to have the option, and I'd expect to be able
> to write the program in regular C.
>


YES SIR!

With my new lcc-win32 YOU WILL BE ABLE TO DO IT!

But I am not speaking of that program. I am speaking about
other programs I am PORTING from 32 bit, whose strings are never
bigger than a few Kbytes at most!

 
 
Keith Thompson
 
      08-29-2007
"Malcolm McLean" <(E-Mail Removed)> writes:
> "jacob navia" <(E-Mail Removed)> wrote in message
> news:46d5d579$0$27386$(E-Mail Removed)...
>> Malcolm McLean wrote:
>>> There's a very obvious answer to that one. As a compiler-writer,
>>> you are in a position to do it.
>>>

>>
>> ???
>>
>> (Please excuse my stupidity but I do not see it...)
>>

> The campaign for 64 bit ints T-shirts obviously didn't generate enough
> publicity. I still have a few left. XXL, one size fits all.


One *shirt* fits all (unless somebody other than you actually wants
one).

> There are some good reasons for not making int 64 bits on a 64 bit
> machine, which as a compiler-writer you will be well aware of. However
> typical computers are going to have 64 bits of main address space for
> a very long time to come, so it makes sense to get the language right
> now, and keep it that way for the foreseeable future, and not allow
> decisions to be dominated by the need to maintain compatibility with
> legacy 32 bit libraries.


lcc-win32 (and presumably lcc-win64, if that's what it's called) is a
Windows compiler. jacob does not have the option of changing the
Windows API, and a compiler that's incompatible with the underlying
operating system isn't going to be very useful.

 
 
jacob navia
 
      08-29-2007
Keith Thompson wrote:
> lcc-win32 (and presumably lcc-win64, if that's what it's called) is a
> Windows compiler. jacob does not have the option of changing the
> Windows API, and a compiler that's incompatible with the underlying
> operating system isn't going to be very useful.
>


Yes. Mr Gates decided that

sizeof(int) == sizeof(long) == 4.

Only long long is 64 bits. Please address all flames to him.

NOT TO ME!!!
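
For reference, this arrangement (int and long 32 bits, long long and pointers 64 bits) is usually called LLP64, while most 64-bit Unix-like systems use LP64, where long is 64 bits. A small sketch that classifies a data model from byte sizes, assuming 8-bit bytes. The enum and function names are made up for illustration:

```c
#include <stddef.h>   /* size_t */

/* Common C data models, named after which types share a width. */
enum data_model { DM_ILP32, DM_LLP64, DM_LP64, DM_OTHER };

/* Classifies a data model from the sizes (in bytes) of int, long and
   pointers. Anything not listed here is reported as DM_OTHER. */
enum data_model classify_data_model(size_t int_size, size_t long_size,
                                    size_t ptr_size)
{
    if (int_size == 4 && long_size == 4 && ptr_size == 4)
        return DM_ILP32;   /* 32-bit int, long and pointers */
    if (int_size == 4 && long_size == 4 && ptr_size == 8)
        return DM_LLP64;   /* 64-bit Windows: only pointers are 64 bits */
    if (int_size == 4 && long_size == 8 && ptr_size == 8)
        return DM_LP64;    /* long and pointers are 64 bits */
    return DM_OTHER;
}
```

Calling classify_data_model(sizeof(int), sizeof(long), sizeof(void *)) reports the model of whatever compiler builds the file.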


 
 
Keith Thompson
 
      08-29-2007
jacob navia <(E-Mail Removed)> writes:
> Richard Tobin wrote:
>> In article <(E-Mail Removed)>,
>> Keith Thompson <(E-Mail Removed)> wrote:
>>
>>> It's better to fix the code. It's even better to write it correctly
>>> in the first place.

>> But int s = sizeof(char *) is not broken, even though sizeof() returns
>> a size_t.

>
> If we use size_t everywhere, it is an UNSIGNED quantity.
> This means that comparisons with signed quantities will provoke
> other warnings, etc etc.


Perhaps those other signed quantities should have been unsigned as
well.

> int s = strlen(str) is NOT broken.


And yet the compiler you're using warns about it. Perhaps you should
take it up with the author of the compiler.

There may well be no easy way to address your problem. Re-writing all
the code as it should have been written in the first place (using
size_t to hold size_t values) may not be practical. Turning off
warnings that you know aren't necessary, while leaving other warnings
in place, requires conveying that information to the compiler; there
may not be a mechanism for doing so. Inserting hundreds of casts
could suppress the warnings, but I dislike that solution, and it's
still a substantial amount of work.

I suppose you could write a strlen wrapper that calls the real strlen,
checks whether the result exceeds INT_MAX (if you think that check is
worth doing), and then returns the result as an int. That's assuming
strlen calls are the only things triggering the warnings. And you'd
still have to make hundreds of changes in the code.
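
A minimal sketch of such a wrapper. The name strlen_int is hypothetical, and calling abort() on an oversized string is only one possible policy; returning an error code would work as well:

```c
#include <limits.h>   /* INT_MAX */
#include <stddef.h>   /* size_t */
#include <stdlib.h>   /* abort */
#include <string.h>   /* strlen */

/* Calls the real strlen, checks that the result is representable as
   an int, and returns it as an int. Aborts if the string is longer
   than INT_MAX characters (an assumed policy, chosen for brevity). */
int strlen_int(const char *s)
{
    size_t n = strlen(s);

    if (n > (size_t)INT_MAX)
        abort();

    return (int)n;   /* safe: n <= INT_MAX was just checked */
}
```

Call sites then change from int s = strlen(str); to int s = strlen_int(str); which is still one edit per call.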

You know that the conversions aren't going to overflow, but C's type
system doesn't let you convey that knowledge to the compiler.

 
 
 
 