Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C Programming > Re: sscanf() safety

Reply
Thread Tools

Re: sscanf() safety

 
 
Jens Thoms Toerring
Guest
Posts: n/a
 
      12-23-2010
Cross <(E-Mail Removed)> wrote:

> I am working on an rtf renderer and parser. My code is hosted at
> http://code.google.com/p/ertf . I tried kcachegrind on my binaries and found
> that getc() is taking a lot of time. Obviously, character read from files is
> slow. So, I decided to read the whole file into memory as a char buffer.
> Please feel free to comment and suggest on the following code. Now, I want
> to scan the char buffer using sscanf().


I guess you mean fscanf() here since you seem to want to read
from a file and not from a string - which is what sscanf() is
for,

> However, I remember once I heard in
> a chat room that sscanf() has buffer overflow vulnerabilities.


Only when used in an unsafe fashion, that is reaing with e.g.
an unadorned '%s' since you can't forsee the number if chars
that can be read in and thus you can't supply a large enough
buffer. But when you specify the maximum number of chars that
fscanf() is allowed to read it's perfectlly safe. So for example

sscanf( fp, "%4096[^\x01-\0xFF]", buf );

will read up to but never more than 4096 chars from the file
and put them into 'buf' (and it will put a trailing '\0' at
the end). It also will stop at a '\0' in the file, but then
this is a binary file and in that case fscanf() isn't suit-
able anyway.

But if you want to read in the whole file anyway and you don't
want to do any conversion of the data then fread() is probably
the better choice anyway.

> I would like
> pointers on this and would like to know how I can use sscanf() safely.


> if(!fp){
> fprintf(stderr, "File pointer uninitialized.\n");
> goto close_strbuf;
> }


> while((count = fread(cp, 1, 4096, fp))){
> if(feof(fp))break;


Checking for feof() is rather useless - if you hit the end of the
file and there are no chars left to be read the return value of
fread() will be 0. That's a good time to stop

> strbuf_append(buf, cp);


No idea what this function is supposed to do (it's not a standard
C function and thus shouldn't have a name starting with 'str'),
but I am rather sure that this use is unsafe - what you got from
fread() has rather likely no '\0' at the end, so the function
has no chance to determine where to stop when copying from 'cp'
to 'buf', you would also have to pass 'count' to it...

> }
> cp[count] = '\0';
> strbuf_append(buf, cp);
> fclose(fp);


The function (or, the snippet from some function) doesn't make
too much sense to me the way its written. How large is 'buf'?
Do you want to read never more than 4096 chars from a file and
thus 'buf' has (at least) 4097 elements? In that case why use
the while() loop? Just

count = fread( buf, 1, 4096, fp );
buf[ count ] = '\0';

would do fine.

Otherwise where do you check that you can still append to 'buf'?
That's assuming that you don't know the length of the file in
advance - but if you did I would expect that you would try to
read in everything at once instead of in chunks of 4096 bytes.

So, please give a bit more of information about what you're
trying to do, otherwise it's impossible to tell what you
may need.
Regards, Jens
--
\ Jens Thoms Toerring ___ http://www.velocityreviews.com/forums/(E-Mail Removed)
\__________________________ http://toerring.de
 
Reply With Quote
 
 
 
 
Jens Thoms Toerring
Guest
Posts: n/a
 
      12-23-2010
Cross <(E-Mail Removed)> wrote:
> On 12/24/2010 02:04 AM, Jens Thoms Toerring wrote:
> > Checking for feof() is rather useless - if you hit the end of the
> > file and there are no chars left to be read the return value of
> > fread() will be 0. That's a good time to stop


> Without feof(), there was some garbage read each time. I don't know why but
> with feof() things appear fine.


You shouldn't program by trial and error but try understand what's
going wrong Perhaps using feof() just hides the error but does
not really fix the real problem. Given the scarcity of information
you're giving that could very well be the case.

> > The function (or, the snippet from some function) doesn't make
> > too much sense to me the way its written. How large is 'buf'?

> I have defined buf to be of size 4096 bytes.
> > Do you want to read never more than 4096 chars from a file and
> > thus 'buf' has (at least) 4097 elements? In that case why use
> > the while() loop? Just
> >
> > count = fread( buf, 1, 4096, fp );
> > buf[ count ] = '\0';
> >
> > would do fine.


> The files sizes used in testing range from several kilobytes to several
> megabytes.


That may be the case but you simply don't show enough of your
code to make sense, and this reply doesn't help either, sorry.
So it's impossible to come up with sensible advice. Please show
the whole function (plus functions used from within it like
strbuf_append()) - then things might become clearer. At the
moment it's impossible to even say what you're intentions for
that function are, so it's like giving the car mechanic a photo
of your car and asking him why it won't start

At least if files can be much longer than 4096 bytes then all
tests of how much still fits into 'buf' are missing (does that
happen in strbuf_append()?) as well as reallocation of 'buf' etc.
And that's were buffer overflows you seem to be concerned about
typically happen (or are avoided).

Regards, Jens
--
\ Jens Thoms Toerring ___ (E-Mail Removed)
\__________________________ http://toerring.de
 
Reply With Quote
 
 
 
 
Jens Thoms Toerring
Guest
Posts: n/a
 
      12-23-2010
Cross <(E-Mail Removed)> wrote:
> On 12/24/2010 03:12 AM, Jens Thoms Toerring wrote:
> > At least if files can be much longer than 4096 bytes then all
> > tests of how much still fits into 'buf' are missing (does that
> > happen in strbuf_append()?) as well as reallocation of 'buf' etc.
> > And that's were buffer overflows you seem to be concerned about
> > typically happen (or are avoided).
> >

> I am using eina string buffer utility from EFL. http://enlightenment.org


That's nice for you. But there was no call to any of them in
the code you posted. And using third-party functions doesn't
make your code inherently safer, they still need to be used
correctly (assuming they're bug-free). But if you're not pre-
pared to tell what you're doing (except some more or less
meaningless snippets) you will need a clairvoyant for help,
my crystal ball doesn't work well enough at the moment. (I
even went as far as checking the web page you gave for your
program but all you get there under "Downloads" is "This pro-
ject currently has no downloads".)

So
<volume setting="max">
POST THE REAL, COMPLETE CODE YOU'RE USING
</volume>

and don't make assumptions about where the problem is - if you
knew you wouldn't have to ask. Best write a short, compilable
program that exhibits the problems you're facing, leaving out
everthing else, and post that.

Regards, Jens
--
\ Jens Thoms Toerring ___ (E-Mail Removed)
\__________________________ http://toerring.de
 
Reply With Quote
 
Jens Thoms Toerring
Guest
Posts: n/a
 
      12-24-2010
Cross <(E-Mail Removed)> wrote:
> On 12/24/2010 04:22 AM, Jens Thoms Toerring wrote:
> > So
> > <volume setting="max">
> > POST THE REAL, COMPLETE CODE YOU'RE USING
> > </volume>
> >
> > and don't make assumptions about where the problem is - if you
> > knew you wouldn't have to ask. Best write a short, compilable
> > program that exhibits the problems you're facing, leaving out
> > everthing else, and post that.


> http://docs.enlightenment.org/auto/e...er__Group.html


> The above link is the documentation and the source code is available at
> http://trac.enlightenment.org/e/brow...k/eina/src/lib


Thanks, I am able to use Google. But in the code you posted
here no 'eina_' function was ever used. So why should I care
about it?

> I also posted the site for my full code http://code.google.com/p/ertf Look in
> src/lib/ertf_input.c and src/lib/ertf_rtf_to_markup.c


As I already told you, if you go to "Downloads" all you get is
"This project currently has no downloads.". So the only way to
get the source you have to do a svn checkout. Perhaps you should
consider that not everyone has svn and is prepared to install it.
But if you do it anyway one finds that the code snippet you pos-
ted here isn't to be found in that source tree at all - e.g. the
fread() function isn't found when you grep over the whole tree.
So it got nothing to do with the code we're supposed to discuss
here and doesn't give a clue what you're after. All one finds in
one of the files you mentioned is a single function with more
than 400 lines, with a single switch case making up more than
350 of that. Good people have been shot for less. And in the
other one one finds e.g.

int
ertf_tag_get(FILE *fp, char *s)
{
int c;
fscanf(fp, "%[^ 0123456789;\\{}\n\r]", s);
if ((c = fgetc(fp)) != ' ')
ungetc(c, fp);
CHECK_EOF(fp, "EOF encountered while obtaining control tag.\n", return 1);
return 0;
}

This function can't be used safely at all. Within the function
you have no idea how much room is left in 's', so it's impos-
sible to use fscanf() without a potential buffer overrun. You
also don't check if fscanf() was successful - but the calling
functions seem, as far as I can see, to blindly assume that it
did succeed. And the check for EOF after the fgetc() is useless
- if you hit EOF you already tried to stuff it back into the
file, which of course is not what you want since EOF is not a
char and what gets "pushed back" is whatever EOF actually is on
the system, converted to an unsigned char...

--
\ Jens Thoms Toerring ___ (E-Mail Removed)
\__________________________ http://toerring.de
 
Reply With Quote
 
Jens Thoms Toerring
Guest
Posts: n/a
 
      01-09-2011
Cross <(E-Mail Removed)> wrote:
> Actually, I tried kcachegrind on my code and found that as the file size
> increases 50 - 80% of the time is spent in i/o (on fgetc() specifically).
> So, I came up with the idea of reading the whole file into a char buffer
> and scanning it.


Something like

Eina_Strbuf *buf = eina_strbuf_new( );
char cp[ 4097 ];
size_t count;

while ( ( count = fread( cp, 1, 4096, fp ) ) != 0 )
{
cp[ count ] = '\0';
eina_strbuf_append( buf, cp );
}

should do the job (assuming I got the part about the eina string
buffer right and it works as I assume). Of course, you should
check the return value of eina_strbuf_append() - it may run out
of memory. And there's no need to check for EOF at all - the
loop will be left once the end of the file has been reached.

Also, you seem to like feof() a lot, but as far as my experience
goes it's hardly needed at all. All functions for reading from a
file return some kind if indication if reading succeeded or not
and if you use that calling feof() in most cases is only needed
if you have to distinguish between the EOF condition from a read
failure. I just grepped through a project of mine with about 250k
LoC and dozends of files being read in and parsed and found not a
single call of feof() (all there is are a few calls of ferror()
which come from automatically generated code).

> > All one finds in
> > one of the files you mentioned is a single function with more
> > than 400 lines, with a single switch case making up more than
> > 350 of that. Good people have been shot for less.

> Honestly, I would like to know a better way of doing it.


Split it up into functions that do simple tasks and call them
instead of putting all the functionality into a single mono-
litic function. Also consider if its possible to use a "dis-
patch table", e.g. if you have code like

if ( ! strncmp( buf, "plain", 5 )
{
/* do something with plain text */
}
else if ( ! strncmp( buf, "par", 3 ) )
{
/* do something with a paragraph */
}
....
else
/* do something if all else failed */

you could instead define a structure like

struct {
const char * keyword;
int ( *handler )( const char * );
} dispatch_table[ ] = { { "plain", handle_plain_text },
{ "f", handle_font },
... };

and use that it as in

for ( int i = 0; i < sizeof dispatch_table / sizeof *dispatch_table, i++ )
if ( ! strncmp( buf, dispatch_table[ i ].keyword,
strlen( dispatch_table[ i ].keyword ) )
{
result = dispatch_table[ i ].handler( buf );
break;
}

if ( i == sizeof dispatch_table / sizeof *dispatch_table )
result = handle_unkown_keyword( buf );

That could get rid of the endless if/else if/else blocks. Of
course you then need a single function for each of the diff-
erent keywords (don't be afraid of short functions, the com-
piler will rather likely inline them), but you should have
something like that anyway to make the code modular and give
the reader at least a fighting chance of understanding what's
going on. With the above definition of the structure they would
have to have a signature of

int handle_plain_text( const char * buf );

and you must adapt the member for the function pointer when
you use a different signature.

When you use such functions you will rather likely have to
assemble the information about the current state of the par-
ser (all those 'boldset', 'italicset' variables etc.) into
a structure that you then pass to the functions (probably as
a pointer to the structure).

Of course, instead of testing for the length of the dispatch
table you could have a sentinel element at the end (where
e.g. the keyword member is a NULL pointer) and bail out from
the loop when this is found - the trick with

sizeof dispatch_table / sizeof *dispatch_table

only works when the array of structures is defined in the
same function where it's used (or as a global array).

> > And in the
> > other one one finds e.g.
> >
> > int
> > ertf_tag_get(FILE *fp, char *s)
> > {
> > int c;
> > fscanf(fp, "%[^ 0123456789;\\{}\n\r]", s);
> > if ((c = fgetc(fp)) != ' ')
> > ungetc(c, fp);
> > CHECK_EOF(fp, "EOF encountered while obtaining control tag.\n", return 1);
> > return 0;
> > }
> >
> > This function can't be used safely at all. Within the function
> > you have no idea how much room is left in 's', so it's impos-
> > sible to use fscanf() without a potential buffer overrun. You
> > also don't check if fscanf() was successful - but the calling
> > functions seem, as far as I can see, to blindly assume that it
> > did succeed. And the check for EOF after the fgetc() is useless
> > - if you hit EOF you already tried to stuff it back into the
> > file, which of course is not what you want since EOF is not a
> > char and what gets "pushed back" is whatever EOF actually is on
> > the system, converted to an unsigned char...
> >


> Considering this, s is declared of size 1000 in ertf_font.c. However, now I
> think I should reflect this in fscanf() as something similar to "%1000s".


Yes. Of course, that assumes that you know for sure that 's' has
1000 elements, which can't be checked from within the function.
And if you do something like that make it "%999s" - you need to
count in the trailing '\0' that is always appended, so there's
only room for 999 characters.

Another thing: what you're doing here seems to be writing a
parser. They are typically ugly to write and if the "syntax"
of the input isn't too convoluted it can be worth the time
to check out tools that automatically generate the "ugly"
part of the code for you. One example would be yacc/flex.
Admittedly, it takes some time to learn how to use them, but
then parsing files becomes a lot easier and much more fun
since you don't have to care anymore about all those pesky
little details
Regards, Jens
--
\ Jens Thoms Toerring ___ (E-Mail Removed)
\__________________________ http://toerring.de
 
Reply With Quote
 
Jens Thoms Toerring
Guest
Posts: n/a
 
      01-10-2011
Cross <(E-Mail Removed)> wrote:
> >> Considering this, s is declared of size 1000 in ertf_font.c. However, now I
> >> think I should reflect this in fscanf() as something similar to "%1000s".

> >
> > Yes. Of course, that assumes that you know for sure that 's' has
> > 1000 elements, which can't be checked from within the function.
> > And if you do something like that make it "%999s" - you need to
> > count in the trailing '\0' that is always appended, so there's
> > only room for 999 characters.

>
> Well "%999s" shall be able to get strings whose maximum length is 999,
> isn't it?


The "%999s" bit tells *scanf() to read in up to 999 chars -
the necessary trailing '\0' (that makes it a string) always
will be added automatically. So if you have a char array
with 1000 elements (and thus a sizeof 1000) the maximum
number of chars you can allow *scanf() to read in without
risking a buffer overrun is 999.

Regards, Jens
--
\ Jens Thoms Toerring ___ (E-Mail Removed)
\__________________________ http://toerring.de
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
What is thread safety? Hans ASP .Net 1 10-12-2004 03:15 PM
newbie question on "Type Safety" Raymond Du ASP .Net 1 06-21-2004 11:42 AM
ADODB.connection safety settings error Steven Baeten ASP .Net 1 05-09-2004 10:57 PM
LiteralControl thread safety. George Ter-Saakov ASP .Net 1 04-06-2004 10:06 AM
VPN L2TP over IPSEC: double safety? Bert Roos Cisco 1 02-25-2004 08:52 PM



Advertisments