Velocity Reviews

Chris Torek 06-26-2003 06:11 AM

Re: sscanf(): weird behavior?
 
In article <741e0fbf.0306232332.8e569b1@posting.google.com>
Sidney Cadot <sidney@jigsaw.nl> writes:
>Thanks for your answers. The standard (I'm talking about N869 here
>which is the closest thing I have) talks about a "string" argument
>which, I assume, implies zero-termination in its terminology.


Yes, this is where both the ability to crash, and the slowness you
have experienced, come from.

>N869 says [sscanf] is "equivalent" to fscanf. Now one could read
>that to imply:
>
>(1) it has the same Big-Oh complexity as fscanf;


The C Standards *never* address performance.

>(2) If I use the format string "X" and the contents start with "Y",
>the fscanf spec says the "Y" character and subsequent characters will
>remain "unread". For the sscanf() I would assume this translates to
>"not read-accessed".


Although the scanf engine itself is not going to access them, the
"string" wording at the front gives sscanf() license to read through
the argument looking for the '\0' byte -- and indeed, real
implementations do precisely that.

>As to (mild) "abuse" of sscanf in my application, I beg to differ. The
>400 MB behemoths I'm reading are mmap()'d read-only files in my
>application in a binary format, which happen to contain ASCII-encoded
>numbers here and there.


I did not call it "abuse" myself; I merely pointed out that the C
library is generally optimized for typical uses, and yours is
nowhere near typical. Yours will probably also remain atypical as
long as the C library is so slow at it, creating a certain
chicken-and-egg problem. :-)

>It can happen that I have something like XXXXXYYYYYYYY where XXXXX is
>a 5-digit decimal number and YYYYY is binary data possibly containing
>digits. sscanf(BUF, "%5d") would be ideal for the job, if not for the
>\0 restriction, and time complexity behavior.
>
>I'm sure I can find a way around this ...


The easiest approach is, I think, to memcpy() the desired portion
into a valid (and short :-) ) C string, then apply strtol() to it. Why
strtol()? Because sscanf() has to parse a format string, find the
"%d" directive, copy[%] the number from the string-stream into a
suitable -- i.e., '\0'-terminated -- buffer, and call strtol() or
its equivalent anyway -- so if you already know that the input is
supposed to be a number, you can avoid a lot of work.
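
A minimal sketch of the memcpy()-then-strtol() approach just described.
The helper name parse_field, its return convention, and the 32-byte
scratch buffer are assumptions made for illustration; they do not come
from the original application.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: convert the first `width` bytes at `src` to a
 * long.  `src` need not be '\0'-terminated and is never written to,
 * so it may point into a read-only mmap()'d region.
 * Returns 0 on success, -1 if no number was found or it is out of range.
 */
int parse_field(const char *src, size_t width, long *out)
{
    char tmp[32];           /* scratch buffer; the size is an assumption */
    char *end;
    long val;

    if (width == 0 || width >= sizeof tmp)
        return -1;

    memcpy(tmp, src, width);    /* copy only the bytes we care about */
    tmp[width] = '\0';          /* now it is a valid, short C string */

    errno = 0;
    val = strtol(tmp, &end, 10);
    if (end == tmp || errno == ERANGE)
        return -1;              /* no digits at all, or overflow */

    *out = val;                 /* a numeric prefix is accepted, as with %d */
    return 0;
}
```

With BUF laid out as XXXXXYYYYYYYY, a call such as
parse_field(BUF, 5, &n) looks at the first five bytes only. One
difference from "%5d": leading white space here counts against the
five bytes, whereas scanf skips it before the field width applies.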

A particularly smart C compiler could see that a "%d" directive
(one without a count, that is) will just invoke strtol() and optimize
out the sscanf() step, but by doing this manually, you avoid the
need for a particularly smart C compiler.

[%Footnote: This copy step is not required for string-streams if
the implementor is willing to duplicate the work that strtol()
would perform, or have both the scanf engine and strtol() call some
subsidiary function. But because scanf formats can have field
widths -- as in the %5d directive in this very example -- and
strtol() does not have such limits, and because a string-stream's
backing memory may be read-only so that it is not always possible
(much less advisable) to punch a '\0' byte in, scanf cannot blindly
call strtol(), which might in this case read more than the desired
five digits. Moreover, for input that comes from unbuffered FILE
streams, if the scanf engine is going to use strtol() at all, it
*does* have to copy the characters to a temporary buffer. The
scanf() engine could do the strtol() work "in line", as it were,
one incoming digit at a time, but only at the cost of considerably
more code and the inability to share as much of it among the integral
conversions (%d, %i, %o, %u, and %x) in all their variants (hh, h,
none, l, and ll modifiers). (The ll modifiers need to use
strtoll() of course, and the unsigned variants need strtoul() or
strtoull(), but by factoring, we wind up with far less code inside
the scanf engine, which is already enough of a maintenance nightmare
as it is :-) .)]
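
For comparison, here is a rough sketch of the "in line" alternative the
footnote mentions: converting one digit at a time under a field width,
with no copy and no '\0' terminator required. The name scan_dec and its
interface are hypothetical, and this is not how any particular C
library is known to implement it. As a simplification, it handles
decimal only, does not skip white space, and rejects LONG_MIN.

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: parse at most `width` bytes at `p` as a signed
 * decimal number, one digit at a time.  Never reads past p[width - 1].
 * Returns the number of bytes consumed, or 0 if there were no digits
 * or the magnitude would exceed LONG_MAX.
 */
size_t scan_dec(const char *p, size_t width, long *out)
{
    size_t i = 0, start;
    int neg = 0;
    long acc = 0;

    /* An optional sign counts against the field width, as in scanf. */
    if (i < width && (p[i] == '+' || p[i] == '-')) {
        neg = (p[i] == '-');
        i++;
    }

    start = i;
    for (; i < width && isdigit((unsigned char)p[i]); i++) {
        int d = p[i] - '0';
        if (acc > (LONG_MAX - d) / 10)
            return 0;           /* acc * 10 + d would overflow */
        acc = acc * 10 + d;
    }

    if (i == start)
        return 0;               /* sign only, or no digits at all */

    *out = neg ? -acc : acc;
    return i;
}
```

Even this small version shows the cost: every other base and every
result type would need its own variant of the loop or extra
parameters, which is the duplication that calling strtol() and
friends avoids.
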
--
In-Real-Life: Chris Torek, Wind River Systems (BSD engineering)
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://67.40.109.61/torek/index.html (for the moment)
Reading email is like searching for food in the garbage, thanks to spammers.
