Re: sscanf(): weird behavior?

 
 
Chris Torek
      06-26-2003
In article <(E-Mail Removed)>
Sidney Cadot <(E-Mail Removed)> writes:
>Thanks for your answers. The standard (I'm talking about N869 here
>which is the closest thing I have) talks about a "string" argument
>which, I assume, implies zero-termination in its terminology.


Yes, this is where both the ability to crash, and the slowness you
have experienced, come from.

>N869 says [sscanf] is "equivalent" to fscanf. Now one could read
>that to imply:
>
>(1) it has the same Big-Oh complexity as fscanf;


The C Standards *never* address performance.

>(2) If I use the format string "X" and the contents start with "Y",
>the fscanf spec says the "Y" character and subsequent characters will
>remain "unread". For the sscanf() I would assume this translates to
>"not read-accessed".


Although the scanf engine itself is not going to access them, the
"string" wording at the front gives sscanf() license to read through
the argument looking for the '\0' byte -- and indeed, real
implementations do precisely that.

>As to (mild) "abuse" of sscanf in my application, I beg to differ. The
>400 MB behemoths I'm reading are mmap()'d read-only files in my
>application in a binary format, which happen to contain ASCII-encoded
>numbers here and there.


I did not call it "abuse" myself; I merely pointed out that the C
library is generally optimized for typical uses, and yours is
nowhere near typical. Yours will probably also remain atypical as
long as the C library is so slow at it, creating a certain
chicken-and-egg problem.

>It can happen that I have something like XXXXXYYYYYYYY where XXXXX is
>a 5-digit decimal number and YYYYY is binary data possibly containing
>digits. sscanf(BUF, "%5d") would be ideal for the job, if not for the
>\0 restriction, and time complexity behavior.
>
>I'm sure I can find a way around this ...


The easiest is, I think, to memcpy() the desired portion into a
valid (and short) C string, then apply strtol() to it. Why
strtol()? Because sscanf() has to parse a format string, find the
"%d" directive, copy[%] the number from the string-stream into a
suitable -- i.e., '\0'-terminated -- buffer, and call strtol() or
its equivalent anyway -- so if you already know that the input is
supposed to be a number, you can avoid a lot of work.
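
A minimal sketch of the memcpy()-then-strtol() approach described above.
The helper name parse_fixed_int, its return convention, and the 32-byte
scratch buffer are assumptions made for illustration, not anything from
the original thread. It copies a fixed-width field out of a buffer that
need not be '\0'-terminated, terminates the copy, and converts it:

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert the `width` bytes starting at `src` as a
 * decimal integer. `src` does not have to be '\0'-terminated; only
 * src[0] .. src[width-1] are ever read.
 * Returns 0 and stores the value in *out on success, -1 otherwise. */
static int parse_fixed_int(const char *src, size_t width, long *out)
{
    char tmp[32];   /* scratch space for the short, valid C string */
    char *end;
    long v;

    if (width == 0 || width >= sizeof tmp)
        return -1;

    memcpy(tmp, src, width);   /* copy just the field ...        */
    tmp[width] = '\0';         /* ... and terminate the copy     */

    errno = 0;
    v = strtol(tmp, &end, 10);
    if (end == tmp || errno == ERANGE)
        return -1;             /* no digits, or out of range     */

    *out = v;
    return 0;
}
```

For the XXXXXYYYYYYYY layout, parse_fixed_int(p, 5, &n) looks at the five
bytes at p and nothing after them. One difference from "%5d" worth
keeping in mind: scanf skips leading white space before it starts
counting the field width, whereas here any leading white space is part
of the five bytes.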

A particularly smart C compiler could see that a "%d" directive
(one without a count, that is) will just invoke strtol() and optimize
out the sscanf() step, but by doing this manually, you avoid the
need for a particularly smart C compiler.

[%Footnote: This copy step is not required for string-streams if
the implementor is willing to duplicate the work that strtol()
would perform, or have both the scanf engine and strtol() call some
subsidiary function. But because scanf formats can have field
widths -- as in the %5d directives in this very example -- and
strtol() does not have such limits, and because a string-stream's
backing memory may be read-only so that it is not always possible
(much less advisable) to punch a '\0' byte in, scanf cannot blindly
call strtol(), which might in this case read more than the desired
five digits. Moreover, for input that comes from unbuffered FILE
streams, if the scanf engine is going to use strtol() at all, it
*does* have to copy the characters to a temporary buffer. The
scanf() engine could do the strtol() work "in line", as it were,
one incoming digit at a time, but only at the cost of considerably
more code and the inability to share so much of the four integral
conversions (%d, %o, %u, and %i) in all their variants (%hh, %h,
none, %l, and %ll modifiers). (The ll modifiers need to use
strtoll() of course, and the unsigned variants need strtoul() or
strtoull(), but by factoring, we wind up with far less code inside
the scanf engine, which is already enough of a maintenance nightmare
as it is.)]
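
To illustrate the "in line" alternative mentioned in the footnote, here
is a rough sketch of converting one digit at a time under a field-width
limit, with no temporary copy and no '\0' needed. The function name
scan_udigits and its interface are made up for this example, and it
handles only unsigned decimal input; a real scanf engine also has to
deal with signs, other bases, and all the size modifiers, which is where
the extra code comes from.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: consume up to `width` decimal digits starting at
 * `p`, never reading past p[width-1]. Returns the number of digits
 * consumed and stores the value in *out; returns 0 if there are no
 * digits or the value would not fit in an unsigned long. */
static size_t scan_udigits(const char *p, size_t width, unsigned long *out)
{
    unsigned long v = 0;
    size_t n = 0;

    while (n < width && p[n] >= '0' && p[n] <= '9') {
        unsigned long d = (unsigned long)(p[n] - '0');

        if (v > (ULONG_MAX - d) / 10)
            return 0;          /* v * 10 + d would overflow */
        v = v * 10 + d;
        n++;
    }
    if (n > 0)
        *out = v;
    return n;
}
```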
--
In-Real-Life: Chris Torek, Wind River Systems (BSD engineering)
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://67.40.109.61/torek/index.html (for the moment)
Reading email is like searching for food in the garbage, thanks to spammers.
 