strtok()

 
 
Mark
      08-03-2010
Hi

I'm trying to write a simple parser for my application. The purpose is to
let the application understand command-line arguments of the form:

my_app 1-3,5,9
or
my_app 1,4,8-24
....

so it should support both ranges and enumerators. But my function doesn't
print what I expect:

int parseLine(char *buf)
{
    char *token, *subtoken;
    char buftmp[20];

    for (token = strtok(buf, ","); token != NULL; token = strtok(NULL, ",")) {
        printf("%s: ", token);
        strcpy(buftmp, token); /* strtok modifies buffer, so we save a copy */
        for (subtoken = strtok(buftmp, "-"); subtoken != NULL;
             subtoken = strtok(NULL, "-")) {
            printf("%s ", buf,subtoken);
        }
        putchar('\n');
    }

    return 0;
}

For example, with buf="1-3,5,8" I'd expect this output:
1-3: 1 3
5: 5
8: 8

Where is my mistake?
Thanks!

--
Mark

 
Malcolm McLean
      08-03-2010
On Aug 3, 12:26 pm, "Mark" <mark_cruzNOTFORS...@hotmail.com> wrote:
> Hi
>
> I'm trying to write a simple parser for my application, the purpose is to
> allow application understand the command line arguments in the form:
>
> my_app 1-3,5,9
> or
> my_app 1,4,8-24
> ...
>
> so it should support both ranges and enumerators. But my function doesn't
> print what I expect:
>
> int parseLine(char *buf)
> {
>     char *token, *subtoken;
>     char buftmp[20];
>
>     for (token = strtok(buf, ","); token != NULL; token = strtok(NULL, ","))
> {
>         for (subtoken = strtok(buftmp, "-"); subtoken != NULL;
>              subtoken = strtok(NULL, "-")) {
>             printf("%s ", buf,subtoken);
>         }
>         putchar('\n');
>     }


>
> Where is my mistake?
>

Nesting strtok() calls. The function uses a static variable to store its
current position in the string, which you then overwrite with the nested
call. strtok is basically a bad function. Write your own strsplit() instead,
returning a list of strings in allocated memory.
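
A minimal sketch of what such a strsplit() might look like. The name, the signature and the strsplit_free() companion are assumptions for illustration; none of them is a standard function. The input is left untouched, and every field is returned as a separately allocated string in a NULL-terminated array.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a standard function): splits str on any character
 * in delims. Returns a NULL-terminated array of newly allocated strings and
 * stores the number of fields in *count (if count is not NULL).
 * Empty fields are kept. Returns NULL if an allocation fails. */
char **strsplit(const char *str, const char *delims, size_t *count)
{
    size_t n = 1;                        /* fields = delimiters found + 1 */
    for (const char *p = str; *p != '\0'; p++)
        if (strchr(delims, *p) != NULL)
            n++;

    char **out = malloc((n + 1) * sizeof *out);
    if (out == NULL)
        return NULL;

    size_t i = 0;
    const char *start = str;
    for (;;) {
        size_t len = strcspn(start, delims);    /* length of this field */
        out[i] = malloc(len + 1);
        if (out[i] == NULL) {                   /* undo partial work */
            while (i > 0)
                free(out[--i]);
            free(out);
            return NULL;
        }
        memcpy(out[i], start, len);
        out[i][len] = '\0';
        i++;
        if (start[len] == '\0')
            break;
        start += len + 1;                       /* step over the delimiter */
    }
    out[i] = NULL;
    if (count != NULL)
        *count = i;
    return out;
}

/* Releases everything returned by strsplit(). */
void strsplit_free(char **parts)
{
    if (parts == NULL)
        return;
    for (size_t i = 0; parts[i] != NULL; i++)
        free(parts[i]);
    free(parts);
}
```

Because there is no hidden state, the result of strsplit(buf, ",", &n) can be walked in a loop while each element is split again on "-" without the two levels interfering.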



 
Ben Bacarisse
      08-03-2010
"Mark" <> writes:

> I'm trying to write a simple parser for my application, the purpose is
> to allow application understand the command line arguments in the
> form:
>
> my_app 1-3,5,9
> or
> my_app 1,4,8-24
> ...
>
> so it should support both ranges and enumerators. But my function
> doesn't print what I expect:
>
> int parseLine(char *buf)
> {
> char *token, *subtoken;
> char buftmp[20];
>
> for (token = strtok(buf, ","); token != NULL; token = strtok(NULL,
> ",")) {
> printf("%s: ", token);
> strcpy(buftmp, token); /* strtok modifies buffer, so we save
> a copy */
> for (subtoken = strtok(buftmp, "-"); subtoken != NULL;
> subtoken = strtok(NULL, "-")) {
> printf("%s ", buf,subtoken);


The problem with strtok has been pointed out, but you can continue to
use it for the outer loop because you don't really need it for the inner
one. You expect only one pair or maybe a lone number, and you can parse
that using sscanf:

sscanf(token, "%d-%d", &low, &high)

will return 1 for lone numbers, 2 for a pair like 1-3 and anything else
is an error and needs to be reported.

If you need to check that there are no other characters in the token you
could do something like this:

sscanf(token, "%d%n-%d%n", &low, &len1, &high, &len2)

Now, you need a return of 1 and strlen(token) == len1 or a return of 2
and strlen(token) == len2. Again, anything else is an error.
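
The check described above could be wrapped up roughly like this. It is only a sketch: parse_item() and its return convention are made up for illustration, and values too large for int are not handled here.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parses "N" or "N-M" from token.
 * Returns 1 for a lone number (*high is set equal to *low), 2 for a pair,
 * and 0 for anything else, including trailing characters. */
int parse_item(const char *token, int *low, int *high)
{
    int len1 = -1, len2 = -1;   /* stay -1 if the matching %n is never reached */
    int n = sscanf(token, "%d%n-%d%n", low, &len1, high, &len2);
    size_t len = strlen(token);

    if (n == 2 && len2 >= 0 && (size_t)len2 == len)
        return 2;               /* "N-M" consumed the whole token */
    if (n == 1 && len1 >= 0 && (size_t)len1 == len) {
        *high = *low;           /* treat a lone number as a one-element range */
        return 1;
    }
    return 0;                   /* garbage, "N-", "N-Mx", empty string, ... */
}
```

Note that %n does not count towards sscanf's return value, which is why the result is still 1 or 2.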

> }
> putchar('\n');
> }
>
> return 0;
> }


<snip>
--
Ben.
 
Eric Sosman
      08-03-2010
On 8/3/2010 5:26 AM, Mark wrote:
> Hi
>
> I'm trying to write a simple parser for my application, the purpose is
> to allow application understand the command line arguments in the form:
>
> my_app 1-3,5,9
> or
> my_app 1,4,8-24
> ...
>
> so it should support both ranges and enumerators. But my function
> doesn't print what I expect:
>
> int parseLine(char *buf)
> {
> char *token, *subtoken;
> char buftmp[20];
>
> for (token = strtok(buf, ","); token != NULL; token = strtok(NULL, ",")) {
> printf("%s: ", token);
> strcpy(buftmp, token); /* strtok modifies buffer, so we save a copy */
> for (subtoken = strtok(buftmp, "-"); subtoken != NULL;
> subtoken = strtok(NULL, "-")) {
> printf("%s ", buf,subtoken);
> }
> putchar('\n');
> }
>
> return 0;
> }
>
> For example, buf="1-3,5,8", and I'd expect to have such output:
> 1-3: 1 3
> 5: 5
> 8: 8
>
> Where is my mistake?


strtok() doesn't "nest:" It can be working on only one source
string at a time. When you call strtok(buftmp,...), it forgets
about the "outer" string.

If your system has the (non-Standard) strtok_r() function, you
might be able to use that instead of strtok().
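
For illustration, a sketch of the nested loops rewritten with strtok_r(), assuming a POSIX system. The save1/save2 names and the returned count are additions for this example. Each nesting level gets its own save pointer, so the inner loop can tokenise the current token in place and no temporary copy is needed. The inner printf() also passes only subtoken; the original call passed buf as the argument for %s.

```c
#define _POSIX_C_SOURCE 200809L  /* asks <string.h> to declare strtok_r() */
#include <stdio.h>
#include <string.h>

/* Prints each comma-separated item followed by its '-'-separated parts.
 * Modifies buf. Returns the total number of sub-tokens printed
 * (the return value is an addition for this sketch). */
int parseLine(char *buf)
{
    char *token, *subtoken;
    char *save1 = NULL, *save2 = NULL;   /* one saved position per level */
    int count = 0;

    for (token = strtok_r(buf, ",", &save1); token != NULL;
         token = strtok_r(NULL, ",", &save1)) {
        printf("%s: ", token);           /* printed before the inner split */
        for (subtoken = strtok_r(token, "-", &save2); subtoken != NULL;
             subtoken = strtok_r(NULL, "-", &save2)) {
            printf("%s ", subtoken);
            count++;
        }
        putchar('\n');
    }
    return count;
}
```

Called on a writable buffer holding "1-3,5,8", this prints the three lines "1-3: 1 3", "5: 5" and "8: 8", each with a trailing space.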

--
Eric Sosman
 
Nick Keighley
      08-03-2010
On 3 Aug, 10:26, "Mark" <mark_cruzNOTFORS...@hotmail.com> wrote:
> Hi
>
> I'm trying to write a simple parser for my application, the purpose is to
> allow application understand the command line arguments in the form:
>
> my_app 1-3,5,9
> or
> my_app 1,4,8-24
> ...
>
> so it should support both ranges and enumerators. But my function doesn't
> print what I expect:
>
> int parseLine(char *buf)
> {
>     char *token, *subtoken;
>     char buftmp[20];
>
>     for (token = strtok(buf, ","); token != NULL; token = strtok(NULL, ","))
> {
>         printf("%s: ", token);
>         strcpy(buftmp, token);    /* strtok modifies buffer, so we save a
> copy */
>         for (subtoken = strtok(buftmp, "-"); subtoken != NULL;
>              subtoken = strtok(NULL, "-")) {
>             printf("%s ", buf,subtoken);
>         }
>         putchar('\n');
>     }
>
>     return 0;
>
> }
>
> For example, buf="1-3,5,8", and I'd expect to have such output:
> 1-3: 1 3
> 5: 5
> 8: 8


It would be nice if you told us what it did instead...
Other posters have pointed out the nesting problem.
Also note that strtok() modifies the string it's parsing, so beware:

parseLine ("1-3,5,6");

might give a problem (it's actually undefined behaviour to modify a
string literal).
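
One way around this, sketched with a made-up count_items() helper: tokenise a private, writable copy, so that callers may pass a string literal or any other read-only string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts the comma-separated items in s without
 * modifying s. strtok() only ever sees the heap copy.
 * Returns -1 if the copy cannot be allocated. */
int count_items(const char *s)
{
    size_t len = strlen(s);
    char *copy = malloc(len + 1);
    int n = 0;

    if (copy == NULL)
        return -1;
    memcpy(copy, s, len + 1);            /* include the terminating '\0' */

    for (char *t = strtok(copy, ","); t != NULL; t = strtok(NULL, ","))
        n++;

    free(copy);
    return n;
}
```

For a fixed test string, declaring an array such as char buf[] = "1-3,5,6"; has the same effect, because the array is a writable copy initialised from the literal.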

 
Keith Thompson
      08-03-2010
Ben Bacarisse <> writes:
[snip]
> The problem with strtok has been pointed out, but you can continue to
> use it because you don't really need it here. You expect only one pair
> or maybe a lone number and you can parse that using sscanf:
>
> sscanf(token, "%d-%d", &low, &high)
>
> will return 1 for lone numbers, 2 for a pair like 1-3 and anything else
> is an error and needs to be reported.

[...]

Keep in mind that sscanf's behavior is undefined if you scan a number
outside the range of the specified type. For example,
if INT_MAX==32767, then this:

sscanf("40000-50000", "%d-%d", &low, &high);

has undefined behavior. Which is a great pity; it makes the *scanf()
functions very difficult to use safely for numeric input.

With a bit of extra work, you can use the strto*() functions instead;
they're sane enough to tell you if the value is out of range (by
returning an extreme value and setting errno to ERANGE).
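
As a sketch of that extra work (parse_int() is a hypothetical helper, not a library function): clear errno, call strtol(), then check the end pointer and the range before narrowing to int.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: converts the whole of s to an int.
 * Returns 0 on success and stores the value in *out.
 * Returns -1 if s has no digits, has trailing characters,
 * or holds a value that does not fit in an int. */
int parse_int(const char *s, int *out)
{
    char *end;
    long v;

    errno = 0;                          /* strtol only sets errno on error */
    v = strtol(s, &end, 10);

    if (end == s || *end != '\0')
        return -1;                      /* nothing parsed, or junk after it */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return -1;                      /* out of range for long or for int */

    *out = (int)v;
    return 0;
}
```

For a range such as 8-24, the end pointer from the first strtol() call could serve as the starting point for the second number once the '-' has been checked.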

--
Keith Thompson (The_Other_Keith) kst- <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
Mark
      08-04-2010
Keith Thompson wrote:
[skip]
> With a bit of extra work, you can use the strto*() functions instead;
> they're sane enough to tell you if the value is out of range (by
> returning an extreme value and setting errno to ERANGE).

My system's strtok man page (Fedora Core 6) doesn't say anything about
returning an extreme value or setting errno to ERANGE.

--
Mark

 
Mark
      08-04-2010
Vincenzo Mercuri wrote:
> I've written a scratch I hope will serve. Beware that maybe I am
> missing some error checkings, also you couldn't write white spaces
> between the separators "," , "-" and numbers. I didn't add any checks

Thanks, I'll give it a try.

--
Mark
 
Mark
      08-04-2010
Eric Sosman wrote:
[skip]
> strtok() doesn't "nest:" It can be working on only one source
> string at a time. When you call strtok(buftmp,...), it forgets
> about the "outer" string.
>
> If your system has the (non-Standard) strtok_r() function, you
> might be able to use that instead of strtok().


So for strtok_r() it's safe to pass the same buffer pointer? Like this:

for (token = strtok(buf, ","); token != NULL; token = strtok(NULL, ",")) {
    printf("%s: ", token);
    /* no need to keep a copy of 'buf' */
    for (subtoken = strtok(buftmp, "-"); subtoken != NULL;
         subtoken = strtok(NULL, "-")) {
        printf("%s ", buf,subtoken);
    }
}


--
Mark

 
Mark
      08-04-2010
One more question; when I compile code featuring strtok_r() with
"gcc -ansi -pedantic -W -Wall" it naturally complains:

warning: implicit declaration of function 'strtok_r'
warning: assignment makes pointer from integer without a cast

The first warning is clear; the second refers to the strtok_r() call:

char *token;
char *saveptr1 = NULL, *saveptr2 = NULL;
token = strtok_r(buf, ",", &saveptr1);

I wonder what the compiler's logic is here: in ANSI mode, if a function has
no prototype, the compiler assumes it returns 'int', but this one actually
returns 'char *'. Is that correct?

These warnings are gone when I compile with "-posix -W -Wall".
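
That reading matches C89 rules: a call to an undeclared function implicitly declares it as returning int, so assigning the result to a char * triggers the second warning. With glibc, -ansi hides POSIX extensions such as strtok_r() from <string.h>. One way to keep the strict flags and still get the declaration, assuming a glibc-based system, is a feature-test macro defined before the first #include. The first_token() wrapper below exists only for this sketch.

```c
#define _POSIX_C_SOURCE 200809L  /* must appear before the first #include */
#include <string.h>

/* Hypothetical wrapper: returns the first comma-separated token of buf,
 * or NULL if there is none. Modifies buf. */
char *first_token(char *buf)
{
    char *saveptr1 = NULL;

    /* strtok_r() is now properly declared as returning char *, so this
     * needs no cast and draws no implicit-declaration warning. */
    return strtok_r(buf, ",", &saveptr1);
}
```

The same macro can instead be passed on the command line as -D_POSIX_C_SOURCE=200809L.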

--
Mark

 