strtok behavior with multiple consecutive delimiters

 
 
Geometer
      05-06-2006
Hello, and good whatever daytime is at your place..


please can somebody tell me, what the standard behavior of strtok shall be,
if it encounters two or more consecutive delimiters like in
(checks omitted)

char tst[] = "this\nis\n\nan\nempty\n\n\nline";
                      ^^^^         ^^^^^^
char *tok = strtok(tst, "\n");
tok = strtok(NULL, "\n");
and so on..

will the groups of '\n' marked above be consumed one by one or the whole
group together?

Thank you very much


 
Phlip
      05-06-2006
Geometer wrote:

> please can somebody tell me, what the standard behavior of strtok shall
> be, if it encounters two or more consecutive delimiters like in
> (checks omitted)
>
> char tst[] = "this\nis\n\nan\nempty\n\n\nline";
>                       ^^^^         ^^^^^^
> char *tok = strtok(tst, "\n");
> tok = strtok(NULL, "\n");
> and so on..
>
> will the groups of '\n' marked above be consumed one by one or the whole
> group together?


Yes.

But why didn't you just write a test case and see?

Going forward, don't use strtok(). Google for a replacement, possibly
including a Regex system. Then you can control such details.
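
For reference, a test case along the lines suggested above might look like the following minimal sketch. The helper name split_lines and the fixed-size output array are assumptions made for illustration, not anything from the original post. As far as the standard's description goes, each strtok call first skips any leading characters that are in the delimiter set, so a run of consecutive '\n' is consumed as a whole group and no empty tokens come back.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split buf in place on '\n' using strtok and
   store up to max token pointers in out. Returns the number stored. */
size_t split_lines(char *buf, char *out[], size_t max)
{
    size_t n = 0;
    char *tok;

    assert(buf != NULL && out != NULL);
    for (tok = strtok(buf, "\n"); tok != NULL && n < max;
         tok = strtok(NULL, "\n")) {
        out[n++] = tok;   /* points into buf, which strtok has modified */
    }
    return n;
}
```

Called on the array tst from the question, this should store five tokens ("this", "is", "an", "empty", "line"), with nothing stored for the empty lines.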

--
Phlip
http://c2.com/cgi/wiki?ZeekLand <-- NOT a blog!!!


 
Geometer
      05-06-2006


--
Geometer
Dipl.Ing. Erwin Lebloch

Hauptplatz 39
2130 Mistelbach - NÖ
Tel.: 02572/4300

www.lebloch.at
"Phlip" <(E-Mail Removed)> wrote in news message
news:%d27g.27531$(E-Mail Removed). com...
> Geometer wrote:
>
>> please can somebody tell me, what the standard behavior of strtok shall
>> be, if it encounters two or more consecutive delimiters like in
>> (checks omitted)
>>
>> char tst[] = "this\nis\n\nan\nempty\n\n\nline";
>>                       ^^^^         ^^^^^^
>> char *tok = strtok(tst, "\n");
>> tok = strtok(NULL, "\n");
>> and so on..
>>
>> will the groups of '\n' marked above be consumed one by one or the whole
>> group together?

>
> Yes.
>
> But why didn't you just write a test case and see?


I did. I just wanted to know whether this is the behavior required by the
standard and whether there is a difference between C and C++.
Thanks for your response.

Robert


 
Peter Jansson
      05-06-2006
Geometer wrote:
> Hello, and good whatever daytime is at your place..
>
>
> please can somebody tell me, what the standard behavior of strtok shall be,
> if it encounters two or more consecutive delimiters like in
> (checks omitted)
>
> char tst[] = "this\nis\n\nan\nempty\n\n\nline";
>                       ^^^^         ^^^^^^
> char *tok = strtok(tst, "\n");
> tok = strtok(NULL, "\n");
> and so on..
>
> will the groups of '\n' marked above be consumed one by one or the whole
> group together?
>
> Thank you very much
>
>


<quote src="A man-page for strok.">

Never use these functions. If you do, note that:
These functions modify their first argument.
These functions cannot be used on constant strings.
The identity of the delimiting character is lost.
The strtok() function uses a static buffer while parsing,
so it's not thread safe.

</quote>

Regards,

Peter Jansson
http://www.p-jansson.com/
http://www.jansson.net/
 
CBFalconer
      05-06-2006
Peter Jansson wrote:
> Geometer wrote:
>>
>> please can somebody tell me, what the standard behavior of
>> strtok shall be, if it encounters two or more consecutive
>> delimiters like in (checks omitted)
>>
>> char tst[] = "this\nis\n\nan\nempty\n\n\nline";
>>                       ^^^^         ^^^^^^
>> char *tok = strtok(tst, "\n");
>> tok = strtok(NULL, "\n");
>> and so on..
>>
>> will the groups of '\n' marked above be consumed one by one or
>> the whole group together?

>
> <quote src="A man-page for strok.">
>
> Never use these functions. If you do, note that:
> These functions modify their first argument.
> These functions cannot be used on constant strings.
> The identity of the delimiting character is lost.
> The strtok() function uses a static buffer while parsing,
> so it's not thread safe.
>
> </quote>


The OP can simply use the following replacement function, which
does not have those objectionable features. The testing code is
longer than the function.

/* ------- file toksplit.c ----------*/
#include "toksplit.h"

/* copy over the next token from an input string, after
   skipping leading blanks (or other whitespace?). The
   token is terminated by the first appearance of tokchar,
   or by the end of the source string.

   The caller must supply sufficient space in token to
   receive any token; otherwise tokens will be truncated.

   Returns: a pointer past the terminating tokchar.

   This will happily return an infinity of empty tokens if
   called with src pointing to the end of a string. Tokens
   will never include a copy of tokchar.

   A better name would be "strtkn", except that is reserved
   for the system namespace. Change to that at your risk.

   released to Public Domain, by C.B. Falconer.
   Published 2006-02-20. Attribution appreciated.
*/

const char *toksplit(const char *src, /* Source of tokens */
                     char tokchar,    /* token delimiting char */
                     char *token,     /* receiver of parsed token */
                     size_t lgh)      /* length token can receive */
                                      /* not including final '\0' */
{
   if (src) {
      while (' ' == *src) src++;   /* skip leading blanks */

      while (*src && (tokchar != *src)) {
         if (lgh) {
            *token++ = *src;
            --lgh;
         }
         src++;
      }
      if (*src && (tokchar == *src)) src++;
   }
   *token = '\0';
   return src;
} /* toksplit */

#ifdef TESTING
#include <stdio.h>

#define ABRsize 6 /* length of acceptable token abbreviations */

int main(void)
{
   char teststring[] = "This is a test, ,, abbrev, more";

   const char *t, *s = teststring;
   int i;
   char token[ABRsize + 1];

   puts(teststring);
   t = s;
   for (i = 0; i < 4; i++) {
      t = toksplit(t, ',', token, ABRsize);
      putchar(i + '1'); putchar(':');
      puts(token);
   }

   puts("\nHow to detect 'no more tokens'");
   t = s; i = 0;
   while (*t) {
      t = toksplit(t, ',', token, 3);
      putchar(i + '1'); putchar(':');
      puts(token);
      i++;
   }

   puts("\nUsing blanks as token delimiters");
   t = s; i = 0;
   while (*t) {
      t = toksplit(t, ' ', token, ABRsize);
      putchar(i + '1'); putchar(':');
      puts(token);
      i++;
   }
   return 0;
} /* main */

#endif
/* ------- end file toksplit.c ----------*/

I have set follow-ups to exclude c.l.c++. Although the above code
is usable there, it is seldom a good idea to mix the two
languages. I have not provided a header file with a C++ linkage
provision.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>


 
Jerry Coffin
      05-06-2006
In article <1n27g.55903$(E-Mail Removed)>,
(E-Mail Removed) says...

[ ... ]

> The strtok() function uses a static buffer while parsing,
> so it's not thread safe.


More accurately, it uses a static pointer while parsing,
so the vendor has to go to extra work to make it thread
safe. The same is true with a number of other functions
as well, though -- much of what's defined in time.h, to
give only one obvious example.
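
To make the "static pointer" point concrete, here is a rough sketch of how a strtok-like function could be written. It is an illustration only, not any vendor's actual implementation, and the name sketch_strtok is made up for this example. The single static variable is the only state kept between calls, which is why one ongoing scan is lost as soon as another is started.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative stand-in for strtok, NOT a real library's code.
   Declaring next with C11 _Thread_local would be one way to give
   each thread its own copy of the state. */
char *sketch_strtok(char *s, const char *delims)
{
    static char *next;          /* where the following call resumes */
    char *start;

    assert(delims != NULL);
    if (s == NULL)
        s = next;               /* continue the previous scan */
    if (s == NULL)
        return NULL;            /* nothing to continue */

    s += strspn(s, delims);     /* skip a whole run of delimiters */
    if (*s == '\0') {
        next = NULL;
        return NULL;            /* only delimiters were left */
    }
    start = s;
    s += strcspn(s, delims);    /* find the end of the token */
    if (*s != '\0') {
        *s = '\0';              /* overwrite the delimiter */
        next = s + 1;
    } else {
        next = NULL;            /* token ran to the end of the string */
    }
    return start;
}
```

The strspn step is also where adjacent delimiters get merged: the whole run is skipped before a token starts.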

--
Later,
Jerry.

The universe is a figment of its own imagination.
 
Ben Pfaff
      05-06-2006
"Geometer" <(E-Mail Removed)> writes:

> please can somebody tell me, what the standard behavior of strtok shall be,
> if it encounters two or more consecutive delimiters like in


strtok() has at least these problems:

* It merges adjacent delimiters. If you use a comma as your
  delimiter, then "a,,b,c" will be divided into three tokens,
  not four. This is often the wrong thing to do. In fact, it
  is only the right thing to do, in my experience, when the
  delimiter set contains white space (for dividing a string
  into "words") or it is known in advance that there will be
  no adjacent delimiters.

* The identity of the delimiter is lost, because it is
  changed to a null terminator.

* It modifies the string that it tokenizes. This is bad
  because it forces you to make a copy of the string if
  you want to use it later. It also means that you can't
  tokenize a string literal with it; this is not
  necessarily something you'd want to do all the time but
  it is surprising.

* It can only be used once at a time. If a sequence of
  strtok() calls is ongoing and another one is started,
  the state of the first one is lost. This isn't a
  problem for small programs but it is easy to lose track
  of such things in hierarchies of nested functions in
  large programs. In other words, strtok() breaks
  encapsulation.
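
One possible way around all four problems is sketched below, using only strcspn from the standard library. The names next_field and count_fields are hypothetical, chosen for this example. The idea is to step over exactly one delimiter per field, leave the input untouched, and keep the scan position in a cursor owned by the caller, so nested scans cannot disturb each other.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the field that starts at *cursor.
   Sets *len to its length and returns a pointer to its first
   character, or NULL when no fields remain. The input is never
   modified and all state lives in the caller's cursor. */
const char *next_field(const char **cursor, const char *delims, size_t *len)
{
    const char *start;

    assert(cursor != NULL && delims != NULL && len != NULL);
    start = *cursor;
    if (start == NULL)
        return NULL;                    /* previous call hit the end */

    *len = strcspn(start, delims);      /* may be 0: an empty field */
    if (start[*len] != '\0')
        *cursor = start + *len + 1;     /* step over ONE delimiter */
    else
        *cursor = NULL;                 /* that was the last field */
    return start;
}

/* Count the fields in s, empty ones included. */
size_t count_fields(const char *s, const char *delims)
{
    const char *cursor = s;
    size_t len, n = 0;

    while (next_field(&cursor, delims, &len) != NULL)
        n++;
    return n;
}
```

With this sketch, "a,,b,c" split on a comma counts as four fields, the second of them empty. Because nothing is overwritten, the delimiter that ended a field can still be read at field[len], and a string literal is acceptable input.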

--
"What is appropriate for the master is not appropriate for the novice.
You must understand the Tao before transcending structure."
--The Tao of Programming
 
andy@servocomm.freeserve.co.uk
      05-06-2006
CBFalconer wrote:

> The OP can simply use the following replacement function, which
> does not have those objectionable features. The testing code is
> longer than the function.


OTOH By using C++ life becomes more productive, less error prone,
less complicated and more elegant:

#include <sstream>
#include <string>
#include <vector>
#include <iostream>

int main()
{
    char tst[] = "this\nis\n\nan\nempty\n\n\nline";

    std::stringstream s;
    s << tst;

    // Test the result of getline itself rather than eof(), so a
    // failed read never pushes a spurious token.
    std::vector<std::string> tokens;
    std::string str;
    while (std::getline(s, str, '\n')) {
        tokens.push_back(str);
    }

    for (std::vector<std::string>::const_iterator iter = tokens.begin();
         iter != tokens.end();
         ++iter) {
        std::cout << "token: \"" << *iter << "\"\n";
    }
}

regards
Andy Little

 
Pete Becker
      05-06-2006
Peter Jansson wrote:
>
> <quote src="A man-page for strok.">


The name of the function is strtok.

>
> Never use these functions. If you do, note that:
> These functions modify their first argument.
> These functions cannot be used on constant strings.


These two say the same thing. Sounds like someone is trying too hard.

> The identity of the delimiting character is lost.


Which has nothing to do with the claim that you should never use it. You
shouldn't use it if you need to know which of the delimiters was
actually encountered.

> The strtok() function uses a static buffer while parsing,


No, it uses a static variable to hold its result BETWEEN calls.

> so it's not thread safe.
>


Non sequitur. It's easy enough to implement with a per-thread static
pointer, which is thread safe.

Yup, definitely trying too hard. strtok is well suited for what it does.
If you need something more elaborate, go for it.

--

Pete Becker
Roundhouse Consulting, Ltd.
 
jacob navia
      05-06-2006
You forgot toksplit.h, Chuck.

Can you post it too?

jacob
 