Velocity Reviews - Computer Hardware Reviews

C++ strtok

 
 
abcd
 
      04-24-2012
Hello C++ users,
Greetings.

I have the following question regarding the strtok function used for
string tokenizing. As I understand it, strtok internally uses a static
variable to keep track of the string passed to it, so that tokens can be
found based on the delimiters.
Once strtok returns NULL, no more tokens are available.

What if strtok is now invoked with another string to search for
tokens? What happens to the internal static buffer that was
initialized with the previous string, and when is it released?

Best Regards
Sumit
 
 
 
 
 
gwowen
 
      04-24-2012
On Apr 24, 10:46 am, abcd wrote:
>
> What if now strtok is invoked with another string to search for
> tokens?? What happens to the internal static buffer which was
> initialized to the previous string, when is that released??


There is no static buffer. strtok() modifies the string passed to it
as an argument, overwriting the delimiter characters with '\0' so
that each return value points into the (modified) input C-string. There
is static state between calls (i.e. where the last tokenization got
to), but no dynamic buffer is needed.
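
To make the in-place modification concrete, here is a minimal sketch. The helper name `token_offsets` is made up for illustration; it only records where each token returned by std::strtok starts relative to the caller's buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper (not part of any library): tokenizes `buffer` in
// place with std::strtok and records each token's offset from the start
// of the buffer.
std::vector<std::size_t> token_offsets(char* buffer, const char* delims) {
    assert(buffer != nullptr && delims != nullptr);
    std::vector<std::size_t> offsets;
    for (char* tok = std::strtok(buffer, delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        // tok points into `buffer` itself; nothing is copied or allocated.
        offsets.push_back(static_cast<std::size_t>(tok - buffer));
    }
    return offsets;
}
```

Called on a writable array holding "one,two,three" with "," as the delimiter set, the tokens start at offsets 0, 4 and 8 of that same array, and the two commas have been replaced by '\0'. Printing the array afterwards as a C-string shows only "one".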
 
 
 
 
 
Vlad from Moscow
 
      04-24-2012
On 24 Apr, 13:46, abcd wrote:
> Hello C++ users,
> Greetings.
>
> I have a following question regarding strtok function used for string
> tokenizing. As I understand, strtok internally uses static variable to
> keep track of the string passed to it so that tokens can be searched
> based on delimiter.
> After the strtok returns NULL, it means that no tokens are available.
>
> What if now strtok is invoked with another string to search for
> tokens?? What happens to the internal static buffer which was
> initialized to the previous string, when is that released??
>
> Best Regards
> Sumit
>
>


As I understand it, strtok does not have any internal static buffer, so
there is nothing to release. It has a static variable of type pointer
to char. When you supply another string to process, that static pointer
is simply made to point into the new string. So at the very beginning
there is a check whether the supplied string argument is NULL. If it is
not NULL, scanning starts from the new string and the old position is
forgotten; if it is NULL, scanning continues from the saved pointer.
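
A rough sketch of that logic, for illustration only. `my_strtok` is a hypothetical name, and real library implementations differ in detail; the point is just that the only thing kept between calls is one pointer.

```cpp
#include <cassert>
#include <cstring>

// Illustrative sketch only; not the code of any actual C library.
char* my_strtok(char* str, const char* delims) {
    static char* saved = nullptr;          // the only state kept between calls
    assert(delims != nullptr);
    if (str != nullptr) saved = str;       // new string: old position is forgotten
    if (saved == nullptr) return nullptr;  // never given a string at all
    saved += std::strspn(saved, delims);   // skip leading delimiters
    if (*saved == '\0') return nullptr;    // no more tokens
    char* token = saved;
    saved += std::strcspn(saved, delims);  // advance to the end of the token
    if (*saved != '\0') {
        *saved = '\0';                     // overwrite the delimiter in place
        ++saved;                           // resume after it on the next call
    }
    return token;
}
```

Switching to a second string halfway through the first just reassigns `saved`. Nothing is allocated, so nothing needs to be freed.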
 
 
Marcel Müller
 
      04-24-2012
On 24.04.2012 11:46, abcd wrote:
> I have a following question regarding strtok function used for string
> tokenizing. As I understand, strtok internally uses static variable to
> keep track of the string passed to it so that tokens can be searched
> based on delimiter.
> After the strtok returns NULL, it means that no tokens are available.
>
> What if now strtok is invoked with another string to search for
> tokens??


In this case the static state is overwritten. Once you have passed
another string, you cannot continue to tokenize the first one.

> What happens to the internal static buffer which was
> initialized to the previous string, when is that released??


There is nothing to release. The internal state has a fixed size and
refers to the string buffer you supplied in the first call. The state is
statically allocated in the data segment of the C runtime.

More exactly, some thread-safe runtimes (Microsoft's C runtime, for
example) keep the internal state of strtok in thread-local storage.
Others, such as glibc, use a single static pointer shared by all
threads, and the standard does not require strtok to be thread-safe.
Without per-thread state, strtok is almost useless in multithreaded code.


In practice I avoid using strtok altogether.

Firstly, because it is not re-entrant. I.e. you must not start parsing
another string before you have finished the first one. This divides the
functions that you are allowed to call from within the parser loop into
the ones that never call strtok and the ones that might call strtok.
While it is trivial to decide this for runtime library functions, it
becomes error-prone for your own code. E.g. an object method you call
might internally call methods that use strtok, and you might not be
aware of that.
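
The re-entrancy problem can be reproduced in a few lines. In this sketch both function names are hypothetical: `count_fields` stands in for any callee that happens to use strtok, and `count_records` is the outer parser loop that calls it.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical callee that happens to use strtok internally.
int count_fields(char* record) {
    assert(record != nullptr);
    int n = 0;
    for (char* f = std::strtok(record, ","); f != nullptr;
         f = std::strtok(nullptr, ","))
        ++n;
    return n;
}

// Outer parser loop: splits `text` into ';'-separated records and hands
// each one to count_fields. Returns how many records the loop visited.
int count_records(char* text) {
    assert(text != nullptr);
    int records = 0;
    for (char* r = std::strtok(text, ";"); r != nullptr;
         r = std::strtok(nullptr, ";")) {
        ++records;
        count_fields(r);  // silently replaces strtok's saved position
    }
    return records;
}
```

For a buffer holding "a,b;c,d" the outer loop visits one record, not two: by the time it asks for the next record, strtok's saved position belongs to the inner tokenization, which has already run to the end of "a,b".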

Secondly, strtok modifies the original string in a C-style way. C-like
string manipulation should not be used in C++ programs because it is
error-prone and often a backdoor for security vulnerabilities. As long
as you do not deal with char* in C++ and only use const char*, the
probability of security vulnerabilities is significantly reduced.

Use strspn and strcspn for C-style parsing in C++. They easily achieve
the same behavior as strtok without its disadvantages. I.e. they do not
modify the input buffer, and the parsing state is kept on the local
stack.
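
A minimal sketch of that approach, assuming the tokens may be copied into std::string objects. The function name `split_cstr` is invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on any character in `delims` without
// writing to `text`. The scan position lives in a local variable, so
// nested or concurrent use on different strings is not a problem.
std::vector<std::string> split_cstr(const char* text, const char* delims) {
    assert(text != nullptr && delims != nullptr);
    std::vector<std::string> tokens;
    const char* p = text;
    for (;;) {
        p += std::strspn(p, delims);                // skip leading delimiters
        if (*p == '\0') break;                      // end of input
        std::size_t len = std::strcspn(p, delims);  // length of this token
        tokens.emplace_back(p, len);                // copy the token out
        p += len;
    }
    return tokens;
}
```

Like strtok, this treats runs of delimiters as one separator, so "a,,b;c" with the delimiter set ",;" yields the three tokens "a", "b" and "c".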

strtok is mainly supported for C compatibility by the C++ runtime.


Marcel
 
 
none
 
      04-24-2012


Totally agree with Marcel here. strtok is not a very good function to
use anywhere, neither in C nor in C++. Even the manual says so:

---------------------------------------
man strtok
<snip snip>
BUGS
Be cautious when using these functions. If you do use them,
note that:

* These functions modify their first argument.

* These functions cannot be used on constant strings.

* The identity of the delimiting character is lost.

* The strtok() function uses a static buffer while parsing, so
it's not thread safe. Use strtok_r() if this matters to you.
-----------------------------------------
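
For reference, a sketch of the strtok_r variant mentioned in that excerpt. It assumes a POSIX system, since strtok_r is not part of standard C or C++; the function name `count_all_fields` is made up for the example.

```cpp
#include <cassert>
#include <string.h>  // strtok_r is POSIX; declared here, not guaranteed by <cstring>

// Hypothetical helper: counts ','-separated fields across all
// ';'-separated records. Each loop owns its own state pointer, so the
// inner tokenization cannot disturb the outer one.
int count_all_fields(char* text) {
    assert(text != nullptr);
    int fields = 0;
    char* outer_state = nullptr;
    for (char* rec = strtok_r(text, ";", &outer_state); rec != nullptr;
         rec = strtok_r(nullptr, ";", &outer_state)) {
        char* inner_state = nullptr;
        for (char* f = strtok_r(rec, ",", &inner_state); f != nullptr;
             f = strtok_r(nullptr, ",", &inner_state))
            ++fields;
    }
    return fields;
}
```

The first two items in the BUGS list still apply: the input is modified in place and must be writable.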

std::string::find() and std::string::substr() can do pretty much
everything strtok does, and much more safely. I typically use a
template function that tokenizes a string and returns a vector of
strings containing the individual tokens. Much more usable, at the cost
of copying a few strings. It is very rarely a performance bottleneck.
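
A simplified, non-template sketch of such a function, to show the idea. It is not the actual function described above; the name `tokenize`, the single-character delimiter and the choice to skip empty tokens (as strtok does) are assumptions made for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch: splits `input` on `delim` using find() and
// substr(). The input is never modified; each token is a copy.
std::vector<std::string> tokenize(const std::string& input, char delim) {
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type end = input.find(delim, start);
        if (end == std::string::npos) end = input.size();
        assert(end >= start);
        if (end > start)                        // skip empty tokens, like strtok
            tokens.push_back(input.substr(start, end - start));
        if (end == input.size()) break;         // reached the end of the input
        start = end + 1;                        // continue after the delimiter
    }
    return tokens;
}
```

With this, tokenize("a,b,,c", ',') gives the three strings "a", "b" and "c", and the original string is left untouched.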

Yannick

 
 
Dan McLeran
 
      04-24-2012
> What if now strtok is invoked with another string to search for
> tokens?? What happens to the internal static buffer which was
> initialized to the previous string, when is that released??
>
> Best Regards
> Sumit


Have a look at Boost.Tokenizer, one of Boost's excellent libraries: http://www.boost.org/doc/libs/1_49_0/libs/tokenizer/
 
 
 
 