Re: Deficiency of strtok

 
 
CBFalconer
      08-08-2008
Pilcrow wrote:
>
> Here is a quick program, together with its output, that illustrates
> what I consider to be a deficiency of the standard function strtok
> from <string.h>:


Here is my solution to that problem.

/* ------- file tknsplit.h ----------*/
#ifndef H_tknsplit_h
# define H_tknsplit_h

# ifdef __cplusplus
extern "C" {
# endif

#include <stddef.h>

/* copy over the next tkn from an input string, after
skipping leading blanks (or other whitespace?). The
tkn is terminated by the first appearance of tknchar,
or by the end of the source string.

The caller must supply sufficient space in tkn to
receive any tkn; otherwise tkns will be truncated.

Returns: a pointer past the terminating tknchar, or to
the terminating '\0' if the end of the string was reached.

This will happily return an infinity of empty tkns if
called with src pointing to the end of a string. Tokens
will never include a copy of tknchar.

released to Public Domain, by C.B. Falconer.
Published 2006-02-20. Attribution appreciated.
revised 2007-05-26 (name)
*/

const char *tknsplit(const char *src, /* Source of tkns */
                     char tknchar,    /* tkn delimiting char */
                     char *tkn,       /* receiver of parsed tkn */
                     size_t lgh);     /* length tkn can receive */
                                      /* not including final '\0' */

# ifdef __cplusplus
}
# endif
#endif
/* ------- end file tknsplit.h ----------*/

/* ------- file tknsplit.c ----------*/
#include "tknsplit.h"

/* copy over the next tkn from an input string, after
skipping leading blanks (or other whitespace?). The
tkn is terminated by the first appearance of tknchar,
or by the end of the source string.

The caller must supply sufficient space in tkn to
receive any tkn; otherwise tkns will be truncated.

Returns: a pointer past the terminating tknchar, or to
the terminating '\0' if the end of the string was reached.

This will happily return an infinity of empty tkns if
called with src pointing to the end of a string. Tokens
will never include a copy of tknchar.

A better name would be "strtkn", except that is reserved
for the system namespace. Change to that at your risk.

released to Public Domain, by C.B. Falconer.
Published 2006-02-20. Attribution appreciated.
Revised 2006-06-13 2007-05-26 (name)
*/

const char *tknsplit(const char *src, /* Source of tkns */
                     char tknchar,    /* tkn delimiting char */
                     char *tkn,       /* receiver of parsed tkn */
                     size_t lgh)      /* length tkn can receive */
                                      /* not including final '\0' */
{
    if (src) {
        while (' ' == *src) src++;

        while (*src && (tknchar != *src)) {
            if (lgh) {
                *tkn++ = *src;
                --lgh;
            }
            src++;
        }
        if (*src && (tknchar == *src)) src++;
    }
    *tkn = '\0';
    return src;
} /* tknsplit */

#ifdef TESTING
#include <stdio.h>

#define ABRsize 6 /* length of acceptable tkn abbreviations */

/* ---------------- */

static void showtkn(int i, char *tok)
{
    putchar(i + '1'); putchar(':');
    puts(tok);
} /* showtkn */

/* ---------------- */

int main(void)
{
    char teststring[] = "This is a test, ,, abbrev, more";

    const char *t, *s = teststring;
    int i;
    char tkn[ABRsize + 1];

    puts(teststring);
    t = s;
    for (i = 0; i < 4; i++) {
        t = tknsplit(t, ',', tkn, ABRsize);
        showtkn(i, tkn);
    }

    puts("\nHow to detect 'no more tkns' while truncating");
    t = s; i = 0;
    while (*t) {
        t = tknsplit(t, ',', tkn, 3);
        showtkn(i, tkn);
        i++;
    }

    puts("\nUsing blanks as tkn delimiters");
    t = s; i = 0;
    while (*t) {
        t = tknsplit(t, ' ', tkn, ABRsize);
        showtkn(i, tkn);
        i++;
    }
    return 0;
} /* main */

#endif
/* ------- end file tknsplit.c ----------*/

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.


 
Richard
      08-08-2008
CBFalconer writes:

> Pilcrow wrote:
>>
>> Here is a quick program, together with its output, that illustrates
>> what I consider to be a deficiency of the standard function strtok
>> from <string.h>:

>
> Here is my solution to that problem.
>
> /* ------- file tknsplit.h ----------*/
> #ifndef H_tknsplit_h
> # define H_tknsplit_h
>
> # ifdef __cplusplus
> extern "C" {
> # endif
>
> #include <stddef.h>
>
> /* copy over the next tkn from an input string, after
> skipping leading blanks (or other whitespace?). The
> tkn is terminated by the first appearance of tknchar,
> or by the end of the source string.
>
> The caller must supply sufficient space in tkn to
> receive any tkn, Otherwise tkns will be truncated.
>
> Returns: a pointer past the terminating tknchar.
>
> This will happily return an infinity of empty tkns if
> called with src pointing to the end of a string. Tokens
> will never include a copy of tknchar.
>
> released to Public Domain, by C.B. Falconer.
> Published 2006-02-20. Attribution appreciated.
> revised 2007-05-26 (name)
> */
>
> const char *tknsplit(const char *src, /* Source of tkns */
> char tknchar, /* tkn delimiting char */
> char *tkn, /* receiver of parsed tkn */
> size_t lgh); /* length tkn can receive */
> /* not including final '\0' */
>
> # ifdef __cplusplus
> }
> # endif
> #endif
> /* ------- end file tknsplit.h ----------*/
>
> /* ------- file tknsplit.c ----------*/
> #include "tknsplit.h"
>
> /* copy over the next tkn from an input string, after
> skipping leading blanks (or other whitespace?). The
> tkn is terminated by the first appearance of tknchar,
> or by the end of the source string.
>
> The caller must supply sufficient space in tkn to
> receive any tkn, Otherwise tkns will be truncated.
>
> Returns: a pointer past the terminating tknchar.
>
> This will happily return an infinity of empty tkns if
> called with src pointing to the end of a string. Tokens
> will never include a copy of tknchar.
>
> A better name would be "strtkn", except that is reserved
> for the system namespace. Change to that at your risk.
>
> released to Public Domain, by C.B. Falconer.
> Published 2006-02-20. Attribution appreciated.
> Revised 2006-06-13 2007-05-26 (name)
> */
>
> const char *tknsplit(const char *src, /* Source of tkns */
> char tknchar, /* tkn delimiting char */
> char *tkn, /* receiver of parsed tkn */
> size_t lgh) /* length tkn can receive */
> /* not including final '\0' */
> {
> if (src) {
> while (' ' == *src) src++;
>
> while (*src && (tknchar != *src)) {
> if (lgh) {
> *tkn++ = *src;
> --lgh;
> }
> src++;
> }
> if (*src && (tknchar == *src)) src++;
> }


I would replace the function with more or less the following, for these
reasons:

Back-to-front comparisons are used in a minority of code and most people
hate them. And yes, I do know "most" is not "all". The naming of variables
seems almost meaningless: lgh is what? It doesn't cost much compile
time to use meaningful names in a publicly released library. Oh, and like
other functions, assume it is called with meaningful data.

,----
|
|/* remove leading white space */
|while(*source==' ')
| *source++;
|
|/* store string up to next token */
|while(maxLength-- && ((tokenChar = *source++) != endOfTokenChar))
| *savedToken++=tokenChar;
|
|*savedToken='\0';
|
`----





> *tkn = '\0';
> return src;
> } /* tknsplit */


--
 
Richard
      08-08-2008
Richard writes:

> CBFalconer writes:
>
> ,----
> |
> |/* remove leading white space */
> |while(*source==' ')
> | *source++;


Whoops! ^ error above left to the inquisitive to spot and pick on ....

> |
> |/* store string up to next token */
> |while(maxLength-- && ((tokenChar = *source++) != endOfTokenChar))
> | *savedToken++=tokenChar;
> |
> |*savedToken='\0';
> |
> `----


--
 
santosh
      08-08-2008
Richard wrote:
> CBFalconer writes:


<snip>

>> const char *tknsplit(const char *src, /* Source of tkns */
>> char tknchar, /* tkn delimiting char */
>> char *tkn, /* receiver of parsed tkn */
>> size_t lgh) /* length tkn can receive */
>> /* not including final '\0' */
>> {
>> if (src) {
>> while (' ' == *src) src++;
>>
>> while (*src && (tknchar != *src)) {
>> if (lgh) {
>> *tkn++ = *src;
>> --lgh;
>> }
>> src++;
>> }
>> if (*src && (tknchar == *src)) src++;
>> }

>
> I would replace the function with more or less the following, for
> these reasons:
>
> Back-to-front comparisons are used in a minority of code and most
> people hate them. And yes, I do know "most" is not "all". The naming
> of variables seems almost meaningless: lgh is what? It doesn't cost
> much compile time to use meaningful names in a publicly released
> library. Oh, and like other functions, assume it is called with
> meaningful data.
>
> ,----
> |
> |/* remove leading white space */
> |while(*source==' ')


Wouldn't you be better off using isspace()?

> | *source++;


I suppose this should be source++?

> |/* store string up to next token */
> |while(maxLength-- && ((tokenChar = *source++) != endOfTokenChar))
> | *savedToken++=tokenChar;
> |
> |*savedToken='\0';
> |
> `----


<snip>

 
Ben Bacarisse
      08-08-2008
Richard writes:

> CBFalconer writes:

<snip>
>> const char *tknsplit(const char *src, /* Source of tkns */
>> char tknchar, /* tkn delimiting char */
>> char *tkn, /* receiver of parsed tkn */
>> size_t lgh) /* length tkn can receive */
>> /* not including final '\0' */
>> {
>> if (src) {
>> while (' ' == *src) src++;
>>
>> while (*src && (tknchar != *src)) {
>> if (lgh) {
>> *tkn++ = *src;
>> --lgh;
>> }
>> src++;
>> }
>> if (*src && (tknchar == *src)) src++;
>> }

>
> I would replace the function with more or less the following, for these
> reasons:
>
> Back-to-front comparisons are used in a minority of code and most people
> hate them. And yes, I do know "most" is not "all". The naming of variables
> seems almost meaningless: lgh is what? It doesn't cost much compile
> time to use meaningful names in a publicly released library. Oh, and like
> other functions, assume it is called with meaningful data.
>
> ,----
> |
> |/* remove leading white space */
> |while(*source==' ')
> | *source++;


This erroneous * has been corrected elsewhere, but..

> |
> |/* store string up to next token */
> |while(maxLength-- && ((tokenChar = *source++) != endOfTokenChar))
> | *savedToken++=tokenChar;
> |
> |*savedToken='\0';
> |
> `----


You can't be serious? You seem to be, since you corrected one mistake,
but you've left plenty more. Hardly a rewrite to recommend.

>> *tkn = '\0';
>> return src;
>> } /* tknsplit */


--
Ben.
 
Ben Bacarisse
      08-17-2008
Pilcrow writes:

<snip>
> int main(void)
> {
>     char line[256];
>     /* exit on empty line or ^Z */
>     while (fgets(line, sizeof line, stdin) != NULL && strlen(line) > 1) {
>         const char *t = line;
>         int i;
>         char tkn[1]; /*** ?? ***/
>         i = 0;
>
>         printf("\nINPUT: %s\n", line);
>         puts("OUTPUT:");
>         while (*t) {
>             t = tknsplit(t, ',', tkn, 100);
>             showtkn(i, tkn);
>             i++;
>         }
>         putchar('\n');
>     }
>     return 0;
> } /* main */
>
> When I test, I get:
>
> C:\>gcc -pedantic -O -Wall -Wextra tknsplit.c -o tknsplit
>
> C:\>tknsplit.exe
> ghghghghghg,hhhhhh,, ,kk
>
> INPUT: ghghghghghg,hhhhhh,, ,kk
>
> OUTPUT:
> 0:[ghghghghghg]
> 1:[hhhhhh]
> 2:[]
> 3:[]
> 4:[kk]
>
> Notice the definition
> char tkn[1];
>
> This should not work according to Mr. Falconer. Why does it?


Since you need a '\0' at the end, you can't store any useful tokens in
an array of size 1. You tell the function that the array can hold 100
characters, and at that point you enter the realms of undefined
behaviour: the function writes past the end of tkn. You were just
unlucky that the program ran without obvious fault -- you must not lie
about the size of the receiving array.

--
Ben.
 