strsup - supplementary string functions

 
 
JohnF
      01-25-2013
Malcolm McLean <> wrote:
> I'm putting together a little library of supplementary string functions,
> strsup.c.
>
> The functions are intended to be fairly short, and to operate on char *s.
> They should ideally be implementable as a single function which can be
> snipped and pasted.
>
> Unlike the string.h functions they can depend on malloc().
>
> The obvious first function is strdup().
>
> I've also got
>
> strcount - count the instances of character ch in str.
> trim - remove leading and trailing whitespace from a string
> singlespace - replace all runs of whitespace in a string with a single
> space character.
> split - return an array of fields, split on a delimiter character.
> compnumeric - compare two strings with embedded numbers.
> replace - replace all substrings with new passed string.
>
> Any ideas for more?


Some might be implemented as macros, especially the snipped-and-pasted
variety. Several snipped-and-pasted examples from my programs are below.
Some do little more than arg-checking, e.g., to avoid segfaulting if
used with a NULL ptr.
Want about ten zillion other ideas (some arguably a bit over-the-top)?
Look at the VMS documentation on the lexical string functions in DCL
(the "Digital Command Language", DEC's shell):
openvms.compaq.com/doc/73final/6489/6489pro_047.html#66_manipulatingstrings

#include <string.h> /* strspn, strchr, strlen, memmove, strncpy */

/* ---
 * macro to skip whitespace
 * ------------------------ */
#define WHITESPACE " \t\n\r\f\v" /* skipped whitespace chars */
#define skipwhite(thisstr) if ( (thisstr) != NULL ) \
    (thisstr) += strspn((thisstr),WHITESPACE)
/* ---
 * macros to check if a string is empty (a NULL ptr counts as empty)
 * ----------------------------------------------------------------- */
#define isempty(s) ((s)==NULL?1:(*(s)=='\000'?1:0))
/* ---
 * macro to strip leading and trailing whitespace
 * ---------------------------------------------- */
#define trimwhite(thisstr) if ( (thisstr) != NULL ) { \
    int thislen = strlen(thisstr); \
    while ( --thislen >= 0 ) \
        if ( isthischar((thisstr)[thislen],WHITESPACE) ) \
            (thisstr)[thislen] = '\000'; \
        else break; \
    if ( (thislen = strspn((thisstr),WHITESPACE)) > 0 ) \
        {strsqueeze((thisstr),thislen);} } else /*user adds ;*/
/* ---
 * macro to remove all 'c' chars from s
 * ------------------------------------ */
#define compress(s,c) if(!isempty(s)) /* remove embedded c's from s */ \
    { char *p; while((p=strchr((s),(c)))!=NULL) {strsqueeze(p,1);} } else
/* ---
 * macro to strcpy(s,s+n) using memmove() (also works for negative n)
 * ------------------------------------------------------------------ */
#define strsqueeze(s,n) if((n)!=0) { if(!isempty((s))) { \
    int thislen3=strlen(s); \
    if ((n) >= thislen3) *(s) = '\000'; \
    else memmove((s),((s)+(n)),(1+thislen3-(n))); }} else /*user adds ;*/
/* ---
 * macro to strncpy() n bytes and make sure it's null-terminated
 * (target needs room for n+1 bytes)
 * ------------------------------------------------------------- */
#define strninit(target,source,n) \
    if( (target)!=NULL && (n)>=0 ) { \
        char *thissource = (source); \
        (target)[0] = '\000'; \
        if ( (n)>0 && thissource!=NULL ) { \
            strncpy((target),thissource,(n)); \
            (target)[(n)] = '\000'; } }
/* ---
 * macro to check for thischar inthisstr
 * ------------------------------------- */
#define isthischar(thischar,inthisstr) \
    ( (thischar)!='\000' && *(inthisstr)!='\000' \
    && strchr(inthisstr,(thischar))!=(char *)NULL )
/* ---
 * macro for last char of a string
 * ------------------------------- */
#define lastchar(s) (isempty(s)?'\000':*((s)+(strlen(s)-1)))
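
For comparison, a macro such as strninit evaluates its arguments more than once, which can surprise callers who pass expressions with side effects. The sketch below shows one possible function form of the same idea. It is not JohnF's code; the name copy_n_terminated and the choice of size_t for the count are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical function form of strninit: copy at most n bytes of source
 * into target and always null-terminate the result.
 * target must have room for n+1 bytes.
 * A NULL source yields an empty string; a NULL target is returned as-is. */
char *copy_n_terminated(char *target, const char *source, size_t n)
{
    if (target == NULL)
        return NULL;
    target[0] = '\0';
    if (n > 0 && source != NULL) {
        strncpy(target, source, n);  /* zero-pads when source is shorter than n */
        target[n] = '\0';            /* terminates when source is n bytes or longer */
    }
    return target;
}
```

Because each argument is evaluated exactly once, a call like copy_n_terminated(buf, next_field(), len) behaves predictably, at the cost of no longer being a paste-in macro.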

--
John Forkosh ( mailto: where j=john and f=forkosh )
 
BartC
      01-25-2013
"Malcolm McLean" <> wrote in message
news:9a56dea9-ecaa-495d-ab28-...
> On Thursday, January 24, 2013 7:44:16 AM UTC, Dr Nick wrote:


>> Unless you change your mind and create (yet another, we've all done it)
>> structure based string type, here's a suggestion: have a very clear
>> convention in the naming as to which of your functions return the
>> original string, perhaps modified (convert case, trim the
>> right-hand-end) and which return a new string. In fact, think hard
>> about this all over (should replace always create a new string, create a
>> new string when lengthening but not shortening, or realloc when
>> necessary?).
>>

> The functions will all take nul-terminated strings passed as char *s or
> const char *s. The idea is to write functions that should have been in
> string.h but weren't, either because of perceived need, or because
> string.h isn't allowed to depend on malloc.
>
> I think it's probably best to return a malloced char * where the return
> may be larger than the input.


How will the caller know that? And in the case of a trim function, the
result will never be bigger than the input, but the output may need to be
stored elsewhere because of the need to zero-terminate a sub-string of the
input.

As was mentioned, there are just too many ways of dealing with such
functions: results can be in-place, in a caller-supplied destination, or in
allocated memory.
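
As a rough illustration of the first of those three conventions, here is what an in-place trim might look like. This is only a sketch: the name trim_inplace is made up, it is not taken from strsup.c, and it uses isspace() rather than a fixed whitespace list.

```c
#include <ctype.h>
#include <string.h>

/* In-place trim (illustrative name, not from strsup.c).
 * The caller must pass a writable buffer, never a string literal.
 * Returns s for convenience, or NULL if s is NULL. */
char *trim_inplace(char *s)
{
    size_t start, end;

    if (s == NULL)
        return NULL;

    /* skip leading whitespace */
    start = 0;
    while (s[start] != '\0' && isspace((unsigned char)s[start]))
        start++;

    /* back up over trailing whitespace */
    end = strlen(s);
    while (end > start && isspace((unsigned char)s[end - 1]))
        end--;

    /* slide the kept part to the front and terminate it */
    memmove(s, s + start, end - start);
    s[end - start] = '\0';
    return s;
}
```

The result always fits in the original buffer, which is why this convention needs neither a destination argument nor malloc(); the price is that the original contents are lost.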

Sometimes, too, the caller already has useful length information for a
string, but the standard library often has no way to pass that information
to the function; being able to do so would be handy.
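
One way a supplementary function could accept that length information is sketched below, using a character count as the example. The name count_char_n and its signature are assumptions for illustration; only memchr() is a real library call here.

```c
#include <stddef.h>
#include <string.h>

/* Count occurrences of ch in the first len bytes of s.
 * Because the caller supplies len, no strlen() pass is needed and the
 * data does not have to be null-terminated.
 * count_char_n is a made-up name for this sketch. */
size_t count_char_n(const char *s, size_t len, char ch)
{
    size_t count = 0;
    const char *p, *end;

    if (s == NULL)
        return 0;
    p = s;
    end = s + len;
    while (p < end && (p = memchr(p, ch, (size_t)(end - p))) != NULL) {
        count++;
        p++;    /* resume just past the match */
    }
    return count;
}
```

A caller that has just read len bytes into a buffer can pass that count straight through, and can also count within a substring without first copying and terminating it.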

For ideas about new string functions, I mainly use the following:

o Convert a string to upper/lower case (some Cs will have this already). I
also allow just the first N characters to be modified.

o Return the leftmost or rightmost N characters. When N is negative, then
all *except* the abs(N) rightmost or leftmost characters are returned. When
N is longer than the string, then it's padded with spaces, or a
caller-supplied fill character (or string). (Some of this can be done, with
a bit more trouble, with sprintf().)

o Split or join strings as I think you've already mentioned.

o Several functions to deal with filespecs: extract a path, filename,
basefile, or extension; or to change or add an extension to a filespec.
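
The "leftmost N characters" idea from the list above could be sketched roughly as follows. This is one reading of the description, not BartC's code: the name str_left is hypothetical, only a single fill character is supported (not a fill string), and the result is returned in malloc'ed memory.

```c
#include <stdlib.h>
#include <string.h>

/* Leftmost n characters of s in newly malloc'ed memory (caller frees).
 *   n >= 0 : the first n characters, padded on the right with fill
 *            when s is shorter than n
 *   n <  0 : everything except the last |n| characters
 * Returns NULL if s is NULL or allocation fails.
 * str_left is a hypothetical name chosen for this sketch. */
char *str_left(const char *s, int n, char fill)
{
    size_t len, keep, pad = 0;
    char *out;

    if (s == NULL)
        return NULL;
    len = strlen(s);

    if (n >= 0) {
        keep = (size_t)n < len ? (size_t)n : len;
        pad = (size_t)n - keep;
    } else {
        /* unsigned arithmetic gives |n| without overflowing -n */
        size_t drop = (size_t)0 - (size_t)n;
        keep = drop < len ? len - drop : 0;
    }

    out = malloc(keep + pad + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, s, keep);
    memset(out + keep, fill, pad);
    out[keep + pad] = '\0';
    return out;
}
```

For example, str_left("hello", 3, ' ') yields "hel", str_left("hello", -2, ' ') also yields "hel", and str_left("hi", 4, '.') yields "hi..". A matching rightmost-N function would mirror the same three cases.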

--
Bartc

 
88888 Dihedral
      01-25-2013
On Friday, January 25, 2013 at 6:57:27 PM UTC+8, JohnF wrote:
> /* ---
> * macros to check if a string is empty
> * ------------------------------------ */
> #define isempty(s) ((s)==NULL?1:(*(s)=='\000'?1:0))

I have to point out one subtle thing about char * strings in C:
a NULL pointer is not the same thing as an empty string.

char *str = NULL; // no storage allocated at all

char str2[10]; // 10 bytes allocated
str2[0] = '\0'; // a valid string whose length is zero

// That said, I wrote my own string library some time ago,
// with my own format, to deal with the BIG5 encoding.



 
Malcolm McLean
      01-28-2013
On Friday, January 25, 2013 6:27:16 PM UTC, Bart wrote:
> "Malcolm McLean" <> wrote in message
>
>
> As was mentioned, there are just too many ways of dealing with such
> functions: results can be in-place, in a caller-supplied destination, or in
> allocated memory.
>

That's one big issue.

trim() could reasonably be an in-place function, one that takes a const
char * and a buffer (with or without a length supplied), or one that
returns an allocated pointer.
I'd say the first option is best because of the way the function is likely
to be used. No-one is going to want to pass it a string literal, and only
rarely will you want both a trimmed string and the original retained.
But with replace() you can't easily calculate the output size before calling
it, and it can't be in-place as it's as likely to expand as to shrink the
buffer.
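
Since the caller cannot easily size the output, one option is for replace() to size it internally with a counting pass and return malloc'ed memory. The following is a minimal sketch of that approach under stated assumptions; the name replace_all, the argument order, and the decision to reject an empty pattern are all illustrative choices, not the strsup.c design.

```c
#include <stdlib.h>
#include <string.h>

/* Return a malloc'ed copy of str with every non-overlapping occurrence of
 * pat replaced by rep (caller frees). Returns NULL on a NULL argument, an
 * empty pat, or allocation failure. replace_all is an illustrative name. */
char *replace_all(const char *str, const char *pat, const char *rep)
{
    size_t patlen, replen, count = 0, outlen;
    const char *p, *hit;
    char *out, *dst;

    if (str == NULL || pat == NULL || rep == NULL || *pat == '\0')
        return NULL;
    patlen = strlen(pat);
    replen = strlen(rep);

    /* pass 1: count matches so the result can be sized exactly */
    for (p = str; (hit = strstr(p, pat)) != NULL; p = hit + patlen)
        count++;
    outlen = strlen(str) - count * patlen + count * replen;

    out = malloc(outlen + 1);
    if (out == NULL)
        return NULL;

    /* pass 2: copy the text between matches, substituting rep */
    dst = out;
    for (p = str; (hit = strstr(p, pat)) != NULL; p = hit + patlen) {
        size_t gap = (size_t)(hit - p);
        memcpy(dst, p, gap);
        dst += gap;
        memcpy(dst, rep, replen);
        dst += replen;
    }
    strcpy(dst, p);    /* tail after the last match, plus the terminator */
    return out;
}
```

The two passes cost a second scan of the input, but they avoid realloc() and make the function safe to call on string literals, which the in-place convention is not.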

--
Visit Malcolm's website
http://www.malcolmmclean.site11.com
 
ssmitch@gmail.com
      01-29-2013
On Tuesday, January 22, 2013 1:14:18 PM UTC-5, Malcolm McLean wrote:
> I'm putting together a little library of supplementary string functions,
> strsup.c. The functions are intended to be fairly short, and to operate
> on char *s. They should ideally be implementable as a single function
> which can be snipped and pasted. Unlike the string.h functions they can
> depend on malloc(). The obvious first function is strdup(). I've also got
>
> strcount - count the instances of character ch in str.
> trim - remove leading and trailing whitespace from a string
> singlespace - replace all runs of whitespace in a string with a single
> space character.
> split - return an array of fields, split on a delimiter character.
> compnumeric - compare two strings with embedded numbers.
> replace - replace all substrings with new passed string.
>
> Any ideas for more?


Besides the obvious candidates, I've also found useful on a number of occasions a function to strip unwanted characters (such as commas or dollar signs in numeric values) from a string before further processing. My own version is simply called strip(), but for a library you would probably want to rename it:

/*
 * remove unwanted characters from string
 *
 * call as strip(char *str, *unwanted)
 *
 * where "str" is the string to process and "unwanted" is a null-
 * terminated string containing the characters to be stripped.
 *
 * returns pointer to modified string
 */

char *strip(char *str, char *unwanted) {
    char *cp, *savptr;
    savptr = str;
    cp = str - 1;
    while (*++cp = *str++)
        if (strchr(unwanted, *cp) != NULL)
            --cp;
    return savptr;
{

 
Malcolm McLean
      01-29-2013
On Tuesday, January 29, 2013 4:17:22 PM UTC, ssm...@gmail.com wrote:
> On Tuesday, January 22, 2013 1:14:18 PM UTC-5, Malcolm McLean wrote:
>
> My own version is simply called strip(), but for a library you would probably want to rename it:
>

It's got the str prefix which indicates a standard library string function.

 
Ben Bacarisse
      01-30-2013
writes:
<snip>
> Besides the obvious candidates, I've also found useful on a number of
> occasions a function to strip unwanted characters (such as commas or
> dollar signs in numeric values) from a string before further
> processing. My own version is simply called strip(), but for a
> library you would probably want to rename it:
>
> /*
> * remove unwanted characters from string
> *
> * call as strip(char *str, *unwanted)
> *
> * where "str" is the string to process and "unwanted" is a null-
> * terminated string containing the characters to be stripped.
> *
> * returns pointer to modified string
> */
>
> char *strip(char *str, char *unwanted) {
> char *cp, *savptr;
> savptr = str;
> cp = str - 1;


That's, technically, problematic. If str points to the start of an
array, the standard does not permit you to form the pointer str - 1,
even if you do nothing with it!

> while (*++cp = *str++)
> if (strchr(unwanted, *cp) != NULL)
> --cp;


I think the fix is simpler than the original:

    cp = str;
    while (*cp = *str++)
        if (strchr(unwanted, *cp) == NULL)
            cp++;

In such cases I tend to write:

    while (*cp = *str++)
        cp += strchr(unwanted, *cp) == NULL;

but similar things have caused me to be accused of all sorts of barbarism,
so I won't suggest you do likewise!

> return savptr;
> {


} I think.
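
Putting the quoted function together with both corrections from this post (cp starting at str, and the closing brace) gives something like the sketch below. The name remove_chars and the const on the second parameter are choices made here for illustration; the loop is the first of the two forms suggested above.

```c
#include <string.h>

/* strip() with the suggested fix applied: cp starts at str rather than
 * str - 1, so no pointer before the start of the array is ever formed.
 * remove_chars is an illustrative name; str must be writable. */
char *remove_chars(char *str, const char *unwanted)
{
    char *cp = str;        /* write position */
    char *savptr = str;    /* start of the buffer, returned to the caller */

    while ((*cp = *str++) != '\0')
        if (strchr(unwanted, *cp) == NULL)
            cp++;          /* keep this character; otherwise overwrite it */
    return savptr;
}
```

Called on a writable buffer holding "$1,234.50" with unwanted set to "$,", this leaves "1234.50" in the buffer.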


--
Ben.
 
Tim Rentsch
      01-31-2013
Ben Bacarisse <> writes:

> writes:
> <snip>
>> Besides the obvious candidates, I've also found useful on a number of
>> occasions a function to strip unwanted characters (such as commas or
>> dollar signs in numeric values) from a string before further
>> processing. My own version is simply called strip(), but for a
>> library you would probably want to rename it:
>>
>> /*
>> * remove unwanted characters from string
>> *
>> * call as strip(char *str, *unwanted)
>> *
>> * where "str" is the string to process and "unwanted" is a null-
>> * terminated string containing the characters to be stripped.
>> *
>> * returns pointer to modified string
>> */
>>
>> char *strip(char *str, char *unwanted) {
>> char *cp, *savptr;
>> savptr = str;
>> cp = str - 1;

>
> That's, technically, problematic. If str points to the start of
> an array, the standard does not permit you to form the pointer
> str - 1, even if you do nothing with it!
>
>> while (*++cp = *str++)
>> if (strchr(unwanted, *cp) != NULL)
>> --cp;

>
> I think the fix is simpler than the original:
>
> cp = str;
> while (*cp = *str++)
> if (strchr(unwanted, *cp) == NULL)
> cp++;
>
> In such cases I tend to write:
>
> while (*cp = *str++)
> cp += strchr(unwanted, *cp) == NULL;
>
> but similar things have caused me to be accused of all sorts of
> barbarism, so I won't suggest you do likewise!


I was inspired by your examples to look for a short and simple
implementation. I came up with this:

char *
eliminate( char *to_shrink, const char *unwanted ){
    char *p = to_shrink, *q = p;
    do q += strspn( q, unwanted ); while( *p++ = *q++ );
    return to_shrink;
}

I think it's easy to see that all 'unwanted' bytes are skipped
and only values not in 'unwanted' are copied.

And now let the barbarism accusers say what they will!
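
For readers who want to trace the loop step by step, the same function can be spread over more lines. The version below is intended to behave identically to eliminate() as posted; only the layout, the explicit comparison with '\0', and the comments have been added.

```c
#include <string.h>

/* Same logic as the eliminate() posted above, laid out one step per line. */
char *eliminate(char *to_shrink, const char *unwanted)
{
    char *p = to_shrink;    /* write position */
    char *q = to_shrink;    /* read position, never behind p */

    do {
        /* strspn() measures the run of unwanted bytes at q (possibly
         * zero long); it never counts the terminating '\0' */
        q += strspn(q, unwanted);
    } while ((*p++ = *q++) != '\0');   /* copy one wanted byte, or the '\0' */

    return to_shrink;
}
```

After the strspn() step q always points at either a wanted byte or the terminator, so every byte the loop copies is one that should be kept, and copying the terminator ends the loop.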
 
Ben Bacarisse
      02-02-2013
Tim Rentsch <> writes:

> Ben Bacarisse <> writes:
>
>> writes:
>> <snip>
>>> Besides the obvious candidates, I've also found useful on a number of
>>> occasions a function to strip unwanted characters (such as commas or
>>> dollar signs in numeric values) from a string before further
>>> processing. My own version is simply called strip(), but for a
>>> library you would probably want to rename it:
>>>
>>> /*
>>> * remove unwanted characters from string
>>> *
>>> * call as strip(char *str, *unwanted)
>>> *
>>> * where "str" is the string to process and "unwanted" is a null-
>>> * terminated string containing the characters to be stripped.
>>> *
>>> * returns pointer to modified string
>>> */
>>>
>>> char *strip(char *str, char *unwanted) {
>>> char *cp, *savptr;
>>> savptr = str;
>>> cp = str - 1;

>>
>> That's, technically, problematic. If str points to the start of
>> an array, the standard does not permit you to form the pointer
>> str - 1, even if you do nothing with it!
>>
>>> while (*++cp = *str++)
>>> if (strchr(unwanted, *cp) != NULL)
>>> --cp;

>>
>> I think the fix is simpler than the original:
>>
>> cp = str;
>> while (*cp = *str++)
>> if (strchr(unwanted, *cp) == NULL)
>> cp++;
>>
>> In such cases I tend to write:
>>
>> while (*cp = *str++)
>> cp += strchr(unwanted, *cp) == NULL;
>>
>> but similar things have caused me to be accused of all sorts of
>> barbarism, so I won't suggest you do likewise!

>
> I was inspired by your examples to look for a short and simple
> implementation. I came up with this:


Well, I was suggesting a simple fix rather than a simple alternative.

> char *
> eliminate( char *to_shrink, const char *unwanted ){
> char *p = to_shrink, *q = p;
> do q += strspn( q, unwanted ); while( *p++ = *q++ );
> return to_shrink;
> }
>
> I think it's easy to see that all 'unwanted' bytes are skipped
> and only values not in 'unwanted' are copied.
>
> And now let the barbarism accusers say what they will!


That's nice (except for the layout!) and I don't think there is any
barbarism involved (where could it be?).

--
Ben.
 
Tim Rentsch
      02-03-2013
Ben Bacarisse <> writes:

> Tim Rentsch <> writes:
>
>> Ben Bacarisse <> writes:
>>>> [.. discussing a function to remove unwanted characters from
>>>> a string ..]
>>>
>>> I think the fix is simpler than the original:
>>> cp = str;
>>> while (*cp = *str++)
>>> if (strchr(unwanted, *cp) == NULL)
>>> cp++;
>>>
>>> In such cases I tend to write:
>>>
>>> while (*cp = *str++)
>>> cp += strchr(unwanted, *cp) == NULL;
>>>
>>> but similar things have caused me to be accused of all sorts of
>>> barbarism, so I won't suggest you do likewise!

>>
>> I was inspired by your examples to look for a short and simple
>> implementation. I came up with this:

>
> Well, I was suggesting a simple fix rather than a simple
> alternative.


Right. I didn't mean to imply anything different.

>> char *
>> eliminate( char *to_shrink, const char *unwanted ){
>> char *p = to_shrink, *q = p;
>> do q += strspn( q, unwanted ); while( *p++ = *q++ );
>> return to_shrink;
>> }
>>
>> [snip]

>
> That's nice (except for the layout!) [snip]


Those who find the single-line do/while unattractive might
prefer this instead:

while( q += strspn( q, unwanted ), *p++ = *q++ ) {}
 