substring

 
 
Richard Heathfield
 
      11-02-2003
Tristan Miller wrote:

> Does the C standard use the term "string" to refer to both nul-
> and non-nul-terminated strings, or does it make a nomenclatural
> distinction between "string" (which always includes the sentinel) and the
> more general "array-of-char"?


The C Standard defines a string to be an array of characters terminated by
the first null character. If you want the exact wording, I'm sure someone
can oblige.

--
Richard Heathfield : (E-Mail Removed)
"Usenet is a strange place." - Dennis M Ritchie, 29 July 1999.
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
K&R answers, C books, etc: http://users.powernet.co.uk/eton
 
Mark McIntyre
 
      11-02-2003
On Sun, 02 Nov 2003 21:51:51 +0100, in comp.lang.c , Tristan Miller
<(E-Mail Removed)> wrote:

>cases. Does the C standard use the term "string" to refer to both nul- and
>non-nul-terminated strings


C defines a string as
7.1.1 (1) A string is a contiguous sequence of characters terminated
by and including the first null character.



--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.angelfire.com/ms3/bchambless0/welcome_to_clc.html>


pete
 
      11-03-2003
Richard Heathfield wrote:
>
> Tristan Miller wrote:
>
> > Does the C standard use the term "string" to refer to both nul-
> > and non-nul-terminated strings, or does it make a nomenclatural
> > distinction between "string"
> > (which always includes the sentinel) and the
> > more general "array-of-char"?

>
> The C Standard defines a string to be an array
> of characters terminated by the first null character.


The word "array" is conspicuously absent from the
standard definition of string.
If a program can determine whether or not two separately
allocated objects are contiguous,
then a string may span them if they are contiguous.
Also, an array may contain several strings.

> If you want the exact wording, I'm sure someone
> can oblige.


Here's the 411 from the C89 last public draft:
4. LIBRARY
4.1 INTRODUCTION
4.1.1 Definitions of terms

A string is a contiguous sequence of characters terminated by and
including the first null character. It is represented by a pointer to
its initial (lowest addressed) character and its length is the number
of characters preceding the null character.
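A minimal sketch of that definition, under assumptions: the helpers `count_strings` and `next_string` are hypothetical and do not appear in the thread. It shows one array holding several strings, each identified by a pointer to its first character, with `strlen` giving the number of characters before the null.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the strings stored back to back in the
   first n bytes of buf. Each string ends at, and includes, its first
   null character, so counting nulls counts strings. Bytes after the
   last null do not form a string inside the array and are ignored. */
size_t count_strings(const char *buf, size_t n)
{
    assert(buf != NULL);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == '\0')
            count++;
    }
    return count;
}

/* Hypothetical helper: returns a pointer to the string that follows s
   in the same array. The caller must know that another string really
   is stored there. */
const char *next_string(const char *s)
{
    assert(s != NULL);
    return s + strlen(s) + 1;  /* skip the characters and the null */
}
```

For `char buf[8] = "abc\0def";`, `count_strings(buf, sizeof buf)` is 2, `strlen(buf)` is 3, and `next_string(buf)` points at the string "def".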

--
pete
 
Thomas Stegen
 
      11-03-2003
pete wrote:

> Richard Heathfield wrote:


>>The C Standard defines a string to be an array
>>of characters terminated by the first null character.

>
>
> The word "array" is conspicuously absent from the
> standard definition of string.
> If a program can determine whether or not two separately
> allocated objects are contiguous,
> then a string may span them if they are contiguous.


It would be a useless string though, since neither pointer arithmetic
nor the array index operator is defined for accessing these things.

Let's assume that the following two arrays are contiguous in memory.
(s2 follows s1 directly).

char s1[3] = "123";
char s2[4] = "456";

You cannot do this:

int i = 0;
while (s1[i] != '\0')
    putchar(s1[i++]);

Since it will cause UB. The same argument applies to pointers (since the
above really operates on pointers anyway).

I am not so sure if the same goes for arguments to library functions.
But the whole string will be accessed by the same pointer so I would
say you cannot do the above. The avoidance of the word array in the
standard might be to allow string literals to be strings proper.

So s1 and s2 constitute a single string, but you will have to treat the
arrays separately and cannot draw any benefit from their being contiguous.
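
Treating the arrays separately could look like the following. This is only a sketch under assumptions: the helper `print_parts` is hypothetical and not part of the thread. It writes exactly `n` characters of the unterminated array with `fwrite`, then the terminated one with `fputs`, so no access ever leaves either object.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the n characters of an unterminated
   char array (head) followed by the string tail to out, without
   reading past the end of either object. Returns the number of
   characters written, or -1 on an output error. */
int print_parts(FILE *out, const char *head, size_t n, const char *tail)
{
    assert(out != NULL && head != NULL && tail != NULL);
    if (fwrite(head, 1, n, out) != n)
        return -1;
    if (fputs(tail, out) == EOF)
        return -1;
    return (int)(n + strlen(tail));
}
```

With the s1 and s2 declared above, `print_parts(stdout, s1, sizeof s1, s2)` writes 123456 and returns 6.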

--
Thomas.

 
pete
 
      11-04-2003
Thomas Stegen wrote:
>
> pete wrote:
>
> > Richard Heathfield wrote:

>
> >>The C Standard defines a string to be an array
> >>of characters terminated by the first null character.

> >
> >
> > The word "array" is conspicuously absent from the
> > standard definition of string.
> > If a program can determine whether or not two separately
> > allocated objects are contiguous,
> > then a string may span them if they are contiguous.

>
> It would be a useless string


It's just some C trivia.

> though since neither pointer arithmetic
> or the array index operator is defined for accessing these things.
>
> Let's assume that the following two arrays are contiguous in memory.
> (s2 follows s1 directly).
>
> char s1[3] = "123";
> char s2[4] = "456";
>
> You cannot do this:
>
> int i = 0;
> while(s1[i] != '\0')
> putchar(s1[i++]);
>
> Since it will cause UB. Same arguments applies to pointers (since the
> above really operates on pointers anyway).
> I am not so sure if the same goes for arguments to library functions.


If the arrays are shown to be contiguous,
then you can have
puts(s1);

--
pete
 
Richard Bos
 
      11-04-2003
pete <(E-Mail Removed)> wrote:

> Thomas Stegen wrote:
>
> > char s1[3] = "123";
> > char s2[4] = "456";
> >
> > You cannot do this:
> >
> > int i = 0;
> > while(s1[i] != '\0')
> > putchar(s1[i++]);
> >
> > Since it will cause UB.

>
> If the arrays are shown to be contiguous,
> then you can have
> puts(s1);


No, you can't. Any reference to s1[3] and above, including those
implicit in puts(s1), invokes undefined behaviour. Although it is true
that on many architectures this instance of UB will behave as if it is
defined, you cannot rely on that.
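
If the six characters are really needed as one string, a portable alternative is to copy both parts into a single array, so that every access stays inside one object. A minimal sketch, assuming a hypothetical helper `join_parts` that is not from the thread:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: builds one genuine string in dst from the n
   characters of an unterminated array (head) and the string tail.
   Returns dst, or NULL if dst (dstsize bytes) is too small to hold
   both parts plus the terminating null. */
char *join_parts(char *dst, size_t dstsize,
                 const char *head, size_t n, const char *tail)
{
    assert(dst != NULL && head != NULL && tail != NULL);
    size_t tail_len = strlen(tail);
    /* Need n + tail_len + 1 <= dstsize, tested without overflow. */
    if (n >= dstsize || tail_len >= dstsize - n)
        return NULL;
    memcpy(dst, head, n);
    memcpy(dst + n, tail, tail_len + 1);  /* +1 copies the null */
    return dst;
}
```

With a 7-byte destination and the s1 and s2 from the example, the result is the string "123456", which can be passed to puts with fully defined behaviour.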

Richard
 
pete
 
      11-04-2003
Richard Bos wrote:
>
> pete <(E-Mail Removed)> wrote:
>
> > Thomas Stegen wrote:
> >
> > > char s1[3] = "123";
> > > char s2[4] = "456";
> > >
> > > You cannot do this:
> > >
> > > int i = 0;
> > > while(s1[i] != '\0')
> > > putchar(s1[i++]);
> > >
> > > Since it will cause UB.

> >
> > If the arrays are shown to be contiguous,
> > then you can have
> > puts(s1);

>
> No, you can't.


> Any reference to s1[3] and above, including those
> implicit in puts(s1), invoke undefined behaviour.


That doesn't matter.
If you give puts a pointer to a string,
then the behavior is defined.
How puts accomplishes the behavior is up to the implementors.
If two objects are being spanned by a string,
and if puts doesn't want to index across them,
then puts may deal with the objects separately,
or do it some other way.

--
pete
 
Richard Bos
 
      11-04-2003
pete <(E-Mail Removed)> wrote:

> Richard Bos wrote:
> >
> > pete <(E-Mail Removed)> wrote:
> >
> > > Thomas Stegen wrote:
> > >
> > > > char s1[3] = "123";
> > > > char s2[4] = "456";


> > > If the arrays are shown to be contiguous,
> > > then you can have
> > > puts(s1);


> > Any reference to s1[3] and above, including those
> > implicit in puts(s1), invoke undefined behaviour.

>
> That doesn't matter.


Yes, it does.

> If you give puts a pointer to a string,
> then the behavior is defined.


s1 is not a string. It may happen to look like one on your favourite
architecture, but that doesn't make it one.

Richard
 
pete
 
      11-04-2003
Richard Bos wrote:
>
> pete <(E-Mail Removed)> wrote:
>
> > Richard Bos wrote:
> > >
> > > pete <(E-Mail Removed)> wrote:
> > >
> > > > Thomas Stegen wrote:
> > > >
> > > > > char s1[3] = "123";
> > > > > char s2[4] = "456";

>
> > > > If the arrays are shown to be contiguous,
> > > > then you can have
> > > > puts(s1);

>
> > > Any reference to s1[3] and above, including those
> > > implicit in puts(s1), invoke undefined behaviour.

> >
> > That doesn't matter.

>
> Yes, it does.
>
> > If you give puts a pointer to a string,
> > then the behavior is defined.

>
> s1 is not a string. It may happen to look like one on your favourite
> architecture, but that doesn't make it one.


I believe we are only talking about cases where s1 and s3
are shown to be contiguous.
In that case, s1 satisfies the definition of "pointer to a string".

N869

7.1.1 Definitions of terms

[#1] A string is a contiguous sequence of characters
terminated by and including the first null character. The
term multibyte string is sometimes used instead to emphasize
special processing given to multibyte characters contained
in the string or to avoid confusion with a wide string. A
pointer to a string is a pointer to its initial (lowest
addressed) character.

--
pete
 
pete
 
      11-04-2003
pete wrote:
>
> Richard Bos wrote:
> >
> > pete <(E-Mail Removed)> wrote:
> >
> > > Richard Bos wrote:
> > > >
> > > > pete <(E-Mail Removed)> wrote:
> > > >
> > > > > Thomas Stegen wrote:
> > > > >
> > > > > > char s1[3] = "123";
> > > > > > char s2[4] = "456";

> >
> > > > > If the arrays are shown to be contiguous,
> > > > > then you can have
> > > > > puts(s1);

> >
> > > > Any reference to s1[3] and above, including those
> > > > implicit in puts(s1), invoke undefined behaviour.
> > >
> > > That doesn't matter.

> >
> > Yes, it does.
> >
> > > If you give puts a pointer to a string,
> > > then the behavior is defined.

> >
> > s1 is not a string. It may happen to look like one on your favourite
> > architecture, but that doesn't make it one.

>
> I believe we are only talking about cases where


> s1 and s3


That should be "s1 and s2".

> are shown to be contiguous.
> In that case s1, satisfies the definition for "pointer to a string"
>
> N869
>
> 7.1.1 Definitions of terms
>
> [#1] A string is a contiguous sequence of characters
> terminated by and including the first null character. The
> term multibyte string is sometimes used instead to emphasize
> special processing given to multibyte characters contained
> in the string or to avoid confusion with a wide string. A
> pointer to a string is a pointer to its initial (lowest
> addressed) character.


--
pete
 