Substrings and so on

 
 
Ersek, Laszlo
 
      01-26-2010
In article <(E-Mail Removed)>, Vicent <(E-Mail Removed)> writes:

> (5) Translate the second part (it is still a "string") into a number.
>
> - About #5 : I hope that a proper casting statement will be enough.


Please read an introductory book or tutorial on C, preferably one not
contradicting the ISO C standard(s). I hope others will name such works.
Reddit had a similar discussion recently. I obviously can't vouch for
the pieces of advice given there.

http://www.reddit.com/r/programming/...commend_for_a/


> So, do you think that C++ std::string and std:iostream classes are
> the right choice for me??


I don't know. For the stated purpose, in (not standard) C I'd likely use
fgets() with a 32,767 byte buffer, then call regexec() in order to
identify the trimmed parts via parenthesized subexpressions, then call
strtol() to convert the decimal sequence to a long int.
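
A minimal sketch of that regexec() plus strtol() approach, for illustration only. The helper name parse_assignment(), the pattern, and the restriction of the first part to letters, digits and underscores are assumptions, not something stated in the thread. <regex.h> is POSIX, not ISO C.

```c
#include <errno.h>
#include <regex.h>   /* POSIX, not ISO C */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse "name = 123" from line into name/value.
   Returns 0 on success, -1 on any mismatch or out-of-range number. */
int parse_assignment(const char *line, char *name, size_t name_size, long *value)
{
    /* Group 1: the name; group 2: an optionally signed decimal number. */
    static const char pattern[] =
        "^[[:space:]]*([[:alnum:]_]+)[[:space:]]*="
        "[[:space:]]*([-+]?[[:digit:]]+)[[:space:]]*$";
    regex_t re;
    regmatch_t m[3];
    int rc = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, line, 3, m, 0) == 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len < name_size) {
            char *end;
            memcpy(name, line + m[1].rm_so, len);
            name[len] = '\0';
            errno = 0;
            *value = strtol(line + m[2].rm_so, &end, 10);
            /* The regex already guaranteed digits; only the range can fail. */
            if (errno != ERANGE)
                rc = 0;
        }
    }
    regfree(&re);
    return rc;
}
```

Each line returned by fgets() would be passed to this function as-is; the pattern tolerates the trailing newline. A real program would compile the pattern once rather than on every call.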

Cheers,
lacos
 
 
 
 
 
Ersek, Laszlo
 
      01-26-2010
In article <20100126191040.35310081@kubuntu>, Lorenzo Villari <(E-Mail Removed)> writes:
> On 26 Jan 2010 19:04:22 +0100
> (E-Mail Removed) (Ersek, Laszlo) wrote:
>
>>
>> Please elaborate.
>>
>> Thanks,
>> lacos

>
> C is not perfect I know that, but saying "C: don't start with it" in a
> newsgroup with this name, it sounds a bit strange to me. That's all.


Thanks for answering.

I love C (even though most of the time this love is unrequited). I
didn't intend to point out C's perceived "shortcomings" -- I hope not to
have an ego that big. I tried to signal that C (and especially
manipulation of character arrays for parsing purposes) might not be the
best choice for the *original poster*, following completely from what I
perceived to be the OP's understanding of C.

Someone advising against me operating a sawbench would be completely
justified. A sawbench is a wonderful tool. It's not the sawbench, it's
me. I should start with introductory woodworking lessons first.

(Yes, I just compared C to a sawbench, please forgive me. And for the
record, I can "operate" a hand saw.)

Cheers,
lacos
 
 
 
 
 
santosh
 
      01-26-2010
Vicent wrote:
[...]

> What I exactly need to do is the following:
>
> While there are still new lines:
> (1) Get one line from a given text file.
> (2) In that line, detect a "first" part and a "second part", which are
> separated by a "=" symbol.
> (3) Take away the possible "blanks" (like a "trim" function would do)
> from those parts.
> (4) Detect which variable in my program is being referred by the
> "first part".
> (5) Translate the second part (it is still a "string") into a number.
>
> - About #1 : It can be done by means of standard I/O C libraries. I
> guess that there are also ways to do it with C++ libraries.


Yes. For C, fgets() is the obvious choice, but if you want to read in
lines of arbitrary length, then you might have to write your own
function which uses dynamically allocated memory.
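
Such a function could be sketched as follows. The name read_line(), the starting capacity of 128 bytes and the doubling strategy are assumptions made for this example; it is one possible approach, not a reference implementation.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read one line of any length from fp.
   Returns a malloc()ed string without the trailing '\n', or NULL on
   end-of-file, read error or allocation failure. The caller frees it. */
char *read_line(FILE *fp)
{
    size_t cap = 128, len = 0;
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;

    while (fgets(buf + len, (int)(cap - len), fp) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n') {
            buf[len - 1] = '\0';       /* complete line: strip newline */
            return buf;
        }
        if (cap - len < 2) {           /* buffer full: double it */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (len > 0 && !ferror(fp))
        return buf;                    /* last line had no newline */
    free(buf);
    return NULL;
}
```

The sketch does not guard against lines longer than INT_MAX bytes or against overflow of the capacity; production code would need both checks.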

> - About #2 : It would be as simple as: detecting the position of "="
> and then get two substrings. I don't understand why this step is so
> difficult to perform in C!!!! I mean: there IS a C standard function
> for getting the position of a character (it is "strchr"), but not a
> function for substring (unless it is a substring that starts at
> position 1, which can be done with "strncpy_s"). Is it easier at C++??


Your point #2 is not clear. Do you simply need to locate the first
occurrence of a '=' character? For that purpose strchr() would be fine.

[...]

> - About #5 : I hope that a proper casting statement will be enough.


At least for C, no. Casting is not appropriate. Depending on what type
of number the "string" represents (i.e., integer or real), you'll want
to use one of the strto*() family of functions, like strtol(), strtoul()
and strtod(), to name three.

Here's a good online reference to Standard C library functions (among
others):

<http://www.dinkumware.com/manuals/>

[...]
 
 
santosh
 
      01-26-2010
Ersek, Laszlo wrote:
> In article <(E-Mail Removed)>, Vicent <(E-Mail Removed)> writes:
>
> > (5) Translate the second part (it is still a "string") into a number.
> >
> > - About #5 : I hope that a proper casting statement will be enough.

>
> Please read an introductory book or tutorial on C, preferably one not
> contradicting the ISO C standard(s). I hope others will name such works.

[...]

One online tutorial for complete beginners might be the one by Steve
Summit:

<http://www.eskimo.com/~scs/cclass/cclass.html>

Since Mr. Summit was apparently involved in the standardisation
process of C90, one might trust his tutorial not to contradict
Standard C.

[...]
 
 
Robert Latest
 
      01-26-2010
Vicent wrote:
> On 26 ene, 15:20, (E-Mail Removed) (Ersek, Laszlo) wrote:
>> For C: don't start with it. Low-level string manipulation is one of the
>> most error-prone tasks in general, leading to countless security
>> vulnerabilities.


Depends. It can be fun and educative. See below.

>
>
> What I exactly need to do is the following:
>
> While there are still new lines:
> (1) Get one line from a given text file.


Use fgets() in a while loop.


> (2) In that line, detect a "first" part and a "second part", which are
> separated by a "=" symbol.
> (3) Take away the possible "blanks" (like a "trim" function would do)
> from those parts.


That's the fun part. You need a few simple loops to do this. I used to
do a lot of those string-walking exercises, so I just typed this into
the newsreader untested. It really helps you develop a sense of what
goes on behind the curtain.

char *p, *q, *first_part, *second_part;               /* buffer holds one line */

for (p = buffer; isspace((unsigned char)*p); ++p) ;   /* skip initial WS */
first_part = p;                                       /* save pointer */
for (; *p && *p != '='; ++p) ;                        /* find '=' */
for (q = p+1; isspace((unsigned char)*q); ++q) ;      /* skip more WS */
second_part = q;                                      /* save pointer */
for (--p; isspace((unsigned char)*p); --p) ;          /* skip trailing WS */
*(p+1) = 0;                                           /* mark end of 1st */
for (p = second_part; *p; ++p) ;                      /* find \0 char */
for (--p; isspace((unsigned char)*p); --p) ;          /* skip trailing WS */
*(p+1) = 0;                                           /* mark end of 2nd */

Now first_part and second_part should be nicely trimmed, NUL-terminated
C strings. This thing will probably segfault when fed invalid strings,
so some input validity checks are in order. This method can be driven to
the extreme; the nice thing is that everything happens in a single chunk
of memory ('buffer') which gets pointed into and peppered with zeroes.
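
The input validity checks mentioned above might look like the following sketch, which keeps the same in-place idea but refuses lines without a '=' or with an empty side. The function names trim() and split_assignment() are made up for this example, and strchr() replaces the hand-written search loop.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: trim leading and trailing whitespace in place and
   return a pointer to the first non-blank character. */
static char *trim(char *s)
{
    char *end;

    while (isspace((unsigned char)*s))
        ++s;
    end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        --end;
    *end = '\0';
    return s;
}

/* Split "key = value" in place. Returns 0 on success, -1 if there is no
   '=' or if either side is empty after trimming. */
int split_assignment(char *buffer, char **first_part, char **second_part)
{
    char *eq = strchr(buffer, '=');

    if (eq == NULL)
        return -1;
    *eq = '\0';                    /* cut the buffer at the '=' */
    *first_part = trim(buffer);
    *second_part = trim(eq + 1);
    return (**first_part && **second_part) ? 0 : -1;
}
```

The casts to unsigned char matter because passing a negative char value to isspace() is undefined behavior.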

If your first and second part can't contain whitespace, it boils down to
a sscanf() one-liner:

#include <stdio.h>

int main(void)
{
    char *str = "abcd = 100 "; /* test string */
    char first[20];
    int second;
    int r;

    r = sscanf(str, " %[^ =] = %d", first, &second);
    if (r == 2) {
        printf("%s=%d\n", first, second);
    } else {
        fprintf(stderr, "Couldn't parse string (r=%d)\n", r);
    }
    return 0;
}

It would be wise to check for the position of the '=' sign first, or to
give the %[ conversion a maximum field width (%19[^ =] for this 20-byte
buffer), to make sure that the buffer 'first' doesn't overflow.

> (4) Detect which variable in my program is being referred by the
> "first part".


A bsearch()-based solution comes to mind
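
A rough sketch of such a bsearch() lookup follows. The variable names height, speed and width, the struct setting type and find_setting() are all invented for illustration; the table must be kept sorted by name for bsearch() to work.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical program variables that a config file may set. */
static long height, speed, width;

struct setting {
    const char *name;
    long *target;
};

/* Must stay sorted by name, because bsearch() requires a sorted array. */
static const struct setting settings[] = {
    { "height", &height },
    { "speed",  &speed  },
    { "width",  &width  },
};

/* bsearch() passes the key first and an array element second. */
static int compare_setting(const void *key, const void *elem)
{
    return strcmp(key, ((const struct setting *)elem)->name);
}

/* Returns the variable registered under name, or NULL if unknown. */
long *find_setting(const char *name)
{
    const struct setting *s = bsearch(name, settings,
                                      sizeof settings / sizeof settings[0],
                                      sizeof settings[0], compare_setting);
    return s ? s->target : NULL;
}
```

For a handful of names a linear scan with strcmp() would do just as well; the table-driven layout is the part that pays off.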

> (5) Translate the second part (it is still a "string") into a number.


strtol(), or automatically done by sscanf()

> - About #2 : It would be as simple as: detecting the position of "="
> and then get two substrings. I don't understand why this step is so
> difficult to perform in C!!!!


> I mean: there IS a C standard function
> for getting the position of a character (it is "strchr"), but not a
> function for substring


strtok() can also be your friend. For index-based substrings, use
strdup() and pointer arithmetic. All one-liners.
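
An index-based substring can be sketched in portable C as below. Note that strdup() comes from POSIX and was not part of ISO C when this thread was written, so this example uses malloc() and memcpy() instead. The name substring() and its clamping behavior are choices made for this sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy len characters of s, starting at index start,
   into a new malloc()ed string. Returns NULL if start is past the end of s
   or if allocation fails. The caller frees the result. */
char *substring(const char *s, size_t start, size_t len)
{
    size_t total = strlen(s);
    char *out;

    if (start > total)
        return NULL;
    if (len > total - start)
        len = total - start;       /* clamp to the end of the string */
    out = malloc(len + 1);
    if (out != NULL) {
        memcpy(out, s + start, len);
        out[len] = '\0';
    }
    return out;
}
```

With the position of '=' obtained from strchr(), the two parts of a line are substring(line, 0, eq - line) and substring(line, eq - line + 1, strlen(line)).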

> So, do you think that C++ std::string and std:iostream classes are
> the right choice for me??


It really depends on what the rest of your application does. If breaking
up a string into two parts overwhelms you complexity-wise, it probably
does very little.

That said, I nowadays greatly prefer Python over C for many things,
although I enjoy coding in C more. Especially when dealing with
undefined input, the necessary overhead of error-checking and -handling
in C (and C++) can be bothersome.

robert
 
 
Stefan Ram
 
      01-26-2010
Vicent <(E-Mail Removed)> writes:
>About reading data from a text file, I think this is called "parsing".


I am just teaching about binary trees in C. So I started with:

struct tree { struct tree * left; int value; struct tree * right; };

To print a tree:

void print( struct tree const * const tree )
{ if( tree ){ putchar( '(' ); print( tree->left );
putchar( '0' + tree->value ); print( tree->right ); putchar( ')' ); }}

(The code is simplified insofar as it assumes one-digit
numbers only.)

An example output is:

(((0)1(2))3(4))

For the tree

      3
     / \
    /   \
   1     4
  / \
 /   \
0     2

Now, how do we parse this in again?

Two steps:

1.) Write a grammar:

<number> ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'.

<entry> ::= '(' <tree> <number> <tree> ')'.

<tree> ::= [<entry>].

2.) Write the parser in analogy with the grammar:

int number( void ){ return get( 1 )- '0'; }

struct tree * entry( void )
{ TREE left, right; int value;
get( '(' );
left = tree();
value = number();
right = tree();
get( ')' );
return newtree( left, value, right ); }

struct tree * tree( void )
{ return get( 0 )== '(' ? entry() : 0; }

This assumes a »get« function that will return the current
character from the source and advance to the next character
when called with any non-zero argument. The code is
simplified insofar as it does not handle any run-time errors.

Thus, we are now able to round-trip serialize (write) and
de-serialize (read) binary trees with essentially 14 lines
of C code.
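
For readers who want to try the round trip, here is one way to fill in the pieces the post leaves open. The get() and newtree() bodies, the cursor variable, and the parse() and serialize() wrappers are assumptions made for this sketch; only number(), entry() and tree() follow the post. Like the original, it does no error handling on malformed input.

```c
#include <stdio.h>
#include <stdlib.h>

struct tree { struct tree *left; int value; struct tree *right; };

/* Assumed input source: a NUL-terminated string walked by a cursor. */
static const char *cursor = "";

/* Return the current character; advance if the argument is non-zero. */
static int get(int advance)
{
    int c = (unsigned char)*cursor;
    if (advance && c != '\0')
        ++cursor;
    return c;
}

/* Assumed allocator; exits on allocation failure to keep the sketch short. */
static struct tree *newtree(struct tree *left, int value, struct tree *right)
{
    struct tree *t = malloc(sizeof *t);
    if (t == NULL)
        exit(EXIT_FAILURE);
    t->left = left; t->value = value; t->right = right;
    return t;
}

static struct tree *tree(void);   /* entry() and tree() call each other */

static int number(void) { return get(1) - '0'; }

static struct tree *entry(void)
{
    struct tree *left, *right; int value;
    get('(');
    left = tree();
    value = number();
    right = tree();
    get(')');
    return newtree(left, value, right);
}

static struct tree *tree(void)
{
    return get(0) == '(' ? entry() : 0;
}

/* Parse a whole serialized tree from text. */
struct tree *parse(const char *text)
{
    cursor = text;
    return tree();
}

/* Write the tree into out, which must be large enough; returns the end. */
char *serialize(const struct tree *t, char *out)
{
    if (t) {
        *out++ = '(';
        out = serialize(t->left, out);
        *out++ = (char)('0' + t->value);
        out = serialize(t->right, out);
        *out++ = ')';
    }
    *out = '\0';
    return out;
}
```

serialize() writes to a caller-supplied buffer instead of calling putchar() so that the result can be compared with the input; that is a testing convenience, not part of the original design.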

.------------------------------------------------------.
| Now, observe that during the whole serialization and |
| de-serialization we create and process strings of    |
| symbols, but never actually build a 0-terminated     |
| C-string in memory!                                  |
'------------------------------------------------------'

I thought it would be nice if a tree in the source code
also would look like a tree. So the above tree

      3
     / \
    /   \
   1     4
  / \
 /   \
0     2

is being defined using:

extern struct tree t1, t0, t2, t4; struct tree t3 =
                      { &t1, 3, &t4 },

          t1 ={ &t0, 1, &t2 },        t4 ={ 0, 4, 0 },


t0 ={ 0, 0, 0 },       t2 ={ 0, 2, 0 };


 
 
Stefan Ram
 
      01-26-2010
(E-Mail Removed)-berlin.de (Stefan Ram) writes:
>{ TREE left, right; int value;


Oops, this should read:

{ struct tree *left, *right; int value;

 
 
Ersek, Laszlo
 
      01-26-2010
(Not to contradict, but to complement.)

In article <(E-Mail Removed)-berlin.de>, Robert Latest <(E-Mail Removed)> writes:

> If your first and second part can't contain whitespace, it boils down to
> a sscanf() one-liner:
>
> #include <stdio.h>
> int main(void)
> {
> char *str = "abcd = 100 "; /* test string */
> char first[20];
> int second;
> int r;
>
> r = sscanf(str, " %[^ =] = %d", first, &second);
> if (r == 2) {
> printf("%s=%d\n", first, second);
> } else {
> fprintf(stderr, "Couldn't parse string (r=%d)\n", r);
> }
> return 0;
> }
>
> It would be wise to check for the position of the '=' sign first to make
> sure that the buffer 'first' doesn't overflow.


[...]

>> (5) Translate the second part (it is still a "string") into a number.

>
> strtol(), or automatically done by sscanf()


abcd=99999999999999999999999999999999999999999999999999999999

%d -> implementation-defined behavior ("signed overflow")

%u -> silent truncation ("unsigned overflow")

%*d -> assignment suppressed, not applicable here

%9ld -> file position indicator will advance until after the ninth nine
(I think), the stored long int value (999,999,999) won't reflect the
actual decimal string, a matching failure will follow only in the next
cycle. Full range of long int not available to decimal strings.
Magnitude of smallest negative value is about one tenth of the greatest
positive value.

strtol() is better.
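
A short sketch of why: strtol() reports both the end of the parsed digits and range errors, so every failure case can be told apart. The wrapper name parse_long() is invented for this example, and it assumes the string has already been trimmed.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: convert all of s to a long in base 10.
   Returns 0 on success; -1 if s holds no digits, has trailing
   characters, or names a value outside the range of long. */
int parse_long(const char *s, long *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s)
        return -1;                 /* no digits at all */
    if (*end != '\0')
        return -1;                 /* trailing characters */
    if (errno == ERANGE)
        return -1;                 /* clamped to LONG_MIN or LONG_MAX */
    *out = v;
    return 0;
}
```

Fed the string of nines above, strtol() returns LONG_MAX and sets errno to ERANGE, so the wrapper rejects the input instead of storing a wrong value.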

When writing my previous post in the thread, I've tried to create a
scanf() format string that (a) relies only on completely defined
behavior, (b) is correct: parses what the OP needs (pre-set limits on
the lengths of the trimmed parts are allowed), (c) is complete: refuses
anything else. I gave up after a while and decided to wait for other
submissions and try to break them, or if I can't, learn from them.

Cheers,
lacos
 
 
Ersek, Laszlo
 
      01-26-2010
In article <9b0HOlwZ28YG@ludens>, (E-Mail Removed) (Ersek, Laszlo) writes:

> abcd=99999999999999999999999999999999999999999999999999999999
>
> %d -> implementation-defined behavior ("signed overflow")


I apologize, that would hold for a conversion from e.g. unsigned int; the
fscanf() spec says (C99 7.19.6.2 The fscanf function, p10):

----v----
Unless assignment suppression was indicated by a *, the result of the
conversion is placed in the object pointed to by the first argument
following the format argument that has not already received a conversion
result. If this object does not have an appropriate type, or if the
result of the conversion cannot be represented in the object, the
behavior is undefined.
----^----

See also

http://groups.google.com/group/comp....0a797a716cf74a

Cheers,
lacos
 
 
Ersek, Laszlo
 
      01-26-2010
In article <(E-Mail Removed)>, santosh <(E-Mail Removed)> writes:

> One online tutorial for complete beginners might be the one by Steve
> Summit:
>
> <http://www.eskimo.com/~scs/cclass/cclass.html>
>
> Since Mr. Summit was apparently involved in the standardisation
> process of C90, one might trust his tutorial not to contradict
> Standard C.


Bookmarked, thank you!
lacos
 