Finding and Replacing Substrings In A String

 
 
DarthBob88
      09-23-2007
I have to go through a file and replace any occurrences of a given
string with the desired string, like replacing "bug" with "feature".
This is made more complicated by the fact that I have to do this with
a lot of replacements and by the fact that some of the target strings
are two words or more long, so I can't just break up the file at
whitespace, commas, and periods. How's the best way to do this? I've
thought about using strstr() to find the string and strncpy() to
replace it, but it occurs to me that it would screw up the string to
overwrite part of it with strncpy(). How should I do this?

 
Malcolm McLean
      09-23-2007

"DarthBob88" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed) ups.com...
>I have to go through a file and replace any occurrences of a given
> string with the desired string, like replacing "bug" with "feature".
> This is made more complicated by the fact that I have to do this with
> a lot of replacements and by the fact that some of the target strings
> are two words or more long, so I can't just break up the file at
> whitespace, commas, and periods. How's the best way to do this? I've
> thought about using strstr() to find the string and strncpy() to
> replace it, but it occurs to me that it would screw up the string to
> overwrite part of it with strncpy(). How should I do this?
>

You'll make life a lot easier for yourself if you can specify that the
search string cannot contain newlines.

Load each line. Call strstr() repeatedly to count the number of occurrences
of each target string. Then calculate how much extra memory is required.

(You need to think what happens if one search string is a substring of
another, or contains an overlap)

Allocate another buffer of the right length, not forgetting the terminal
nul. Then do a search and replace. Probably the easiest way to do this is to
have two buffers, search one and replace into the other, iteratively until
you have done all the targets.
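
A minimal sketch of the count-then-allocate approach described above, for a single target string. The name replace_all is hypothetical, and the sketch assumes matches are non-overlapping and taken left to right; it does not settle what should happen when one search string is a substring of another.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd copy of src with every
 * non-overlapping occurrence of needle replaced by repl, scanning
 * left to right. Returns NULL if allocation fails. Caller frees. */
char *replace_all(const char *src, const char *needle, const char *repl)
{
    size_t n_len = strlen(needle);
    size_t r_len = strlen(repl);
    size_t count = 0;
    const char *p;

    assert(n_len > 0);  /* an empty needle would match everywhere */

    /* Pass 1: count occurrences so the output size is known up front. */
    for (p = src; (p = strstr(p, needle)) != NULL; p += n_len)
        count++;

    /* Each match removes n_len bytes and adds r_len; +1 for the nul. */
    size_t out_len = strlen(src) - count * n_len + count * r_len;
    char *out = malloc(out_len + 1);
    if (out == NULL)
        return NULL;

    /* Pass 2: copy unmatched text and replacements into the new buffer. */
    char *dst = out;
    const char *hit;
    for (p = src; (hit = strstr(p, needle)) != NULL; p = hit + n_len) {
        size_t gap = (size_t)(hit - p);
        memcpy(dst, p, gap);
        dst += gap;
        memcpy(dst, repl, r_len);
        dst += r_len;
    }
    strcpy(dst, p);  /* tail after the last match, including the nul */
    return out;
}
```

For several targets, the result of one call can be fed to the next (freeing the intermediate buffers), though text inserted by an earlier replacement may then be matched by a later one.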

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

 
Richard Heathfield
      09-23-2007
Malcolm McLean said:

<snip>

> You'll make life a lot easier for yourself if you can specify that the
> search string cannot contain newlines.


This is not in fact necessary. If you're prepared to shift stuff around in
memory a fair bit, all you need is a source buffer twice the size of the
needle. Search for the needle; if you find it, copy everything up to but
not including it to a temporary file, write the replacement needle to the
file, and then move all the subsequent contents of the buffer (i.e. the
stuff following the needle) to its beginning, and replenish it from the
input file. (Newlines are merely more grist to the mill.)

If you *don't* find it, write the first half of the buffer to the temporary
file, and then shift the second half into the first half and replenish
from the input.

When the input is exhausted and you're sure the buffer contains no needles,
write the remainder to the temporary file. Then remove and rename in the
canonical fashion.
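
One way the sliding-buffer scheme above might look, as a sketch rather than a finished tool. The names stream_replace and find_bytes are hypothetical. It assumes a single needle, and it searches with memcmp() instead of strstr() because a buffer filled by fread() is not nul-terminated.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Byte-wise search over a buffer that is not nul-terminated. */
static char *find_bytes(char *hay, size_t hay_len,
                        const char *needle, size_t n_len)
{
    for (size_t i = 0; i + n_len <= hay_len; i++)
        if (memcmp(hay + i, needle, n_len) == 0)
            return hay + i;
    return NULL;
}

/* Hypothetical helper: copies in to out, replacing needle with repl.
 * Returns 0 on success, -1 on allocation or write failure. */
int stream_replace(FILE *in, FILE *out, const char *needle, const char *repl)
{
    size_t n_len = strlen(needle), r_len = strlen(repl);
    size_t cap = 2 * n_len;          /* buffer twice the needle size */
    assert(n_len > 0);

    char *buf = malloc(cap);
    if (buf == NULL)
        return -1;
    size_t have = fread(buf, 1, cap, in);
    int rc = 0;

    for (;;) {
        char *hit = find_bytes(buf, have, needle, n_len);
        size_t used;                 /* bytes consumed from the buffer */
        if (hit != NULL) {
            used = (size_t)(hit - buf);
            if (fwrite(buf, 1, used, out) != used ||
                fwrite(repl, 1, r_len, out) != r_len) { rc = -1; break; }
            used += n_len;           /* also drop the matched needle */
        } else if (have < cap) {
            /* A short buffer means the input is exhausted: flush, stop. */
            if (fwrite(buf, 1, have, out) != have)
                rc = -1;
            break;
        } else {
            used = n_len;            /* no match starts in the first half */
            if (fwrite(buf, 1, used, out) != used) { rc = -1; break; }
        }
        /* Slide the unconsumed tail to the front and replenish. */
        have -= used;
        memmove(buf, buf + used, have);
        have += fread(buf + have, 1, cap - have, in);
    }
    free(buf);
    return rc;
}
```

After a successful run, the temporary file would typically take the place of the original via remove() and rename() from <stdio.h>.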

Depending on just how much data you've got, it might be worth investigating
the Boyer-Moore string searching algorithm, since native strstr
implementations can be a bit dumb.
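
For a flavour of the idea, here is a sketch of Boyer-Moore-Horspool, a simplified variant that keeps only the bad-character shift table (it is not the full Boyer-Moore algorithm). The name horspool_find is hypothetical, and whether it beats a given strstr() is something to measure rather than assume.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Boyer-Moore-Horspool search (bad-character rule only).
 * Returns a pointer to the first match in hay, or NULL. */
const char *horspool_find(const char *hay, size_t hay_len,
                          const char *needle, size_t n_len)
{
    size_t shift[UCHAR_MAX + 1];

    assert(n_len > 0);
    if (n_len > hay_len)
        return NULL;

    /* Default: a byte not in the needle lets us skip a whole needle. */
    for (size_t c = 0; c <= UCHAR_MAX; c++)
        shift[c] = n_len;
    /* Bytes in the needle (except its last) allow a shorter skip. */
    for (size_t i = 0; i + 1 < n_len; i++)
        shift[(unsigned char)needle[i]] = n_len - 1 - i;

    size_t pos = 0;
    while (pos + n_len <= hay_len) {
        if (memcmp(hay + pos, needle, n_len) == 0)
            return hay + pos;
        /* Shift by the table entry for the window's last byte. */
        pos += shift[(unsigned char)hay[pos + n_len - 1]];
    }
    return NULL;
}
```

The payoff grows with needle length, since a mismatch on a byte that does not occur in the needle skips n_len positions at once.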

<snip>

> (You need to think what happens if one search string is a substring of
> another, or contains an overlap)


Indeed.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
 
Army1987
      09-23-2007
On Sun, 23 Sep 2007 06:22:23 +0000, DarthBob88 wrote:

> I have to go through a file and replace any occurrences of a given
> string with the desired string, like replacing "bug" with "feature".
> This is made more complicated by the fact that I have to do this with
> a lot of replacements and by the fact that some of the target strings
> are two words or more long, so I can't just break up the file at
> whitespace, commas, and periods. How's the best way to do this? I've
> thought about using strstr() to find the string and strncpy() to
> replace it, but it occurs to me that it would screw up the string to
> overwrite part of it with strncpy(). How should I do this?

Try to memmove() the remainder of the string forward, like this:
"This is a bug. \n\0"
"feature" is four characters longer than "bug", so slide the part
of the string starting with the period four characters forward,
then memcpy() "feature" where the 'b' of "bug" was. Probably there
are better ways to do that, try asking in comp.programming.

e.g.
char str[1000] = "This is a bug. \n";
char *search = "bug";
char *replace = "feature";
size_t len = strlen(str);
size_t s_len = strlen(search);
size_t r_len = strlen(replace);
char *current = str;
while (current = strstr(current, search)) /*assignment*/ {
    memmove(current + r_len, current + s_len,
            len - (current - str) - s_len + 1);
    memcpy(current, replace, r_len);
} /* not compiled, not tested. make sure there's enough space past
   * the end of the string in str. */
--
Army1987 (Replace "NOSPAM" with "email")
A hamburger is better than nothing.
Nothing is better than eternal happiness.
Therefore, a hamburger is better than eternal happiness.

 
Willem
      09-23-2007
DarthBob88 wrote:
) I have to go through a file and replace any occurrences of a given
) string with the desired string, like replacing "bug" with "feature".
) This is made more complicated by the fact that I have to do this with
) a lot of replacements and by the fact that some of the target strings
) are two words or more long, so I can't just break up the file at
) whitespace, commas, and periods. How's the best way to do this? I've
) thought about using strstr() to find the string and strncpy() to
) replace it, but it occurs to me that it would screw up the string to
) overwrite part of it with strncpy(). How should I do this?

The Knuth-Morris-Pratt algorithm reads the characters in the searched
string sequentially, one by one. So if you use that algo, you can quite
simply read from the file one char at a time, searching for a match.
Writing to the output should be fairly easy as well, just make sure you
only write characters when they are known to be a mismatch.

You'll have to rely on the system to make it I/O efficient.

After you've got it working, you can always optimize it by dropping in
a platform-specific I/O routine, if needed.
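
A sketch of how that could look with getc() and putc(), leaving buffering to stdio. The name kmp_stream_replace is hypothetical, and the sketch assumes a single needle with non-overlapping matches. Characters that are part of a possible match are held back and only written once the failure table shows they cannot belong to one.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies in to out one character at a time,
 * replacing needle with repl. Returns 0 on success, -1 on failure. */
int kmp_stream_replace(FILE *in, FILE *out,
                       const char *needle, const char *repl)
{
    size_t m = strlen(needle);
    assert(m > 0);

    size_t *fail = malloc(m * sizeof *fail);
    if (fail == NULL)
        return -1;

    /* fail[i]: length of the longest proper prefix of needle[0..i]
     * that is also a suffix of it. */
    fail[0] = 0;
    for (size_t i = 1, j = 0; i < m; i++) {
        while (j > 0 && needle[i] != needle[j])
            j = fail[j - 1];
        if (needle[i] == needle[j])
            j++;
        fail[i] = j;
    }

    size_t k = 0;   /* needle characters matched so far, not yet written */
    int c;
    while ((c = getc(in)) != EOF) {
        while (k > 0 && (unsigned char)needle[k] != c) {
            size_t next = fail[k - 1];
            /* The first k - next held-back characters are now known
             * not to start a match, so they can be written. */
            fwrite(needle, 1, k - next, out);
            k = next;
        }
        if ((unsigned char)needle[k] == c) {
            if (++k == m) {          /* full match: emit the replacement */
                fputs(repl, out);
                k = 0;
            }
        } else {
            putc(c, out);            /* k is 0 here: plain mismatch */
        }
    }
    fwrite(needle, 1, k, out);       /* flush a partial match at EOF */
    free(fail);
    return ferror(out) ? -1 : 0;
}
```

Because the input is never re-read, this also works on streams that cannot be rewound, such as stdin.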


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
Friedrich Dominicus
      09-23-2007
DarthBob88 <(E-Mail Removed)> writes:

> I have to go through a file and replace any occurrences of a given
> string with the desired string, like replacing "bug" with "feature".
> This is made more complicated by the fact that I have to do this with
> a lot of replacements and by the fact that some of the target strings
> are two words or more long, so I can't just break up the file at
> whitespace, commas, and periods. How's the best way to do this? I've
> thought about using strstr() to find the string and strncpy() to
> replace it, but it occurs to me that it would screw up the string to
> overwrite part of it with strncpy(). How should I do this?

Maybe it would be a good idea to look for a library for handling that
kind of stuff? Maybe some regular expression libraries would come in
handy?
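
As one illustration, the POSIX <regex.h> interface (regcomp, regexec, regfree) can do this; note it is POSIX, not ISO C, so it is an assumption about the platform. The helpers regex_replace and append are hypothetical names, and the sketch works on one nul-terminated line with a caller-supplied output buffer.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Appends n bytes of s at out[*used]; -1 if they would not fit. */
static int append(char *out, size_t cap, size_t *used,
                  const char *s, size_t n)
{
    if (n >= cap - *used)            /* keep one byte for the nul */
        return -1;
    memcpy(out + *used, s, n);
    *used += n;
    out[*used] = '\0';
    return 0;
}

/* Hypothetical helper: copies line into out, replacing each match of
 * the extended regular expression pattern with repl.
 * Returns 0 on success, -1 on a bad pattern or a full buffer. */
int regex_replace(const char *line, const char *pattern, const char *repl,
                  char *out, size_t cap)
{
    regex_t re;
    regmatch_t m;
    size_t used = 0;
    int rc = 0;

    assert(cap > 0);
    out[0] = '\0';
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    const char *p = line;
    /* After the first match, p no longer points at the line start. */
    while (rc == 0 &&
           regexec(&re, p, 1, &m, p == line ? 0 : REG_NOTBOL) == 0) {
        if (m.rm_eo == m.rm_so)
            break;   /* stop on an empty match so the loop advances */
        rc = append(out, cap, &used, p, (size_t)m.rm_so);
        if (rc == 0)
            rc = append(out, cap, &used, repl, strlen(repl));
        p += m.rm_eo;
    }
    if (rc == 0)
        rc = append(out, cap, &used, p, strlen(p));  /* unmatched tail */
    regfree(&re);
    return rc;
}
```

An alternation such as "bug|two +words" covers several targets, including multi-word ones, in a single pattern, though every target then gets the same replacement.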

Regards
Friedrich

--
Please remove just-for-news- to reply via e-mail.
 
Army1987
      09-23-2007
On Sun, 23 Sep 2007 12:08:58 +0200, Army1987 wrote:
> char str[1000] = "This is a bug. \n";
> char *search = "bug";
> char *replace = "feature";
> size_t len = strlen(str);
> size_t s_len = strlen(search);
> size_t r_len = strlen(replace);
> char *current = str;
> while (current = strstr(current, search)) /*assignment*/ {
> memmove(current + r_len , current + s_len,
> len - (current - str) - s_len + 1);
> memcpy(current, replace, r_len);
> }

Finding two bugs and correcting them is left as an exercise.
(Hint: one of them only shows up when search is a substring of
replace.)
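
One reading of the two bugs, sketched as a function: len is never updated after a replacement, so later memmove() calls use a stale length, and current is never advanced past the inserted text, so the loop never ends when search occurs inside replace. The name replace_in_place and the cap parameter are hypothetical additions for the sketch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: replaces every occurrence of search in str,
 * in place. cap is the size of the array holding str, nul included.
 * Returns 0 on success, -1 if the next replacement would not fit
 * (replacements already made are kept). */
int replace_in_place(char *str, size_t cap,
                     const char *search, const char *replace)
{
    size_t len = strlen(str);
    size_t s_len = strlen(search);
    size_t r_len = strlen(replace);
    char *current = str;

    assert(s_len > 0);  /* an empty search string would never advance */

    while ((current = strstr(current, search)) != NULL) {
        if (len - s_len + r_len + 1 > cap)
            return -1;
        /* Bytes after the match, including the terminating nul. */
        size_t tail = len - (size_t)(current - str) - s_len + 1;
        memmove(current + r_len, current + s_len, tail);
        memcpy(current, replace, r_len);
        len = len - s_len + r_len;   /* fix 1: keep len up to date */
        current += r_len;            /* fix 2: skip the inserted text */
    }
    return 0;
}
```

With current advanced past the replacement, replacing "bug" with "bugfix" terminates instead of matching the freshly inserted "bug" forever.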
--
Army1987 (Replace "NOSPAM" with "email")
A hamburger is better than nothing.
Nothing is better than eternal happiness.
Therefore, a hamburger is better than eternal happiness.

 
Keith Thompson
      09-23-2007
Army1987 <(E-Mail Removed)> writes:
> On Sun, 23 Sep 2007 06:22:23 +0000, DarthBob88 wrote:
>> I have to go through a file and replace any occurrences of a given
>> string with the desired string, like replacing "bug" with "feature".
>> This is made more complicated by the fact that I have to do this with
>> a lot of replacements and by the fact that some of the target strings
>> are two words or more long, so I can't just break up the file at
>> whitespace, commas, and periods. How's the best way to do this? I've
>> thought about using strstr() to find the string and strncpy() to
>> replace it, but it occurs to me that it would screw up the string to
>> overwrite part of it with strncpy(). How should I do this?

> Try to memmove() the remainder of the string forward, like this:
> "This is a bug. \n\0"
> "feature" is four characters longer than "bug", so slide the part
> of the string starting with the period four characters forward,
> then memcpy() "feature" where the 'b' of "bug" was. Probably there
> are better ways to do that, try asking in comp.programming.
>
> e.g.
> char str[1000] = "This is a bug. \n";
> char *search = "bug";
> char *replace = "feature";
> size_t len = strlen(str);
> size_t s_len = strlen(search);
> size_t r_len = strlen(replace);
> char *current = str;
> while (current = strstr(current, search)) /*assignment*/ {
> memmove(current + r_len , current + s_len,
> len - (current - str) - s_len + 1);
> memcpy(current, replace, r_len);
> } /*not compiled, not tested. make sure there's enough space past
> *the end of the string in str. */


You're copying the buffer (well, half of it on average) every time you
do a replacement.
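
A sketch of one way around that: walk the input once and write to a separate output, so nothing is shifted after a replacement. The struct subst type and the name replace_many are hypothetical. Preferring the longest matching target at each position is an assumption about how overlapping targets should be resolved, and the inner loop is a plain linear scan over the table.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical pair type: replace `from` with `to`. */
struct subst {
    const char *from;
    const char *to;
};

/* Writes text to out, replacing at each position the longest `from`
 * string that matches there. Each input character is written at most
 * once. Returns 0 on success, -1 on a write error. */
int replace_many(const char *text, const struct subst *tab, size_t n,
                 FILE *out)
{
    const char *p = text;

    assert(text != NULL && tab != NULL && out != NULL);

    while (*p != '\0') {
        const struct subst *best = NULL;
        size_t best_len = 0;

        /* Find the longest target that matches at this position. */
        for (size_t i = 0; i < n; i++) {
            size_t len = strlen(tab[i].from);
            if (len > best_len && strncmp(p, tab[i].from, len) == 0) {
                best = &tab[i];
                best_len = len;
            }
        }
        if (best != NULL) {
            fputs(best->to, out);    /* emit replacement, skip target */
            p += best_len;
        } else {
            putc(*p, out);           /* no target starts here */
            p++;
        }
    }
    return ferror(out) ? -1 : 0;
}
```

With a table holding both "bug" and "bug report", the text "a bug report" is matched by the longer entry, so replaced text is never rescanned by another target.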

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 