Velocity Reviews - Computer Hardware Reviews

Replace (remove) multiple chars from string.

 
 
Fredrik
      10-13-2004
I'm trying to remove unwanted chars from a string in order to make it
XML-compatible. The chars that should be avoided are:

[#x7F-#x84], [#x86-#x9F], [#xFDD0-#xFDDF],
[#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF],
[#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF],
[#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF],
[#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF],
[#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF],
[#x10FFFE-#x10FFFF].

(http://www.w3.org/TR/REC-xml/#charsets)

I've been looking at replaceAll, a charAt loop, regex, etc. Since I'm
going to process rather large strings, I'm looking for the most
efficient replace algorithm.
All suggestions are much appreciated.

(a perhaps related question: How can I use loop-generated \u escape
codes? ("\u00" + i) naturally results in a compiler error message,
since the escape code is incomplete)
 
Thomas Weidenfeller
      10-13-2004
Fredrik wrote:
> I've been looking at replaceAll, a charAt loop, regex, etc. Since I'm
> going to process rather large strings, I'm looking for the most
> efficient replace algorithm.


Use a loop with a carefully crafted set of tests, lookup tables, etc. to
verify each character, and a StringBuilder or pre-allocated char[] to
place the valid characters into.
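
A minimal sketch of such a loop, assuming Java 5 or later (for StringBuilder and the code point methods). The class XmlCharFilter and its method names are hypothetical, and the table simply copies the ranges listed in the question. It walks the string by code point rather than by char, because most of those ranges lie above #xFFFF and therefore occupy two chars each in a Java string.

```java
/** Hypothetical helper; class and method names are illustrative only. */
final class XmlCharFilter {

    // Inclusive {low, high} bounds, copied from the ranges listed in the question.
    private static final int[][] RANGES = buildRanges();

    private static int[][] buildRanges() {
        int[][] r = new int[19][];
        r[0] = new int[] {0x7F, 0x84};
        r[1] = new int[] {0x86, 0x9F};
        r[2] = new int[] {0xFDD0, 0xFDDF};
        // The last two code points of planes 1 through 16.
        for (int plane = 1; plane <= 16; plane++) {
            int base = plane << 16;
            r[2 + plane] = new int[] {base | 0xFFFE, base | 0xFFFF};
        }
        return r;
    }

    /** True if the code point lies in one of the listed ranges. */
    static boolean isUnwanted(int cp) {
        for (int[] range : RANGES) {
            if (cp >= range[0] && cp <= range[1]) {
                return true;
            }
        }
        return false;
    }

    /** Returns a copy of s with every unwanted code point dropped. */
    static String strip(String s) {
        // Pre-size the builder; the result is never longer than the input.
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isUnwanted(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp); // 1 or 2 chars per code point
        }
        return out.toString();
    }
}
```

A linear scan over 19 ranges is the simplest form of lookup table, not necessarily the fastest; it is meant as a baseline to measure other variants against.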

For the tests I would give the binary representation of the char codes
in the illegal ranges a good look. Maybe you can find bit patterns to
test for one or more ranges with simple binary operations.
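
As one illustration of such a bit pattern (a sketch, not necessarily what was meant above): every range from #x1FFFE-#x1FFFF to #x10FFFE-#x10FFFF has its low 16 bits equal to FFFE or FFFF, so a single mask covers all sixteen of them. The class PlaneEndTest and both method names are made up for this example; byRanges is included only to show what the mask replaces.

```java
/** Hypothetical example class; names are illustrative only. */
final class PlaneEndTest {

    /** Range-by-range check for the last two code points of planes 1 to 16. */
    static boolean byRanges(int cp) {
        for (int plane = 1; plane <= 16; plane++) {
            int low = (plane << 16) | 0xFFFE;
            if (cp == low || cp == low + 1) {
                return true;
            }
        }
        return false;
    }

    /** Same result using one mask: bits 1 to 15 must all be set. */
    static boolean byMask(int cp) {
        return (cp & 0xFFFE) == 0xFFFE   // low 16 bits are FFFE or FFFF
            && cp >= 0x1FFFE             // skip plane 0, which is not in the list
            && cp <= 0x10FFFF;           // stay inside the Unicode range
    }
}
```

The two remaining low ranges, [#x7F-#x84] and [#x86-#x9F], reduce in a similar way to "between #x7F and #x9F, except #x85".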

> (a perhaps related question: How can I use loop-generated \u escape
> codes? ("\u00" + i) naturally results in a compiler error message,
> since the escape code is incomplete)


Why do you want to? Just cast the int to a char.
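
A short sketch of the cast, with a made-up class and method name. Unicode escapes are resolved by the compiler before the program runs, so they cannot be assembled from pieces at run time; the cast produces the same char from a number instead.

```java
/** Hypothetical example class; names are illustrative only. */
final class CharCastDemo {

    /** Builds a string holding the chars with codes first..last, inclusive. */
    static String charsInRange(int first, int last) {
        StringBuilder sb = new StringBuilder();
        for (int i = first; i <= last; i++) {
            sb.append((char) i); // the cast turns the numeric code into the char
        }
        return sb.toString();
    }
}
```

For example, charsInRange(0x41, 0x43) gives "ABC". The cast only works for codes up to #xFFFF; for higher code points it silently truncates, and Character.toChars(int) is needed to get the surrogate pair.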

/Thomas
 
Fredrik
      10-14-2004
Char cast, why didn't I think of that...

I tried a set of nested if-else tests, ordered with character frequency
in mind, and it seems to work OK.
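
The actual code was not posted, so the following is only a guess at what frequency-ordered tests might look like. It assumes that plain ASCII is by far the most common input and that the unwanted ranges are rare, so the cheapest exits come first. FrequencyOrderedFilter and isUnwanted are hypothetical names.

```java
/** Hypothetical example class; names are illustrative only. */
final class FrequencyOrderedFilter {

    /** True if cp lies in one of the ranges listed in the original question. */
    static boolean isUnwanted(int cp) {
        if (cp < 0x7F) {
            return false;              // ASCII below DEL: assumed most frequent
        }
        if (cp <= 0x9F) {
            return cp != 0x85;         // #x7F-#x84 and #x86-#x9F
        }
        if (cp < 0xFDD0) {
            return false;              // bulk of the Basic Multilingual Plane
        }
        if (cp <= 0xFDDF) {
            return true;               // #xFDD0-#xFDDF
        }
        if (cp < 0x1FFFE) {
            return false;              // rest of plane 0 and most of plane 1
        }
        // Last two code points of planes 1 through 16.
        return cp <= 0x10FFFF && (cp & 0xFFFE) == 0xFFFE;
    }
}
```

With this ordering an ASCII character costs a single comparison, and only supplementary code points reach the final test.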

Thanks!
 