Remove punctuation from String?

 
 
dfhLASST
      11-11-2004
What is the best way to remove all non-alphabetic characters (e.g. symbols,
spaces etc.) from a String?

My original plan was to loop round the chars in the String and add them to
an array if the values of the chars are alphabetic (i.e. >=65 and <=122).
I've run into problems with this and it seems more complex than the problem
should be.

Any suggestions?
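
One likely source of trouble with the single range 65 to 122 is that it also covers six characters that are not letters (codes 91 to 96). A minimal sketch that lists them; the class and method names here are hypothetical and not from the thread:

```java
// Hypothetical helper, for illustration only.
final class AsciiRangeCheck {
    /** Returns the characters with codes in [from, to] that are not ASCII letters. */
    static String nonLettersInRange(int from, int to) {
        StringBuilder sb = new StringBuilder();
        for (int code = from; code <= to; code++) {
            char c = (char) code;
            boolean isAsciiLetter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
            if (!isAsciiLetter) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The range 65..122 spans 'A'..'z', including the gap between 'Z' and 'a'.
        System.out.println(nonLettersInRange(65, 122)); // prints [\]^_`
    }
}
```

So a test like c >= 65 && c <= 122 would let brackets, backslash, caret, underscore and backtick through; two separate ranges (65 to 90 and 97 to 122) avoid that.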


 
 
 
 
 
Michael Borgwardt
      11-11-2004
dfhLASST wrote:
> What is the best way to remove all non-alphabetic characters (e.g. symbols,
> spaces etc.) from a String?
>
> My original plan was to loop round the chars in the String and add them to
> an array if the values of the chars are alphabetic (i.e. >=65 and <=122).
> I've run into problems with this and it seems more complex than the problem
> should be.


The problem is more complex than you think. Are you absolutely sure that
you're only ever going to process English text? If not, use Character.isLetter()
for the condition.

For the accumulation of the output string, use StringBuffer (I guess that's
where you encountered obvious problems).
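
A rough sketch of that approach, in case it helps. The class and method names are made up for illustration, and it assumes Java 5 or later so that StringBuilder can stand in for StringBuffer (on older JVMs, swap StringBuffer back in):

```java
// Hypothetical helper, for illustration only.
final class LetterFilter {
    /** Keeps only the chars that Character.isLetter(char) accepts. */
    static String lettersOnly(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isLetter(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(lettersOnly("Hello, World! 123")); // prints HelloWorld
        // Non-ASCII letters (u-umlaut, sharp s, n-tilde) are kept as well.
        System.out.println(lettersOnly("Gr\u00fc\u00dfe, se\u00f1or!"));
    }
}
```

Note that this works char by char, so it does not attempt to handle letters outside the Basic Multilingual Plane, which are stored as surrogate pairs.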
 
 
 
 
 
Chris Smith
      11-11-2004
dfhLASST wrote:
> What is the best way to remove all non-alphabetic characters (e.g. symbols,
> spaces etc.) from a String?
>
> My original plan was to loop round the chars in the String and add them to
> an array if the values of the chars are alphabetic (i.e. >=65 and <=122).
> I've run into problems with this and it seems more complex than the problem
> should be.
>
> Any suggestions?


str = str.replaceAll("[^A-Za-z]", "");

or, if you want more than just ASCII characters:

str = str.replaceAll("[^\\p{L}]", "");
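
To see how the two patterns differ, here is a small sketch. The wrapper class, method names and sample input are hypothetical; only the two replaceAll calls come from the lines above:

```java
// Hypothetical wrapper, for illustration only.
final class RegexStripDemo {
    /** Removes everything except the ASCII letters A-Z and a-z. */
    static String asciiLettersOnly(String s) {
        return s.replaceAll("[^A-Za-z]", "");
    }

    /** Removes everything that is not in the Unicode letter category L. */
    static String unicodeLettersOnly(String s) {
        return s.replaceAll("[^\\p{L}]", "");
    }

    public static void main(String[] args) {
        String in = "Caf\u00e9 no. 5!"; // "Cafe" with an accented e, then " no. 5!"
        System.out.println(asciiLettersOnly(in));   // the accented e is dropped: Cafno
        System.out.println(unicodeLettersOnly(in)); // the accented e is kept
    }
}
```

The first pattern treats the accented letter as punctuation and removes it; the second keeps it, which is usually what you want for text that is not strictly English.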

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
 
dfhLASST
      11-11-2004
"Michael Borgwardt" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> dfhLASST wrote:
> > What is the best way to remove all non-alphabetic characters (e.g. symbols,
> > spaces etc.) from a String?
> >
> > My original plan was to loop round the chars in the String and add them to
> > an array if the values of the chars are alphabetic (i.e. >=65 and <=122).
> > I've run into problems with this and it seems more complex than the problem
> > should be.
>
> The problem is more complex than you think. Are you absolutely sure that
> you're only ever going to process English text? If not, use Character.isLetter()
> for the condition.
>
> For the accumulation of the output string, use StringBuffer (I guess that's
> where you encountered obvious problems).


Thanks, yeah I used that.

For future reference, for anyone else who needs it, here is my method:


public String stripPunctuation(String s) {
    StringBuffer sb = new StringBuffer();

    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        // Keep only ASCII letters: 'A'-'Z' (65-90) and 'a'-'z' (97-122).
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) {
            sb.append(c);
        }
    }

    return sb.toString();
}


 
 
Woebegone
      11-11-2004
"Michael Borgwardt" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> dfhLASST wrote:
>> What is the best way to remove all non-alphabetic characters (e.g.
>> symbols,
>> spaces etc.) from a String?


8<
>
> The problem is more complex than you think. Are you absolutely sure that
> you're only ever going to process English text? If not, use Character.isLetter()
> for the condition.
>
> For the accumulation of the output string, use StringBuffer (I guess that's
> where you encountered obvious problems).


I've used something like the following in cases where I know the processing
is constrained to a given (relatively small) set of characters, e.g. English
text. It has the advantage of allowing easy extension by adding characters
to ALPHABET without necessarily requiring char codes.

/* */
public class StringCleanser {
    // Characters to keep; extend this string to allow more characters.
    public static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
        "abcdefghijklmnopqrstuvwxyz";

    public static boolean isAlphabetic(char c) {
        return ALPHABET.indexOf(c) != -1;
    }

    // Returns s with every character not in ALPHABET removed.
    public static String cleanse(String s) {
        StringBuffer buf = new StringBuffer();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isAlphabetic(c)) {
                buf.append(c);
            }
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        String in = "L e,f.t/o;v'e[r]L1e.2t3t4e ,5r6s7";
        System.out.println(cleanse(in)); // prints LeftoverLetters
    }
}
/* */
--
Regards,
Sean.


 