Velocity Reviews - Computer Hardware Reviews

how to remove strange characters

 
 
Li Chen
 
      10-07-2008
Hi all,

I grab some info from a webpage. Sometimes I get some strange
characters, as follows (by p):
To depart in a hurry; abscond: \342\200\234Your horse
has\nabsquatulated!\342\200\235 (Robert M. Bird) To die.

or (by print):
To depart in a hurry; abscond: “Your horse has absquatulated!”
(Robert M. Bird) To die.

Any idea how to get rid of them?


Thanks,

Li
--
Posted via http://www.ruby-forum.com/.

 
 
 
 
 
Li Chen
 
      10-08-2008
Stephen Celis wrote:

> Those are multi-byte characters (curly quotes, in this case). You
> probably don't want to get rid of them, but you can use the iconv
> library to transliterate them back to their ASCII almost-equivalents:
>
> >> string = "To depart in a hurry; abscond: \342\200\234Your horse has\nabsquatulated!\342\200\235 (Robert M. Bird) To die."
> => "To depart in a hurry; abscond: \342\200\234Your horse has\nabsquatulated!\342\200\235 (Robert M. Bird) To die."
> >> require 'iconv'
> => true
> >> puts Iconv.iconv('ascii//translit', 'utf-8', string).to_s
> To depart in a hurry; abscond: "Your horse has
> absquatulated!" (Robert M. Bird) To die.
> => nil
>
> Stephen


Thank you,

Li
--
Posted via http://www.ruby-forum.com/.
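
For readers on a newer Ruby: iconv was removed from the standard library in Ruby 2.0, so the transliteration quoted above can be approximated with String#encode instead. This is only a minimal sketch and not Stephen's method. The ascii_fallbacks table and its entries are assumptions made for illustration, and only the characters listed in it are handled.

```ruby
# Assumes Ruby 1.9 or later, where every String carries an encoding.
s = "\342\200\234Your horse has absquatulated!\342\200\235".dup.force_encoding("UTF-8")

# Each three-byte run is a single code point: the left and right curly quotes.
non_ascii = s.chars.reject { |c| c.ascii_only? }.map { |c| format("U+%04X", c.ord) }

# Hypothetical lookup table (not from the thread); add entries as needed.
ascii_fallbacks = {
  "\u201C" => '"', "\u201D" => '"',   # curly double quotes
  "\u2018" => "'", "\u2019" => "'",   # curly single quotes
  "\u2026" => "..."                   # horizontal ellipsis
}

# Characters missing from the table raise Encoding::UndefinedConversionError.
ascii = s.encode("US-ASCII", fallback: ascii_fallbacks)

p non_ascii
puts ascii
```

A lookup table keeps the replacements explicit, at the cost of having to list every character that might turn up in the scraped text.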

 
 
 
 
 
Li Chen
 
      10-08-2008
Hi Stephen and others,

Iconv only works for some characters. It doesn't work for the following
string.

Any idea?

Thanks,

Li


C:\Users\Alex>irb
irb(main):001:0> require 'iconv'
=> true
irb(main):002:0> string1="Fatal injury or ruin:\223Hath some fond lover tic'd thee to thy bane?\224\342\200\246"
=> "Fatal injury or ruin:\223Hath some fond lover tic'd thee to thy bane?\224\342\200\246"
irb(main):003:0> puts Iconv.iconv('ASCII//TRANSLIT','utf-8',string1).to_s
Iconv::IllegalSequence: "\223Hath some fond "...
from (irb):3:in `iconv'
from (irb):3
irb(main):004:0>





--
Posted via http://www.ruby-forum.com/.
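
A likely reason for the IllegalSequence error above is that the string mixes two encodings. \342\200\246 is a valid UTF-8 ellipsis, but the lone bytes \223 and \224 (0x93 and 0x94) are not valid UTF-8 on their own. In Windows-1252 those two bytes are the curly double quotes, which fits the text. The sketch below assumes that guess is right and uses String#scrub from newer Rubies rather than iconv. It is an illustration, not something proposed in the thread.

```ruby
# Assumes Ruby 2.1 or later for String#scrub.
mixed = "ruin:\223Hath some fond lover tic'd thee to thy bane?\224\342\200\246"
mixed = mixed.dup.force_encoding("UTF-8")

puts mixed.valid_encoding?   # false: \223 and \224 are stray bytes in UTF-8

# Assumption: the stray bytes are Windows-1252 (0x93 and 0x94 are curly quotes).
# scrub hands each invalid byte run to the block and uses the block's result.
repaired = mixed.scrub do |bad|
  bad.dup.force_encoding("Windows-1252").encode("UTF-8")
end

puts repaired.valid_encoding?
```

A few byte values are undefined in Windows-1252, so the encode call inside the block can still raise on other input. Once the string is valid UTF-8, it can be transliterated to ASCII like any other UTF-8 text.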

 
 
Pablo Q.
 
      10-08-2008

What do you think about doing something like this?

class String
  # Printable ASCII codes that are replaced as well (quotes, operators, brackets, ...).
  REPLACED_ASCII = [34, 35, 37, 42, 43, 44, 45, 47, 60, 61, 62, 63, 91, 92, 93, 94, 96, 123]

  # Replaces, in place, every byte outside 32..124 and every byte listed
  # in REPLACED_ASCII with the given replacement string. Returns self.
  def remove_nonascii(replacement)
    result = ""
    each_byte do |b|
      if b < 32 || b > 124 || REPLACED_ASCII.include?(b)
        result << replacement
      else
        result << b.chr
      end
    end
    replace(result)
  end
end

"Fatal injury or ruin:\223Hath some fond lover tic'd thee to thybane?\224\342\200\246".remove_nonascii('+')

=> "Fatal injury or ruin:+Hath some fond lover tic'd thee to thybane+++++"

As you can see, it replaces each of those bytes with the character '+'.



--
Pablo Q.

 
 
Nit Khair
 
      10-09-2008
Li Chen wrote:
> Hi all,
>
> I grab some info from a webpage. Sometimes I get some strange
> characters, as follows (by p):
> To depart in a hurry; abscond: \342\200\234Your horse
> has\nabsquatulated!\342\200\235 (Robert M. Bird) To die.


Here's a quick hack I used recently. The characters were messing up my
ncurses display, and I did not need them.

dataitem.gsub!(/[^[:space:][:print:]]/,'')

I found this while googling; IIRC, it's used somewhere in RoR.
--
Posted via http://www.ruby-forum.com/.
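
The one-liner above relies on Ruby 1.8 treating strings as plain bytes. On Ruby 1.9 and later, strings carry an encoding, so the same pattern may behave differently: on a UTF-8 string [:print:] can also match non-ASCII printable characters, and invalid bytes can make gsub raise. A minimal byte-oriented sketch for newer Rubies follows. The variable names are made up for illustration, and the explicit range 0x20 through 0x7E stands in for "printable ASCII".

```ruby
# Assumes Ruby 2.0 or later for String#b (a binary-encoded copy of the string).
s = "ruin:\223Hath some fond lover tic'd thee to thy bane?\224\342\200\246"

# Work on raw bytes so invalid sequences cannot raise, and keep only
# whitespace plus printable ASCII (0x20 through 0x7E).
clean = s.b.gsub(/[^\s\x20-\x7E]/, "").force_encoding("US-ASCII")

puts clean
```

Like the original hack, this drops the characters rather than converting them, so the curly quotes and the ellipsis disappear from the output.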

 
 
Li Chen
 
      10-09-2008
Nit Khair wrote:
> Here's a quick hack I used recently. The characters were messing up my
> ncurses display, and I did not need them.
>
> dataitem.gsub!(/[^[:space:][:print:]]/,'')
>
> I found this while googling; IIRC, it's used somewhere in RoR.


It works in the scenario where iconv doesn't. Good job!!!

Li

--
Posted via http://www.ruby-forum.com/.

 
 
Bilyk, Alex
 
      10-10-2008
There is no one-click installer for 1.9 on Windows as far as I can tell.
Downloading and unpacking the zipped binaries didn't get me very far, as
both ruby and irb complain that something is missing. Does the binary
distribution require me to install anything else, such as libraries? If
so, what additional stuff do I need to make 1.9 work, and where can I
get it?

Thanks,
Alex

 