Velocity Reviews - Computer Hardware Reviews

Encoding, "extended ansi", and unicode in 1.9

 
 
Dennis Nedry
 
      06-16-2010
I have a routine for converting ANSI text with "extended" IBM characters to
HTML. It is as follows...

EXTENDED_ANSI_TABLE = {
227.chr => "<br>",
32.chr => "&nbsp;",
128.chr => "&Ccedil;", #128 C, cedilla (199)
129.chr => "&uuml;", #129 u, umlaut (252)
130.chr => "&eacute;", #130 e, acute accent (233)
131.chr => "&acirc;", #131 a, circumflex accent (226)
132.chr => "&auml;", #132 a, umlaut (228)
133.chr => "&agrave;", #133 a, grave accent (224)
134.chr => "&aring;", #134 a, ring (229)
135.chr => "&ccedil;", #135 c, cedilla (231)
136.chr => "&ecirc;", #136 e, circumflex accent (234)
137.chr => "&euml;", #137 e, umlaut (235)
138.chr => "&egrave;", #138 e, grave accent (232)
139.chr => "&iuml;", #139 i, umlaut (239)
140.chr => "&icirc;", #140 i, circumflex accent (238)
141.chr => "&igrave;", #141 i, grave accent (236)
#big huge list continues for pages...
}


def parse_ansi_ext(str)
  EXTENDED_ANSI_TABLE.each_pair { |color, result|
    str = str.gsub(color, result)
  }
  return str
end

This worked in 1.8, no problem.

If the input contains a character above 127.chr, it now bombs in 1.9 with the error:

"Encoding::CompatibilityError at /
incompatible encoding regexp match (ASCII-8BIT regexp with ISO-8859-1 string)"

I've tried various acts of desperation to fix it, to no avail. I
don't understand exactly what is wrong...

Thanks,

Dennis

 
 
 
 
 
Michael Fellinger
 
      06-16-2010
On Thu, Jun 17, 2010 at 12:40 AM, Dennis Nedry <> wrote:
> I have a routine for converting ansi with "extended" ibm characters to
> html. It is as follows...
>
> EXTENDED_ANSI_TABLE = {
> 227.chr => "<br>",
> 32.chr => "&nbsp;",
> 128.chr => "&Ccedil;", #128 C, cedilla (199)
> 129.chr => "&uuml;", #129 u, umlaut (252)
> 130.chr => "&eacute;", #130 e, acute accent (233)
> 131.chr => "&acirc;", #131 a, circumflex accent (226)
> 132.chr => "&auml;", #132 a, umlaut (228)
> 133.chr => "&agrave;", #133 a, grave accent (224)
> 134.chr => "&aring;", #134 a, ring (229)
> 135.chr => "&ccedil;", #135 c, cedilla (231)
> 136.chr => "&ecirc;", #136 e, circumflex accent (234)
> 137.chr => "&euml;", #137 e, umlaut (235)
> 138.chr => "&egrave;", #138 e, grave accent (232)
> 139.chr => "&iuml;", #139 i, umlaut (239)
> 140.chr => "&icirc;", #140 i, circumflex accent (238)
> 141.chr => "&igrave;", #141 i, grave accent (236)
> #big huge list continues for pages...
> }
>
>
> def parse_ansi_ext(str)
>
> EXTENDED_ANSI_TABLE.each_pair {|color, result|
> str = str.gsub(color,result)
> }
> return str
> end
>
> This worked in 1.8, no problem.
>
> If the input contains a character above 127.chr, it now bombs with the error:
>
> "Encoding::CompatibilityError at /
> incompatible encoding regexp match (ASCII-8BIT regexp with ISO-8859-1 string)"
>
> I've tried various acts of desperation to fix it, to no avail. I
> don't understand exactly what is wrong...

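str has the encoding ISO-8859-1, probably inherited from your system locale.
Convert it to ASCII-8BIT before processing it.

http://blog.grayproductions.net/arti...uby_19s_string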
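
For readers landing here later, a minimal sketch of what "convert it to ASCII-8BIT" could look like. This is an illustration, not necessarily what was done in the thread: SAMPLE_TABLE is a hypothetical two-entry stand-in for the full table, and the approach assumes every key comes from Integer#chr, which in 1.9 returns an ASCII-8BIT string for values above 127. Retagging a copy of the input with String#force_encoding changes only the encoding label, not the bytes, so the keys and the input agree and gsub no longer raises.

```ruby
# Hypothetical two-entry stand-in for the full table in the original post.
SAMPLE_TABLE = {
  128.chr => "&Ccedil;",
  130.chr => "&eacute;",
}

def parse_ansi_ext(str)
  # Retag a copy as binary (ASCII-8BIT); the bytes themselves are untouched,
  # and the caller's string keeps its original encoding.
  out = str.dup.force_encoding(Encoding::BINARY)
  SAMPLE_TABLE.each_pair { |byte, entity| out = out.gsub(byte, entity) }
  out
end

# Build an ISO-8859-1 tagged input containing byte 130, as in the error case.
input = ("caf" + 130.chr).force_encoding(Encoding::ISO_8859_1)
puts parse_ansi_ext(input)
```

Note that the result is still tagged ASCII-8BIT; once every high byte has been replaced it contains only ASCII, so it can be retagged with force_encoding if a later step expects another encoding.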
 
 
 
 
 
Dennis Nedry
 
      06-17-2010
On Wed, Jun 16, 2010 at 6:30 PM, Michael Fellinger
<> wrote:
>
> str has the encoding ISO-8859-1, probably inherited from your system locale.
> Convert it to ASCII-8BIT before processing it.
>
> http://blog.grayproductions.net/arti...uby_19s_string


Thanks, that worked. I guess we should always specify file encoding
from now on.

Take Care,

mark


--
"I've got ham but I'm not a hamster."

-Bill Bailey
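
On "always specify file encoding": one way to read that, sketched below under the assumption that the input comes from a file on disk (the thread does not say where it comes from). Opening with mode "rb" yields ASCII-8BIT strings, while a mode such as "r:ISO-8859-1" tags what is read with the named encoding instead of whatever the locale implies. The helper names read_binary and read_as are hypothetical.

```ruby
require "tempfile"

# Hypothetical helper: read raw bytes, tagged ASCII-8BIT.
def read_binary(path)
  File.open(path, "rb") { |io| io.read }
end

# Hypothetical helper: read with an explicit external encoding.
def read_as(path, encoding_name)
  File.open(path, "r:#{encoding_name}") { |io| io.read }
end

# Write a small sample file containing byte 130, then read it both ways.
Tempfile.create("ansi") do |f|
  f.binmode
  f.write("caf" + 130.chr)
  f.flush

  puts read_binary(f.path).encoding
  puts read_as(f.path, "ISO-8859-1").encoding
end
```

Either way the bytes are identical; only the encoding tag differs, and that tag is what decides whether gsub against the ASCII-8BIT table keys is allowed.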

 