Encoding issue for special characters on Windows

 
 
Nicolas Gaiffe
      01-09-2009
Hi,

I am facing an issue with special characters handling inside a Ruby
script running on Windows and am sure some of you could help me on
this.

This script copies files such as "<English_name>.txt" to
"<Other_language_name>.txt". But once translated, the new filename may
have special characters, 'ä' for instance.

Running
puts 'ä'
in a Ruby script gives
'õ'
as an output, whereas the same code in irb gives
'ä'

There must be an encoding issue at some point in my script but I
didn't manage to fix it (tried different values of '#encoding:'
without success). Any clue ?

Many thanks in advance
Best regards

Nicolas
 
 
Pascal J. Bourguignon
      01-09-2009
Nicolas Gaiffe <(E-Mail Removed)> writes:

> Hi,
>
> I am facing an issue with special characters handling inside a Ruby
> script running on Windows and am sure some of you could help me on
> this.
>
> This script copies files such as "<English_name>.txt" to
> "<Other_language_name>.txt". But once translated, the new filename may
> have special characters, 'ä' for instance.
>
> Running
> puts 'ä'
> in a Ruby script gives
> 'õ'
> as an output, whereas the same code in irb gives
> 'ä'
>
> There must be an encoding issue at some point in my script but I
> didn't manage to fix it (tried different values of '#encoding:'
> without success). Any clue ?


I use Emacs. In Emacs, you'd just put:

#!/usr/bin/ruby
# -*- coding:utf-8 -*-
puts "ä"

to have the script saved in UTF-8, so that it outputs a UTF-8 byte stream.
Then, of course, you need a UTF-8 terminal:



[pjb@simias :0.0 tmp]$ chmod 755 test.rb
[pjb@simias :0.0 tmp]$ export LC_CTYPE=en_US.UTF-8
[pjb@simias :0.0 tmp]$ ./test.rb
ä
[pjb@simias :0.0 tmp]$ cat test.rb
#!/usr/bin/ruby
# -*- coding:utf-8 -*-
puts "ä"
[pjb@simias :0.0 tmp]$

Notice that in irb, with a UTF-8 terminal, "ä".length == 2
(String#length counts bytes in Ruby 1.8, and ä takes two bytes in UTF-8).


Of course, you can use ISO-8859-1 or ISO-8859-15 instead; just substitute it for utf-8.
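
As a rough illustration of the byte counts involved, here is a minimal sketch. It assumes Ruby 1.9 or later (String#encode does not exist in 1.8, which needs Iconv instead), and the helper name byte_values is made up for this example.

```ruby
# Sketch assuming Ruby 1.9+; byte_values is a hypothetical helper.
A_UMLAUT = "\u00E4" # "ä"; a \u escape always yields a UTF-8 string

# Return the bytes of text after transcoding it to the given encoding.
def byte_values(text, encoding)
  text.encode(encoding).bytes.to_a
end

p byte_values(A_UMLAUT, "UTF-8")       # two bytes
p byte_values(A_UMLAUT, "ISO-8859-1")  # one byte
p byte_values(A_UMLAUT, "ISO-8859-15") # one byte
p A_UMLAUT.length                      # counts characters, not bytes, on 1.9+
```

On 1.9 and later the last line reports one character, which is why the length of 2 seen in irb points to byte counting.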
--
__Pascal Bourguignon__
 
 
F. Senault
      01-10-2009
On 9 January 2009 at 10:10, Nicolas Gaiffe wrote:

> There must be an encoding issue at some point in my script but I
> didn't manage to fix it (tried different values of '#encoding:'
> without success). Any clue ?


It depends. If you are trying to echo something to the console, you'll
have to use CP850.

The character code for ä is 228 in the ISO8859-1 [1] encoding that your file
seems to use, and that corresponds to the character õ in CP850 [2].
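
The mismatch can be reproduced with a small sketch. It assumes Ruby 1.9 or later, and decode_byte is a hypothetical helper name, not something from the original script.

```ruby
# Sketch assuming Ruby 1.9+; decode_byte is a hypothetical helper.
# Interpret a single raw byte under the given code page and return UTF-8 text.
def decode_byte(byte, encoding)
  [byte].pack("C").force_encoding(encoding).encode("UTF-8")
end

as_latin1 = decode_byte(228, "ISO-8859-1") # how the editor saved the character
as_cp850  = decode_byte(228, "CP850")      # how a CP850 console reads the same byte

p as_latin1.unpack("U*") # Unicode code point under ISO-8859-1
p as_cp850.unpack("U*")  # Unicode code point under CP850
```

The same byte yields two different code points, so the file itself is fine and only its interpretation by the console differs.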

Now, if you're writing something on the screen as a check or for
debugging while manipulating files, don't convert the output to CP850 in
your resulting file! You'd better stay in ISO, or maybe even in UTF-8,
depending on what your real goal is (website, internal application,
database, etc.).
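
One way to keep the two concerns apart is to transcode only the copy that gets echoed. This is a minimal sketch assuming Ruby 1.9 or later and a CP850 console; for_console and new_name are hypothetical names.

```ruby
# Sketch assuming Ruby 1.9+; for_console is a hypothetical helper.
# Transcode text for display only; characters the console code page
# cannot represent are replaced with "?".
def for_console(text, console_encoding = "CP850")
  text.encode(console_encoding, :invalid => :replace, :undef => :replace, :replace => "?")
end

new_name = "\u00E4.txt"              # the name used on disk stays untouched
$stdout.puts(for_console(new_name))  # only the echoed copy is converted
```

The string passed to the file operations is never modified, so the filesystem result does not depend on the console code page.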

Fred
[1] : http://en.wikipedia.org/wiki/ISO/IEC_8859-1
[2] : http://en.wikipedia.org/wiki/Code_page_850
--
I don't need no arms around me I don't need no drugs to calm me
I have seen the writing on the wall Don't think I need anything at all
No, don't think I'll need anything at all
(Pink Floyd, Another Brick in The Wall part 3)
 
 
Nicolas Gaiffe
      01-13-2009
On 10 Jan, 16:24, "F. Senault" <(E-Mail Removed)> wrote:
> It depends. If you are trying to echo something to the console, you'll
> have to use CP850.
>
> The character code for ä is 228 in the ISO8859-1 [1] encoding that your file
> seems to use, and that corresponds to the character õ in CP850 [2].
>
> Now, if you're writing something on the screen as a check or for
> debugging while manipulating files, don't convert the output to CP850 in
> your resulting file! You'd better stay in ISO


Hi, and sorry for the delay.

You were right. Only the screen output was affected by the
issue; the result in the filesystem was all right. So everything is
working as expected, since I have no need to display the filenames once
in production.

Thanks to both of you.




 