Velocity Reviews - Computer Hardware Reviews

File.new and encoding

 
 
Achim Domma (SyynX Solutions GmbH)
      11-29-2005
Hi,

I'm still quite new to ruby, but have written a simple code generator.
The generator opens some files and combines them into a new one. The
resulting file is encoded as iso-8859-1, but it looks like ruby writes
a UTF-8 marker at the beginning of the file. Is that possible?

How can I tell ruby which encoding to use when I write to text files?

Any pointers to documentation are welcome, but I didn't find anything
useful using google.

regards,
Achim
 
Robert Klemme
      11-29-2005
Achim Domma (SyynX Solutions GmbH) wrote:
> Hi,
>
> I'm still quite new to ruby, but have written a simple code generator.
> The generator opens some files and combines them into a new one. The
> resulting file is encoded as iso-8859-1, but it looks like ruby writes
> a UTF-8 marker at the beginning of the file. Is that possible?


What's a UTF-8 marker? I only know the two-byte UTF-16 marker, but AFAIK
there is no marker for UTF-8. Did I miss something?

> How can I tell ruby which encoding to use when I write to text files?
>
> Any pointers to documentation are welcome, but I didn't find
> anything useful using google.


Encoding is not an easy issue with ruby. I guess by default it uses the
default encoding of your environment. But you can specify certain
(Japanese) encodings with the command line option -K. HTH

Kind regards

robert
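
On Ruby 1.9 and later (an assumption, since this thread predates that release), the mode string passed to File.open can name an encoding, and strings are transcoded to it as they are written. A minimal sketch, with write_latin1 as a hypothetical helper name:

```ruby
# Hypothetical helper: write `text` to `path` encoded as ISO-8859-1.
# Assumes Ruby 1.9 or later, where the mode string can carry an encoding.
def write_latin1(path, text)
  File.open(path, "w:iso-8859-1") do |target|
    # The string is transcoded from its own encoding to ISO-8859-1 on write.
    target.write(text)
  end
end
```

A character that has no ISO-8859-1 equivalent would raise a conversion error on write, so this only suits text known to fit that character set.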

 
nobu@ruby-lang.org
      11-29-2005
Hi,

At Wed, 30 Nov 2005 00:17:29 +0900,
Robert Klemme wrote in [ruby-talk:167988]:
> > I'm still quite new to ruby, but have written a simple code generator.
> > The generator opens some files and combines them into a new one. The
> > resulting file is encoded as iso-8859-1, but it looks like ruby writes
> > a UTF-8 marker at the beginning of the file. Is that possible?
>
> What's a UTF-8 marker? I only know the two-byte UTF-16 marker, but AFAIK
> there is no marker for UTF-8. Did I miss something?


It would be a UTF-8 encoded BOM, but ruby itself never writes it
automatically.

> > How can I tell ruby which encoding to use when I write to text files?


Can't you show the code?

--
Nobu Nakada
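
For reference, the UTF-8 encoded BOM is U+FEFF serialized as the three bytes EF BB BF. A minimal sketch of checking whether a file starts with it, assuming Ruby 2.0 or later (for String#b); the constant and helper names are made up for illustration:

```ruby
# The UTF-8 encoded BOM (U+FEFF) as a three-byte binary string.
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: true if the file at `path` begins with the UTF-8 BOM.
def starts_with_utf8_bom?(path)
  # Binary mode so the bytes are compared as-is, without any transcoding.
  File.open(path, "rb") { |f| f.read(3) == UTF8_BOM }
end
```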


 
Achim Domma (SyynX Solutions GmbH)
      11-29-2005
nobu@ruby-lang.org wrote:

> It would be a UTF-8 encoded BOM, but ruby itself never writes it
> automatically.

[...]
> Can't you show the code?


While trying to reproduce the problem in a smaller example, I figured out
that I'm reading the BOM from one of my source files. Sorry for the
confusion. I'm doing something like:

File.open("target","w") do |target|
  File.open("source","r") do |source|
    source.each_line do |line|
      # ... some processing ...
      target.write(line)
    end
  end
end


source seems to contain the BOM and it is written to target. Any hint on
how to strip the BOM?

regards,
Achim
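
One possible way to strip a leading BOM while copying, as a minimal sketch: open both files in binary mode, peek at the first three bytes of the source, and rewind only if they are not EF BB BF. The helper name copy_without_bom and the constant BOM_BYTES are hypothetical, and String#b assumes Ruby 2.0 or later:

```ruby
# Three bytes of the UTF-8 encoded BOM, as a binary string.
BOM_BYTES = "\xEF\xBB\xBF".b

# Hypothetical helper: copy `source_path` to `target_path` line by line,
# dropping a leading UTF-8 BOM if the source has one.
def copy_without_bom(source_path, target_path)
  File.open(target_path, "wb") do |target|
    File.open(source_path, "rb") do |source|
      # Peek at the first three bytes; rewind if they are not a BOM.
      source.rewind unless source.read(3) == BOM_BYTES
      source.each_line do |line|
        # ... some processing ...
        target.write(line)
      end
    end
  end
end
```

Because both files are opened in binary mode, the remaining bytes are copied unchanged, so this does not convert between encodings.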
 
Alex Fenton
      11-29-2005
> I'm doing something like:
>
> File.open("target","w") do |target|
>   File.open("source","r") do |source|
>     source.each_line do |line|
>       # ... some processing ...
>       target.write(line)
>     end
>   end
> end


Have you looked at 'iconv' in the standard library?

http://www.ruby-doc.org/stdlib/libdo...ses/Iconv.html

Assuming all your input files were ISO-8859-1, and you wanted your output file in UTF-8, your example might look something like (untested):

require 'iconv'

File.open("target","w") do |target|
  Iconv.open('UTF-8', 'ISO-8859-1') do |converter|
    File.open("source","r") do |source|
      source.each_line do |line|
        # ... some processing ...
        target.write( converter.iconv(line) )
      end
    end
    # Flush any state the converter is still holding.
    target << converter.iconv(nil)
  end
end

Iconv should deal with BOMs, stripping them out or adding them in where necessary. I'm not sure whether it will complain if it finds a BOM mid-stream (as you open your second and subsequent input files). If so, you could just instantiate a new Iconv for each input.

HTH
alex
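
Iconv was removed from the standard library in Ruby 2.0. On Ruby 1.9 or later, String#encode can perform the same ISO-8859-1 to UTF-8 conversion; this is a swapped-in technique, not the Iconv code above. A minimal sketch under that assumption, with a hypothetical helper name; each line is converted independently, so no converter state carries over from one input file to the next:

```ruby
# Hypothetical helper: concatenate ISO-8859-1 sources into one UTF-8 target.
# Assumes Ruby 1.9 or later (String#encode and String#force_encoding).
def combine_latin1_as_utf8(source_paths, target_path)
  File.open(target_path, "wb") do |target|
    source_paths.each do |path|
      File.open(path, "rb") do |source|
        source.each_line do |line|
          # ... some processing ...
          # Label the raw bytes as ISO-8859-1, then transcode to UTF-8.
          target.write(line.force_encoding("ISO-8859-1").encode("UTF-8"))
        end
      end
    end
  end
end
```

This sketch assumes every input really is ISO-8859-1; it does not look for or remove a BOM.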
 