A question about Iconv arguments

 
 
Axel Etzold
      06-09-2007
Dear all,

I need to convert some accented text, and I would like to know
what arguments I have to give Iconv to produce the desired output.
E.g., in Italian, the word for Friday is "venerdi", where the
final "i" carries a grave accent (ì).
If you type this into the Italian Wikipedia search
(which I believe uses UTF-8 encoding),
it will load:

http://it.wikipedia.org/wiki/Venerd%c3%ac ,

yet this syntax:

converted_doc = Iconv.new(output_encoding, input_encoding).iconv(doc)

gives me "venerd\303\254" when I convert from latin1 encoding.

What arguments do I have to use?

Thank you,

Best regards,

Axel




Alex Young
      06-09-2007
Axel Etzold wrote:
> Dear all,
>
> I need to convert some accented text, and I would like to know
> what arguments I have to give Iconv to produce the desired output.
> E.g., in Italian, the word for Friday is "venerdi", where the
> "i" carries a dash (small i with grave accent).
> If you type this into Wikipedia search in Italian
> (which I believed to be in utf-8 encoding),
> it will load:
>
> http://it.wikipedia.org/wiki/Venerd%c3%ac ,
>
> yet this syntax:
>
> converted_doc = Iconv.new(output_encoding, input_encoding).iconv(doc)
>
> gives me "venerd\303\254" when I convert from latin1 encoding.

That looks right to me - if I write that into a UTF-8 HTML document, it
displays correctly. What are you expecting?

--
Alex
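
For reference, "venerd\303\254" is how Ruby 1.8 displays non-ASCII bytes: \303 and \254 are octal escapes for the bytes 0xC3 and 0xAC, the same two bytes that appear as %c3%ac in the Wikipedia URL. A minimal sketch to check this, assuming Ruby 2.0 or later (it uses String#encode in place of Iconv, which left the standard library in Ruby 2.0); the variable names are illustrative only:

```ruby
# Build the Latin-1 input from an explicit byte so the source file's encoding does not matter.
latin1 = "venerd\xEC".dup.force_encoding(Encoding::ISO_8859_1)

# Same conversion as Iconv.new('utf-8', 'latin1').iconv(doc).
utf8 = latin1.encode(Encoding::UTF_8)

# The accented letter becomes two bytes in UTF-8.
accent_bytes = utf8.bytes.last(2)

octal_form   = accent_bytes.map { |b| format("\\%o", b) }.join
percent_form = accent_bytes.map { |b| format("%%%02X", b) }.join

puts octal_form    # \303\254
puts percent_form  # %C3%AC
```

Both lines describe the same byte pair, so the conversion itself is already correct; only the notation differs.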

 
Axel Etzold
      06-10-2007
Dear Alex,

thank you for responding.
If I try to get a webpage that has accents in its address,
like

> require "rubygems"
> require "rio"
> require 'iconv'
> output_encoding = 'utf-8'
> doc="Venerdì"
> converted_doc = Iconv.new(output_encoding, 'latin1').iconv(doc)
> rio("http://www.wikipedia.org/wiki/" + converted_doc)>rio("a.html")


I get an error message:

/usr/local/lib/ruby/1.8/uri/common.rb:436:in `split': bad URI(is not URI?): http://www.wikipedia.org/wiki/venerdì (URI::InvalidURIError)
from /usr/local/lib/ruby/1.8/uri/common.rb:485:in `parse'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/withpath.rb:285:in `uri_from_string_'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/uri.rb:74:in `arg0_info_'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/uri.rb:83:in `init_from_args_'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/uri.rb:56:in `initialize'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/base.rb:80:in `new'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/base.rb:80:in `parse'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/builder.rb:111:in `build'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/factory.rb:412:in `create_state'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio.rb:65:in `initialize'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio.rb:76:in `new'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio.rb:76:in `rio'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/kernel.rb:42:in `rio'


This doesn't happen if I type in:

rio("http://www.wikipedia.org/wiki/Venerd%C3%AC")>rio("a.html")

So I need to know what conversion arguments I need to give Iconv to
turn "Venerdì" into "Venerd%C3%AC".

Best regards,

Axel
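
The error above comes from the URI parser rather than from Iconv: the conversion to UTF-8 succeeds, but a URI may contain only ASCII characters, so the non-ASCII bytes additionally have to be percent-encoded. A small sketch of the difference, assuming a current Ruby with the standard `uri` library; `parses?` is a hypothetical helper, not part of rio or uri:

```ruby
require "uri"

raw     = "http://www.wikipedia.org/wiki/venerd\u00EC"   # path contains a non-ASCII character
escaped = "http://www.wikipedia.org/wiki/Venerd%C3%AC"   # same path, percent-encoded

# Hypothetical helper: true if URI.parse accepts the string.
def parses?(url)
  URI.parse(url)
  true
rescue URI::InvalidURIError
  false
end

puts parses?(raw)      # false
puts parses?(escaped)  # true
```

So no choice of Iconv arguments produces "Venerd%C3%AC"; a separate escaping step is needed after the conversion.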

 
Axel Etzold
      06-10-2007
I've managed to solve this problem like this:

require "rubygems"
require "rio"
require 'iconv'


def to_hex(number)
  number=number.abs
  binary=''
  while number>0
    digit=number%16
    if digit<10
      binary<<digit.to_s
    elsif digit==10
      binary<<'A%'
    elsif digit==11
      binary<<'B%'
    elsif digit==12
      binary<<'C%'
    elsif digit==13
      binary<<'D%'
    elsif digit==14
      binary<<'E%'
    elsif digit==15
      binary<<'F%'
    end
    number=(number-digit)/16
  end
  return binary.reverse.gsub(/%([A-F])%([A-F])/,'%\1\2')
end

class String
  def wiki_addr
    converted_doc = Iconv.new('utf-8', 'latin1').iconv(self)
    res=''
    converted_doc.split(//).each{|x|
      if /[a-zA-Z0-9\_ ]/.match(x)
        res<<x
      else
        res<<to_hex(x[0])
      end
    }
    return res
  end
end


doc ="venerdì"
doc.wiki_addr
rio("http://it.wikipedia.org/wiki/"+ doc.wiki_addr)>rio("a.html")

Best regards,

Axel

 
Stefan Rusterholz
      06-10-2007
Axel Etzold wrote:
> I've managed to solve this problem like this:
>
> require "rubygems"
> require "rio"
> require 'iconv'
>
>
> def to_hex(number)
> number=number.abs
> binary=''
> while number>0
> digit=number%16
> if digit<10
> binary<<digit.to_s
> elsif digit==10
> ...


I guess you're not aware of either:
1234.to_s(16)
or:
"%x" % 1234

For situations like the above, even a lookup-array or a case/when would
be better.

Regards
Stefan
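
Either call replaces the whole digit-by-digit loop. As a sketch of how a format string could produce the percent-escapes directly, assuming Ruby 2.4 or later; `percent_encode_byte` and `wiki_escape` are hypothetical names, and the set of characters left unescaped is only loosely based on the earlier post:

```ruby
puts 1234.to_s(16)  # 4d2
puts "%x" % 1234    # 4d2

# Hypothetical helper: one byte -> "%XX" with two uppercase hex digits.
def percent_encode_byte(byte)
  format("%%%02X", byte)
end

# Hypothetical helper: keep ASCII letters, digits and "_" as they are,
# percent-encode every other byte of the UTF-8 form.
def wiki_escape(text)
  text.encode(Encoding::UTF_8).bytes.map { |byte|
    char = byte.chr
    char.match?(/\A[A-Za-z0-9_]\z/) ? char : percent_encode_byte(byte)
  }.join
end

puts wiki_escape("venerd\u00EC")  # venerd%C3%AC
puts percent_encode_byte(0x2D)    # %2D
```

The zero-padded %02X also covers bytes whose hex form mixes a digit and a letter, such as 0x2D, for which the hand-written loop appears to return "2%D".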

Axel Etzold
      06-10-2007
Dear Stefan,

thank you for bringing this to my attention!
(Slightly varying Voltaire, I might
have been able to write a shorter
program had I had more leisure and
more knowledge).
I'll try your suggestion.
Best regards,

Axel

 
Nobuyoshi Nakada
      06-11-2007
Hi,

At Sun, 10 Jun 2007 18:05:49 +0900,
Axel Etzold wrote in [ruby-talk:254981]:
> I've managed to solve this problem like this:


$ ruby -riconv -rcgi -e 'puts CGI.escape(Iconv.conv("utf-8", "latin1", "venerd\354"))'
venerd%C3%AC

--
Nobu Nakada
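
On Ruby 2.0 and later Iconv is no longer part of the standard library. A rough equivalent of the one-liner above, assuming String#encode for the charset conversion and URI.encode_www_form_component as a stand-in for CGI.escape (both escape a space as "+", which may not be what a wiki path wants):

```ruby
require "uri"

# "\xEC" is the same Latin-1 byte as "\354" in the one-liner above.
latin1 = "venerd\xEC".dup.force_encoding(Encoding::ISO_8859_1)

# Step 1: Latin-1 -> UTF-8 (the job Iconv.conv did).
utf8 = latin1.encode(Encoding::UTF_8)

# Step 2: percent-encode the non-ASCII bytes (the job CGI.escape did).
escaped = URI.encode_www_form_component(utf8)

puts escaped  # venerd%C3%AC
```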

 