Velocity Reviews - Computer Hardware Reviews


String problem

 
 
Fresh Mix
Guest
Posts: n/a
 
      05-03-2009
What's wrong?

# irb
irb(main):001:0> xxx = "лошадь"
=> "\320\273\320\276\321\210\320\260\320\264\321\214"
irb(main):002:0> xxx.length
=> 12
--
Posted via http://www.ruby-forum.com/.

 
 
 
 
 
Tom Cloyd
Guest
Posts: n/a
 
      05-03-2009
Fresh Mix wrote:
> What's wrong?
>
> # irb
> irb(main):001:0> xxx = "лошадь"
> => "\320\273\320\276\321\210\320\260\320\264\321\214"
> irb(main):002:0> xxx.length
> => 12
>

I assume you're wondering why each character appears to be represented
by two bytes - and I believe it's because the encoding is, of necessity,
UTF-8 or something very similar. If I recall correctly, this encoding is
designed to be able to represent the world's alphabets, etc., rather
than merely the limited character set used in western European
languages, and so two bytes must be used to allow for all the possibilities.
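The byte counts are easy to check. A minimal sketch, assuming Ruby 1.9 or later (not the 1.8.7 the OP is running). It shows that UTF-8 is variable-width rather than fixed at two bytes: ASCII takes one byte, Cyrillic letters such as those in "лошадь" take two, and other characters take three or four. The sample characters are arbitrary choices for illustration.

```ruby
# encoding: utf-8
# Assumes Ruby 1.9+; the magic comment above is only needed on 1.9,
# since Ruby 2.0+ reads source files as UTF-8 by default.
samples = ["a", "л", "€", "😀"]

samples.each do |ch|
  # Show each character, its UTF-8 byte count, and its bytes in octal,
  # the same notation irb used for "\320\273..." in the OP's output.
  octal = ch.bytes.map { |b| format("%o", b) }.join(" ")
  puts "#{ch}  #{ch.bytesize} byte(s)  #{octal}"
end
```

For "л" this prints 2 bytes, 320 273, which are the first two escapes in the OP's irb output.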

If I don't have this quite right (or right at all), I'm sure I'll be set
right by those who know more here (and they are legion!).

t.


--

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~
Tom Cloyd, MS MA, LMHC - Private practice Psychotherapist
Bellingham, Washington, U.S.A: (360) 920-1226
<< TomCloyd.com >> (website)
<< sleightmind.wordpress.com >> (mental health weblog)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~


 
 
 
 
 
Robert Klemme
Guest
Posts: n/a
 
      05-03-2009
On 03.05.2009 11:26, Tom Cloyd wrote:
> Fresh Mix wrote:
>> What's wrong?
>>
>> # irb
>> irb(main):001:0> xxx = "лошадь"
>> => "\320\273\320\276\321\210\320\260\320\264\321\214"
>> irb(main):002:0> xxx.length
>> => 12
>>

> I assume you're wondering why each character appears to be represented
> by two bytes - and I believe it's because the encoding is, of necessity,
> UTF-8 or something very similar. If I recall correctly, this encoding is
> designed to be able to represent the world's alphabets, etc., rather
> than merely the limited character set used in western European
> languages, and so two bytes must be used to allow for all the possibilities.
>
> If I don't have this quite right (or right at all), I'm sure I'll be set
> right by those who know more here (and they are legion!).


Actually, I don't consider myself an expert when it comes to encodings in Ruby.
But I believe there is one important bit of information missing that's
needed to properly answer the OP's question: what Ruby version did you use?

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
 
 
Fresh Mix
Guest
Posts: n/a
 
      05-03-2009
Robert Klemme wrote:
> what Ruby version did you use?



$ ruby -v
ruby 1.8.7 (2008-08-11 patchlevel 72) [x86_64-linux]

 
7stud --
Guest
Posts: n/a
 
      05-03-2009
Fresh Mix wrote:
>
> # irb
> irb(main):001:0> xxx = "лошадь"
> => "\320\273\320\276\321\210\320\260\320\264\321\214"
> irb(main):002:0> xxx.length
> => 12
>
> What's wrong?



In 1.8.* versions, ruby doesn't recognize unicode, where characters are
represented by multiple bytes. ruby thinks everything in an ascii
character where characters are represented by one byte.

Try this:

xxx = "лошадь"
puts xxx.length

--output:--
12

$KCODE = "u"
require 'jcode'

puts xxx.jlength

--output:--
6

xxx.each_char do |u|
  puts u
end

--output:--
л
о
ш
а
д
ь
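For comparison, a minimal sketch of the same check on Ruby 1.9 or later, where strings carry their own encoding and jcode is no longer needed (or available). Here `length` already counts characters, and `bytesize` reports the byte count that 1.8's `length` returned.

```ruby
# encoding: utf-8
# Assumes Ruby 1.9+; no $KCODE or jcode involved.
xxx = "лошадь"

puts xxx.length     # character count: 6
puts xxx.bytesize   # byte count, what 1.8's String#length reported: 12

# each_char yields whole characters, not single bytes.
xxx.each_char { |c| puts c }
```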




 
7stud --
Guest
Posts: n/a
 
      05-03-2009
7stud -- wrote:
>
> In 1.8.* versions, ruby doesn't recognize unicode, where characters are
> represented by multiple bytes. ruby thinks everything in an ascii
> character where characters are represented by one byte.
>


Corrections:

In 1.8.* versions, ruby doesn't recognize unicode, where characters [may be]
represented by multiple bytes. ruby thinks everything [is] an ascii
character where characters are represented by one byte.
 
7stud --
Guest
Posts: n/a
 
      05-03-2009
Robert Klemme wrote:
>
> But I believe there is one important bit of information missing that's
> needed to properly answer the OP's question: what Ruby version did you
> use?
>


Why is that relevant? Can unicode be switched off in ruby 1.9?

 
Robert Klemme
Guest
Posts: n/a
 
      05-03-2009
On 03.05.2009 13:27, 7stud -- wrote:
> Robert Klemme wrote:
>> But I believe there is one important bit of information missing that's
>> needed to properly answer the OP's question: what Ruby version did you
>> use?

>
> Why is that relevant? Can unicode be switched off in ruby 1.9?


It is relevant because handling of encodings has significantly changed
between 1.8 and 1.9, which I believe your other posting demonstrates.
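One way to see the difference from the 1.9+ side: the bytes of a string stay the same, but its encoding tag decides what counts as a character. A minimal sketch, assuming Ruby 1.9 or later, that reinterprets the OP's string as raw binary (ASCII-8BIT) to get roughly the 1.8 byte-oriented view, then switches it back.

```ruby
# encoding: utf-8
# Assumes Ruby 1.9+ (String#force_encoding, Encoding::BINARY).
xxx = "лошадь"

# Same bytes, relabelled as binary: length now counts bytes, as 1.8 did.
raw = xxx.dup.force_encoding(Encoding::BINARY)
puts raw.encoding
puts raw.length   # 12
puts xxx.length   # 6

# Relabelling as UTF-8 restores the character view; no bytes were changed.
back = raw.dup.force_encoding(Encoding::UTF_8)
puts back == xxx
```

`force_encoding` only changes the label on the string; it does not transcode. Converting between encodings is what `String#encode` is for.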

Cheers

robert
 
 
7stud --
Guest
Posts: n/a
 
      05-04-2009
Yukihiro Matsumoto wrote:
> Regular expressions in 1.8.* recognize UTF-8, EUC-JP, and Shift_JIS. So you
> can handle Unicode strings by using regular expressions.


Too vague.
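To make the regular-expression suggestion concrete, a minimal sketch: scanning with a UTF-8 regexp (the `u` flag) splits the string into characters, and `unpack("U*")` decodes it into code points. Both calls also exist in 1.8, so this is assumed, though not verified here, to count 6 on 1.8.7 as well as on current Ruby.

```ruby
# encoding: utf-8
xxx = "лошадь"

# /./u matches one UTF-8 character at a time instead of one byte.
chars = xxx.scan(/./u)
puts chars.length        # 6

# "U*" decodes the UTF-8 bytes into integer code points.
codepoints = xxx.unpack("U*")
puts codepoints.length   # 6
puts codepoints.map { |cp| format("U+%04X", cp) }.join(" ")
```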


James Gray wrote:
>
> And those interested in how all that works may find this series on my
> blog helpful:
>
> http://blog.grayproductions.net/arti...rstanding_m17n
>


Excellent website. <c-word here>

Here is something that is unclear:

----------
To use the jcode library, set $KCODE and then require the library.
Setting $KCODE first is important, and you will receive a warning if you
require jcode without setting it (as long as you took my advice and
turned *them* on)...

http://blog.grayproductions.net/arti..._jcode_library
---------

In the sentence:

-------
Setting $KCODE first is important, and you will receive a warning if you
require jcode without setting it (as long as you took my advice and
turned them on)...
--------

'it' and 'them' are pronouns, which should refer to nouns. The pronoun
'it' looks like it might refer to 'jcode' when 'it' actually refers to
'$KCODE'. That is pretty easy to sort out.

However, what does 'them' refer to? 'them' should refer to a plural
noun, so if you actually stop and try to sort it out rather than just
dismissing the whole paragraph in confusion, 'them' looks like it must
refer to '$KCODE' and 'jcode'. However, that doesn't make sense because
you don't 'set' jcode--you require jcode.

Apparently, 'them' refers to 'warning', which is not only grammatically
incorrect but also a very hard association to make. In any case, in
that sentence if you change 'it' and 'them' to $KCODE and 'warnings'
respectively, you will change a confusing and unreadable sentence into a
sentence whose clarity will be unmatched in modern literature:

-----
Setting $KCODE first is important, and you will receive a warning if you
require jcode without setting $KCODE (as long as you took my advice and
turned warnings on with -w)...
______

I'd bet that 90% of the readers of your article stop reading at that
exact spot.
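A minimal sketch of the ordering the quoted article describes (set $KCODE, then require jcode), guarded so the same file also loads on Ruby 1.9+, where jcode no longer exists. The `char_count` helper is hypothetical, for illustration only, and only its 1.9+ branch runs on a modern interpreter; the 1.8 branch is untested here.

```ruby
# encoding: utf-8
# On 1.8 only: set $KCODE *before* requiring jcode, as the article advises.
if RUBY_VERSION < "1.9"
  $KCODE = "u"
  require "jcode"
end

# Hypothetical helper: count characters rather than bytes on either version.
def char_count(str)
  if str.respond_to?(:encoding)   # 1.9+: strings know their encoding
    str.length
  else                            # 1.8: use jcode's jlength
    str.jlength
  end
end

puts char_count("лошадь")
```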
 
Aldric Giacomoni
Guest
Posts: n/a
 
      05-04-2009
7stud -- wrote:
> Yukihiro Matsumoto wrote:
>
>> Regular
>> expressions in 1.8.* recognize UTF-8, EUC-JP, and Shift_JIS. So you
>> can handle Unicode strings by using regular expressions.
>>

>
> Too vague.
>
>
> James Gray wrote:
>
>> And those interested in how all that works may find this series on my
>> blog helpful:
>>
>> http://blog.grayproductions.net/arti...rstanding_m17n
>>
>>

>
> [...]
>
> I'd bet that 90% of the readers of your article stop reading at that
> exact spot.
>

Alright, so the man who created Ruby doesn't write well enough for you,
and neither does one of the big guys in the community. Maybe you'd like
to offer yourself as proofreader for them, instead of programmer for
others? I liked James' series of articles and was able to read it just
fine... And English is my 4th language.

-- Aldric
 