dump the real string

 
 
toylet
 
      02-26-2004
Tad McClellan wrote:
>>> printf ".%02x.\n", ord foreach (split //, $line);

>> can I also use split(/\b/,$line)?

> What happened when you tried it?


It worked as well. So is it the default separator used by split()?


--
.~. Might, Courage, Vision. In Linux We Trust.
/ v \ http://www.linux-sxs.org
/( _ )\ Linux 2.4.22-xfs
^ ^ 10:18am up 16:01 1 user 1.02 1.00
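
A minimal sketch of how the two patterns differ, assuming a sample line "ab cd\n" that is not from the thread. An empty pattern yields one element per character, while /\b/ splits only at word boundaries, and ord() then looks at just the first character of each chunk. Neither is split()'s default: called with no arguments, split() splits $_ on whitespace.

```perl
use strict;
use warnings;

# Hypothetical sample line, chosen only to make the difference visible.
my $line = "ab cd\n";

# Empty pattern: one list element per character.
my @chars = split //, $line;

# Word-boundary pattern: multi-character chunks such as "ab" survive.
my @chunks = split /\b/, $line;

# ord() returns the code of the FIRST character of its argument only,
# so the dump of @chunks silently skips the "b" and the "d".
my @hex_chars  = map { sprintf '%02x', ord($_) } @chars;
my @hex_chunks = map { sprintf '%02x', ord($_) } @chunks;

print "split //   : @hex_chars\n";
print "split /\\b/ : @hex_chunks\n";
```

With input made only of single-character words, both dumps can look identical, which may be why the /\b/ version appeared to work.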
 
toylet
 
      02-26-2004
>>> That's wrong. length() will return the number of characters in the string.
>>> This is totally different from the length of the string in bytes.

>> You meant each char in a Perl string is not stored as one byte?

> You mean ASCII doesn't work for Chinese?


I was talking about length(). Where is the connection between
Chinese characters and length()?

All computer data are referenced as 8-bit bytes these days.


Martien Verbruggen
 
      02-26-2004
On Thu, 26 Feb 2004 10:22:30 +0800,
toylet wrote:

[please leave attribution in place]

>>>> That's wrong. length() will return the number of characters in the string.
>>>> This is totally different from the length of the string in bytes.


>>> You meant each char in a Perl string is not stored as one byte?


>> You mean ASCII doesn't work for Chinese?

>
> I was talking about the length(). Where is the connection between
> Chinese characters and length()?


length() gives the length of a string in characters. Chinese
characters are not stored in 8-bit bytes.

> All computer data are referenced as 8-bit bytes these days.


Nonsense. And what a particular machine/implementation calls a "byte"
has very little to do with characters.

Martien
--
|
Martien Verbruggen |
Trading Post Australia | 42.6% of statistics is made up on the spot.
|
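
A minimal sketch of the character/byte distinction, assuming the data arrives as UTF-8 (the thread does not say which encoding is in play). The six octets are an illustrative sample; the core Encode module does the decoding.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Six octets as they might arrive from a UTF-8 file or socket
# (the UTF-8 encoding of U+4E2D followed by U+6587).
my $octets = "\xE4\xB8\xAD\xE6\x96\x87";

# Undecoded, Perl sees six characters, one per octet.
my $len_undecoded = length $octets;

# Decoded, the same data is two characters.
my $string      = decode('UTF-8', $octets);
my $len_decoded = length $string;

print "undecoded: $len_undecoded, decoded: $len_decoded\n";
```

So length() always counts characters; what changes is whether the string holds raw octets or decoded text.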
 
 
toylet
 
      02-26-2004
> length() gives the length of a string in characters. Chinese
> characters are not stored in 8-bit bytes.


What is a character in Perl's sense?

Under the BIG5 character encoding, each Chinese character
is stored as two bytes. One byte always equals 8 bits anyway.

> Nonsense. And what a particular machine/implementation calls a "byte"
> has very little to do with characters.


I think we need to define "character".
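
One working definition, offered as a sketch and not a full answer: in a decoded Perl string, a character is an integer code point, and that integer is not limited to 0..255. The variable names below are illustrative only.

```perl
use strict;
use warnings;

my $ascii = 'A';
my $han   = chr 0x4E2D;    # a single character whose code point is above 255

my $ascii_code = ord $ascii;
my $han_code   = ord $han;
my $han_len    = length $han;

# Only numbers are printed, so no output encoding layer is needed here.
printf "U+%04X is %d character(s) long\n", $han_code, $han_len;
printf "U+%04X is %d character(s) long\n", $ascii_code, length $ascii;
```

How many bytes such a character occupies is a separate question that only has an answer once an encoding is chosen.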

 
Martien Verbruggen
 
      02-26-2004
On Thu, 26 Feb 2004 11:02:03 +0800,
toylet wrote:
>> length() gives the length of a string in characters. Chinese
>> characters are not stored in 8-bit bytes.

>
> What is a character in Perl's sense?


There is no simple and easy answer to that.

I think your question is probably best answered by referring you to
the perluniintro and perlunicode documentation (which come with
Perl); specifically the section titled "Byte and Character
semantics", and to advise you to read up on unicode and the various
encoding schemes that come with it.

> Under the BIG5 character encoding, each Chinese character
> is stored as two bytes. One byte always equals 8 bits anyway.


No, it does not. An octet is 8 bits. The term "byte" is
context-sensitive and fluid. It could be 9 bits, or it could be 16 or
32 bits behind the scenes. It is _generally_ 8 bits, and in certain
contexts it is always 8 bits, but this is certainly not a given in all
contexts. Wherever, for example, a byte refers to the underlying C
type char, it will be whatever the size of that type is.

>> Nonsense. And what a particular machine/implementation calls a "byte"
>> has very little to do with characters.

>
> I think we need to define "character".


See above.

I am assuming your thinking stems from an "in C the char type is a
character" background?

It is important to stop thinking of characters as matching C's char
type, and to stop thinking of C's char type always being 8 bits (even
though a char is always a byte).

Neither is true. Not even in C.

Martien
--
|
Martien Verbruggen | Useful Statistic: 75% of the people make up
Trading Post Australia | 3/4 of the population.
|
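
The width of C's char on a given platform can be checked from Perl itself. This is a small sketch using the CHAR_BIT constant exported by the core POSIX module; it reports the value for the C implementation your perl was built with, which on mainstream hardware today will almost certainly be 8.

```perl
use strict;
use warnings;
use POSIX qw(CHAR_BIT);

# CHAR_BIT comes from the platform's <limits.h>. The C standard only
# guarantees that it is at least 8, not that it is exactly 8.
my $bits_per_char = CHAR_BIT;

printf "A C char on this build is %d bits wide\n", $bits_per_char;
```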
 
 
Jürgen Exner
 
      02-26-2004
toylet wrote:
>> That's wrong. length() will return the number of characters in the
>> string. This is totally different from the length of the string in
>> bytes.

>
> You meant each char in a Perl string is not stored as one byte?


I meant that in general it is not possible to store every character in a
single byte. Actually the vast majority of characters in the more commonly
spoken languages typically require at least two bytes to store them.

jue


 
 
Jürgen Exner
 
      02-26-2004
toylet wrote:
>>>> That's wrong. length() will return the number of characters in the
>>>> string. This is totally different from the length of the string in
>>>> bytes.
>>> You meant each char in a Perl string is not stored as one byte?

>> You mean ASCII doesn't work for Chinese?

>
> I was talking about the length(). Where is the connection between
> Chinese characters and length()?


Maybe that a text in Chinese with 20 characters typically requires 40 bytes
to be stored?
So what do you want to know? The length of the string in characters or the
size of the allocated memory? You were asking for the memory size.

> All computer data are referenced as 8-bit bytes these days.


Which means 256 distinct values which means there is just no way to encode
those tens of thousands of Chinese characters in one single byte.

jue
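
The "20 characters, 40 bytes" figure depends on the encoding. A hedged sketch with the core Encode module follows, using twenty copies of U+4E2D as a stand-in text and assuming the 'big5-eten' encoding that ships with Encode is installed.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Twenty copies of U+4E2D stand in for a 20-character Chinese text.
my $text = "\x{4E2D}" x 20;

# Count the octets produced by a few encodings of the same text.
my %octets;
for my $encoding ('big5-eten', 'UTF-16BE', 'UTF-8') {
    $octets{$encoding} = length encode($encoding, $text);
}

printf "%d characters\n", length $text;
printf "%-10s %d octets\n", $_, $octets{$_} for sort keys %octets;
```

Big5 and UTF-16 use two octets for this character and UTF-8 uses three, while length() on the decoded text stays at 20 throughout.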


 
 
toylet
 
      02-26-2004
>>> length() gives the length of a string in characters. Chinese
>>> characters are not stored in 8-bit bytes.

>> What is a character in Perl's sense?

> There is no simple and easy answer to that.
> I think your question is probably best answered by referring you to
> the perluniintro and perlunicode documentation (which come with
> Perl); specifically the section titled "Byte and Character
> semantics", and to advise you to read up on unicode and the various
> encoding schemes that come with it.


You meant length() would react to unicode settings in Perl?

> It is important to stop thinking of characters as matching C's char
> type, and to stop thinking of C's char type always being 8 bits (even
> though a char is always a byte).


I think one byte always equals 8 bits. All computer courses teach
that. A 9-bit byte? What machines do that?


 
toylet
 
      02-26-2004
> Maybe that a text in Chinese with 20 characters typically requires 40 bytes
> to be stored?
> So what do you want to know? The length of the string in characters or the
> size of the allocated memory? You were asking for the memory size.


I didn't expect my question on displaying the bytes in a string would
end up in a discussion of multilingual issues.

> Which means 256 distinct values which means there is just no way to encode
> those tens of thousands of Chinese characters in one single byte.


Of course.
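
The two topics meet as soon as the string holds anything beyond Latin-1. A sketch of the original dump, done once per character and once per octet, assuming UTF-8 as the output encoding and an illustrative sample line:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sample: an ASCII letter, one Chinese character, a newline.
my $line = "A\x{4E2D}\n";

# Per character: ord() yields code points, which can exceed 0xff.
my $by_char = join ' ', map { sprintf '%02x', ord($_) } split //, $line;

# Per octet: encode first, then unpack two hex digits per octet.
my $by_octet = join ' ', unpack '(H2)*', encode('UTF-8', $line);

print "characters: $by_char\n";
print "octets:     $by_octet\n";
```

For pure ASCII input the two dumps are identical, so the split-and-ord approach from earlier in the thread is fine in that case.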

 
Martien Verbruggen
 
      02-26-2004
On Thu, 26 Feb 2004 12:40:51 +0800,
toylet wrote:
>>>> length() gives the length of a string in characters. Chinese
>>>> characters are not stored in 8-bit bytes.
>>> What is a character in Perl's sense?

>> There is no simple and easy answer to that.
>> I think your question is probably best answered by referring you to
>> the perluniintro and perlunicode documentation (which come with
>> Perl); specifically the section titled "Byte and Character
>> semantics", and to advise you to read up on unicode and the various
>> encoding schemes that come with it.

>
> You meant length() would react to unicode settings in Perl?


Have you read the documentation?

>> It is important to stop thinking of characters as matching C's char
>> type, and to stop thinking of C's char type always being 8 bits (even
>> though a char is always a byte).

>
> I think one byte always equals 8 bits. All computer courses teach
> that. A 9-bit byte? What machines do that?


Various PDP architectures do that, the 36-bit ones in particular. There
are other architectures that use larger, power-of-two bytes.

Use Google, or ask in a Usenet group that talks about these sorts of
things all the time. I'm done with this subject.

Martien
--
|
Martien Verbruggen | Think of the average person. Approximately
Trading Post Australia | half of the people out there are dumber.
|
 