toylet wrote:
> You meant length() would react to unicode settings in Perl?
Yes. UTF-8 encoding uses 8, 16, 24, or 32 bits per character, so a
string's length in characters can differ from its length in bytes.
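
To make the 8/16/24/32-bit point concrete, here is a minimal sketch in C (not Perl) of how many bytes UTF-8 needs for a single code point. The function name utf8_len is made up for illustration, and the ranges assume the modern UTF-8 definition, which stops at U+10FFFF and excludes surrogates.

```c
#include <stdint.h>

/* Hypothetical helper: number of bytes UTF-8 uses for one Unicode
 * code point, or 0 if the value cannot be encoded. */
static int utf8_len(uint32_t cp)
{
    if (cp < 0x80)
        return 1;               /* 8 bits: plain ASCII */
    if (cp < 0x800)
        return 2;               /* 16 bits */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;               /* surrogates are not valid characters */
    if (cp < 0x10000)
        return 3;               /* 24 bits */
    if (cp < 0x110000)
        return 4;               /* 32 bits */
    return 0;                   /* beyond the Unicode range */
}
```

For example, utf8_len(0x41) is 1 for "A", while utf8_len(0x20AC) is 3 for the euro sign, so a one-character string can occupy three bytes.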
>> It is important to stop thinking of characters as matching C's char
>> type, and to stop thinking of C's char type always being 8 bits (even
>> though a char is always a byte).
>
> I think one byte always equals 8 bits. All computer courses teach
> that. A 9-bit byte? What machines do that?
Three of the first five computers connected to the ARPANET were
36-bit computers. They used 7-bit ASCII for regular text,
SIXBIT for COBOL data, and strings of 5-bit codes for FORTRAN error
messages. When talking to other computers, the PDP-10 used
8-bit bytes, 9-bit bytes, 12-bit bytes, 16-bit bytes and 18-bit bytes.
A byte is defined to be a contiguous set of bits. When talking
about 8-bit bytes, the proper term is "octet".
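
A rough sketch in C of what "a contiguous set of bits" means in practice. It simulates a 36-bit word in the low 36 bits of a uint64_t. The names bits_per_char, load_byte, and pack_ascii5 are invented for this example, and the layout (five 7-bit characters left-justified, lowest bit unused) is the usual PDP-10 text convention as I understand it, not something to take as a spec.

```c
#include <limits.h>
#include <stdint.h>

/* Bits in this platform's C byte. The C standard only promises
 * at least 8; it does not promise exactly 8. */
static int bits_per_char(void)
{
    return CHAR_BIT;
}

/* Hypothetical helper: fetch a "byte" of `width` bits whose lowest
 * bit lies `pos` bits above the right-hand end of a 36-bit word.
 * Assumes pos + width <= 36. */
static uint64_t load_byte(uint64_t word36, unsigned pos, unsigned width)
{
    uint64_t mask = (1ULL << width) - 1;
    return (word36 >> pos) & mask;
}

/* Hypothetical helper: pack five 7-bit ASCII characters into one
 * 36-bit word, left-justified, leaving the lowest bit unused. */
static uint64_t pack_ascii5(const char s[5])
{
    uint64_t w = 0;
    for (int i = 0; i < 5; i++)
        w = (w << 7) | ((uint64_t)s[i] & 0x7F);
    return w << 1;
}
```

With this layout, load_byte(pack_ascii5("HELLO"), 29, 7) gives back 'H' and load_byte(pack_ascii5("HELLO"), 1, 7) gives back 'O'. The same load_byte call works for 6-bit, 9-bit, or 18-bit bytes just by changing the width.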
-Joe
http://www.inwap.com/pdp10/