On Thu, Jul 2, Eileen inscribed on the eternal scroll:
> Sorry, I left out the first part of the script. Here's the full
> script:
>
> #!/usr/local/bin/perl -w
We also recommend "use strict;" around here. Take advantage of all of
Perl's opportunities for helping you identify mistakes.
> $file = "kono.xml";
  ^
  my
> open (IN, $file) or die "cannot open $file\n";
Don't omit "$!" from the error report: it helps to understand the
reason for the failure.
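
A minimal sketch of the idea, assuming the same file name as in the
quoted script. The helper name open_or_explain, the lexical filehandle
and the three-argument open are my own choices for illustration, not
part of the original script:

```perl
use strict;
use warnings;

# Hypothetical helper: returns (handle, undef) on success, or
# (undef, message) on failure. The message includes $!, which holds
# the system's reason for the failed open.
sub open_or_explain {
    my ($file) = @_;
    open(my $fh, '<', $file)
        or return (undef, "cannot open $file: $!");
    return ($fh, undef);
}

my ($in, $err) = open_or_explain('kono.xml');
print STDERR "$err\n" if defined $err;
```

In a short script you would more usually just write the open with
'or die "cannot open $file: $!\n"' on the same statement; the helper
is only there so that the failure message can be inspected.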
> I didn't realize you could specify the encoding of a file in Perl.
Another good reason to [check that you're using at least version
5.8.0 and] take a few moments out to read the introduction to the
new support for Unicode. (In earlier Perls you'd need to explicitly
invoke the relevant module to do this stuff.)
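
As a rough illustration of specifying the encoding on open (Perl 5.8.0
or later), here is a sketch that writes a two-byte temporary file and
reads it back through an encoding layer. The choice of UTF-16LE and the
temporary file are assumptions made for the demonstration; I don't know
what encoding kono.xml really uses:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small test file containing the raw bytes 3A 26,
# which is U+263A encoded as UTF-16LE.
my ($tmp, $name) = tempfile(UNLINK => 1);
binmode($tmp);
print {$tmp} "\x3a\x26";
close($tmp) or die "cannot close $name: $!\n";

# Read it back as text, telling Perl the external encoding.
open(my $in, '<:encoding(UTF-16LE)', $name)
    or die "cannot open $name: $!\n";
my $text = do { local $/; <$in> };   # slurp the whole file
close($in);

# Internally we now have one character, known by its code point.
printf "%d character(s), code point U+%04X\n", length($text), ord($text);
```

For a real file you would substitute its actual name and encoding in
the open call.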
> the \x{0x0D00} was identified by one of my Unicode editors, and was a
> stab in the dark on my part
But what have you learned from the experience?
- if you are reading text, and have properly defined the encoding,
then internally your characters can be referenced by their Unicode
code point values, _not_ by their externally-encoded bit patterns.
- if, on the other hand, you are reading the data as a bunch of bytes
(i.e. effectively "as binary") then you'd need to handle the byte-pairs
as byte-pairs, not as Unicode characters. This is not to be
recommended in current versions of Perl (unless your data is somehow
defective, and you have to write a fixup routine of some kind).
- the new notation, e.g. \x{263a}, denotes a _wide Unicode character_ in
Perl's native Unicode representation. That value is the Unicode code
point (in this case the smiley, "U+263A" as the Unicode Consortium's
notation would write it). Don't confuse it with the external coding
representation, which (_if_ you had read UTF-16LE data in binary
format, which I don't recommend) would have been \x3a\x26.
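
To make the difference between code point and external bytes concrete,
here is a small sketch using the Encode module that ships with Perl
5.8. The variable names are only illustrative:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# One character, referred to by its Unicode code point.
my $smiley = "\x{263a}";

# Its external representation in UTF-16LE: a pair of bytes.
my $bytes = encode('UTF-16LE', $smiley);

printf "characters: %d, bytes: %d\n", length($smiley), length($bytes);
printf "bytes in hex: %s\n",
    join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;

# Decoding the byte pair gives back the single character.
my $back = decode('UTF-16LE', $bytes);
printf "round trip code point: U+%04X\n", ord($back);
```

The hex line should show 3a 26, i.e. the low byte first, which is why
the byte pattern looks "backwards" compared with the code point.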
Hope this helps.
(You'd also be advised to take a read of
http://web.presby.edu/~nnqadmin/nnq/nquote.html )
P.S. I have the impression that the regulars around here have nominated
me by default as the character encoding spokesman. I must admit that
I'm sometimes at the edge of my expertise, so I _do_ hope they're
watching closely, and will pounce as necessary if I say something
wrong or explain it badly...