Alan J. Flavell wrote:
[re UTF-8 in perl scripts]
> It "works", yes, but (as I understand it, anyway) I think you have to
> ask for it. It could just be that if you call for locale-awareness
> with -CL, and you have utf-8 in your locale, it will come out in the
> wash; but I don't see any harm in asking for it directly, if you're so
> certain that you'll never not want it (sorry for the double-negative).
I also left the L off of -C because I don't think I have my locale completely
coerced to UTF-8.
>>So, things are a little unclear. I put in both,
>
> Looks as if you're (a) right and (b) unlikely to cause any harm.
Sigh, now it starts getting weird. Kind of long, summary at the bottom.
The script with -CSD and use utf8 created a database,
and a test script pulled the records out of the database
and printed them. The non-ASCII characters rendered
correctly BUT that doesn't mean anything, since the test
script had the same -CSD and use utf8. (Right?)
So I figured I needed to eyeball the inside of the DB file,
find some non-ASCII text, and see how it was encoded.
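A minimal Perl sketch of that check, assuming the text is stored uncompressed in the DB file so its bytes can be scanned directly. The helper name non_ascii_bytes and the script name dumpbytes.pl are made up for illustration. If "é" shows up as C3 A9 it was written as UTF-8; a lone E9 would suggest Latin-1.

```perl
use strict;
use warnings;

# Hypothetical helper: return [offset, hex] for every byte >= 0x80 in a
# byte string, e.g. "caf\xC3\xA9" gives [3, 'C3'] and [4, 'A9'].
sub non_ascii_bytes {
    my ($bytes) = @_;
    my @found;
    while ($bytes =~ /([\x80-\xFF])/g) {
        push @found, [ pos($bytes) - 1, sprintf('%02X', ord $1) ];
    }
    return @found;
}

# Usage (hypothetical script name): perl dumpbytes.pl wgroleau.DB
if (@ARGV) {
    my $file = $ARGV[0];
    open my $fh, '<', $file or die "Can't open $file: $!";
    binmode $fh;                             # read raw bytes, no decoding
    my $data = do { local $/; <$fh> };       # slurp the whole file
    close $fh;
    $data = '' unless defined $data;         # empty file
    printf "%8d: %s\n", @$_ for non_ascii_bytes($data);
}
```

The file is opened without any decoding layer on purpose: the point is to see the bytes on disk, not what a UTF-8-aware reader turns them into.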
But a series of unfortunate events resulted in my having
to re-create the script, and then it crashed (bus error
or segmentation fault). Figured out which record it
was crashing on, put it in its own file, and...
well, to skip over the long, tedious details, I eventually
had a version of the script that would crash and one that
would not crash on the same input file.
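Isolating the crashing record can be sketched like this, assuming GEDCOM input where each record starts with a level-0 line ("0 HEAD", "0 @I1@ INDI", ...). The helper split_gedcom_records and the record.NNNN.GED output names are hypothetical. A single record is not a complete GEDCOM file (no HEAD/TRLR), so this only helps if the script under test tolerates that.

```perl
use strict;
use warnings;

# Hypothetical helper: split GEDCOM text into records. A record starts at
# each level-0 line ("0 HEAD", "0 @I1@ INDI", ...) and runs until the next.
sub split_gedcom_records {
    my ($text) = @_;
    my @records;
    for my $line (split /^/, $text) {        # split keeps each line's "\n"
        if ($line =~ /^\s*0\s/ || !@records) {
            push @records, $line;            # start a new record
        } else {
            $records[-1] .= $line;           # continuation of current record
        }
    }
    return @records;
}

# Usage: perl splitged.pl input.GED
# writes record.0001.GED, record.0002.GED, ... (names are made up)
if (@ARGV) {
    my $file = $ARGV[0];
    open my $in, '<', $file or die "Can't open $file: $!";
    binmode $in;                             # keep bytes exactly as they are
    my $text = do { local $/; <$in> };
    close $in;
    my $n = 0;
    for my $rec (split_gedcom_records(defined $text ? $text : '')) {
        my $name = sprintf 'record.%04d.GED', ++$n;
        open my $out, '>', $name or die "Can't write $name: $!";
        binmode $out;
        print $out $rec;
        close $out or die "Can't close $name: $!";
    }
}
```

Feeding each output file to the script in turn then narrows the crash down to one record.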
'diff' showed only one difference:
wgroleau$ diff ~/bin/GEDCOM_DB ./tempGCDB
1c1
< #!/usr/bin/perl -w -CSD
---
> #!/usr/bin/perl -w -CSD
od -xc revealed that the extra space is indeed a regular space (hex 20)
and not a UTF-8 construct.
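od -xc does the job; a Perl sketch that reports only the trailing bytes of the #! line would look roughly like this (describe_shebang is a hypothetical name). It also flags tabs and carriage returns, since a CR left by DOS line endings on the #! line is another classic source of trouble.

```perl
use strict;
use warnings;

# Hypothetical helper: show a #! line plus the hex codes of any trailing
# spaces, tabs, or carriage returns, which are otherwise invisible.
sub describe_shebang {
    my ($line) = @_;
    $line =~ s/\n\z//;                           # drop the newline itself
    my ($trailing) = $line =~ /([ \t\r]*)\z/;    # may be empty
    my $codes = length $trailing
        ? join(' ', map { sprintf '%02X', ord } split //, $trailing)
        : 'none';
    return $line . " [trailing: $codes]";
}

# Usage: perl shebang.pl ~/bin/GEDCOM_DB ./tempGCDB
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Can't open $file: $!";
    binmode $fh;
    my $first = <$fh>;
    close $fh;
    print $file, ': ', describe_shebang(defined $first ? $first : ''), "\n";
}
```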
More study showed that the space made a difference on the only
two systems I currently have access to:
wgroleau$ uname -a
Darwin Groleau.local 7.7.0 Darwin Kernel Version 7.7.0: Sun Nov 7 16:06:51 PST 2004; root:xnu/xnu-517.9.5.obj~1/RELEASE_PPC Power Macintosh powerpc
wgroleau$ perl -v
This is perl, v5.8.1-RC3 built for darwin-thread-multi-2level
(with 1 registered patch, see perl -V for more detail)
Copyright 1987-2003, Larry Wall
AND
[0:ag/g/groleau> uname -a
NetBSD otaku 1.6.2_STABLE NetBSD 1.6.2_STABLE (sdf) #0: Sun Jul 25 04:17:09 UTC 2004 root@ol:/var/src/src/sys/arch/alpha/compile/sdf alpha
[0:ag/g/groleau> perl -v
This is perl, v5.8.0 built for alpha-netbsd
Copyright 1987-2002, Larry Wall
On Darwin/PPC, the extra space prevents the bus error/segmentation fault.
On NetBSD/Alpha, it prevents the following:
[0:ag/g/groleau> rm wgroleau.DB; ./tempGCDB < bad.record.GED
Recompile perl with -DDEBUGGING to use -D switch
Can't emulate -S on #! line at ./tempGCDB line 1.
[255:ag/g/groleau> head -1 ./tempGCDB
#!/usr/pkg/bin/perl -w -CSD
Summary: On two different platforms, in
#!/usr/bin/perl -w -CSD
the extra space is required.
If anyone wants to try it on a different system, I can provide
the script and the input file.
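One way to sidestep #! switch parsing altogether is to ask for the same behaviour from inside the script. This is a sketch, not something tested against the crash above: per perlrun, -CS covers STDIN/STDOUT/STDERR and -CD sets the default PerlIO layers for input and output streams, so binmode on the standard handles plus the open pragma should give roughly the same effect. One difference: the open pragma is lexically scoped, so it only affects open() calls in this file, while -CD is global.

```perl
#!/usr/bin/perl -w
# Sketch: get UTF-8 I/O without putting -C on the #! line.
use strict;
use utf8;                                 # script source may contain UTF-8 literals
use open IN => ':utf8', OUT => ':utf8';   # like -CD, but only for open() in this file
binmode STDIN,  ':utf8';                  # like -CS: the three standard handles
binmode STDOUT, ':utf8';
binmode STDERR, ':utf8';

print "caf\x{e9}\n";                      # U+00E9 goes out as the UTF-8 bytes C3 A9
```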
--
Wes Groleau
-----------
"Thinking I'm dumb gives people something to
feel smug about. Why should I disillusion them?"
-- Charles Wallace
(in _A_Wrinkle_In_Time_)