Unicode: Strings marked 'utf8'. Can they be converted to 'byte' without going the vec() route?
Below is my sample code. It works, but if I could get a byte string from a *possible* utf8 string with anything simpler than this, I would be a happy camper.

In the real app, I have no control over how the sample is generated. It's likely read from PerlIO with whatever encoding layers are applied. I don't want to have to worry about that; I just want to get it back to a byte string for analysis.

Thanks a lot.

-sln

--------------------------
use strict;
use warnings;

my $sample = "unicode->\x{feff}\x{21000}\x{21000}";

print "\nUTF string, length = ".length($sample).", '$sample' :\n   ";
for (map {ord $_} split //, $sample) {
    printf ("%x ",$_);
}
print "\n";

my ($bytes, $offset) = ('',0);

# Split each code point into its bytes, most significant byte first,
# and append them to $bytes one at a time with vec().
for (map {ord $_} split //, $sample) {
    my @ar = ();
    while ($_ > 0) {
        push @ar, $_ & 0xff;
        $_ >>= 8;
    }
    for (reverse @ar) {
        vec ($bytes, $offset++, 8) = $_;
    }
}

print "\nByte converted, length = ".length($bytes).", '$bytes' :\n   ";
for (map {ord $_} split //, $bytes) {
    printf ("%02x ",$_);
}
print "\n";

__END__

Wide character in print at btest.pl line 6.

UTF string, length = 12, 'unicode->n++=íÇÇ=íÇÇ' :
   75 6e 69 63 6f 64 65 2d 3e feff 21000 21000

Byte converted, length = 17, 'unicode->¦*?? ?? ' :
   75 6e 69 63 6f 64 65 2d 3e fe ff 02 10 00 02 10 00
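
For comparison, here is a rough sketch of two shorter routes. It is not a definitive answer. Note that the vec() loop above does not produce UTF-8; it writes each code point as its minimal big-endian bytes. The first variant below assumes that exact layout is what is wanted and rebuilds it with pack. The second assumes UTF-8 octets would be acceptable instead and uses the core Encode module. The variable names `$bytes` and `$utf8_bytes` are just illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $sample = "unicode->\x{feff}\x{21000}\x{21000}";

# Variant 1: same layout as the vec() loop.
# pack 'N' gives 4 big-endian bytes per code point; the leading
# zero bytes are then stripped so each character keeps only the
# bytes it needs (a NUL character yields no bytes, as in the loop).
my $bytes = join '', map {
    my $b = pack 'N', ord $_;
    $b =~ s/^\0+//;
    $b;
} split //, $sample;

# Variant 2: UTF-8 encoded octets (a different byte layout).
my $utf8_bytes = encode('UTF-8', $sample);

printf "%s\n", join ' ', map { sprintf '%02x', ord } split //, $bytes;
printf "%s\n", join ' ', map { sprintf '%02x', ord } split //, $utf8_bytes;
```

With the sample string, variant 1 should give the same 17 bytes as the vec() version, while variant 2 gives 20 bytes because U+FEFF takes three UTF-8 octets and U+21000 takes four.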