Velocity Reviews - Computer Hardware Reviews

Re: korean character sets

Ben Bacarisse
      08-22-2013
Cal Dershowitz <(E-Mail Removed)> writes:
<snip>
> Then I fire up Google Translate, pasting the 3 paragraphs in, and
> first lifting out the Cyrillic characters that say the right thing and
> stuffing them in a file called russian1. Then I take the phonetic
> output that they give you now just because you asked and stuff it in
> a file called russian2.
>
> http://merrillpjensen.com/pages/obamazombies_3.php


Why do you think this is a Perl question? It is more likely to be a PHP
one, so a PHP group might be better.

In case it helps you frame the question better when you do post in a
suitable group, I offer a few observations:

The server does not specify a character encoding when serving the
page. Not fatal, but it sure helps to get that sorted out.
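A minimal Perl sketch of the page-side half of this, assuming the page is written out by a Perl script: send the output through an :encoding(UTF-8) layer and declare the same charset in a <meta> tag, so the bytes and the declaration agree. The helper page_with_charset and the file name demo_page.html are hypothetical. A charset sent by the server in the HTTP Content-Type header normally takes precedence over the <meta> tag, so the two should match.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # the Cyrillic literal below is stored as UTF-8 in this file

# Hypothetical helper: wrap a body in a page that declares its charset.
sub page_with_charset {
    my ($body) = @_;
    return <<"HTML";
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>demo</title>
</head>
<body>
<p>$body</p>
</body>
</html>
HTML
}

# The :encoding(UTF-8) layer turns Perl characters into UTF-8 bytes on
# output, so the bytes on disk match what the <meta> tag claims.
my $file = 'demo_page.html';    # hypothetical output file
open my $out, '>:encoding(UTF-8)', $file or die "open $file: $!";
print {$out} page_with_charset('Привет');
close $out or die "close $file: $!";
```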

Almost all the key data is missing. What actually is in "russian1"?
When you post it (to the right group) a hex dump might be the best way
to show the contents, but "cat -A" would also work.
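If no hex-dump tool is handy, a few lines of Perl can produce one. This is only a sketch: hex_dump is a hypothetical helper whose layout loosely imitates tools such as xxd. The file is opened with :raw so Perl reports the bytes on disk without decoding anything.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: hex dump of a file's raw bytes, 16 per line,
# with printable ASCII shown on the right and everything else as '.'.
sub hex_dump {
    my ($path) = @_;
    open my $in, '<:raw', $path or die "open $path: $!";
    my $data = do { local $/; <$in> };    # slurp the whole file
    close $in;
    $data = '' unless defined $data;

    my @lines;
    for (my $off = 0; $off < length $data; $off += 16) {
        my $chunk = substr $data, $off, 16;
        my $hex   = join ' ', unpack '(H2)*', $chunk;
        (my $text = $chunk) =~ tr/\x20-\x7e/./c;    # non-printable -> '.'
        push @lines, sprintf '%08x  %-47s  %s', $off, $hex, $text;
    }
    return join "\n", @lines;
}

print hex_dump($ARGV[0]), "\n" if @ARGV;
```

Run it as `perl hexdump.pl russian1`. Ordinary Russian letters saved as UTF-8 should each show up as two bytes, the first being d0 or d1.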

You don't show the PHP code, and I am not sure you show the template
that the code is acting on. You did post a file that might have been
the template but it did not look like one to me. It certainly contained
no clues as to what processing happens to build the page.

Please don't take this as an invitation to discuss PHP, server
configuration, or any other non-Perl topics here.

--
Ben.
Ben Bacarisse
      08-24-2013
Cal Dershowitz <(E-Mail Removed)> writes:

> On 08/22/2013 05:17 AM, Ben Bacarisse wrote:
>> Cal Dershowitz <(E-Mail Removed)> writes:
>> <snip>
>>> Then I fire up Google Translate, pasting the 3 paragraphs in, and
>>> first lifting out the Cyrillic characters that say the right thing and
>>> stuffing them in a file called russian1. Then I take the phonetic
>>> output that they give you now just because you asked and stuff it in
>>> a file called russian2.
>>>
>>> http://merrillpjensen.com/pages/obamazombies_3.php

>>
>> Why do you think this is a Perl question? It is more likely to be a PHP
>> one, so a PHP group might be better.

>
> [x-posted to c.l.php]
>
> I find that many applications are mixtures. For example, I would not
> use php to template a php page. The topic here is the perl templating
> of php and is topical in both these groups.


Your previous post had no Perl and no Perl question. You now
cross-post to a PHP group and all I can find is some Perl code and a
report that you've fixed the PHP issue.

I'll remove comp.lang.php since I see nothing PHP related here.

<snip>
> Anyways, the problem was that I was running utf-8 characters through
> html entities, and if you want to see a bunch of upside-down question
> marks, you can do that too.


That can't explain what you were seeing. HTML-encoding a bunch of bytes
that just happen to be the UTF-8 encoding of some other character can
produce what you were seeing, but that is not quite the same thing.

At every stage you need to bear in mind two things: (a) what is the
character encoding of the data, and (b) what character encoding does this
part of the system /think/ is being used.
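To make (a) and (b) concrete, here is a small sketch using the core Encode module. The same two bytes, read by a part of the system that assumes UTF-8, give one character; read by a part that assumes ISO-8859-1, they give two unrelated ones. The choice of "Д" and of Latin-1 as the wrong assumption is only for illustration, and code_points is a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Show each character of a string as its Unicode code point.
sub code_points {
    my ($s) = @_;
    return join ' ', map { sprintf('U+%04X', ord($_)) } split //, $s;
}

my $bytes = "\xD0\x94";    # the UTF-8 encoding of Cyrillic "Д" (U+0414)

# (a) the data really is UTF-8, and (b) this part of the system agrees.
my $right = decode('UTF-8', $bytes);

# (b) goes wrong: this part of the system assumes ISO-8859-1 instead.
my $wrong = decode('ISO-8859-1', $bytes);

printf "assumed UTF-8:      %d char(s): %s\n", length $right, code_points($right);
printf "assumed ISO-8859-1: %d char(s): %s\n", length $wrong, code_points($wrong);
```

The first line reports one character, U+0414; the second reports two, U+00D0 U+0094, neither of which has anything to do with Cyrillic.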

Simply running a UTF-8 encoded string through encode_entities (from
HTML::Entities) will work correctly if the Perl code knows that the
data is UTF-8 encoded, i.e. the string has been decoded into characters.
If not, it will interpret the string as some other encoding -- probably
just plain bytes -- and encode those.
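A sketch of that difference, using only the core Encode module. to_numeric_entities is a hypothetical stand-in that mimics just the numeric-entity fallback of encode_entities; the real function also uses named entities where they exist and escapes <, &, > and quotes, so its exact output can differ. The point is the input: undecoded bytes give one entity per byte, decoded text gives one entity per character.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in: replace every non-ASCII code point with &#xHHHH;.
sub to_numeric_entities {
    my ($s) = @_;
    $s =~ s/([^\x00-\x7F])/sprintf('&#x%X;', ord $1)/ge;
    return $s;
}

my $raw   = "\xD0\x94\xD0\xB0";       # UTF-8 bytes for "Да", never decoded
my $chars = decode('UTF-8', $raw);    # the same text as two characters

print to_numeric_entities($raw), "\n";      # &#xD0;&#x94;&#xD0;&#xB0;
print to_numeric_entities($chars), "\n";    # &#x414;&#x430;
```

Only the second line renders as "Да" in a browser; the first asks for four Latin-1-range characters.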

> Q1) If I'm typing on this keyboard and herd the corresponding utf-8
> characters between paragraph tags, do I ever need to call
> HTML::Entities to sort out what I did?


That depends on a whole bunch of things. Very often you can use Perl
transparently -- you read from a file and you output to the browser
without ever having to encode or decode anything. This works if the
file uses the same encoding that the browser will use.
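A sketch of that transparent case: the file is copied to the output byte for byte through :raw layers and nothing is decoded or encoded along the way. copy_bytes is a hypothetical helper, and the result is only correct if the file is already in the encoding the browser will be told to use.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy a file to an output handle byte for byte.
sub copy_bytes {
    my ($in_path, $out_fh) = @_;
    open my $in, '<:raw', $in_path or die "open $in_path: $!";
    binmode $out_fh, ':raw';    # no encoding layer on the way out either
    while (1) {
        my $n = read $in, my $buf, 8192;
        die "read $in_path: $!" unless defined $n;
        last if $n == 0;        # end of file
        print {$out_fh} $buf;
    }
    close $in;
    return;
}

copy_bytes($ARGV[0], \*STDOUT) if @ARGV;
```

Because Perl never interprets the bytes, it does not matter that it "knows" nothing about UTF-8 here.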

At other times you need to convert between encodings. Think of "HTML
entities" simply as yet another character encoding.
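Seen that way, numeric entities are just another encoding: characters go in, plain ASCII comes out, and the mapping can be reversed. The two helpers below are a hypothetical, deliberately minimal pair (hexadecimal numeric entities only, no named entities, no escaping of & < >); HTML::Entities provides encode_entities and decode_entities for real use.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical minimal "entity encoding": characters -> ASCII.
sub entity_encode {
    my ($chars) = @_;
    (my $s = $chars) =~ s/([^\x00-\x7F])/sprintf('&#x%X;', ord $1)/ge;
    return $s;
}

# And back again: ASCII -> characters (hex numeric entities only).
sub entity_decode {
    my ($ascii) = @_;
    (my $s = $ascii) =~ s/&#x([0-9A-Fa-f]+);/chr hex $1/ge;
    return $s;
}

my $utf8_bytes = "\xD0\x94\xD0\xB0";             # "Да" as UTF-8 bytes
my $chars      = decode('UTF-8', $utf8_bytes);   # UTF-8 -> characters
my $entities   = entity_encode($chars);          # characters -> entities
my $back       = entity_decode($entities);       # entities -> characters

print "$entities\n";    # pure ASCII, safe under any ASCII-compatible charset
my $ok = encode('UTF-8', $back) eq $utf8_bytes;
print STDOUT ($ok ? "round trip ok\n" : "round trip failed\n");
```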

> There's always a bit of html in any properly-phrased question along
> these lines.
>
> This counts as successful output, subject to comment:
>
> http://merrillpjensen.com/pages/obamazombies_1.php
>
>
> # captions
> my $caption = <$CAPTIONS>;
> # I think the next line is a mistake.
> # $caption = encode_entities($caption);


It may be unnecessary. It will only be a mistake if Perl does not know
that the file contains UTF-8 encoded characters. Reading the perlio
documentation will help. You also need to know about "use utf8" if your
Perl /source/ contains any UTF-8 encoded data.
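A minimal sketch of both points, assuming the script itself is saved as UTF-8 and no PERL_UNICODE or "use open" defaults are in effect: "use utf8" makes the literal 'Да' two characters rather than four bytes, and an :encoding(UTF-8) layer on the filehandle makes readline return characters. The file name captions_demo.txt is hypothetical; the handle is called $CAPTIONS only to echo the quoted code.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # the literal below is UTF-8 in this source, not Latin-1 bytes

my $literal = 'Да';    # 2 characters with "use utf8"; 4 bytes without it

# Hypothetical captions file, written as UTF-8 for the demonstration.
my $path = 'captions_demo.txt';
open my $w, '>:encoding(UTF-8)', $path or die "open $path: $!";
print {$w} "$literal\n";
close $w or die "close $path: $!";

# Read without a layer: Perl sees raw bytes.
open my $RAW, '<', $path or die "open $path: $!";
my $as_bytes = <$RAW>;
close $RAW;
chomp $as_bytes;

# Read through an :encoding(UTF-8) layer: Perl decodes to characters.
open my $CAPTIONS, '<:encoding(UTF-8)', $path or die "open $path: $!";
my $caption = <$CAPTIONS>;
close $CAPTIONS;
chomp $caption;

printf "without layer: %d, with layer: %d, matches literal: %s\n",
    length $as_bytes, length $caption, ($caption eq $literal ? 'yes' : 'no');
```

Only the decoded $caption is safe to pass to encode_entities; the byte version would be entity-encoded one byte at a time.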

> printf $fh $template, "${word2}/" . $remote_file, $caption;


<snip>
--
Ben.
Ben Bacarisse
      08-25-2013
Cal Dershowitz <(E-Mail Removed)> writes:

> On 08/24/2013 06:32 AM, Ben Bacarisse wrote:
>> Cal Dershowitz <(E-Mail Removed)> writes:

>
>>>> Why do you think this is a Perl question? It is more likely to be a PHP
>>>> one, so a PHP group might be better.
>>>
>>> [x-posted to c.l.php]
>>>
>>> I find that many applications are mixtures. For example, I would not
>>> use php to template a php page. The topic here is the perl templating
>>> of php and is topical in both these groups.

>>
>> Your previous post had no Perl and no Perl question. You now
>> cross-post to a PHP group and all I can find is some Perl code and a
>> report that you've fixed the PHP issue.
>>
>> I'll remove comp.lang.php since I see nothing PHP related here.

>
> Alright, but it basically amounted to you suggesting it and then
> unsuggesting it.


Yes, you got me. The message I replied to had not one line of Perl in
it. It did talk about PHP, linked to several .php URLs, and it
contained a reference to some PHP template files. The only questions in
it were a general one ("Why can't I get this right?") and one about HTML
meta charset declarations. But I see now it was about Perl.

This:
> I get you pretty well, but I'm a little stuck. Now I think I'm
> drawing a blank on the printf statement.
>
> Use of uninitialized value $caption in printf at ./russian1.pl line 109.


Taking your code and copying it into a file suggests that line 109 is
the one before the printf. Did you post what you are running?
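One common way to get "Use of uninitialized value $caption" is reading past the end of the captions file: <$CAPTIONS> returns undef at end of file, for example when there are more images than caption lines. Whether that is what happens here cannot be told from the posted fragment. The sketch below, with a hypothetical $template, image list and in-memory captions, only shows one way to detect it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins for the poster's template and data.
my $template = qq{<img src="%s"><p>%s</p>\n};
my @images   = ('a.jpg', 'b.jpg', 'c.jpg');    # three images...
my $captions = "first\nsecond\n";              # ...but only two captions

open my $CAPTIONS, '<', \$captions or die "open: $!";
my $out = '';
for my $image (@images) {
    my $caption = <$CAPTIONS>;    # undef once the captions run out
    if (!defined $caption) {
        warn "no caption left for $image\n";
        $caption = '';
    }
    chomp $caption;
    $out .= sprintf $template, $image, $caption;
}
close $CAPTIONS;
print $out;
```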

<snip>
> I thought I was figuring it out, but don't get why it cats nothing:
>
> http://merrillpjensen.com/pages/utf_1.php


I am not sure what anyone can do with this.

--
Ben.
Charlton Wilbur
      08-28-2013
>>>>> "C" == Cal Dershowitz <(E-Mail Removed)> writes:

C> I think my machine or my brain wasn't working.

Finally we are all on the same page.

Charlton

--
Charlton Wilbur

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
download problem with non-ASCII file names (korean) | jin | ASP .Net | 0 | 02-15-2006 06:27 AM
Reading and Typing Korean | Jean | Microsoft Certification | 2 | 11-30-2004 10:33 PM
JAI, Korean Text and Windows UNICODE | Nicholas Pappas | Java | 0 | 05-31-2004 03:48 PM
Korean Language Support | me | Computer Support | 5 | 02-06-2004 06:48 AM
What Can be Done About Obnoxious Korean Spam? | Robert H. Risch | Computer Support | 4 | 11-19-2003 03:07 AM