xml::twig - writing utf-8

 
 
miletwo@gmail.com
      05-25-2006
I'm trying to read an XML file and rewrite it as RSS using the following
script. The problem is that the output is not forced to UTF-8 no matter
what I do. Any help appreciated.

***********************
#!/bin/perl -w
#use strict;
use XML::Twig;
use utf8;

use open OUT => ":utf8";
use open IN => ":utf8";

my $shownum = 10;
my $thisyear = '2006';
my $field= 'releasedate';
my $twig= new XML::Twig( keep_encoding=> 1);

open(INFILE, "directorylist.xml");
$twig->parse(\*INFILE);

my $root= $twig->root;
my @releases= $root->children;

my $output = "";

$output .= '<rss version="2.0"
xmlns:dc="http://purl.org/dc/elements/1.1/">' . "\n";
$output .= '<channel>' . "\n\n";
$output .= <<EOT;
<title>scrubbed Incorporated - Recent News</title>
<link>http://www.scrubbed.com/press/</link>
<description>Visit the scrubbed Press Center where you will find
many resources, including press releases, corporate information,
technology overviews, executive bios and photos, the scrubbed logo and
more.<br />If you are a member of the media and are not able to find
what you are looking for in the Press Center, please send an email to
corpcomm\@scrubbed.com.</description>
<language>en-us</language>

EOT

for(my $i=0; $i < $shownum; $i++){
$output .= "\t" . '<item>' . "\n";
$output .= "\t\t" . '<title>' .
$releases[$i]->first_child('headline')->text . '</title>' . "\n";
$output .= "\t\t" . '<link>http://www.scrubbed.com/press/releases/' .
$thisyear . '/' . $releases[$i]->att('name') . '.html</link>' . "\n";
$output .= "\t\t" . '<description>' .
$releases[$i]->first_child('subheader')->text . '</description>' .
"\n";
$output .= "\t\t" . '<dc:date>' .
$releases[$i]->first_child('releasedate')->text . '</dc:date>' . "\n";
$output .= "\t" . '</item>';
$output .= "\n\n";
}

$output .= "</channel>\n</rss>";
Encode::_utf8_on($output);

open(FILEWRITE,">:utf8", "press.rss");
binmode FILEWRITE, ":utf8";
print FILEWRITE $output;

 
Peter J. Holzer
 
      05-25-2006
(E-Mail Removed) wrote:

> I'm trying to read xml file and rewrite as RSS using following file.
> Problem is, it is not forcing UTF-8 no matter what I do. Any help
> appreciated.


Your script works for me. Please provide a complete example that
demonstrates the error. Your script tries to read a file named
directorylist.xml, but you didn't provide that file. I had to read your
script to find out what that file should contain, and write one myself.
Maybe there is an error in your input file.

Also you didn't provide any information about the system you are using.
I tested it with Debian Sarge (perl 5.8.4, XML::Twig 3.17).

hp

--
_ | Peter J. Holzer | Man könnte sich [die Diskussion] auch
|_|_) | Sysadmin WSR/LUGA | sparen, wenn man sie sich einfach sparen
| | | (E-Mail Removed) | würde.
__/ | http://www.hjp.at/ | -- Ralph Angenendt in dang 2006-04-15
 
Michel Rodriguez
 
      05-26-2006
(E-Mail Removed) wrote:
> I'm trying to read xml file and rewrite as RSS using following file.
> Problem is, it is not forcing UTF-8 no matter what I do. Any help
> appreciated.


Whaouh! You sure want to make sure you get UTF-8 on output! Except of
course that the keep_encoding option tells XML::Twig to output the same
encoding as you got in the input (which you did not show us, as
mentioned by the previous poster).

If you want to output utf-8, the best way is NOT to do anything: by
default the parser will convert anything into utf-8, and the output will
be in that encoding.
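
A minimal sketch of that advice, assuming XML::Twig is installed: parse
without keep_encoding, treat what comes back as ordinary Perl character
strings, and declare the encoding once, on the output handle. The sample
XML and the file name press-demo.rss are made up for illustration; they
are not the original poster's data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample input; the e-acute is given as raw UTF-8 bytes (\xC3\xA9).
my $xml = qq{<?xml version="1.0" encoding="UTF-8"?>\n}
        . qq{<directory><file name="demo">}
        . qq{<headline>Caf\xC3\xA9 news</headline>}
        . qq{</file></directory>};

# No keep_encoding: the parser decodes the input into Perl character strings.
my $twig = XML::Twig->new;
$twig->parse($xml);

my $headline = $twig->root->first_child('file')->first_child('headline')->text;

# A single explicit output layer does the encoding.
# No "use open", no binmode, no Encode::_utf8_on.
my $outfile = 'press-demo.rss';    # hypothetical file name
open(my $out, '>:encoding(UTF-8)', $outfile)
    or die "Cannot write $outfile: $!";
print {$out} "<title>$headline</title>\n";
close($out) or die "Cannot close $outfile: $!";
```

Reading press-demo.rss back as raw bytes should show the accented
character as the two bytes C3 A9, which is its UTF-8 encoding.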

Did you try your code without the various utf8-related instructions
peppered through it? What was the result?
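
One way to check the result, sketched here with the core Encode module:
read the output as raw bytes and attempt a strict decode. The helper name
is_valid_utf8 is hypothetical. This only shows that the bytes are
well-formed UTF-8; plain ASCII passes too, and the check cannot detect
text that was encoded twice.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();

# Returns 1 if the byte string decodes cleanly as UTF-8, 0 otherwise.
# (Hypothetical helper, not part of XML::Twig.)
sub is_valid_utf8 {
    my ($bytes) = @_;    # work on a copy; decode may modify its argument
    my $ok = eval { Encode::decode('UTF-8', $bytes, Encode::FB_CROAK); 1 };
    return $ok ? 1 : 0;
}

# "Caf" followed by e-acute, once as UTF-8 bytes and once as a Latin-1 byte.
my $utf8_bytes   = "Caf\xC3\xA9";
my $latin1_bytes = "Caf\xE9";

print is_valid_utf8($utf8_bytes)   ? "valid\n" : "invalid\n";    # prints "valid"
print is_valid_utf8($latin1_bytes) ? "valid\n" : "invalid\n";    # prints "invalid"
```

To test a file, open it with the '<:raw' layer, slurp it, and pass the
contents to the helper.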

--
mirod
 
 
miletwo@gmail.com
 
      05-26-2006
Here's directorylist.xml. I'm on Mac OS X, but I also tried running this on
my Solaris box and it does the same thing. I've also tried it with and
without keep_encoding, so I don't "think" that's it.

Thanks for the replies.
<?xml version="1.0" encoding="UTF-8"?>
<directory>
<file name="060525_brings_custom_user">
<releasedate>05-25-2006</releasedate>
<releasetime>04:30 AM</releasetime>
<timezone>America/Los_Angeles</timezone>
<headline><![CDATA[XXSCRUBBEDXX Brings Custom User-Interface
Capabilities to U.S. Cellular's easyedgeSM with the uiOne
Solution]]></headline>
<subheader><![CDATA[]]></subheader>
<division>Corp, QIS</division>
<categories></categories>
<document></document>
<exclude></exclude>
</file>
<file name="060524_initiates_patent_infringement">
<releasedate>05-24-2006</releasedate>
<releasetime>04:30 AM</releasetime>
<timezone>America/Los_Angeles</timezone>
<headline><![CDATA[XXSCRUBBEDXX Initiates Patent Infringement
Proceedings in the UK against Nokia]]></headline>
<subheader><![CDATA[]]></subheader>
<division>Corp</division>
<categories></categories>
<document></document>
<exclude></exclude>
</file>
<file name="060518_takes_XXSCRUBBEDXX_2006">
<releasedate>05-18-2006</releasedate>
<releasetime>04:30 AM</releasetime>
<timezone>America/Los_Angeles</timezone>
<headline><![CDATA[XXSCRUBBEDXX Takes XXSCRUBBEDXX 2006 to the
Next Level with Addition of Telecom Italia and XXSCRUBBEDXX to an
Already Impressive XXSCRUBBEDXX 2006 Conference Agenda]]></headline>
<subheader><![CDATA[Premiere Players in the Industry Showcase
Advanced Data Capabilities at XXSCRUBBEDXX 2006 Conference in San Diego
May 31-June 2]]></subheader>
<division>Corp, QIS</division>
<categories></categories>
<document></document>
<exclude></exclude>
</file>
<file name="060518_averitt_selects_omnitracs">
<releasedate>05-18-2006</releasedate>
<releasetime>04:30 AM</releasetime>
<timezone>America/Los_Angeles</timezone>
<headline><![CDATA[AVERITT Selects XXSCRUBBEDXX's OmniTRACS
and OmniExpress Mobile Communication Systems for Entire Fleet and
Service Centers]]></headline>
<subheader><![CDATA[Leading Freight and Supply Chain Management
Provider with International Reach One of First to Implement End-to-End
Solution for Improved Fleet Communications]]></subheader>
<division>Corp, QWBS</division>
<categories></categories>
<document></document>
<exclude></exclude>
</file>
<file name="060517_clears_up_misunderstandings">
<releasedate>05-17-2006</releasedate>
<releasetime>12:36 PM</releasetime>
<timezone>America/Los_Angeles</timezone>
<headline><![CDATA[XXSCRUBBEDXX Clears Up Misunderstandings
Regarding the ITC Staff Attorney Briefing]]></headline>
<subheader><![CDATA[]]></subheader>
<division>Corp</division>
<categories></categories>
<document></document>
<exclude></exclude>
</file>
<file name="060512_hospital_democratic_republic">
<releasedate>05-12-2006</releasedate>
<releasetime>04:30 AM</releasetime>
<timezone>America/Los_Angeles</timezone>
<headline><![CDATA[Hospital in the Democratic Republic of Congo to
Be Outfitted with CDMA2000 1xEV-DO to Help Improve Healthcare in
Africa]]></headline>
<subheader><![CDATA[XXSCRUBBEDXX Pledges Donation and Technology
to the Dikembe Mutombo Foundation, First Hospital Built in the Congo in
Nearly 40 Years]]></subheader>
<division>Corp</division>
<categories></categories>
<document></document>
<exclude></exclude>
</file>
<file name="060509_british_sky_broadcasting">
<releasedate>05-09-2006</releasedate>
<releasetime>04:30 AM</releasetime>
<timezone>America/Los_Angeles</timezone>
<headline><![CDATA[XXSCRUBBEDXX and British Sky Broadcasting
Announce Intent to Conduct XXSCRUBBEDXX Technology Trial in United
Kingdom]]></headline>
<subheader><![CDATA[Joint Exercise Expected to be Europe's First
Technical Trial of Open, Network-Agnostic FLO Technology]]></subheader>
<division>Corp</division>
<categories></categories>
<document></document>
<exclude></exclude>
</file>
<file name="060509_application_downloads_XXSCRUBBEDXX">
<releasedate>05-09-2006</releasedate>
<releasetime>04:30 AM</releasetime>
<timezone>America/Los_Angeles</timezone>
<headline><![CDATA[Application Downloads with XXSCRUBBEDXX's
XXSCRUBBEDXX Solution Surpass Three Million in Thailand on Hutch's
Advanced CDMA2000 1X Network]]></headline>
<subheader><![CDATA[Active Hutchison CAT Customers Have Downloaded
an Average of 10 Applications Each Since XXSCRUBBEDXX Launched, Numbers
Continue to Grow]]></subheader>
<division>Corp, QIS</division>
<categories></categories>
<document></document>
<exclude></exclude>
</file>
</directory>

 
 
Peter J. Holzer
 
      05-26-2006
(E-Mail Removed) wrote:
> Here's directorylist.xml. I'm on MacOSX but also tried running this on
> my Solaris box and it does the same thing. I've also tried it with and
> without keep_encoding, so don't "think" that's it.


This file contains only 8 <file/> elements. Your script crashes with

Can't call method "first_child" on an undefined value at ./miletwo line 40.

if there are fewer than 10 children of the root element, before it even
opens the output file. So with this file, your script doesn't write
anything. How do you determine whether a non-existent file is UTF-8 or
not?
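
A possible way to avoid that crash, sketched with only core modules: cap
the loop at the number of elements actually present instead of assuming
there are always $shownum of them. The @releases list below is a stand-in
for $root->children, and the item strings are placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(min);

my $shownum  = 10;
my @releases = map { "release $_" } 1 .. 8;    # stand-in for $root->children

# Index of the last item to show: never past the end of @releases.
my $last = min($shownum, scalar @releases) - 1;

my @shown;
for my $i (0 .. $last) {    # an empty list gives 0 .. -1, so the loop is skipped
    push @shown, $releases[$i];
}

print scalar(@shown), " items\n";    # prints "8 items"
```

With use strict and use warnings enabled, and the return value of every
open checked, a failure like this is also reported at the point where it
happens.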

hp

--
_ | Peter J. Holzer | Man könnte sich [die Diskussion] auch
|_|_) | Sysadmin WSR/LUGA | sparen, wenn man sie sich einfach sparen
| | | (E-Mail Removed) | würde.
__/ | http://www.hjp.at/ | -- Ralph Angenendt in dang 2006-04-15
 