Velocity Reviews - Computer Hardware Reviews



Losing data while parsing XML with Expat

Fabian Krüger

I've got a weird problem and need your help and ideas...

I've written a PHP application that imports data in XML format and
writes it to a MySQL database for faster access.

The application uses Expat 1.95.7 through PHP to parse the XML data.

At first everything seemed to work fine, but now I've noticed that
something goes wrong:

If the amount of XML data is larger than what I used for testing the
application (we're talking about something between 2 and 4 MB), some
data gets lost.

As long as the structure of the file doesn't change, the lost data is
always the same.

But if I change the structure of the file, e.g. by adding a line
somewhere, the problem occurs in another place.

For example:

<EventName>Martin Schneider Karben</EventName>
<Type>Keine Veranstaltungsart</Type>

Let's assume that "Mar" of the data between the <EventName> tags gets
lost and we get "tin Schneider Karben".

When I insert a line above the <event> block, the "t" from "tin" gets
lost as well, so we get "in Schneider Karben".

Why?
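This sketch tests one possible cause and is not a diagnosis. Expat does not guarantee that an element's text reaches the character data handler in a single call. Its documentation notes that a single block of contiguous text may still produce a sequence of calls. A handler that assigns `$cdata` instead of appending it would then keep only the last piece.

The snippet makes these assumptions:
* It uses PHP's own `xml_*` functions rather than PEAR's `XML_Parser`.
* It uses an arbitrary 5-byte chunk size.
* It records every callback for `<EventName>`.

Whether you see one piece or several depends on how PHP's xml extension is built: on Expat, or on its libxml2 compatibility layer. In both cases the concatenated pieces are the full text.

```php
<?php
// Sketch: feed a tiny document to PHP's xml parser in fixed-size chunks
// and record every character-data callback for <EventName>.
// Assumption: the 5-byte chunk size is arbitrary, chosen so that chunk
// boundaries fall inside the element text.

$xml = '<event><EventName>Martin Schneider Karben</EventName></event>';
$chunkSize = 5;

$pieces  = [];   // every string the handler received for EventName
$current = '';   // element we are currently inside ('' between elements)

$parser = xml_parser_create('ISO-8859-1');
xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$current) { $current = $name; },
    function ($p, $name) use (&$current) { $current = ''; }
);
xml_set_character_data_handler($parser, function ($p, $data) use (&$current, &$pieces) {
    // case folding is on by default, so element names arrive in upper case
    if ($current === 'EVENTNAME') {
        $pieces[] = $data;
    }
});

$chunks = str_split($xml, $chunkSize);
foreach ($chunks as $i => $chunk) {
    $isFinal = ($i === count($chunks) - 1);   // only the last chunk is final
    if (!xml_parse($parser, $chunk, $isFinal)) {
        die(xml_error_string(xml_get_error_code($parser)) . "\n");
    }
}

// The text may arrive in one piece or in several; only their
// concatenation is guaranteed to be the complete element text.
var_dump($pieces);
echo implode('', $pieces), "\n";
```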

I also tried generating parts of the XML data dynamically with PHP:

//--------------- CODE
// number of datasets
$datasets = 2000;
// build the XML string
$str = '<?xml version="1.0" encoding="ISO-8859-1"?><program>';
for($i=0; $i<$datasets; $i++){
    $str .= '<event>
<Type>Keine Veranstaltungsart</Type>
<ShowPage href="32160001.jsp">TPP Gutscheine</ShowPage>
<block number="0">';
    // ... (further elements of the event)
    $str .= '</block>
</event>';
}
$str .= "</program>";
// write the data to the file
$fp = fopen("../DATA/elektra.xml","w");
fputs($fp, $str);
fclose($fp);
//--------------- CODE END

With this generated file, NUM1644 becomes 1644 and NUM1195 becomes 5,
while all other data is parsed correctly?!

Here is the code of the two classes used for parsing and importing:

//--------------- CODE
require_once "DB.php";

class ElektraImporter {
    var $FileHash;
    var $DAO;
    var $XMLDataFile;

    function ElektraImporter(){
        $this->XMLDataFile = Config::getAttribute("Config/Config_Base", ...);

        $DB = DB::connect(Config::getAttribute("Config/Config_Base", ...));
        $this->DAO = Loader::buildObject("XML/ElektraDAO", null, $DB);
    }

    /**
     * Checks for changes in the Elektra XML data.
     * If there are changes, the database is refreshed.
     */
    function checkForUpdate(){
        /* if there are changes */
        if (...) {
            /* read the file and update the database */
        } else {
            /* everything is o.k. */
        }
    }

    /**
     * Parses the XML file and gets the needed data.
     * @return array $data
     */
    function _getElektraData(){
        $Parser = &Loader::buildObject("XML/ElektraParser", null, ...);
        if( PEAR::isError($Parser) ){
            die (PEAR::errorMsg($Parser));
        }
        if(PEAR::isError($Parser)){ die($Parser->getMessage()); }

        $data = $Parser->getXMLData();

        $data['filehash'] = md5_file($this->XMLDataFile);

        return $data;
    }

    /**
     * Checks whether the file has changed.
     * @return boolean
     */
    function _hasElektraFileChanged($filehash = ""){
        $this->FileHash = md5_file($this->XMLDataFile);

        if($filehash == $this->FileHash){
            return false;
        } else {
            return true;
        }
    }
}
//--------------- CODE END

The parser class, extending PEAR's XML_Parser:

//--------------- CODE
require_once "XML/Parser.php";

class ElektraParser extends XML_Parser {
    var $XMLData;
    var $EventNo;
    var $EventName;
    var $LastEventNo;
    var $ActualEventNo;
    var $EventCnt = 0;
    var $ShowCnt = 0;

    function ElektraParser(&$arr){
        $this->XMLData = &$arr;
        $this->XML_Parser("ISO-8859-1", "event", "ISO-8859-1");
    }

    function startHandler($xp, $element, $attribs) {
        $this->Element = $element;
        $this->Attribs = $attribs;
    }

    function endHandler($xp, $element) {
        if ( $element == "EVENT" ){
            /* increase event counter */
            $this->EventCnt ++;
            /* set show counter to 0 */
            $this->ShowCnt = 0;
        }
        elseif ( $element == "SHOW" ){
            /* increase show counter for the next show */
            $this->ShowCnt ++;
        }
        $this->Element = "";
    }

    function cdataHandler($xp, $cdata) {
        if($this->Element == "DATE"){
            $this->XMLData['creationdate'] = $cdata;
        }
        elseif($this->Element == "TIME"){
            $this->XMLData['creationtime'] = $cdata;
        }
        /* every event has a sysid; the sysid and the eventno make the unique eventid */
        elseif($this->Element == "SYSID"){
            $this->XMLData['event'][$this->EventCnt]['sysid'] = $cdata;
        }
        elseif($this->Element == "CLIENTID"){
            $this->XMLData['event'][$this->EventCnt]['clientid'] = $cdata;
        }
        elseif($this->Element == "EVENTNO"){
            $this->XMLData['event'][$this->EventCnt]['eventno'] = $cdata;
        }
        elseif($this->Element == "EVENTNAME"){
            $this->XMLData['event'][$this->EventCnt]['eventname'] = $cdata;
        }
        elseif($this->Element == "NAME"){
            $this->XMLData['event'][$this->EventCnt]['location'] = $cdata;
        }
        elseif($this->Element == "CITY"){
            $this->XMLData['event'][$this->EventCnt]['city'] = $cdata;

            /* eventgroups */
            /* get the position of the first occurrence of the city in the eventname */
            $pos = strpos($this->XMLData['event'][$this->EventCnt]['eventname'], ...);
            /* if the city is in the name */
            if( $pos ){
                $this->XMLData['event'][$this->EventCnt]['group'] =
                    trim(substr($this->XMLData['event'][$this->EventCnt]['eventname'], 0, ...));
            }
            /* otherwise we take the whole eventname as group */
            else {
                $this->XMLData['event'][$this->EventCnt]['group'] = ...;
            }
        }
        /* get the shows */
        elseif($this->Element == "SHOWNO") {
            ... = $cdata;
        }
        elseif($this->Element == "SHOWDATE") {
            ... = $cdata;
        }
        elseif($this->Element == "SHOWTIME") {
            ... = $cdata;
        }
        elseif($this->Element == "SHOWPAGE"){
            ... = $this->Attribs['HREF'];
        }
    }

    function defaultHandler($xp, $cdata) {
    }

    function &getXMLData(){
        $p = $this->parse();
        if(PEAR::isError($p)){ die($p->getMessage()); }
        return $this->XMLData;
    }
}
//--------------- CODE END
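If split character-data callbacks turn out to be the cause, the usual remedy is to append in the character data handler and store the value only when the element ends. Below is a minimal sketch under that assumption. `TextCollector` and its methods are hypothetical names. The class calls PHP's `xml_*` functions directly instead of extending PEAR's `XML_Parser`, so it runs standalone. The chunk sizes in the usage loop are arbitrary.

```php
<?php
// Sketch: "append, then store on the end tag" character data handling.
// TextCollector is a hypothetical helper, not part of PEAR or of the post.

class TextCollector
{
    public $values = [];   // element name => complete text of its last occurrence
    private $buffer = '';  // text collected since the current element started

    public function parse($xml, $chunkSize)
    {
        $parser = xml_parser_create('ISO-8859-1');
        xml_set_element_handler($parser, [$this, 'startHandler'], [$this, 'endHandler']);
        xml_set_character_data_handler($parser, [$this, 'cdataHandler']);

        $chunks = str_split($xml, $chunkSize);
        foreach ($chunks as $i => $chunk) {
            if (!xml_parse($parser, $chunk, $i === count($chunks) - 1)) {
                throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
            }
        }
        return $this->values;
    }

    public function startHandler($xp, $element, $attribs)
    {
        $this->buffer = '';         // a new element starts with empty text
    }

    public function cdataHandler($xp, $cdata)
    {
        $this->buffer .= $cdata;    // append: one text node may arrive in pieces
    }

    public function endHandler($xp, $element)
    {
        // Only here is the text of a leaf element such as EVENTNAME complete.
        $this->values[$element] = $this->buffer;
        $this->buffer = '';
    }
}

$xml = '<event><EventNo>NUM1195</EventNo>'
     . '<EventName>Martin Schneider Karben</EventName></event>';

// Whatever the chunk size, each line shows NUM1195 | Martin Schneider Karben
foreach ([1, 3, 7, 4096] as $size) {
    $values = (new TextCollector())->parse($xml, $size);
    echo $size, ': ', $values['EVENTNO'], ' | ', $values['EVENTNAME'], "\n";
}
```

In `ElektraParser`, the same idea would mean two changes. First, `cdataHandler` would append to a buffer, for example a hypothetical `$this->Buffer .= $cdata;`. Second, the text assignments (`DATE`, `SYSID`, `EVENTNO`, `EVENTNAME`, ...) would move into `endHandler`, before `$this->Element` is cleared.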

This problem is really bad, because event IDs got stripped as well,
and then my SQL statements didn't work anymore!

I have no idea what the reason is or might be =(
A bug in Expat?! ... I can't really believe that.
Badly formatted XML? ... not really?!
Problems with Expat's memory management?!
Or just my fault? ... but where?

But it seems that the problem is tied to the format of the XML file:
if I take out line breaks or add lines, the error occurs in other
places, but the same structure always produces the same errors?!

My XML skills are not that good, so I would be very pleased if you
have an idea or some advice for me.

Thanks for your advice.

With best regards

Fabian Krüger