Hrs of work on regex: please help

 
 
Robert
 
      07-27-2004
After this message text is a pasted xml file I've been working
(wrestling) with.
The goal is to remove text from the file that begins with:
"<ns0:ErrorDetails>" and ends with "</ns0:ErrorDetails>".
I have done several other s/// type operations to this file to remove
other text parts, and it was no problem. I've heard the 'devil is in
the details' and I believe it now, hehe.

I have copy 'n pasted the text surrounding the target before and
after, and made a string of it in a simple Perl script. I had to use
single quotes, due to the numerous double quotes in the text. I used
the same s/// operation and it printed as I want! Wonderful, I
thought, now to do it on the file contents. But, it just will not do a
replace. It is getting beyond the point where I can think on this
problem without my brain feeling a spinning motion. I humbly submit my
problem for discussion.

My code follows:
#!/usr/bin/perl
my $results_dir = $ARGV[0];
my $expected_results_dir = "$results_dir/expectedresults";
my $cleaned_results_dir = "$results_dir/cleanedresults";
my $cleaned_expected_results_dir =
    "$results_dir/expectedresults/cleanedexpectedresults";
my $cleaned_xml = "";
my $clean_file = "";
my $Line = "";
opendir(BIN, $results_dir) or die "Can't open directory: $results_dir: $!";
FILE_CLEAN: while( defined ($file = readdir BIN) )
{
    next FILE_CLEAN if $file =~ /^\.\.?$/; # skip . and ..
    next FILE_CLEAN if (-d "$results_dir/$file"); # skip if it is a directory
    open(To_Clean, "$results_dir/$file") or die "Can't open $results_dir/$file: $!\n";
    my @data = <To_Clean>; #read file contents, one line per element
    close(To_Clean); #close file
    $clean_file = "$cleaned_results_dir/$file";
    for (my $i = 0; $i < scalar(@data); ++$i) {
        $Line = $data[$i];
        #drop the trailing newline and turn tabs into spaces
        chomp $Line;
        $Line =~ tr/\t/ /;
        $Line =~ s/\t//g; #no tabs are left at this point
        $Line =~ s/\<ns0:ErrorDetails\>.*?\<\/ns0:ErrorDetails\>//g;
        $cleaned_xml = $cleaned_xml . $Line;
        $Line = "";
    };#END FOR
    open(CLEANFILE, ">$clean_file") or die "Can't open $clean_file: $!\n";
    print CLEANFILE $cleaned_xml;
    close(CLEANFILE);
    $cleaned_xml = "";
};#END WHILE
print "...DONE\n";
closedir(BIN);
################################################################################

<?xml version="1.0" encoding="UTF-8"?>
<ns0:BOBEntitlementRoot xmlns:ns0="http://www.noco.com/BOBEntitlement"
version="NA"><ns0:ApplicationArea><ns0:CreationDateTime>2004-07-26T14:07:02.248-07:00</ns0:CreationDateTime><ns0:SourceSystem>HANDSHAKE</ns0:SourceSystem><ns0:Operation><ns0:Name>UnknownOperation</ns0:Name><ns0:Version>NA</ns0:Version></ns0:Operation></ns0:ApplicationArea><ns0:DataArea><ns0:Status><ns0:StatusCode>Failure</ns0:StatusCode><ns0:Error><ns0:ErrorCode>2101</ns0:ErrorCode>
<ns0:ErrorSeverity>Error</ns0:ErrorSeverity><ns0:ErrorCategory>InputFormatError</ns0:ErrorCategory><ns0:ErrorDescription>Invalid
XML request. </ns0:ErrorDescription><ns0:ErrorDetails>Job-4296 Error
in [Processes/Integration_Interfaces/getEntitlement/getBHAPIJMSRequest_1.process/Group
(1)/Group/Parse XML]
Output data invalid
at com.tibco.pe.core.TaskImpl.a(TaskImpl.java:501)
at com.tibco.pe.core.TaskImpl.eval(TaskImpl.java:42
at com.tibco.pe.core.Job.a(Job.java:591)
at com.tibco.pe.core.Job.if(Job.java:443)
at com.tibco.pe.core.JobDispatcher$a.a(JobDispatcher.java:270)
at com.tibco.pe.core.JobDispatcher$a.run(JobDispatcher.java:21
caused by: org.xml.sax.SAXException: validation error: unexpected
content "{http://www.noco.com/BOBEntitlement}Sku"; expected
"{http://www.noco.com/BOBEntitlement}Name" or
"{http://www.noco.com/BOBEntitlement}Description" or
"{http://www.noco.com/BOBEntitlement}DomainType" or
"{http://www.noco.com/BOBEntitlement}PropertyTypeStatus" or
"{http://www.noco.com/BOBEntitlement}ChangeDate" or
"{http://www.noco.com/BOBEntitlement}DefaultValue" or
"{http://www.noco.com/BOBEntitlement}UsageType"
({com.tibco.xml.validation}COMPLEX_E_UNEXPECTED_CONTENT) at
/BOBEntitlementRoot[1]/DataArea[1]/BOBEntitlement[1]/OfferingProperty[1]/OfferingPropertyType[1]/Sku[1]
java.lang.Exception: unexpected content
"{http://www.noco.com/BOBEntitlement}Sku"; expected
"{http://www.noco.com/BOBEntitlement}Name" or
"{http://www.noco.com/BOBEntitlement}Description" or
"{http://www.noco.com/BOBEntitlement}DomainType" or
"{http://www.noco.com/BOBEntitlement}PropertyTypeStatus" or
"{http://www.noco.com/BOBEntitlement}ChangeDate" or
"{http://www.noco.com/BOBEntitlement}DefaultValue" or
"{http://www.noco.com/BOBEntitlement}UsageType"
at com.tibco.xml.validation.helpers.d.a(XmlContentValidatorElementContext.java:34
at com.tibco.xml.validation.helpers.h.if(XmlContentValidator.java:753)
at com.tibco.xml.validation.helpers.h.text(XmlContentValidator.java:1601)
at com.tibco.xml.datamodel.nodes.Text.content(Text.java:327)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Document.content(Document.java:226)
at com.tibco.xml.datamodel.nodes.Document.serialize(Document.java:242)
at com.tibco.xml.xdata.bind.BindingRunner.validate(BindingRunner.java:302)
at com.tibco.xml.xdata.bind.OutputBindingRunner.validate(OutputBindingRunner.java:47)
at com.tibco.pe.core.TaskImpl.a(TaskImpl.java:489)
at com.tibco.pe.core.TaskImpl.eval(TaskImpl.java:42
at com.tibco.pe.core.Job.a(Job.java:591)
at com.tibco.pe.core.Job.if(Job.java:443)
at com.tibco.pe.core.JobDispatcher$a.a(JobDispatcher.java:270)
at com.tibco.pe.core.JobDispatcher$a.run(JobDispatcher.java:21
validation error: no declaration for element
"{http://www.noco.com/BOBEntitlement}Sku"
({com.tibco.xml.validation}COMPLEX_E_MISSING_ELEMENT_DECLARATION) at
/BOBEntitlementRoot[1]/DataArea[1]/BOBEntitlement[1]/OfferingProperty[1]/OfferingPropertyType[1]/Sku[1]
java.lang.Exception: no declaration for element
"{http://www.noco.com/BOBEntitlement}Sku"
at com.tibco.xml.validation.helpers.d.if(XmlContentValidatorElementContext.java:615)
at com.tibco.xml.validation.helpers.d.a(XmlContentValidatorElementContext.java:180)
at com.tibco.xml.validation.helpers.h.if(XmlContentValidator.java:81
at com.tibco.xml.validation.helpers.h.text(XmlContentValidator.java:1601)
at com.tibco.xml.datamodel.nodes.Text.content(Text.java:327)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Document.content(Document.java:226)
at com.tibco.xml.datamodel.nodes.Document.serialize(Document.java:242)
at com.tibco.xml.xdata.bind.BindingRunner.validate(BindingRunner.java:302)
at com.tibco.xml.xdata.bind.OutputBindingRunner.validate(OutputBindingRunner.java:47)
at com.tibco.pe.core.TaskImpl.a(TaskImpl.java:489)
at com.tibco.pe.core.TaskImpl.eval(TaskImpl.java:42
at com.tibco.pe.core.Job.a(Job.java:591)
at com.tibco.pe.core.Job.if(Job.java:443)
at com.tibco.pe.core.JobDispatcher$a.a(JobDispatcher.java:270)
at com.tibco.pe.core.JobDispatcher$a.run(JobDispatcher.java:21
validation error: unexpected end of content
({com.tibco.xml.validation}COMPLEX_E_UNEXPECTED_END_OF_CONTENT) at
/BOBEntitlementRoot[1]/DataArea[1]/BOBEntitlement[1]/OfferingProperty[1]/OfferingPropertyType[1]
java.lang.Exception: unexpected end of content
at com.tibco.xml.validation.helpers.d.case(XmlContentValidatorElementContext.java:414)
at com.tibco.xml.validation.helpers.h.a(XmlContentValidator.java:1182)
at com.tibco.xml.validation.helpers.h.endElement(XmlContentValidator.java:1034)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:110
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Document.content(Document.java:226)
at com.tibco.xml.datamodel.nodes.Document.serialize(Document.java:242)
at com.tibco.xml.xdata.bind.BindingRunner.validate(BindingRunner.java:302)
at com.tibco.xml.xdata.bind.OutputBindingRunner.validate(OutputBindingRunner.java:47)
at com.tibco.pe.core.TaskImpl.a(TaskImpl.java:489)
at com.tibco.pe.core.TaskImpl.eval(TaskImpl.java:42
at com.tibco.pe.core.Job.a(Job.java:591)
at com.tibco.pe.core.Job.if(Job.java:443)
at com.tibco.pe.core.JobDispatcher$a.a(JobDispatcher.java:270)
at com.tibco.pe.core.JobDispatcher$a.run(JobDispatcher.java:21

at com.tibco.xml.xdata.bind.BindingRemarkHandler.assertNoErrors(BindingRemarkHandler.java:43)
at com.tibco.xml.xdata.bind.BindingRunner.validate(BindingRunner.java:319)
at com.tibco.xml.xdata.bind.OutputBindingRunner.validate(OutputBindingRunner.java:47)
at com.tibco.pe.core.TaskImpl.a(TaskImpl.java:489)
at com.tibco.pe.core.TaskImpl.eval(TaskImpl.java:42
at com.tibco.pe.core.Job.a(Job.java:591)
at com.tibco.pe.core.Job.if(Job.java:443)
at com.tibco.pe.core.JobDispatcher$a.a(JobDispatcher.java:270)
at com.tibco.pe.core.JobDispatcher$a.run(JobDispatcher.java:21
</ns0:ErrorDetails></ns0:Error></ns0:Status></ns0:DataArea></ns0:BOBEntitlementRoot>
 
 
 
 
 
Gunnar Hjalmarsson
 
      07-27-2004
Robert wrote:
> The goal is to remove text from the file that begins with:
> "<ns0:ErrorDetails>" and ends with "</ns0:ErrorDetails>".


Hmm.. Far too much code for my taste.

<snip>

> my @data = <To_Clean>; #read file contents


Here you slurp the file into an array, where each line is a separate
element.

<snip>

> for (my $i = 0; $i < scalar(@data); ++$i) {


Here you start various operations for each line.

<snip>

> $Line =~ s/\<ns0:ErrorDetails\>.*?\<\/ns0:ErrorDetails\>//g;


Since the start and end tags appear on different lines, that pattern
will never match.

Try slurping the file into a scalar variable instead, and add the /s
modifier to the s/// operator.
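
A minimal sketch of that suggestion, not a drop-in replacement for the posted script. The sample text and the names $sample, $in, $data and $cleaned are made up for illustration, and the file is read from an in-memory filehandle so the example runs on its own:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample standing in for one result file. The real files are
# much longer, but the two tags sit on different lines in the same way.
my $sample = <<'XML';
<ns0:Error><ns0:ErrorCode>2101</ns0:ErrorCode><ns0:ErrorDetails>Job-4296 Error
Output data invalid
at com.tibco.pe.core.Job.a(Job.java:591)
</ns0:ErrorDetails></ns0:Error>
XML

# Read from an in-memory filehandle so the sketch needs no files on disk.
open(my $in, '<', \$sample) or die "Can't open in-memory file: $!";

# Slurp: with $/ undefined, <$in> returns the whole file as one string.
my $data = do { local $/; <$in> };
close($in);

# /s lets . match newlines, so .*? can span the multi-line element.
(my $cleaned = $data) =~ s/<ns0:ErrorDetails>.*?<\/ns0:ErrorDetails>//gs;

print $cleaned;
```

In the real script the in-memory handle would presumably be replaced by the open() on "$results_dir/$file", and the per-line loop would no longer be needed.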

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
 
 
 
 
 
Robert
 
      07-27-2004
Thanks for the reply. Just to close the loop, what I ended up doing
was using the join function on the @data variable. I then used the
tr/// function to replace tabs and newlines with a space char. Now,
everything is set for the substitution, and the resulting files are
still able to be viewed as xml!
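
For anyone reading later, the approach described above might look roughly like this. It is a sketch from memory rather than the exact code from my script, and the three-line @data sample is invented to keep it short:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the @data array read from one file.
my @data = (
    "<ns0:Status>\t<ns0:ErrorDetails>Job-4296 Error\n",
    "Output data invalid\n",
    "</ns0:ErrorDetails></ns0:Status>\n",
);

# Join all lines into one string.
my $xml = join('', @data);

# Turn every tab and newline into a space, so the element sits on one "line".
$xml =~ tr/\t\n/ /;

# With no newlines left in the string, plain . can cross the whole element.
$xml =~ s/<ns0:ErrorDetails>.*?<\/ns0:ErrorDetails>//g;

print "$xml\n";
```

One side effect: every newline in the file becomes a space, including those outside the removed element.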

The main thing I have learned is when I spend more than an hour on a
problem, look at it from a different direction.

Thanks, again.

 
 
Gunnar Hjalmarsson
 
      07-29-2004
Jim Gibson wrote:
> Robert wrote:
>>
>> my @data = <To_Clean>; #read file contents

>
> As Gunnar pointed out, you probably want to replace this with 'my
> $data = <To_Clean>;'


That must be combined with enabling "slurp" mode:

local $/;
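
A small sketch of the difference, assuming a made-up three-line "file" held in $text and read through an in-memory filehandle (all variable names here are illustrative only):

```perl
use strict;
use warnings;

my $text = "line one\nline two\nline three\n";   # hypothetical file contents

# Default $/ is "\n": a scalar-context read returns only the first line.
open(my $fh, '<', \$text) or die "Can't open in-memory file: $!";
my $first = <$fh>;
close($fh);

# With $/ undefined inside the block, the same read returns everything.
open($fh, '<', \$text) or die "Can't open in-memory file: $!";
my $all = do { local $/; <$fh> };
close($fh);

print "default: ", length($first), " chars\n";
print "slurp:   ", length($all), " chars\n";
```

Wrapping local $/ in a do block keeps the change confined to that one read, so later reads elsewhere in the program still work line by line.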

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
 
 
Gunnar Hjalmarsson
 
      07-29-2004
Robert wrote:
> Gunnar Hjalmarsson wrote:
>> Robert wrote:
>>> The goal is to remove text from the file that begins with:
>>> "<ns0:ErrorDetails>" and ends with "</ns0:ErrorDetails>".

>>
>> Try slurping the file into a scalar variable instead, and add the
>> /s modifier to the s/// operator.

>
> Thanks for the reply. Just to close the loop, what I ended up doing
> was using the join function on the @data variable.


You could have skipped the @data array by just doing:

my $data = do { local $/; <To_Clean> };

> I then used the tr/// function to replace tabs and newlines with a
> space char.


Why? I suspect that the reason is that you are unfamiliar with the /s
modifier. Read about it in "perldoc perlre".
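
To illustrate, here is a short sketch with an invented five-line string (the names $text, $without and $with are only for this example). It applies the same pattern with and without /s:

```perl
use strict;
use warnings;

my $text = "keep\n<ns0:ErrorDetails>first\nsecond\n</ns0:ErrorDetails>\nkeep too\n";

# Without /s, . stops at each newline, so the pattern cannot span lines.
(my $without = $text) =~ s/<ns0:ErrorDetails>.*?<\/ns0:ErrorDetails>//g;

# With /s, . also matches newlines; the rest of the text keeps its line breaks.
(my $with = $text) =~ s/<ns0:ErrorDetails>.*?<\/ns0:ErrorDetails>//gs;

print "without /s: ", ($without eq $text ? "unchanged" : "changed"), "\n";
print "with /s:    ", ($with eq "keep\n\nkeep too\n" ? "element removed" : "unexpected"), "\n";
```

With /s there is no need to convert newlines to spaces first, so the surrounding text is left as it was.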

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
 
 
 
 