Velocity Reviews - Computer Hardware Reviews


My SAX Parser, regexp style. Cut & paste version .901

 
 
robic0
 
      01-08-2006
Since so much was learned from the substitution method, I thought this
might be a better approach.
This is just the starting framework. The rest will be filled in.
Turn off the debug output ($debug and $show_pos) for full speed.

If the regexp has been wrapped, un-wrap it onto one line before using it.
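A side note on the wrapping, offered as a suggestion rather than something from the original post: Perl's /x modifier ignores whitespace inside a pattern and treats # as the start of a comment, so the master regex can be spread over several lines with essentially the same alternatives and the same group numbers 1-9, and a newsreader that wraps long lines can't break it. The sample string is made up for illustration.

```perl
use strict;
use warnings;

# The master regex laid out with /x: whitespace and newlines inside the
# pattern are ignored and "#" starts a comment. Groups 1-9 as in the post.
my $master = qr{
      (?:<\?(.*?)\?>)                      # 1: <?...?>
    | (?:<META(.*?)>)                      # 2: <META ...>
    | (?:<!DOCTYPE(.*?)>)                  # 3: <!DOCTYPE ...>
    | (?:<!\[CDATA\[(.*?)\]\]>)            # 4: <![CDATA[...]]>
    | (?:<!--(.*?)-->)                     # 5: <!--...-->
    | (?:<(\/*[\:0-9a-zA-Z]+?[\s]*\/*)>)   # 6: <tag>, </tag>, <tag/>
    | (?:<([\:0-9a-zA-Z]+?)[\s]+           # 7: tag name
         ((?:[\:0-9a-zA-Z]+[\s]*=[\s]*["'][^<]*['"])+[\s]*\/*)>)   # 8: attributes
    | (.+?)                                # 9: one character of content
}sx;

# Made-up sample input: print which group fired for each match.
my $sample = '<!DOCTYPE x><a b="1">t</a>';
while ($sample =~ /$master/g) {
    my $kind = defined $3 ? 'DOCTYPE'
             : defined $6 ? 'TAG'
             : defined $7 ? 'TAG+ATTR'
             : defined $9 ? 'CONTENT'
             :              'OTHER';
    print "$kind\n";
}
```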

print <<EOM;

# -----------------------
# XML (Regex) SAX Parser
# Version .901 - 1/7/06
# Copyright 2005,2006
# by robic0-At-yahoo.com
# -----------------------

EOM

use strict;
use warnings;

# Slurp the whole input file into one string.
my $file = 'config.html';
open my $fh, '<', $file or die "can't open $file: $!";
my $xml = do { local $/; <$fh> };
close $fh;

my ($cnt, $content, $show_pos, $debug) = (1, '', 1, 1);

# master regex (keep it on one line), capture groups:
#   1 = <?...?>           2 = <META ...>      3 = <!DOCTYPE ...>
#   4 = <![CDATA[...]]>   5 = <!--...-->      6 = <tag>, </tag> or <tag/>
#   7 = tag name and 8 = attributes of <tag attrib> or <tag attrib/>
#   9 = one character of content
while ($xml =~
/(?:<\?(.*?)\?>)|(?:<META(.*?)>)|(?:<!DOCTYPE(.*?)>)|(?:<!\[CDATA\[(.*?)\]\]>)|(?:<!--(.*?)-->)|(?:<(\/*[\:0-9a-zA-Z]+?[\s]*\/*)>)|(?:<([\:0-9a-zA-Z]+?)[\s]+((?:[\:0-9a-zA-Z]+[\s]*=[\s]*["'][^<]*['"])+[\s]*\/*)>)|(.+?)/sg)
{
    # Content arrives one character at a time; collect it and move on.
    if (defined $9) { $content .= $9; next; }

    print "-" x 20, "\n" if $debug;

    # Any markup ends the current run of content, so report that first.
    if (length $content) {
        print "9 CONTENT: $content\n" if $debug;
        $content = '';
    }
    if ($show_pos) {
        my $rr = pos $xml;    # offset just past this match
        print "$rr ";
    }
    print "1 VERSION: $1\n" if $debug && defined $1;
    print "2 META: $2\n"    if $debug && defined $2;
    print "3 DOCTYPE: $3\n" if $debug && defined $3;
    print "4 CDATA: $4\n"   if $debug && defined $4;
    print "5 COMMENT: $5\n" if $debug && defined $5;
    ## <tag> or </tag> or <tag/>
    print "6 TAG: $6\n"     if $debug && defined $6;
    ## <tag attrib/> or <tag attrib>
    print "7,8 TAG: $7 Attr: $8\n" if $debug && defined $7;
    $cnt++;
}

# Report any content left over after the last piece of markup.
print "9 CONTENT: $content\n" if $debug && length $content;

__END__
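In the loop above, character content goes through the catch-all (.+?) alternative one character per iteration and is glued together in $content. One possible variation, sketched here with a hypothetical tokenize() helper that recognises only PIs, comments, CDATA, tags and text, matches a whole run of text with ([^<]+) in one step, so text-heavy input needs far fewer trips through the loop. The extra '<' alternative keeps a stray '<' from being skipped over silently by the unanchored //g search.

```perl
use strict;
use warnings;

# Hypothetical helper: split an XML string into [TYPE, text] pairs.
# Only PIs, comments, CDATA, tags and text are recognised.
sub tokenize {
    my ($xml) = @_;
    my @tokens;
    while ($xml =~ m{
          <\?(.*?)\?>             # 1: processing instruction
        | <!--(.*?)-->            # 2: comment
        | <!\[CDATA\[(.*?)\]\]>   # 3: CDATA section
        | <(/?[\w:]+[^>]*?/?)>    # 4: start, end or empty tag
        | ([^<]+|<)               # 5: run of text, or a stray '<'
    }gsx) {
        if    (defined $1) { push @tokens, [ PI      => $1 ] }
        elsif (defined $2) { push @tokens, [ COMMENT => $2 ] }
        elsif (defined $3) { push @tokens, [ CDATA   => $3 ] }
        elsif (defined $4) { push @tokens, [ TAG     => $4 ] }
        else               { push @tokens, [ TEXT    => $5 ] }
    }
    return @tokens;
}

# Usage: one line per token.
for my $tok (tokenize('<a x="1">hi &amp; bye</a>')) {
    print "$tok->[0]: $tok->[1]\n";
}
```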




 
 
 
 
 
robic0
 
      01-15-2006
On Sat, 07 Jan 2006 17:31:46 -0800, robic0 wrote:

BIG things on the way...
 
 
 
 
 
robic0
 
      01-22-2006
I'm about to finish this thing. It's mostly modeled after Expat.
It's all Perl, and mine is faster, parsing about 1 meg a second.
It's also compliant with the current XML standards on w3c.org.
There's so much to it that I don't think I want to post it here.
I would like to make it into a "free" module on CPAN or an ActiveState
release.
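Since the parser is described as modeled after Expat, here is one rough way, assumed rather than taken from the author's code, that a regex scan can drive Expat-style event callbacks. parse_events() and its Start/End/Char handler keys are hypothetical; the names loosely echo the handlers of XML::Parser (Perl's Expat interface), but the calling convention here is made up. Only tags and text are understood: comments, PIs, CDATA and entity references are not handled.

```perl
use strict;
use warnings;

# Hypothetical event-driven wrapper: scan with one regex and call the
# Start / End / Char handlers passed in %h. Tags and text only.
sub parse_events {
    my ($xml, %h) = @_;
    while ($xml =~ m{
          <([\w:][-.\w:]*)                        # 1: element name
           ((?:\s+[\w:][-.\w:]*\s*=\s*
               (?:"[^"<]*"|'[^'<]*'))*)           # 2: raw attribute string
           \s*(/?)>                               # 3: "/" for an empty tag
        | </([\w:][-.\w:]*)\s*>                   # 4: end tag
        | ([^<]+)                                 # 5: character data
    }gx) {
        if (defined $1) {
            # Copy the captures before the inner match overwrites $1..$3.
            my ($name, $raw, $empty) = ($1, $2, $3);
            my %attr;
            while ($raw =~ /([\w:][-.\w:]*)\s*=\s*(?:"([^"]*)"|'([^']*)')/g) {
                $attr{$1} = defined $2 ? $2 : $3;
            }
            $h{Start}->($name, %attr) if $h{Start};
            $h{End}->($name) if $empty && $h{End};    # <tag/> = start + end
        }
        elsif (defined $4) { $h{End}->($4)  if $h{End}  }
        else               { $h{Char}->($5) if $h{Char} }
    }
}

# Usage: print one line per event.
parse_events('<p class="note">Hello<br/></p>',
    Start => sub { my ($name, %attr) = @_; print "start $name\n" },
    End   => sub { print "end $_[0]\n" },
    Char  => sub { print "text $_[0]\n" },
);
```

An empty tag such as <br/> fires Start followed immediately by End, which is also how Expat reports it.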

I think it's commercial level. The fact is, I can "interject" special
searches and handling if I want to. It is designed using the specs
from here:

http://www.w3.org/TR/xml11/#NT-AttValue

It's version 1.1. If I'm using the wrong specs, please let me know.
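For reference, the AttValue production at that link says an attribute value uses matching quotes and may not contain a raw '<', or a '&' that doesn't start a character or entity reference. A minimal Perl transcription follows; the $Name pattern is a deliberately simplified ASCII subset of the spec's Name production, and is_att_value() is a hypothetical helper. By comparison, the attribute part of the posted master regex, ["'][^<]*['"], also accepts mismatched quotes and a bare '&'.

```perl
use strict;
use warnings;

# XML 1.1 productions (from the spec):
#   AttValue  ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
#   Reference ::= EntityRef | CharRef
#   EntityRef ::= '&' Name ';'
#   CharRef   ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
# Simplification: Name is limited to ASCII here; the spec also allows
# many non-ASCII ranges in NameStartChar and NameChar.
my $Name      = qr/[A-Z_a-z:][-.0-9A-Z_a-z:]*/;
my $Reference = qr/&$Name;|&#[0-9]+;|&#x[0-9a-fA-F]+;/;
my $AttValue  = qr/"(?:[^<&"]|$Reference)*"|'(?:[^<&']|$Reference)*'/;

# Hypothetical helper: true if the whole string is one quoted AttValue.
sub is_att_value {
    my ($s) = @_;
    return $s =~ /\A(?:$AttValue)\z/ ? 1 : 0;
}

for my $v (q{"a &amp; b"}, q{'say "hi"'}, q{"AT&T"}, q{"x'}, q{"a < b"}) {
    printf "%-12s %s\n", $v, is_att_value($v) ? 'ok' : 'not an AttValue';
}
```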
It's awesome, tremendously fast.
I am also going to write a full-featured "schema checker" using this
base parser. I've never seen anything as easy as schema checking.
Thinking beyond that, I will move into modification tools, even style sheet
mods (I think; it's all too easy now). I will do it all in markup.
The code is about 600 lines now. I could plop it down here. I have
all the constructs in the above 1.1 spec covered. I'm a little worried
about encoding and Unicode. By and large, I've never seen anything
so easy in my life. I fear that my code is approaching a professional
level, and I may "not" want to just plop it down here.
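On the encoding and Unicode worry, one common approach, sketched here as a suggestion and not something the post says the parser does, is to read the file as raw bytes, pick the encoding named in the XML declaration (the spec expects UTF-8 when there is no encoding declaration and no byte order mark), and decode to Perl characters before any regex runs, so that . and character classes see characters instead of bytes. decode_xml_bytes() is a hypothetical helper built on the core Encode module; it assumes an ASCII-compatible encoding and does not detect UTF-16.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: raw bytes in, Perl character string out.
# Assumes an ASCII-compatible encoding (UTF-8, ISO-8859-x, ...);
# UTF-16 and its byte order marks are NOT detected.
sub decode_xml_bytes {
    my ($bytes) = @_;
    my $enc = 'UTF-8';    # used when the declaration names no encoding
    if ($bytes =~ /\A(?:\xEF\xBB\xBF)?<\?xml\s[^>]*?\bencoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1/) {
        $enc = $2;        # EncName per the spec: [A-Za-z] ([A-Za-z0-9._] | '-')*
    }
    my $text = decode($enc, $bytes);    # croaks on an unknown encoding name
    $text =~ s/\A\x{FEFF}//;            # drop a leading UTF-8 byte order mark
    return $text;
}

# Possible use with the file from the post, read as raw bytes:
#   open my $fh, '<:raw', 'config.html' or die "can't open config.html: $!";
#   my $xml = decode_xml_bytes(do { local $/; <$fh> });
```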

I may want to contact AS or CPAN to post the module so it's not ripped
off. However, I know I could do a schema checker in a week. Since it's
all so easy now, I'm wondering if I can make any money at this, or is it
all just a give-away...

Oh well, going from a homeless man to a middle-class man, I know it won't be
that much. However, I have developed tools that could do conversions.
Yeah, sure, I want to put my stuff in the public domain, but the internal
work I do with them could do fast custom conversions.

What do you think? Say it now; if it ends up in AS or CPAN, you won't have
the option to recommend. It will arrive there, but what's the money behind
hard-core conversions, style, schema, filters, anything?

 
 
 
 


Similar Threads
Thread Thread Starter Forum Replies Last Post
[regexp] How to convert string "/regexp/i" to /regexp/i - ? Joao Silva Ruby 16 08-21-2009 05:52 PM
Re: Where to get stand alone Dot Net Framework version 1.1, version2.0, version 3.0, version 3.5, version 2.0 SP1, version 3.0 SP1 ? MowGreen [MVP] ASP .Net 5 02-09-2008 01:55 AM
Re: Where to get stand alone Dot Net Framework version 1.1, version 2.0, version 3.0, version 3.5, version 2.0 SP1, version 3.0 SP1 ? PA Bear [MS MVP] ASP .Net 0 02-05-2008 03:28 AM
Re: Where to get stand alone Dot Net Framework version 1.1, version 2.0, version 3.0, version 3.5, version 2.0 SP1, version 3.0 SP1 ? V Green ASP .Net 0 02-05-2008 02:45 AM
My Regexp XML Parser -> Structured Perl Data, Cut & Paste Version, No Module's (Vol I) robic0 Perl Misc 43 01-06-2006 06:04 AM


