RegEx Help Needed

 
 
Jürgen Exner
 
      12-04-2004
DeepDiver wrote:
[About parsing HTML]
> "Sherm Pendley" <(E-Mail Removed)> wrote in message
> news:(E-Mail Removed)...
>>
>> Have a look at HTML::Parser on CPAN.
>>

>
> Thanks, but I'm in need of a pure RegEx solution.


Forget it. Nobody with a sane mind would try parsing HTML using pure REs.
Contrary to popular belief, parsing HTML is non-trivial, and while it is not
decided yet whether Perl's advanced REs are powerful enough to do it, it would
most certainly be _way_ too complex to be of any real use.
As this has been discussed many times before, please see the FAQ and Google
for further details.

jue


 
Chris Mattern
 
      12-04-2004
DeepDiver wrote:

> "Sherm Pendley" <(E-Mail Removed)> wrote in message
> news:(E-Mail Removed)...
>>
>> Have a look at HTML::Parser on CPAN.
>>

>
> Thanks, but I'm in need of a pure RegEx solution.


No, you aren't. You may think you are, but you aren't.
--
Christopher Mattern

"Which one you figure tracked us?"
"The ugly one, sir."
"...Could you be more specific?"
 
Chris Mattern
 
      12-04-2004
DeepDiver wrote:

> "David H. Adler" <(E-Mail Removed)> wrote in message
> news:(E-Mail Removed)...
>> On 2004-12-04, DeepDiver <(E-Mail Removed)> wrote:
>> > "Sherm Pendley" <(E-Mail Removed)> wrote in message
>> > news:(E-Mail Removed)...
>> >>
>> >> Have a look at HTML::Parser on CPAN.
>> >>
>> >
>> > Thanks, but I'm in need of a pure RegEx solution.

>>
>> This of course raises the question: Why?

>
>
> A few reasons:
>
> 1. I'm not programming in Perl. In fact, my experience with Perl was a
> long time ago (and not very extensive even then). I came here because I
> believe that Perl programmers are generally the most proficient with
> regular expressions.


Regular expressions differ subtly but significantly between the languages
that implement them. Solutions formulated for Perl regular expressions
would have a good chance of not working in your language. Ask in a
forum that deals with your language.
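
One small illustration of the portability point, sketched in Python rather
than Perl or C#: .NET writes a named group as (?<name>...), while Python's
re module expects (?P<name>...). The helper name compiles and the two sample
patterns are made up for this sketch; only re.compile and re.error come from
the standard library.

```python
import re


def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Named groups written the way .NET spells them.
DOTNET_STYLE = r"(?<year>\d{4})-(?<month>\d{2})"

# The same groups written the way Python's re module spells them.
PYTHON_STYLE = r"(?P<year>\d{4})-(?P<month>\d{2})"

if __name__ == "__main__":
    print("dotnet style accepted by Python:", compiles(DOTNET_STYLE))
    print("python style accepted by Python:", compiles(PYTHON_STYLE))
```

Python rejects the first pattern because it reads "(?<" as the start of a
lookbehind and then finds neither "=" nor "!". The same kind of mismatch can
show up for other constructs, so a pattern is best tested in the engine that
will actually run it.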
>
> 2. I'm writing the current routine in C#. But I would still prefer a
> "pure" RegEx solution so that I have something that is concise and
> (higher-level) language independent.


See above about the portability of regular expressions.
>
> 3. I'm trying to improve my RegEx skills, so the more I can learn how to
> do things like this in RegEx (without "massaging" in a higher-level
> language) the better.


Regular expressions are a very poor tool for parsing HTML. Depending
on your task, using them to do so will range from hair-tearingly frustrating
to simply impossible. Parsing HTML is not a trivial task. The main
lesson you would learn from trying to parse HTML with regular expressions would
be, if you were paying attention, "don't parse HTML with regular
expressions".
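
To make the frustration concrete, here is a minimal sketch in Python (not
Perl or C#) that extracts link targets two ways: with a naive pattern, and
with the standard library's html.parser. The sample markup, the pattern, and
the names LinkCollector, links_by_regex and links_by_parser are all invented
for illustration. Real pages will break a regex in many more ways than the
two shown.

```python
import re
from html.parser import HTMLParser

# Hypothetical input: one link inside a comment, one live link whose
# title attribute contains a literal ">".
SAMPLE = '''<!-- <a href="old.html">stale</a> -->
<a title="1 > 0" href="real.html">live</a>'''

# Naive pattern: assumes no ">" can occur inside a tag before href,
# and knows nothing about comments.
NAIVE_LINK = re.compile(r'<a\s[^>]*href="([^"]*)"')


class LinkCollector(HTMLParser):
    """Collect href values from <a> start tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


def links_by_regex(html):
    return NAIVE_LINK.findall(html)


def links_by_parser(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    print("regex: ", links_by_regex(SAMPLE))
    print("parser:", links_by_parser(SAMPLE))
```

On this sample the naive pattern reports only the commented-out link and
misses the live one, while the parser does the opposite. Each failure can be
patched in the pattern, but every patch makes it longer and the next corner
case is never far away.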
>
> I hope this addresses your concerns.


Hope these address yours.
>
> Thanks,
> Michael


--
Christopher Mattern

"Which one you figure tracked us?"
"The ugly one, sir."
"...Could you be more specific?"
 