regexp for parsing image filenames out of html code

 
 
Georg Daniel Vassilopulos
 
      08-30-2003
Hello!

I have a lot of html files and I would like to get all image filenames.
The problem is it is not always valid xml.
So I have to use regexps.

The imagetags can be following format:
<img src="/images/pic1.png">
or
<img src='/images/pic1.png'>

The images can be *.png, *.gif, *.jpg, or *.bmp.

What is the regexp of choice?

Can anyone help?

Thanks a lot!
Georg

 
 
 
 
Tad McClellan
 
      08-30-2003
Georg Daniel Vassilopulos <(E-Mail Removed)> wrote:

> I have a lot of html files and I would like to get all image filenames.
                  ^^^^
> The problem is it is not always valid xml.
                                        ^^^

So which is it, HTML or XML?


> So I have to use regexps.



Then it will work correctly sometimes and not work correctly sometimes...


> The imagetags can be following format:
> <img src="/images/pic1.png">
> or
> <img src='/images/pic1.png'>



Those look like valid HTML and valid XML, what is invalid about
your *ML?

These are also valid *ML:

<img src = "/images/pic1.png">

<img
src
=
"/images/pic1.png"
>
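To make the point concrete, here is a minimal sketch (not part of the original post) of a pattern that tolerates whitespace and line breaks around `src` and `=`. The sample markup and the variable names are assumptions made for illustration.

```perl
use strict;
use warnings;

# Sample input (assumed for illustration): a spaced-out form and the
# spread-over-several-lines form shown above.
my $html = <<'END';
<img src = "/images/pic1.png">
<img
src
=
"/images/pic2.gif"
>
END

# \s matches newlines as well as spaces, so whitespace and line breaks
# around "src" and "=" are accepted. $1 is the opening quote character,
# $2 is the attribute value, and \1 requires the same quote to close it.
my @files;
while ($html =~ /<img\s+src\s*=\s*(["'])([^"']*)\1/gi) {
    push @files, $2;
}

print "$_\n" for @files;
```

This still assumes that `src` is the first attribute after `<img`; a tag such as `<img alt="x" src="...">` would be missed, which is the kind of case that makes regexes a poor fit here.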



> What is the regexp of choice?



There is never a regex of choice for a job not suited for regexes
in the first place.


m/<img src=("[^"]+"|'[^']+')/g
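A short sketch of how the pattern above might be applied, assuming the whole file has already been read into a string. The sample input, the variable names, and the extension filter are illustrative additions, not part of the original post. Note that the captured text still includes the surrounding quotes, so they are stripped in a second step.

```perl
use strict;
use warnings;

# Sample input, assumed for illustration.
my $html = q{<p><img src="/images/pic1.png"> <img src='/images/pic2.jpg'>}
         . q{ <img src="/docs/readme.txt"></p>};

# In list context, m//g returns the text captured by the parentheses
# for every match; with this pattern that text still includes the quotes.
my @quoted = $html =~ m/<img src=("[^"]+"|'[^']+')/g;

# Drop the surrounding quote characters, then keep only the four
# extensions named in the question.
my @images = grep { /\.(?:png|gif|jpg|bmp)\z/i }
             map  { substr($_, 1, -1) } @quoted;

print "$_\n" for @images;
```

As the post says, this works only for tags written exactly as `<img src=...`; extra whitespace or a different attribute order defeats it.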


--
    Tad McClellan                          SGML consulting
    (E-Mail Removed)                       Perl programming
    Fort Worth, Texas
 
 
 
 
 
Alan J. Flavell
 
      08-30-2003
On Sat, Aug 30, Tad McClellan inscribed on the eternal scroll:

> Georg Daniel Vassilopulos <(E-Mail Removed)> wrote:
>
> > I have a lot of html files and I would like to get all image filenames.
>                   ^^^^
> > The problem is it is not always valid xml.
>                                         ^^^
>
> So which is it, HTML or XML?


Or maybe XHTML...

> > So I have to use regexps.


Can we say "petitio principii"? It used to be called "begging the
question" in English, until that phrase was rendered worthless by
folks who didn't know what it meant...

> Then it will work correctly sometimes and not work correctly sometimes...


But isn't that inevitable if you propose to parse material which is
allowed to contain errors? OT but: if you're doing that with
XML-based markup, then you're already in a state of sin.

> > The imagetags can be following format:
> > <img src="/images/pic1.png">
> > or
> > <img src='/images/pic1.png'>

>
> Those look like valid HTML and valid XML,


OK; but they're not, however, acceptable as XHTML. (Have to be
<img ... /> )

> There is never a regex of choice for a job not suited for regexes
> in the first place.


Seems a fair enough comment to me.

But if you're hoping (or "if one's hoping") to recover from syntax
errors - and if one's entitled to assume the much more restrictive
syntax of XML (rather than the bizarre backwaters of SGML), I'm not
sure what better approach to recommend. XML-conforming software is
mandated to deliver an error report and bale out when errors are
encountered, surely? So then what...?
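For the HTML case (as opposed to strict XML), one option that is not proposed anywhere in this thread is an error-tolerant HTML tokenizer rather than an XML parser. The sketch below assumes the HTML::TokeParser module from the CPAN HTML-Parser distribution is installed; the sample markup and variable names are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TokeParser;   # CPAN HTML-Parser distribution, assumed installed

# Sample input, assumed for illustration: spread-out markup, an
# XHTML-style empty tag in upper case, and an unclosed <p>.
my $html = <<'END';
<p><img
src
=
"/images/pic1.png"
>
<IMG alt="x" SRC='/images/pic2.gif' />
END

my $parser = HTML::TokeParser->new(\$html)
    or die "cannot create parser";

my @files;
# get_tag('img') skips ahead to the next <img ...> start tag. Element 1
# of the returned array reference is a hash of the tag's attributes,
# keyed by lowercased attribute name.
while (my $tag = $parser->get_tag('img')) {
    my $src = $tag->[1]{src};
    push @files, $src if defined $src;
}

print "$_\n" for @files;
```

Unlike a conforming XML parser, this tokenizer does not stop at the first error, so it copes with attribute order, case, and whitespace variations without any pattern tuning.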

all the best

--
>> So it is a fold-away menu.

> Maybe. But I don't see _any_ menu there.

Because it is folded away.
 