Regex: match double OR single quote

 
 
Jason C
 
      07-12-2012
I'm struggling with what I thought was a simple thing, and I'm hoping you guys can help.

I have a string that may contain a ", ', or neither. So, I wrote this in the regex:

["|']*

But this doesn't match anything.

Here's the complete code:

# $text comes from a form, so this is just a sample
$text = <<EOF;
<img src="<a href='http://www.example.com/whatever.jpg'
target='_new'>
http://www.example.com/whatever.jpg</a>"
width="300" height="300" border="0">
EOF

# Regex; line breaks added here for the sake of reading
$text =~ s/<img(.*?)src=
["|']*\s*<a.*? href=
["|']*\s*(.*?)
["|']*.*?>(.*?)<\/a>
["|']*(.*?)>
/<img src="$2"$1$4>/gsi;

If I change ["|']* to whatever I have hard coded, then it works fine, so I know the issue is with that pattern. So how do I correctly match them?
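One detail worth checking in the pattern above: inside a character class the | is a literal pipe, not alternation, so ["|'] matches a double quote, a pipe, or a single quote. A minimal sketch of the difference, where the sample strings and the helper name matches are made up for illustration:

```perl
use strict;
use warnings;

# Inside [...] the | is an ordinary character, not an "or".
my $with_pipe    = qr/["|']/;   # matches ", | or '
my $without_pipe = qr/["']/;    # matches " or ' only

# Hypothetical helper: 1 if $str matches $re, else 0.
sub matches {
    my ($re, $str) = @_;
    return $str =~ $re ? 1 : 0;
}

for my $s (q{say "hi"}, q{it's}, q{a|b}, q{plain}) {
    printf "%-10s with_pipe=%d without_pipe=%d\n",
        $s, matches($with_pipe, $s), matches($without_pipe, $s);
}
```

Only the string containing a pipe is treated differently by the two classes, so ["'] is probably what was intended.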
 
 
Jason C
 
      07-13-2012
On Thursday, July 12, 2012 7:46:04 PM UTC-4, Ben Morrow wrote:
<snip>
> In general, .*? is not a panacea in situations like this. You would
> probably be better off using negated character classes, something like
>
> $text =~ s{
> <img ([^>]*) src=["'] \s*
> <a [^>]* [ ] href=["'] \s* ([^'"]*) ["'] [^>]* >
> ([^<]*) </a>
> ['"] ([^>]*) >
> }{<img src="$2"$1$4>}gsix;
>
> (I've used /x to format it decently, which means the literal space needs
> to be escaped somehow. I usually prefer putting it in a character class
> to backslashing it, though either would work.)
>
> Here each negated character class stops the match running off past the
> next thing, so for instance $2 can't run past the end of the quotes.
> This isn't perfect: it will not match at all if there are other tags
> inside the <a>, and it's not terribly easy to modify it so it will.
> (While it is possible to correctly match arbitrary HTML with Perl
> regexes, it isn't entirely straightforward.)
>
> Ben


Perfect! I actually did mean for the " or ' to be optional, though (it's possible to have references without a quote), so I had to add the * back in, but the idea of negated characters was exactly what I needed.
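As a rough illustration of why the negated class matters once the quotes become optional, here is a minimal sketch. The helper names href_lazy and href_negated and the two sample tags are assumptions made up for this example, not code from the thread:

```perl
use strict;
use warnings;

# Lazy capture followed by an optional quote: the capture can stop
# immediately, because nothing after it is required to match.
sub href_lazy {
    my ($html) = @_;
    return $html =~ /href=["']?(.*?)["']?/ ? $1 : undef;
}

# Negated class: the capture runs until a quote, whitespace, or >.
sub href_negated {
    my ($html) = @_;
    return $html =~ /href=["']?([^'"\s>]*)/ ? $1 : undef;
}

for my $html (q{<a href='one.jpg' target='_new'>}, q{<a href=two.jpg target=_new>}) {
    printf "lazy=[%s] negated=[%s]\n", href_lazy($html), href_negated($html);
}
```

With the lazy version the capture comes back empty for both tags, while the negated class returns the URL whether or not it is quoted.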

For the sake of my own knowledge, does the pattern:

/img([^>])src/

translate to "img, not followed by a >, and followed by src", or "img, followed by anything except a >, and followed by src"?
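For what it's worth, the second reading is the closer one: [^>] is a character class that consumes exactly one character other than >, and it needs a * after it to consume a run of them. A small sketch to try it, with helper names and test strings invented for the example:

```perl
use strict;
use warnings;

# [^>] consumes exactly one character that is not >.
sub one_char {
    my ($s) = @_;
    return $s =~ /img([^>])src/ ? $1 : undef;
}

# [^>]* consumes any run (including none) of such characters.
sub any_run {
    my ($s) = @_;
    return $s =~ /img([^>]*)src/ ? $1 : undef;
}

for my $s ('img src', 'imgsrc', 'img  src', 'img>src') {
    my ($one, $run) = (one_char($s), any_run($s));
    printf "%-10s one=%s run=%s\n", $s,
        defined $one ? "[$one]" : 'no match',
        defined $run ? "[$run]" : 'no match';
}
```

Without the * the pattern fails on both "imgsrc" and "img  src", because there must be exactly one character between img and src.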
 
 
ccc31807
 
      07-16-2012
On Jul 12, 6:12 pm, Jason C <(E-Mail Removed)> wrote:
> I'm struggling with what I thought was a simple thing, and I'm hoping you guys can help.
>
> I have a string that may contain a ", ', or neither. So, I wrote this in the regex:


If you process CSV files, this can get real hairy. CSV files can
contain one or more double quotes, one or more single quotes, pairs of
double and/or single quotes, and commas embedded within quotation
marks. The best help, and one that I strongly recommend to you, is to
examine the Perl source for one or more of the CSV modules. They
contain regular expressions for disentangling CSV strings, and trying
to understand how they work will strengthen your RE chops.

I normally follow one of two strategies when faced with this situation.
The first is to replace all non-delimiting or non-qualifying quotation
marks with some unusual character that's unlikely to appear in the
string, such as
s/["']/#/g
and then later, after I've processed the string, reverse the change
like this
s/#/'/g
which converts all the quotations to single quotes, which may or may
not work for you (it normally works for me).

The second is to escape the quotations with either single or double
backslashes, depending on whatever subsequent processing you plan to do,
like this
s/(["'])/\\$1/g
This has the advantage of preserving the kinds of quotes.

I'm posting from memory so the above might have errors, but you
understand the idea.
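A minimal runnable sketch of both strategies, assuming that # never occurs in the real data (if it does, the restore step will also turn those characters into quotes). The sub names and the sample string are made up for illustration:

```perl
use strict;
use warnings;

# Strategy 1: mask every quote with a placeholder, then restore as '.
# Assumes # does not otherwise occur in the data.
sub mask_quotes   { my ($s) = @_; $s =~ s/["']/#/g; return $s }
sub unmask_quotes { my ($s) = @_; $s =~ s/#/'/g;    return $s }

# Strategy 2: put a backslash in front of each quote.
# In the replacement, \\ is one literal backslash and $1 is the quote.
sub escape_quotes { my ($s) = @_; $s =~ s/(["'])/\\$1/g; return $s }

my $in = q{say "hi", it's};
print mask_quotes($in), "\n";
print unmask_quotes(mask_quotes($in)), "\n";
print escape_quotes($in), "\n";
```

The round trip through the placeholder loses the distinction between the two kinds of quote, while the backslash version keeps it.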

In practice, I find that single quotes turn up in the oddest places,
where you would never expect them. For this reason, when I process a
string, out of pure defensiveness, I usually escape quotes (as well as
some other potential troublemakers).

CC

 
 
Jason C
 
      07-18-2012
On Friday, July 13, 2012 5:49:03 AM UTC-4, Ben Morrow wrote:
> > > $text =~ s{
> > > &lt;img ([^&gt;]*) src=[&quot;&#39;] \s*
>     ^^^^       ^^^^         ^^^^^^^^^^^
> If you're going to be posting to programming newsgroups you need to find
> a way to stop that from happening. Dropping Google in favour of a real
> newsreader might be a good start.


Blech, why did Google start doing that?? I really don't use NG's that often, but those substitutions sure make it hard to talk about regex!

I guess I'll have to grab a copy of Forte Agent or something...


> 'Optional' is ?, not *. Presumably you don't want to allow


Maybe I really am confused. Regex isn't really my strong point, though, so I appreciate the clarification.

I thought that ? made it not greedy; meaning, instead of catching the next reference, it would find the last reference.

Example:

$text = "Example >->->";
$text =~ s/>?//;

would return:

Example ->->

But this:

$text = "Example >->->";
$text =~ s/>//;

would return:

Example --

Then, I thought that * meant "0 or more times", which would essentially make it optional?
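A quick way to check those expectations is to run the substitutions with the binding operator =~ and print the results. This is only a sketch: strip_once and strip_all are hypothetical helper names, and the last two lines are an extra illustration of greedy versus non-greedy matching.

```perl
use strict;
use warnings;

# Hypothetical helper: apply one substitution (no /g) and return the result.
sub strip_once {
    my ($s, $re) = @_;
    $s =~ s/$re//;
    return $s;
}

# Same, but with /g so every match is removed.
sub strip_all {
    my ($s, $re) = @_;
    $s =~ s/$re//g;
    return $s;
}

my $text = 'Example >->->';
print strip_once($text, qr/>/),  "\n";  # removes the first > only
print strip_once($text, qr/>?/), "\n";  # matches the empty string at position 0
print strip_all($text,  qr/>/),  "\n";  # /g removes every >

# Greedy versus non-greedy: the ? that changes greediness follows a quantifier.
my ($greedy) = 'a>b>c' =~ /(.*)>/;
my ($lazy)   = 'a>b>c' =~ /(.*?)>/;
print "greedy=[$greedy] lazy=[$lazy]\n";
```

A bare ? after a character means "zero or one of it", so s/>?// succeeds at the very start of the string without removing anything. It is the /g flag, not the absence of ?, that removes every occurrence.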
 
 
Rainer Weikusat
 
      07-18-2012
Ben Morrow <(E-Mail Removed)> writes:

[...]

>> Then, I thought that * meant "0 or more times", which would essentially
>> make it optional?

>
> Well, yes, in a sense. 'Optional' is ambiguous; in this case, as I said,
> I believe you want 0-or-1-times rather than 0-or-more-times.


I don't think so. ? is equivalent to the quantifier {0,1}, * is
equivalent to the quantifier {0,}. Both imply that the re they apply
to is optional for success of the match (not matching it at all is
fine). The difference is on the right side of the comma: The first one
may match at most once, the second one represents an unbounded
sequence.
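The equivalence can be checked with a small sketch that counts how many leading single quotes each quantifier consumes. The helper leading_len and the sample strings are assumptions made up for the example:

```perl
use strict;
use warnings;

# Hypothetical helper: how many leading single quotes a pattern consumes,
# or -1 if the pattern does not match at all.
sub leading_len {
    my ($re, $s) = @_;
    return $s =~ $re ? length($1) : -1;
}

# Each entry pairs a label with a pattern anchored at the start of the string.
my @patterns = (
    [ q{'?}     => qr/^('?)/     ],
    [ q{'{0,1}} => qr/^('{0,1})/ ],
    [ q{'*}     => qr/^('*)/     ],
    [ q{'{0,}}  => qr/^('{0,})/  ],
);

for my $s (q{x}, q{'x}, q{'''x}) {
    for my $p (@patterns) {
        printf "%-6s %-8s consumed %d\n", $s, $p->[0], leading_len($p->[1], $s);
    }
}
```

All four patterns succeed on a string with no quote at all. They only differ on the string with three quotes, where ? and {0,1} stop after one and * and {0,} take all three.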
 