Need help with regular expression to parse URLs

 
 
Tom Anderson
 
      08-10-2009
On Mon, 10 Aug 2009, Stefan Ram wrote:

> Neil <(E-Mail Removed)> writes:
>> I am having trouble figuring out how to write a regular expression to
>> parse out parts of a URL.

>
> http://web.archive.org/web/200707050...erl/url3.regex


Nice. But since all those groups are non-capturing, completely bloody
useless!

tom

--
I do not fear death. I had been dead for billions and billions of years
before I was born. -- Mark Twain
 
 
 
 
 
Neil
 
      08-10-2009
On Aug 10, 4:48 pm, Tom Anderson <(E-Mail Removed)> wrote:
> Firstly, the repeated group as written has no way to admit slashes
> *between* pairs of path elements.


Yes! I see it now. Thank you.

> Secondly, you get one matching group per occurrence of a capturing group
> in the *pattern*, not per occurrence of the subpattern in the match. That
> is, if the above pair group matches five times, you'll still only get a
> single pair of captured groups (the last ones). That, i think, means
> there's no way to use a regular expression to do what you want to do here.


I did not realize this was a limitation of the regex matching.
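
A minimal sketch of that limitation, assuming a made-up pattern of slash-separated pairs (not the original regex from this thread) and a hypothetical class name. The outer group repeats, but the two capturing groups only ever hold what they matched on the last repetition:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedGroupDemo {
    // Hypothetical pattern: one or more "/first/second" pairs.
    private static final Pattern PAIRS = Pattern.compile("(?:/([^/]+)/([^/]+))+");

    // Returns what groups 1 and 2 hold after matching the whole path,
    // or null if the path does not match.
    static String[] capturedPair(String path) {
        Matcher m = PAIRS.matcher(path);
        if (!m.matches()) {
            return null;
        }
        // The number of groups is fixed by the pattern (2 here), however
        // many times the outer group repeated during the match.
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] pair = capturedPair("/a/b/c/d");
        System.out.println(pair[0] + " + " + pair[1]); // prints "c + d"
    }
}
```

The "a + b" pair was matched along the way but is no longer available once the match completes.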

I will use split.

Thanks,
Neil

--
Neil Aggarwal, (281)846-8957, www.JAMMConsulting.com
Will your e-commerce site go offline if you have
a DB server failure, fiber cut, flood, fire, or other disaster?
If so, ask about our geographically redundant database system.
 
 
 
 
 
markspace
 
      08-10-2009
Roedy Green wrote:
> On Mon, 10 Aug 2009 11:35:04 -0700 (PDT), Neil
> <(E-Mail Removed)> wrote, quoted or indirectly quoted someone
> who said :
>
>> http://jammconsulting.com/jamm/page/...Backpacks.html

>
> Complicated regexes are such a bitch to debug. We need a tool that
> shows you just how far it got.



I use a little regex tester that I wrote. It's a jar file that pops up
a GUI that lets me test a regex against sample input. It's really handy,
and faster than creating a new project in the IDE and then compiling and
running a test against a Java string.

I'll post it if you think it would be generally useful.

 
 
markspace
 
      08-10-2009
Stefan Ram wrote:
> Neil <(E-Mail Removed)> writes:
>> I am having trouble figuring out how to write a regular expression to
>> parse out parts of a URL.

>
> http://web.archive.org/web/200707050...erl/url3.regex
>



That hurted my brain.
 
 
Roedy Green
 
      08-10-2009
On Mon, 10 Aug 2009 22:49:16 +0100, Tom Anderson
<(E-Mail Removed)> wrote, quoted or indirectly quoted someone who
said :

>Writing a loop to iterate over the elements of the chunks array in pairs
>is a pain, but a very minor one.


You don't even have to. Split tosses out the '/'s for you. You just
have to choose a magic subscript to bypass the unwanted lead fields.

--
Roedy Green Canadian Mind Products
http://mindprod.com

"You can have quality software, or you can have pointer arithmetic; but you cannot have both at the same time."
~ Bertrand Meyer (born: 1950 age: 59) 1989, creator of design by contract and the Eiffel language.
 
 
Roedy Green
 
      08-11-2009
On Mon, 10 Aug 2009 15:07:54 -0700 (PDT), Neil
<(E-Mail Removed)> wrote, quoted or indirectly quoted someone
who said :

>
>I did not realize this was a limitation of the regex matching.


Regexes have a number of limitations. They can't, for example, ensure
() are balanced. When they run out of steam, try a parser.

See http://mindprod.com/jgloss/parser.html
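
For the balanced-parentheses case the "parser" can be very small. A minimal sketch, with a hypothetical class and method name, that only tracks nesting depth and ignores every other character:

```java
public class ParenCheck {
    // Returns true if every '(' has a matching ')' and no ')' appears
    // before the '(' it would close. Other characters are ignored.
    static boolean isBalanced(String s) {
        int depth = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth < 0) {
                    return false; // closed a paren that was never opened
                }
            }
        }
        return depth == 0; // anything left open means unbalanced
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("(a(b)c)")); // prints "true"
        System.out.println(isBalanced("(()"));     // prints "false"
    }
}
```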
--
Roedy Green Canadian Mind Products
http://mindprod.com

"You can have quality software, or you can have pointer arithmetic; but you cannot have both at the same time."
~ Bertrand Meyer (born: 1950 age: 59) 1989, creator of design by contract and the Eiffel language.
 
 
Roedy Green
 
      08-11-2009
On Mon, 10 Aug 2009 13:19:48 -0700, markspace <(E-Mail Removed)>
wrote, quoted or indirectly quoted someone who said :

>
>If you can write a custom parser in two minutes,


The problem with regexes is I never feel confident they are fully
debugged. With all the greedy/reluctant stuff the expected behaviour
becomes a matter of experiment, rather than something you just read.

Except for very simple ones, I never feel fully confident they are
correct.
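
A small sketch of the greedy/reluctant difference, using made-up patterns and input rather than anything from this thread. The two regexes differ by a single `?`, yet capture different text from the same string:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Returns group 1 of the first match of regex in input, or null if
    // there is no match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "/a/b/c/";
        // Greedy .+ takes as much as it can and backs off to the last slash.
        System.out.println(firstGroup("/(.+)/", input));  // prints "a/b/c"
        // Reluctant .+? stops at the first slash that lets the match succeed.
        System.out.println(firstGroup("/(.+?)/", input)); // prints "a"
    }
}
```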
--
Roedy Green Canadian Mind Products
http://mindprod.com

"You can have quality software, or you can have pointer arithmetic; but you cannot have both at the same time."
~ Bertrand Meyer (born: 1950 age: 59) 1989, creator of design by contract and the Eiffel language.
 
 
Roedy Green
 
      08-11-2009
On Mon, 10 Aug 2009 16:19:31 -0700, markspace <(E-Mail Removed)>
wrote, quoted or indirectly quoted someone who said :

>
>That hurted my brain


I think it was an entry in an obfuscated coding contest.
--
Roedy Green Canadian Mind Products
http://mindprod.com

"You can have quality software, or you can have pointer arithmetic; but you cannot have both at the same time."
~ Bertrand Meyer (born: 1950 age: 59) 1989, creator of design by contract and the Eiffel language.
 
 
Tom Anderson
 
      08-11-2009
On Mon, 10 Aug 2009, Roedy Green wrote:

> On Mon, 10 Aug 2009 22:49:16 +0100, Tom Anderson
> <(E-Mail Removed)> wrote, quoted or indirectly quoted someone who
> said :
>
>> Writing a loop to iterate over the elements of the chunks array in pairs
>> is a pain, but a very minor one.

>
> You don't even have to. Split tosses out the '/'s for you. You just
> have to choose a magic subscript to bypass the unwanted lead fields.


No, the OP wanted to process path elements in pairs. If he had:

prefix/a/b/c/d.html

Then he wanted to get two pairs

a + b
c + d

Split will give you an array {a, b, c, d}. You need to write something
like:

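String[] elements = path.split("/");
// Step through two at a time; the bound stops before an unpaired last element.
for (int i = 0; i + 1 < elements.length; i += 2) {
    String first = elements[i];
    String second = elements[i + 1];
    // process the (first, second) pair here
}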

Except that as you say, you need to stick in a magic subscript to skip
over the boring bits at the start of the path.
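
Put together, that might look like the sketch below. The class name, the method name and the `skip` parameter (the magic subscript) are made up for illustration; how many lead elements to skip depends on the real URL layout, which I am only guessing at here.

```java
import java.util.ArrayList;
import java.util.List;

public class PathPairs {
    // Splits path on '/' and returns consecutive pairs of elements,
    // starting at index skip. An unpaired trailing element is dropped.
    static List<String[]> pairs(String path, int skip) {
        String[] elements = path.split("/");
        List<String[]> result = new ArrayList<>();
        for (int i = skip; i + 1 < elements.length; i += 2) {
            result.add(new String[] { elements[i], elements[i + 1] });
        }
        return result;
    }

    public static void main(String[] args) {
        // skip = 1 steps over the single "prefix" element.
        for (String[] pair : pairs("prefix/a/b/c/d.html", 1)) {
            System.out.println(pair[0] + " + " + pair[1]);
        }
        // prints "a + b" then "c + d.html"
    }
}
```

Note that the last element still carries its ".html" suffix; stripping that is a separate step.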

tom

--
I think the Vengaboys compliment his dark visions splendidly well. -- Mark
Watson, on 'Do you listen to particular music when reading lovecraft?'
 
 
Tom Anderson
 
      08-11-2009
On Mon, 10 Aug 2009, Roedy Green wrote:

> On Mon, 10 Aug 2009 15:07:54 -0700 (PDT), Neil
> <(E-Mail Removed)> wrote, quoted or indirectly quoted someone
> who said :
>
>> I did not realize this was a limitation of the regex matching.

>
> Regexes have a number of limitations. They can't, for example, ensure
> () are balanced.


A popular myth!

Regular languages cannot balance parentheses. But regular expressions as
we know and use them outgrew being a regular language years ago - once you
have backreferences and other modern conveniences, you *can* do things
like balancing parens. Somehow.
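
The sketch below does not balance parens, and I am not claiming java.util.regex can. It only illustrates the first half of the point: with a backreference, a pattern can recognise "some string followed by an exact copy of itself", which is not a regular language. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class BackreferenceDemo {
    // (.+) captures a non-empty string; \1 then requires the same text again.
    private static final Pattern DOUBLED = Pattern.compile("(.+)\\1");

    // Returns true if s is some non-empty string written twice in a row.
    static boolean isDoubled(String s) {
        return DOUBLED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isDoubled("abcabc")); // prints "true"
        System.out.println(isDoubled("abcabd")); // prints "false"
    }
}
```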

tom

--
I think the Vengaboys compliment his dark visions splendidly well. -- Mark
Watson, on 'Do you listen to particular music when reading lovecraft?'
 