Python - Help with Regex for domain names |
|
#1 |
|
I'm trying to figure out how to efficiently write a regex for domain names with a particular top-level domain. Let's say I want to grab all domain names with the country codes .us, .au, and .de.

I could create three different regexes that would work:

    regex = re.compile(r'[\w\-\.]+\.us')
    regex = re.compile(r'[\w\-\.]+\.au')
    regex = re.compile(r'[\w\-\.]+\.de')

How would I write one to accommodate all three or, better yet, to accommodate a list of them that I can pass into a method call? Thanks!

Feyo
|
#2 |
|
Feyo wrote:
> I'm trying to figure out how to efficiently write a regex for
> domain names with a particular top-level domain. Let's say I want to
> grab all domain names with the country codes .us, .au, and .de.
>
> I could create three different regexes that would work:
> regex = re.compile(r'[\w\-\.]+\.us')
> regex = re.compile(r'[\w\-\.]+\.au')
> regex = re.compile(r'[\w\-\.]+\.de')
>
> How would I write one to accommodate all three or, better yet, to
> accommodate a list of them that I can pass into a method call? Thanks!

Just a point of interest: a correctly formed domain name may have a trailing period after the TLD [1]. Example:

    foo.bar.com.

Though you do not often see this, it's worth accommodating "just in case"...

[1] http://homepages.tesco.net/J.deBoyne...main-name.html

--
----------------------------------------------------------------------------
Tim Daneliuk
PGP Key: http://www.tundraware.com/PGP/
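One way to allow for the trailing period is an optional `\.?` after the TLD. This is only a minimal sketch built on the `.us` pattern from the question; the names `plain`, `with_dot`, and `sample` are illustrative and not from the thread.

```python
import re

# Pattern from the question, unchanged.
plain = re.compile(r'[\w\-\.]+\.us')

# Same pattern with an optional trailing dot after the TLD (an assumption
# about how one might accommodate the fully qualified form).
with_dot = re.compile(r'[\w\-\.]+\.us\.?')

sample = 'foo.bar.us.'
plain_text = plain.match(sample).group(0)        # stops before the final dot
with_dot_text = with_dot.match(sample).group(0)  # includes the final dot
```

Both patterns match the sample; the difference is only whether the final period is part of the matched text.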
|
#3 |
|
Feyo wrote:
> I'm trying to figure out how to efficiently write a regex for
> domain names with a particular top-level domain. Let's say I want to
> grab all domain names with the country codes .us, .au, and .de.
>
> I could create three different regexes that would work:
> regex = re.compile(r'[\w\-\.]+\.us')
> regex = re.compile(r'[\w\-\.]+\.au')
> regex = re.compile(r'[\w\-\.]+\.de')
>
> How would I write one to accommodate all three or, better yet, to
> accommodate a list of them that I can pass into a method call? Thanks!
>

    regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)')

If you have a list of country codes, say domains = ["us", "au", "de"], then you can build the regular expression from it:

    regex = re.compile(r'[\w\-\.]+\.(?:%s)' % '|'.join(domains))

MRAB
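The list-based approach can be wrapped in a small function so the codes can be passed in, as the question asked. This is a sketch only: the function name `build_tld_regex` and the sample text are made up for illustration, and the `re.escape` call is an added precaution that is not in the post above.

```python
import re

def build_tld_regex(tlds):
    """Compile one pattern that matches names ending in any of the given TLDs."""
    # re.escape guards against list entries containing regex metacharacters.
    alternation = '|'.join(re.escape(tld) for tld in tlds)
    return re.compile(r'[\w\-\.]+\.(?:%s)' % alternation)

regex = build_tld_regex(["us", "au", "de"])
text = "Visit example.us, shop.example.au and beispiel.de today."
matches = regex.findall(text)
```

Because the group is non-capturing, `findall` returns the whole matched names rather than just the TLDs.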
|
#4 |
|
On Jul 30, 11:56 am, MRAB <pyt...@mrabarnett.plus.com> wrote:
> Feyo wrote:
> > I'm trying to figure out how to efficiently write a regex for
> > domain names with a particular top-level domain. Let's say I want to
> > grab all domain names with the country codes .us, .au, and .de.
> >
> > I could create three different regexes that would work:
> > regex = re.compile(r'[\w\-\.]+\.us')
> > regex = re.compile(r'[\w\-\.]+\.au')
> > regex = re.compile(r'[\w\-\.]+\.de')
> >
> > How would I write one to accommodate all three or, better yet, to
> > accommodate a list of them that I can pass into a method call? Thanks!
> >
> regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)')
>
> If you have a list of country codes ["us", "au", "de"] then you can
> build the regular expression from it:
>
> regex = re.compile(r'[\w\-\.]+\.(?:%s)' % '|'.join(domains))

Perfect! Thanks.

Feyo
|
#5 |
|
On Jul 30, 9:56 am, MRAB <pyt...@mrabarnett.plus.com> wrote:
> Feyo wrote:
> > I'm trying to figure out how to efficiently write a regex for
> > domain names with a particular top-level domain. Let's say I want to
> > grab all domain names with the country codes .us, .au, and .de.
> >
> > I could create three different regexes that would work:
> > regex = re.compile(r'[\w\-\.]+\.us')
> > regex = re.compile(r'[\w\-\.]+\.au')
> > regex = re.compile(r'[\w\-\.]+\.de')
> >
> > How would I write one to accommodate all three or, better yet, to
> > accommodate a list of them that I can pass into a method call? Thanks!
> >
> regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)')

You might also want to consider that some country codes, such as "co" for Colombia, might match more than you want. For example:

    re.match(r'[\w\-\.]+\.(?:us|au|de|co)', 'foo.boo.com')

will match.

rurpy@yahoo.com
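To see exactly what that call matches, it helps to look at the matched text rather than just whether a match object came back. A short sketch using the pattern from the post; the variable names are illustrative.

```python
import re

pattern = re.compile(r'[\w\-\.]+\.(?:us|au|de|co)')

m = pattern.match('foo.boo.com')
# The match covers only a prefix of the string: it ends after "co" and
# leaves the final "m" unmatched, which is why a .com name slips through.
matched_text = m.group(0)
leftover = 'foo.boo.com'[m.end():]
```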
|
#6 |
|
On Thu, 30 Jul 2009 10:29:09 -0700, rurpy wrote:
>> regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)')
>
> You might also want to consider that some country
> codes such as "co" for Colombia might match more than
> you want, for example:
>
> re.match(r'[\w\-\.]+\.(?:us|au|de|co)', 'foo.boo.com')
>
> will match.

... so put \b at the end, i.e.:

    regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)\b')

Nobody
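A quick check of the word-boundary fix, as a sketch. The code "co" is added to the list here only so the earlier 'foo.boo.com' example applies; it is not part of the pattern in the post above.

```python
import re

# \b requires that the TLD is not immediately followed by another word character.
bounded = re.compile(r'[\w\-\.]+\.(?:us|au|de|co)\b')

com_hit = bounded.match('foo.boo.com')  # "co" is followed by "m", so no match
co_hit = bounded.match('foo.boo.co')    # a genuine .co name still matches
```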
|
#7 |
|
Nobody wrote:
> On Thu, 30 Jul 2009 10:29:09 -0700, rurpy wrote:
>
>>> regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)')
>> You might also want to consider that some country
>> codes such as "co" for Colombia might match more than
>> you want, for example:
>>
>> re.match(r'[\w\-\.]+\.(?:us|au|de|co)', 'foo.boo.com')
>>
>> will match.
>
> ... so put \b at the end, i.e.:
>
> regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)\b')
>

It would still match "www.bbc.co.uk", so you might need:

    regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)\b(?!\.\b)')

MRAB
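The effect of the negative lookahead can be checked side by side with the `\b`-only version. This is a sketch: "co" is added to the list so that "www.bbc.co.uk" is a relevant test case (the patterns in the posts above do not include it), and the variable names are illustrative.

```python
import re

boundary_only = re.compile(r'[\w\-\.]+\.(?:us|au|de|co)\b')
# (?!\.\b) rejects a match when the TLD is followed by a dot that starts
# another label, while still tolerating a bare trailing dot.
with_lookahead = re.compile(r'[\w\-\.]+\.(?:us|au|de|co)\b(?!\.\b)')

partial = boundary_only.match('www.bbc.co.uk')      # matches the prefix up to ".co"
rejected = with_lookahead.match('www.bbc.co.uk')    # no match
trailing_dot = with_lookahead.match('foo.bar.co.')  # still matches
```

The trailing-period form mentioned earlier in the thread is still accepted, because the final dot is not followed by a word character.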
|
#8 |
|
In article <mailman.3998.1248989346.8015.python->, MRAB wrote:
>Nobody wrote:
>> On Thu, 30 Jul 2009 10:29:09 -0700, rurpy wrote:
>>
>>>> regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)')
>>> You might also want to consider that some country
>>> codes such as "co" for Colombia might match more than
>>> you want, for example:
>>>
>>> re.match(r'[\w\-\.]+\.(?:us|au|de|co)', 'foo.boo.com')
>>>
>>> will match.
>>
>> ... so put \b at the end, i.e.:
>>
>> regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)\b')
>>
>It would still match "www.bbc.co.uk", so you might need:
>
>regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)\b(?!\.\b)')

If it's a string containing just the candidate domain, you can do

    regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)$')

--
Aahz <*> http://www.pythoncraft.com/

"Many customs in this life persist because they ease friction and promote productivity as a result of universal agreement, and whether they are precisely the optimal choices is much less important." --Henry Spencer
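When each string holds exactly one candidate name, the anchored pattern can be used as a simple filter. A sketch with a made-up list of candidates:

```python
import re

regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)$')

# Hypothetical candidate strings, one domain name per string.
candidates = ["example.de", "example.de.evil.com", "example.dev", "www.bbc.co.uk"]
accepted = [name for name in candidates if regex.match(name)]
```

Note that `$` also matches just before a single trailing newline, so strings read from a file may be worth stripping first.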