Extract domain name

 
 
Shabam
      11-12-2004
How do you fetch just the domain name part of a variable in a script? The
variable can be "http://www.domain.com/blahblah/whatever/page.htm" or
"http://sub.domain.com/blahblah/whatever/page.htm".

What I need is to extract just the "domain.com".


 
Paul Lalli
      11-12-2004
[removed nonexistent groups, removed off-topic AOL group, set followups
to c.l.p.m.]

"Shabam" <> wrote in message
news:3u-dnd1_9JRvQAncRVn-...
> How do you fetch just the domain name part of a variable in a script? The
> variable can be "http://www.domain.com/blahblah/whatever/page.htm" or
> "http://sub.domain.com/blahblah/whatever/page.htm".
>
> What I need is to extract just the "domain.com".


Try using the Regexp::Common module from CPAN. I seem to recall it has
patterns for matching URIs.

Paul Lalli

 
Andrew Tkachenko
      11-12-2004
Look at the URI module. IMHO, it's a good and simple tool for parsing URLs:

use URI;
# host() returns only the host name; authority() may also carry a port or userinfo
(my $domain = URI->new("http://www.domain.com/blahblah/whatever/page.htm")->host) =~ s/^www\.//i;


Regards,
Andrew

Shabam wrote on 12 November 2004 16:02:

> How do you fetch just the domain name part of a variable in a script? The
> variable can be "http://www.domain.com/blahblah/whatever/page.htm" or
> "http://sub.domain.com/blahblah/whatever/page.htm".
>
> What I need is to extract just the "domain.com".


--
Andrew
 
 
Andrew Tkachenko
      11-12-2004
Sorry, I didn't pay attention to the sub-domains in your example.
So, IMHO, it depends on your task: if you know in advance which TLD
values can occur, then just split the host name into its dot-separated
parts and keep only the matched TLD and the second-level domain (SLD).
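
A minimal sketch of that idea, assuming the host name has already been
pulled out of the URL and that only com/net/org can occur. The helper
name sld_and_tld and the %known_tld table are made up for illustration;
they do not come from any module:

```perl
use strict;
use warnings;

# Assumed list of the TLDs we expect to see; adjust for the task at hand.
my %known_tld = map { $_ => 1 } qw(com net org);

# Hypothetical helper: return "SLD.TLD" for a host name, or nothing
# when the last label is not one of the expected TLDs.
sub sld_and_tld {
    my ($host) = @_;
    my @labels = split /\./, lc $host;
    return unless @labels >= 2 && $known_tld{ $labels[-1] };
    return join '.', @labels[-2, -1];
}

print sld_and_tld('www.domain.com'), "\n";   # domain.com
print sld_and_tld('sub.domain.com'), "\n";   # domain.com
```

A host whose TLD is not in the table yields nothing, so the caller can
tell "unknown TLD" apart from a successful match.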

Regards,
Andrew

Ryan Thompson wrote on 12 November 2004 17:38:

> [ Cross-post trimmed ]
>
> Shabam wrote to :
>
>> How do you fetch just the domain name part of a variable in a script?
>> The variable can be "http://www.domain.com/blahblah/whatever/page.htm"
>> or "http://sub.domain.com/blahblah/whatever/page.htm".
>>
>> What I need is to extract just the "domain.com".

>
> This is definitely a non-trivial problem. Fortunately, it's been
> partially solved already. I'm involved in the SpamAssassin and SURBL
> projects, where this really became obvious when spammers started
> obfuscating URIs, and using domains from many different TLDs where it
> takes a lot of research to determine where to chop the hostname to get
> the actual registrar domain.
>
> There's much more to it than using a library or regexp.
>
> See get_uri_list() in SpamAssassin 3's PerMsgStatus.pm for one
> "industrial strength" solution to this problem, which still has room for
> improvement.
>
> - Ryan
>


--
Andrew
 
 
Joe Smith
      11-14-2004
Shabam wrote:

> How do you fetch just the domain name part of a variable in a script? The
> variable can be "http://www.domain.com/blahblah/whatever/page.htm" or
> "http://sub.domain.com/blahblah/whatever/page.htm".
>
> What I need is to extract just the "domain.com".


The problem is not well defined.

For "http://www.tacp.toshiba.com/" do you want "tacp.toshiba.com" or just
"toshiba.com"? For "http://story.news.yahoo.com", is "news" included or not?
You can't just use the last two components in all cases, such as
"http://www.toyota.co.jp" or "http://www.bbc.co.uk".

-Joe
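
To illustrate the co.jp / co.uk point, here is a rough sketch that keeps
three labels instead of two when the last two form a known suffix. The
%multi_suffix table and the name registered_domain are hypothetical, and
the table is deliberately tiny; a usable list would have to be much
longer and kept up to date:

```perl
use strict;
use warnings;

# Hypothetical, hand-maintained table of two-label suffixes under which
# names are registered. Only the two examples from this post are listed.
my %multi_suffix = map { $_ => 1 } qw(co.jp co.uk);

# Hypothetical helper: keep the last three labels when the last two form
# a known suffix, otherwise keep the last two.
sub registered_domain {
    my ($host) = @_;
    my @labels = split /\./, lc $host;
    return unless @labels >= 2;
    my $keep = $multi_suffix{"$labels[-2].$labels[-1]"} ? 3 : 2;
    return unless @labels >= $keep;
    return join '.', @labels[ -$keep .. -1 ];
}

for my $host (qw(www.toyota.co.jp www.bbc.co.uk story.news.yahoo.com)) {
    print "$host => ", registered_domain($host), "\n";
}
# www.toyota.co.jp => toyota.co.jp
# www.bbc.co.uk => bbc.co.uk
# story.news.yahoo.com => yahoo.com
```

Taking the last two labels blindly would give "co.jp" and "co.uk" for
the first two hosts, which is why some table of suffixes is needed.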
 
 
Shabam
      11-14-2004
> The problem is not well defined.
>
> For "http://www.tacp.toshiba.com/" do you want "tacp.toshiba.com" or just
> "toshiba.com"? For "http://story.news.yahoo.com", is "news" included or not?
> You can't just use the last two components in all cases, such as
> "http://www.toyota.co.jp" or "http://www.bbc.co.uk".


What I would need is just the domain name part. In this case it would be
"toshiba.com" only. No subdomains. My domains will be simple
(com/net/org), so complicated situations like "toyota.co.jp" wouldn't apply.


 
 
sam
      11-18-2004
Shabam wrote:

>>The problem is not well defined.
>>
>>For "http://www.tacp.toshiba.com/" do you want "tacp.toshiba.com" or just
>>"toshiba.com"? For "http://story.news.yahoo.com", is "news" included or not?
>>You can't just use the last two components in all cases, such as
>>"http://www.toyota.co.jp" or "http://www.bbc.co.uk".

>
>
> What I would need is just the domain name part. In this case it would be
> "toshiba.com" only. No subdomains. My domains will be simple
> (com/net/org), so complicated situations like "toyota.co.jp" wouldn't apply.
>
>

I'm not an expert, but the following regex should work for com/net/org:

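# Group the TLD alternatives so that "\." applies to all three, and make
# the leading subdomain part optional so "http://domain.com" also matches.
my $url = "http://www.abc.xyz.toy-0-ota.com";
my ($domain) = $url =~ m{^http://(?:[^/]*\.)?([0-9a-zA-Z-]+\.(?:com|net|org))(?![\w.-])}i;
print "$domain\n";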

Sam
 