Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > domain name ordering

Reply
Thread Tools

domain name ordering

 
 
Ivan Shmakov
Guest
Posts: n/a
 
      02-28-2013
One more "domain name" problem: write an "inequality" predicate
to compare domain names "right to left". IOW:

my @ordered
= qw (qux.example.net bar.example.org foo.bar.example.org foo.example.org);

A trivial solution is to split /\./ each of the names, reverse
the resulting lists, and then compare them elementwise, until
either one of the lists ends (which is then the lesser one), or
an inequality is found. Hence:

sub list_cmp (&$$) {
## BTW, how do I imitate sort's own signature here?
## (i. e., make $cmp truly optional.)
my ($cmp, $a, $b) = @_;
$cmp
//= sub { $_[0] cmp $_[1]; };
for (my $i = 0; $i <= $#$a; $i++) {
## .
return 1
if ($i > $#$b);
my $v
= &$cmp ($a->[$i], $b->[$i]);
## .
return $v
if ($v != 0);
}

## .
return ($#$a < $#$b ? -1 : 0);
}

sub dns_name_cmp ($$) {
my ($a, $b) = @_;

## .
list_cmp (undef,
[reverse (split (/\./, $a, -1))],
[reverse (split (/\./, $b, -1))]);
}

## print in reverse
print (join ("\n", sort { - dns_name_cmp ($a, $b); } (@ordered)), "\n");
# foo.example.org
# foo.bar.example.org
# bar.example.org
# qux.example.net

However, doing a split on every sort iteration doesn't seem all
that sensible (or does it?) Let's try with rindex () instead:

sub dns_name_cmp ($$) {
my ($a, $b) = @_;

my ($ta, $tb)
= (-1 + length ($a), -1 + length ($b));
while ($ta >= 0 && $tb >= 0) {
## NB: $[ is deprecated, thus rindex () >= -1
my ($i, $j)
= (rindex ($a, ".", $ta),
rindex ($b, ".", $tb));
## .
my $v
= (substr ($a, 1 + $i, $ta - $i)
cmp substr ($b, 1 + $j, $tb - $j));
return $v
if ($v != 0);
($ta, $tb)
= (-1 + $i, -1 + $j);
}

## .
return ($ta > 0 ? +1
: $tb > 0 ? -1
: 0);
}

## print in reverse
print (join ("\n", sort { - dns_name_cmp ($a, $b); } (@ordered)), "\n");
# foo.example.org
# foo.bar.example.org
# bar.example.org
# qux.example.net

Hopefully I didn't miss any corner case with these.

Now, it makes me wonder if there's an easier way to write this
function...

--
FSF associate member #7257
 
Reply With Quote
 
 
 
 
Dr.Ruud
Guest
Posts: n/a
 
      02-28-2013
On 2013-02-28 09:36, Ivan Shmakov wrote:
> One more "domain name" problem: write an "inequality" predicate
> to compare domain names "right to left". IOW:
>
> my @ordered
> = qw (qux.example.net bar.example.org foo.bar.example.org foo.example.org);


You can build a hash from that, for example:

my %dname = (
net => {
example => { qux => { '.' => -1 } },
},
org => {
example => {
bar => {
foo => { '.' => -1 },
'.' => -1,
},
foo => { '.' => -1 },
},
},
);



Or do
my @dnames= sort { length($b) <=> length($a) or $a cmp $b } @ordered;
to make it:
qw(foo.bar.x.org bar.x.org foo.x.org qux.x.net);
and use a 'for my $dname (@dnames) { ... } loop
that stops at the first match on
/(?:^|\.)\Q$dname$/

--
Ruud

 
Reply With Quote
 
 
 
 
Bo Lindbergh
Guest
Posts: n/a
 
      02-28-2013
Transform each FQDN into something that will sort correctly
using the default string comparison; sort; reverse the transformation.

@ordered = map {
join ".", reverse split / /;
} sort map {
join " ", reverse split /\./;
} @unordered;


/Bo Lindbergh
 
Reply With Quote
 
Rainer Weikusat
Guest
Posts: n/a
 
      02-28-2013
Ivan Shmakov <> writes:
> One more "domain name" problem: write an "inequality" predicate
> to compare domain names "right to left". IOW:
>
> my @ordered
> = qw (qux.example.net bar.example.org foo.bar.example.org foo.example.org);
>
> A trivial solution is to split /\./ each of the names, reverse
> the resulting lists, and then compare them elementwise, until
> either one of the lists ends (which is then the lesser one), or
> an inequality is found. Hence:
>
> sub list_cmp (&$$) {


[...]

> However, doing a split on every sort iteration doesn't seem all
> that sensible (or does it?) Let's try with rindex () instead:
>
>
> sub dns_name_cmp ($$) {
> my ($a, $b) = @_;
>
> my ($ta, $tb)
> = (-1 + length ($a), -1 + length ($b));


-1 + length($a) == length($a) - 1

> while ($ta >= 0 && $tb >= 0) {
> ## NB: $[ is deprecated, thus rindex () >= -1
> my ($i, $j)
> = (rindex ($a, ".", $ta),
> rindex ($b, ".", $tb));
> ## .
> my $v
> = (substr ($a, 1 + $i, $ta - $i)
> cmp substr ($b, 1 + $j, $tb - $j));


Have you benchmarked that? You're still doing the structure analysis
and component extraction for ever comparison and out of my head, I
wouldn't be absolutely sure that doing it this way instead of with
split is actually faster.

The obvious other idea would be to split the domain names once and
leave them this way until the list is sorted. If you could afford to
lose some speed at the expense of code clarity, this could be put in a
class overloading cmp which either constructs the 'non-splitted,
non-reversed' representation 'on demand' (=> overloaded "") or keeps
both around or constructs the 'exploded' one on demand.

Instead of reversing the splitted lists, one could also compare from
the end, possibly using negative indices ($a[$#a] is the same as $a[-1]).
 
Reply With Quote
 
Rainer Weikusat
Guest
Posts: n/a
 
      02-28-2013
Bo Lindbergh <> writes:
> Transform each FQDN into something that will sort correctly
> using the default string comparison; sort; reverse the transformation.
>
> @ordered = map {
> join ".", reverse split / /;
> } sort map {
> join " ", reverse split /\./;
> } @unordered;


This is a neat idea. It could be made slightly more general by using
\000 instead of the space as alternate label separator.
 
Reply With Quote
 
Rainer Weikusat
Guest
Posts: n/a
 
      02-28-2013
Rainer Weikusat <> writes:
> Bo Lindbergh <> writes:
>> Transform each FQDN into something that will sort correctly
>> using the default string comparison; sort; reverse the transformation.
>>
>> @ordered = map {
>> join ".", reverse split / /;
>> } sort map {
>> join " ", reverse split /\./;
>> } @unordered;

>
> This is a neat idea. It could be made slightly more general by using
> \000 instead of the space as alternate label separator.


A 'better' ('faster') idea which needs even less Perl code would be to
build a hash of sort keys and sort the domain name array based on
comparing these instead of destroying and recreating the dataset which
is supposed to be sorted:

------------------
use Benchmark;

my @in = reverse(qw(qux.example.net bar.example.org foo.bar.example.org foo.example.org));

timethese(-5,
{
sort_x => sub {
return map {
join '.', reverse split /\000/
} sort map {
join "\000", reverse split /\./
} @in;
},

sort_y => sub {
my %keys;

$keys{$_} = join("\000", reverse(split(/\./, $_))) for @in;
return sort { $keys{$a} cmp $keys{$b} } @in;
}});

 
Reply With Quote
 
Ivan Shmakov
Guest
Posts: n/a
 
      02-28-2013
>>>>> Rainer Weikusat <> writes:

[...]

> A 'better' ('faster') idea which needs even less Perl code would be
> to build a hash of sort keys and sort the domain name array based on
> comparing these instead of destroying and recreating the dataset
> which is supposed to be sorted:


[...]

Unless I be mistaken, sort { } () with the second variant of
dns_name_cmp I've originally posted is still faster.

$ cat < 3zytr64ehn1aqckj5mk93da6zg.perl
use common::sense;

require Benchmark;

## Courtesy of /usr/share/dict/words
my @n1 = qw {
igloo.burgs.tuft leads.polyp.lakes doily.rude.corks rains.cuds.bilks
aces.mumps.colts pink.bebop.prune coeds.troop.sees tour.hulks.rungs
vent.dial.whelk logos.loped.nor mated.bran.cited pies.loaf.rune
sperm.yelp.harms how.scaly.ante dogma.lawns.plods dell.yaw.piano
caged.daub.gores weird.sofas.arias rave.gouty.tamp sics.avows.today
crass.execs.smuts dale.real.stack boots.dew.pool demur.fancy.stoop
gruff.aspen.memo gaze.homey.sorts new.slot.queue wiry.share.gulf
phase.canny.vogue grace.eddy.beaux juts.wands.doc helm.was.ladle
};
my @n2 = qw {
bums.example.com fools.example.com sic.example.org feel.example.com
wound.example.net stop.example.com diced.example.org
purse.example.com sip.example.net poll.example.com whole.example.net
kinky.example.net homer.example.org ulnas.example.com
won.example.org chase.example.com prong.example.net
hacks.example.net link.example.com monk.example.com sexy.example.net
bogs.example.net crest.example.org rays.example.com
solar.example.com click.example.net mixer.example.com
rib.example.org pinup.example.com gees.example.net fizzy.example.com
aisle.example.org
};

## sub dns_name_cmp { ... as per news: ... }

sub rw_sort (@) {
my %keys;

$keys{$_}
= join ("\000", reverse (split (/\./, $_)))
for @_;
## .
return
sort { $keys{$a} cmp $keys{$b} } @_;
}

Benchmark::timethese (1024 * 1024, {
"is-1" => sub { sort dns_name_cmp (@n1); },
"is-2" => sub { sort dns_name_cmp (@n2); },
"rw-1" => sub { rw_sort (@n1); },
"rw-2" => sub { rw_sort (@n1); }
});
$ perl 3zytr64ehn1aqckj5mk93da6zg.perl
Benchmark: timing 1048576 iterations of is-1, is-2, rw-1, rw-2...
is-1: -1 wallclock secs ( 0.03 usr + 0.00 sys = 0.03 CPU) @ 34952533.33/s (n=1048576)
(warning: too few iterations for a reliable count)
is-2: -1 wallclock secs ( 0.16 usr + 0.00 sys = 0.16 CPU) @ 6553600.00/s (n=1048576)
(warning: too few iterations for a reliable count)
rw-1: 69 wallclock secs (69.14 usr + 0.03 sys = 69.17 CPU) @ 15159.40/s (n=1048576)
rw-2: 69 wallclock secs (68.61 usr + 0.02 sys = 68.63 CPU) @ 15278.68/s (n=1048576)
$

--
FSF associate member #7257 np. Lord of the Rings -- Blind Guardian
 
Reply With Quote
 
Rainer Weikusat
Guest
Posts: n/a
 
      02-28-2013
Ben Morrow <> writes:
> Quoth Rainer Weikusat <>:
>> Rainer Weikusat <> writes:
>> > Bo Lindbergh <> writes:
>> >> Transform each FQDN into something that will sort correctly
>> >> using the default string comparison; sort; reverse the transformation.
>> >>
>> >> @ordered = map {
>> >> join ".", reverse split / /;
>> >> } sort map {
>> >> join " ", reverse split /\./;
>> >> } @unordered;
>> >
>> > This is a neat idea. It could be made slightly more general by using
>> > \000 instead of the space as alternate label separator.

>>
>> A 'better' ('faster') idea which needs even less Perl code would be to
>> build a hash of sort keys and sort the domain name array based on
>> comparing these instead of destroying and recreating the dataset which
>> is supposed to be sorted:

> [...]
>> my %keys;
>>
>> $keys{$_} = join("\000", reverse(split(/\./, $_))) for @in;
>> return sort { $keys{$a} cmp $keys{$b} } @in;

>
> Or just use a standard ST
>
> map $$_[0],
> sort { $$a[1] cmp $$b[1] }
> map [ $_, join "\0", reverse split /\./ ],
> @in;


I tried that. It is/ was slower for the data I tested with than the
algorithm originally posted by Bo Lindbergh.
 
Reply With Quote
 
Rainer Weikusat
Guest
Posts: n/a
 
      02-28-2013
Ivan Shmakov <> writes:
>>>>>> Rainer Weikusat <> writes:

>
> [...]
>
> > A 'better' ('faster') idea which needs even less Perl code would be
> > to build a hash of sort keys and sort the domain name array based on
> > comparing these instead of destroying and recreating the dataset
> > which is supposed to be sorted:

>
> [...]
>
> Unless I be mistaken, sort { } () with the second variant of
> dns_name_cmp I've originally posted is still faster.


[...]

> sub rw_sort (@) {
> my %keys;
>
> $keys{$_}
> = join ("\000", reverse (split (/\./, $_)))
> for @_;
> ## .
> return
> sort { $keys{$a} cmp $keys{$b} } @_;
> }
>
> Benchmark::timethese (1024 * 1024, {
> "is-1" => sub { sort dns_name_cmp (@n1); },
> "is-2" => sub { sort dns_name_cmp (@n2); },
> "rw-1" => sub { rw_sort (@n1); },
> "rw-2" => sub { rw_sort (@n1); }
> });


The obvious issue with this would be that you're adding a gratuitious
subroutine call per iteration for the 3rd and 4th test. But that
doesn't matter that much: Comparing the labels via substring and
stopping at the first mismatch is several orders of magnitude faster
then processing the complete data once per sort and sorting based on
the 'preprocessed' data.
 
Reply With Quote
 
Jürgen Exner
Guest
Posts: n/a
 
      02-28-2013
Ivan Shmakov <> wrote:
[...]
> However, doing a split on every sort iteration doesn't seem all
> that sensible (or does it?)

[...]
> Now, it makes me wonder if there's an easier way to write this
> function...


Standard answer to this whole class of problem is to use a Schwartzian
Transformation: http://en.wikipedia.org/wiki/Schwartzian_transform

jue
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Z-Ordering (Morton ordering) question nbigaouette C Programming 2 11-06-2009 05:26 AM
Making a server on one domain the domain controller of a new domain Limited Wisdom MCSA 7 09-13-2006 02:18 AM



Advertisments
 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57