Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > nested strings

Reply
Thread Tools

nested strings

 
 
sausenet@gmail.com
Guest
Posts: n/a
 
      08-05-2008
Could anyone suggest a nice way of nesting strings, as follows?

Given: a list of words sorted first in length order, then
alphabetically:
FORMATION
FORMATIVE
FORMATTER
DEFORMATION
DEFORMATIVE
FORMATIVELY
INFORMATICS
INFORMATION
INFORMATIVE
INFORMATORY
CONFORMATION
MALFORMATION
DEFORMATIONAL
INFORMATIONAL
INFORMATIVELY
UNINFORMATIVE

Desired output: the same set of words showing nested substrings
FORMATION [DEFORMATION [DEFORMATIONAL] INFORMATION [INFORMATIONAL]
CONFORMATION MALFORMATION]
FORMATIVE [DEFORMATIVE FORMATIVELY [INFORMATIVELY] INFORMATIVE
PERFORMATIVE UNINFORMATIVE]
FORMATTER
INFORMATICS
INFORMATORY

Note as to form: they'll probably start as a concatenation of the
strings separated by blanks.

Note as to ambiguous case: yes, if the starting list has
AAAB
CAAA
CAAAB
then CAAAB could be deemed a child of AAAB or CAAA or both. Any of
these ways of treating CAAAB would be fine.

As a compromise, sacrificing the sub-nesting feature would be fine.
I.e.,
AAAB
AAABC
AAABCC
becomes
AAAB [AAABC AAABCC]

Thanks for any help or pointers.

Steven Alexander
 
Reply With Quote
 
 
 
 
Ben Morrow
Guest
Posts: n/a
 
      08-05-2008

Quoth http://www.velocityreviews.com/forums/(E-Mail Removed):
> Could anyone suggest a nice way of nesting strings, as follows?
>
> Given: a list of words sorted first in length order, then
> alphabetically:
> FORMATION
> FORMATIVE
> FORMATTER
> DEFORMATION
> DEFORMATIVE
> FORMATIVELY
> INFORMATICS
> INFORMATION
> INFORMATIVE
> INFORMATORY
> CONFORMATION
> MALFORMATION
> DEFORMATIONAL
> INFORMATIONAL
> INFORMATIVELY
> UNINFORMATIVE
>
> Desired output: the same set of words showing nested substrings
> FORMATION [DEFORMATION [DEFORMATIONAL] INFORMATION [INFORMATIONAL]
> CONFORMATION MALFORMATION]
> FORMATIVE [DEFORMATIVE FORMATIVELY [INFORMATIVELY] INFORMATIVE
> PERFORMATIVE UNINFORMATIVE]
> FORMATTER
> INFORMATICS
> INFORMATORY
>
> Note as to form: they'll probably start as a concatenation of the
> strings separated by blanks.


I would start by maintaining a hash mapping strings to arrayrefs of
'children', using logic something like

use List::Util qw/first/;

if (my $parent = first { $nextword =~ /\Q$_/ } keys %children) {
$children{$first}{$nextword} = undef;
}
elsif (my $child = first { /\Q$nextword/ } keys %children) {
$children{$nextword}{$child} = $children{$child};
}

I haven't tested that, and I don't think it's right, but the idea is to
end up with a structure like

my %children;
%children = (
FORMATION => {
DEFORMATION => $children{DEFORMATION},
INFORMATION => $children{INFORMATION},
},
DEFORMATION => { DEFORMATIONAL => undef },
INFORMATION => { INFORMATIONAL => undef },
);

....nah, that isn't right. I think you need two hashes: one flat, with an
entry for each word, mapping word to parent (if any)

my %parents = (
FORMATION => undef,
DEFORMATION => 'FORMATION',
INFORMATION => 'FORMATION',
DEFORMATIONAL => 'DEFORMATION',
INFORMATIONAL => 'INFORMATION',
);

and one structured as a tree

my %children = (
FORMATION => {
DEFORMATION => {
DEFORMATIONAL => {},
},
INFORMATION => {
INFORMATIONAL => {},
},
},
);

You can then see if a word matches something in the %parents hash; if it
does, walk up the chain until you find the top, and then walk down the
%children hash until you find the point where it fits. For instance, if
you'd sorted 'FORMATION' and 'DEFORMATIONAL' and you read 'DEFORMATION'
you would

* Find that 'FORMATION' was a substring of 'DEFORMATION', and since
'FORMATION' has no parent, start there in %children;

* Find that 'DEFORMATIONAL', one of the children of 'FORMATION', has
a substring 'DEFORMATION', so 'DEFORMATIONAL' (and all it's
children) become a child of 'DEFORMATION', which becomes a child
of 'DEFORMATIONAL's (former) parent.

Try coding something like that, and post if you get stuck.

Ben

--
Raise your hand if you're invulnerable.
[(E-Mail Removed)]
 
Reply With Quote
 
 
 
 
Ted Zlatanov
Guest
Posts: n/a
 
      08-06-2008
On Tue, 5 Aug 2008 14:51:27 -0700 (PDT) (E-Mail Removed) wrote:

s> Given: a list of words sorted first in length order, then
s> alphabetically:
s> FORMATION
s> FORMATIVE
s> FORMATTER
s> DEFORMATION
s> DEFORMATIVE
s> FORMATIVELY
s> INFORMATICS
s> INFORMATION
s> INFORMATIVE
s> INFORMATORY
s> CONFORMATION
s> MALFORMATION
s> DEFORMATIONAL
s> INFORMATIONAL
s> INFORMATIVELY
s> UNINFORMATIVE

s> Desired output: the same set of words showing nested substrings
s> FORMATION [DEFORMATION [DEFORMATIONAL] INFORMATION [INFORMATIONAL]
s> CONFORMATION MALFORMATION]
s> FORMATIVE [DEFORMATIVE FORMATIVELY [INFORMATIVELY] INFORMATIVE
s> PERFORMATIVE UNINFORMATIVE]
s> FORMATTER
s> INFORMATICS
s> INFORMATORY

You probably want a trie.

http://en.wikipedia.org/wiki/Trie

http://search.cpan.org/~avif/Tree-Trie-1.5/Trie.pm

Ted
 
Reply With Quote
 
Peter J. Holzer
Guest
Posts: n/a
 
      08-07-2008
On 2008-08-05 23:42, Ben Morrow <(E-Mail Removed)> wrote:
> Quoth (E-Mail Removed):
>> Could anyone suggest a nice way of nesting strings, as follows?
>>
>> Given: a list of words sorted first in length order, then
>> alphabetically:

[...]
>>
>> Desired output: the same set of words showing nested substrings
>> FORMATION [DEFORMATION [DEFORMATIONAL] INFORMATION [INFORMATIONAL]
>> CONFORMATION MALFORMATION]
>> FORMATIVE [DEFORMATIVE FORMATIVELY [INFORMATIVELY] INFORMATIVE
>> PERFORMATIVE UNINFORMATIVE]
>> FORMATTER
>> INFORMATICS
>> INFORMATORY

[...]
> I think you need two hashes: one flat, with an entry for each word,
> mapping word to parent (if any)
>
> my %parents = (
> FORMATION => undef,
> DEFORMATION => 'FORMATION',
> INFORMATION => 'FORMATION',
> DEFORMATIONAL => 'DEFORMATION',
> INFORMATIONAL => 'INFORMATION',
> );
>
> and one structured as a tree
>
> my %children = (
> FORMATION => {
> DEFORMATION => {
> DEFORMATIONAL => {},
> },
> INFORMATION => {
> INFORMATIONAL => {},
> },
> },
> );


You don't need the %parents hash. The %children hash together with a
recursive function is sufficient:


#!/usr/bin/perl
use warnings;
use strict;
use Data:umper;

my $tree = {};

while (<DATA>) {
chomp;
addword($tree, $_);
}
print Dumper $tree;
exit(0);

sub addword {
my ($tree, $word) = @_;
for (keys %$tree) {
if (index($word, $_) >= 0) {
return addword($tree->{$_}, $word);
}
}
$tree->{$word} = {};
}

__DATA__
FORMATION
FORMATIVE
FORMATTER
DEFORMATION
DEFORMATIVE
FORMATIVELY
INFORMATICS
INFORMATION
INFORMATIVE
INFORMATORY
CONFORMATION
MALFORMATION
DEFORMATIONAL
INFORMATIONAL
INFORMATIVELY
UNINFORMATIVE

Printing the tree structure in the format desired by the OP is left as
an exercise to the reader .

hp
 
Reply With Quote
 
Peter J. Holzer
Guest
Posts: n/a
 
      08-07-2008
On 2008-08-06 16:23, Ted Zlatanov <(E-Mail Removed)> wrote:
> On Tue, 5 Aug 2008 14:51:27 -0700 (PDT) (E-Mail Removed) wrote:
> s> Given: a list of words sorted first in length order, then
> s> alphabetically:

[...]
> s> Desired output: the same set of words showing nested substrings
> s> FORMATION [DEFORMATION [DEFORMATIONAL] INFORMATION [INFORMATIONAL]
> s> CONFORMATION MALFORMATION]
> s> FORMATIVE [DEFORMATIVE FORMATIVELY [INFORMATIVELY] INFORMATIVE
> s> PERFORMATIVE UNINFORMATIVE]
> s> FORMATTER
> s> INFORMATICS
> s> INFORMATORY
>
> You probably want a trie.


Isn't a trie defined as a tree where the key of the parent node is a
prefix of the keys of its children? Here it's an infix.

hp
 
Reply With Quote
 
Ted Zlatanov
Guest
Posts: n/a
 
      08-07-2008
On Thu, 7 Aug 2008 11:54:23 +0200 "Peter J. Holzer" <(E-Mail Removed)> wrote:

PJH> On 2008-08-06 16:23, Ted Zlatanov <(E-Mail Removed)> wrote:
>> On Tue, 5 Aug 2008 14:51:27 -0700 (PDT) (E-Mail Removed) wrote:

s> Given: a list of words sorted first in length order, then
s> alphabetically:
PJH> [...]
s> Desired output: the same set of words showing nested substrings
s> FORMATION [DEFORMATION [DEFORMATIONAL] INFORMATION [INFORMATIONAL]
s> CONFORMATION MALFORMATION]
s> FORMATIVE [DEFORMATIVE FORMATIVELY [INFORMATIVELY] INFORMATIVE
s> PERFORMATIVE UNINFORMATIVE]
s> FORMATTER
s> INFORMATICS
s> INFORMATORY
>>
>> You probably want a trie.


PJH> Isn't a trie defined as a tree where the key of the parent node is a
PJH> prefix of the keys of its children? Here it's an infix.

The OP said it was OK to have just prefix matching, though infix
matching would be nice.

You can also just iterate over the substrings of the target, e.g. for
INFORMATION you check for I-N-F-O-R-M-A-T-I-O-N and then
N-F-O-R-M-A-T-I-O-N and then F-O-R-M-A-T-I-O-N, etc. in the trie.

Ted
 
Reply With Quote
 
Peter J. Holzer
Guest
Posts: n/a
 
      08-07-2008
On 2008-08-07 12:59, Ted Zlatanov <(E-Mail Removed)> wrote:
> On Thu, 7 Aug 2008 11:54:23 +0200 "Peter J. Holzer" <(E-Mail Removed)> wrote:
>
> PJH> On 2008-08-06 16:23, Ted Zlatanov <(E-Mail Removed)> wrote:
>>> On Tue, 5 Aug 2008 14:51:27 -0700 (PDT) (E-Mail Removed) wrote:

> s> Given: a list of words sorted first in length order, then
> s> alphabetically:
> PJH> [...]
> s> Desired output: the same set of words showing nested substrings
> s> FORMATION [DEFORMATION [DEFORMATIONAL] INFORMATION [INFORMATIONAL]

[...]
>>>
>>> You probably want a trie.

>
> PJH> Isn't a trie defined as a tree where the key of the parent node is a
> PJH> prefix of the keys of its children? Here it's an infix.
>
> The OP said it was OK to have just prefix matching, though infix
> matching would be nice.


You mean "as a compromise, sacrificing the subnesting would be fine"? I
understood that to mean that flattening the tree to just two levels
would be acceptable, not that prefix matching would be ok.


> You can also just iterate over the substrings of the target, e.g. for
> INFORMATION you check for I-N-F-O-R-M-A-T-I-O-N and then
> N-F-O-R-M-A-T-I-O-N and then F-O-R-M-A-T-I-O-N, etc. in the trie.


It would be quite inefficient, though, since you have to search the trie
for a lot of substrings. I don't see a reason to do that, given that the
correct solution is so easy (see my answer to Ben).

hp
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
dealing with nested xml within nested xml within...... Ultrus Python 3 07-09-2007 09:00 PM
Is nested class automatically friend of class that it is nested in? request@no_spam.com C++ 5 09-25-2006 08:31 AM
Strings, Strings and Damned Strings Ben C Programming 14 06-24-2006 05:09 AM
Nested Vector Nester Classes are Nested in my Brain Chad E. Dollins C++ 3 11-08-2005 04:46 AM
Nested iterators (well, not nested exactly...) Russ Perry Jr Java 2 08-20-2004 06:51 PM



Advertisments