Velocity Reviews

Velocity Reviews (http://www.velocityreviews.com/forums/index.php)
-   Python (http://www.velocityreviews.com/forums/f43-python.html)
-   -   Generating all ordered substrings of a string (http://www.velocityreviews.com/forums/t359593-generating-all-ordered-substrings-of-a-string.html)

girish@it.usyd.edu.au 07-11-2006 09:20 AM

Generating all ordered substrings of a string
 
Hi,
I want to generate all non-empty substrings of a string of length >=2.
Also,
each substring is to be paired with 'string - substring' part and vice
versa.
Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
'ab'], ['b', 'ac'], ['ac', 'b']] etc.
Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
'bd'],['bd','ac']]

I've tried the following but i cant prevent duplicates and i'm missing
some substrings:

>>> colocn = 'abcd'
>>> k = 4
>>> for i in range(k-1):

for j in range(1,k):
rule1 = [colocn[i:i+j],colocn[:i]+colocn[i+j:]]
rule2 = [colocn[:i]+colocn[i+j:],colocn[i:i+j]]
rules.append(rule1)
rules.append(rule2)
>>> rules

[['a', 'bcd'], ['bcd', 'a'], ['ab', 'cd'], ['cd', 'ab'], ['abc', 'd'],
['d', 'abc'], ['b', 'acd'], ['acd', 'b'], ['bc', 'ad'], ['ad', 'bc'],
['bcd', 'a'], ['a', 'bcd'], ['c', 'abd'], ['abd', 'c'], ['cd', 'ab'],
['ab', 'cd'], ['cd', 'ab'], ['ab', 'cd']]


Any ideas??

TIA,
girish



----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.

Bruno Desthuilliers 07-12-2006 12:43 AM

Re: Generating all ordered substrings of a string
 
girish@it.usyd.edu.au a écrit :
> Hi,
> I want to generate all non-empty substrings of a string of length >=2.
> Also,
> each substring is to be paired with 'string - substring' part and vice
> versa.
> Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
> 'ab'], ['b', 'ac'], ['ac', 'b']] etc.
> Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
> 'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
> ['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
> 'bd'],['bd','ac']]
>
> I've tried the following but i cant prevent duplicates and i'm missing
> some substrings:
>
>
>>>>colocn = 'abcd'
>>>>k = 4
>>>>for i in range(k-1):

>
> for j in range(1,k):
> rule1 = [colocn[i:i+j],colocn[:i]+colocn[i+j:]]
> rule2 = [colocn[:i]+colocn[i+j:],colocn[i:i+j]]
> rules.append(rule1)
> rules.append(rule2)
>
>>>>rules

>
> [['a', 'bcd'], ['bcd', 'a'], ['ab', 'cd'], ['cd', 'ab'], ['abc', 'd'],
> ['d', 'abc'], ['b', 'acd'], ['acd', 'b'], ['bc', 'ad'], ['ad', 'bc'],
> ['bcd', 'a'], ['a', 'bcd'], ['c', 'abd'], ['abd', 'c'], ['cd', 'ab'],
> ['ab', 'cd'], ['cd', 'ab'], ['ab', 'cd']]
>
>
> Any ideas??


Algorithmic problem.

First, you need to get all permutations. This is a known algorithm, that
you'll find examples of on the net. Then for each permutation, list
possibles 'pair-splits'.

Here's a (probably sub-optimal, but it's getting late here) possible
implementation in functional style:

def rotate(lst):
yield lst
max = len(lst)
for i in range(1, max):
yield lst[i:] + lst[:-(max-i)]

def permute(lst):
if len(lst) > 2:
for rotated in rotate(lst):
head, tail = rotated[0], rotated[1:]
for permuted in permute(tail):
yield [head] + permuted
elif len(lst) == 2:
yield lst
yield lst[::-1]
else:
yield lst

def splits(lst):
for i in range(1, len(lst)):
yield lst[0:i], lst[i:]

def allsplits(lst):
for permuted in permute(lst):
for pair in splits(permuted):
yield pair

def listsubstrings(thestr):
format = lambda pair: (''.join(pair[0]), ''.join(pair[1]))
return [format(list(pair)) for pair in allsplits(list(thestr))]


print listsubstrings("abcd")







































girish@it.usyd.edu.au 07-12-2006 01:44 AM

Re: Generating all ordered substrings of a string
 
Quoting Bruno Desthuilliers <bdesth.quelquechose@free.quelquepart.fr>:

> girish@it.usyd.edu.au a écrit :
> > Hi,
> > I want to generate all non-empty substrings of a string of length
> >=2.
> > Also,
> > each substring is to be paired with 'string - substring' part and vice
> > versa.
> > Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
> > 'ab'], ['b', 'ac'], ['ac', 'b']] etc.
> > Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
> > 'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd',

> 'c'],
> > ['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
> > 'bd'],['bd','ac']]
> >
> > I've tried the following but i cant prevent duplicates and i'm

> missing
> > some substrings:
> >
> >
> >>>>colocn = 'abcd'
> >>>>k = 4
> >>>>for i in range(k-1):

> >
> > for j in range(1,k):
> > rule1 = [colocn[i:i+j],colocn[:i]+colocn[i+j:]]
> > rule2 = [colocn[:i]+colocn[i+j:],colocn[i:i+j]]
> > rules.append(rule1)
> > rules.append(rule2)
> >
> >>>>rules

> >
> > [['a', 'bcd'], ['bcd', 'a'], ['ab', 'cd'], ['cd', 'ab'], ['abc', 'd'],
> > ['d', 'abc'], ['b', 'acd'], ['acd', 'b'], ['bc', 'ad'], ['ad', 'bc'],
> > ['bcd', 'a'], ['a', 'bcd'], ['c', 'abd'], ['abd', 'c'], ['cd', 'ab'],
> > ['ab', 'cd'], ['cd', 'ab'], ['ab', 'cd']]
> >
> >
> > Any ideas??

>
> Algorithmic problem.
>
> First, you need to get all permutations. This is a known algorithm, that
> you'll find examples of on the net. Then for each permutation, list
> possibles 'pair-splits'.
>
> Here's a (probably sub-optimal, but it's getting late here) possible
> implementation in functional style:
>
> def rotate(lst):
> yield lst
> max = len(lst)
> for i in range(1, max):
> yield lst[i:] + lst[:-(max-i)]
>
> def permute(lst):
> if len(lst) > 2:
> for rotated in rotate(lst):
> head, tail = rotated[0], rotated[1:]
> for permuted in permute(tail):
> yield [head] + permuted
> elif len(lst) == 2:
> yield lst
> yield lst[::-1]
> else:
> yield lst
>
> def splits(lst):
> for i in range(1, len(lst)):
> yield lst[0:i], lst[i:]
>
> def allsplits(lst):
> for permuted in permute(lst):
> for pair in splits(permuted):
> yield pair
>
> def listsubstrings(thestr):
> format = lambda pair: (''.join(pair[0]), ''.join(pair[1]))
> return [format(list(pair)) for pair in allsplits(list(thestr))]
>
>
> print listsubstrings("abcd")

Thanks Bruno. I wanted to avoid permutations as it would take more time,
but i guess will have to deal with them now :((
Also this one gives each rule twice...i've to search for some double
counting...

>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>





----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.

girish@it.usyd.edu.au 07-12-2006 01:46 AM

Re: How to generate all substrings of a string
 
Quoting Bruno Desthuilliers <bdesth.quelquechose@free.quelquepart.fr>:

> girish@it.usyd.edu.au a écrit :
> > Hi,
> > I want to generate all non-empty substrings of a string of length
> >=2.
> > Also,
> > each substring is to be paired with 'string - substring' part and vice
> > versa.
> > Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
> > 'ab'], ['b', 'ac'], ['ac', 'b']] etc.
> > Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
> > 'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd',

> 'c'],
> > ['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
> > 'bd'],['bd','ac']]
> >
> > I've tried the following but i cant prevent duplicates and i'm

> missing
> > some substrings:
> >
> >
> >>>>colocn = 'abcd'
> >>>>k = 4
> >>>>for i in range(k-1):

> >
> > for j in range(1,k):
> > rule1 = [colocn[i:i+j],colocn[:i]+colocn[i+j:]]
> > rule2 = [colocn[:i]+colocn[i+j:],colocn[i:i+j]]
> > rules.append(rule1)
> > rules.append(rule2)
> >
> >>>>rules

> >
> > [['a', 'bcd'], ['bcd', 'a'], ['ab', 'cd'], ['cd', 'ab'], ['abc', 'd'],
> > ['d', 'abc'], ['b', 'acd'], ['acd', 'b'], ['bc', 'ad'], ['ad', 'bc'],
> > ['bcd', 'a'], ['a', 'bcd'], ['c', 'abd'], ['abd', 'c'], ['cd', 'ab'],
> > ['ab', 'cd'], ['cd', 'ab'], ['ab', 'cd']]
> >
> >
> > Any ideas??

>
> Algorithmic problem.
>
> First, you need to get all permutations. This is a known algorithm, that
> you'll find examples of on the net. Then for each permutation, list
> possibles 'pair-splits'.
>
> Here's a (probably sub-optimal, but it's getting late here) possible
> implementation in functional style:
>
> def rotate(lst):
> yield lst
> max = len(lst)
> for i in range(1, max):
> yield lst[i:] + lst[:-(max-i)]
>
> def permute(lst):
> if len(lst) > 2:
> for rotated in rotate(lst):
> head, tail = rotated[0], rotated[1:]
> for permuted in permute(tail):
> yield [head] + permuted
> elif len(lst) == 2:
> yield lst
> yield lst[::-1]
> else:
> yield lst
>
> def splits(lst):
> for i in range(1, len(lst)):
> yield lst[0:i], lst[i:]
>
> def allsplits(lst):
> for permuted in permute(lst):
> for pair in splits(permuted):
> yield pair
>
> def listsubstrings(thestr):
> format = lambda pair: (''.join(pair[0]), ''.join(pair[1]))
> return [format(list(pair)) for pair in allsplits(list(thestr))]
>
>
> print listsubstrings("abcd")
>


thanks Bruno....wanted to avoid permute function but i guess i've no no
option :((...
also there is some double counting in this one (all rules outputted
twice)...i've to find out where...
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>





----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.

girish@it.usyd.edu.au 07-12-2006 02:44 AM

Re: How to generate all substrings of a string
 
Quoting girish@it.usyd.edu.au:

> Quoting Bruno Desthuilliers <bdesth.quelquechose@free.quelquepart.fr>:
>
> > girish@it.usyd.edu.au a écrit :
> > > Hi,
> > > I want to generate all non-empty substrings of a string of length
> > >=2.
> > > Also,
> > > each substring is to be paired with 'string - substring' part and

> vice
> > > versa.
> > > Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'],

> ['c',
> > > 'ab'], ['b', 'ac'], ['ac', 'b']] etc.
> > > Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'],

> ['abc',
> > > 'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd',

> > 'c'],
> > > ['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
> > > 'bd'],['bd','ac']]
> > >
> > > I've tried the following but i cant prevent duplicates and i'm

> > missing
> > > some substrings:
> > >
> > >
> > >>>>colocn = 'abcd'
> > >>>>k = 4
> > >>>>for i in range(k-1):
> > >
> > > for j in range(1,k):
> > > rule1 = [colocn[i:i+j],colocn[:i]+colocn[i+j:]]
> > > rule2 = [colocn[:i]+colocn[i+j:],colocn[i:i+j]]
> > > rules.append(rule1)
> > > rules.append(rule2)
> > >
> > >>>>rules
> > >
> > > [['a', 'bcd'], ['bcd', 'a'], ['ab', 'cd'], ['cd', 'ab'], ['abc',

> 'd'],
> > > ['d', 'abc'], ['b', 'acd'], ['acd', 'b'], ['bc', 'ad'], ['ad',

> 'bc'],
> > > ['bcd', 'a'], ['a', 'bcd'], ['c', 'abd'], ['abd', 'c'], ['cd',

> 'ab'],
> > > ['ab', 'cd'], ['cd', 'ab'], ['ab', 'cd']]
> > >
> > >
> > > Any ideas??

> >
> > Algorithmic problem.
> >
> > First, you need to get all permutations. This is a known algorithm,

> that
> > you'll find examples of on the net. Then for each permutation, list
> > possibles 'pair-splits'.
> >
> > Here's a (probably sub-optimal, but it's getting late here) possible
> > implementation in functional style:
> >
> > def rotate(lst):
> > yield lst
> > max = len(lst)
> > for i in range(1, max):
> > yield lst[i:] + lst[:-(max-i)]
> >
> > def permute(lst):
> > if len(lst) > 2:
> > for rotated in rotate(lst):
> > head, tail = rotated[0], rotated[1:]
> > for permuted in permute(tail):
> > yield [head] + permuted
> > elif len(lst) == 2:
> > yield lst
> > yield lst[::-1]
> > else:
> > yield lst
> >
> > def splits(lst):
> > for i in range(1, len(lst)):
> > yield lst[0:i], lst[i:]
> >
> > def allsplits(lst):
> > for permuted in permute(lst):
> > for pair in splits(permuted):
> > yield pair
> >
> > def listsubstrings(thestr):
> > format = lambda pair: (''.join(pair[0]), ''.join(pair[1]))
> > return [format(list(pair)) for pair in allsplits(list(thestr))]
> >
> >
> > print listsubstrings("abcd")
> >

>
> thanks Bruno....wanted to avoid permute function but i guess i've no no
> option :((...
> also there is some double counting in this one (all rules outputted
> twice)...i've to find out where...

A Recursive Attempt:
def gen(s):
sList = [s[:i]+s[i+1:] for i in range(len(s))]
substringList = sList
s = sList
while len(s[0]) != 1:
substrings = []
for string in s:
sList = [string[:i]+string[i+1:] for i in range(len(string))]
substrings.extend(sList)
s = set(substrings)
s = list(s)
substringList.extend(s)
print substringList
return substringList
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > --
> > http://mail.python.org/mailman/listinfo/python-list
> >

>
>
>
>
> ----------------------------------------------------------------
> This message was sent using IMP, the Internet Messaging Program.
> --
> http://mail.python.org/mailman/listinfo/python-list
>





----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.

Tal Einat 07-12-2006 11:23 AM

Re: Generating all ordered substrings of a string
 

girish@it.usyd.edu.au wrote:
> Hi,
> I want to generate all non-empty substrings of a string of length >=2.
> Also,
> each substring is to be paired with 'string - substring' part and vice
> versa.
> Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
> 'ab'], ['b', 'ac'], ['ac', 'b']] etc.
> Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
> 'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
> ['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
> 'bd'],['bd','ac']]
>


In your last example you have ['ac','bd'], but neither 'ac' nor 'bd' is
a _substring_ of 'abcd'.

If you want to compute all possible (non-empty) sub-groups of a group
(a group of characters, in your case), that's really quite a common
algorthmic problem and you should be able to Google for a solution.

Once you have all possible subgroups, just make your (weird) pairs,
remove doubles (either by using a set or by sorting and removing
identical neighboring objects), and you're done.

If you're looking for a more efficient solution, specialized for your
specific problem, you'll have to explain more precisely what you're
trying to do, as well as why existing solutions aren't good enough.

- Tal


Thorsten Kampe 07-12-2006 06:11 PM

Re: Generating all ordered substrings of a string
 
* girish@it.usyd.edu.au (2006-07-11 10:20 +0000)
> I want to generate all non-empty substrings of a string of length >=2.
> Also,
> each substring is to be paired with 'string - substring' part and vice
> versa.
> Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
> 'ab'], ['b', 'ac'], ['ac', 'b']] etc.
> Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
> 'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
> ['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
> 'bd'],['bd','ac']]


No, you don't want to generate all substrings, you want to generate
all partions of a given set with length 2:

filter(lambda x: len(x) == 2, part(['abcd']))

I've written an utility that generates all possible partions of a set;
the "pairing" as you call it, is trivial, so you can do it yourself

def part(seq):
import copy
partset = [[]]
for item in seq:
newpartset = []
for partition in partset:
for index in range(len(partition)):
newpartset.append(copy.deepcopy(partition))
newpartset[-1][index].append(item)
partition.append([item])
newpartset.append(partition)
partset = newpartset
return partset

Gerard Flanagan 07-12-2006 08:30 PM

Re: Generating all ordered substrings of a string
 

girish@it.usyd.edu.au wrote:
> Hi,
> I want to generate all non-empty substrings of a string of length >=2.
> Also,
> each substring is to be paired with 'string - substring' part and vice
> versa.
> Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
> 'ab'], ['b', 'ac'], ['ac', 'b']] etc.
> Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
> 'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
> ['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
> 'bd'],['bd','ac']]
>


from a previous post
(http://groups.google.com/group/comp....f5b578bb00e61b)

def nkRange(n,k):
m = n - k + 1
indexer = range(0, k)
vector = range(1, k+1)
last = range(m, n+1)
yield vector
while vector != last:
high_value = -1
high_index = -1
for i in indexer:
val = vector[i]
if val > high_value and val < m + i:
high_value = val
high_index = i
for j in range(k - high_index):
vector[j+high_index] = high_value + j + 1
yield vector

def kSubsets(s, k):
for vector in nkRange(len(s),k):
yield ''.join( s[i-1] for i in vector )

print list( kSubsets('abcd',2) )

['ab', 'ac', 'ad', 'bc', 'bd', 'cd']


Thorsten Kampe 07-13-2006 07:41 AM

Re: Generating all ordered substrings of a string
 
* Thorsten Kampe (2006-07-12 19:11 +0000)
> filter(lambda x: len(x) == 2, part(['abcd']))


That's "filter(lambda x: len(x) == 2, part('abcd'))" of course...


All times are GMT. The time now is 10:09 PM.

Powered by vBulletin®. Copyright ©2000 - 2014, vBulletin Solutions, Inc.
SEO by vBSEO ©2010, Crawlability, Inc.