Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > Generating all ordered substrings of a string

Reply
Thread Tools

Generating all ordered substrings of a string

 
 
girish@it.usyd.edu.au
Guest
Posts: n/a
 
      07-11-2006
Hi,
I want to generate all non-empty substrings of a string of length >=2.
Also,
each substring is to be paired with 'string - substring' part and vice
versa.
Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
'ab'], ['b', 'ac'], ['ac', 'b']] etc.
Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
'bd'],['bd','ac']]

I've tried the following but i cant prevent duplicates and i'm missing
some substrings:

>>> colocn = 'abcd'
>>> k = 4
>>> for i in range(k-1):

for j in range(1,k):
rule1 = [colocn[i:i+j],colocn[:i]+colocn[i+j:]]
rule2 = [colocn[:i]+colocn[i+j:],colocn[i:i+j]]
rules.append(rule1)
rules.append(rule2)
>>> rules

[['a', 'bcd'], ['bcd', 'a'], ['ab', 'cd'], ['cd', 'ab'], ['abc', 'd'],
['d', 'abc'], ['b', 'acd'], ['acd', 'b'], ['bc', 'ad'], ['ad', 'bc'],
['bcd', 'a'], ['a', 'bcd'], ['c', 'abd'], ['abd', 'c'], ['cd', 'ab'],
['ab', 'cd'], ['cd', 'ab'], ['ab', 'cd']]


Any ideas??

TIA,
girish



----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.
 
Reply With Quote
 
 
 
 
Bruno Desthuilliers
Guest
Posts: n/a
 
      07-12-2006
http://www.velocityreviews.com/forums/(E-Mail Removed) a écrit :
> Hi,
> I want to generate all non-empty substrings of a string of length >=2.
> Also,
> each substring is to be paired with 'string - substring' part and vice
> versa.
> Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
> 'ab'], ['b', 'ac'], ['ac', 'b']] etc.
> Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
> 'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
> ['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
> 'bd'],['bd','ac']]
>
> I've tried the following but i cant prevent duplicates and i'm missing
> some substrings:
>
>
>>>>colocn = 'abcd'
>>>>k = 4
>>>>for i in range(k-1):

>
> for j in range(1,k):
> rule1 = [colocn[i:i+j],colocn[:i]+colocn[i+j:]]
> rule2 = [colocn[:i]+colocn[i+j:],colocn[i:i+j]]
> rules.append(rule1)
> rules.append(rule2)
>
>>>>rules

>
> [['a', 'bcd'], ['bcd', 'a'], ['ab', 'cd'], ['cd', 'ab'], ['abc', 'd'],
> ['d', 'abc'], ['b', 'acd'], ['acd', 'b'], ['bc', 'ad'], ['ad', 'bc'],
> ['bcd', 'a'], ['a', 'bcd'], ['c', 'abd'], ['abd', 'c'], ['cd', 'ab'],
> ['ab', 'cd'], ['cd', 'ab'], ['ab', 'cd']]
>
>
> Any ideas??


Algorithmic problem.

First, you need to get all permutations. This is a known algorithm, that
you'll find examples of on the net. Then for each permutation, list
possibles 'pair-splits'.

Here's a (probably sub-optimal, but it's getting late here) possible
implementation in functional style:

def rotate(lst):
yield lst
max = len(lst)
for i in range(1, max):
yield lst[i:] + lst[max-i)]

def permute(lst):
if len(lst) > 2:
for rotated in rotate(lst):
head, tail = rotated[0], rotated[1:]
for permuted in permute(tail):
yield [head] + permuted
elif len(lst) == 2:
yield lst
yield lst[::-1]
else:
yield lst

def splits(lst):
for i in range(1, len(lst)):
yield lst[0:i], lst[i:]

def allsplits(lst):
for permuted in permute(lst):
for pair in splits(permuted):
yield pair

def listsubstrings(thestr):
format = lambda pair: (''.join(pair[0]), ''.join(pair[1]))
return [format(list(pair)) for pair in allsplits(list(thestr))]


print listsubstrings("abcd")






































 
Reply With Quote
 
 
 
 
girish@it.usyd.edu.au
Guest
Posts: n/a
 
      07-12-2006
Quoting Bruno Desthuilliers <(E-Mail Removed)>:

> (E-Mail Removed) a écrit :
> > Hi,
> > I want to generate all non-empty substrings of a string of length
> >=2.
> > Also,
> > each substring is to be paired with 'string - substring' part and vice
> > versa.
> > Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
> > 'ab'], ['b', 'ac'], ['ac', 'b']] etc.
> > Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
> > 'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd',

> 'c'],
> > ['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
> > 'bd'],['bd','ac']]
> >
> > I've tried the following but i cant prevent duplicates and i'm

> missing
> > some substrings:
> >
> >
> >>>>colocn = 'abcd'
> >>>>k = 4
> >>>>for i in range(k-1):

> >
> > for j in range(1,k):
> > rule1 = [colocn[i:i+j],colocn[:i]+colocn[i+j:]]
> > rule2 = [colocn[:i]+colocn[i+j:],colocn[i:i+j]]
> > rules.append(rule1)
> > rules.append(rule2)
> >
> >>>>rules

> >
> > [['a', 'bcd'], ['bcd', 'a'], ['ab', 'cd'], ['cd', 'ab'], ['abc', 'd'],
> > ['d', 'abc'], ['b', 'acd'], ['acd', 'b'], ['bc', 'ad'], ['ad', 'bc'],
> > ['bcd', 'a'], ['a', 'bcd'], ['c', 'abd'], ['abd', 'c'], ['cd', 'ab'],
> > ['ab', 'cd'], ['cd', 'ab'], ['ab', 'cd']]
> >
> >
> > Any ideas??

>
> Algorithmic problem.
>
> First, you need to get all permutations. This is a known algorithm, that
> you'll find examples of on the net. Then for each permutation, list
> possibles 'pair-splits'.
>
> Here's a (probably sub-optimal, but it's getting late here) possible
> implementation in functional style:
>
> def rotate(lst):
> yield lst
> max = len(lst)
> for i in range(1, max):
> yield lst[i:] + lst[max-i)]
>
> def permute(lst):
> if len(lst) > 2:
> for rotated in rotate(lst):
> head, tail = rotated[0], rotated[1:]
> for permuted in permute(tail):
> yield [head] + permuted
> elif len(lst) == 2:
> yield lst
> yield lst[::-1]
> else:
> yield lst
>
> def splits(lst):
> for i in range(1, len(lst)):
> yield lst[0:i], lst[i:]
>
> def allsplits(lst):
> for permuted in permute(lst):
> for pair in splits(permuted):
> yield pair
>
> def listsubstrings(thestr):
> format = lambda pair: (''.join(pair[0]), ''.join(pair[1]))
> return [format(list(pair)) for pair in allsplits(list(thestr))]
>
>
> print listsubstrings("abcd")

Thanks Bruno. I wanted to avoid permutations as it would take more time,
but i guess will have to deal with them now (
Also this one gives each rule twice...i've to search for some double
counting...

>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>





----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.
 
Reply With Quote
 
girish@it.usyd.edu.au
Guest
Posts: n/a
 
      07-12-2006
Quoting Bruno Desthuilliers <(E-Mail Removed)>:

> (E-Mail Removed) a écrit :
> > Hi,
> > I want to generate all non-empty substrings of a string of length
> >=2.
> > Also,
> > each substring is to be paired with 'string - substring' part and vice
> > versa.
> > Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
> > 'ab'], ['b', 'ac'], ['ac', 'b']] etc.
> > Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
> > 'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd',

> 'c'],
> > ['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
> > 'bd'],['bd','ac']]
> >
> > I've tried the following but i cant prevent duplicates and i'm

> missing
> > some substrings:
> >
> >
> >>>>colocn = 'abcd'
> >>>>k = 4
> >>>>for i in range(k-1):

> >
> > for j in range(1,k):
> > rule1 = [colocn[i:i+j],colocn[:i]+colocn[i+j:]]
> > rule2 = [colocn[:i]+colocn[i+j:],colocn[i:i+j]]
> > rules.append(rule1)
> > rules.append(rule2)
> >
> >>>>rules

> >
> > [['a', 'bcd'], ['bcd', 'a'], ['ab', 'cd'], ['cd', 'ab'], ['abc', 'd'],
> > ['d', 'abc'], ['b', 'acd'], ['acd', 'b'], ['bc', 'ad'], ['ad', 'bc'],
> > ['bcd', 'a'], ['a', 'bcd'], ['c', 'abd'], ['abd', 'c'], ['cd', 'ab'],
> > ['ab', 'cd'], ['cd', 'ab'], ['ab', 'cd']]
> >
> >
> > Any ideas??

>
> Algorithmic problem.
>
> First, you need to get all permutations. This is a known algorithm, that
> you'll find examples of on the net. Then for each permutation, list
> possibles 'pair-splits'.
>
> Here's a (probably sub-optimal, but it's getting late here) possible
> implementation in functional style:
>
> def rotate(lst):
> yield lst
> max = len(lst)
> for i in range(1, max):
> yield lst[i:] + lst[max-i)]
>
> def permute(lst):
> if len(lst) > 2:
> for rotated in rotate(lst):
> head, tail = rotated[0], rotated[1:]
> for permuted in permute(tail):
> yield [head] + permuted
> elif len(lst) == 2:
> yield lst
> yield lst[::-1]
> else:
> yield lst
>
> def splits(lst):
> for i in range(1, len(lst)):
> yield lst[0:i], lst[i:]
>
> def allsplits(lst):
> for permuted in permute(lst):
> for pair in splits(permuted):
> yield pair
>
> def listsubstrings(thestr):
> format = lambda pair: (''.join(pair[0]), ''.join(pair[1]))
> return [format(list(pair)) for pair in allsplits(list(thestr))]
>
>
> print listsubstrings("abcd")
>


thanks Bruno....wanted to avoid permute function but i guess i've no no
option (...
also there is some double counting in this one (all rules outputted
twice)...i've to find out where...
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>





----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.
 
Reply With Quote
 
girish@it.usyd.edu.au
Guest
Posts: n/a
 
      07-12-2006
Quoting (E-Mail Removed):

> Quoting Bruno Desthuilliers <(E-Mail Removed)>:
>
> > (E-Mail Removed) a écrit :
> > > Hi,
> > > I want to generate all non-empty substrings of a string of length
> > >=2.
> > > Also,
> > > each substring is to be paired with 'string - substring' part and

> vice
> > > versa.
> > > Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'],

> ['c',
> > > 'ab'], ['b', 'ac'], ['ac', 'b']] etc.
> > > Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'],

> ['abc',
> > > 'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd',

> > 'c'],
> > > ['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
> > > 'bd'],['bd','ac']]
> > >
> > > I've tried the following but i cant prevent duplicates and i'm

> > missing
> > > some substrings:
> > >
> > >
> > >>>>colocn = 'abcd'
> > >>>>k = 4
> > >>>>for i in range(k-1):
> > >
> > > for j in range(1,k):
> > > rule1 = [colocn[i:i+j],colocn[:i]+colocn[i+j:]]
> > > rule2 = [colocn[:i]+colocn[i+j:],colocn[i:i+j]]
> > > rules.append(rule1)
> > > rules.append(rule2)
> > >
> > >>>>rules
> > >
> > > [['a', 'bcd'], ['bcd', 'a'], ['ab', 'cd'], ['cd', 'ab'], ['abc',

> 'd'],
> > > ['d', 'abc'], ['b', 'acd'], ['acd', 'b'], ['bc', 'ad'], ['ad',

> 'bc'],
> > > ['bcd', 'a'], ['a', 'bcd'], ['c', 'abd'], ['abd', 'c'], ['cd',

> 'ab'],
> > > ['ab', 'cd'], ['cd', 'ab'], ['ab', 'cd']]
> > >
> > >
> > > Any ideas??

> >
> > Algorithmic problem.
> >
> > First, you need to get all permutations. This is a known algorithm,

> that
> > you'll find examples of on the net. Then for each permutation, list
> > possibles 'pair-splits'.
> >
> > Here's a (probably sub-optimal, but it's getting late here) possible
> > implementation in functional style:
> >
> > def rotate(lst):
> > yield lst
> > max = len(lst)
> > for i in range(1, max):
> > yield lst[i:] + lst[max-i)]
> >
> > def permute(lst):
> > if len(lst) > 2:
> > for rotated in rotate(lst):
> > head, tail = rotated[0], rotated[1:]
> > for permuted in permute(tail):
> > yield [head] + permuted
> > elif len(lst) == 2:
> > yield lst
> > yield lst[::-1]
> > else:
> > yield lst
> >
> > def splits(lst):
> > for i in range(1, len(lst)):
> > yield lst[0:i], lst[i:]
> >
> > def allsplits(lst):
> > for permuted in permute(lst):
> > for pair in splits(permuted):
> > yield pair
> >
> > def listsubstrings(thestr):
> > format = lambda pair: (''.join(pair[0]), ''.join(pair[1]))
> > return [format(list(pair)) for pair in allsplits(list(thestr))]
> >
> >
> > print listsubstrings("abcd")
> >

>
> thanks Bruno....wanted to avoid permute function but i guess i've no no
> option (...
> also there is some double counting in this one (all rules outputted
> twice)...i've to find out where...

A Recursive Attempt:
def gen(s):
sList = [s[:i]+s[i+1:] for i in range(len(s))]
substringList = sList
s = sList
while len(s[0]) != 1:
substrings = []
for string in s:
sList = [string[:i]+string[i+1:] for i in range(len(string))]
substrings.extend(sList)
s = set(substrings)
s = list(s)
substringList.extend(s)
print substringList
return substringList
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > --
> > http://mail.python.org/mailman/listinfo/python-list
> >

>
>
>
>
> ----------------------------------------------------------------
> This message was sent using IMP, the Internet Messaging Program.
> --
> http://mail.python.org/mailman/listinfo/python-list
>





----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.
 
Reply With Quote
 
Tal Einat
Guest
Posts: n/a
 
      07-12-2006

(E-Mail Removed) wrote:
> Hi,
> I want to generate all non-empty substrings of a string of length >=2.
> Also,
> each substring is to be paired with 'string - substring' part and vice
> versa.
> Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
> 'ab'], ['b', 'ac'], ['ac', 'b']] etc.
> Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
> 'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
> ['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
> 'bd'],['bd','ac']]
>


In your last example you have ['ac','bd'], but neither 'ac' nor 'bd' is
a _substring_ of 'abcd'.

If you want to compute all possible (non-empty) sub-groups of a group
(a group of characters, in your case), that's really quite a common
algorthmic problem and you should be able to Google for a solution.

Once you have all possible subgroups, just make your (weird) pairs,
remove doubles (either by using a set or by sorting and removing
identical neighboring objects), and you're done.

If you're looking for a more efficient solution, specialized for your
specific problem, you'll have to explain more precisely what you're
trying to do, as well as why existing solutions aren't good enough.

- Tal

 
Reply With Quote
 
Thorsten Kampe
Guest
Posts: n/a
 
      07-12-2006
* (E-Mail Removed) (2006-07-11 10:20 +0000)
> I want to generate all non-empty substrings of a string of length >=2.
> Also,
> each substring is to be paired with 'string - substring' part and vice
> versa.
> Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
> 'ab'], ['b', 'ac'], ['ac', 'b']] etc.
> Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
> 'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
> ['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
> 'bd'],['bd','ac']]


No, you don't want to generate all substrings, you want to generate
all partions of a given set with length 2:

filter(lambda x: len(x) == 2, part(['abcd']))

I've written an utility that generates all possible partions of a set;
the "pairing" as you call it, is trivial, so you can do it yourself

def part(seq):
import copy
partset = [[]]
for item in seq:
newpartset = []
for partition in partset:
for index in range(len(partition)):
newpartset.append(copy.deepcopy(partition))
newpartset[-1][index].append(item)
partition.append([item])
newpartset.append(partition)
partset = newpartset
return partset
 
Reply With Quote
 
Gerard Flanagan
Guest
Posts: n/a
 
      07-12-2006

(E-Mail Removed) wrote:
> Hi,
> I want to generate all non-empty substrings of a string of length >=2.
> Also,
> each substring is to be paired with 'string - substring' part and vice
> versa.
> Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
> 'ab'], ['b', 'ac'], ['ac', 'b']] etc.
> Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
> 'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
> ['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
> 'bd'],['bd','ac']]
>


from a previous post
(http://groups.google.com/group/comp....f5b578bb00e61b)

def nkRange(n,k):
m = n - k + 1
indexer = range(0, k)
vector = range(1, k+1)
last = range(m, n+1)
yield vector
while vector != last:
high_value = -1
high_index = -1
for i in indexer:
val = vector[i]
if val > high_value and val < m + i:
high_value = val
high_index = i
for j in range(k - high_index):
vector[j+high_index] = high_value + j + 1
yield vector

def kSubsets(s, k):
for vector in nkRange(len(s),k):
yield ''.join( s[i-1] for i in vector )

print list( kSubsets('abcd',2) )

['ab', 'ac', 'ad', 'bc', 'bd', 'cd']

 
Reply With Quote
 
Thorsten Kampe
Guest
Posts: n/a
 
      07-13-2006
* Thorsten Kampe (2006-07-12 19:11 +0000)
> filter(lambda x: len(x) == 2, part(['abcd']))


That's "filter(lambda x: len(x) == 2, part('abcd'))" of course...
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
How to get all possible substrings Robert Reno C++ 4 01-28-2010 12:46 PM
Ordered list inside ordered list DL Javascript 6 11-21-2009 11:43 PM
newbie help - finding all substrings with index? kadau Perl Misc 3 03-25-2005 01:20 PM
Replacing palindrome substrings of an input string with a given string Tung Chau C Programming 1 08-06-2004 07:27 PM
Replacing palindrome substrings of an input string with a given string. Any effecient algorithm? Tung Chau C Programming 0 08-06-2004 10:18 AM



Advertisments