Velocity Reviews > finding sublist

# finding sublist

giampiero mu
Guest
Posts: n/a

 08-02-2005
hi everyone

my target is implement a function controlla(string - a binary string-)
that check if there is in a string two equal not overlapping
subsequences at least of length limitem:

my code:

def controlla(test):
# print test
limitem=4
lunghezz=len(test)

l1=lunghezz-limitem+1
l2=lunghezz-limitem+1
f=0

for ii in range(l1):
for kk in range(l2):
t1=test[ii:ii+limitem]
t2=test[kk:kk+limitem]
if abs(ii-kk)>=limitem :
if t1==t2 :
f=1
if f==1 :
break
break
break
if f==0 :
return test
else:
return 'no'

the question is: Is there a faster way doing that? I don't know,
changing string into list or array, using map, or module, using
different loop, regular expression,funcional programming , list
comprehensions , sets, different looping techniques, i dont
know.......(!)

Kent Johnson
Guest
Posts: n/a

 08-02-2005
giampiero mu wrote:
> hi everyone
>
> my target is implement a function controlla(string - a binary string-)
> that check if there is in a string two equal not overlapping
> subsequences at least of length limitem:

You can do this with a regular expression:

>>> import re
>>> r=re.compile(r'(?P<seq>.{4,}).*(?P=seq)')
>>> r.match('abcaoeaeoudabc')
>>> r.match('abcdaeaeuabcd')

<_sre.SRE_Match object at 0x008D67E0>
>>> _.group(1)

'abcd'
>>> r.match('abcdefgaeaeuabcdefg')

<_sre.SRE_Match object at 0x008D68E0>
>>> _.group(1)

'abcdefg'

Kent

>
> my code:
>
> def controlla(test):
> # print test
> limitem=4
> lunghezz=len(test)
>
> l1=lunghezz-limitem+1
> l2=lunghezz-limitem+1
> f=0
>
> for ii in range(l1):
> for kk in range(l2):
> t1=test[ii:ii+limitem]
> t2=test[kk:kk+limitem]
> if abs(ii-kk)>=limitem :
> if t1==t2 :
> f=1
> if f==1 :
> break
> break
> break
> if f==0 :
> return test
> else:
> return 'no'
>
>
> the question is: Is there a faster way doing that? I don't know,
> changing string into list or array, using map, or module, using
> different loop, regular expression,funcional programming , list
> comprehensions , sets, different looping techniques, i dont
> know.......(!)
>

Guest
Posts: n/a

 08-02-2005
giampiero mu wrote:
> hi everyone

Hi, you appear to be fairly new to Python, so you might want to take a
look at the tutorial at Python.org

Kents suggestion to use RE is good.

This should be faster than your example by quite a bit if you don't want
to use RE.

def controlla(test, size=4):
# Return substring if a second substring is found.
# Return None if no match found.

for i in range(len(test)-size):
match=test[i:i+size]
left=test[:i]
right=test[i+size:]
if match in left or match in right:
return match

print controlla('qqqabcdrrabcd')

--> 'abcd'

Here's a few notes on your example for future reference:

Multiple breaks don't work. The first break will jump out of the loop
before the other breaks are reached.

Any function that ends without a return will return None. So you don't
need to return 'no'.

Cheers,
Ron

Johan Lindberg
Guest
Posts: n/a

 08-02-2005
Hello.

> my target is implement a function controlla(string - a binary string-)
> that check if there is in a string two equal not overlapping
> subsequences at least of length limitem:
>
> my code:
> [snip]
>

I may have misunderstood it, but does your function work the way you
want it to?

>>>controlla("testeststest")

no

I can't get it to print anything other than "no". But then again, I'm
reading and posting via Google and I guess all those break statements
shouldn't be on the same indent-level.

> the question is: Is there a faster way doing that? I don't know,
> changing string into list or array, using map, or module, using
> different loop, regular expression,funcional programming , list
> comprehensions , sets, different looping techniques, i dont
> know.......(!)

Since you're using nested for loops when searching the string you
should make sure not to perform more iterations than neccessary. The
function below returns a list of all, non-overlapping, substrings in
text where len(substring)>= minLength. The outer loop is limited to
about half of the length of the text which is where most of the speed
comes from but I'm sure it can be tweaked for more.

def foo(text, minLength):
result= []
for length in range(minLength, len(text)/ 2+ 1):
for start in range(len(text)):
end= start+ length
if end< len(text):
part= text[start:end]
if text.find(part, end)!= -1:
if part not in result:
result.append(part)

return result

>>>foo("testeststest", 4)

['test', 'stes', 'stest']

>>>foo("testeststest", 3)

['tes', 'est', 'ste', 'test', 'stes', 'stest']

HTH
Johan Lindberg
http://www.velocityreviews.com/forums/(E-Mail Removed)

giampiero mu
Guest
Posts: n/a

 08-02-2005
controlla("12345678") -> "12345678"

thanks everyone. only a question. there is a way to advantage of binary
sequences?

Johan Lindberg
Guest
Posts: n/a

 08-03-2005
> thanks everyone. only a question. there is a way to advantage of binary
> sequences?

I doubt you'll find any way to optimize the code that somehow only
applies to binary sequences. You still have to find each possible
subsequence of minimum length within the sequence and compare it to all
other possible subsequences and that's what's going to take most of the
time.

If you haven't already, check out psyco
(http://psyco.sourceforge.net/). It will most definitely make your code
run faster.

BR
Johan Lindberg
(E-Mail Removed)