Re: create lowercase strings in lists - was: (No subject)

 
 
Mark Devine
 
      12-16-2004
Actually, what I want is for the element 'class-map match-all cmap1' from list 1 to match 'class-map cmap1 (match-all)' or 'class-map cmap1 mark match-all done' in list 2, but not to match 'class-map cmap1'.
Each element in both lists contains multiple words. If all the words of any element of the first list appear, in any order, within any element of the second list, I want a match; if any of the words are missing, there is no match. There are far more elements in list 2 than in list 1.




Steve Holden <(E-Mail Removed)> wrote:

>
> Mark Devine wrote:
>
> > Sorry for not putting a subject in the last e-mail. The function lower suited my case exactly. Here however is my main problem:
> > Given that my new list is :
> > ['class-map match-all cmap1', 'match ip any', 'class-map match-any cmap2', 'match any', 'policy-map policy1', 'class cmap1', 'policy-map policy2', 'service-policy policy1', 'class cmap2']
> >
> > Each element in my new list could appear in any order together within another larger list (list1) and I want to count how many matches occur. For example the larger list could have an element 'class-map cmap2 (match any)' and I want to match that but if only 'class-map match-any' or 'class-map cmap2' appears I don't want it to match.
> >
> > Can anybody help?
> > Is my problem clearly stated?
> >

>
> Well, let's see: you'd like to know which strings occur in both lists,
> right?
>
> You might like to look at the "Efficient grep using Python?" thread for
> suggestions. My favorite would be:
>
> .>>> lst1 = ["ab", "ac", "ba", "bb", "bc"]
> .>>> lst2 = ["ac", "ab", "bd", "cb", "bb"]
> .>>> dct1 = dict.fromkeys(lst1)
> .>>> [x for x in lst2 if x not in dct1]
> ['bd', 'cb']
> .>>> [x for x in lst2 if x in dct1]
> ['ac', 'ab', 'bb']
>
> regards
> Steve
> --
> Steve Holden http://www.holdenweb.com/
> Python Web Programming http://pydish.holdenweb.com/
> Holden Web LLC +1 703 861 4237 +1 800 494 3119
> --
> http://mail.python.org/mailman/listinfo/python-list
>




 
 
 
 
Steve Holden
 
      12-16-2004
Mark Devine wrote:

> Actually what I want is element 'class-map match-all cmap1' from list 1 to match 'class-map cmap1 (match-all)' or 'class-map cmap1 mark match-all done' in list 2 but not to match 'class-map cmap1'.
> Each element in both lists have multiple words in them. If all the words of any element of the first list appear in any order within any element of the second list I want a match but if any of the words are missing then there is no match. There are far more elements in list 2 than in list 1.
>

Well since that's the case it would seem you'd be best processing each
item from the large list against the small list, though in truth it may
not make any difference.

It looks like the best way to proceed might be to reduce each line to a
canonical form -- strip the parens and other irrelevant characters out,
and sort the words in order. After that it'd be relatively simple to
determine whether two lines match - they'd be the same!

The only slight wrinkle would be keeping the original lines for
reference, but that's not difficult.

Does this give you enough of an idea, or do you need code samples?
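For readers who do want a sample, here is a minimal sketch of the canonical-form idea in Python 3. The function name `canonical`, the `originals` dictionary, and the choice of characters to strip are assumptions made for illustration, not code from the thread. Note that it only pairs up lines that contain exactly the same set of words.

```python
def canonical(line, unwanted="()"):
    """Lowercase the line, drop unwanted characters, and sort its words."""
    cleaned = "".join(ch for ch in line.lower() if ch not in unwanted)
    return " ".join(sorted(cleaned.split()))

# Keep the original lines for reference, keyed by their canonical form.
list2 = ["class-map cmap1 (match-all)", "class-map cmap1"]
originals = {}
for line in list2:
    originals.setdefault(canonical(line), []).append(line)

key = canonical("class-map match-all cmap1")
print(key)                      # class-map cmap1 match-all
print(originals.get(key, []))   # ['class-map cmap1 (match-all)']
```

Looking up the canonical form of a list 1 element in `originals` returns the original list 2 lines with the same words.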

regards
Steve


--
Steve Holden http://www.holdenweb.com/
Python Web Programming http://pydish.holdenweb.com/
Holden Web LLC +1 703 861 4237 +1 800 494 3119
 
 
 
 
 
Mike Meyer
 
      12-17-2004
Steve Holden <(E-Mail Removed)> writes:

> Mark Devine wrote:
>
>> Actually what I want is element 'class-map match-all cmap1' from list 1 to match 'class-map cmap1 (match-all)' or 'class-map cmap1 mark match-all done' in list 2 but not to match 'class-map cmap1'.
>> Each element in both lists have multiple words in them. If all the words of any element of the first list appear in any order within any element of the second list I want a match but if any of the words are missing then there is no match. There are far more elements in list 2 than in list 1.
>>

> Well since that's the case it would seem you'd be best processing each
> item from the large list against the small list, though in truth it
> may not make any difference.
>
> It looks like the best way to proceed might be to reduce each line to
> a canonical form -- strip the parens and other irrelevant characters
> out, and sort the words in order. After that it'd be relatively simple
> to determine whether two lines match - they'd be the same!


No, that doesn't work. What happens if an element of the second list
has *more* words than the element in the first list? In that case, the
two canonical forms would be different, but it should still be a
match.

How about this (If I had sample data, I'd try it out directly...):

Create a dictionary of sets. For each word in an element in the small
list, insert into the set indexed by that word in the dictionary a
tuple version of the list (you'll want to create the tuples in
advance, and associate them with each list somehow).

Then go through the long list, and for each element collect all the
sets that are indexed by the words in that element, and take the
intersection of them all. If there are any tuples in the intersection,
then you have a match.
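A rough Python 3 sketch of this word-indexed approach follows. It is a variation, not a literal transcription: instead of intersecting the sets, it counts how many words of each small-list element are hit, because a plain intersection over the long element's words would come up empty as soon as the long element contains an extra word. The names `words`, `build_index` and `matches` are hypothetical, and small-list elements are identified by their index rather than by a tuple.

```python
from collections import defaultdict

def words(text, unwanted="()"):
    """Lowercase text, drop unwanted characters, return its set of words."""
    cleaned = "".join(ch for ch in text.lower() if ch not in unwanted)
    return frozenset(cleaned.split())

def build_index(small):
    """Map each word to the indices of the small-list elements that use it."""
    index = defaultdict(set)
    sizes = []
    for i, element in enumerate(small):
        ws = words(element)
        sizes.append(len(ws))
        for w in ws:
            index[w].add(i)
    return index, sizes

def matches(long_element, index, sizes):
    """Indices of small-list elements whose words all occur in long_element."""
    hits = defaultdict(int)
    for w in words(long_element):
        for i in index.get(w, ()):
            hits[i] += 1
    # A small element matches only if every one of its words was seen.
    return sorted(i for i, n in hits.items() if n == sizes[i])

small = ["class-map match-all cmap1", "match ip any"]
index, sizes = build_index(small)
print(matches("class-map cmap1 (match-all)", index, sizes))          # [0]
print(matches("class-map cmap1 mark match-all done", index, sizes))  # [0]
print(matches("class-map cmap1", index, sizes))                      # []
```

The index is built once from the small list, so each element of the long list costs only a few dictionary lookups per word.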

<mike
--
Mike Meyer <(E-Mail Removed)> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
 
 
Michael Spencer
 
      12-17-2004
Steve Holden wrote:
> Mark Devine wrote:
>
>> Actually what I want is element 'class-map match-all cmap1' from list
>> 1 to match 'class-map cmap1 (match-all)' or 'class-map cmap1 mark
>> match-all done' in list 2 but not to match 'class-map cmap1'.
>> Each element in both lists have multiple words in them. If all the
>> words of any element of the first list appear in any order within any
>> element of the second list I want a match but if any of the words are
>> missing then there is no match. There are far more elements in list 2
>> than in list 1.
>>

>

sounds like a case for sets...

>>> # NB Python 2.4
>>> # Test if the words of list2 elements appear in any order in list1 elements
>>> # disregarding case and parens
>>> # Reference list
>>> list1 = ["a b C (D)",
...          "D A B",
...          "A B E"]
>>> # Test list
>>> list2 = ["A B C D",  #True
...          "A B D",    #True
...          "A E F",    #False
...          "A (E) B",  #True
...          "A B",      #True
...          "E A B" ]
...
>>> def normalize(text, unwanted = "()"):
...     conv = "".join(char.lower() for char in text if char not in unwanted)
...     return set(conv.split())
...
>>> reflist = [normalize(element) for element in list1]
>>> print reflist
[set(['a', 'c', 'b', 'd']), set(['a', 'b', 'd']), set(['a', 'b', 'e'])]

This is the list of sets to test against

>>> def testmember(element):
...     """is element a member of the reflist, according to the above rules?"""
...     testelement = normalize(element)
...     #brute force comparison until match - depends on small reflist
...     for el in reflist:
...         if el.issuperset(testelement):
...             return True
...     return False
...
>>> for element in list2:
...     print element, testmember(element)
...
A B C D True
A B D True
A E F False
A (E) B True
A B True
E A B True
>>>
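The session above checks whether a reference set is a superset of the test element. For the rule as stated in the original question (every word of a list 1 element must appear in a list 2 element), the subset test runs the other way. Here is a Python 3 sketch under that reading; `count_matches` is a hypothetical helper and the sample lists are taken from the question.

```python
def normalize(text, unwanted="()"):
    """Lowercase text, drop unwanted characters, return its set of words."""
    cleaned = "".join(ch for ch in text.lower() if ch not in unwanted)
    return set(cleaned.split())

def count_matches(small, large):
    """Count elements of large that contain all words of some element of small."""
    wanted = [normalize(s) for s in small]
    count = 0
    for line in large:
        have = normalize(line)
        # w <= have is the subset test: every word of w occurs in have
        if any(w <= have for w in wanted):
            count += 1
    return count

list1 = ["class-map match-all cmap1"]
list2 = ["class-map cmap1 (match-all)",
         "class-map cmap1 mark match-all done",
         "class-map cmap1"]
print(count_matches(list1, list2))  # 2
```

This is still a brute-force comparison, so it relies on list 1 being small.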


Michael

 
 
Steven Bethard
 
      12-17-2004
Michael Spencer wrote:
> ... conv = "".join(char.lower() for char in text if char not in
> unwanted)


Probably a good place to use str.replace, e.g.

conv = text.lower()
for char in unwanted:
    conv = conv.replace(char, '')

Some timings to support my assertion: =)

C:\Documents and Settings\Steve>python -m timeit -s "s = ''.join(map(str, range(100)))" "s = ''.join(c for c in s if c not in '01')"
10000 loops, best of 3: 74.6 usec per loop

C:\Documents and Settings\Steve>python -m timeit -s "s = ''.join(map(str, range(100)))" "for c in '01': s = s.replace(c, '')"
100000 loops, best of 3: 2.82 usec per loop
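The same comparison can be run from a script. The sketch below assumes Python 3; the function names `strip_join` and `strip_replace` are made up for illustration. Each call works on the untouched input string, so absolute numbers will differ from the command-line figures above and depend on the machine and interpreter.

```python
import timeit

text = "".join(map(str, range(100)))

def strip_join(s, unwanted="01"):
    """Rebuild the string character by character, skipping unwanted ones."""
    return "".join(c for c in s if c not in unwanted)

def strip_replace(s, unwanted="01"):
    """Remove each unwanted character with one str.replace call."""
    for c in unwanted:
        s = s.replace(c, "")
    return s

# Both approaches must produce the same result before timing them.
assert strip_join(text) == strip_replace(text)

for fn in (strip_join, strip_replace):
    best = min(timeit.repeat(lambda: fn(text), number=10000, repeat=3))
    print(fn.__name__, best)
```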

Steve
 
 
Bengt Richter
 
      12-17-2004
On Fri, 17 Dec 2004 02:06:01 GMT, Steven Bethard <(E-Mail Removed)> wrote:

>Michael Spencer wrote:
>> ... conv = "".join(char.lower() for char in text if char not in
>> unwanted)

>
>Probably a good place to use str.replace, e.g.
>
>conv = text.lower()
>for char in unwanted:
> conv = conv.replace(char, '')
>
>Some timings to support my assertion: =)
>
>C:\Documents and Settings\Steve>python -m timeit -s "s =
>''.join(map(str, range(100)))" "s = ''.join(c for c in s if c not in '01')"
>10000 loops, best of 3: 74.6 usec per loop
>
>C:\Documents and Settings\Steve>python -m timeit -s "s =
>''.join(map(str, range(100)))" "for c in '01': s = s.replace(c, '')"
>100000 loops, best of 3: 2.82 usec per loop
>

If unwanted has more than one character in it, I would expect unwanted as
deletechars in

>>> help(str.translate)
Help on method_descriptor:

translate(...)
    S.translate(table [,deletechars]) -> string

    Return a copy of the string S, where all characters occurring
    in the optional argument deletechars are removed, and the
    remaining characters have been mapped through the given
    translation table, which must be a string of length 256.

to compete well, if table setup were for free
(otherwise, UIAM, table should be ''.join([chr(i) for i in xrange(256)])
for identity translation, and that might pay for a couple of .replace loops,
depending).
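In Python 3, `str.translate` no longer takes a `deletechars` argument; the characters to delete are passed as the third argument of `str.maketrans` instead, and no 256-character identity table is needed. A minimal sketch, assuming Python 3 and a sample string chosen for illustration:

```python
# Empty first and second arguments mean "map nothing";
# the third argument lists the characters to delete.
table = str.maketrans("", "", "()")

cleaned = "class-map cmap1 (match-all)".translate(table)
print(cleaned)  # class-map cmap1 match-all
```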

Regards,
Bengt Richter
 
 
Michael Spencer
 
      12-17-2004
Bengt Richter wrote:
> On Fri, 17 Dec 2004 02:06:01 GMT, Steven Bethard <(E-Mail Removed)> wrote:
>
>
>>Michael Spencer wrote:
>>
>>> ... conv = "".join(char.lower() for char in text if char not in
>>>unwanted)

>>
>>Probably a good place to use str.replace, e.g.
>>
>>conv = text.lower()
>>for char in unwanted:
>> conv = conv.replace(char, '')
>>
>>Some timings to support my assertion: =)
>>
>>C:\Documents and Settings\Steve>python -m timeit -s "s =
>>''.join(map(str, range(100)))" "s = ''.join(c for c in s if c not in '01')"
>>10000 loops, best of 3: 74.6 usec per loop
>>
>>C:\Documents and Settings\Steve>python -m timeit -s "s =
>>''.join(map(str, range(100)))" "for c in '01': s = s.replace(c, '')"
>>100000 loops, best of 3: 2.82 usec per loop
>>

Well, sure, if it's just speed, conciseness and backwards-compatibility that you
want

>
> If unwanted has more than one character in it, I would expect unwanted as
> deletechars in
>
> >>> help(str.translate)

> Help on method_descriptor:
>
> translate(...)
> S.translate(table [,deletechars]) -> string
>
> Return a copy of the string S, where all characters occurring
> in the optional argument deletechars are removed, and the
> remaining characters have been mapped through the given
> translation table, which must be a string of length 256.
>
> to compete well, if table setup were for free
> (otherwise, UIAM, table should be ''.join([chr(i) for i in xrange(256)])
> for identity translation, and that might pay for a couple of .replace loops,
> depending).
>
> Regards,
> Bengt Richter

Good point - and there is string.maketrans to set up the table too. So
normalize can be rewritten as:


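from string import maketrans  # Python 2; empty arguments give the identity table

def normalize1(text, unwanted = "()", table = maketrans("","")):
    text = text.lower()
    # translate returns a new string, so the result has to be assigned back
    text = text.translate(table, unwanted)
    return set(text.split())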

which gives:
>>> import timeit
>>> t = timeit.Timer("normalize1('(UPPER CASE) lower case')", "from listmembers import normalize1")
>>> t.repeat(3,10000)
[0.29812783468287307, 0.29807782832722296, 0.3021370034462052]


But, while we're at it, we can use str.translate to do the case conversion too:

So:

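from string import maketrans, ascii_uppercase, ascii_lowercase  # Python 2

def normalize2(text, unwanted = "()",
               table = maketrans(ascii_uppercase, ascii_lowercase)):
    # the table lowercases and deletechars strips the parens in one pass
    text = text.translate(table, unwanted)
    return set(text.split())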

>>> t = timeit.Timer("normalize2('(UPPER CASE) lower case')", "from listmembers import normalize2")
>>> t.repeat(3,10000)
[0.24295154831133914, 0.24174497038029585, 0.25234855267899547]


...which is a little faster still
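For anyone on Python 3, where `string.maketrans` and the two-argument `translate` are not available for `str`, a roughly equivalent sketch is shown here. The name `normalize3` is hypothetical, and only ASCII letters are lowercased, as in the version above.

```python
import string

# One table that lowercases ASCII letters and deletes the parentheses.
_TABLE = str.maketrans(string.ascii_uppercase, string.ascii_lowercase, "()")

def normalize3(text, table=_TABLE):
    """Return the set of lowercased words in text, with parens removed."""
    return set(text.translate(table).split())

print(sorted(normalize3("(UPPER CASE) lower case")))  # ['case', 'lower', 'upper']
```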

Thanks for the comments: they were interesting for me - hope some of this is
useful to OP

Regards

Michael








 