Velocity Reviews - Computer Hardware Reviews

multi split function taking delimiter list

martinskou@gmail.com
      11-14-2006
Hi, I'm looking for something like:

multi_split('a:=b+c', [':=', '+'])

returning:
['a', ':=', 'b', '+', 'c']

What's the Python way to achieve this, preferably without regexps?

Thanks.

Martin

 
Raymond Hettinger
      11-14-2006

I think regexps are likely the right way to do this kind of
tokenization.

The string split() method doesn't return the split value, so it is
less than helpful for your application: 'a=b'.split('=') --> ['a',
'b']

The new str.partition() method will return the split value and is
suitable for successive applications: 'a:=b+c'.partition(':=') -->
('a', ':=', 'b+c')
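A minimal Python 3 sketch of those "successive applications" of partition(), assuming non-empty splitters and the output Martin asked for (splitters kept, empty pieces dropped). The name `partition_split` is hypothetical, not something from the thread:

```python
def partition_split(s, splitters):
    """Split s on any of the (non-empty) splitters, keeping the splitters.

    Built from repeated str.partition() calls; empty pieces between
    adjacent splitters are dropped.
    """
    tokens = []
    while s:
        # Pick the splitter occurring earliest; on a tie, prefer the longer one.
        best = None
        for sp in splitters:
            i = s.find(sp)
            if i != -1 and (best is None or (i, -len(sp)) < (best[0], -len(best[1]))):
                best = (i, sp)
        if best is None:
            tokens.append(s)  # no splitter left: the rest is a single token
            break
        head, sep, s = s.partition(best[1])
        if head:
            tokens.append(head)
        tokens.append(sep)
    return tokens


print(partition_split('a:=b+c', [':=', '+']))  # ['a', ':=', 'b', '+', 'c']
print(partition_split('a:=b:c', [':', ':=']))  # ['a', ':=', 'b', ':', 'c']
```

Preferring the longer splitter on a tie is what keeps ':=' from being cut into ':' and '=' when both are in the list.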

FWIW, when someone actually does want something that behaves like
str.split() but with multiple split values, one approach is to replace
each of the possible splitters with a single splitter:

def multi_split(s, splitters):
    first = splitters[0]
    # Turn every splitter into the first one, then do an ordinary split.
    for splitter in splitters:
        s = s.replace(splitter, first)
    return s.split(first)

print(multi_split('a:=b+c', [':=', '+']))  # ['a', 'b', 'c']


Raymond
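One caveat with the replace-then-split approach, worth checking before relying on it: when one splitter is a prefix of another (say ':' and ':='), replacing ':' with ':=' also rewrites the existing ':=' into ':=='. The variant below is a hedged sketch, not Raymond's code: it assumes a sentinel character (here '\0') that never occurs in the input, and replaces the longest splitters first. `multi_split_sentinel` is a hypothetical name.

```python
def multi_split(s, splitters):
    # Raymond's version: map every splitter onto the first one, then split.
    first = splitters[0]
    for splitter in splitters:
        s = s.replace(splitter, first)
    return s.split(first)


def multi_split_sentinel(s, splitters, sentinel="\0"):
    # Hypothetical variant: assumes `sentinel` never appears in `s`.
    if sentinel in s:
        raise ValueError("sentinel occurs in the input string")
    # Longest splitters first, so ':=' is consumed before ':' can break it up.
    for splitter in sorted(splitters, key=len, reverse=True):
        s = s.replace(splitter, sentinel)
    return s.split(sentinel)


print(multi_split('a:=b:c', [':=', ':']))           # ['a', '=b', 'c']
print(multi_split_sentinel('a:=b:c', [':=', ':']))  # ['a', 'b', 'c']
```

Like str.split(), both versions drop the splitters themselves.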

 
Peter Otten
      11-14-2006
I think in this case the regexp approach is the simplest, though:

>>> import re
>>> def multi_split(text, splitters):
...     return re.split("(%s)" % "|".join(re.escape(splitter) for splitter in splitters), text)
...
>>> multi_split("a:=b+c", [":=", "+"])
['a', ':=', 'b', '+', 'c']

Peter
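A possible refinement, sketched here as an assumption rather than taken from the thread: regex alternation tries its alternatives left to right, so if one splitter is a prefix of another and comes first in the list, the shorter one wins (re.split('(:|:=)', 'a:=b') gives ['a', ':', '=b']). Sorting the splitters longest first avoids that:

```python
import re


def multi_split(text, splitters):
    # Alternation is tried left to right, so put longer splitters first;
    # otherwise ':' would win over ':=' at the same position.
    ordered = sorted(splitters, key=len, reverse=True)
    pattern = "(%s)" % "|".join(re.escape(s) for s in ordered)
    # The capturing group makes re.split() keep the matched splitters.
    return re.split(pattern, text)


print(multi_split("a:=b+c", [":=", "+"]))  # ['a', ':=', 'b', '+', 'c']
print(multi_split("a:=b:c", [":", ":="]))  # ['a', ':=', 'b', ':', 'c']
```

Note that re.split() keeps empty strings where splitters are adjacent or at either end, e.g. multi_split("+a", ["+"]) gives ['', '+', 'a'].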
 
Kent Johnson
      11-14-2006
What do you have against regexp? re.split() does exactly what you want:

In [1]: import re

In [2]: re.split(r'(:=|\+)', 'a:=b+c')
Out[2]: ['a', ':=', 'b', '+', 'c']

Kent
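The parentheses in Kent's pattern are what keep the delimiters: re.split() only returns the matched text for capturing groups. A short Python 3 comparison:

```python
import re

text = 'a:=b+c'
# Without a group, re.split() drops the delimiters (like str.split()).
print(re.split(r':=|\+', text))    # ['a', 'b', 'c']
# With a capturing group, the matched delimiters are kept in the result.
print(re.split(r'(:=|\+)', text))  # ['a', ':=', 'b', '+', 'c']
```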
 
Paddy
      11-14-2006
I resisted my urge to use a regexp and came up with this:

>>> from itertools import groupby
>>> s = 'apple=blue+cart'
>>> [''.join(g) for k,g in groupby(s, lambda x: x in '=+')]
['apple', '=', 'blue', '+', 'cart']


For me, the regexp solution would have been clearer, but I need to
stretch my itertools skills.

- Paddy.

 
Sam Pointon
      11-14-2006
pyparsing <http://pyparsing.wikispaces.com/> is quite a cool package
for doing this sort of thing. Using your example:

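from pyparsing import Literal, OneOrMore, Or, Word, alphas

splitat = Or([Literal(":="), Literal("+")])
lexeme = Word(alphas)
# OneOrMore is needed; a bare (splitat | lexeme) matches only the first token.
grammar = OneOrMore(splitat | lexeme)

print(grammar.parseString("a:=b+c").asList())
# prints ['a', ':=', 'b', '+', 'c']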

--Sam

 
Paddy
      11-15-2006

Arghhh!
No colon!
Forget the above please.

- Pad.

 
Paddy
      11-15-2006

With colon:

>>> from itertools import groupby
>>> s = 'apple:=blue+cart'
>>> [''.join(g) for k,g in groupby(s,lambda x: x in ':=+')]
['apple', ':=', 'blue', '+', 'cart']


- Pad.

 
Frederic Rentsch
      11-16-2006
Paddy wrote:
> With colon:
>
>>>> from itertools import groupby
>>>> s = 'apple:=blue+cart'
>>>> [''.join(g) for k,g in groupby(s,lambda x: x in ':=+')]
> ['apple', ':=', 'blue', '+', 'cart']

Automatic grouping may or may not work as intended: groupby() treats
every run of delimiter characters as a single token. If only some such
runs should stay together, that solution raises a new problem.
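Paddy's groupby() approach classifies single characters, not delimiter strings, so it cannot tell ':=' apart from any other run of delimiter characters. A small Python 3 sketch of that behaviour; `groupby_split` is a hypothetical wrapper around Paddy's one-liner:

```python
from itertools import groupby


def groupby_split(s, delimiter_chars):
    # Paddy's idea: group consecutive characters by "is this a delimiter char?"
    return [''.join(g) for _, g in groupby(s, lambda ch: ch in delimiter_chars)]


print(groupby_split('apple:=blue+cart', ':=+'))  # ['apple', ':=', 'blue', '+', 'cart']
# Every run of delimiter characters becomes one token, intended or not:
print(groupby_split('blue:+cart', ':=+'))        # ['blue', ':+', 'cart']
print(groupby_split('x=:y', ':=+'))              # ['x', '=:', 'y']
```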

I have been demonstrating solutions based on SE so often lately that I
have begun to irritate some readers, and SE has been characterized, in
sarcastic exaggeration, as the 'Solution of Everything'. With some
trepidation I am going to demonstrate another SE solution, because the
truth behind the exaggeration is that SE is a versatile tool for
handling a variety of relatively simple problems in a simple,
straightforward manner.

>>> test_string = 'a:=b+c: apple:=blue:+cart'
>>> SE.SE (':\==/:\=/ +=/+/')(test_string).split ('/')  # For repeated use, the SE object would be assigned to a variable
['a', ':=', 'b', '+', 'c: apple', ':=', 'blue:', '+', 'cart']

This is a nuts-and-bolts approach. What you do is what you get; what
you want is what you do. By itself SE does nothing but search and
replace, a concept without a learning curve. The simplicity doesn't
suggest versatility; the versatility comes from application techniques.
SE is a game of challenge: you know the result you want, you know the
pieces you have, and the game is to get the result from the pieces
using search and replace, either per se or as an auxiliary step, as in
this case for splitting. That's all. The example above inserts a
suitable split mark ('/'). It takes thirty seconds to write it up and
see the result, with no need to ponder formulas and inner workings. If
you don't like what you see, you also see what needs to be changed.
Supposing we should split single colons too, adding the corresponding
substitution and verifying the effect is a matter of another ten seconds:

>>> SE.SE (':\==/:\=/ +=/+/ :=/:/')(test_string).split ('/')
['a', ':=', 'b', '+', 'c', ':', ' apple', ':=', 'blue', ':', '', '+', 'cart']

Now we see an empty field we don't like towards the end. Why?

>>> SE.SE (':\==/:\=/ +=/+/ :=/:/')(test_string)
'a/:=/b/+/c/:/ apple/:=/blue/://+/cart'

Ah! It's two slashes next to each other. No problem. We collapse double
slashes into one in a second pass:

>>> SE.SE (':\==/:\=/ +=/+/ :=/:/ | //=/')(test_string).split ('/')
['a', ':=', 'b', '+', 'c', ':', ' apple', ':=', 'blue', ':', '+', 'cart']

On second thought the colon should not be split if a plus sign follows:

>>> SE.SE (':\==/:\=/ +=/+/ :=/:/ :+=:/+/ | //=/')(test_string).split ('/')
['a', ':=', 'b', '+', 'c', ':', ' apple', ':=', 'blue:', '+', 'cart']

No, wrong again! 'Colon-plus' should be exempt altogether. And no spaces
please:

>>> SE.SE (':\==/:\=/ +=/+/ :=/:/ :+=:+ " =" | //=/')(test_string).split ('/')
['a', ':=', 'b', '+', 'c', ':', 'apple', ':=', 'blue:+cart']

etc.
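SE is a third-party module, not part of the standard library, so for readers without it here is a rough Python 3 approximation of the same mark-then-split technique using only the re module. It assumes that all substitutions are applied in one left-to-right pass with the longest match winning, which is what the outputs above suggest; `mark_and_split` and its dict of substitutions are hypothetical stand-ins, not SE's API or definition syntax.

```python
import re


def mark_and_split(text, substitutions, mark="/"):
    """Rough stdlib stand-in for the SE examples (not SE's API).

    substitutions maps each non-empty search string to its replacement.
    All searches run in one left-to-right pass, longest match first.
    """
    keys = sorted(substitutions, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    marked = pattern.sub(lambda m: substitutions[m.group(0)], text)
    # Second pass, like SE's '| //=/': collapse doubled split marks.
    while mark * 2 in marked:
        marked = marked.replace(mark * 2, mark)
    return marked.split(mark)


test_string = 'a:=b+c: apple:=blue:+cart'
subs = {':=': '/:=/', '+': '/+/', ':': '/:/'}
print(mark_and_split(test_string, subs))
# ['a', ':=', 'b', '+', 'c', ':', ' apple', ':=', 'blue', ':', '+', 'cart']

# Exempt ':+' and delete spaces, as in the last SE example:
subs.update({':+': ':+', ' ': ''})
print(mark_and_split(test_string, subs))
# ['a', ':=', 'b', '+', 'c', ':', 'apple', ':=', 'blue:+cart']
```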

It is easy to get carried away: to forget that SE should not be used
instead of Python's built-ins, or to do contextual or grammar
processing explicitly, which gets messy very fast. SE fills a gap
somewhere between built-ins and parsers.

Stream editing is not a mainstream technique. I believe it has the
potential to make many simple problems trivial and many harder ones
simpler. This is why I believe the technique deserves more attention,
which, again, may explain the focus of my posts.

Frederic

 
Paul McGuire
      11-16-2006
On Nov 14, 5:41 pm, "Sam Pointon" <(E-Mail Removed)> wrote:
> pyparsing <http://pyparsing.wikispaces.com/> is quite a cool package
> for doing this sort of thing.

Thanks for mentioning pyparsing, Sam!

This is a good example of using pyparsing for just basic tokenizing,
and it will do a nice job of splitting up the tokens, whether there is
whitespace or not.

For instance, if you were tokenizing using the string split() method,
you would get nice results from "a := b + c", but not so good from "a:=
b+ c". Using Sam Pointon's simple pyparsing expression, you can split
up the arithmetic using the symbol expressions, and the whitespace is
pretty much ignored.
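The str.split() problem is easy to see in Python 3. The second half is a stdlib stand-in, not pyparsing: it matches the tokens themselves with re.findall(), so whitespace between them is simply skipped. The pattern is an assumption for this example; \w+ also accepts digits and underscores, unlike Word(alphas).

```python
import re

tidy, messy = "a := b + c", "a:= b+ c"

# str.split() splits only on whitespace, so tokens glued to operators stay together.
print(tidy.split())   # ['a', ':=', 'b', '+', 'c']
print(messy.split())  # ['a:=', 'b+', 'c']

# Match the tokens directly; whitespace between them is never matched.
token = re.compile(r":=|\+|\w+")
print(token.findall(messy))  # ['a', ':=', 'b', '+', 'c']
```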

But pyparsing can be used for more than just tokenizing. Here is a
slightly longer pyparsing example, using a new pyparsing helper method
called operatorPrecedence, which can shortcut the definition of
operator-separated expressions with () grouping. Note how this not
only tokenizes the expression, but also identifies the implicit groups
based on operator precedence. Finally, pyparsing allows you to label
the parsed results - in this case, you can reference the left- and
right-hand sides of your assignment statement using the attribute
names "lhs" and "rhs". This can be really handy for complicated grammars.

-- Paul


from pyparsing import Word, alphas, nums, oneOf, opAssoc, infixNotation

number = Word(nums)
variable = Word(alphas)
operand = number | variable

# infixNotation is the later name for operatorPrecedence, which current
# pyparsing releases no longer provide.
arithexpr = infixNotation(operand,
    [("!", 1, opAssoc.LEFT),             # factorial
     ("^", 2, opAssoc.RIGHT),            # exponentiation
     (oneOf('+ -'), 1, opAssoc.RIGHT),   # leading sign
     (oneOf('* /'), 2, opAssoc.LEFT),    # multiplication
     (oneOf('+ -'), 2, opAssoc.LEFT)])   # addition

assignment = (variable.setResultsName("lhs") +
              ":=" +
              arithexpr.setResultsName("rhs"))

test = ["a:= b+c",
        "a := b + -c",
        "y := M*X + B",
        "e := m * c^2"]

for t in test:
    tokens = assignment.parseString(t)
    print(tokens.asList())
    print(tokens.lhs, "<-", tokens.rhs)
    print()

Prints:
['a', ':=', ['b', '+', 'c']]
a <- ['b', '+', 'c']

['a', ':=', ['b', '+', ['-', 'c']]]
a <- ['b', '+', ['-', 'c']]

['y', ':=', [['M', '*', 'X'], '+', 'B']]
y <- [['M', '*', 'X'], '+', 'B']

['e', ':=', ['m', '*', ['c', '^', '2']]]
e <- ['m', '*', ['c', '^', '2']]

 