
# Splitting a string into substrings of equal size

candide

 08-15-2009
Suppose you need to split a string into substrings of a given size (except
possibly the last one). I assume the slicing starts at the end of the
string, so any shorter group ends up at the front.
A typical example is formatting a decimal string with a thousands
separator.

What is the Pythonic way to do this?

For my part, I arrived at this rather complicated code:

# ----------------------

def comaSep(z, k=3, sep=','):
    z = z[::-1]   # reverse, so that groups are counted from the end
    x = [z[k*i:k*(i+1)][::-1] for i in range(1 + (len(z)-1)//k)][::-1]
    return sep.join(x)

# Test
for z in ["75096042068045", "509", "12024", "7", "2009"]:
    print z+" --> ", comaSep(z)

# ----------------------

which outputs:

75096042068045 --> 75,096,042,068,045
509 --> 509
12024 --> 12,024
7 --> 7
2009 --> 2,009

Thanks

Gabriel Genellina

 08-15-2009
On Fri, 14 Aug 2009 21:22:57 -0300, candide <(E-Mail Removed)>
wrote:

> Suppose you need to split a string into substrings of a given size
> (except
> possibly the last substring). I make the hypothesis the first slice is
> at the
> end of the string.
> A typical example is provided by formatting a decimal string with
> thousands
> separator.
>
>
> What is the pythonic way to do this ?

py> import locale
py> locale.setlocale(locale.LC_ALL, '')
'Spanish_Argentina.1252'
py> locale.format("%d", 75096042068045, True)
'75.096.042.068.045'
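
When the separator does not have to follow the locale, a shorter route may be the `,` option of the format mini-language (available since Python 2.7 and 3.1). The sketch below is written for Python 3 and assumes the input is a plain decimal string; the helper name `with_thousands` is made up for illustration.

```python
def with_thousands(digits):
    """Format a decimal string with ',' as the thousands separator.

    Hypothetical helper: it goes through int(), so it only accepts
    strings that int() can parse, and the separator is always ','.
    """
    return format(int(digits), ',')


for z in ["75096042068045", "509", "12024", "7", "2009"]:
    print(z, "-->", with_thousands(z))
```

Unlike the locale-based call, this ignores the user's regional settings, which may or may not be what you want.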

> For my part, i reach to this rather complicated code:

Mine isn't very simple either:

py> def genparts(z):
...     n = len(z)
...     i = n%3
...     if i: yield z[:i]
...     for i in xrange(i, n, 3):
...         yield z[i:i+3]
...
py> ','.join(genparts("75096042068045"))
'75,096,042,068,045'
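
For anyone on Python 3, the same generator idea might look like the sketch below. The group size parameter `k` and the name `head` are additions for illustration and are not part of the session above.

```python
def genparts(z, k=3):
    """Yield groups of k characters, aligned to the right end of z."""
    n = len(z)
    head = n % k              # length of the short leading group, if any
    if head:
        yield z[:head]
    for i in range(head, n, k):
        yield z[i:i + k]


print(','.join(genparts("75096042068045")))
```

Because the leading group is only yielded when it is non-empty, a string whose length is a multiple of `k` does not get a stray separator at the front.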

--
Gabriel Genellina

Jan Kaliszewski

 08-15-2009
15-08-2009 candide <(E-Mail Removed)> wrote:

> Suppose you need to split a string into substrings of a given size
> (except
> possibly the last substring). I make the hypothesis the first slice is
> at the end of the string.
> A typical example is provided by formatting a decimal string with
> thousands separator.

I'd use iterators, especially for longer strings...

import itertools

def separate(text, grouplen=3, sep=','):
    "separate('12345678') -> '123,456,78'"
    repeated_iterator = [iter(text)] * grouplen
    groups = itertools.izip_longest(fillvalue='', *repeated_iterator)
    strings = (''.join(group) for group in groups)  # gen. expr.
    return sep.join(strings)

def back_separate(text, grouplen=3, sep=','):
    "back_separate('12345678') -> '12,345,678'"
    repeated_iterator = [reversed(text)] * grouplen
    groups = itertools.izip_longest(fillvalue='', *repeated_iterator)
    strings = [''.join(reversed(group)) for group in groups]  # list compr.
    return sep.join(reversed(strings))

print separate('12345678')
print back_separate('12345678')

# alternate implementation
# (without "materializing" 'strings' as a list in back_separate):
def separate(text, grouplen=3, sep=','):
    "separate('12345678') -> '123,456,78'"
    textlen = len(text)
    end = textlen - (textlen % grouplen)
    repeated_iterator = [iter(itertools.islice(text, 0, end))] * grouplen
    strings = itertools.imap(lambda *chars: ''.join(chars),
                             *repeated_iterator)
    # append the leftover only if there is one (avoids a trailing sep)
    rest = (text[end:],) if end < textlen else ()
    return sep.join(itertools.chain(strings, rest))

def back_separate(text, grouplen=3, sep=','):
    "back_separate('12345678') -> '12,345,678'"
    beg = len(text) % grouplen
    repeated_iterator = [iter(itertools.islice(text, beg, None))] * grouplen
    strings = itertools.imap(lambda *chars: ''.join(chars),
                             *repeated_iterator)
    # prepend the leftover only if there is one (avoids a leading sep)
    rest = (text[:beg],) if beg else ()
    return sep.join(itertools.chain(rest, strings))

print separate('12345678')
print back_separate('12345678')

http://docs.python.org/library/itertools.html#recipes
was the inspiration for me (especially grouper).
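
For reference, a Python 3 sketch of the grouper idea applied to the right-aligned case. It assumes `itertools.zip_longest` (the Python 3 name of `izip_longest`); the `grouper` helper follows the shape of the recipe in the itertools documentation, with `''` as an assumed default fill value.

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=''):
    """Collect items into fixed-length chunks, padding the last one."""
    args = [iter(iterable)] * n      # n references to ONE shared iterator
    return zip_longest(*args, fillvalue=fillvalue)


def back_separate(text, grouplen=3, sep=','):
    """back_separate('12345678') -> '12,345,678'"""
    # Group the reversed text, then undo the reversal inside each group
    # and across the groups.
    groups = [''.join(reversed(g)) for g in grouper(reversed(text), grouplen)]
    return sep.join(reversed(groups))


print(back_separate('12345678'))
```

The trick is that all `n` list entries are the same iterator object, so each tuple produced by `zip_longest` consumes `n` consecutive characters.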

Cheers,
*j
--
Jan Kaliszewski (zuo) <(E-Mail Removed)>

Jan Kaliszewski

 08-15-2009
15-08-2009 Jan Kaliszewski <(E-Mail Removed)> wrote:

> 15-08-2009 candide <(E-Mail Removed)> wrote:
>
>> Suppose you need to split a string into substrings of a given size
>> (except
>> possibly the last substring). I make the hypothesis the first slice is
>> at the end of the string.
>> A typical example is provided by formatting a decimal string with
>> thousands separator.

>
> I'd use iterators, especially for longer strings...
>
>
> import itertools

[snip]

Err... It's too late for coding... Now I see an obvious and simpler variant:

def separate(text, grouplen=3, sep=','):
    "separate('12345678') -> '123,456,78'"
    textlen = len(text)
    end = textlen - (textlen % grouplen)
    strings = (text[i:i+grouplen] for i in xrange(0, end, grouplen))
    # append the leftover only if there is one (avoids a trailing sep)
    rest = (text[end:],) if end < textlen else ()
    return sep.join(itertools.chain(strings, rest))

def back_separate(text, grouplen=3, sep=','):
    "back_separate('12345678') -> '12,345,678'"
    textlen = len(text)
    beg = textlen % grouplen
    strings = (text[i:i+grouplen] for i in xrange(beg, textlen, grouplen))
    # prepend the leftover only if there is one (avoids a leading sep)
    rest = (text[:beg],) if beg else ()
    return sep.join(itertools.chain(rest, strings))

print separate('12345678')
print back_separate('12345678')

--
Jan Kaliszewski (zuo) <(E-Mail Removed)>

Rascal

 08-15-2009
I'm bored for posting this, but here it is:

def insert_commas(s):  # `s` is the digit string; the name is only illustrative
    str_list = list(s)
    str_len = len(s)
    for i in range(3, str_len, 3):
        str_list.insert(str_len - i, ',')
    return ''.join(str_list)

candide

 08-15-2009
Thanks to all for your responses. I particularly appreciate Rascal's solution.

Jan Kaliszewski

 08-15-2009
On 15-08-2009 at 08:08:14, Rascal <(E-Mail Removed)> wrote:

> I'm bored for posting this, but here it is:
>
> str_list = list(str)
> str_len = len(str)
> for i in range(3, str_len, 3):
> str_list.insert(str_len - i, ',')
> return ''.join(str_list)

For short strings (surely the most common case) it's OK: simple and clear.
But for huge ones it's better not to materialize an additional list for the
string; pure-iterator solutions (like Gabriel's or mine) would be better
then.
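
To make the trade-off concrete, here is a Python 3 sketch that puts the two styles side by side. `insert_based` is the loop quoted above wrapped in a function, and `slice_based` is a hypothetical slicing alternative; both names and the `k`/`sep` parameters are assumptions added for illustration.

```python
def insert_based(s, k=3, sep=','):
    """The quoted loop as a function: builds a list of single characters."""
    chars = list(s)
    n = len(s)
    for i in range(k, n, k):
        chars.insert(n - i, sep)   # each insert shifts the tail of the list
    return ''.join(chars)


def slice_based(s, k=3, sep=','):
    """Same result from slices: no per-character list, no shifting."""
    head = len(s) % k or k         # size of the first group (k if it divides evenly)
    starts = range(head, len(s), k)
    return sep.join([s[:head]] + [s[i:i + k] for i in starts])


for z in ["75096042068045", "509", "12024", "7", "2009"]:
    assert insert_based(z) == slice_based(z)
```

Since every `list.insert` has to move the elements after the insertion point, the first version does more work per separator as the string grows, while the second one only creates one slice per group.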

Cheers,
*j

--
Jan Kaliszewski (zuo) <(E-Mail Removed)>

Emile van Sebille

 08-15-2009
On 8/14/2009 5:22 PM candide said...
> Suppose you need to split a string into substrings of a given size (except
> possibly the last substring). I make the hypothesis the first slice is at the
> end of the string.
> A typical example is provided by formatting a decimal string with thousands
> separator.
>
>
> What is the pythonic way to do this ?

I like list comps...

>>> jj = '1234567890123456789'
>>> ",".join([jj[ii:ii+3] for ii in range(0,len(jj),3)])
'123,456,789,012,345,678,9'
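
Note that the comprehension above groups from the left, so the short group lands at the end. If the groups should be counted from the right, as in the original question, one possible variation is sketched below (Python 3; the function name `split_from_right` is made up for this example).

```python
def split_from_right(s, k=3):
    """Return groups of k characters counted from the END of s."""
    # Walk the end positions len(s), len(s)-k, ... and slice backwards;
    # max(..., 0) clamps the first (possibly shorter) group.
    parts = [s[max(end - k, 0):end] for end in range(len(s), 0, -k)]
    return parts[::-1]          # restore left-to-right order


jj = '1234567890123456789'
print(",".join(split_from_right(jj)))
```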

Emile

Gregor Lingl

 08-15-2009

> What is the pythonic way to do this ?
>
>
> For my part, i reach to this rather complicated code:
>
>
> # ----------------------
>
> def comaSep(z,k=3, sep=','):
> z=z[::-1]
> x=[z[k*i:k*(i+1)][::-1] for i in range(1+(len(z)-1)/k)][::-1]
> return sep.join(x)
>
> # Test
> for z in ["75096042068045", "509", "12024", "7", "2009"]:
> print z+" --> ", comaSep(z)
>

In case you are interested, here is a recursive solution:

>>> def comaSep(z,k=3,sep=","):
...     return comaSep(z[:-k],k,sep)+sep+z[-k:] if len(z)>k else z
...
>>> comaSep("7")
'7'
>>> comaSep("2007")
'2,007'
>>> comaSep("12024")
'12,024'
>>> comaSep("509")
'509'
>>> comaSep("75096042068045")
'75,096,042,068,045'
>>>
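
The recursion uses one stack frame per group, so a very long string can exceed Python's recursion limit (1000 by default in CPython). A loop that peels groups off the right end avoids that; the sketch below is written for Python 3 and the name `comma_sep_iter` is only illustrative.

```python
def comma_sep_iter(z, k=3, sep=","):
    """Loop form of the recursion: peel k characters off the right each pass."""
    groups = []
    while len(z) > k:
        groups.append(z[-k:])   # last k characters
        z = z[:-k]              # everything before them
    groups.append(z)            # the leading (possibly shorter) group
    return sep.join(reversed(groups))


for z in ["7", "2007", "12024", "509", "75096042068045"]:
    print(z, "-->", comma_sep_iter(z))
```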

Gregor
