Velocity Reviews - Computer Hardware Reviews

Re: itertools.groupby

 
 
Peter Otten
      04-21-2013
Jason Friedman wrote:

> I have a file such as:
>
> $ cat my_data
> Starting a new group
> a
> b
> c
> Starting a new group
> 1
> 2
> 3
> 4
> Starting a new group
> X
> Y
> Z
> Starting a new group
>
> I am wanting a list of lists:
> ['a', 'b', 'c']
> ['1', '2', '3', '4']
> ['X', 'Y', 'Z']
> []
>
> I wrote this:
> ------------------------------------
> #!/usr/bin/python3
> from itertools import groupby
>
> def get_lines_from_file(file_name):
>     with open(file_name) as reader:
>         for line in reader.readlines():


readlines() slurps the whole file into memory! Don't do that, iterate over
the file directly instead:

    for line in reader:

>             yield(line.strip())
>
> counter = 0
> def key_func(x):
>     if x.startswith("Starting a new group"):
>         global counter
>         counter += 1
>     return counter
>
> for key, group in groupby(get_lines_from_file("my_data"), key_func):
>     print(list(group)[1:])
> ------------------------------------
>
> I get the output I desire, but I'm wondering if there is a solution
> without the global counter.


If you were to drop the empty groups you could simplify it to

import itertools

def is_header(line):
    return line.startswith("Starting a new group")

with open("my_data") as lines:
    stripped_lines = (line.strip() for line in lines)
    for header, group in itertools.groupby(stripped_lines, key=is_header):
        if not header:
            print(list(group))
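
To see what "drop the empty groups" means in practice, here is a small self-contained sketch of the same idea. It assumes the sample data from the question is held in a string (SAMPLE) and wraps the loop in a helper called non_empty_groups; both names are made up for this illustration.

```python
import io
import itertools

# The sample file from the question, held in memory for the demo.
SAMPLE = """\
Starting a new group
a
b
c
Starting a new group
1
2
3
4
Starting a new group
X
Y
Z
Starting a new group
"""

def is_header(line):
    return line.startswith("Starting a new group")

def non_empty_groups(lines):
    # Hypothetical helper: yields each run of non-header lines as a list.
    stripped_lines = (line.strip() for line in lines)
    for header, group in itertools.groupby(stripped_lines, key=is_header):
        if not header:
            yield list(group)

result = list(non_empty_groups(io.StringIO(SAMPLE)))
print(result)
# [['a', 'b', 'c'], ['1', '2', '3', '4'], ['X', 'Y', 'Z']]
```

The trailing header produces no output at all, because a header that is not followed by data never yields a non-header run.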

And here's a refactoring for your initial code. The main point is the use of
nonlocal instead of global state to make the function reentrant.

import itertools

def split_groups(items, header):
    odd = True
    def group_key(item):
        nonlocal odd
        if header(item):
            odd = not odd
        return odd

    for _key, group in itertools.groupby(items, key=group_key):
        yield itertools.islice(group, 1, None)

def is_header(line):
    return line.startswith("Starting a new group")

with open("my_data") as lines:
    stripped_lines = map(str.strip, lines)
    for group in split_groups(stripped_lines, header=is_header):
        print(list(group))
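
A quick way to convince yourself that the nonlocal version is reentrant is to run two split_groups() generators in lockstep. The sketch below repeats split_groups() so that it runs on its own, and uses a made-up "#" marker as the header for brevity; the lists first and second are invented test data.

```python
import itertools

def split_groups(items, header):
    odd = True
    def group_key(item):
        nonlocal odd
        if header(item):
            odd = not odd
        return odd

    for _key, group in itertools.groupby(items, key=group_key):
        yield itertools.islice(group, 1, None)

def is_header(line):
    # Assumption for this demo only: a lone "#" marks a new group.
    return line == "#"

first = ["#", "a", "b", "#", "c"]
second = ["#", "1", "#", "2", "3"]

# Advance both generators alternately; each call owns its own `odd` flag,
# so neither stream disturbs the other.
pairs = [
    (list(g1), list(g2))
    for g1, g2 in zip(split_groups(first, is_header),
                      split_groups(second, is_header))
]
print(pairs)
# [(['a', 'b'], ['1']), (['c'], ['2', '3'])]
```

With a module-level counter, both streams would update the same variable, so interleaving them like this could change a key in the middle of a group.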

One remaining problem with that code is that it will silently drop the first
line of the file if it doesn't start with a header:

$ cat my_data
alpha
beta
gamma
Starting a new group
a
b
c
Starting a new group
Starting a new group
1
2
3
4
Starting a new group
X
Y
Z
Starting a new group
$ python3 group.py
['beta', 'gamma'] # where's alpha?
['a', 'b', 'c']
[]
['1', '2', '3', '4']
['X', 'Y', 'Z']
[]

How do you want to handle that case?
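
One possible answer, sketched under the assumption that lines before the first header should form a group of their own: drop the first item of a group only when it really is a header. The name split_groups_keep_preamble and the "#" header marker are made up for this illustration, and the groups are built as lists rather than lazy iterators to keep the check simple.

```python
import itertools

def split_groups_keep_preamble(items, header):
    odd = True
    def group_key(item):
        nonlocal odd
        if header(item):
            odd = not odd
        return odd

    for _key, group in itertools.groupby(items, key=group_key):
        group = list(group)
        # Only a real header is discarded; leading data lines survive.
        if header(group[0]):
            group = group[1:]
        yield group

def is_header(line):
    # Assumption for this demo only: a lone "#" marks a new group.
    return line == "#"

lines = ["alpha", "beta", "gamma",
         "#", "a", "b", "c",
         "#",
         "#", "1", "2", "3", "4",
         "#", "X", "Y", "Z",
         "#"]

result = list(split_groups_keep_preamble(lines, is_header))
for group in result:
    print(group)
# ['alpha', 'beta', 'gamma']
# ['a', 'b', 'c']
# []
# ['1', '2', '3', '4']
# ['X', 'Y', 'Z']
# []
```

If leading lines without a header should instead be treated as an error, the same check is the natural place to raise an exception for the first group.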

 