Re: count

 
 
Vilya Harvey
      07-08-2009
2009/7/8 Dhananjay <>:
> I wanted to sort column 2 in ascending order and I read the whole file into
> the array "data" and did the following:
>
> data.sort(key = lambda fields:(fields[2]))
>
> I have sorted column 2, however I want to count the numbers in column 2.
> i.e. I want to know, for example, how many repeats of say '3' (first row,
> 2nd column in above data) are there in column 2.


One thing: indexes in Python start from 0, so the second column has an
index of 1 not 2. In other words, it should be data.sort(key = lambda
fields: fields[1]) instead.

With that out of the way, the following will print out a count of each
unique item in the second column:

from itertools import groupby
for x, g in groupby([fields[1] for fields in data]):
    print x, len(tuple(g))

Hope that helps,
Vil.
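
For anyone following along, here is a minimal end-to-end sketch of the sort-then-groupby approach described above. It is written for Python 3 (print as a function), and the sample rows in "data" are hypothetical, since the original poster's file is not shown in the thread. Note that groupby only merges adjacent equal keys, which is why the sort has to come first.

```python
from itertools import groupby

# Hypothetical sample rows; the original poster's file contents are not shown.
data = [
    ["a", "3", "x"],
    ["b", "1", "y"],
    ["c", "3", "z"],
    ["d", "2", "w"],
]

# Sort on the second column (index 1) so that equal values become adjacent.
data.sort(key=lambda fields: fields[1])

# groupby only merges adjacent equal keys, hence the sort above.
counts = [(key, len(tuple(group)))
          for key, group in groupby(fields[1] for fields in data)]

for key, n in counts:
    print(key, n)
```

The values here are compared as strings, as they would be after a plain split of each line. Convert them with int() in the key function if a numeric ordering is needed.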
 
Bearophile
      07-08-2009
Vilya Harvey:
> from itertools import groupby
> for x, g in groupby([fields[1] for fields in data]):
>     print x, len(tuple(g))


Avoid that len(tuple(g)), use something like the following, it's lazy
and saves some memory.


def leniter(iterator):
    """leniter(iterator): return the length of a given
    iterator, consuming it, without creating a list.
    Never use it with infinite iterators.

    >>> leniter()
    Traceback (most recent call last):
      ...
    TypeError: leniter() takes exactly 1 argument (0 given)
    >>> leniter([])
    0
    >>> leniter([1])
    1
    >>> leniter(iter([1]))
    1
    >>> leniter(x for x in xrange(100) if x%2)
    50
    >>> from itertools import groupby
    >>> [(leniter(g), h) for h,g in groupby("aaaabccaadeeee")]
    [(4, 'a'), (1, 'b'), (2, 'c'), (2, 'a'), (1, 'd'), (4, 'e')]
    >>> def foo0():
    ...     if False: yield 1
    >>> leniter(foo0())
    0
    >>> def foo1(): yield 1
    >>> leniter(foo1())
    1
    """
    # This code is faster than: sum(1 for _ in iterator)
    if hasattr(iterator, "__len__"):
        return len(iterator)
    nelements = 0
    for _ in iterator:
        nelements += 1
    return nelements

Bye,
bearophile
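
The function above is Python 2 (it relies on xrange in its doctests). A rough Python 3 port might look like the sketch below. The shortened docstring and the names odd_count and runs are mine, not bearophile's, and the logic is otherwise unchanged.

```python
from itertools import groupby


def leniter(iterable):
    """Count the items in an iterable, consuming it if it is an iterator."""
    # Sized containers (list, tuple, str, ...) can answer directly.
    if hasattr(iterable, "__len__"):
        return len(iterable)
    # Otherwise walk the iterator once, keeping only a running total.
    nelements = 0
    for _ in iterable:
        nelements += 1
    return nelements


# The same checks as two of the doctests above, with range instead of xrange.
odd_count = leniter(x for x in range(100) if x % 2)
runs = [(leniter(g), h) for h, g in groupby("aaaabccaadeeee")]
print(odd_count)
print(runs)
```

The groups yielded by groupby have no __len__, so they always take the counting loop. The __len__ shortcut only helps when the caller passes in a real container.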
 
Paul Rubin
      07-08-2009
Bearophile <> writes:
> >     print x, len(tuple(g))
>
> Avoid that len(tuple(g)), use something like the following


print x, sum(1 for _ in g)
 
Aahz
      07-08-2009
In article <050094ea-faf4-4e03-875d->,
Bearophile <> wrote:
>Vilya Harvey:
>>
>> from itertools import groupby
>> for x, g in groupby([fields[1] for fields in data]):
>>     print x, len(tuple(g))
>
>Avoid that len(tuple(g)), use something like the following, it's lazy
>and saves some memory.


The question is whether it saves time, have you tested it?
--
Aahz () <*> http://www.pythoncraft.com/

"as long as we like the same operating system, things are cool." --piranha
 
Paul Rubin
      07-08-2009
(Aahz) writes:
> >Avoid that len(tuple(g)), use something like the following, it's lazy
> >and saves some memory.

> The question is whether it saves time, have you tested it?


len(tuple(xrange(100000000))) ... hmm.
 
Aahz
      07-08-2009
In article <>,
Paul Rubin <http://> wrote:
> (Aahz) writes:
>>Paul Rubin deleted an attribution:
>>>
>>>Avoid that len(tuple(g)), use something like the following, it's lazy
>>>and saves some memory.

>>
>> The question is whether it saves time, have you tested it?

>
>len(tuple(xrange(100000000))) ... hmm.


When dealing with small N, O() can get easily swamped by the constant
factors. How often do you deal with more than a hundred fields?
 
Paul Rubin
      07-08-2009
(Aahz) writes:
> When dealing with small N, O() can get easily swamped by the constant
> factors. How often do you deal with more than a hundred fields?


The number of fields in the OP's post was not stated. Expecting it to
be less than 100 seems like an ill-advised presumption. If N is
unknown, speed-tuning the case where N is small at the expense of
consuming monstrous amounts of memory when N is large sounds
somewhere between a premature optimization and a nasty bug.
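
To put a rough number on the memory side of this argument, here is a small Python 3 sketch using the standard tracemalloc module. The helper names (count_with_tuple, count_with_sum, peak_bytes) and the size N are made up for illustration and do not come from the thread. The absolute figures will differ between interpreter versions and platforms.

```python
import tracemalloc


def count_with_tuple(n):
    # Materialises every element in a tuple before counting.
    return len(tuple(range(n)))


def count_with_sum(n):
    # Holds only one element at a time while counting.
    return sum(1 for _ in range(n))


def peak_bytes(func, n):
    """Return (result, peak traced allocation in bytes) for func(n)."""
    tracemalloc.start()
    try:
        result = func(n)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


N = 100000  # arbitrary illustrative size, not taken from the thread
tuple_result, tuple_peak = peak_bytes(count_with_tuple, N)
sum_result, sum_peak = peak_bytes(count_with_sum, N)
print("tuple:", tuple_result, tuple_peak, "bytes")
print("sum:  ", sum_result, sum_peak, "bytes")
```

The tuple version's peak grows with N, while the generator version's peak should stay roughly flat, which is the trade-off being argued about here.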

 
J. Clifford Dyer
      07-09-2009
On Wed, 2009-07-08 at 14:45 -0700, Paul Rubin wrote:
> (Aahz) writes:
> > >Avoid that len(tuple(g)), use something like the following, it's lazy
> > >and saves some memory.

> > The question is whether it saves time, have you tested it?

>
> len(tuple(xrange(100000000))) ... hmm.


timer.py
--------
from datetime import datetime

def tupler(n):
    return len(tuple(xrange(n)))

def summer(n):
    return sum(1 for x in xrange(n))

def test_func(f, n):
    print f.__name__,
    start = datetime.now()
    print f(n)
    end = datetime.now()
    print "Start: %s" % start
    print "End: %s" % end
    print "Duration: %s" % (end - start,)

if __name__ == '__main__':
    test_func(summer, 10000000)
    test_func(tupler, 10000000)
    test_func(summer, 100000000)
    test_func(tupler, 100000000)

$ python timer.py
summer 10000000
Start: 2009-07-08 22:02:13.216689
End: 2009-07-08 22:02:15.855931
Duration: 0:00:02.639242
tupler 10000000
Start: 2009-07-08 22:02:15.856122
End: 2009-07-08 22:02:16.743153
Duration: 0:00:00.887031
summer 100000000
Start: 2009-07-08 22:02:16.743863
End: 2009-07-08 22:02:49.372756
Duration: 0:00:32.628893
Killed
$

Note that "Killed" did not come from anything I did. The tupler just
bombed out when the tuple got too big for it to handle. Tupler was
faster for as large an input as it could handle, as well as for small
inputs (test not shown).

 
Bearophile
      07-09-2009
Paul Rubin:
> print x, sum(1 for _ in g)


Don't use that, use my function. If g has a __len__ you are wasting
time. And sum(1 ...) is (on my PC) slower.


J. Clifford Dyer:
> if __name__ == '__main__':
>     test_func(summer, 10000000)
>     test_func(tupler, 10000000)
>     test_func(summer, 100000000)
>     test_func(tupler, 100000000)


Have you forgotten my function?

Bye,
bearophile
 
J. Cliff Dyer
      07-09-2009
Bearophile wins! (This only times the loop itself. It doesn't check
for __len__)

summer:5
0:00:00.000051
bearophile:5
0:00:00.000009
summer:50
0:00:00.000030
bearophile:50
0:00:00.000013
summer:500
0:00:00.000077
bearophile:500
0:00:00.000053
summer:5000
0:00:00.000575
bearophile:5000
0:00:00.000473
summer:50000
0:00:00.005583
bearophile:50000
0:00:00.004625
summer:500000
0:00:00.055834
bearophile:500000
0:00:00.046137
summer:5000000
0:00:00.426734
bearophile:5000000
0:00:00.349573
summer:50000000
0:00:04.180920
bearophile:50000000
0:00:03.652311
summer:500000000
0:00:42.647885
bearophile:500000000
0:00:35.190550
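
The script that produced the numbers above was not posted. For anyone who wants to repeat the comparison, here is one way it could be set up in Python 3 with the standard timeit module. The function looper, the helper best_time and the size N are my own stand-ins, so treat this as a sketch, not as the code behind the figures above. Which variant comes out ahead may differ by interpreter version and machine.

```python
import timeit


def tupler(n):
    return len(tuple(range(n)))


def summer(n):
    return sum(1 for _ in range(n))


def looper(n):
    # The explicit counting loop from leniter, without the __len__ check.
    nelements = 0
    for _ in range(n):
        nelements += 1
    return nelements


def best_time(func, n, repeat=3, number=5):
    """Best-of-`repeat` time in seconds for `number` calls of func(n)."""
    return min(timeit.repeat(lambda: func(n), repeat=repeat, number=number))


N = 50000  # arbitrary illustrative size
timings = {f.__name__: best_time(f, N) for f in (tupler, summer, looper)}
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print("%-7s %.6f s" % (name, seconds))
```

Taking the minimum over several repeats tends to be less noisy than a single datetime.now() difference, especially for the small inputs.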

On Thu, 2009-07-09 at 04:04 -0700, Bearophile wrote:
> Paul Rubin:
> > print x, sum(1 for _ in g)

>
> Don't use that, use my function. If g has a __len__ you are wasting
> time. And sum(1 ...) is (on my PC) slower.
>
>
> J. Clifford Dyer:
> > if __name__ == '__main__':
> >     test_func(summer, 10000000)
> >     test_func(tupler, 10000000)
> >     test_func(summer, 100000000)
> >     test_func(tupler, 100000000)

>
> Have you forgotten my function?
>
> Bye,
> bearophile


 