Newbie with sort text file question

 
 
stuartc
      07-12-2003
Hi:

I'm not a total newbie, but I'm pretty green. I need to sort a text
file and then get a total for the number of occurrences of a part of
the string. Hopefully, this will explain it better:

Here's the text file:

banana_c \\yellow
apple_a \\green
orange_b \\yellow
banana_d \\green
orange_a \\orange
apple_w \\yellow
banana_e \\green
orange_x \\yellow
orange_y \\orange

I would like two output files:

1) Sorted like this, by the fruit name (the name before the underscore)

apple_a \\green
apple_w \\yellow
banana_c \\yellow
banana_d \\green
banana_e \\green
orange_a \\orange
orange_b \\yellow
orange_x \\yellow
orange_y \\orange

2) Then summarized like this, ordered with the highest occurrences
first:

orange occurs 4
banana occurs 3
apple occurs 2

Total occurances is 9

Thanks for any help !
 
Max M
      07-13-2003
stuartc wrote:

> Hi:
>
> I'm not a total newbie, but I'm pretty green. I need to sort a text
> file and then get a total for the number of occurances for a part of
> the string. Hopefully, this will explain it better:
>
> Here's the text file:
>
> banana_c \\yellow
> apple_a \\green
> orange_b \\yellow
> banana_d \\green
> orange_a \\orange
> apple_w \\yellow
> banana_e \\green
> orange_x \\yellow
> orange_y \\orange
>
> I would like two output files:
>
> 1) Sorted like this, by the fruit name (the name before the dash)
> 2) Then summarized like this, ordered with the highest occurances
> first:
>
> orange occurs 4
> banana occurs 3
> apple occurs 2
>
> Total occurances is 9



# Raw string, so each "\\" stays two literal backslashes, as in the text file.
fruity = r"""banana_c \\yellow
apple_a \\green
orange_b \\yellow
banana_d \\green
orange_a \\orange
apple_w \\yellow
banana_e \\green
orange_x \\yellow
orange_y \\orange"""

# print sorted list
fruits = fruity.split('\n')
fruits.sort()
print '\n'.join(fruits)
print ''

# count occurrences
counter = {}
for fruit in fruits:
    sort_of, appendix = fruit.split('_')
    counter[sort_of] = counter.get(sort_of, 0) + 1

# sort by occurrences, highest first (decorate, sort, reverse)
decorated = [(counter[key], key) for key in counter.keys()]
decorated.sort()
decorated.reverse()

# print result
total = 0
for count, sort_of in decorated:
    print sort_of, 'occurs', count
    total += count

print ''
print 'Total occurances is', total


regards Max M
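
For readers on a current interpreter, the same idea can be sketched in Python 3. This is only an illustration, not part of the original reply: it assumes `collections.Counter`, which did not exist in the Python 2.2 used in this thread, and the names `FRUITY` and `summarize` are made up for the example. `Counter.most_common()` stands in for the decorate, sort, reverse steps above.

```python
from collections import Counter

# Sample data from the original post; the raw string keeps "\\" as two backslashes.
FRUITY = r"""banana_c \\yellow
apple_a \\green
orange_b \\yellow
banana_d \\green
orange_a \\orange
apple_w \\yellow
banana_e \\green
orange_x \\yellow
orange_y \\orange"""


def summarize(text):
    """Return (sorted_lines, [(fruit, count), ...]) with the highest count first."""
    lines = sorted(line for line in text.splitlines() if line.strip())
    # The fruit name is everything before the first underscore.
    counts = Counter(line.split('_', 1)[0] for line in lines)
    return lines, counts.most_common()


sorted_lines, ranking = summarize(FRUITY)
print('\n'.join(sorted_lines))
print()
for fruit, count in ranking:
    print(fruit, 'occurs', count)
print()
print('Total occurrences:', sum(count for _, count in ranking))
```

One difference worth knowing: when two fruits have the same count, `most_common()` keeps them in first-seen order, whereas the sort-and-reverse approach lists them in reverse alphabetical order. The sample data has no ties, so both give the same ranking here.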

 
Behrang Dadsetan
      07-13-2003
stuartc wrote:
> Hi:
>
> Here's the text file:
>
> banana_c \\yellow
> apple_a \\green
> orange_b \\yellow
> banana_d \\green
> orange_a \\orange
> apple_w \\yellow
> banana_e \\green
> orange_x \\yellow
> orange_y \\orange
>
> I would like two output files:
>
> 1) Sorted like this, by the fruit name (the name before the dash)
>
> 2) Then summarized like this, ordered with the highest occurances
> first:

Here is some mostly tested code:

import re

infile = open("textfile.txt")  # your file name instead of textfile.txt
alllines = infile.readlines()
infile.close()

alllines.sort()

fruitre = re.compile('^[a-z]+')  # the fruit name: leading lowercase letters
fruits = {}
for line in alllines:
    fruitresult = fruitre.search(line)
    print line,  # trailing comma: the line already ends with a newline
    if fruitresult:
        fruit = fruitresult.group(0)
        fruits.setdefault(fruit, 0)
        fruits[fruit] += 1

totalamount = 0
for fruit, amount in fruits.items():
    print fruit, "occurs", amount
    totalamount += amount

print "Total amount of fruits", totalamount

Regards, Ben.
PS: It looks a little unoptimized to me but it works. Hopefully others
will reply to you as well so I can learn how to make the above better.
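
One possible tidy-up, offered as a sketch rather than a definitive version: the replies so far print their results, while the original post asks for two output files. The Python 3 code below keeps the same `^[a-z]+` pattern but writes both files. The function name `write_reports` and the output file names are hypothetical, and `with` blocks and `collections.Counter` are newer than the Python 2.2 used in this thread.

```python
import re
from collections import Counter

FRUIT_RE = re.compile(r'^[a-z]+')  # same pattern as in the reply above


def write_reports(src, sorted_path, summary_path):
    """Write the sorted lines to one file and the per-fruit summary to another."""
    with open(src) as infile:  # the file is closed automatically, even on error
        lines = sorted(line.rstrip('\n') for line in infile if line.strip())

    counts = Counter()
    for line in lines:
        match = FRUIT_RE.search(line)
        if match:
            counts[match.group(0)] += 1

    with open(sorted_path, 'w') as out:
        out.writelines(line + '\n' for line in lines)

    with open(summary_path, 'w') as out:
        # most_common() yields (fruit, count) pairs, highest count first
        for fruit, amount in counts.most_common():
            out.write('%s occurs %d\n' % (fruit, amount))
        out.write('\nTotal occurrences: %d\n' % sum(counts.values()))
    return counts
```

It could be called as `write_reports("textfile.txt", "sorted.txt", "summary.txt")`, where the last two names are placeholders. Because the pattern stops at the first character that is not a lowercase letter, it yields the same fruit name whether or not an underscore follows it.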


 
stuartc
      07-13-2003
Hi Bengt:

Thank you. Your code worked perfectly based on the text file I
provided.

Unfortunately for me, my real text file has one slight variation that
I did not account for: the fruit name does not always have an "_"
after it. For example, apple below does not have an "_" attached to
it.

banana_c \\yellow
apple \\green
orange_b \\yellow


This variation in my text file caused a problem with the program.
Here's the error.

Traceback (most recent call last):
  File "G:/Python22/Sort_Fruit.py", line 47, in ?
    for fruit, dummyvar in fruitlist: fruitfreq[fruit] = fruitfreq.get(fruit, 0)+1
ValueError: unpack list of wrong size

I tried to debug and fix this variation, but I wasn't able to. I did
notice that your split breaks each line in the file into two fields,
as long as there's an "_" after the fruit name. If the fruit name does
not have an "_", then the split does not occur. I think this is
related to the problem, but I couldn't figure out how to fix it.

Any help will be greatly appreciated. Thanks.

- Stuart



Bengt Richter wrote in message news:<beq357$thj$0@216.39.172.122>...
> On 12 Jul 2003 12:46:51 -0700, stuartc wrote:
>
> ===< stuartc.py >========================================================
> import StringIO
> textf = StringIO.StringIO(r"""
> banana_c \\yellow
> apple_a \\green
> orange_b \\yellow
> banana_d \\green
> orange_a \\orange
> apple_w \\yellow
> banana_e \\green
> orange_x \\yellow
> orange_y \\orange
> """)
>
> # I would like two output files:
> # (actually two files ?? Ok)
>
> # 1) Sorted like this, by the fruit name (the name before the dash)
>
> fruitlist = [line.split('_',1) for line in textf if line.strip()]
> fruitlist.sort()
>
> # apple_a \\green
> # apple_w \\yellow
> # banana_c \\yellow
> # banana_d \\green
> # banana_e \\green
> # orange_a \\orange
> # orange_b \\yellow
> # orange_x \\yellow
> # orange_y \\orange
>
> outfile_1 = StringIO.StringIO()
> outfile_1.write(''.join(['_'.join(pair) for pair in fruitlist]))
>
> # 2) Then summarized like this, ordered with the highest occurances
> # first:
>
> # orange occurs 4
> # banana occurs 3
> # apple occurs 2
>
> outfile_2 = StringIO.StringIO()
> fruitfreq = {}
> for fruit, dummyvar in fruitlist: fruitfreq[fruit] = fruitfreq.get(fruit, 0)+1
> fruitfreqlist = [(occ,name) for name,occ in fruitfreq.items()]
> fruitfreqlist.sort()
> fruitfreqlist.reverse()
> outfile_2.write('\n'.join(['%s occurs %s'%(name,occ) for occ,name in fruitfreqlist]+['']))
>
> # Total occurances is 9
> print >> outfile_2,"Total occurances [sic] is [sic] %s" % reduce(int.__add__, fruitfreq.values())
>
> ## show results
> print '\nFile 1:\n------------\n%s------------' % outfile_1.getvalue()
> print '\nFile 2:\n------------\n%s------------' % outfile_2.getvalue()
> =========================================================================
> executed:
>
> [15:52] C:\pywk\clp>stuartc.py
>
> File 1:
> ------------
> apple_a \\green
> apple_w \\yellow
> banana_c \\yellow
> banana_d \\green
> banana_e \\green
> orange_a \\orange
> orange_b \\yellow
> orange_x \\yellow
> orange_y \\orange
> ------------
>
> File 2:
> ------------
> orange occurs 4
> banana occurs 3
> apple occurs 2
> Total occurances [sic] is [sic] 9
> ------------
>
> Is that what you wanted?
>
> Regards,
> Bengt Richter
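
On the error reported above: in the quoted script, `line.split('_',1)` returns a one-element list for a line such as `apple \\green`, so the loop `for fruit, dummyvar in fruitlist` has nothing to unpack into `dummyvar`. A minimal sketch of one way around this, in Python 3, is to take the first whitespace-separated field and cut it at the first underscore if there is one. The helper name `fruit_name` is made up for this example, and `str.partition` needs Python 2.5 or later, so it would not run on the Python 2.2 shown in the traceback.

```python
def fruit_name(line):
    """Return the fruit name: the first whitespace-separated field, cut at the first '_'."""
    fields = line.split()
    if not fields:
        return ''  # blank line: no fruit name
    # partition() always returns a 3-tuple, so a name without '_' cannot
    # cause an unpacking error; the head is then the whole field.
    head, _sep, _suffix = fields[0].partition('_')
    return head


# The three lines from the follow-up post, including the name without '_'.
lines = [r'banana_c \\yellow', r'apple \\green', r'orange_b \\yellow']

fruitfreq = {}
for line in sorted(lines):
    name = fruit_name(line)
    fruitfreq[name] = fruitfreq.get(name, 0) + 1

for name, occ in sorted(fruitfreq.items()):
    print('%s occurs %s' % (name, occ))
```

Sorting the whole lines, as here, keeps output 1 unchanged; only the counting step needs the extracted name.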

 