Re: How to detect typos in Python programs

 
 
Bob Gailer
      07-25-2003
At 07:26 PM 7/25/2003 +0530, Manish Jethani wrote:

>Hi all,
>
>Is there a way to detect typos in a Python program before
>actually having to run it? Let's say I have a function like this:
>
> def server_closed_connection():
>     session.abost()
>
>Here, abort() is actually misspelt. The only time my program
>follows this path is when the server disconnects from its
>end--and that's like once in 100 sessions. So sometimes I
>release the program, people start using it, and then someone
>reports this typo after 4-5 days of the release (though it's
>trivial to fix manually at the user's end, or I can give a patch).
>
>How can we detect these kinds of errors at development time?
>It's not practical for me to have a test script that can make
>the program go through all (most) the possible code paths.


consider:
use a regular expression to get a list of all the identifiers in the program
count occurrence of each by adding to/updating a dictionary
sort and display the result

import re
program_text = """ def server_closed_connection():
    session.abost()"""
# list of all identifiers
words = re.findall(r'([A-Za-z_]\w*)\W*', program_text)
wordDict = {}
# dict of identifiers w/ occurrence count
for word in words:
    wordDict[word] = wordDict.setdefault(word, 0) + 1
wordList = wordDict.items()
wordList.sort()
for wordCount in wordList:
    print '%-25s %3s' % wordCount

output (approximate, as I used tabs):

abost                       1
def                         1
server_closed_connection    1
session                     1

You can then examine this list for suspect names, especially those that
occur once. We could apply some filtering to remove keywords and built-in names.
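
The filtering step might look something like the sketch below. It is written for Python 3 (the code in this post is Python 2) and is only an illustration: the function name count_identifiers is made up here, and it uses the standard keyword and builtins modules to decide what to ignore. Because it still runs a regular expression over the raw text, words inside comments and strings are counted too.

```python
import builtins
import keyword
import re
from collections import Counter


def count_identifiers(program_text):
    """Count identifier-like words, skipping keywords and built-in names."""
    words = re.findall(r'[A-Za-z_]\w*', program_text)
    # Names we do not want to report: language keywords and built-ins.
    ignored = set(keyword.kwlist) | set(dir(builtins))
    return Counter(word for word in words if word not in ignored)


program_text = '''def server_closed_connection():
    session.abost()
'''

counts = count_identifiers(program_text)
for name, count in sorted(counts.items()):
    print('%-25s %3d' % (name, count))
```

With this filter, def drops out of the report and only the program's own names remain to be checked by eye.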

We could add a comment at the start of the program containing all the valid
names, and extend this process to report just the ones that are not in the
valid list.
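
One possible shape for the valid-names idea, again as a Python 3 sketch rather than a finished tool. The "# valid-names:" comment format and the function name unknown_names are assumptions made for this example; nothing in Python recognizes such a comment by itself.

```python
import re

MARKER = '# valid-names:'  # hypothetical comment prefix chosen for this sketch


def unknown_names(program_text):
    """Return identifiers that are not listed in any valid-names comment."""
    valid = set()
    code_lines = []
    for line in program_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(MARKER):
            # Everything after the marker is a whitespace-separated name list.
            valid.update(stripped[len(MARKER):].split())
        else:
            code_lines.append(line)
    words = re.findall(r'[A-Za-z_]\w*', '\n'.join(code_lines))
    return sorted(set(words) - valid)


program_text = '''# valid-names: def server_closed_connection session abort
def server_closed_connection():
    session.abost()
'''

print(unknown_names(program_text))  # prints ['abost']
```

The misspelt abost is reported because only abort appears in the declared list. The price is that the list has to be kept up to date by hand.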

Bob Gailer

 
Bengt Richter
      07-26-2003
On Fri, 25 Jul 2003 12:20:57 -0600, Bob Gailer <(E-Mail Removed)> wrote:

>consider:
> use a regular expression to get a list of all the identifiers in the program
> count occurrence of each by adding to/updating a dictionary
> sort and display the result

That's cool. If you want to go further, and use symbols that the actual program
is using (excluding comment stuff) try:

====< prtok.py >========================================================
#prtok.py
import sys, tokenize, glob, token

symdir={}

def tokeneater(type, tokstr, start, end, line, symdir=symdir):
    if (type==token.NAME):
        TOKSTR = tokstr.upper()     #should show up for this file
        if symdir.has_key(TOKSTR):
            d = symdir[TOKSTR]
            if d.has_key(tokstr):
                d[tokstr] += 1
            else:
                d[tokstr] = 1
        else:
            symdir[TOKSTR]={ tokstr:1 }

for fileglob in sys.argv[1:]:
    for filename in glob.glob(fileglob):
        symdir.clear()
        tokenize.tokenize(open(filename).readline, tokeneater)

        header = '\n====< '+filename+' >===='
        singlecase = []
        multicase = [key for key in symdir.keys()
                        if len(symdir[key])>1 or singlecase.append(key)]
        for key in multicase:
            if header:
                print header
                print ' (Multicase symbols)'
                header = None
            for name, freq in symdir[key].items():
                print '%15s:%-3s'% (name, freq),
            print
        if header: print header; header = None
        print ' (Singlecase symbols)'
        byfreq = [symdir[k].items()[0] for k in singlecase]
        byfreq = [(n,k) for k,n in byfreq]
        byfreq.sort()
        npr = 0
        for freq, key in byfreq:
            if header:
                print header
                header = None
            print '%15s:%-3s'% (key, freq),
            npr +=1
            if npr%4==3: print
        print
========================================================================
Operating on itself and another little file (you can specify file glob expressions too):

[18:55] C:\pywk\tok>prtok.py prtok.py gt.py

====< prtok.py >====
 (Multicase symbols)
         tokstr:6            TOKSTR:4
           NAME:1              name:2
 (Singlecase symbols)
         append:1              argv:1             clear:1
            def:1               end:1            import:1              keys:1
            len:1              line:1              open:1                or:1
       readline:1              sort:1             start:1             upper:1
           else:2          fileglob:2           has_key:2             items:2
      multicase:2                 n:2               sys:2             token:2
     tokeneater:2              type:2              None:3          filename:3
           glob:3               npr:3        singlecase:3          tokenize:3
              d:4              freq:4                 k:4            byfreq:5
            for:8                if:8                in:8               key:8
         header:10            print:10           symdir:11

====< gt.py >====
 (Singlecase symbols)
       __name__:1              argv:1               def:1
             if:1                fn:2               for:2            import:2
             in:2              main:2             print:2               sys:2
            arg:3              glob:3
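
For readers on Python 3, where the callback form tokenize.tokenize(readline, tokeneater) used above is no longer available, the same grouping idea can be sketched with tokenize.generate_tokens. This is an illustrative rewrite, not the script posted above: the function name case_variants and the sample source are made up for the example.

```python
import io
import token
import tokenize
from collections import defaultdict


def case_variants(source):
    """Map each upper-cased NAME token to a dict of {actual spelling: count}."""
    symdir = defaultdict(dict)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Comments and string contents are separate token types, so only
        # real names (including keywords) are counted here.
        if tok.type == token.NAME:
            spellings = symdir[tok.string.upper()]
            spellings[tok.string] = spellings.get(tok.string, 0) + 1
    return symdir


source = '''total = 0
Total = total + 1  # total in a comment is not counted
'''

symdir = case_variants(source)
# Names that appear with more than one capitalization are the suspects.
multicase = {key: d for key, d in symdir.items() if len(d) > 1}
print(multicase)
```

Here total and Total land in the same group, which is the kind of near-duplicate the multicase report is meant to surface.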

Regards,
Bengt Richter
 
Bengt Richter
      07-26-2003
On 26 Jul 2003 01:54:19 GMT, (E-Mail Removed) (Bengt Richter) wrote:
[The code I posted had the npr += 1 line in the wrong place; the counter should be incremented after the conditional print:]

--- prtok.py~   Fri Jul 25 18:52:53 2003
+++ prtok.py    Fri Jul 25 19:58:24 2003
@@ -43,6 +43,6 @@
                 print header
                 header = None
             print '%15s:%-3s'% (key, freq),
-            npr +=1
             if npr%4==3: print
+            npr +=1
         print
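
The effect of moving that one line can be checked with a small Python 3 sketch. It is a stand-alone model of the loop, not part of prtok.py: the function rows and its parameters are invented here, and it collects rows in lists instead of printing them.

```python
def rows(items, per_row=4, increment_first=False):
    """Group items the way the print loop does and return the rows."""
    result, current, npr = [], [], 0
    for item in items:
        current.append(item)
        if increment_first:
            npr += 1  # original placement: count before the test
        if npr % per_row == per_row - 1:
            result.append(current)  # stands in for the bare print
            current = []
        if not increment_first:
            npr += 1  # patched placement: count after the test
    if current:
        result.append(current)
    return result


items = list(range(10))
print(rows(items, increment_first=True))   # first row has only 3 items
print(rows(items, increment_first=False))  # full rows have 4 items each
```

Counting before the test ends the first row one item early, which matches the three-item first row in the output posted earlier.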

Regards,
Bengt Richter
 