Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > Refactoring; arbitrary expression in lists

Reply
Thread Tools

Refactoring; arbitrary expression in lists

 
 
Frans Englich
Guest
Posts: n/a
 
      01-12-2005

As continuation to a previous thread, "PyChecker messages", I have a question
regarding code refactoring which the following snippet leads to:

> > runner.py:200: Function (detectMimeType) has too many returns (11)
> >
> > The function is simply a long "else-if" clause, branching out to
> > different return statements. What's wrong? It's simply a "probably ugly
> > code" advice?

>
> That is also advice. Generally you use a dict of functions, or some other
> structure to lookup what you want to do.


More specifically, my function looks like this:

#--------------------------------------------------------------
def detectMimeType( filename ):

extension = filename[-3:]
basename = os.path.basename(filename)

if extension == "php":
return "application/x-php"

elif extension == "cpp" or extension.endswith("cc"):
return "text/x-c++-src"
# etcetera
elif extension == "xsl":
return "text/xsl"

elif basename.find( "Makefile" ) != -1:
return "text/x-makefile"
else:
raise NoMimeError
#--------------------------------------------------------------
(don't bother if the MIME detection looks like stone age, it's temporary until
PyXDG gets support for the XDG mime type spec..)

I'm now wondering if it's possible to write this in a more compact way, such
that the if-clause isn't necessary? Of course, the current code works, but
perhaps it could be prettier.

I'm thinking along the lines of nested lists, but what is the obstacle for me
is that both the test and return statement are simple expressions; not
functions or a particular data type. Any ideas?


Cheers,

Frans

 
Reply With Quote
 
 
 
 
Paul McGuire
Guest
Posts: n/a
 
      01-12-2005
"Frans Englich" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
>
> As continuation to a previous thread, "PyChecker messages", I have a

question
> regarding code refactoring which the following snippet leads to:
>
> > > runner.py:200: Function (detectMimeType) has too many returns (11)
> > >
> > > The function is simply a long "else-if" clause, branching out to
> > > different return statements. What's wrong? It's simply a "probably

ugly
> > > code" advice?

> >
> > That is also advice. Generally you use a dict of functions, or some

other
> > structure to lookup what you want to do.

>
> More specifically, my function looks like this:
>
> #--------------------------------------------------------------
> def detectMimeType( filename ):
>
> extension = filename[-3:]
> basename = os.path.basename(filename)
>
> if extension == "php":
> return "application/x-php"
>
> elif extension == "cpp" or extension.endswith("cc"):
> return "text/x-c++-src"
> # etcetera

<snip>

Since the majority of your tests will be fairly direct 'extension "XYZ"
means mimetype "aaa/bbb"', this really sounds like a dictionary type
solution is called for. Still, you might have some desire to do some
order-dependent testing. Here are two ideas - the first iterates over a
list of expressions and resulting types, the other uses a dictionary lookup.

-- Paul


import os

extToMimeMap = [
('"php"', "application/x-php"),
('"cpp" or extension.endswith("cc")', "text/x-c++-src"),
('"xsl"', "text/xsl"),
]

def detectMimeType1( filename ):

extension = filename[-3:]
basename = os.path.basename(filename)

for exp,mimetype in extToMimeMap:
if eval("extension=="+exp): return mimetype

# do other non-extension-related tests here
if basename.find( "Makefile" ) != -1:
return "text/x-makefile"
else:
raise NoMimeError


extToMimeDict = {
"php": "application/x-php",
"cpp": "text/x-c++-src",
"xsl": "text/xsl",
}

def detectMimeType2( filename ):

extension = filename[-3:]
basename = os.path.basename(filename)

# check for straight extension matches
try:
return extToMimeDict[extension]
except KeyError:
pass

# do more complex extension and other non-extension-related tests here
if extension.endswith("cc"):
return extToMimeDict["cpp"]

if basename.find( "Makefile" ) != -1:
return "text/x-makefile"

raise NoMimeError

for detectMimeType in (detectMimeType1, detectMimeType2):
for s in ("a.php","z.acc","Makefile","blork.xsl"):
print s,"->",detectMimeType(s)



 
Reply With Quote
 
 
 
 
wittempj@hotmail.com
Guest
Posts: n/a
 
      01-12-2005
I can not break the original code in 2.4, if I try this:
import os, sys
class NoMimeError(Exception):
pass

def detectMimeType( filename ):

extension = filename[-3:]
basename = os.path.basename(filename)
if extension == "php":
return "application/x-php"
elif extension == "cpp" or extension.endswith("cc"):
return "text/x-c++-src"
elif extension == "1":
return 'BlahBlah'
elif extension == "2":
return 'BlahBlah'
elif extension == "3":
return 'BlahBlah'
elif extension == "4":
return 'BlahBlah'
elif extension == "5":
return 'BlahBlah'
elif extension == "6":
return 'BlahBlah'
elif extension == "7":
return 'BlahBlah'
elif extension == "8":
return 'BlahBlah'
elif extension == "9":
return 'BlahBlah'
elif extension == "10":
return 'BlahBlah'
elif extension == "11":
return 'BlahBlah'
elif extension == "12":
return 'BlahBlah'
elif extension == "13":
return 'BlahBlah'
elif extension == "14":
return 'BlahBlah'
elif extension == "15":
return 'BlahBlah'
elif extension == "16":
return 'BlahBlah'
elif extension == "17":
return 'BlahBlah'
elif extension == "18":
return 'BlahBlah'
elif extension == "19":
return 'BlahBlah'
elif extension == "20":
return 'BlahBlah'

elif extension == "xsl":
return "text/xsl"

elif basename.find( "Makefile" ) != -1:
return "text/x-makefile"
else:
raise NoMimeError
try:
print detectMimeType(r'c:\test.php')
print detectMimeType('c:\test.xsl')
print detectMimeType('c:\test.xxx')
except Exception, e:
print >> sys.stderr, '%s: %s' %(e.__class__.__name__, e)

I get
>>>

application/x-php
text/xsl
NoMimeError:
>>>


So although the dictionary solution is much nicer nothing seems wrong
with your code as it is - or am I missing something?

 
Reply With Quote
 
Frans Englich
Guest
Posts: n/a
 
      01-12-2005
On Wednesday 12 January 2005 18:56, http://www.velocityreviews.com/forums/(E-Mail Removed) wrote:
> I can not break the original code in 2.4, if I try this:


[...]

>
> So although the dictionary solution is much nicer nothing seems wrong
> with your code as it is - or am I missing something?


Nope, the current code works. I'm just looking at Python's cool ways of
solving problems. (the matter about 11 returns was a coding-style report from
PyChecker).


Cheers,

Frans

 
Reply With Quote
 
Jeff Shannon
Guest
Posts: n/a
 
      01-12-2005
Paul McGuire wrote:

> "Frans Englich" <(E-Mail Removed)> wrote in message
> news:(E-Mail Removed)...
>>#--------------------------------------------------------------
>>def detectMimeType( filename ):
>>
>> extension = filename[-3:]


You might consider using os.path.splitext() here, instead of always
assuming that the last three characters are the extension. That way
you'll be consistent even with extensions like .c, .cc, .h, .gz, etc.

Note that os.path.splitext() does include the extension separator (the
dot), so that you'll need to test against, e.g., ".php" and ".cpp".

> Since the majority of your tests will be fairly direct 'extension "XYZ"
> means mimetype "aaa/bbb"', this really sounds like a dictionary type
> solution is called for.


I strongly agree with this. The vast majority of your cases seem to
be a direct mapping of extension-string to mimetype-string; using a
dictionary (i.e. mapping ) for this is ideal. For those cases
where you can't key off of an extension string (such as makefiles),
you can do special-case processing if the dictionary lookup fails.


> if extension.endswith("cc"):
> return extToMimeDict["cpp"]


If the intent of this is to catch .cc files, it's easy to add an extra
entry into the dict to map '.cc' to the same string as '.cpp'.

Jeff Shannon
Technician/Programmer
Credit International

 
Reply With Quote
 
Bengt Richter
Guest
Posts: n/a
 
      01-13-2005
On Wed, 12 Jan 2005 18:16:23 +0000, Frans Englich <(E-Mail Removed)> wrote:

>
>As continuation to a previous thread, "PyChecker messages", I have a question
>regarding code refactoring which the following snippet leads to:
>
>> > runner.py:200: Function (detectMimeType) has too many returns (11)
>> >
>> > The function is simply a long "else-if" clause, branching out to
>> > different return statements. What's wrong? It's simply a "probably ugly
>> > code" advice?

>>
>> That is also advice. Generally you use a dict of functions, or some other
>> structure to lookup what you want to do.

>
>More specifically, my function looks like this:
>
>#--------------------------------------------------------------
>def detectMimeType( filename ):
>
> extension = filename[-3:]
> basename = os.path.basename(filename)
>
> if extension == "php":
> return "application/x-php"
>
> elif extension == "cpp" or extension.endswith("cc"):
> return "text/x-c++-src"
># etcetera
> elif extension == "xsl":
> return "text/xsl"
>
> elif basename.find( "Makefile" ) != -1:
> return "text/x-makefile"
> else:
> raise NoMimeError
>#--------------------------------------------------------------
>(don't bother if the MIME detection looks like stone age, it's temporary until
>PyXDG gets support for the XDG mime type spec..)
>
>I'm now wondering if it's possible to write this in a more compact way, such
>that the if-clause isn't necessary? Of course, the current code works, but
>perhaps it could be prettier.
>
>I'm thinking along the lines of nested lists, but what is the obstacle for me
>is that both the test and return statement are simple expressions; not
>functions or a particular data type. Any ideas?
>

I think I would refactor along these lines: (untested)

extensiondict = dict(
php = 'application/x-php',
cpp = 'text/x-c-src',
# etcetera
xsl = 'test/xsl'
)

def detectMimeType(filename):
extension = os.path.splitext(filename)[1].replace('.', '')
try: return extensiondict[extension]
except KeyError:
basename = os.path.basename(filename)
if "Makefile" in basename: return 'text/x-makefile' # XXX case sensitivity?
raise NoMimeError

Regards,
Bengt Richter
 
Reply With Quote
 
Stephen Thorne
Guest
Posts: n/a
 
      01-13-2005
On Thu, 13 Jan 2005 01:24:29 GMT, Bengt Richter <(E-Mail Removed)> wrote:
> extensiondict = dict(
> php = 'application/x-php',
> cpp = 'text/x-c-src',
> # etcetera
> xsl = 'test/xsl'
> )
>
> def detectMimeType(filename):
> extension = os.path.splitext(filename)[1].replace('.', '')
> try: return extensiondict[extension]
> except KeyError:
> basename = os.path.basename(filename)
> if "Makefile" in basename: return 'text/x-makefile' # XXX case sensitivity?
> raise NoMimeError


Why not use a regexp based approach.
extensionlist = [
(re.compile(r'.*\.php') , "application/x-crap-language"),
(re.compile(r'.*\.(cpp|c)') , 'text/x-c-src'),
(re.compile(r'[Mm]akefile') , 'text/x-makefile'),
]
for regexp, mimetype in extensionlist:
if regexp.match(filename):
return mimetype

if you were really concerned about efficiency, you could use something like:
class SimpleMatch:
def __init__(self, pattern): self.pattern = pattern
def match(self, subject): return subject[-len(self.pattern):] == self.pattern

Regards,
Stephen Thorne
 
Reply With Quote
 
Bengt Richter
Guest
Posts: n/a
 
      01-13-2005
On Thu, 13 Jan 2005 12:19:06 +1000, Stephen Thorne <(E-Mail Removed)> wrote:

>On Thu, 13 Jan 2005 01:24:29 GMT, Bengt Richter <(E-Mail Removed)> wrote:
>> extensiondict = dict(
>> php = 'application/x-php',
>> cpp = 'text/x-c-src',
>> # etcetera
>> xsl = 'test/xsl'
>> )
>>
>> def detectMimeType(filename):
>> extension = os.path.splitext(filename)[1].replace('.', '')

extension = os.path.splitext(filename)[1].replace('.', '').lower() # better

>> try: return extensiondict[extension]
>> except KeyError:
>> basename = os.path.basename(filename)
>> if "Makefile" in basename: return 'text/x-makefile' # XXX case sensitivity?
>> raise NoMimeError

>
>Why not use a regexp based approach.

ISTM the dict setup closely reflects the OP's if/elif tests and makes for an efficient substitute
for the functionality when later used for lookup. The regex list is O(n) and the regexes themselves
are at least that, so I don't see a benefit. If you are going to loop through extensionlist, you
might as well write (untested)

flowerew = filename.lower().endswith
for ext, mimetype:
if flowerew(ext): return mimetype
else:
if 'makefile' in filename.lower(): return 'text/x-makefile'
raise NoMimeError

using a lower case extension list including the dot. I think it would run faster
than a regex, and not scare anyone unnecessarily

The dict eliminates the loop, and is easy to understand, so IMO it's a better choice.

>extensionlist = [
>(re.compile(r'.*\.php') , "application/x-crap-language"),
>(re.compile(r'.*\.(cpp|c)') , 'text/x-c-src'),
>(re.compile(r'[Mm]akefile') , 'text/x-makefile'),
>]
>for regexp, mimetype in extensionlist:
> if regexp.match(filename):
> return mimetype
>
>if you were really concerned about efficiency, you could use something like:
>class SimpleMatch:
> def __init__(self, pattern): self.pattern = pattern
> def match(self, subject): return subject[-len(self.pattern):] == self.pattern


I'm not clear on what you are doing here, but if you think you are going to compete
with the timbot's dict efficiency with a casual few lines, I suspect you are PUI
(Posting Under the Influence

Regards,
Bengt Richter
 
Reply With Quote
 
Stephen Thorne
Guest
Posts: n/a
 
      01-13-2005
On Thu, 13 Jan 2005 05:18:57 GMT, Bengt Richter <(E-Mail Removed)> wrote:
> On Thu, 13 Jan 2005 12:19:06 +1000, Stephen Thorne <(E-Mail Removed)> wrote:
>
> >On Thu, 13 Jan 2005 01:24:29 GMT, Bengt Richter <(E-Mail Removed)> wrote:
> >> extensiondict = dict(
> >> php = 'application/x-php',
> >> cpp = 'text/x-c-src',
> >> # etcetera
> >> xsl = 'test/xsl'
> >> )
> >>
> >> def detectMimeType(filename):
> >> extension = os.path.splitext(filename)[1].replace('.', '')

> extension = os.path.splitext(filename)[1].replace('.', '').lower() # better
>
> >> try: return extensiondict[extension]
> >> except KeyError:
> >> basename = os.path.basename(filename)
> >> if "Makefile" in basename: return 'text/x-makefile' # XXX case sensitivity?
> >> raise NoMimeError

> >
> >Why not use a regexp based approach.

> ISTM the dict setup closely reflects the OP's if/elif tests and makes for an efficient substitute
> for the functionality when later used for lookup. The regex list is O(n) and the regexes themselves
> are at least that, so I don't see a benefit. If you are going to loop through extensionlist, you
> might as well write (untested)

<code snipped>

*shrug*, O(n*m) actually, where n is the number of mime-types and m is
the length of the extension.

> >extensionlist = [
> >(re.compile(r'.*\.php') , "application/x-crap-language"),
> >(re.compile(r'.*\.(cpp|c)') , 'text/x-c-src'),
> >(re.compile(r'[Mm]akefile') , 'text/x-makefile'),
> >]
> >for regexp, mimetype in extensionlist:
> > if regexp.match(filename):
> > return mimetype
> >
> >if you were really concerned about efficiency, you could use something like:
> >class SimpleMatch:
> > def __init__(self, pattern): self.pattern = pattern
> > def match(self, subject): return subject[-len(self.pattern):] == self.pattern

>
> I'm not clear on what you are doing here, but if you think you are going to compete
> with the timbot's dict efficiency with a casual few lines, I suspect you are PUI
> (Posting Under the Influence


Sorry about that, what I was trying to say was something along the lines of:

extensionlist = [
(re.compile(r'.*\.php') , "application/x-crap-language"),
(re.compile(r'.*\.(cpp|c)') , 'text/x-c-src'),
(re.compile(r'[Mm]akefile') , 'text/x-makefile'),
]
can be made more efficient by doing something like this:
extensionlist = [
SimpleMatch(".php"), "application/x-crap-language"),
(re.compile(r'.*\.(cpp|c)') , 'text/x-c-src'),
(re.compile(r'[Mm]akefile') , 'text/x-makefile'),
]
Where SimpleMatch uses a slice and a comparison instead of a regular
expression engine. SimpleMatch and re.compile both return an object
that when you call .match(s) returns a value that can be interpreted
as a boolean.

As for the overall efficiency concerns, I feel that talking about any
of this is premature optimisation. The optimisation that is really
required in this situation is the same as with any
large-switch-statement idiom, be it C or Python. First one must do a
frequency analysis of the inputs to the switch statement in order to
discover the optimal order of tests!

Regards,
Stephen Thorne
 
Reply With Quote
 
Nick Craig-Wood
Guest
Posts: n/a
 
      01-13-2005
Stephen Thorne <(E-Mail Removed)> wrote:
> Why not use a regexp based approach.


Good idea... You could also use sre.Scanner which is supposed to be
fast like this...

import re, sre

scanner = sre.Scanner([
(r"\.php$", "application/x-php"),
(r"\.(cc|cpp)$", "text/x-c++-src"),
(r"\.xsl$", "xsl"),
(r"Makefile", "text/x-makefile"),
(r".", None),
])

def detectMimeType( filename ):
t = scanner.scan(filename)[0]
if len(t) < 1:
return None
# raise NoMimeError
return t[0]


for f in ("index.php", "index.php3", "prog.cc", "prog.cpp", "flodge.xsl", "Makefile", "myMakefile", "potato.123"):
print f, detectMimeType(f)

....

prints

index.php application/x-php
index.php3 None
prog.cc text/x-c++-src
prog.cpp text/x-c++-src
flodge.xsl xsl
Makefile text/x-makefile
myMakefile text/x-makefile
potato.123 None

--
Nick Craig-Wood <(E-Mail Removed)> -- http://www.craig-wood.com/nick
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
C/C++ language proposal: Change the 'case expression' from "integral constant-expression" to "integral expression" Adem C++ 42 11-04-2008 12:39 PM
C/C++ language proposal: Change the 'case expression' from "integral constant-expression" to "integral expression" Adem C Programming 45 11-04-2008 12:39 PM
List of lists of lists of lists... =?UTF-8?B?w4FuZ2VsIEd1dGnDqXJyZXogUm9kcsOtZ3Vleg==?= Python 5 05-15-2006 11:47 AM
calling an arbitrary function w/ arbitrary arguments Honestmath C++ 5 12-13-2004 06:18 AM
Combining arbitrary lists Nick Python 7 11-15-2004 04:33 PM



Advertisments