Need better string methods

 
 
Skip Montanaro
 
      03-07-2004

David> I am convinced that Python can do anything that can be done by
David> these CPL's, but I know it will be an uphill battle getting
David> design engineers to learn yet another scripting language....

David> The resistance will come from people who throw at us little bits
David> and pieces of code that can be done more easily in their chosen
David> CPL.

Then throw little bits and pieces of code back at them that can be done more
easily in Python. <0.5 wink>

David> String processing, for example, is one area where we may face
David> some difficulty.

...

David> # Ruby:
David> # clean = line.chomp.strip('.').squeeze.split(/\s*\|\s*/)

David> This is pretty straight-forward once you know what each of the
David> methods do.

David> # Current best Python:
David> clean = [' '.join(t.split()).strip('.') for t in line.split('|')]

David> This is too much to expect of a non-programmer, even one who
David> understands the methods.

...

My arguments from the "Zen of Python" would be:

Beautiful is better than ugly.
Simple is better than complex.
Sparse is better than dense.
Readability counts.

These aphorisms are especially important for non-programmers. They simply
aren't going to be able to remember what the above Ruby or Python code does
in six months without at least a little bit of study, especially if it's
buried in other similar code. That study will distract them, however
momentarily, from the actual task at hand, breaking their chain of
concentration and lowering their productivity.

To that end, my proposed solution for your string smashing problem would be
something like:

import csv

for row in csv.reader(open("gradoo.csv"), delimiter='|'):
    print(row)
    # elide spaces
    row = [" ".join(s.split()) for s in row]
    print(row)
    # trim leading ...
    row = [s.lstrip(".") for s in row]
    print(row)

given that gradoo.csv contains the line from your example. The advantages
that I see are:

* it's got some simple comments which identify the work being done

* it's easier to add new operations if needed in the future

* avoiding long chains of string methods makes the code easier to read

Skip
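
For readers who want to try the stepwise approach without creating a file, here is a minimal sketch in present-day Python 3. It assumes the sample line quoted elsewhere in the thread, and uses io.StringIO as a stand-in for gradoo.csv; the helper name clean_rows is hypothetical, not something from the post.

```python
import csv
import io

# Sample line taken from the thread; io.StringIO stands in for gradoo.csv.
line = "..../bgref/stats.stf| SPICE | 3.2.7 | John Anderson \n"

def clean_rows(source):
    """Yield one cleaned list of fields per '|'-delimited input line."""
    for row in csv.reader(source, delimiter='|'):
        # collapse runs of whitespace and drop surrounding spaces
        row = [" ".join(field.split()) for field in row]
        # trim leading dots
        row = [field.lstrip(".") for field in row]
        yield row

rows = list(clean_rows(io.StringIO(line)))
print(rows[0])
```

Each step stays on its own commented line, so a further operation can be added by inserting one more statement inside the loop.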

 
Stephen Horne
 
      03-07-2004
On Sat, 06 Mar 2004 12:01:16 -0700, David MacQuigg <(E-Mail Removed)>
wrote:

># Ruby:
># clean = line.chomp.strip('.').squeeze.split(/\s*\|\s*/)
>
>This is pretty straight-forward once you know what each of the methods
>do.
>
># Current best Python:
>clean = [' '.join(t.split()).strip('.') for t in line.split('|')]


So what you are saying is that non-programmers just naturally
understand what "/\s*\|\s*/" means!

I kind of agree with you about the join method - I far prefer the now
deprecated function. But it's not much of a problem - you don't _have_
to use method-call syntax for Python, just get the unbound method from
the class and call it with the object as the first parameter...

>>> str.join (' ', ['a', 'b', 'c'])

'a b c'
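
As a small check, the two spellings can be compared directly. This sketch assumes Python 3, where the old string.join function is gone but looking the method up on the str class still works; the variable names are illustrative only.

```python
words = ['a', 'b', 'c']

# method-call form: the separator is the object the method is called on
method_form = ' '.join(words)

# function-call form: fetch join from the class and pass the separator first
function_form = str.join(' ', words)

print(method_form == function_form)
```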

I guess I see the advantage in the Ruby form. It can of course be
replicated in Python using a library, but being able to handle the
task as neatly by default would be a plus.

So, how about this...

>>> line.lstrip ('.'); re.sub (' +', ' ', _).strip (); re.split (' ?\| ?', _)

'/bgref/stats.stf| SPICE | 3.2.7 | John Anderson \n'
'/bgref/stats.stf| SPICE | 3.2.7 | John Anderson'
['/bgref/stats.stf', 'SPICE', '3.2.7', 'John Anderson']



Using ';' and '_', you can chain any functions or methods you want.
The downsides are (1) it only works at the command line, and (2) you
get intermediate results displayed.

A temporary variable can handle both issues, of course...

>>> t=line.lstrip('.'); t=re.sub(' +', ' ', t).strip(); re.split(' ?\| ?', t)

['/bgref/stats.stf', 'SPICE', '3.2.7', 'John Anderson']


or, to save some hassle...

>>> def squeeze (p) :
...     return re.sub (' +', ' ', p)
...
>>> t=line.lstrip('.'); t=squeeze(t).strip(); re.split(' ?\| ?', t)

['/bgref/stats.stf', 'SPICE', '3.2.7', 'John Anderson']


On this basis, perhaps it would be useful to support the '_' variable
outside of the command line, and maybe to suppress all but the last
result when ';' is used on the command line.

OTOH, as you suggest, maybe we could use some extra string methods.
With an equivalent to the Ruby 'squeeze' and support for regular
expression methods, we could write...

line.strip().lstrip('.').squeeze().resplit(' ?\| ?')

Which is very much like the Ruby example.
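
The wished-for methods do not exist on str, but the chain can be approximated with a subclass. The sketch below is only an illustration under that assumption: the class name Text is hypothetical, and squeeze and resplit are the made-up method names from the line above, with squeeze collapsing runs of spaces only.

```python
import re

class Text(str):
    """Hypothetical str subclass adding squeeze() and resplit()."""

    def lstrip(self, chars=None):
        # re-wrap the result so the chain can continue with Text methods
        return Text(super().lstrip(chars))

    def strip(self, chars=None):
        return Text(super().strip(chars))

    def squeeze(self):
        # collapse each run of spaces into a single space
        return Text(re.sub(' +', ' ', self))

    def resplit(self, pattern):
        # split on a regular expression instead of a fixed separator
        return re.split(pattern, self)

line = "..../bgref/stats.stf| SPICE | 3.2.7 | John Anderson \n"
clean = Text(line).strip().lstrip('.').squeeze().resplit(r' ?\| ?')
print(clean)
```

The built-in strip methods return plain str objects, which is why they are overridden here to re-wrap their results.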

Finally, it seems to me that this kind of tidy-and-split is probably a
common requirement. The split is easy enough, but after pondering
Robert Brewer's argument I wondered if maybe a specialised tidying
class could do the job...

import re

class cleaner :
    def __init__ (self) :
        # per-instance list, so separate cleaners do not share steps
        self.steps = []

    def lstrip (self, *args) :
        self.steps.append (lambda s : s.lstrip (*args))
        return self

    def rstrip (self, *args) :
        self.steps.append (lambda s : s.rstrip (*args))
        return self

    def strip (self, *args) :
        self.steps.append (lambda s : s.strip (*args))
        return self

    def squeeze (self) :
        pat = re.compile (' +')
        self.steps.append (lambda s : pat.sub (' ', s))
        return self

    def resub (self, regex, rep) :
        pat=re.compile (regex)
        self.steps.append (lambda s : pat.sub (rep, s))
        return self

    def clean (self, p) :
        # apply the recorded steps in the order they were added
        for i in self.steps :
            p = i (p)
        return p

line = "..../bgref/stats.stf| SPICE | 3.2.7 | John Anderson \n"

mycleaner = cleaner().lstrip(".").strip() \
                     .squeeze().resub(r' ?\| ?','|')

print (mycleaner.clean(line).split("|"))
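
The same build-once, apply-many idea can also be sketched without a class, by folding a sequence of plain functions over the input. This is an alternative illustration in Python 3, not the code from the post; make_cleaner and squeeze are hypothetical helper names.

```python
import re
from functools import reduce

def make_cleaner(*steps):
    """Return a function that applies each step, left to right, to a string."""
    def clean(text):
        return reduce(lambda acc, step: step(acc), steps, text)
    return clean

def squeeze(text):
    # collapse each run of spaces into a single space
    return re.sub(' +', ' ', text)

line = "..../bgref/stats.stf| SPICE | 3.2.7 | John Anderson \n"

tidy = make_cleaner(
    lambda s: s.lstrip('.'),                 # drop leading dots
    str.strip,                               # drop surrounding whitespace
    squeeze,
    lambda s: re.sub(r' ?\| ?', '|', s),     # remove spaces around each bar
)

fields = tidy(line).split('|')
print(fields)
```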


--
Steve Horne

steve at ninereeds dot fsnet dot co dot uk
 
Stephen Horne
 
      03-07-2004
On Sun, 7 Mar 2004 08:29:21 -0600, Skip Montanaro <(E-Mail Removed)>
wrote:

>My arguments from the "Zen of Python" would be:
>
> Beautiful is better than ugly.
> Simple is better than complex.
> Sparse is better than dense.
> Readability counts.


Sparse can certainly be better than dense, but it is not an absolute.
With any style rule there is a need to balance issues and to use
common sense. If code can be made denser while still being readable
then more functionality can be viewed on screen at once - a major
benefit in readability and understanding as the more you can see, the
less you have to remember.

The Ruby code was IMO easier to understand than David's 'best' Python
(except for the regular expression). The left-to-right sequencing is
really no different than top-to-bottom sequencing in readability
terms. And adding comments is pointless when those comments just
duplicate what a standard method name already tells you - worse than
pointless, in fact, as it obscures the code that you're trying to
read. Good names are better than compensatory comments, and anyone
claiming to be a programmer should know the everyday names that are
used in his chosen language.

I know that isn't what your comments did, but my point is that the
Ruby example really doesn't need them. The nearest equivalent Python
code requires a temporary variable and either semicolons or splitting
over a few lines - the latter is probably better, though I adopted the
former in my earlier post. Simply breaking the code up, though,
provides no real readability benefits.

Put it this
way. How
much am I
improving
the
readability
of this
paragraph
by making
it stupidly
narrow like
this?

Splitting a perfectly clear line of code over several lines is exactly
the same thing and, as I said, the only readability issue that I could
see in the Ruby code was the regular expression.


--
Steve Horne

steve at ninereeds dot fsnet dot co dot uk
 
benjamin schollnick
 
      03-07-2004
In article <(E-Mail Removed)>, David MacQuigg
<(E-Mail Removed)> wrote:

> The resistance will come from people who throw at us little bits and
> pieces of code that can be done more easily in their chosen CPL.
> String processing, for example, is one area where we may face some
> difficulty. Here is a typical line of garbage from a statefile
> revision control system (simplified to eliminate some items that pose
> no new challenges):
>
> line = "..../bgref/stats.stf| SPICE | 3.2.7 | John Anderson \n"
>
> The problem is to break this into its component parts, and eliminate
> spaces and other gradoo. The cleaned-up list should look like:
>
> ['/bgref/stats.stf', 'SPICE', '3.2.7', 'John Anderson']
>
> # Ruby:
> # clean = line.chomp.strip('.').squeeze.split(/\s*\|\s*/)
>
> This is pretty straight-forward once you know what each of the methods
> do.
>
> # Current best Python:
> clean = [' '.join(t.split()).strip('.') for t in line.split('|')]
>
> This is too much to expect of a non-programmer, even one who
> understands the methods. The usability problems are 1) the three
> variations in syntax ( methods, a list comprehension, and what *looks
> like* a join function prefixed by some odd punctuation), and 2) The
> order in which each step is entered at the keyboard. ( I can show
> this in step-by-step detail if anyone doesn't understand what I mean.)
> 3) Proper placement of parens can be confusing.


David,

I think you're coming at this too much like a programmer... |-)

You're right, this is too complex to expect a non-programmer
to simply use...

So redefine the problem, or look at it from a 90 degree angle.

If making the users understand the syntax is too complex, then
redefine the syntax.

Define a set of commands, and make them function wrappers around
your code.

> line = "..../bgref/stats.stf| SPICE | 3.2.7 | John Anderson \n"


I am assuming you're running into these lines on a regular basis, so
make a wrapper around your python function... Call it "Cleanup" or
"Parse_bar_line_string" or something that makes sense to your
users, and have them call that function....

- Benjamin
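
A wrapper along these lines might look like the following minimal sketch in Python 3. It assumes the list-comprehension logic quoted above, spread over named steps; the function name is adapted from the suggestion in the post and is otherwise hypothetical.

```python
def parse_bar_line_string(line):
    """Split a '|'-delimited line into fields, tidying each one."""
    fields = line.split('|')
    # collapse internal whitespace and drop surrounding spaces and newlines
    fields = [' '.join(field.split()) for field in fields]
    # drop leading and trailing dots, as in the quoted one-liner
    return [field.strip('.') for field in fields]

line = "..../bgref/stats.stf| SPICE | 3.2.7 | John Anderson \n"
clean = parse_bar_line_string(line)
print(clean)
```

Users then only need to remember one call, and the string handling can change behind it without touching their scripts.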
 