Velocity Reviews


Tobia Conforto 11-10-2012 10:26 AM

Format specification mini-language for list joining
 
Hello

Lately I have been writing a lot of list join() operations variously including (and included in) string format() operations.

For example:

temps = [24.369, 24.550, 26.807, 27.531, 28.752]

out = 'Temperatures: {0} Celsius'.format(
    ', '.join('{0:.1f}'.format(t) for t in temps)
)

# => 'Temperatures: 24.4, 24.6, 26.8, 27.5, 28.8 Celsius'

This is just a simple example; my actual code has many more join and format operations, split into local variables as needed for clarity.

Then I remembered that Ye Old Common Lisp's format operator had built-in list traversing capabilities[1]:

(format t "Temperatures: ~{~1$~^, ~} Celsius" temps)

That format string (the part in the middle that looks like line noise) is admittedly arcane, but it's parsed like this:

~{     take the next argument (temps) and start iterating over its contents
~1$    output a floating point number with 1 digit of precision
~^     break the loop if there are no more items available
", "   (otherwise) output a comma and a space
~}     end of the loop body
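
For readers who don't speak Lisp, the directive-by-directive reading above can be sketched in Python as an explicit loop. This is only an illustration of the control flow, not a description of how any Lisp implementation works; the variable names are made up for the example.

```python
temps = [24.369, 24.550, 26.807, 27.531, 28.752]

pieces = []
for index, temp in enumerate(temps):          # ~{ ... ~}  iterate over the argument
    pieces.append('{0:.1f}'.format(temp))     # ~1$        one digit after the point
    if index == len(temps) - 1:               # ~^         stop if no items are left
        break
    pieces.append(', ')                       # ", "       otherwise emit the joiner

out = 'Temperatures: ' + ''.join(pieces) + ' Celsius'
print(out)
```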

Now, as much as I appreciate the heritage of Lisp, I won't deny that its format string mini-language is EVIL. As a rule, format string placeholders should not include *imperative statements* such as for, break, continue, and if. We don't need a Turing-complete language in our format strings. Still, this is the grand^n-father of Python's format strings, so it's interesting to look at how it used to approach the list joining issue.

Then I asked myself: can I take the list joining capability and port it over to Python's format(), doing away with the overall ugliness?

Here is what I came up with:

out = 'Temperatures: {0:", ":.1f} Celsius'.format(temps)

# => 'Temperatures: 24.4, 24.6, 26.8, 27.5, 28.8 Celsius'

Here ", " is the joiner between the items and <.1f> is the format string for each item.

The way this would work is by defining a specific Format Specification Mini-Language for sequences (such as lists, tuples, and iterables).

A Format Specification Mini-Language (format_spec) is whatever follows the first colon in a curly brace placeholder, and is defined by the argument's class, so that it can vary wildly among different types.[2]
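
To illustrate the point that the format_spec belongs to the argument's class, here is a small sketch. datetime.date interprets the spec as strftime directives, while Celsius is a hypothetical class invented for this example that delegates the spec to float and appends a unit.

```python
import datetime


class Celsius:
    """Hypothetical wrapper type, used only to show a class-defined format_spec."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, format_spec):
        # Hand the spec over to float, then append the unit.
        return format(self.degrees, format_spec) + ' C'


# The same placeholder syntax, but each class reads the spec in its own way.
date_text = '{0:%Y-%m-%d}'.format(datetime.date(2012, 11, 10))
temp_text = '{0:.1f}'.format(Celsius(24.369))

print(date_text)   # 2012-11-10
print(temp_text)   # 24.4 C
```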

The root class (object) defines the generic format_spec we are accustomed to[3]:

[[fill]align][sign][#][0][width][,][.precision][type]
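
A few calls to the built-in format() show what the individual parts of that grammar do, as implemented by the built-in numeric types. The variable names are only for illustration.

```python
padded = format(24.369, '>8.1f')        # align right, width 8, precision 1
filled = format(42, '*^9d')             # fill '*', centred, width 9
signed = format(7, '+05d')              # explicit sign, zero padding, width 5
grouped = format(1234567.891, ',.2f')   # thousands separator, precision 2
hexed = format(255, '#x')               # alternate form adds the 0x prefix

print(repr(padded))   # '    24.4'
print(filled)         # ***42****
print(signed)         # +0007
print(grouped)        # 1,234,567.89
print(hexed)          # 0xff
```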

But that doesn't mean that more complex types should not define extensions or replacements. I propose this extended format_spec for sequences:

seq_format_spec ::= join_string [":" item_format_spec] | format_spec
join_string ::= '"' join_string_char* '"' | "'" join_string_char* "'"
join_string_char ::= <any character except "{", "}", newline, or the quote>
item_format_spec ::= format_spec

That is, if the format_spec for a sequence starts with ' or " it would be interpreted as a join operation (e.g. {0:", "} or {0:', '}) optionally followed by a format_spec for the single items: {0:", ":.1f}

If the format_spec does not start with ' or ", or if the quote is not balanced (does not appear again in the format_spec), then it's assumed to be a generic format string and the implementation would call super(). This is meant for backwards compatibility with existing code that may be using the generic format_spec over various sequences.
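
The rules above can be tried out today with a list subclass. This is a minimal sketch under assumptions: JoinableList is a hypothetical name, the parsing is a simple string search rather than a full implementation of the grammar, and the built-in list is left untouched.

```python
class JoinableList(list):
    """Hypothetical list subclass sketching the proposed seq_format_spec."""

    def __format__(self, format_spec):
        quote = format_spec[:1]
        if quote in ('"', "'"):
            end = format_spec.find(quote, 1)      # position of the closing quote
            if end != -1:
                joiner = format_spec[1:end]
                rest = format_spec[end + 1:]
                # Either nothing follows, or ":" plus the per-item format_spec.
                if rest == '' or rest.startswith(':'):
                    item_spec = rest[1:]
                    return joiner.join(format(item, item_spec) for item in self)
        # No leading quote, or an unbalanced one: fall back to the generic
        # behaviour (on Python 3.4+ this raises TypeError for a non-empty spec).
        return super().__format__(format_spec)


temps = JoinableList([24.369, 24.550, 26.807, 27.531, 28.752])
out = 'Temperatures: {0:", ":.1f} Celsius'.format(temps)
print(out)   # Temperatures: 24.4, 24.6, 26.8, 27.5, 28.8 Celsius
```

Note that str.format() splits the replacement field at the first colon only, so the quotes and the second colon reach __format__ unchanged.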

I do think that would be quite readable and useful. Look again at the example:

out = 'Temperatures: {0:", ":.1f} Celsius'.format(temps)

As a bonus, it allows nested joins, albeit only for simple cases. For example we could format a dictionary's items:

temps = {'Rome': 26, 'Paris': 21, 'New York': 18}

out = 'Temperatures: {0:", ":" ":s}'.format(temps.items())

# => 'Temperatures: Rome 26, Paris 21, New York 18'

Here the format_spec for temps.items() is <", ":" ":s>. Then ", " would be used as a joiner between the item tuples and <" ":s> would be passed over as the format_spec for each tuple. This in turn would join the tuple's items using a single space and output each item with its simple string format. This could go on and on as needed, adding a colon and joiner string for each nested join operation.

A more complicated mini-language would be needed to output dicts using different format strings for keys and values, but I think that would be veering over to unreadable territory.
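
The nested case can be sketched as a recursive helper function. Everything here is an assumption made for illustration: join_format is a hypothetical name, only lists and tuples are treated as sequences, and the trailing :s from the example is dropped because int rejects the 's' presentation type in current Python. The output order relies on dicts preserving insertion order (Python 3.7+).

```python
def join_format(value, format_spec):
    """Hypothetical helper: apply the proposed spec, recursing into nested sequences."""
    quote = format_spec[:1]
    if isinstance(value, (list, tuple)) and quote in ('"', "'"):
        end = format_spec.find(quote, 1)          # position of the closing quote
        if end != -1:
            rest = format_spec[end + 1:]
            if rest == '' or rest.startswith(':'):
                joiner = format_spec[1:end]
                # The remainder after the colon is the spec for each item,
                # which may itself start with another quoted joiner.
                return joiner.join(join_format(item, rest[1:]) for item in value)
    return format(value, format_spec)


temps = {'Rome': 26, 'Paris': 21, 'New York': 18}
out = 'Temperatures: ' + join_format(list(temps.items()), '", ":" "')
print(out)   # Temperatures: Rome 26, Paris 21, New York 18
```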

What do you think?

I plan to write this as a module and propose it to Python's devs for inclusion in the main tree, but any criticism is welcome before I do that.

-Tobia

[1] http://www.gigamonkeys.com/book/a-fe...t-recipes.html
[2] http://docs.python.org/3/library/str...#formatstrings
[3] http://docs.python.org/3/library/string.html#formatspec

Paul Rubin 11-10-2012 04:51 PM

Re: Format specification mini-language for list joining
 
Tobia Conforto <tobia.conforto@gmail.com> writes:
> Now, as much as I appreciate the heritage of Lisp, I won't deny that
> its format string mini-language is EVIL. ... Still, this is the
> grand^n-father of Python's format strings...


Without having yet read the rest of your post carefully, I wonder whether
the particular historical point above is correct. Python's format strings
are pretty much the same as C's format strings, which go back to the
beginnings of C in the 1970s, maybe even to some forerunner of C, like
maybe FOCAL or something like that. It's possible that Common Lisp's
format strings came from some earlier Lisp, but Common Lisp itself was a
1980s thing. Maybe some Lisp historian would know.

Steven D'Aprano 11-10-2012 04:55 PM

Re: Format specification mini-language for list joining
 
On Sat, 10 Nov 2012 02:26:28 -0800, Tobia Conforto wrote:

> Hello
>
> Lately I have been writing a lot of list join() operations variously
> including (and included in) string format() operations.
>
> For example:
>
> temps = [24.369, 24.550, 26.807, 27.531, 28.752]
>
> out = 'Temperatures: {0} Celsius'.format(
> ', '.join('{0:.1f}'.format(t) for t in temps)
> )
>
> # => 'Temperatures: 24.4, 24.6, 26.8, 27.5, 28.8 Celsius'
>
> This is just a simple example, my actual code has many more join and
> format operations, split into local variables as needed for clarity.


Good plan! But then you suggest:


> Here is what I came up with:
> out = 'Temperatures: {0:", ":.1f} Celsius'.format(temps)
> # => 'Temperatures: 24.4, 24.6, 26.8, 27.5, 28.8 Celsius'
>
> Here ", " is the joiner between the items and <.1f> is the format string
> for each item.


And there goes all the clarity.

Is saving a few words of Python code so important that you would prefer
to read and write an overly-terse, cryptic mini-language?

If you're worried about code re-use, write a simple helper function:

def format_items(format, items):
    # Build a one-field template such as '{0:.1f}' from the given spec.
    template = '{0:%s}' % format
    return ', '.join(template.format(item) for item in items)

out = 'Temperatures: {0} Celsius'.format(format_items('.1f', temps))



--
Steven

Kwpolska 11-10-2012 05:13 PM

Re: Format specification mini-language for list joining
 
On Sat, Nov 10, 2012 at 5:51 PM, Paul Rubin <no.email@nospam.invalid> wrote:
> […] Python's format strings are pretty much the same as C's format strings […]


You’re thinking about the old % syntax, 'Hello %s!' % 'world'. The OP
meant the new str.format syntax ('Hello {}!'.format('world')).
---

IMO, the idea is useless. First off, format() has existed since 2.6, which
was released in 2008. So, it would be hard to change it anyways. Second,
which is more readable:

out = 'Temperatures: {0:", ":.1f} Celsius'.format(temps)

or

out = 'Temperatures: {} Celsius'.format(', '.join('{:.1f}'.format(t) for t in temps))

101% of the Python community would opt for the second format. Because
your format is cryptic. The current thing is already
not-quite-easy-to-understand when you use magic (aligning, type
converting etc.), but your proposition is much worse. And I hate to
consult the docs while working on something. As I said, it’s hard to
even get this one changed because str.format is 4 years old.

--
Kwpolska <http://kwpolska.tk>
stop html mail | always bottom-post
www.asciiribbon.org | www.netmeister.org/news/learn2quote.html
GPG KEY: 5EAAEA16

Tobia Conforto 11-10-2012 09:11 PM

Re: Format specification mini-language for list joining
 
Kwpolska wrote:
> > out = 'Temperatures: {0:", ":.1f} Celsius'.format(temps)
>
> [...] your format is cryptic.

Thank you for your criticism, I'll think it over. The reason I find it readable (-enough) is because even without knowing what format language is supported by the temps object, you can tell that "it" (the 0th argument in this case) is what's going to be serialized in that place.

Everything after the first colon is fair game anyway, meaning you'll have to look it up in the docs, because it's defined somewhere in the class hierarchy of the object being serialized. The fact that 99% of classes don't define a __format__ method and thus fall back on object's implementation, with its alignment and padding operators, is IMHO irrelevant. It's still something you can't pretend to know out of the box, because it's supposed to be customizable by classes.

Knowing this, if you know that the temps object is a list of floats, then I think it'd be pretty obvious what the ", " and the :.1f should do.

> As I said, it's hard to even get this one changed
> because str.format is 4 years old.


Again, I beg to differ. I'm not proposing any change to format (that would be madness). What I'm proposing is the addition of a customized __format__ method to a few types, namely lists and sequences, that currently lack it (as do 99% of classes) and fall back to object's implementation. Which is kind of pointless with lists, as joining is by far the thing most often done to them when formatting.
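
The fallback described here can be checked from the interpreter. The sketch below assumes CPython 3.4 or later, where object's implementation rejects any non-empty format_spec; older versions behaved differently.

```python
temps = [24.369, 24.550]

# list does not define __format__ itself, so lookup falls through to object.
list_has_own_format = '__format__' in vars(list)

# With an empty format_spec the result is the same as str().
plain = format(temps, '')

# A non-empty spec is rejected by object's implementation on Python 3.4+.
try:
    format(temps, '.1f')
    rejected = False
except TypeError:
    rejected = True

print(list_has_own_format, plain, rejected)
```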

Tobia
