Rough draft: Proposed format specifier for a thousands separator

 
 
Raymond Hettinger
 
      03-12-2009
If anyone here is interested, here is a proposal I posted on the
python-ideas list.

The idea is to make number formatting a little easier with the new
format() builtin in Py2.6 and Py3.0:
http://docs.python.org/library/string.html#formatspec


-------------------------------------------------------------


Motivation:

Provide a simple, non-locale aware way to format a number
with a thousands separator.

Adding thousands separators is one of the simplest ways to
improve the professional appearance and readability of
output exposed to end users.

In the finance world, output with commas is the norm. Finance
users and non-professional programmers find the locale approach
to be frustrating, arcane and non-obvious.

It is not the goal to replace locale or to accommodate every
possible convention. The goal is to make a common task easier
for many users.


Research so far:

Scanning the web, I've found that thousands separators are
usually one of COMMA, PERIOD, SPACE, or UNDERSCORE. The
COMMA is used when a PERIOD is the decimal separator.

James Knight observed that Indian/Pakistani numbering systems
group by hundreds. Ben Finney noted that Chinese group by
ten-thousands.

Visual Basic and its brethren (like MS Excel) use a completely
different style and have ultra-flexible custom format specifiers
like: "_($* #,##0_)".



Proposal I (from Nick Coghlan):

A comma will be added to the format() specifier mini-language:

[[fill]align][sign][#][0][minimumwidth][,][.precision][type]

The ',' option indicates that commas should be included in the
output as a thousands separator. As with locales which do not
use a period as the decimal point, locales which use a
different convention for digit separation will need to use the
locale module to obtain appropriate formatting.

The proposal works well with floats, ints, and decimals. It also
allows easy substitution for other separators. For example:

format(n, "6,f").replace(",", "_")

This technique is completely general but it is awkward in the one
case where the commas and periods need to be swapped.

format(n, "6,f").replace(",", "X").replace(".", ",").replace("X", ".")
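For anyone who wants to try the Proposal I technique, here is a minimal sketch. It assumes Python 3.1 or later, where format() accepts the ',' option. The helper name swap_separators is made up for this illustration, and it uses str.translate instead of the three chained replace() calls so that no placeholder character is needed.

```python
def swap_separators(text):
    """Exchange ',' and '.' in an already formatted number (hypothetical helper)."""
    # translate() maps each character independently, so the two
    # substitutions cannot clobber each other.
    return text.translate(str.maketrans(",.", ".,"))


n = 1234567.891
plain = format(n, ",.2f")               # comma grouping, period as decimal point
underscored = plain.replace(",", "_")   # substitute a different separator
swapped = swap_separators(plain)        # period grouping, comma as decimal point

print(plain)        # 1,234,567.89
print(underscored)  # 1_234_567.89
print(swapped)      # 1.234.567,89
```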


Proposal II (to meet Antoine Pitrou's request):

Make both the thousands separator and decimal separator user
specifiable but not locale aware. For simplicity, limit the
choices to a comma, period, space, or underscore.

[[fill]align][sign][#][0][minimumwidth][T[tsep]][dsep precision][type]

Examples:

format(1234, "8.1f")    -->  '  1234.0'
format(1234, "8,1f")    -->  '  1234,0'
format(1234, "8T.,1f")  -->  ' 1.234,0'
format(1234, "8T .f")   -->  ' 1 234,0'
format(1234, "8d")      -->  '    1234'
format(1234, "8T,d")    -->  '   1,234'

This proposal meets most needs (except for people wanting
grouping for hundreds or ten-thousands), but it comes at the
expense of being a little more complicated to learn and
remember. Also, it makes it more challenging to write custom
__format__ methods that follow the format specification
mini-language.

For the locale module, just the "T" is necessary in a
formatting string since the tool already has procedures for
figuring out the actual separators from the local context.
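
Since the "T" syntax is only a proposal, the Proposal II examples cannot be run as written. The sketch below emulates their output with a plain function, assuming a Python where the ',' option of Proposal I is available. The name format_sep and all of its parameters are hypothetical and not part of any real API.

```python
ALLOWED = (",", ".", " ", "_")   # the four separators named in the proposal


def format_sep(value, width=0, precision=None, tsep="", dsep="."):
    """Emulate Proposal II output (hypothetical helper, not a real format() feature).

    With precision=None the value must be an integer and is formatted
    like 'd'; otherwise it is formatted like 'f' with that precision.
    """
    if tsep and tsep not in ALLOWED:
        raise ValueError("unsupported thousands separator: %r" % tsep)
    if dsep not in ALLOWED:
        raise ValueError("unsupported decimal separator: %r" % dsep)
    if precision is None:
        body = format(value, ",d")
    else:
        body = format(value, ",.{0}f".format(precision))
    # Go through a placeholder so that swapping ',' and '.' cannot collide.
    body = body.replace(",", "\0").replace(".", dsep).replace("\0", tsep)
    return body.rjust(width)


print(repr(format_sep(1234, 8, 1)))                      # '  1234.0'
print(repr(format_sep(1234, 8, 1, dsep=",")))            # '  1234,0'
print(repr(format_sep(1234, 8, 1, tsep=".", dsep=",")))  # ' 1.234,0'
print(repr(format_sep(1234, 8, 1, tsep=" ", dsep=",")))  # ' 1 234,0'
print(repr(format_sep(1234, 8)))                         # '    1234'
print(repr(format_sep(1234, 8, tsep=",")))               # '   1,234'
```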



Comments and suggestions are welcome but I draw the line at supporting
Mayan numbering conventions


Raymond
 

Raymond Hettinger
 
      03-12-2009
> If anyone here is interested, here is a proposal I posted on the
> python-ideas list.
>
> The idea is to make number formatting a little easier with
> the new format() builtin:
> http://docs.python.org/library/string.html#formatspec


Here's a re-post (hopefully without the line wrapping problems
in the previous post).

Raymond

-------------------------------------------------------------



Motivation:
-----------

Provide a simple, non-locale aware way to format a number
with a thousands separator.

Adding thousands separators is one of the simplest ways to
improve the professional appearance and readability of output
exposed to end users.

In the finance world, output with commas is the norm. Finance
users and non-professional programmers find the locale
approach to be frustrating, arcane and non-obvious.

It is not the goal to replace locale or to accommodate every
possible convention. The goal is to make a common task easier
for many users.


Research so far:
----------------

Scanning the web, I've found that thousands separators are
usually one of COMMA, PERIOD, SPACE, or UNDERSCORE. The
COMMA is used when a PERIOD is the decimal separator.

James Knight observed that Indian/Pakistani numbering systems
group by hundreds. Ben Finney noted that Chinese group by
ten-thousands.

Visual Basic and its brethren (like MS Excel) use a completely
different style and have ultra-flexible custom format
specifiers like: "_($* #,##0_)".



Proposal I (from Nick Coghlan):
-------------------------------

A comma will be added to the format() specifier mini-language:

[[fill]align][sign][#][0][minimumwidth][,][.precision][type]

The ',' option indicates that commas should be included in the
output as a thousands separator. As with locales which do not
use a period as the decimal point, locales which use a
different convention for digit separation will need to use the
locale module to obtain appropriate formatting.

The proposal works well with floats, ints, and decimals.
It also allows easy substitution for other separators.
For example:

format(n, "6,f").replace(",", "_")

This technique is completely general but it is awkward in the
one case where the commas and periods need to be swapped:

format(n, "6,f").replace(",", "X").replace(".", ",").replace("X", ".")


Proposal II (to meet Antoine Pitrou's request):
-----------------------------------------------

Make both the thousands separator and decimal separator user
specifiable but not locale aware. For simplicity, limit the
choices to a comma, period, space, or underscore.

[[fill]align][sign][#][0][minimumwidth][T[tsep]][dsep precision][type]

Examples:

format(1234, "8.1f")    -->  '  1234.0'
format(1234, "8,1f")    -->  '  1234,0'
format(1234, "8T.,1f")  -->  ' 1.234,0'
format(1234, "8T .f")   -->  ' 1 234,0'
format(1234, "8d")      -->  '    1234'
format(1234, "8T,d")    -->  '   1,234'

This proposal meets most needs (except for people wanting
grouping for hundreds or ten-thousands), but it comes at the
expense of being a little more complicated to learn and
remember. Also, it makes it more challenging to write custom
__format__ methods that follow the format specification
mini-language.

For the locale module, just the "T" is necessary in a
formatting string since the tool already has procedures for
figuring out the actual separators from the local context.

 

Ulrich Eckhardt
 
      03-12-2009
Raymond Hettinger wrote:
>> The idea is to make number formatting a little easier with
>> the new format() builtin:
>> http://docs.python.org/library/string.html#formatspec

[...]
> Scanning the web, I've found that thousands separators are
> usually one of COMMA, PERIOD, SPACE, or UNDERSCORE. The
> COMMA is used when a PERIOD is the decimal separator.
>
> James Knight observed that Indian/Pakistani numbering systems
> group by hundreds. Ben Finney noted that Chinese group by
> ten-thousands.


IIRC, some cultures use a non-uniform grouping, e.g. "123 456 78.9".
For that, there is also a grouping setting reserved in the locale (at
least in those of C++ IOStreams, that is). Further, and that seems to
also be one of your concerns, there are different ways to represent
negative numbers, e.g. "(123)" or "-456".


> Make both the thousands separator and decimal separator user
> specifiable but not locale aware. For simplicity, limit the
> choices to a comma, period, space, or underscore.
>
> [[fill]align][sign][#][0][minimumwidth][T[tsep]][dsep precision][type]
>
> Examples:
>
> format(1234, "8.1f")    -->  '  1234.0'
> format(1234, "8,1f")    -->  '  1234,0'
> format(1234, "8T.,1f")  -->  ' 1.234,0'
> format(1234, "8T .f")   -->  ' 1 234,0'
> format(1234, "8d")      -->  '    1234'
> format(1234, "8T,d")    -->  '   1,234'



How about this?
format(1234, "8.1", tsep=",")
--> ' 1,234.0'
format(1234, "8.1", tsep=".", dsep=",")
--> ' 1.234,0'
format(123456, tsep=" ", grouping=(3, 2,))
--> '1 234 56'

IOW, why not explicitly say what you want using keyword arguments with
defaults instead of inventing an IMHO cryptic, read-only mini-language?
Seriously, the problem I see with this proposal is that its aim to be as
short as possible actually makes the resulting format specifications
unreadable. Could you even guess what "8T.,1f" should mean if you had not
written this?
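
A rough sketch of what the keyword-argument style could look like for integers. Everything in it is hypothetical: the function name group_digits, its parameters, and in particular the reading of grouping as group sizes counted from the right, with the last size repeating. Under that reading, the third example above corresponds to grouping=(2, 3).

```python
def group_digits(value, tsep=",", grouping=(3,)):
    """Format an integer with possibly non-uniform digit groups.

    Hypothetical helper: `grouping` lists group sizes starting with the
    rightmost group, and the last size repeats for the remaining digits.
    """
    if not grouping or any(size <= 0 for size in grouping):
        raise ValueError("grouping must contain positive sizes")
    sign = "-" if value < 0 else ""
    digits = str(abs(value))
    groups = []
    index = 0
    while digits:
        size = grouping[min(index, len(grouping) - 1)]
        groups.append(digits[-size:])   # peel one group off the right end
        digits = digits[:-size]
        index += 1
    return sign + tsep.join(reversed(groups))


print(group_digits(1234))                               # 1,234
print(group_digits(123456, tsep=" ", grouping=(2, 3)))  # 1 234 56
print(group_digits(1234567, grouping=(3, 2)))           # 12,34,567
print(group_digits(123456789, grouping=(4,)))           # 1,2345,6789
```

Keeping the separator and the group sizes as named arguments makes the call self-describing, at the cost of not fitting into a single format spec string.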

> This proposal meets most needs (except for people wanting
> grouping for hundreds or ten-thousands), but it comes at the
> expense of being a little more complicated to learn and
> remember.


Too expensive for my taste.

Uli

--
Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932

 

Raymond Hettinger
 
      03-12-2009
[Ulrich Eckhardt]
> IOW, why not explicitly say what you want using keyword arguments with
> defaults instead of inventing an IMHO cryptic, read-only mini-language?


That makes sense to me but I don't think that's the way the format()
builtin was implemented (see PEP 3101, which was implemented in Py2.6
and 3.0). It is a simple pass-through to a __format__ method for each
formattable object. I don't see how keywords would fit in that
framework. What is proposed is similar to the locale module's existing
"n" specifier except that this lets you say exactly what you want
instead of deferring to the locale settings.

The mini-language seems to already be the way of things (just as it is
in many other languages including PHP, C, Fortran, and whatnot). I'm
just proposing an addition "T," so you add commas as a thousands
separator.
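
To make the pass-through concrete, here is a small sketch. The class Money is invented for this illustration. The only things assumed about Python itself are that format(obj, spec) hands the spec string to the object's __format__ method and that the builtin accepts no keyword arguments. The ",.2f" spec further assumes a Python where the comma option of Proposal I is available.

```python
class Money:
    """Toy class (hypothetical) showing where a format spec ends up."""

    def __init__(self, amount):
        self.amount = amount

    def __format__(self, spec):
        # The spec string arrives here untouched; an empty spec falls
        # back to a default chosen for this example.
        return "$" + format(self.amount, spec or ",.2f")


price = Money(1234.5)
print(format(price, ".1f"))        # $1234.5
print("{0:,.2f}".format(price))    # $1,234.50

keyword_rejected = False
try:
    format(price, ".1f", tsep=",")  # keywords have nowhere to go
except TypeError:
    keyword_rejected = True
print(keyword_rejected)            # True
```

Anything a caller wants to express has to be encoded in that one spec string, which is why the proposals extend the mini-language instead of adding arguments.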


Raymond

 

John Machin
 
      03-12-2009
On Mar 12, 9:56 pm, Raymond Hettinger wrote:
> [Ulrich Eckhardt]
>
> > IOW, why not explicitly say what you want using keyword arguments with
> > defaults instead of inventing an IMHO cryptic, read-only mini-language?
>
> That makes sense to me but I don't think that's the way the format()
> builtin was implemented (see PEP 3101, which was implemented in Py2.6
> and 3.0). It is a simple pass-through to a __format__ method for each
> formattable object. I don't see how keywords would fit in that
> framework. What is proposed is similar to the locale module's existing
> "n" specifier except that this lets you say exactly what you want
> instead of deferring to the locale settings.
>
> The mini-language seems to already be the way of things (just as it is
> in many other languages including PHP, C, Fortran, and whatnot). I'm
> just proposing an addition "T," so you add commas as a thousands
> separator.


.... and why not C (centum) for hundreds (can't have H(ollerith)) and W
for wan (the Chinese word for 10 thousand)?


 

Hendrik van Rooyen
 
      03-12-2009
"Ulrich Eckhardt" <eck...aser.com> wrote:

>IOW, why not explicitly say what you want using keyword arguments with
>defaults instead of inventing an IMHO cryptic, read-only mini-language?
>Seriously, the problem I see with this proposal is that its aim to be as
>short as possible actually makes the resulting format specifications
>unreadable. Could you even guess what "8T.,1f" should mean if you had not
>written this?


+1

Look back in history, and see how COBOL did it with the
PICTURE - dead easy and easily understandable.
Compared to that, even the C printf stuff and python's %
are incomprehensible.

- Hendrik


 

MRAB
 
      03-12-2009
Raymond Hettinger wrote:
[snip]
> Proposal I (from Nick Coghlan):
> -------------------------------
>
> A comma will be added to the format() specifier mini-language:
>
> [[fill]align][sign][#][0][minimumwidth][,][.precision][type]
>
> The ',' option indicates that commas should be included in the
> output as a thousands separator. As with locales which do not
> use a period as the decimal point, locales which use a
> different convention for digit separation will need to use the
> locale module to obtain appropriate formatting.
>
> The proposal works well with floats, ints, and decimals.
> It also allows easy substitution for other separators.
> For example:
>
> format(n, "6,f").replace(",", "_")
>
> This technique is completely general but it is awkward in the
> one case where the commas and periods need to be swapped:
>
> format(n, "6,f").replace(",", "X").replace(".", ",").replace("X",
> ".")
>
>
> Proposal II (to meet Antoine Pitrou's request):
> -----------------------------------------------
>
> Make both the thousands separator and decimal separator user
> specifiable but not locale aware. For simplicity, limit the
> choices to a comma, period, space, or underscore.
>
> [[fill]align][sign][#][0][minimumwidth][T[tsep]][dsep precision][type]
>
> Examples:
>
> format(1234, "8.1f")    -->  '  1234.0'
> format(1234, "8,1f")    -->  '  1234,0'
> format(1234, "8T.,1f")  -->  ' 1.234,0'
> format(1234, "8T .f")   -->  ' 1 234,0'
> format(1234, "8d")      -->  '    1234'
> format(1234, "8T,d")    -->  '   1,234'
>
> This proposal meets most needs (except for people wanting
> grouping for hundreds or ten-thousands), but it comes at the
> expense of being a little more complicated to learn and
> remember. Also, it makes it more challenging to write custom
> __format__ methods that follow the format specification
> mini-language.
>
> For the locale module, just the "T" is necessary in a
> formatting string since the tool already has procedures for
> figuring out the actual separators from the local context.
>

[snip]
I'd probably prefer Proposal I with "." representing the decimal point
and "," representing the grouping (thousands) separator, although I'd
add an "L" flag to indicate that it should use the locale to provide the
actual characters to be used and even the number of digits for the
grouping:

[[fill]align][sign][#][0][minimumwidth][,][.precision][L][type]

Examples:

Assuming the locale has:

decimal point: ","
grouping separator: "."
grouping spacing: 3

format(123456, "10.1f")    -->  '  123456.0'
format(123456, "10.1Lf")   -->  ' 123.456,0'
format(123456, "10,.1f")   -->  ' 123,456.0'
format(123456, "10,.1Lf")  -->  ' 123.456,0'
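
For comparison, the locale module can already take separators and grouping from the locale, which is roughly what the "L" flag would delegate to. A hedged sketch: format_with_locale is a made-up wrapper around locale.setlocale and locale.format_string. Locale names other than "C" depend on what is installed on the system, so only the "C" result is shown.

```python
import locale


def format_with_locale(value, spec="%.1f", name=""):
    """Format value with grouping under a given locale (hypothetical wrapper).

    name="" means the user's default locale; locale.Error is raised if
    the requested locale is not available on this system.
    """
    previous = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, name)
        return locale.format_string(spec, value, grouping=True)
    finally:
        locale.setlocale(locale.LC_NUMERIC, previous)  # always restore


# The "C" locale defines no thousands separator, so nothing is inserted.
print(format_with_locale(123456, "%.1f", "C"))   # 123456.0

try:
    # Result depends on the system; a German locale is only an example name.
    print(format_with_locale(123456, "%.1f", "de_DE.UTF-8"))
except locale.Error:
    print("de_DE.UTF-8 is not installed here")
```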

 

pruebauno@latinmail.com
 
      03-12-2009
On Mar 12, 3:30 am, Raymond Hettinger wrote:
[snip]


As far as I am concerned the most simple version plus a way to swap
around commas and period is all that is needed. The rest can be done
using one replace (because the decimal separator is always one of two
options). This should cover everywhere but the far east. 80% of cases
for 20% of implementation complexity.

For example:

[[fill]align][sign][#][0][,|.][minimumwidth][.precision][type]

> format(1234, ".8.1f") --> ' 1.234,0'
> format(1234, ",8.1f") --> ' 1,234.0'


 

Raymond Hettinger
 
      03-12-2009
On Mar 12, 7:51 am, (E-Mail Removed) wrote:
> As far as I am concerned the most simple version plus a way to swap
> around commas and period is all that is needed.


Thanks for the feedback.

FWIW, posted a cleaned-up version of the proposal at
http://www.python.org/dev/peps/pep-0378/


Raymond
 

Paul Rubin
 
      03-12-2009
Raymond Hettinger writes:
> FWIW, posted a cleaned-up version of the proposal at
> http://www.python.org/dev/peps/pep-0378/


It would be nice if the PEP included a comparison between the proposed
scheme and how it is done in other programs and languages. For
example, I think Common Lisp has a feature for formatting thousands.
Spreadsheets like Excel probably have something similar. Those
programs are pretty well evolved and probably address the important
real use cases by now. It might be best to follow an existing example
(with adjustments for Pythonification as necessary) to the extent
possible.
 