Tobia Conforto
Hello
Lately I have been writing a lot of list join() operations variously including (and included in) string format() operations.
For example:
temps = [24.369, 24.550, 26.807, 27.531, 28.752]
out = 'Temperatures: {0} Celsius'.format(
', '.join('{0:.1f}'.format(t) for t in temps)
)
# => 'Temperatures: 24.4, 24.6, 26.8, 27.5, 28.8 Celsius'
This is just a simple example; my actual code has many more join and format operations, split into local variables as needed for clarity.
Then I remembered that Ye Old Common Lisp's format operator had built-in list traversing capabilities[1]:
(format t "Temperatures: ~{~1$~^, ~} Celsius" temps)
That format string (the part in the middle that looks like line noise) is admittedly arcane, but it's parsed like this:
~{ take next argument (temps) and start iterating over its contents
~1$ output a floating point number with 1 digit precision
~^ break the loop if there are no more items available
", " (otherwise) output a comma and space
~} end of the loop body
Now, as much as I appreciate the heritage of Lisp, I won't deny that its format string mini-language is EVIL. As a rule, format string placeholders should not include *imperative statements* such as for, break, continue, and if. We don't need a Turing-complete language in our format strings. Still, this is the grand^n-father of Python's format strings, so it's interesting to look at how it used to approach the list joining issue.
Then I asked myself: can I take the list joining capability and port it over to Python's format(), doing away with the overall ugliness?
Here is what I came up with:
out = 'Temperatures: {0:", ":.1f} Celsius'.format(temps)
# => 'Temperatures: 24.4, 24.6, 26.8, 27.5, 28.8 Celsius'
Here ", " is the joiner between the items and <.1f> is the format string for each item.
The way this would work is by defining a specific Format Specification Mini-Language for sequences and other iterables (such as lists, tuples, and generators).
A Format Specification Mini-Language (format_spec) is whatever follows the first colon in a curly brace placeholder, and is defined by the argument's class, so that it can vary wildly among different types.[2]
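To make that concrete, here is a small sketch of how each class decides what its own format_spec means. The Celsius class and its 'F' code are hypothetical, made up for this illustration only; date is the standard library type, which reads its format_spec as a strftime pattern.

```python
from datetime import date


class Celsius:
    """Hypothetical wrapper, used only to show a class defining its own format_spec."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, format_spec):
        # str.format() passes in everything after the first colon, untouched.
        if format_spec == 'F':
            # Made-up spec: convert to Fahrenheit before printing.
            return format(self.degrees * 9 / 5 + 32, '.1f') + ' F'
        # Anything else is handed to float's own format_spec.
        return format(self.degrees, format_spec or '.1f') + ' C'


a = '{0:%Y/%m/%d}'.format(date(2013, 5, 9))   # => '2013/05/09'
b = '{0:F}'.format(Celsius(24.369))           # => '75.9 F'
c = '{0:.2f}'.format(Celsius(24.369))         # => '24.37 C'
```

The same placeholder syntax reaches three unrelated mini-languages here, which is the hook the proposal relies on.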
Most built-in types (strings and numbers) share the generic format_spec we are accustomed to[3]:
[[fill]align][sign][#][0][width][,][.precision][type]
But that doesn't mean that more complex types should not define extensions or replacements. I propose this extended format_spec for sequences:
seq_format_spec ::= join_string [":" item_format_spec] | format_spec
join_string ::= '"' join_string_char* '"' | "'" join_string_char* "'"
join_string_char ::= <any character except "{", "}", newline, or the quote>
item_format_spec ::= format_spec
That is, if the format_spec for a sequence starts with ' or " it would be interpreted as a join operation (e.g. {0:", "} or {0:', '}) optionally followed by a format_spec for the single items: {0:", ":.1f}
If the format_spec does not start with ' or ", or if the quote is not balanced (does not appear again in the format_spec), then it's assumed to be a generic format string and the implementation would call super(). This is meant for backwards compatibility with existing code that may be using the generic format_spec over various sequences.
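As a rough sketch of those rules, assuming the behaviour is tried out on a list subclass rather than on list itself: JoinList is a hypothetical name, and this is only one possible reading of the grammar, not a reference implementation.

```python
class JoinList(list):
    """Hypothetical list subclass sketching the proposed seq_format_spec."""

    def __format__(self, format_spec):
        quote = format_spec[:1]
        if quote in ('"', "'"):
            end = format_spec.find(quote, 1)      # position of the closing quote
            if end != -1:
                joiner = format_spec[1:end]
                rest = format_spec[end + 1:]
                # After the join_string we accept nothing, or ":" item_format_spec.
                if rest == '' or rest.startswith(':'):
                    item_spec = rest[1:]
                    return joiner.join(format(item, item_spec) for item in self)
        # Not a join spec (or the quote is unbalanced): defer to the inherited
        # __format__, which accepts only an empty spec for a plain list.
        return super().__format__(format_spec)


temps = JoinList([24.369, 24.550, 26.807, 27.531, 28.752])
out = 'Temperatures: {0:", ":.1f} Celsius'.format(temps)
# => 'Temperatures: 24.4, 24.6, 26.8, 27.5, 28.8 Celsius'
```

Note that the joiner cannot contain { or }, because str.format() treats braces inside a format_spec as nested replacement fields; that matches the join_string_char rule above.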
I do think that would be quite readable and useful. Look again at the example:
out = 'Temperatures: {0:", ":.1f} Celsius'.format(temps)
As a bonus, it allows nested joins, albeit only for simple cases. For example we could format a dictionary's items:
temps = {'Rome': 26, 'Paris': 21, 'New York': 18}
out = 'Temperatures: {0:", ":" "}'.format(temps.items())
# => 'Temperatures: Rome 26, Paris 21, New York 18'
Here the format_spec for temps.items() is <", ":" ">. Then ", " would be used as a joiner between the item tuples and <" "> would be passed over as the format_spec for each tuple. This in turn would join the tuple's items using a single space and output each item with its default format (an explicit s type code would work for the string keys but would be rejected by the integer values). This could go on and on as needed, adding a colon and joiner string for each nested join operation.
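Nested joins can be tried today without touching the built-in types, by subclassing string.Formatter and overriding format_field(). This is only a sketch under my own assumptions: JoinFormatter is a hypothetical name, strings and bytes are deliberately not treated as joinable, and the innermost items use an empty format_spec because int rejects the s type code.

```python
import string
from collections.abc import Iterable


class JoinFormatter(string.Formatter):
    """Hypothetical Formatter applying the proposed join syntax to any iterable."""

    def format_field(self, value, format_spec):
        quote = format_spec[:1]
        joinable = isinstance(value, Iterable) and not isinstance(value, (str, bytes))
        if joinable and quote in ('"', "'"):
            end = format_spec.find(quote, 1)      # position of the closing quote
            rest = format_spec[end + 1:]
            if end != -1 and (rest == '' or rest.startswith(':')):
                joiner = format_spec[1:end]
                # Recurse, so each item may itself be joined by the rest of the spec.
                return joiner.join(self.format_field(item, rest[1:]) for item in value)
        # Anything else falls back to the ordinary format(value, format_spec).
        return super().format_field(value, format_spec)


temps = {'Rome': 26, 'Paris': 21, 'New York': 18}
out = JoinFormatter().format('Temperatures: {0:", ":" "}', temps.items())
# => 'Temperatures: Rome 26, Paris 21, New York 18'

nested = JoinFormatter().format('{0:"; ":",":.1f}', [[1, 2], [3]])
# => '1.0,2.0; 3.0'
```

Each extra level costs one more colon and joiner string, and the last part of the spec is whatever the innermost items understand.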
A more complicated mini-language would be needed to output dicts using different format strings for keys and values, but I think that would be veering over to unreadable territory.
What do you think?
I plan to write this as a module and propose it to Python's devs for inclusion in the main tree, but any criticism is welcome before I do that.
-Tobia
[1] http://www.gigamonkeys.com/book/a-few-format-recipes.html
[2] http://docs.python.org/3/library/string.html#formatstrings
[3] http://docs.python.org/3/library/string.html#formatspec