PEP 321: Date/Time Parsing and Formatting

Gerrit Holl

Posted with permission from the author.
I have some comments on this PEP, see the (coming) followup to this message.

PEP: 321
Title: Date/Time Parsing and Formatting
Version: $Revision: 1.3 $
Last-Modified: $Date: 2003/10/28 19:48:44 $
Author: A.M. Kuchling <[email protected]>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Python-Version: 2.4
Created: 16-Sep-2003
Post-History:


Abstract
========

Python 2.3 added a number of simple date and time types in the
``datetime`` module. There's no support for parsing strings in various
formats and returning a corresponding instance of one of the types.
This PEP proposes adding a family of predefined parsing functions for
several commonly used date and time formats, and a facility for generic
parsing.

The types provided by the ``datetime`` module all have
``.isoformat()`` and ``.ctime()`` methods that return string
representations of a time, and the ``.strftime()`` method can be used
to construct new formats. There are a number of additional
commonly-used formats that would be useful to have as part of the
standard library; this PEP also suggests how to add them.


Input Formats
=======================

Useful formats to support include:

* `ISO8601`_
* ARPA/`RFC2822`_
* `ctime`_
* Formats commonly written by humans such as the American
"MM/DD/YYYY", the European "YYYY/MM/DD", and variants such as
"DD-Month-YYYY".
* CVS-style or tar-style dates ("tomorrow", "12 hours ago", etc.)

XXX The Perl `ParseDate.pm`_ module supports many different input formats,
both absolute and relative. Should we try to support them all?

Options:

1) Add functions to the ``datetime`` module::

       import datetime
       d = datetime.parse_iso8601("2003-09-15T10:34:54")

2) Add class methods to the various types. There are already various
   class methods such as ``.now()``, so this would be pretty natural::

       import datetime
       d = datetime.date.parse_iso8601("2003-09-15T10:34:54")

3) Add a separate module (possible names: date, date_parse, parse_date)
   or subpackage (possible names: datetime.parser) containing parsing
   functions::

       import datetime
       d = datetime.parser.parse_iso8601("2003-09-15T10:34:54")


Unresolved questions:

* Naming convention to use.
* What exception to raise on errors? ValueError, or a specialized exception?
* Should you know what type you're expecting, or should the parsing figure
it out? (e.g. ``parse_iso8601("yyyy-mm-dd")`` returns a ``date`` instance,
but parsing "yyyy-mm-ddThh:mm:ss" returns a ``datetime``.) Should
there be an option to signal an error if a time is provided where
none is expected, or if no time is provided?
* Anything special required for I18N? For time zones?


Generic Input Parsing
=======================

Is a strptime() implementation that returns ``datetime`` types sufficient?

XXX if yes, describe strptime here. Can the existing pure-Python
implementation be easily retargeted?


Output Formats
=======================

Not all input formats need to be supported as output formats, because it's
pretty trivial to get the ``strftime()`` argument right for simple things
such as YYYY/MM/DD. Only complicated formats need to be supported; RFC2822
is currently the only one I can think of.

Options:

1) Provide predefined format strings, so you could write this::

       import datetime
       d = datetime.datetime(...)
       print d.strftime(d.RFC2822_FORMAT) # or datetime.RFC2822_FORMAT?

2) Provide new methods on all the objects::

       d = datetime.datetime(...)
       print d.rfc822_time()


Relevant functionality in other languages includes the `PHP date`_
function (Python implementation by Simon Willison at
http://simon.incutio.com/archive/2003/10/07/dateInPython).


References
==========

.. _RFC2822: http://rfc2822.x42.com

.. _ISO8601: http://www.cl.cam.ac.uk/~mgk25/iso-time.html

.. _ParseDate.pm: http://search.cpan.org/author/MUIR/Time-modules-2003.0211/lib/Time/ParseDate.pm

.. _ctime: http://www.opengroup.org/onlinepubs/007908799/xsh/asctime.html

.. _PHP date: http://www.php.net/date

Other useful links:

http://www.egenix.com/files/python/mxDateTime.html
http://ringmaster.arc.nasa.gov/tools/time_formats.html
http://www.thinkage.ca/english/gcos/expl/b/lib/0tosec.html


Copyright
=========

This document has been placed in the public domain.

yours,
Gerrit.
 
Paul Moore

Gerrit Holl said:
Python 2.3 added a number of simple date and time types in the
``datetime`` module. There's no support for parsing strings in various
formats and returning a corresponding instance of one of the types.
This PEP proposes adding a family of predefined parsing functions for
several commonly used date and time formats, and a facility for generic
parsing.

I assume you're aware of Gustavo Niemeyer's DateUtil module
(https://moin.conectiva.com.br/DateUtil)?

I'm not 100% sure how the parser functionality fits in with this PEP.
It seems to me that this PEP is more focused on parsing specifically
formatted data (not something I need often) whereas Gustavo's function
is about parsing highly general "human input" formats.

As most of my date-parsing needs are for user input parameters and the
like, I prefer Gustavo's module :)

[After reading through this PEP and commenting, I'd say that my
preference (which may not be Gustavo's!) would be to add dateutil to
the standard library, with the following changes/additions:

1. Add a dateutil.RFC822_FORMAT for output of RFC822-compliant dates.
2. Extend dateutil.parser.parse to handle additional (CVS-style)
possibilities - today, tomorrow, yesterday, things like that.
3. Add dateutil.parser.strptime as a wrapper round time.strptime.

I think that's all.]
* Formats commonly written by humans such as the American
"MM/DD/YYYY", the European "YYYY/MM/DD", and variants such as
"DD-Month-YYYY".

UK format DD/MM/YYYY is worth adding (in my UK-based opinion :)) But
you can get all of these via strptime (wrapped to return a datetime
value).
* CVS-style or tar-style dates ("tomorrow", "12 hours ago", etc.)

That would be nice. I assume it should be combined with a highly
flexible parser, so that the same function that handles "tomorrow"
will also handle "12-dec-2003". This would basically be like Gustavo's
parser, but with extended functionality (Gustavo's doesn't handle
things like "tomorrow").
3) Add a separate module (possible names: date, date_parse, parse_date)
or subpackage (possible names: datetime.parser) containing parsing
functions::

    import datetime
    d = datetime.parser.parse_iso8601("2003-09-15T10:34:54")

I'd go for this option. Actually, I'd support including Gustavo's
dateutil module in the standard library. This PEP then involves adding
a number of additional (specialised) parsers to the dateutil.parser
subpackage.
* What exception to raise on errors? ValueError, or a specialized exception?

ValueError seems perfectly adequate.
* Should you know what type you're expecting, or should the parsing figure
it out? (e.g. ``parse_iso8601("yyyy-mm-dd")`` returns a ``date`` instance,
but parsing "yyyy-mm-ddThh:mm:ss" returns a ``datetime``.)

I don't think that the functions should return a type which depends on
the input (I'd push that as a general rule, but I've probably missed
an obvious counterexample - never mind, I think it applies here
regardless).
Should there be an option to signal an error if a time is provided
where none is expected, or if no time is provided?

I think that returning a datetime always, with a zero time component
when no time is specified, should be enough. You can use the date()
method of datetime instances to get just the date part if you want it.
But this is something that should be prototyped - real-world use is
far more important here than theoretical considerations.
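
A minimal sketch of the "always return a datetime" behaviour described above. The name ``parse_iso8601`` is taken from the PEP's examples and is hypothetical, not an existing API, and the sketch relies on ``datetime.strptime``, which was added to Python after this thread was written. Only two ISO 8601 layouts are handled, which is an assumption made to keep the example short.

```python
from datetime import date, datetime

def parse_iso8601(text):
    """Hypothetical helper: always return a datetime, never a date."""
    # Try the full date-and-time layout first, then the date-only layout.
    for fmt in ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError("not a recognised ISO 8601 string: %r" % (text,))

with_time = parse_iso8601("2003-09-15T10:34:54")
date_only = parse_iso8601("2003-09-15")   # time component defaults to 00:00:00
just_the_date = date_only.date()          # drop the time part when only a date is wanted
```

The caller never has to check the return type: a missing time simply shows up as midnight, and ``.date()`` discards it.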
* Anything special required for I18N? For time zones?

Scary. Do we need to parse "21-janvier-2001"? Only if in a
French-speaking locale?
Generic Input Parsing
=======================

Is a strptime() implementation that returns ``datetime`` types sufficient?

XXX if yes, describe strptime here. Can the existing pure-Python
implementation be easily retargeted?

Not sufficient, but very useful. It effectively covers all of the
fixed-format cases (with a suitable format string). And it does I18N,
I believe (hard to tell in a UK locale...)

Options:

* class methods on the 3 datetime classes. This might be hard,
because datetime is a C extension, and strptime is Python.
* Modify strptime to return a datetime value rather than a
struct_time. But this isn't backward compatible, and so is
probably not on. Shame, as it feels like the right answer.
* Have a new function in the time module. Either just a wrapper
round strptime, or a modified strptime, with strptime changed
into a wrapper round the new function. But a good name is going
to be hard to come up with.
* Add a new parameter to strptime (datetime=True or something).
Ugly, and violates my "functions shouldn't return different
types depending on their arguments" comment above.
* A function in a new module - something like
dateutil.parser.strptime, as a wrapper round time.strptime.
(Excuse the subliminal advertising for Gustavo's module - change
the name if you prefer :))
Output Formats
=======================

Not all input formats need to be supported as output formats, because it's
pretty trivial to get the ``strftime()`` argument right for simple things
such as YYYY/MM/DD. Only complicated formats need to be supported; RFC2822
is currently the only one I can think of.

An *output* format for RFC2822 compliant dates shouldn't be too hard,
surely? Ah, I see what you mean. It's possible, but hard to
*remember*, so it's best to define it somewhere. Good point.
Options:

1) Provide predefined format strings, so you could write this::

    import datetime
    d = datetime.datetime(...)
    print d.strftime(d.RFC2822_FORMAT) # or datetime.RFC2822_FORMAT?

This is what I'd prefer. A module-level constant in a dateutil module
would be fine for me, too.
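
As a rough illustration of the module-level constant idea, here is a sketch. ``RFC2822_FORMAT`` is a hypothetical name and value, not something provided by ``datetime`` or dateutil, and the hard-coded ``-0000`` zone is an assumption that only makes sense for naive datetimes. For comparison, the sketch also calls ``email.utils.format_datetime``, which exists in later Python versions than the ones discussed in this thread.

```python
from datetime import datetime
from email.utils import format_datetime

# Hypothetical constant. "-0000" is the RFC 2822 way of saying that the
# local time zone is unknown, which fits a naive datetime.
RFC2822_FORMAT = "%a, %d %b %Y %H:%M:%S -0000"

d = datetime(2003, 9, 15, 10, 34, 54)

# strftime takes day and month names from the current locale,
# so this is only RFC 2822 compliant in an English/C locale.
via_constant = d.strftime(RFC2822_FORMAT)

# The email package always emits English names, regardless of locale.
via_stdlib = format_datetime(d)
```

The locale dependence of ``%a`` and ``%b`` is one reason a plain format string may not be enough on its own.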
2) Provide new methods on all the objects::

    d = datetime.datetime(...)
    print d.rfc822_time()

Seems overkill. And I'd rather just have strftime for all date output
formatting - one way of doing things, and all that.

Paul.
 
Paddy McCarthy

Gerrit said:
Posted with permission from the author.
I have some comments on this PEP, see the (coming) followup to this message.

PEP: 321
Title: Date/Time Parsing and Formatting
Abstract
========

Python 2.3 added a number of simple date and time types in the
``datetime`` module. There's no support for parsing strings in various
formats and returning a corresponding instance of one of the types.
This PEP proposes adding a family of predefined parsing functions for
several commonly used date and time formats, and a facility for generic
parsing.

The types provided by the ``datetime`` module all have
``.isoformat()`` and ``.ctime()`` methods that return string
representations of a time, and the ``.strftime()`` method can be used
to construct new formats. There are a number of additional
commonly-used formats that would be useful to have as part of the
standard library; this PEP also suggests how to add them.
Unresolved questions:

* Naming convention to use.
* What exception to raise on errors? ValueError, or a specialized exception?
* Should you know what type you're expecting, or should the parsing figure
it out? (e.g. ``parse_iso8601("yyyy-mm-dd")`` returns a ``date`` instance,
but parsing "yyyy-mm-ddThh:mm:ss" returns a ``datetime``.) Should
there be an option to signal an error if a time is provided where
none is expected, or if no time is provided?
* Anything special required for I18N? For time zones?

I am in favour of there being an intelligent 'guess the format'
routine that would be easy to use, but maybe computationally
inefficient, backed up by a computationally efficient routine where
you specify the format. This latter case being split into two sub
items: first where the parsing routine is passed a constant
representing one of the standard formats and another where the parsing
routine is passed a string representing the format.

    datetime.datetime.parse("1985-08-13 15:03")
    gives: datetime(1985, 8, 13, 15, 3)
    datetime.date.parse("1985-08-13 15:03")
    gives: date(1985, 8, 13) # You asked for the date, date was found
                             # first in string and converted
    datetime.date.parse("13/08/1985", "%d/%m/%Y")
    gives: date(1985, 8, 13)
    datetime.datetime.parse("1985-08-13 15:03", datetime.ISO8601)
    gives: datetime(1985, 8, 13, 15, 3)

The idea being for the parser to be able to automatically extract a
date from one of the standard formats it knows, or to accept a
strptime type string for unknown formats.
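
One way this could look, as a sketch only: a module-level ``parse`` function rather than the class methods shown above, with a hypothetical ``KNOWN_FORMATS`` tuple standing in for "the standard formats it knows". It uses ``datetime.strptime``, which arrived in a later Python release than the one under discussion.

```python
from datetime import datetime

# Hypothetical list of built-in formats, tried in order.
KNOWN_FORMATS = (
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d %H:%M",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d",
)

def parse(text, fmt=None):
    """Parse with an explicit strptime format, or fall back to known ones."""
    if fmt is not None:
        # Efficient path: the caller states the format.
        return datetime.strptime(text, fmt)
    # Convenient but slower path: try each known format in turn.
    for known in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, known)
        except ValueError:
            pass
    raise ValueError("unrecognised date/time string: %r" % (text,))

guessed = parse("1985-08-13 15:03")
explicit = parse("13/08/1985", "%d/%m/%Y").date()
```

Trying formats in sequence is only safe while the known formats cannot match the same string in different ways.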

(Apologies to Gerrit for using values from his reply)

Cheers, Paddy.
 
John Roth

I am in favour of there being an intelligent 'guess the format'
routine that would be easy to use, but maybe computationally
inefficient, backed up by a computationally efficient routine where
you specify the format.

The trouble with "guess the format" is that it's not possible
to do it correctly in the general case from one sample.
Given enough samples of one consistent format, it's
certainly possible. However, that's a two pass process.

John Roth
 
Gerrit Holl

Paul said:
I assume you're aware of Gustavo Niemeyer's DateUtil module
(https://moin.conectiva.com.br/DateUtil)?

I was not, actually. Thanks for the link.
It looks like a very comprehensive library!
The example actually calculates the next time my birthday falls on
Friday the 13th :)
[After reading through this PEP and commenting, I'd say that my
preference (which may not be Gustavo's!) would be to add dateutil to
the standard library, with the following changes/additions:

Sounds like a good idea.
UK format DD/MM/YYYY is worth adding (in my UK-based opinion :)) But
you can get all of these via strptime (wrapped to return a datetime
value).

I don't think so. One might just as well add the German "D.M.YY", and many
others.
I'd go for this option.

It depends on how comprehensive it would be. Gustavo's DateUtil module
does a lot more than this PEP suggests. For an implementation of this
PEP, I think a separate module is not necessary. For DateUtil, I think
it is.

yours,
Gerrit.
 
A.M. Kuchling

I'd go for this option. Actually, I'd support including Gustavo's
dateutil module in the standard library. This PEP then involves adding
a number of additional (specialised) parsers to the dateutil.parser
subpackage.

Actually I think the PEP mostly evaporates, especially if verbal dates
aren't covered. The common cases are then trivial with DateUtil, leaving
only a few cases such as RFC-2822 times.

--amk
 
Paul Moore

John Roth said:
The trouble with "guess the format" is that it's not possible
to do it correctly in the general case from one sample.
Given enough samples of one consistent format, it's
certainly possible. However, that's a two pass process.

I think you can do it with a hint or two. The key one is whether in
ambiguous cases, you choose DD/MM or MM/DD. You need a second hint
with 2-digit years, as 01-02-03 is *very* ambiguous (given that
putting the year in the middle is insane, you only need a flag saying
whether the year is at the start or the end).

I'm not sure what other ambiguities you'd need to cater for?
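
To make the two hints concrete, here is a small sketch. The function name and the ``dayfirst`` / ``yearfirst`` flags are hypothetical names chosen for illustration, the input is assumed to be three 2-digit fields separated by hyphens, and the century is whatever ``strptime``'s ``%y`` picks.

```python
from datetime import date, datetime

def parse_numeric_date(text, dayfirst=False, yearfirst=False):
    """Resolve an ambiguous NN-NN-NN date using two caller-supplied hints."""
    if yearfirst:
        # Year at the start; dayfirst then decides the order of the rest.
        fmt = "%y-%d-%m" if dayfirst else "%y-%m-%d"
    else:
        # Year at the end.
        fmt = "%d-%m-%y" if dayfirst else "%m-%d-%y"
    return datetime.strptime(text, fmt).date()

american = parse_numeric_date("01-02-03")                   # MM-DD-YY
british = parse_numeric_date("01-02-03", dayfirst=True)     # DD-MM-YY
iso_like = parse_numeric_date("01-02-03", yearfirst=True)   # YY-MM-DD
```

The same string yields three different dates, which is the ambiguity the hints are meant to settle.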

Paul.
 
Paul Moore

A.M. Kuchling said:
Actually I think the PEP mostly evaporates, especially if verbal dates
aren't covered. The common cases are then trivial with DateUtil, leaving
only a few cases such as RFC-2822 times.

The PEP is pretty borderline, in any case. Not because the
functionality isn't useful, but because most of it exists somewhere
already. So the PEP is more of the form "now that we have datetime,
consolidating the parsing stuff would be good".

Specifically:

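    import time
    import email.Utils
    from datetime import datetime

    def dt_strptime(s, fmt):
        # Keep the first six fields: year, month, day, hour, minute, second
        tm = time.strptime(s, fmt)[:6]
        return datetime(*tm)

    def dt_rfc2822(s):
        # parsedate returns None if the string cannot be parsed
        tm = email.Utils.parsedate(s)[:6]
        return datetime(*tm)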

This isn't to say that these are immediately obvious (it took me a
while to realise that using the * form of call saved a horribly long
and ugly constructor call)

If this is worth doing, I'd have to say that time.strptime, and
email.Utils.parsedate should get deprecated in favour of the "new
forms". And I'm not sure I can see that being acceptable.

I guess I'm -0 on the PEP as it stands. Incorporate it into a more
general "date/time utilities" module, and I'm +1.

Also, I'm -1 on adding anything to the datetime module itself (this
includes adding more classmethods to the types). The module is clean,
and lean as it stands. Bloating it (particularly under the banner of
"it's more OO to keep the functions as part of the classes") doesn't
appeal to me at all.

Paul.
 
John Roth

Paul Moore said:
I think you can do it with a hint or two. The key one is whether in
ambiguous cases, you choose DD/MM or MM/DD. You need a second hint
with 2-digit years, as 01-02-03 is *very* ambiguous (given that
putting the year in the middle is insane, you only need a flag saying
whether the year is at the start or the end).

I'm not sure what other ambiguities you'd need to cater for?

Those are basically it. I've played around with doing "intelligent"
parsing, and I'm absolutely against providing hints. If you're
processing a file with dates all in one format, it will frequently
give the wrong answers for a substantial number of them.
In other words, hints don't give your program the capacity
to learn from experience. Scanning a number of cases and
noting which fields contained numbers > 31 (or 0), or numbers
greater than 12, does.
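
A sketch of that scanning pass, under simplifying assumptions: every sample has exactly three numeric fields with a known separator, and only the "greater than 31" and "greater than 12" tests are applied (the zero test is left out). The function name is hypothetical.

```python
def infer_field_order(samples, sep="/"):
    """First pass: look at all samples and guess which column is which."""
    maxima = [0, 0, 0]
    for sample in samples:
        fields = [int(part) for part in sample.split(sep)]
        if len(fields) != 3:
            raise ValueError("expected three fields in %r" % (sample,))
        maxima = [max(seen, value) for seen, value in zip(maxima, fields)]

    roles = [None, None, None]
    for index, largest in enumerate(maxima):
        if largest > 31:
            roles[index] = "year"    # too big to be a day or a month
        elif largest > 12:
            roles[index] = "day"     # too big to be a month

    # Two columns claiming the same role means the evidence is contradictory.
    for role in ("year", "day"):
        if roles.count(role) > 1:
            return [None, None, None]

    # If exactly one column is still unknown, it must be the remaining role.
    unresolved = [i for i, role in enumerate(roles) if role is None]
    if len(unresolved) == 1:
        missing = [r for r in ("year", "month", "day") if r not in roles]
        roles[unresolved[0]] = missing[0]
    return roles

order = infer_field_order(["01/02/2003", "25/12/2003"])
```

A second pass would then parse every sample using the inferred order. Columns that stay ``None`` mean the samples seen so far do not settle the question.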

In any case, there are three formats, and you can't
always depend on a delimiter to tell you where the year
is for 8 digit inputs. Lots of the inputs I've seen have not
had delimiters. On the other hand, a lot of them have
been guaranteed to be in the late 19th century or later.
That's a hint worth having.

John Roth
 
