Kenneth McDonald
Over the last couple of years, I've built a module called rex that sits
on top of (and, from the user's point of view, hides) the re module. rex
offers the following advantages over re:
* Construction of re's is object oriented, and does not require any
knowledge of re syntax.
* rex is _very good_ at combining small re's into larger re's. In fact,
this sort of composition is almost effortless. It greatly simplifies the
creation of complex re's, since you can build smaller re's, build test
code for them, compose them, build test code for the composition,
compose again, etc.
* Reading and understanding re's built with rex is much easier than
understanding primitive re's.
* No re metacharacters are used by rex. For example, a pattern to match
any character except a, b, or c is just
'not CHAR("abc")' rather than [^abc]. A pattern to recognize any of a, b,
c, or "^" is CHAR("^abc").
* Many useful predefined patterns are provided. For example, PAT.float
matches floating point numbers, so a simple pattern to recognize complex
numbers is PAT.float + ALT("+", "-") + PAT.float + "i".
* The match result object returned by rex is much more flexible, and has
many more functions, than the one returned by re. Commonly performed
operations on match results can often be done with much less (and much
clearer) code than with re.
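
To make the composition idea in the list above concrete, here is a
minimal sketch of how operators like these could be layered on top of re.
It is not rex's actual code: the Pattern class, the helper _as_pattern,
the FLOAT constant (standing in for PAT.float), and the use of ~ for
negation are all assumptions made for illustration, and the sketch
targets current Python 3 rather than the Python version rex was written
for.

```python
import re


class Pattern:
    """Hypothetical composable pattern object (not rex's real class)."""

    def __init__(self, source):
        self.source = source  # an ordinary re pattern string

    def __add__(self, other):
        # Concatenate. Each side is wrapped in a non-capturing group so an
        # alternation inside one operand cannot swallow the other operand.
        other = _as_pattern(other)
        return Pattern("(?:%s)(?:%s)" % (self.source, other.source))

    def __radd__(self, other):
        return _as_pattern(other) + self

    def fullmatch(self, text):
        return re.fullmatch(self.source, text)


def _as_pattern(value):
    # Plain strings are treated as literal text, never as re syntax.
    if isinstance(value, Pattern):
        return value
    return Pattern(re.escape(value))


class CHAR(Pattern):
    """Match one character from (or, if negated, outside) the given set."""

    def __init__(self, chars, negated=False):
        self.chars = chars
        self.negated = negated
        # Escape every character so "^", "-", "]" and "\" are literal.
        body = "".join(re.escape(c) for c in chars)
        super().__init__("[" + ("^" if negated else "") + body + "]")

    def __invert__(self):
        # Assumption of this sketch: ~CHAR(...) negates the set.
        return CHAR(self.chars, not self.negated)


def ALT(*options):
    """Match any one of the given patterns or literal strings."""
    parts = ["(?:%s)" % _as_pattern(o).source for o in options]
    return Pattern("|".join(parts))


# Simplified stand-in for a predefined float pattern (no exponent part).
FLOAT = Pattern(r"[+-]?(?:\d+\.\d*|\.\d+|\d+)")

# Small tested pieces compose into a larger pattern.
complex_number = FLOAT + ALT("+", "-") + FLOAT + "i"
```

With these definitions, complex_number.fullmatch("1.5-2.25i") returns a
match object and complex_number.fullmatch("1.5-2.25") returns None. The
user never writes a metacharacter: CHAR("^abc") treats "^" as an ordinary
member of the set because every character is escaped before it reaches re.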
I had hoped to polish this for a 'true' 1.0 release, but it's become
obvious to me that I won't get to do this in the foreseeable future.
Here is the current status of the project.
* rex is already highly functional. I use it all the time, and I have
had very few bugs emerge. The testing code is fairly comprehensive, and
every time I do find a bug, I put in another test case. I haven't used
pure re's in over a year and a half.
* rex supports almost all re functionality. Backrefs and a couple of the
new re features added in (I think) Python 2.4 are not yet supported, but
should be easy to put in.
* There are some other very useful functions I've partially implemented,
but not finished to the point they can be used. Finishing them should be
quite easy; I just haven't had the need.
* I'm not entirely sure the API is ideal. Some discussion is needed on this.
* Internal documentation is decent, but not great.
* Internal code is again decent, but not great.
* User documentation is not bad, but somewhat out of date. One of the
problems here is that rex makes use of a lot of constants, which cannot
be documented using Python's docstrings. In addition, rex is complex
enough that it needs an external manual or a good HTML reference, and
none of the multiple attempts at pure Python doc systems do this well for
everything that is needed. Now that Doxygen works with Python, I would
like to redo all of the documentation in Doxygen.
* Everything is in a single file. This should be split up.
I would like to avoid putting this up on SourceForge, as I think it would
do much better at a site aimed specifically at Python development.
Given the above, are people interested in working on this as a project?
And where should I create the project?
Thanks,
Ken