Draft PEP on RSON configuration file format

Philip Semanchuk

Yaml sucks, but seems to have gotten some traction regardless.
Therefore the Python principle of "there should be one and only one
obvious way to do it" says: don't try to replace the existing thing if
your new thing is only slightly better. Just deal with the existing
thing's imperfections or make improvements to it. If you can make a
really powerful case that your new thing is 1000x better than the old
thing, that's different, but I don't think we're seeing that here.

Also, XML is used for pretty much everything in the Java world. It
sucks too, but it is highly standardized, it observably gets the job
done, there are tons of structure editors for it, etc. Frankly
I'd rather have stayed with it than deal with Yaml.

There are too many of these damn formats. We should ban all but one of
them (I don't much care which one). And making even more of them is not
the answer.


I dunno, times change, needs change. We must invent new tools, be
those computer languages or data formats. Otherwise we'd still be
programming in COBOL and writing fixed-length records to 12 inch
floppies.*

If Mr. Maupin was a giant corporation trying to shove a proprietary
format down our collective throats, I might object to RSON. But he's
not. He appears willing for it to live or die on its merits, so I say
good luck to him. I don't want or need it, but someone else might.

Cheers
Philip


* You had floppies? Bleddy luxury! We wrote our data on wood pulp we'd
chewed ourselves and dried into paper, using drops of our own blood to
represent 1s and 0s.
 
Patrick Maupin

How big are the files that you want to parse with it?  Sheesh.

Tiny, but over and over. The rst2pdf testsuite can generate
approximately 160 PDFs, totalling around 2.5 MB, in around 22 seconds
on one of my machines. But if I replace the JSON parser with a YAML
parser, that goes up to 55 seconds. Wait, maybe it's because JSON is
optimized in C! Nope, using JSON but disabling the C scanner only
takes it to 22.3 seconds...
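
A comparison of this kind can be approximated with the standard library alone. The sketch below is not the rst2pdf test suite: SAMPLE is a made-up stand-in for a small stylesheet, and wiring in `py_scanstring` and `py_make_scanner` is only an assumption about what "disabling the C scanner" meant. Some helpers (object keys, for instance) may still go through C-accelerated code, so treat the timings as rough.

```python
import json
import json.decoder
import json.scanner
import timeit


def make_pure_python_decoder():
    """Build a JSONDecoder that uses the pure-Python scanner.

    parse_string must be replaced first, because the scanner captures
    the decoder's parse_* attributes when it is created.
    """
    decoder = json.JSONDecoder()
    decoder.parse_string = json.decoder.py_scanstring
    decoder.scan_once = json.scanner.py_make_scanner(decoder)
    return decoder


# Hypothetical stand-in for a small, hand-written configuration file.
SAMPLE = json.dumps({
    "pageSetup": {"size": "A4", "margins": [10, 10, 15, 15]},
    "styles": [{"name": "style%d" % i, "fontSize": 10 + i} for i in range(20)],
})

default_decoder = json.JSONDecoder()
pure_decoder = make_pure_python_decoder()

# Both decoders must agree before their speed is worth comparing.
assert default_decoder.decode(SAMPLE) == pure_decoder.decode(SAMPLE)

for label, decoder in (("default", default_decoder), ("pure Python", pure_decoder)):
    seconds = timeit.timeit(lambda: decoder.decode(SAMPLE), number=2000)
    print("%-12s %.3f s for 2000 parses" % (label, seconds))
```

The absolute numbers will differ from machine to machine; the point is only that tiny files parsed over and over make per-parse overhead visible.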
So write a new one that parses the same syntax, but cleaner and faster.

But there are already several parsers for YAML, and none of them
agree! The syntax definition is a mess. The thing's been in
development for 10 years now, and there is no one true way to do it.
Seriously, YAML overreaches for what I want.
I do it all the time; it's a bit dreary but not difficult.  And there is
absolutely no way to get anything done in this field anymore without
dealing with XML from time to time.  So given that we all have some
experience using it, it's sensible to stick with it.

But people "in this field" are not really my target audience. Well, I
mean people in this field are the target audience for the library, but
not for the writing of the actual text files.
ReST is another abomination that should never have gotten off the
ground.  It is one of the reasons I react so negatively to your
config format proposal.  It just sounds like more of the same.

Well, that clarifies a lot. I guess we'll just have to agree to
disagree :)

Regards,
Pat
 
Emile van Sebille

On 3/1/2010 1:02 PM Philip Semanchuk said...
* You had floppies? Bleddy luxury! We wrote our data on wood pulp we'd
chewed ourselves and dried into paper, using drops of our own blood to
represent 1s and 0s.

You had left-over blood?!!

Emile :)
 
Patrick Maupin

Psst.  That you're allowed to present the idea that you think is good
doesn't mean that other people aren't allowed to respond and point out
that in their opinion it's not such a good idea.  You don't own this or
any other thread.

Absolutely, but I still do (and will always) express a clear
preference for opinions that have at least a modicum of reasoning
behind them.

Regards,
Pat
 
Patrick Maupin

Patrick Maupin wrote:
This is not only seriously stretching the meaning of the term "superset"
(as Python is most definitely not even remotely a superset of JSON), but

Well, you are entitled to that opinion, but seriously, if I take valid
JSON, replace unquoted true with True, unquoted false with False,
replace unquoted null with None, and take the quoted strings and
replace occurrences of \uXXXX with the appropriate unicode, then I do,
in fact, have valid Python. But don't take my word for it -- try it
out!
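
For anyone who does want to try it out, here is a minimal sketch of the substitution described above. The function name and the sample document are made up for illustration. It relies on Python 3 string literals already understanding \uXXXX, and it ignores edge cases such as the JSON `\/` escape and surrogate pairs, where the two notations differ.

```python
import ast
import json
import re

# Match either a whole JSON string literal or one of the three bare keywords,
# so that the word "true" inside a quoted string is left alone.
_TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|\b(?:true|false|null)\b')
_KEYWORDS = {"true": "True", "false": "False", "null": "None"}


def json_to_python_source(text):
    """Rewrite unquoted true/false/null as True/False/None."""
    def swap(match):
        token = match.group(0)
        # Strings are not in the table, so they come back unchanged.
        return _KEYWORDS.get(token, token)
    return _TOKEN.sub(swap, text)


sample = '{"name": "rson", "tags": ["true", null], "ok": true, "n": 1.5e3, "u": "\\u00e9"}'
converted = json_to_python_source(sample)

# The rewritten text is a valid Python literal with the same value
# that the JSON parser produces.
assert ast.literal_eval(converted) == json.loads(sample)
```

`ast.literal_eval` is used instead of `eval` so that only literals are accepted.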

But if you really want to be pedantic about it, JavaScript (rather
than Python) is, in fact, a superset of JSON, and, despite the
disparagement JavaScript receives, in my opinion, it is possible to
write much better looking JavaScript than JSON for many tasks.

YAML, also, is a superset of JSON, and IMO, it is possible to write
much better looking YAML than JSON.
still doesn't address the question.  Is RSON an _actual_ superset of
JSON, or are you just misusing the term there, as well?

Yes, the RSON definition is, in fact, a superset of JSON, just like the
YAML definition. But RSON is a much smaller grammar than YAML.

 If it is, then
your rationale for not using JSON makes no sense if you're making a new
format that's merely a superset of it.  Obviously JSON can't be that
unreadable if you're _extending_ it to make your own "more readable"
format.  If JSON is unreadable, so must be RSON.

Well, we'll have to agree to disagree here. Bearing in mind that the
definition of "unreadable" depends on the target application and user,
obviously, it will be *possible* to write unreadable RSON, just as it
is *possible* to write unreadable JavaScript or Python or YAML, but it
will be *possible* to write better looking RSON than is possible to
achieve with JSON, just as it is *possible* to write better looking
JavaScript or YAML or Python than it is *possible* to achieve with
pure JSON.

Best regards,
Pat
 
Kirill Simonov

Erik said:
Agreed. Even YAML's acronym indicates that it is already a bridge too
far; we don't need more.

Note that YA in the acronym doesn't mean Yet Another, YAML = YAML Ain't
Markup Language.


Thanks,
Kirill
 
Terry Reedy

Well, you are entitled to that opinion, but seriously, if I take valid
JSON, replace unquoted true with True, unquoted false with False,
replace unquoted null with None, and take the quoted strings and
replace occurrences of \uXXXX with the appropriate unicode, then I do,
in fact, have valid Python. But don't take my word for it -- try it
out!

To me this is so strained that I do not see why you are arguing the
point. So what? The resulting Python 'program' will be equivalent, I
believe, to 'pass'. Ie, construct objects and then discard them with no
computation or output. I suggest dropping this red-herring distraction.
But if you really want to be pedantic about it, JavaScript (rather
than Python) is, in fact, a superset of JSON, and, despite the
disparagement JavaScript receives, in my opinion, it is possible to
write much better looking JavaScript than JSON for many tasks.

YAML, also, is a superset of JSON, and IMO, it is possible to write
much better looking YAML than JSON.

I agree that adding a bit of syntax to something can sometimes make it
easier to write readable text. This is hardly a new idea and should not
be controversial. That is why people developed 'macro assembly' as a
superset of 'assembly' languages and why, for instance, Python added the
'with' statement.

I read your proposal. I have not needed config files and have never
written json or yaml and so cannot really evaluate your proposal for
something 'in between'. It does seem plausible that it could be useful.

While using the PEP format is great, calling your currently vaporware
module proposal a 'standards track' PEP is somewhat off-putting and
confusing. If Guido rejected it, would you simply drop the idea? If not,
if you would continue it as a third-party module that would eventually
be released and announced on PyPI, I seriously suggest renaming it to
what it is.

Terry Jan Reedy
 
Arnaud Delobelle

Erik Max Francis said:
This is not only seriously stretching the meaning of the term "superset"
(as Python is most definitely not even remotely a superset of JSON),
but still doesn't address the question. Is RSON an _actual_ superset
of JSON, or are you just misusing the term there, as well? If it is,
then your rationale for not using JSON makes no sense if you're making
a new format that's merely a superset of it. Obviously JSON can't be
that unreadable if you're _extending_ it to make your own "more
readable" format. If JSON is unreadable, so must be RSON.

Your argument is utterly speculative as you are making clear you haven't
read the OP's proposal.
 
Daniel Fetchinson

But you are working on a solution in search of a problem. The really
OK, but I am a bit unclear on what you and/or Paul are claiming. It
could be one of a number of things. For example:

- There is a preexisting file format suitable for my needs, so I
should not invent another one.

I suspect this to be true, if we mean the same thing by "configuration
file format". Especially if RSON will be a superset of JSON.
- If I invent a file format suitable for my needs, it couldn't
possibly be general enough for anybody else.

Quite possibly, the reason is that the already existing file formats
have an ecosystem around them that make them attractive. Your file
format will have to cope with this barrier to attract new users which
I think will be very difficult, given the wide range of already
existing formats, covering just about any use case.
- Even if it was general enough for somebody else, there would only be
two of them.

See above.
I've been known to waste time (or be accused of wasting time) on
various endeavors, but I like to know exactly *why* it is perceived to
be a waste.

Don't get me wrong, I also waste a lot of time on hobby/fun/educational
projects ("waste" in this case is only meant as "useless for others",
not "useless for me") because it's, well, hobby and fun and
educational :) It's just good to know if a given project is in this
category or outside.

Cheers,
Daniel
 
mk

Yaml sucks, but seems to have gotten some traction regardless.
Therefore the Python principle of "there should be one and only one
obvious way to do it" says: don't try to replace the existing thing if
your new thing is only slightly better.

With all due respect, Paul, and with thanks for all the help you've
given me, I have to disagree here: this is a really, really complicated
matter and I think there is a case even for making things slightly better.

I think this is a matter of "investment period", so to speak: is it
short or long? In short term, it absolutely makes no sense to produce
even slight improvement.

But in the long run it will almost certainly pay off to switch to smth
even somewhat better implementation (say, imaginary "20%" if you get my
drift): suppose we stay with sucky format for 10 years. Wouldn't it make
sense to implement a new one and be "in red" in terms of effort expended
versus saved for 3 years, but then be "in black" for the following 7 years?
Just deal with the existing
thing's imperfections or make improvements to it.

OK, but how? How would you make up e.g. for JSON's lack of comments?
Producing accompanying ".json-comment" format and writing libraries that
parse the comments and interleave them with JSON file for producing
human-readable commented output?

I think the effort required by all parties, both developers and users,
before they produced smth like this and learned to use this widely and
comprehensively, for this manner of improvement would be so high that it
would be actually cheaper to dump the thing and develop smth new that
has built-in support for comments.
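
To make the comment problem concrete, here is a small sketch using Python's standard json module. The `//` comment syntax and the sample configuration are assumptions for illustration only; nothing in JSON defines them. The parser rejects the commented file outright, and the workaround has to strip the comments first, so they are gone from anything written back out.

```python
import json
import re

# Match either a whole JSON string literal or a // comment running to the
# end of the line, so that "//" inside a quoted URL is not mistaken for one.
_STRING_OR_COMMENT = re.compile(r'"(?:\\.|[^"\\])*"|//[^\n]*')


def strip_line_comments(text):
    """Remove // comments that appear outside string literals."""
    def keep_strings(match):
        token = match.group(0)
        return token if token.startswith('"') else ""
    return _STRING_OR_COMMENT.sub(keep_strings, text)


# Hypothetical hand-edited config file.
commented = '''{
    // where to fetch updates from
    "url": "http://example.com",
    "retries": 3
}'''

try:
    json.loads(commented)
    rejected = False
except json.JSONDecodeError:
    rejected = True

# Plain JSON refuses the comment; the stripped text parses normally.
assert rejected
config = json.loads(strip_line_comments(commented))
assert config == {"url": "http://example.com", "retries": 3}
```

Every tool that touches such a file would need the same preprocessing step, which is the cost being weighed against a format with built-in comments.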

If you mean some other method of improving existing things like formats,
well let's hear it; but I for one don't see any worth doing to
significant extent really, other than dumping the thing or producing
next, improved version at least.

Improvement: other than making basic tools like parsing libraries and
editors, what improvements can you realistically make? And such
improvements in and of themselves are not very expensive: my GPL
Notepad++ has syntax highlighting for YAML (on top of gazillion other
languages), and there are parsing libraries for it. So where's this
terrible cost to it?

OTOH, if YAML produces net benefit for as few as, say, 200 people in
real world, the effort to make it has been well worth it.
If you can make a
really powerful case that your new thing is 1000x better than the old
thing, that's different, but I don't think we're seeing that here.

Perhaps in ideal world we would be able to develop smth good or at least
decent without long series of abominations preceding it.

But I don't think we live in such world and I don't think it's possible
to produce a decent format (or language) without decades of having to
deal with abominations first. We learn as we go along, there's no way
but to produce whatever works best at the moment, learning from it,
dumping it and then doing smth better. I don't think e.g. Python could
be produced without C, COBOL and Fortran preceding it: it's important
not only to know how to do it, but also how (and why) not to do it, and
learning that can't be done without producing some sort of abomination.

I'd argue that abominations are inevitable price of progress and evolution.
Also, XML is used for pretty much everything in the Java world. It
sucks too, but it is highly standardized, it observably gets the job
done, there are tons of structure editors for it, etc. Frankly
I'd rather have stayed with it than deal with Yaml.

http://myarch.com/why-xml-is-bad-for-humans

http://www.ibm.com/developerworks/xml/library/x-sbxml.html

Such reasons alone are enough to consider dumping XML for smth better.

Today I had to hand-edit XML config files for two programs (no other
option). The files were rather large, complicated and doing it frankly
sucked.

I also have to maintain a few applications that internally use XML as
data format: while they are tolerable, they still leave smth to be
desired, as those applications are really slow for larger datasets,
their users systematically make errors (like forgetting to attach DTD
before editing), and working across various versions of Windows is still
not perfect.

If somebody out there invents smth that is better than XML "only" by
half, I'm all for it.
There are too many of these damn formats. We should ban all but one of
them (I don't much care which one). And making even more of them is not
the answer.

I think this is a situation of "beware of what you wish for". Suppose
those alternative formats disappeared and you'd have no tools, however
imperfect, to use them: I think that in many, many contexts deficiencies
of "the" format would be so painful that most developers would just
write their own "private" ones, and everyone would be even worse off
than they are now.

I wouldn't worry too much about "stretching scarce resources thin"
either: abominations completely unfit to live waste little in the way of
resources, and we learn a deal off them too.

There are demonstrable benefits to this too: I for one am happy that
ReST is available for me and I don't have to learn a behemoth such as
DocBook to write documentation.

(imaginary dialog:

Paul: "eat your greens and learn your DocBook!"

Me: "but I don't like it and there's too much of it..."

;-)

OK me off the soapbox.

Regards,
mk
 
Steve Howell

With all due respect, Paul, and with thanks for all the help you've
given me, I have to disagree here: this is a really, really complicated
matter and I think there is a case even for making things slightly better.

I think this is a matter of "investment period", so to speak: is it
short or long? In short term, it absolutely makes no sense to produce
even slight improvement.

But in the long run it will almost certainly pay off to switch to smth
even somewhat better implementation (say, imaginary "20%" if you get my
drift): suppose we stay with sucky format for 10 years. Wouldn't it make
sense to implement a new one and be "in red" in terms of effort expended
versus saved for 3 years, but then be "in black" for the following 7 years?

I think ten years is about the right horizon to be looking down.
Anything further than ten years is probably so speculative as to have
lots of diminishing returns, although I admire people who have really
BIG ideas and are starting on them now. (The particular context of
this thread doesn't lead to big ideas, unless I just lack
imagination.)

I think it's wrong, though, only to look five years ahead. If the
only goal was to make software development easier in 2015, then I'd
say, by all means, let's pick the current best-of-breed tools and
simply perfect them as much as we can. This is a worthwhile goal
regardless of your ultimate time horizon, and that effort tends to
happen anyway, since so many people rightly live in the here and
now.

Somewhere in the 2020s, though, I predict that a lot of technologies
are either going to finally die off, or at least be restricted to the
niches that they serve well. Take Java, for example. I think it will
still be used, and people will still even be writing new programs
in it, but it will be rightly scorned in a lot of places where it is
now embraced. Some of this won't actually be due to technological
advances, but just changes in perception. For example, I predict lots
of programs that people now write in Java will be written in Python,
even if the core language of Python remains fairly stable.

Beyond just changing mindsets, though, I think evolution is
inevitable. Some subset of Python tools will almost certainly develop
features that are more friendly to the Java mindset, but work in
Python, and this will help move folks from Java to Python. I also
think that Java will be supplanted for lots of use cases by some
languages invented after 2000. Maybe Scala will become more
mainstream. Maybe Go will turn into more of an enterprise-y
platform. Who knows?

With regard to XML, I think at a bare minimum, folks will stop using
XML for use cases that YAML, JSON, and maybe even RSON serve better
today. I bet that at least one of YAML and JSON survives in some
form, and my money is on JSON, but I bet there will also be some new
competing formats. I also think that we'll still have multiple
formats that are only marginally better than each other for general
use cases, but which people will still choose for specific reasons.
Developers *love* good tools almost as much as they hate confusion--
there will always be tension between having too many choices and not
enough.

Going back to Paul's statement, I agree that "there should be one and
only one obvious way to do it" in Python, but I don't think the
philosophy applies to the greater ecosystem of software development.
In our generation I think we have to live with the confusion and
chaos that comes from a plethora of tools, and that's just part of
progress. Ironically, I think the tools that survive will be very
focused in their own right; it's just that we'll still have many to
choose from.

Going back to XML, I found myself using it last night for a completely
inappropriate use case. It so happens that it would have been about
100% better if it had simply been written in JSON, so there was no
compelling need for yet another alternative. But if the inventors of
JSON had been complacent about XML, we wouldn't even have that as an
option. And, of course, there is nothing radical at all about JSON--I
am pretty sure it was just a common sense realization about the
inadequacies of current technologies that led to its development, and
I'm sure early versions of it were pretty raw. Without having looked
into RSON, I am sure it's the same mindset that drives its invention--
current tools exist that can get the job done, but we can do better.
Whether RSON is really an improvement or not is an orthogonal issue to
whether we should strive for improvement.
 
mk

Steve said:
Somewhere in the 2020s, though, I predict that a lot of technologies
are either going to finally die off, or at least be restricted to the
niches that they serve well. Take Java, for example. I think it will
still be used, and people will still even be writing new programs
in it, but it will be rightly scorned in a lot of places where it is
now embraced. Some of this won't actually be due to technological
advances, but just changes in perception. For example, I predict lots
of programs that people now write in Java will be written in Python,
even if the core language of Python remains fairly stable.

A friend of mine, and a good Java programmer, says caustically: "Java is
COBOL of the future".

Where I work we develop a huge application in Websphere (IBM Java-based
application server). The problems with legacy code made project manager
joke "perhaps we should rewrite this in Python". Perhaps some day it
will not be a joke anymore?

Personally, I chose to stay away from Java, even though it would
temporarily help me: the amount of time & effort it takes to master the
necessary toolset is *huge*, and my scarce time is better spent
elsewhere, on more productive tools, and I really, really do not want
lots of my limited time to go down the drain in a few years.

Take EJB for example: even its creators realized they've overdone it
with EJB 2 and simplified somewhat EJB 3 and switched to annotations
instead of gazillion XML formats. But still I dread the thought of
having to spend so much time learning it before I can do a few lines of
productive work in it.

In a way it's horrible: all this gargantuan effort in a few years will
be completely wasted, down the drain. All those developer hours and
dollars wasted. In a way, C wasn't as bad as Java has been: at least
many of C libs, with new bindings, still live on and do work.
Going back to Paul's statement, I agree that "there should be one and
only one obvious way to do it" in Python, but I don't think the
philosophy applies to the greater ecosystem of software development.

+1

Note that when it comes to bigger tools or frameworks, even in the world
of Python things are not "one obvious way", e.g. Django for quick and
dirty and small apps, and Pylons for big and powerful apps. There may be
"one obvious way to do it" in a very, very narrow context, but when
contexts widen, like, say: "what is web framework I should choose?" the
answers diverge, because answer has to be variation of "it depends on
your situation".
Whether RSON is really an improvement or not is an orthogonal issue to
whether we should strive for improvement.

+1

Regards,
mk
 
Paul Rubin

mk said:
OK, but how? How would you make up e.g. for JSON's lack of comments?

Modify the JSON standard so that "JSON 2.0" allows comments.
OTOH, if YAML produces net benefit for as few as, say, 200 people in
real world, the effort to make it has been well worth it.

Not if 200,000 other people have to deal with it but don't receive the
benefit.

You might like this one too:

http://www.schnada.de/grapt/eriknaggum-xmlrant.html
I also have to maintain a few applications that internally use XML as
data format: while they are tolerable, they still leave smth to be
desired, as those applications are really slow for larger datasets,

I thought we were talking about configuration files, not "larger datasets".
There are demonstrable benefits to this too: I for one am happy that
ReST is available for me and I don't have to learn a behemoth such as
DocBook to write documentation.

DocBook is so far off my radar I'd have never thought of it. I just now
learned that it's not Windows-only. There is already POD, Pydoc,
Texinfo, a billion or so flavors of wiki markup, vanilla LaTeX, and most
straightforwardly of all, plain old ascii. ReST was another solution in
search of a problem.
 
Steve Howell

Modify the JSON standard so that "JSON 2.0" allows comments.

If you don't control the JSON standard, providing a compelling
alternative to JSON might be the best way to force JSON to accommodate
a wider audience. It might just be that the people behind JSON
deliberately avoid comments, because it's not in the scope of the
problem they are trying to solve. Hence another need for
alternatives.

Not if 200,000 other people have to deal with it but don't receive the
benefit.

How many hundreds of thousands of people have had to deal with XML
without receiving its benefits? Do well-established standards get an
exemption from the rule that software is not allowed to annoy non-
willing users of it?
 
Gregory Ewing

Paul said:
ReST was another solution in search of a problem.

I think the basic idea behind ReST is quite good, i.e.
understanding as markup various typographical conventions
that make sense in plain text, such as underlined
headings, bullets, numbered paragraphs.

Unfortunately it went overboard with a slew of cryptic
codes for footnotes, hyperlinks, etc. that nobody would
naturally think to use in a plain text document.
 
Steve Howell

I think the basic idea behind ReST is quite good, i.e.
understanding as markup various typographical conventions
that make sense in plain text, such as underlined
headings, bullets, numbered paragraphs.

Unfortunately it went overboard with a slew of cryptic
codes for footnotes, hyperlinks, etc. that nobody would
naturally think to use in a plain text document.

The same thing happened with YAML to a certain extent, from my
perspective. YAML was never meant to be an exact alternative to XML,
but its basic premise was sound--use indentation for more elegant
syntax, and model its semantics more toward how data actually gets
used internally by scripting languages in particular. But there is
also some featuritis with YAML that makes it hard to digest and
needlessly cumbersome to implement.

JSON is not perfect by any means, but I consider it to be a more
useful descendant of XML and YAML, even if it did not directly borrow
from either. (YAML and JSON are certainly similar, but that could be
a coincidental convergence.) Even if YAML itself has not been a
resounding success, it set the bar to a certain degree.
 
Paul Rubin

Steve Howell said:
If you don't control the JSON standard, providing a compelling
alternative to JSON might be the best way to force JSON to accommodate
a wider audience.

Ehh, either the JSON standardizers care about this issue or else they
don't. JSON (as currently defined) is a machine-to-machine
serialization format and just isn't that good a choice for handwritten
files. Adding a comment specification is a small perturbation that
might be accepted into the standard, but a big departure like RSON is a
whole nother creature.
How many hundreds of thousands of people have had to deal with XML
without receiving its benefits? Do well-established standards get an
exemption from the rule that software is not allowed to annoy non-
willing users of it?

We already have to deal with XML. So using XML for config files doesn't
require anyone to deal with any lousy formats that they didn't have to
deal with before. So the basic answer to your question about
well-established standards is yes: one annoying but standardized format
is better than multiple annoying unstandardized ones.
 
Steven D'Aprano

I think the basic idea behind ReST is quite good, i.e. understanding as
markup various typographical conventions that make sense in plain text,
such as underlined headings, bullets, numbered paragraphs.

Unfortunately it went overboard with a slew of cryptic codes for
footnotes, hyperlinks, etc. that nobody would naturally think to use in
a plain text document.


I use footnotes all the time[1] in plain text documents and emails. I
don't think there's anything bizarre about it at all.






[1] When I say "all the time", I actually mean occasionally.
 
John Bokma

Steven D'Aprano said:
I think the basic idea behind ReST is quite good, i.e. understanding as
markup various typographical conventions that make sense in plain text,
such as underlined headings, bullets, numbered paragraphs.

Unfortunately it went overboard with a slew of cryptic codes for
footnotes, hyperlinks, etc. that nobody would naturally think to use in
a plain text document.

I use footnotes all the time[1] in plain text documents and emails. I
don't think there's anything bizarre about it at all.


http://docutils.sourceforge.net/docs/user/rst/quickref.html#footnotes [#]_.

... [#] the keyword is ReST.
 
Steve Howell

Ehh, either the JSON standardizers care about this issue or else they
don't.  JSON (as currently defined) is a machine-to-machine
serialization format and just isn't that good a choice for handwritten
files.  Adding a comment specification is a small perturbation that
might be accepted into the standard, but a big departure like RSON is a
whole nother creature.


We already have to deal with XML.  So using XML for config files doesn't
require anyone to deal with any lousy formats that they didn't have to
deal with before.  So the basic answer to your question about
well-established standards is yes: one annoying but standardized format
is better than multiple annoying unstandardized ones.

<question type="rhetorical">
Does this mean we should stick with XML until the end of time?
</question>
 
