XML parsing with Java

vk02720

Egad, no! I should have reiterated Arne's caveat. OTOH, the result is
not entirely unexpected and parallels Arne's (considerable) experience.






Yes, this is confounding; I was sticking with the developer version
numbers:




Indeed, such numbers are almost meaningless, yet strangely fascinating.
Cf <http://www.google.com/intl/en/press/zeitgeist2008/index.html>

Here's a very rough measure of features/version from skimming the Java
1.5 API documentation (J2SE 5.0):

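<code>
#!/bin/bash
# Count "Since 1.x" tags in the API docs, one total per release.
# The C-style for loop below is a bash feature, hence the bash shebang.
DIR=/Developer/Documentation/Java/docs
ECHO=/bin/echo
for ((i=0; i<=6; i++)) ; do
  ${ECHO} -n "Since 1.${i}: "
  # Each javadoc "Since:" entry renders as "<DD>1.x"; count matching lines.
  grep -R "<DD>1.${i}" "$DIR"/* | wc -l
done
</code>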

<console>
$ ./since.sh
Since 1.0:       26
Since 1.1:       89
Since 1.2:      965
Since 1.3:      550
Since 1.4:     1384
Since 1.5:     1321
Since 1.6:        0
</console>

A lot has to do with some popular app server vendors adopting 1.5 very
late, so their customers never adopted 1.5 either (even for their
non-app-server needs). Many companies are also slow to adopt because of
plain ignorance on the part of some decision makers, who made
"significant investments" in building the system and have "no budget
to take chances and move to a newer version", even though the risk may
not be that high. Even just recompiling to take advantage of JVM
improvements in later releases is worth it, but it is tough to convince
some people.

The distribution % looks almost correct, at least in the sense that
nearly 50% are using 1.4 (or less).

What would be interesting to know is how many 1.4 users are really
using the new 1.4 features like regular expressions, NIO, exception
chaining, JAXP, etc. In some places even the developers are either not
aware of the newer features or do not care about them.
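
For anyone who has not touched them, here is a minimal sketch of three of the 1.4 additions mentioned above: java.util.regex, exception chaining, and java.nio. The class name Java14Features and its helper methods are made up for illustration; only the JDK calls themselves are real.

```java
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative, not from any library.
public class Java14Features {

    // java.util.regex (since 1.4): pull the minor number out of "1.x".
    static int minorVersion(String version) {
        Matcher m = Pattern.compile("1\\.(\\d+)").matcher(version);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a 1.x version: " + version);
        }
        return Integer.parseInt(m.group(1));
    }

    // Exception chaining (since 1.4): wrap a failure but keep its cause attached.
    static int parseOrWrap(String version) {
        try {
            return minorVersion(version);
        } catch (IllegalArgumentException e) {
            throw new RuntimeException("bad version string", e);
        }
    }

    // java.nio (since 1.4): encode text into a ByteBuffer through a Charset
    // and report how many bytes the encoded form occupies.
    static int utf8Length(String text) {
        return Charset.forName("UTF-8").encode(CharBuffer.wrap(text)).remaining();
    }

    public static void main(String[] args) {
        System.out.println("minor version: " + minorVersion("1.4"));
        System.out.println("UTF-8 bytes in \"Vajh\u00f8j\": " + utf8Length("Vajh\u00f8j"));
        try {
            parseOrWrap("2.0");
        } catch (RuntimeException e) {
            System.out.println("wrapped cause: " + e.getCause());
        }
    }
}
```

JAXP is left out here because it is the subject of the rest of the thread.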
 
Arne Vajhøj

John said:
Google - millions of hits:

java 1.1 - 22.8
java 1.2 - 16.1
java 1.3 - 12.0
java 1.4 - 12.6
java 1.5 - 38.1
java 1.6 - 10.2
java 1.7 - 5.2

Bimodal!?

Stuff posted 10 years ago is not a good indication of
current usage. And the measurement will favor versions
with many new changes.

But it is still more objective than my guess.

Arne
 
Arne Vajhøj

vk02720 said:
A lot has to do with some popular app server vendors adopting 1.5 very
late, so their customers never adopted 1.5 either (even for their
non-app-server needs). Many companies are also slow to adopt because of
plain ignorance on the part of some decision makers, who made
"significant investments" in building the system and have "no budget
to take chances and move to a newer version", even though the risk may
not be that high. Even just recompiling to take advantage of JVM
improvements in later releases is worth it, but it is tough to convince
some people.

The testing/certification cost can be huge.
vk02720 said:
What would be interesting to know is how many 1.4 users are really
using the new 1.4 features like regular expressions, NIO, exception
chaining, JAXP, etc. In some places even the developers are either not
aware of the newer features or do not care about them.

I think today the 1.4 new features are widely used.

When the first apps were deployed on 1.4, far fewer of those features
were used.

But it was a time when the number of Java EE apps grew
fast, so it was picked for a lot of new apps.

Arne
 
Mike Schilling

Lew said:
Java 1.4 has been completely retired for a few weeks now, and
obsolescent for quite some time.


It is Xerces.

I thought we'd been through this recently? In 1.4, the default parser is
(ack! pthwt!) Crimson. In 1.5, the default becomes Xerces.

Yes. JAXP will find the parser definition in the classpath and use it.
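
A quick way to see which parser JAXP actually picked up is to ask the factory for its class name. This is only a sketch: the class name JaxpLookup and its two helpers are made up for illustration, and the printed implementation class depends on the JRE and on whatever parser jars are on the classpath.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; only the javax.xml.parsers and org.w3c.dom calls are real API.
public class JaxpLookup {

    // Name of the DocumentBuilderFactory implementation JAXP selected.
    // newInstance() consults the javax.xml.parsers.DocumentBuilderFactory
    // system property, then provider entries found on the classpath,
    // and finally falls back to the platform default.
    static String factoryClassName() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    // Parse a small in-memory document with whichever parser was found
    // and return the name of its root element.
    static String rootElementName(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Factory in use: " + factoryClassName());
        System.out.println("Root element:   " + rootElementName("<catalog><book/></catalog>"));
    }
}
```

Running it once with and once without a Xerces jar on the classpath should show whether the bundled default or the external parser wins.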
 
Lew

Mike said:
I thought we'd been through this recently? In 1.4, the default parser is
(ack! pthwt!) Crimson. In 1.5, the default becomes Xerces.

I thought for sure time would have covered my embarrassment at having been
mistaken about this. Now my face is crimson again, never, it seems, to get
surcease.
 
John B. Matthews

[informative discussion]

From Old French sursis, past participle of Old French surseoir ‘refrain,
delay,’ from Latin supersedere ‘desist’ (see supersede). In an odd
syzygy, there's that word again.
 
Daniel Pitts

John said:
[informative discussion]

From Old French sursis, past participle of Old French surseoir ‘refrain,
delay,’ from Latin supersedere ‘desist’ (see supersede). In an odd
syzygy, there's that word again.
Please check your encoding. That looks like garbage to me.
 
Mike Schilling

Lew said:
I thought for sure time would have covered my embarrassment at having
been mistaken about this. Now my face is crimson again, never, it
seems, to get surcease.

Nicely done. Sorry about replying to the ancient post; for some
reason, it showed up as unread.
 
Lew

Daniel said:
John said:
[informative discussion]

From Old French sursis, past participle of Old French surseoir
‘refrain, delay,’ from Latin supersedere ‘desist’ (see
supersede). In an odd syzygy, there's that word again.
Please check your encoding. That looks like garbage to me.

Came through clearly here. You must be forcing a different encoding from John's.

"Here" being Thunderbird picking up news from news.albasani.net. I can't see
what encoding John used, which must mean that T-bird assumed UTF-8.

Your message was encoded in windows-1252, which is notoriously incomplete.
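
The effect is easy to reproduce. The sketch below encodes the curly quotes as UTF-8 and then decodes the same bytes as windows-1252, which is what a reader forcing the wrong charset does. The class name EncodingMismatch and the decodeAs helper are made up for illustration, and it assumes the JRE ships the windows-1252 charset (nearly all do, but it is not one of the guaranteed standard charsets).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; only the java.nio.charset and String calls are real API.
public class EncodingMismatch {

    // Encode text as UTF-8, then decode those bytes with another charset,
    // as a news reader would when it guesses the encoding wrong.
    static String decodeAs(String text, Charset assumed) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, assumed);
    }

    public static void main(String[] args) {
        // \u2018 and \u2019 are the curly single quotes from the quoted post.
        String original = "\u2018refrain, delay\u2019";
        Charset cp1252 = Charset.forName("windows-1252");

        System.out.println("as UTF-8:        " + decodeAs(original, StandardCharsets.UTF_8));
        System.out.println("as windows-1252: " + decodeAs(original, cp1252));
    }
}
```

Each curly quote is three bytes in UTF-8 (E2 80 98 for the opening one), so the wrong decoder turns one character into three. The closing double quote is worse off: its last byte is 0x9D, which windows-1252 does not assign to any character.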
 
Lew

Peter said:
Why it is that Thunderbird 2.0.0.19 doesn't interpret the post correctly
but 2.0.0.18 does I can't say. Maybe there's some user setting you have
set but Daniel doesn't that forces an encoding for posts that arrive
without encoding specified.

"Account settings" (a.k.a. "properties") / "Server settings" / "Default
Character Encoding:" "Unicode (UTF-8)"
 
John B. Matthews

Lew said:
Why it is that Thunderbird 2.0.0.19 doesn't interpret the post correctly
but 2.0.0.18 does I can't say. Maybe there's some user setting you have
set but Daniel doesn't that forces an encoding for posts that arrive
without encoding specified.

"Account settings" (a.k.a. "properties") / "Server settings" / "Default
Character Encoding:" "Unicode (UTF-8)"

Ah, thank you Daniel, Lew & Peter. I appreciate your feedback. My nntp
client indicated that it was using UTF-8, but it wasn't adding an
explicit Content-Type:

Content-Type: text/plain; charset=UTF-8; format=flowed

‘single-quote’
“double-quote”
‹single-angle›
«double-angle»
 
John B. Matthews

Peter Duniho said:
[...]
Content-Type: text/plain; charset=UTF-8; format=flowed

‘single-quote’
“double-quote”
‹single-angle›
«double-angle»

Much better. :)

Of course, it begs the question, why not just use the "normal" ASCII
characters, rather than something that could create a character encoding
issue?

Ordinarily, I wouldn't; they slipped in with a cut and paste. Now to
convince my new reader to decode my own posts correctly outside of
alt.test!
 
Roedy Green

See http://mindprod.com/jgloss/xml.html
for an overview of the features. You can use what is built into Java
or a third-party package.
--
Roedy Green Canadian Mind Products
http://mindprod.com

We are almost certainly going to miss our [global warming] deadline.
We cannot get the 10 lost years back, and by the time a new global agreement to
replace the Kyoto accord is negotiated and put into effect, there will probably
not be enough time left to stop the warming short of the point where we must not
go. ~ Gwynne Dyer
 
