Java: get XML file line number for the current node

Gene Wirchenko

It's a meaningless question. There are no line numbers in XML.

Really, Lew!

There are no line numbers in Java either, but somehow,
compilation messages that give line numbers are rather more useful
than messages without.

Sincerely,

Gene Wirchenko
 
Lew

Really, Lew!

There are no line numbers in Java either, but somehow,
compilation messages that give line numbers are rather more useful
than messages without.

Really, Gene!

By the time you have a DOM node in Java, there is no line number information.
Surely you are aware of that.

XML is explicitly a line-number-free format. Surely you are aware of that.

You can have the exact same XML document with various different line breaks in
whatever source you used. Surely you are aware of that.
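
A quick way to see this for yourself is the rough sketch below. It assumes the JDK's bundled DOM parser; the class and method names (LineBreakDemo, sameTree) are made up for illustration. It parses two texts that differ only in line breaks inside the tags and compares the resulting trees with isEqualNode.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not from the original post.
class LineBreakDemo {
    // Parse an XML string into a DOM document.
    static Document parse(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new InputSource(new StringReader(xml)));
    }

    // True if both texts produce structurally equal root elements.
    static boolean sameTree(String a, String b) throws Exception {
        return parse(a).getDocumentElement()
                .isEqualNode(parse(b).getDocumentElement());
    }

    public static void main(String[] args) throws Exception {
        String oneLine = "<root><item id=\"1\" name=\"x\"/></root>";
        // Same document, but with line breaks inside the tags.
        String manyLines = "<root\n><item\n  id=\"1\"\n  name=\"x\"\n/></root\n>";
        System.out.println(sameTree(oneLine, manyLines));
    }
}
```

One caveat: line breaks placed between elements, as opposed to inside tags, normally do reach a DOM as whitespace-only text nodes unless the parser is configured to drop them, so the comparison is only this clean for breaks inside the markup.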

Yes, you can associate line numbers from a particular expression of an XML
document with nodes in your tree, but that is meaningless because it means
nothing with other equivalent expressions. As surely you are aware.

A DOM (Document Object Model) is a tree-structured representation of an XML
document. There are no line numbers in a tree. Surely you are aware of that.

Where line breaks occur between elements has no significance to XML, as I'm so
sure you are aware.

There may be a line number in a particular source file used to represent an
XML, but there really isn't a line number in the XML. To use your lame
analogy, a Java source file might have line numbers, but the resulting class
file doesn't. It might associate a line number from a particular expression of
that source in a debug map, but that isn't going to help you know where the
code is in an equivalent source module with different line breaks.

My point was, really, Gene, that in a DOM the line numbers have no
significance. XML is frequently canonicalized, and "[a]ccording to the W3C, if
two XML documents have the same canonical form, then the two documents are
logically equivalent within the given application context (except for
limitations regarding a few unusual cases)."
<http://en.wikipedia.org/wiki/Canonical_XML>

Since XML is generally used to move information from one module or component
to another, the notion of line numbers between them becomes utterly meaningless.

Really, Gene.

Perhaps if the OP were to explain how they contemplate using line numbers, and
how they plan to assure the numbers have meaning, and why they have to persist
all the way into the DOM where they really have no relevance, things would
become more clear.
 
Jeff Higgins

Okay, I stand corrected.
Well, OK, if you wish, although my comment wasn't intended as a
correction. I meant to draw attention to the parallelism in the thread
of discussion[1]. The other OP responded with a parser technology and
use case unlike this OP's. This should allow you to stand
correction-free, if you like. :)

[1]+GIYF
 
Arne Vajhøj

[snip]

Given that the OP asked for the "line number in the source XML file",
which clearly exists, and that a given node parsed from a given file
starts at a particular line number, all the lines above seem
utterly irrelevant.

Arne
 
Arne Vajhøj

It's a meaningless question. There are no line numbers in XML.

Well nobody asked for "line numbers in XML" - OP asked for
"line number in the source XML", so ...

Arne
 
Arne Vajhøj

Using what parser? W3C DOM? SAX? StAX? JDOM?
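
The answer matters because some of these report source positions directly. With StAX, for instance, something like the sketch below should work. It is a minimal illustration, not a tested recipe: the class name StaxLines is made up, and the StAX API describes location information as optional, so the exact line reported for a tag can depend on the implementation (typically it is the position where the parser stands after reading the start tag).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not from the original post.
class StaxLines {
    // Collect "name@line" for every start tag, using the reader's Location.
    static List<String> elementLines(String xml) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> out = new ArrayList<>();
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT) {
                    // The Location is only valid until the next call to next().
                    out.add(r.getLocalName() + "@" + r.getLocation().getLineNumber());
                }
            }
        } finally {
            r.close();
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>\n<a/>\n<b/>\n</root>";
        for (String s : elementLines(xml)) {
            System.out.println(s);
        }
    }
}
```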

If we assume:
- you use W3C DOM
- you only need line number for element nodes
- you really need it
- you are willing to write some ugly hacks
then try something like:

import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.LinkedList;
import java.util.Queue;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ParseWithLineNumbers {
    // Ordinary parse, for comparison.
    public static Document parseNormal(String fnm)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        return db.parse(new File(fnm));
    }

    // Parse through SpecialReader, which injects a lineno attribute
    // into every start tag before the parser sees the text.
    public static Document parseSpecial(String fnm)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        try (SpecialReader rdr = new SpecialReader(fnm)) {
            return db.parse(new InputSource(rdr));
        }
    }

    // Serialize the document to System.out via DOM Load/Save.
    public static void print(Document doc) throws ClassCastException,
            ClassNotFoundException, InstantiationException, IllegalAccessException {
        DOMImplementation impl =
            DOMImplementationRegistry.newInstance().getDOMImplementation("XML 3.0");
        DOMImplementationLS feature =
            (DOMImplementationLS) impl.getFeature("LS", "3.0");
        LSSerializer ser = feature.createLSSerializer();
        LSOutput output = feature.createLSOutput();
        output.setByteStream(System.out);
        ser.write(doc, output);
    }

    public static void main(String[] args) throws Exception {
        Document d1 = parseNormal("test.xml");
        print(d1);
        Document d2 = parseSpecial("test.xml");
        print(d2);
    }
}

// Ugly hack: rewrites the character stream on the fly. It does not
// understand comments, CDATA sections or existing lineno attributes,
// and it reads the file in the platform default encoding.
class SpecialReader extends FileReader {
    private int lineno;            // current line in the source file
    private boolean inelm;         // true between '<' and the end of the tag name
    private boolean eof;
    private Queue<Character> extra; // characters waiting to be injected

    public SpecialReader(String fnm) throws FileNotFoundException {
        super(fnm);
        lineno = 1;
        inelm = false;
        eof = false;
        extra = new LinkedList<Character>();
    }

    @Override
    public int read(char[] ch, int ix, int n) throws IOException {
        if (eof) return -1;
        int res = 0;
        wloop:
        while (res < n) {
            int c = extra.isEmpty() ? super.read() : extra.remove();
            // The tag name ends at the first whitespace or '>'; inject
            // the attribute there and replay the delimiter afterwards.
            if (inelm && (c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '>')) {
                for (char xc : (" lineno='" + lineno + "'").toCharArray()) {
                    extra.add(xc);
                }
                extra.add((char) c);
                c = extra.remove();
                inelm = false;
            }
            switch (c) {
                case '\n':
                    lineno++;
                    break;
                case '<':
                    inelm = true;
                    break;
                case '/': // end tag or empty-element tag
                case '?': // XML declaration or processing instruction
                case '!': // comment, DOCTYPE or CDATA
                    inelm = false;
                    break;
                case -1:
                    eof = true;
                    break wloop;
                default:
                    /* nothing */
                    break;
            }
            ch[ix + res] = (char) c;
            res++;
        }
        // Report end of stream, rather than 0, if nothing was read.
        return (eof && res == 0) ? -1 : res;
    }
}

It adds a lineno attribute to every element node.
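
A less hacky route, if building the tree yourself is acceptable, is to parse with SAX and copy the Locator's line number onto each element as DOM user data. The following is only a minimal sketch under some assumptions: the class name LocatorDom and the key "lineNumber" are invented for illustration, comments and processing instructions are dropped, namespaces are ignored, and the Locator reports the position where the start tag ends, not where it begins.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not from the original post.
class LocatorDom {
    static final String LINE_KEY = "lineNumber"; // made-up user-data key

    static Document parse(InputSource in) throws Exception {
        final Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        final Deque<Node> stack = new ArrayDeque<>();
        stack.push(doc);

        SAXParserFactory.newInstance().newSAXParser().parse(in, new DefaultHandler() {
            private Locator locator;
            private final StringBuilder text = new StringBuilder();

            @Override
            public void setDocumentLocator(Locator l) {
                locator = l; // supplied by the parser before startDocument
            }

            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes atts) {
                flushText();
                Element e = doc.createElement(qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    e.setAttribute(atts.getQName(i), atts.getValue(i));
                }
                // Remember the source line without touching the attributes.
                e.setUserData(LINE_KEY, locator.getLineNumber(), null);
                stack.peek().appendChild(e);
                stack.push(e);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                flushText();
                stack.pop();
            }

            @Override
            public void characters(char[] ch, int start, int len) {
                text.append(ch, start, len);
            }

            // Turn buffered character data into a single text node.
            private void flushText() {
                if (text.length() > 0) {
                    stack.peek().appendChild(doc.createTextNode(text.toString()));
                    text.setLength(0);
                }
            }
        });
        return doc;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>\n  <a/>\n  <b x=\"1\"/>\n</root>";
        Document d = parse(new InputSource(new StringReader(xml)));
        Node b = d.getElementsByTagName("b").item(0);
        System.out.println("b is on line " + b.getUserData(LINE_KEY));
    }
}
```

Because the number lives in user data, serializing the document gives back the original markup with no extra attribute.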

Arne
 
Gene Wirchenko

[snip]
Given that the OP asked for the "line number in the source XML file",
which clearly exists, and that a given node parsed from a given file
starts at a particular line number, all the lines above seem
utterly irrelevant.

What he said.

Sincerely,

Gene Wirchenko
 
Lew

Arne said:
Given that the OP asked for the "line number in the source XML file",
which clearly exists, and that a given node parsed from a given file
starts at a particular line number, all the lines above seem
utterly irrelevant.

All right, fair point.
 
Lew

My installation of wc(1) begs to differ.

Aside from the fact that I've conceded the limits of my statement multiple
times before you posted, which apparently you've chosen to ignore in your rush
to seem clever, you completely missed my point.

Are there line numbers in a DOM tree?

No, of course not, and that's what I was trying to say. The line numbers don't
matter. What matters is the structure of the tree. Just like Java source line
numbers don't affect the execution of bytecode. And just like multiple
different versions of source with different line numbers produce the exact
same bytecode, multiple versions of an XML document with different line
numbers will produce the exact same DOM tree. So what is the point of the line
numbers, to the OP?

So far it seems the *only* person not to jump on and explain what the hell
they think they need from line numbers is the OP. That they want to include
them in the DOM nodes is, at best, curious. They've responded to none of our
suggestions or questions. So, "bugbear", why don't you just take a frakkin'
chill pill and wait for the OP to respond?
 
Gene Wirchenko

[snip]
Aside from the fact that I've conceded the limits of my statement multiple
times before you posted, which apparently you've chosen to ignore in your rush
to seem clever, you completely missed my point.

Odd use of "conceded" when you continue to make your argument:
Are there line numbers in a DOM tree?

No, of course not, and that's what I was trying to say. The line numbers don't
matter. What matters is the structure of the tree. Just like Java source line

It depends on how you are looking at the data. If you are trying
to track down an error, knowing what line the parser was working on
when it threw the error will usually help in locating the error.
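
For what it's worth, the standard Java parsers already hand you that line when they fail, via SAXParseException. A minimal sketch follows; the class name ParseErrorLine and the sample input are made up for illustration, and the default error handler will also print its own message to stderr.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class, not from the original post.
class ParseErrorLine {
    // Returns the line the parser reports for the first fatal error,
    // or 0 if the text parsed without one.
    static int errorLine(String xml) throws Exception {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return 0;
        } catch (SAXParseException e) {
            // getColumnNumber() and getSystemId() are available as well.
            return e.getLineNumber();
        }
    }

    public static void main(String[] args) throws Exception {
        // Mismatched end tag on the third line.
        String bad = "<root>\n<a>\n</b>\n</root>";
        System.out.println("error reported at line " + errorLine(bad));
    }
}
```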

[snip]

Sincerely,

Gene Wirchenko
 
Lew

Gene said:
Lew wrote:

[snip]
Aside from the fact that I've conceded the limits of my statement multiple
times before you posted, which apparently you've chosen to ignore in your rush
to seem clever, you completely missed my point.

Odd use of "conceded" when you continue to make your argument:

Odd use of "continue to make [my] argument" when I made a different point.

It depends on how you are looking at the data. If you are trying
to track down an error, knowing what line the parser was working on
when it threw the error will usually help in locating the error.

That's line numbers in the source, a different question. Once the document has
been parsed, the line number ceases to have relevance. You discuss what's
needed during compilation, a completely separate phase from the result phase,
in this case a DOM tree.

I agree that it's useful to know where in the source there's a problem, up
until but not including when the source has lost relevance.
 
