Newbee question about <! and <?

Joseph Kesselman · Sep 24, 2007

Andy said:
I don't understand this para.

Decided I didn't need to understand it. He's coding in Desperate Perl
Hacker mode, writing code that will work for the one testcase he has to
deal with. He's implicitly accepting that he may have to throw this out
and do it all again from scratch, in exchange for hopefully saving some
cycles now. Not my idea of good design for anything but _extremely_
limited (ie, embedded to the point of nearly being hardwired) testcases,
and I don't think anyone who has to ask basic questions about XML syntax
knows enough about their actual requirements yet to make that decision,
but de gustibus.

If he has guessed wrong, his employers/customers will correct him.

Asger Jørgensen · Sep 24, 2007

"Andy Dingley" <[email protected]> wrote in a message

You can only do that if you also forbid the use of CDATA sections.

I imagine that you can do this in your situation, but clearly record
that you've made this choice, don't just leave it to chance in the
future.

As it turns out, CDATA sections are not allowed in the files that I receive,
so I will never meet them, but if I do I'll just treat them as comments
and throw them away, since they have no effect on the values that
I need to get from the files.
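
One caution on treating CDATA sections like comments: a CDATA section carries character data, so dropping it can change an element's value, whereas dropping a comment cannot. A minimal sketch using Python's standard xml.etree.ElementTree; the element name `Name` and its content are made up for illustration and do not come from the files discussed here.

```python
import xml.etree.ElementTree as ET

# Hypothetical element whose whole value sits inside a CDATA section.
with_cdata = ET.fromstring("<Name><![CDATA[Jensen & Søn <ApS>]]></Name>")

# Hypothetical element that contains only a comment.
with_comment = ET.fromstring("<Name><!-- internal note --></Name>")

# The CDATA content arrives as ordinary element text, markup characters included.
cdata_value = with_cdata.text

# The comment contributes nothing to the element's text.
comment_value = with_comment.text

print(repr(cdata_value))
print(repr(comment_value))
```

If the incoming files really never contain CDATA, reporting one as an error (rather than silently discarding it) keeps that decision visible.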

I don't understand this para.

Sorry about my English.
All the files that I receive come in UTF-8 encoding, but since the
software that I am making is for the local Danish market, I can safely
convert them to the local codepage before parsing them.
That way I don't have to deal with anything but good old chars.

If the content "Isn't Unicode at all", then I presume you mean that
it's plain old ASCII character set. In this case (ignoring the
possibility of a UTF-8 BOM) then the encoding is also ASCII and is
thus also UTF-8 simultaneously.

So how could you have "national characters" occurring? (by which I
assume that you mean non-ASCII characters from an ISO-8859-* character
set)

You are right here. I come from Denmark and we have three non-ASCII
characters, and names use a lot of the characters from all
over Europe.
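
The encoding points above can be checked with a short sketch. It assumes the "local codepage" is Windows cp1252 (the thread does not say which one is used), and the sample names are invented for illustration.

```python
# Pure ASCII text is byte-for-byte identical in ASCII and in UTF-8.
ascii_text = "TagName"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# The three Danish letters need two bytes each in UTF-8,
# but one byte each in the assumed codepage.
danish = "æøå"
utf8_len = len(danish.encode("utf-8"))
cp1252_len = len(danish.encode("cp1252"))


def fits_codepage(text: str, codepage: str = "cp1252") -> bool:
    """Return True if every character of text exists in the codepage."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


print(same_bytes, utf8_len, cp1252_len)
# A Danish name converts cleanly; a name using Polish letters does not.
print(fits_codepage("Jørgensen"), fits_codepage("Wałęsa"))
```

Converting to a single-byte codepage is therefore only lossless while every name in the data stays within that codepage.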

Kind regards
Asger

Asger Jørgensen · Sep 24, 2007

Hi Joseph

Joseph Kesselman said:
Decided I didn't need to understand it.

You obviously decided not to understand a lot of things. I know my
English isn't perfect at all, but how You can get from
"I'm just an old C/C++ guy" to "Desperate Perl Hacker" is a little more
than I can understand.

He's coding in Desperate Perl Hacker mode, writing code that will work for
the one testcase he has to deal with. He's implicitly accepting that he
may have to throw this out and do it all again from scratch, in exchange
for hopefully saving some cycles now. Not my idea of good design for
anything but _extremely_ limited (ie, embedded to the point of nearly
being hardwired) testcases,

Being the old guy that I am, I have met a lot of people who thought they
knew better what was right and what was wrong, what was good and what
was bad. Some of them got wiser with age; do You have that chance?

You really should not make comments about my person without knowing
me at all. I have tried to explain my situation with this project, but You
have obviously not understood me, or You have just chosen not to.

For instance empty tags like these:

<TagName></TagName>
and
<TagName/>

are not allowed in the files that I am working with. So I catch these as
errors and report them, which a normal parser wouldn't do.

The parser is done and it is quite fast, as I explained speed was a main
concern.

A 2.6 MB file with 23,920 nested nodes parses in 80 milliseconds, and that
includes disk reading (standard PC, 2.1 GHz, WinXP).

It has been tested on about 600 files, with no bugs yet.

It can also parse XSD and XSL files,
but it will only get the TagName, the attributes and their values, and the
contents between tags. Everything else is discarded.
The nodes are correctly nested and can be shown in a Windows treeview.
I think I can use it for more than this job.
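
For readers who want to see what "tag name, attributes and content only" looks like in practice, here is a minimal sketch. It is not the parser described in this post: it leans on Python's standard expat binding, and the dict layout and the name `parse_minimal` are assumptions made for the example.

```python
import xml.parsers.expat


def parse_minimal(xml_text):
    """Build nested dicts holding only tag, attributes, text and children."""
    root = None
    stack = []  # currently open elements, innermost last

    def start(name, attrs):
        nonlocal root
        node = {"tag": name, "attrs": dict(attrs), "text": "", "children": []}
        if stack:
            stack[-1]["children"].append(node)
        else:
            root = node
        stack.append(node)

    def end(name):
        stack.pop()

    def chars(data):
        if stack:
            stack[-1]["text"] += data

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    # No handlers are registered for comments or processing instructions,
    # so expat skips them and they never reach the tree.
    parser.Parse(xml_text, True)
    return root


# Hypothetical input, invented for the example.
doc = '<?xml version="1.0"?><!-- header --><Order id="7"><Item qty="2">Bolt</Item></Order>'
tree = parse_minimal(doc)
print(tree["tag"], tree["attrs"], tree["children"][0]["text"])
```

The resulting nesting maps directly onto a tree view: each dict is a node and its `children` list holds the child nodes.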

and I don't think anyone who has to ask basic questions about XML syntax
knows enough about their actual requirements yet to make that decision,
but de gustibus.

I guess that's a question about IQ and the ability to imagine.

Kind regards
Asger

Jürgen Kahrs · Sep 24, 2007

Asger said:
The parser is done and it is quite fast, as I explained speed was a main
concern.

2.6MB file 23920 nested nodes in 80 milliseconds, and that include
disk reading. (Standard PC 2.1 Ghz, WinXP)

This is really pretty fast. I don't know any faster parser.
But I'm not quite convinced that the disk-reading is
included because the file is so small that it was probably
read off-the-cache.

Asger Jørgensen · Sep 24, 2007

Hi Jürgen

Jürgen Kahrs said:
This is really pretty fast. I don't know any faster parser.
But I'm not quite convinced that the disk-reading is
included because the file is so small that it was probably
read off-the-cache.

You are probably right about that.
I have a 12.1 MB file with 192,880 nodes that takes 730 milliseconds, all included,
which would scale to about 157 ms for 2.6 MB, but it's OK.
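
The scaling estimate can be reproduced with a few lines. This sketch assumes parse time grows linearly with file size, which is only an approximation; the function name is made up for the example.

```python
def scaled_time_ms(ref_size_mb, ref_time_ms, target_size_mb):
    """Scale a measured time to another file size, assuming linear cost."""
    return ref_time_ms / ref_size_mb * target_size_mb


# Figures from the post: 12.1 MB parsed in 730 ms, scaled down to 2.6 MB.
estimate = scaled_time_ms(12.1, 730, 2.6)
print(round(estimate, 1))
```

The gap between this estimate and the 80 ms measured earlier is consistent with the smaller file having been served from the disk cache.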

Kind regards
Asger

Joseph Kesselman · Sep 24, 2007

Asger said:
"I'm just an old C/C++ guy" to "Desperate Perl Hacker" is a little more
than I can understand.

Sorry for the shorthand. "Desperate perl hacker" is a bit of slang used
in the XML community to refer to folks -- no matter what language
they're working in -- who are just looking for a quickie solution to a
particular XML task rather than one that actually follows the full XML
architecture. As such, it's a good description of the approach you're
taking.

Nothing wrong with it, within its limits, but it does have limits. And
it isn't a criticism of you, your skills, or even of Perl, but an
observation that you're taking an approach which has built-in
limitations. If the benefits you're gaining are worth accepting those
limits, go for it; that's the difference between computer science and
software engineering (and I'm very much an engineer myself).

For instance empty tags like these:
<TagName></TagName>
<TagName/>
are not allowed in the files that I am working with. So I catch these as
errors and report them, which a normal parser wouldn't do.

For what it's worth, most of us would implement that in the application
layer rather than the parser. Same result, slightly different
partitioning, probably about the same performance.
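
As a rough illustration of the application-layer approach, the sketch below lets a standard parser (Python's xml.etree.ElementTree here) build the tree and then walks it to report empty elements. Treating whitespace-only content as empty is an assumption of this example, as are the function name and the sample document.

```python
import xml.etree.ElementTree as ET


def find_empty_elements(root):
    """Return tags of elements with no children and no non-whitespace text."""
    empty = []
    for elem in root.iter():
        has_text = elem.text is not None and elem.text.strip() != ""
        if len(elem) == 0 and not has_text:
            empty.append(elem.tag)
    return empty


# Hypothetical input: <A></A> and <B/> are the two spellings of an empty element.
doc = "<Root><A></A><B/><C>value</C></Root>"
empties = find_empty_elements(ET.fromstring(doc))
print(empties)
```

After parsing, the two spellings are indistinguishable, so a single check covers both.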

As I said: De gustibus non disputandum est. Your solution isn't the one
I'd take given what you've told us. That doesn't mean it's the wrong
one; it does mean that I know my advice to you isn't going to be a good
fit for what you're trying to do and the way you've chosen to do it.

Customized XML parsers can indeed yield a performance gain; IBM has
demonstrated (and patented) some optimizing technology that
automatically produces a highly tuned parser for a particular set of
expected documents. So if you really do know this is going to be the
critical bottleneck in your system, by all means hack it as necessary.

Asger Jørgensen · Sep 24, 2007

Hi Joseph

Joseph Kesselman said:
Sorry for the shorthand. "Desperate perl hacker" is a bit of slang used in
the XML community to refer to folks -- no matter what language they're
working in

I can see that You didn't mean it the way I understood it..

Sorry for being cranky!

For what it's worth, most of us would implement that in the application
layer rather than the parser. Same result, slightly different
partitioning, probably about the same performance.

That functionality can be turned off, of course.

As I said: De gustibus non disputandum est. Your solution isn't the one
I'd take given what you've told us. That doesn't mean it's the wrong one;
it does mean that I know my advice to you isn't going to be a good fit for
what you're trying to do and the way you've chosen to do it.

Well, You obviously also know Your way around XML files.
When I started 3 weeks ago I didn't know anything about XML except that
it looked like HTML, but I couldn't recognize the names in the tags.

Later in the project that I am involved with, I have to write the same kind
of XML files that I am now reading, converting user data to this special kind
of XML file, which means that I need to know the syntax.

I could of course read a lot about this particular standard, but I tend to
fall asleep when I'm reading, so I decided to write my own parser instead.
Now I have a good feeling about these XML files and I have a fast parser.
To me that is well worth the limitations, which I am very well aware of.
Learning by doing, that's my way.

Have a nice evening
Kind regards
Asger

Andy Dingley · Sep 25, 2007

He's coding in Desperate Perl
Hacker mode, writing code that will work for the one testcase he has to
deal with.

Are you familiar with the hardcore TDD (Test-Driven Development)
world? The (entirely serious) attitude of "Write nothing more than
the fix for the next test" and "Test nothing more than the next
failure". Here's an example:

<http://butunclebob.com/ArticleS.UncleBob.TheThreeRulesOfTdd>

Now, can someone please explain to me how you use that most excellent
and highly fashionable development methodology to write a parser?

Andy Dingley · Sep 25, 2007

For instance empty tags like these:

<TagName></TagName>
and
<TagName/>

are not allowed in the files that I am working with.

They're empty elements, not empty tags. In SGML you could define the
second one as an empty tag, but XML doesn't even have such a concept.
Until you understand the distinction, then you're doomed.

I'd also be very cautious about writing a subset XML parser that
couldn't handle empty elements. It's not especially hard and it's a
very common feature to encounter in documents. A subset that excludes
this is certainly possible, it might do what your particular
application needs, but IMHO it's a subsetting too far.

Asger Jørgensen · Sep 25, 2007

"Andy Dingley" <[email protected]> wrote in a message

<TagName/>

are not allowed in the files that I am working with.

They're empty elements, not empty tags. In SGML you could define the
second one as an empty tag, but XML doesn't even have such a concept.
Until you understand the distinction, then you're doomed.

Uuuuuuuuuh, that's bad, I'm getting all scared, how can I do my work
now...
I AM DOOMED!!
