Newbie q: Parsing vendor-data into uniform XML

Casper B

I have 3-4 vendor-specific ASCII (non-XML) data sheets that form
tables of simple types (int, float, string) with space as the
delimiter. The data is simple from a grammar point of view, yet not as
simple as a 2D array or recordset. Example:

1234567894 00000100 50 10400
01330002 003 0000213337 10400
01330025 002 0000066887 10400
01330027 000 0000033841 10400
01330029 001 0000061182 10400
01330030 004 0000047411 10400
9999999998 0001165422- 10400
1234567894 00000100 50 10400
01330003 001 0000033671- 10400
01330004 001 0000116653- 10400
....looped data!

Normally I would parse this and do the transformation using a
compiler-compiler. This is, however, a very static approach (a new format
would require recompilation, etc.) and certainly not suited for database
integration.

Can I somehow use XML or any of its features (DTD, XPath...) to
parse/validate vendor-specific ASCII/non-XML data sheets and transform
them into a standard XML format?

The goal is, of course, to be able to receive vendor data in a new
proprietary ASCII format and still be able to read the data, provided an
associated grammar has been created for this new format. Unfortunately I
have no way of requiring the vendor to provide/follow a schema/XML
format. :(

Thanks in advance for any feedback!

Casper Bang
 
Andy Fish

You might be able to process these files using XML tools, but it certainly
wouldn't help with the job of parsing them.

In XML, any data between tags is represented as text nodes, so all you would
end up with would be either a single text node or a sequence of text nodes.
You would still have to use substring() or instr() type operations to locate
the individual fields. This would be more complicated in, say, XSLT code
than it would be in a conventional 3GL.
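
A small sketch of this point, in Python with the standard library's xml.etree.ElementTree. The <data> wrapper element is only an assumption for illustration; two lines from the sample above are dropped into it unchanged:

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper: raw vendor lines placed inside a single element.
raw = "01330002 003 0000213337 10400\n01330025 002 0000066887 10400\n"
root = ET.fromstring("<data>" + raw + "</data>")

# The XML parser sees one text node; it knows nothing about the fields.
text = root.text

# Locating the individual fields is still plain string handling.
rows = [line.split() for line in text.splitlines() if line.strip()]
first_field = rows[0][0]
```

The XML layer contributes nothing here: all the field extraction happens in the split() call, which would work just as well without the wrapper.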

I think you need to treat the parsing of the incoming non-XML data as a
separate process. Once you have done that, you can certainly build XML
structures and use XML tools to process and output the data.
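
One way the two-step approach might look, as a rough Python sketch rather than a definitive implementation. The record kinds (header, detail, trailer), the patterns that recognize them, and every field and element name are guesses made from the sample data, not anything the vendor has specified. The layout is kept as plain data so that, in principle, it could be loaded from a per-vendor config file instead of being compiled in:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical layout for the sample format: (record kind, pattern for the
# first field, field names). All of it is guessed from the sample lines.
LAYOUT = [
    ("header",  r"1234567894", ["id", "code", "count", "tag"]),
    ("trailer", r"9999999998", ["id", "total", "tag"]),
    ("detail",  r"\d{8}",      ["item", "seq", "amount", "tag"]),
]

def parse_line(line, layout=LAYOUT):
    """Step 1: return (kind, {field: value}) for one space-delimited line."""
    fields = line.split()
    for kind, first_field, names in layout:
        if re.fullmatch(first_field, fields[0]) and len(fields) == len(names):
            return kind, dict(zip(names, fields))
    raise ValueError("no record layout matches: %r" % line)

def to_xml(lines, layout=LAYOUT):
    """Step 2: build a <batches> tree; each header opens a new <batch>."""
    root = ET.Element("batches")
    batch = None
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        kind, values = parse_line(line, layout)
        if kind == "header" or batch is None:
            batch = ET.SubElement(root, "batch")
        record = ET.SubElement(batch, kind)
        for name, value in values.items():
            ET.SubElement(record, name).text = value
    return root

sample = """\
1234567894 00000100 50 10400
01330002 003 0000213337 10400
9999999998 0001165422- 10400
1234567894 00000100 50 10400
01330003 001 0000033671- 10400
"""
root = to_xml(sample.splitlines())
xml_text = ET.tostring(root, encoding="unicode")
```

Values are kept as strings on purpose, since the meaning of the trailing "-" on some amounts is not stated. Once the data is in this form, the usual XML tools (schema validation, XPath, XSLT) can take over.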
 
eranb

Hi,
To handle the parsing side I would recommend taking a look at
ContentMaster, ItemField's file parsing solution. Using its parser
studio, a parsing solution for the scenario you have just described
can be created in minutes.

http://www.itemfield.com

ContentMaster is a complete multi-format (EDI, Excel, Word, RTF, custom
formats, etc.) text parsing solution that comes with a dedicated visual
authoring environment for the creation of parsing scripts, and a parsing
engine that seamlessly integrates into any environment.

Regards,

Eran Berkowitz
 
