validating plain text input files

championsleeper

we have a system that receives plain text files from numerous external
sources (databases and others) and then processes them. each line in an
input file is a record. our system identifies the fields of a record by
character position (1-6 = date, 7-9 = age, etc.), parses the record
according to the configuration it reads and processes the data.
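To make the layout concrete, here is a minimal Python sketch of slicing a record by character position. Only the two column ranges (1-6 = date, 7-9 = age) come from the description above; the `LAYOUT` table, the function name and the sample line are made up for illustration, and the real system's configuration format is not shown in this thread.

```python
# Hypothetical layout table: (field name, first column, last column),
# using 1-based inclusive columns as in the description above.
LAYOUT = [("date", 1, 6), ("age", 7, 9)]


def parse_record(line, layout=LAYOUT):
    """Slice one fixed-width line into a dict of raw field strings."""
    # Convert 1-based inclusive columns to Python's 0-based, end-exclusive slices.
    return {name: line[start - 1:end] for name, start, end in layout}


# Sample line invented for this sketch; padding is kept as-is.
record = parse_record("010203 42")
```

The fields are returned unstripped so that later validation can still see padding problems.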

quite often the quality of the received data is poor and that causes
problems. while it would be preferable to perform the detailed
validation inside the application, i'm keen to investigate alternatives
for a number of reasons.

this may sound like a bad idea but would it make any sense whatsoever
to parse the file into xml format and then use native modules in
languages such as perl to compare the xml file with the xml schema for
the input file? i know it would be much better to force the input
files to be xml but that may be a bridge too far for now.

i'm open to any suggestions. this idea came to me when i was running
around the park this evening and may be partly due to my dehydration
at the time!
 
Daniel Parker

championsleeper said:
this may sound like a bad idea but would it make any sense whatsoever
to parse the file into xml format and then use native modules in
languages such as perl to compare the xml file with the xml schema for
the input file? i know it would be much better to force the input
files to be xml but that may be a bridge too far for now.

You might be interested in checking out my open source project
http://servingxml.sourceforge.net/, which supports this idea. Check out the
"countries" and "hot 1" examples in the Examples link. This software
supports input streams of flat file records that may have different formats,
represented by record types. The record type is used as the document
element, and each field is represented as an element.
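As a rough illustration of that mapping (record type as the document element, one child element per field), here is a small Python sketch using only the standard library. It is not ServingXML's API or its actual output format; the function name, the `person` record type and the sample values are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record_type, fields):
    """Build <record_type><field>value</field>...</record_type> as a string."""
    root = ET.Element(record_type)
    for name, value in fields.items():
        # Padding is stripped here; keep it if the schema should check it.
        ET.SubElement(root, name).text = value.strip()
    return ET.tostring(root, encoding="unicode")


# Hypothetical record type and field values, for illustration only.
xml_text = record_to_xml("person", {"date": "010203", "age": " 42"})
```

A document produced this way could then be checked against an XML schema with whatever validator is available.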

Regards,
Daniel Parker
http://servingxml.sourceforge.net/
 
Manuel Collado

championsleeper said:
... each line in the input file is a record...

this may sound like a bad idea but would it make any sense whatsoever
to parse the file into xml format and then use native modules in
languages such as perl to compare the xml file with the xml schema for
the input file? i know it would be much better to force the input
files to be xml but that may be a bridge too far for now.

The input parsing can also be coded in Perl, so there is no need for an
intermediate XML representation. Besides, if the input is
line-oriented, AWK could be a better tool (simpler than Perl).
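To show what validating the lines directly, without an XML step, might look like, here is a minimal sketch. It is written in Python rather than Perl or AWK, and it makes assumptions the thread does not confirm: that the date field is YYMMDD and that the age field is a space-padded integer. All names are hypothetical.

```python
from datetime import datetime


def is_valid_date(text):
    """Assumed format: exactly six ASCII digits that form a real YYMMDD date."""
    if not (text.isascii() and text.isdigit()):
        return False
    try:
        datetime.strptime(text, "%y%m%d")
    except ValueError:
        return False
    return True


def validate_line(line, line_no):
    """Return a list of error messages for one record (empty if it looks fine)."""
    if len(line) < 9:
        return [f"line {line_no}: too short ({len(line)} chars)"]
    errors = []
    date_text, age_text = line[0:6], line[6:9]  # columns 1-6 and 7-9
    if not is_valid_date(date_text):
        errors.append(f"line {line_no}: bad date {date_text!r}")
    if not age_text.strip().isdigit():
        errors.append(f"line {line_no}: bad age {age_text!r}")
    return errors


def validate_lines(lines):
    """Collect the errors for every record, numbering lines from 1."""
    errors = []
    for line_no, line in enumerate(lines, start=1):
        errors.extend(validate_line(line.rstrip("\n"), line_no))
    return errors
```

Passing an open file object to `validate_lines` would check a whole input file before it reaches the application.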

Post your question and sample data on and you will
probably have useful suggestions and even sample code.
 
