Can I use XML as an article database?

Alvin SIU · May 28, 2007

Hi all,

I am new to XML.
I hope an expert can point me in the right
direction on this topic.

I have many articles, all of them text files.
They are stored in many directories, according to their topic.

Using this method, I can easily classify the articles by topic.
But I cannot classify them by author or by date.
So a 'directory' is not a good method.

If I put the articles into a database,
I can easily add additional columns (e.g. Author, Date of Publication,
etc.) to each article.

Then I can easily sort by author or by date.

But using a database seems to be quite troublesome.

I wonder whether I can convert each article text file into an XML file
with, for example,
the following tags:
<author>xxx</author>
<date>yyyy-mm-dd</date>
<essay>The original article contents</essay>

Then, put all the XML files under a directory.
Then, use 'something' to search this directory.
Then, I can easily get a list sorted by author, by date, or by anything else.

Now, my questions are:

Q1. Is this method feasible?

Q2. Is this a correct way of using XML?
(What I mean is: is XML designed for this use?)

Q3. Has anything in the world already done this?
If yes, please point me to it.

Q4. Is there anything related to this situation?
If yes, please give me some keywords
so that I can continue searching the net.
I used the keywords: XML +document +index
but could not find what I wanted.

Thanks in advance for your expert advice.
Alvin SIU

Pavel Lepin · May 28, 2007

Alvin SIU said:
I have many articles, all of them text files.
They are stored in many directories, according to their
topic.

If I put the articles into a database,
I can easily add additional columns (e.g. Author, Date of
Publication, etc.) to each article.

Then I can easily sort by author or by date.

But using a database seems to be quite troublesome.

Troublesome? I'm not sure what you mean. A database seems
like the only sensible way to go, whether it's an XML
database, a more traditional tuple-based RDBMS, or something
else that has 'database' in its name. Because, whether you
realize it or not, what you describe *is* a database.

I wonder whether I can convert each article text file into
an XML file with, for example,
the following tags:
<author>xxx</author>
<date>yyyy-mm-dd</date>
<essay>The original article contents</essay>

Then, put all the XML files under a directory.

Right. Concealing the databaseness of your task behind the
familiar concepts of a filesystem won't make The Database go
away. For that matter, any filesystem is a specialised
database.

Then, use 'something' to search this directory.

'Something' is called XQuery. You stuff your XML data into
an XML database, then use XPath/XQuery/XSLT/whatever else
to access it.
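
To give a feel for what such a query looks like, here is a minimal sketch in Python. It uses the small XPath subset built into the standard library's xml.etree.ElementTree rather than a real XML database or a full XQuery engine, so treat it only as an illustration. The <articles> and <article> wrapper elements and the sample data are assumptions; the inner tags are the ones proposed in the question. In XQuery the same lookup would be roughly: for $a in /articles/article where $a/author = 'Alice' order by $a/date return $a/date

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data. The <articles>/<article> wrappers are assumptions;
# <author>, <date> and <essay> follow the tags proposed in the question.
SAMPLE = """
<articles>
  <article>
    <author>Alice</author>
    <date>2007-05-01</date>
    <essay>First article text</essay>
  </article>
  <article>
    <author>Bob</author>
    <date>2007-04-15</date>
    <essay>Second article text</essay>
  </article>
  <article>
    <author>Alice</author>
    <date>2007-03-20</date>
    <essay>Third article text</essay>
  </article>
</articles>
"""


def dates_by_author(xml_text, author):
    """Return the sorted publication dates of all articles by `author`."""
    root = ET.fromstring(xml_text)
    # ElementTree's XPath subset supports [tag='text'] predicates.
    # Note: this simple formatting does not handle quotes inside `author`.
    matches = root.findall(f"./article[author='{author}']")
    # ISO dates (yyyy-mm-dd) sort correctly as plain strings.
    return sorted(article.findtext("date") for article in matches)


print(dates_by_author(SAMPLE, "Alice"))  # ['2007-03-20', '2007-05-01']
```

A real XML database adds indexing and a complete query language on top of this idea.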

Q1. Is this method feasible?

Not as you described. But if you replace 'directory'
with 'XML database' and 'something' with 'XQuery', it is.

Q2. Is this a correct way of using XML?
(What I mean is: is XML designed for this use?)

XML is designed to represent structured data. XML
databases are designed to store and access structured data
represented as XML. XQuery is designed to query structured
data represented as XML.

Q3. Has anything in the world already done this?
If yes, please point me to it.

IBM's DB2 9 Express-C. Alternatively, you might want to
google for XML databases.

Andy Dingley · May 29, 2007

Q1. Is this method feasible?

As an example or as working code?

You can certainly do it, but performance for retrieving articles will
be terrible.
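
As a rough illustration of the "directory of XML files" approach, here is a minimal Python sketch. It assumes one XML file per article with a hypothetical <article> root element around the tags from the question; the file names and sample data are made up. Note that every listing has to open and parse every file, which is where the poor performance comes from once the collection grows.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def load_articles(directory):
    """Parse every *.xml file in `directory` and collect its metadata."""
    articles = []
    for path in sorted(Path(directory).glob("*.xml")):
        root = ET.parse(str(path)).getroot()
        articles.append({
            "file": path.name,
            "author": root.findtext("author", default=""),
            "date": root.findtext("date", default=""),
        })
    return articles


def sorted_by(articles, key):
    """Return the articles sorted by one metadata field."""
    return sorted(articles, key=lambda article: article[key])


def write_samples(directory):
    """Create three hypothetical article files for the demonstration."""
    samples = {
        "a.xml": ("Carol", "2007-05-01"),
        "b.xml": ("Alice", "2007-04-15"),
        "c.xml": ("Bob", "2007-03-20"),
    }
    for name, (author, date) in samples.items():
        Path(directory, name).write_text(
            f"<article><author>{author}</author>"
            f"<date>{date}</date><essay>Body text</essay></article>",
            encoding="utf-8",
        )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_samples(tmp)
        # The whole directory is re-read for each listing.
        for article in sorted_by(load_articles(tmp), "date"):
            print(article["date"], article["author"], article["file"])
```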

Q2. Is this a correct way of using XML?
(What I mean is: is XML designed for this use?)

XML is a data format primarily for exchanging documents. Once they're
retrieved, store them in some sort of database.

For your example here, the obvious technology to use is an SQL
database. It's not a perfect choice, but it's very accessible to you.
Anyone can easily get hold of MySQL or Access-like database engines.
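
To show how little is involved, here is a minimal sketch of the SQL approach. It uses SQLite through Python's built-in sqlite3 module purely because it needs no server; that is a substitution for the engines named above, and the table name, column names and sample rows are all assumptions.

```python
import sqlite3

# Hypothetical sample rows: (topic, author, published, body).
ROWS = [
    ("XML", "Alice", "2007-05-01", "text one"),
    ("Databases", "Bob", "2007-04-15", "text two"),
    ("XML", "Carol", "2007-03-20", "text three"),
]


def build_db(rows):
    """Create an in-memory database holding one row per article."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE articles (
               id        INTEGER PRIMARY KEY,
               topic     TEXT,
               author    TEXT,
               published TEXT,  -- ISO date, yyyy-mm-dd
               body      TEXT
           )"""
    )
    conn.executemany(
        "INSERT INTO articles (topic, author, published, body) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def list_by(conn, column):
    """List (author, published, topic) ordered by the chosen column."""
    # Column names cannot be bound as parameters, so whitelist them.
    if column not in {"topic", "author", "published"}:
        raise ValueError(f"cannot sort by {column!r}")
    cursor = conn.execute(
        f"SELECT author, published, topic FROM articles ORDER BY {column}, id"
    )
    return cursor.fetchall()


conn = build_db(ROWS)
for row in list_by(conn, "published"):
    print(row)
```

Sorting by topic, author or date is then just a different ORDER BY, which is exactly what the directory layout could not give you.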

Q3. Has anything in the world already done this?
If yes, please point me to it.

About a squillion things already!

You should probably read up on:

Dublin Core (especially on this)
Metadata
OAI
RSS 1.0 / Atom syndication formats
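
To make the Dublin Core suggestion a little more concrete, here is a minimal Python sketch that tags an article's metadata with elements from the Dublin Core element set (title, creator, date, subject) instead of home-made tag names. The <record> wrapper element and the sample values are assumptions for illustration only; real applications usually embed these elements in a larger format such as OAI or RSS/Atom records.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dc_record(title, creator, date, subject):
    """Build a metadata record using Dublin Core element names."""
    # <record> is a hypothetical wrapper, not part of Dublin Core.
    record = ET.Element("record")
    fields = (
        ("title", title),
        ("creator", creator),  # Dublin Core's name for the author
        ("date", date),
        ("subject", subject),  # roughly the "topic" directory
    )
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


rec = dc_record("An example article", "Alice", "2007-05-01", "XML")
print(ET.tostring(rec, encoding="unicode"))
```

Using a shared vocabulary like this means other tools already know that dc:creator is an author and dc:date is a date.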

You can do this in XML, although XML has restrictions that become a
real nuisance for big systems.

One of your problems isn't the storage and querying of your data, it's
the issue of "vocabularies". As your system grows bigger and more
interested in inter-working with other systems, you start to care
about identifying "authors" such that "Douglas Adams" is the guy who
wrote "Health Monitoring of Structural Materials and Components", not
the guy with the towel obsession (follow the link - even the mighty
Amazon have got this one wrong).
<http://www.amazon.co.uk/exec/obidos/ASIN/0470033134/codesmiths>

This itself is a big topic! (with much work going on within it). You
might find yourself using techniques like XML Schema or even OWL to
list these. It also starts to hit the limits of XML, and you might
find RDF more useful to you.
