Searching a structured text database

  • Thread starter Michael Friendly

Michael Friendly

I have a LaTeX document composed of historical items with structured
fields (on the history of data visualization,
http://www.math.yorku.ca/SCS/Gallery/milestone/)

I'd like to create a web-based facility to provide searching of these
items. As a first step, I've written a Perl script to translate the
LaTeX stuff into various formats: tagged, CSV, HTML, XML. But I don't
know how to choose a data format and appropriate software tools to
accomplish this most easily.

There's a bewildering array of Perl modules for databases, XML, etc.,
but I'm not sure what would be most useful in this context. Can anyone
help point me in useful directions? I'm doing this on a Debian Linux
system, and a solution involving software other than Perl is possible.

For example, the tagged format looks like this:

KEY: Ptolemy150
YEAR: c. 150
WHAT: Map projections of a spherical earth and use of latitude and
longitude to characterize position (first display of longitude)
WHO: Claudius Ptolemy
WHERE: Alexandria, Egypt
TXT: http://portico.bl.uk/exhibitions/maps/ptolemy.html::Ptolemy's
world map, description and high-res image::
TXT:
http://www-groups.dcs.st-and.ac.uk/~history/Mathematicians/Ptolemy.html::Ptolemy
history::
PIC: /SCS/Gallery/images/portraits/ptolemy.gif::ptolemy, portrait
from ca. 1400 (90 x 109; 9K)::
FIG: /SCS/Gallery/images/ptolemy-map.jpg::ptolemy's world map,
republished in 1482 (640 x 496; 40K)::
ADD: 11/22/00

and the XML format like this (I have a basic DTD):

<hdbitem key="Ptolemy150" added="11/22/00">
<keywords>latitude,longitude,projection,map!projection</keywords>
<description>Map projections of a spherical earth and use of latitude
and longitude to characterize position (first display of longitude)
</description>
<authors>
<who first="Claudius" last="Ptolemy" lived="c. 85--c. 165">Claudius
Ptolemy</who>
</authors>
<date from="c. 150" to="c. 150">c. 150</date>
<where>Alexandria, Egypt</where>
<commentary url="http://portico.bl.uk/exhibitions/maps/ptolemy.html"
text="Ptolemy's world map, description and high-res image" />
<commentary
url="http://www-groups.dcs.st-and.ac.uk/~history/Mathematicians/Ptolemy.html"
text="Ptolemy history" />
<figure type="portrait"
url="/SCS/Gallery/images/portraits/ptolemy.gif" height="109" width="90"
size="9K">
<caption>Ptolemy, portrait from ca. 1400</caption>
</figure>
<figure type="figure" url="/SCS/Gallery/images/ptolemy-map.jpg"
height="496" width="640" size="40K">
<caption>Ptolemy's world map, republished in 1482</caption>
</figure>
</hdbitem>
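
For illustration only, here is a minimal sketch of searching items in the XML format above with Python's standard library (the post allows non-Perl tools). It assumes the items are wrapped in a single root element, here called <hdb>, which is a made-up name not given in the post; the sample item is shortened, and search_items is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the item from the post. The <hdb> root element is an
# assumption: the post does not say how items are wrapped in the real file.
SAMPLE = """<hdb>
<hdbitem key="Ptolemy150" added="11/22/00">
<keywords>latitude,longitude,projection,map!projection</keywords>
<description>Map projections of a spherical earth and use of latitude
and longitude to characterize position (first display of longitude)
</description>
<authors>
<who first="Claudius" last="Ptolemy" lived="c. 85--c. 165">Claudius
Ptolemy</who>
</authors>
<date from="c. 150" to="c. 150">c. 150</date>
<where>Alexandria, Egypt</where>
</hdbitem>
</hdb>"""


def search_items(root, term):
    """Return keys of items whose element text contains term (case-insensitive)."""
    term = term.lower()
    hits = []
    for item in root.iter("hdbitem"):
        # Gather all text inside the item and collapse line-wrap whitespace.
        text = " ".join("".join(item.itertext()).split()).lower()
        if term in text:
            hits.append(item.get("key"))
    return hits


root = ET.fromstring(SAMPLE)
print(search_items(root, "longitude"))
```

This scans every item on each query and only looks at element text, not attribute values such as the commentary text. That may be fine for a few hundred items, but it is a plain substring match and not an index.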



--
Michael Friendly Email: (e-mail address removed)
Professor, Psychology Dept.
York University Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT M3J 1P3 CANADA
 
James Willmore

Michael Friendly said:
I have a LaTeX document composed of historical items with structured
fields (on the history of data visualization,
http://www.math.yorku.ca/SCS/Gallery/milestone/)

I'd like to create a web-based facility to provide searching of these
items. As a first step, I've written a Perl script to translate the
LaTeX stuff into various formats: tagged, CSV, HTML, XML. But I don't
know how to choose a data format and appropriate software tools to
accomplish this most easily.

There's a bewildering array of Perl modules for databases, XML, etc., but
I'm not sure what would be most useful in this context. Can anyone help
point me in useful directions? I'm doing this on a Debian Linux system,
and a solution involving software other than Perl is possible.

[ ... ]

You might want to think about parsing the documents and putting them into
a database instead of using the XML files *as* a database.
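
As a rough sketch of that idea (not code from anyone in this thread), the following Python parses the tagged format and loads it into an in-memory SQLite database. The table layout, the column names, and the helpers parse_tagged, load and search are all assumptions made for illustration. It also assumes that a KEY line starts each record and that an untagged line continues the previous field.

```python
import re
import sqlite3

# One record in the tagged format from the original post (shortened).
TAGGED = """\
KEY: Ptolemy150
YEAR: c. 150
WHAT: Map projections of a spherical earth and use of latitude and
longitude to characterize position (first display of longitude)
WHO: Claudius Ptolemy
WHERE: Alexandria, Egypt
ADD: 11/22/00
"""

# A field line is an upper-case tag, a colon, then the value.
TAG_LINE = re.compile(r"^([A-Z]+):\s*(.*)$")


def parse_tagged(text):
    """Parse tagged text into a list of dicts, one dict per KEY record."""
    records, current, last_tag = [], None, None
    for line in text.splitlines():
        m = TAG_LINE.match(line)
        if m:
            tag, value = m.groups()
            if tag == "KEY":
                current = {}
                records.append(current)
            if current is None:
                continue  # ignore anything before the first KEY line
            # Repeated tags (e.g. several TXT lines) are joined with " | ".
            current[tag] = f"{current[tag]} | {value}" if tag in current else value
            last_tag = tag
        elif line.strip() and current is not None and last_tag:
            # Assumption: an untagged line continues the previous field.
            current[last_tag] = f"{current[last_tag]} {line.strip()}".strip()
    return records


def load(conn, records):
    """Create the (hypothetical) items table and insert the parsed records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items "
        "(item_key TEXT PRIMARY KEY, year TEXT, what TEXT, who TEXT, place TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO items VALUES (?, ?, ?, ?, ?)",
        [(r.get("KEY"), r.get("YEAR"), r.get("WHAT"), r.get("WHO"), r.get("WHERE"))
         for r in records],
    )


def search(conn, term):
    """Substring search over description and author; LIKE ignores ASCII case."""
    like = f"%{term}%"
    rows = conn.execute(
        "SELECT item_key FROM items WHERE what LIKE ? OR who LIKE ? "
        "ORDER BY item_key",
        (like, like),
    )
    return [key for (key,) in rows]


conn = sqlite3.connect(":memory:")
load(conn, parse_tagged(TAGGED))
print(search(conn, "longitude"))
```

The query uses bound parameters, so a search term typed into a web form is never pasted into the SQL text. The wildcard characters % and _ inside a term are not escaped in this sketch.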

XML::Simple might work for you. However, I don't use XML that much (right
now, at least :) ).

You could try looking over the various XML modules and see which might fit
the bill for you.

http://search.cpan.org/

HTH

--
Jim

Copyright notice: all code written by the author in this post is
released under the GPL. http://www.gnu.org/licenses/gpl.txt
for more information.

 
ctcgag

Michael Friendly said:
I have a LaTeX document composed of historical items with structured
fields (on the history of data visualization,
http://www.math.yorku.ca/SCS/Gallery/milestone/)

I'd like to create a web-based facility to provide searching of these
items. As a first step, I've written a Perl script to translate the
LaTeX stuff into various formats: tagged, CSV, HTML, XML. But I don't
know how to choose a data format and appropriate software tools to
accomplish this most easily.

I think that you are starting at the wrong end. What do you want the
interface to look like and do, and what tools, if any, do you have in mind
for building it? How many hits per minute do you want to support?
I would make those decisions first, and only then look into the data
storage format.

Xho
 
