HTML/XML Parsing...

Ruby Tuesday

I'm wondering if anyone has ever come across an example of how to parse an
HTML table (with images) using either XSLT or Ruby scripts.

I'd like to be able to extract all the data and put it in a
database (MySQL, SQLite, etc.).

There's a twist, though: some of the image cells have 2 or more JPEG images
instead of one. Since I'm not an expert database designer, how do you model
that?

Table fields:
------------
xnum text(40) unique
image jpeg image (may have none, or 1+ images)
desc memo(256)
loc memo(256)

Thanks.
 
dhtapp

My first suggestions would be to:
1. make a to-many relationship from each table row to its image records,
2. store any IMG attributes parsed from the HTML in the image records
themselves (perhaps with an ordering attribute within the to-many set, in case
sliced images are butted up against each other for positioning), and
3. decide whether to store the images themselves on a filesystem with
pathnames in the records, or to store the image data as BLOBs within the
records themselves.
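
For what it's worth, points 1 and 2 could be sketched in Ruby roughly as follows. This is only an illustration under stated assumptions: the sample markup, the column order (xnum, image, desc, loc) and the method names sample_html and parse_rows are all made up here, and the regex scanning only copes with simple, regular markup. For real-world HTML a proper parser would be the safer choice.

```ruby
# Hypothetical sample page; the real table layout may differ.
def sample_html
  <<~PAGE
    <table>
      <tr>
        <td>X-001</td>
        <td><img src="a1.jpg" width="40" alt="front"><img src="a2.jpg" width="40"></td>
        <td>First item</td>
        <td>Shelf 3</td>
      </tr>
      <tr>
        <td>X-002</td>
        <td></td>
        <td>Second item</td>
        <td>Shelf 7</td>
      </tr>
    </table>
  PAGE
end

# Returns one hash per <tr>. The image cell becomes an ordered list of hashes,
# one per <img>, holding its attributes plus a 1-based "position".
def parse_rows(html)
  html.scan(%r{<tr[^>]*>(.*?)</tr>}mi).map do |(row)|
    cells = row.scan(%r{<td[^>]*>(.*?)</td>}mi).flatten
    xnum, image_cell, desc, loc = cells  # assumed column order
    images = image_cell.to_s.scan(/<img\b([^>]*)>/i).each_with_index.map do |(attrs), i|
      # Collect every name="value" attribute found on the IMG tag.
      attrs.scan(/([\w-]+)\s*=\s*"([^"]*)"/).to_h.merge("position" => i + 1)
    end
    { xnum: xnum.to_s.strip, desc: desc.to_s.strip, loc: loc.to_s.strip, images: images }
  end
end
```

Each entry in :images would then map to one image record, with "position" serving as the ordering attribute within the set.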

If you need to retrieve the images through a non-HTTP pipeline into another
process, then BLOBs may be the way to go. If I were simply going to generate
dynamic HTML, then I'd probably go ahead and put the images out on a
filesystem where both the database and the web server could get to 'em.
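
A minimal sketch of the filesystem option, in Ruby: write the image bytes to disk and keep only the relative path in the image record. The store_image name, the ".jpg" default and the idea of naming files after a hash of their contents are assumptions of this sketch, not something the approach requires.

```ruby
require "digest"
require "fileutils"

# Writes the raw image bytes somewhere under root and returns the relative
# path, which is what would go into the image record instead of a BLOB.
def store_image(root, bytes, ext = ".jpg")
  digest = Digest::SHA256.hexdigest(bytes)
  # Fan files out into subdirectories keyed by the first two hex digits.
  relative = File.join(digest[0, 2], digest + ext)
  target = File.join(root, relative)
  FileUtils.mkdir_p(File.dirname(target))
  # Identical bytes map to the same path, so an existing file is left alone.
  File.binwrite(target, bytes) unless File.exist?(target)
  relative
end
```

The root directory would be one that both the database-side script and the web server can read.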

- dan
 
Mark Hubbart

It's been a while since I worked with databases, but perhaps something
like this:

table 1: "cells"
- id int autoincrement primary_key
- xnum text(40) unique
- desc ...
- loc ...

table 2: "images"
- id int autoincrement primary_key
- cell_id int index
- image blob


That way, more than one image can be linked to the same cell_id. Then:

SELECT images.image, cells.xnum FROM images, cells WHERE images.cell_id = cells.id;

...to select a list of records containing two fields: image data and cell
numbers (assuming that's what the xnum is).
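
To make the shape of that result concrete, here is a small in-memory imitation of the join in plain Ruby. The sample rows and the method names are invented for illustration; no database is involved.

```ruby
# Hypothetical rows standing in for the "cells" table.
def sample_cells
  [{ id: 1, xnum: "X-001" }, { id: 2, xnum: "X-002" }]
end

# Hypothetical rows standing in for the "images" table: cell 1 has two
# images, cell 2 has none.
def sample_images
  [
    { id: 1, cell_id: 1, image: "jpeg bytes A" },
    { id: 2, cell_id: 1, image: "jpeg bytes B" }
  ]
end

# Mirrors the inner join: one result row per image that has a matching cell.
def join_images(cells, images)
  by_id = cells.to_h { |c| [c[:id], c] }
  images.filter_map do |img|
    cell = by_id[img[:cell_id]]
    { image: img[:image], xnum: cell[:xnum] } if cell
  end
end
```

With this data, join_images(sample_cells, sample_images) yields two rows, both for X-001. A cell with no images does not appear at all, so a LEFT JOIN starting from cells would be needed to list every cell.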

Alternatively, you could forgo the ids, and link images via xnums. But
I understand that using ids is the "right" way, whatever that means. :)
 
