SDN List & XML

A

Andy Dingley

Zamdrist said:
I wonder if it my be conceivable/practical to import on a regular
basis, hands-off, the XML version of this list directly from it's
source, the web, to a MS SQL database, in this case version 2K?

Of course.

(The rest is such a basic and uninteresting question that I leave its
actual implementation for the reader. Or else hire a suitable
contractor. If you've no budget and you're doing it from scratch, then
read up on RSS aggregators as useful example code.)
 
Z

Zamdrist

Andy said:
Of course.

(The rest is such a basic and uninteresting question that I leave its
actual implementation for the reader. Or else hire a suitable
contractor. If you've no budget and you're doing it from scratch, then
read up on RSS aggregators as useful example code.)

Thank you Andy. No, not asking anyone to do it for me, just looking for
a starting point.
 
L

lucm

Zamdrist said:
http://www.treas.gov/offices/enforcement/ofac/sdn/

I wonder if it my be conceivable/practical to import on a regular
basis, hands-off, the XML version of this list directly from it's
source, the web, to a MS SQL database, in this case version 2K?

Thoughts, ideas?

I see two different issues here: getting the XML, and importing it in
SQL2k.

First part is easy. The easier way I can figure out is to use a PHP
script (not a web page), from which you can download the XML. Then you
can call this script from a batch file, and call the batch file from a
CmdExec task in a DTS.

Here is a snippet that will get your file from the website. Save this
as yourfile.php.
(You need to have PHP installed, but you don't need to configure it in
a web server.)

# File begins here
<?
$handle = fopen("c:/SDN/sdn.xml", 'w');
fwrite($handle,
file_get_contents('http://www.treas.gov/offices/enforcement/ofac/sdn/sdn.xml'));
?>
# File ends here

To call the PHP script from a batch file:
php.exe yourfile.php

Make sure your PHP folder is in the system path.

The second issue is more tricky. With SQL2k there is no XML datatype,
so either you store the whole XML file in a varchar or text field, or
you use MXSML in a VBS to parse the content and you import in whatever
tables you have designed before.

Regards,
lucm
 
A

Andy Dingley

Zamdrist said:
Thank you Andy. No, not asking anyone to do it for me, just looking for
a starting point.

I hate to sound so negative here, because you've posted a very fair
question, but quite honestly this should either "just be obvious to
you", or else you should go talk to someone to whom it is already
obvious.
 
Z

Zamdrist

Andy said:
I hate to sound so negative here, because you've posted a very fair
question, but quite honestly this should either "just be obvious to
you", or else you should go talk to someone to whom it is already
obvious.

I appreciate your patience with my ignorance. I work extensively with
T-SQL, VB/VBA/VBScript...I'm not a complete moron, no you didn't say I
was.

Are you trying to say, it's as easy as say: File/Import?

I mean, there has to be a table definition somewhere first off right?
 
Z

Zamdrist

Andy said:
I hate to sound so negative here, because you've posted a very fair
question, but quite honestly this should either "just be obvious to
you", or else you should go talk to someone to whom it is already
obvious.

I'm looking into SQLXMLBulkLoad Com object right now, looks like what
I'll need to use.
 
Z

Zamdrist

Ok, I have some code and error for you :)

Some VBScript Code:

Dim objBulkLoad
Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad")
objBulkLoad.ConnectionString = "provider=SQLOLEDB;data
source=server\server;database=SDNList;Integrated Security=SSPI;"
objBulkLoad.SchemaGen = True
objBulkLoad.BulkLoad = False
objBulkLoad.Execute
"http://www.treas.gov/offices/enforcement/ofac/sdn/sdn.xsd",
"http://www.treas.gov/offices/enforcement/ofac/sdn/sdn.xml"
Set objBulkLoad = Nothing

Of course, that'll wrap terribly. But anyhow, the error: Schema:
relationship expected on 'publshInformation'

My goal at this point is only to import the schema, and once I have
that working, import the data.

Thoughts? Thanks.
 
A

Andy Dingley

Zamdrist said:
I'm looking into SQLXMLBulkLoad Com object right now, looks like what
I'll need to use.

Looks like a reasonable place to start. Of course it might not give
you the table structure you want, but consider loading into an ugly
scratch table, then fixing that through the SQL you're familiar with.

Some notes:

1. It's the web. The web is slow, broken and wrong, usually at least
one of these at any time. Get your content down into a local temp file
before trying to throw it at a bulk loader COM object. Even if it
_claims_ to load from any URL, it probably won't do so reliably. The
XMLHTTP COM object will be useful here.

2. Encoding and entiities. You shouldn't need to know about these,
until one day it all breaks.
2.1 Use UTF-8 for everything you make, and allow as many tools as
possible to sniff for encodings on incoming content - they're usually
good at it. Don't specify encodings explicitly unless you have -
they're a contract to get it right in the future too, and if you can't
guarantee this to work you're just writing tomorrow's bugs for
yourself.

2.2 You shouldn't need entities and shouldn't use them. If things
break because your supplier used them (depressingly common in the RSS
world) it's because they've assumed the pre-declaration of the HTML
entity set (typically accented foreign language characters).

3. I wouldn't use VBScript for _anything_, JavaScript / JScript beats
it every time (just for the error handling, if nothing else)

4. You need an XMLSchema to make things work with SQLXMLBulkLoad This
will affect both data typing and the inferred table layout.

5. As with any ETL, pull the thing in first to an unconstrained table,
then worry about setting up constraints, indexes and foreign keys
afterwards. Drop them if you have to before the import, then re-create
-- far too slow to do it row-by-row.

6. Assume it will break. Don't throw away the old data until you've
loaded the new data. This includes caching the incoming XML document,
validating it against the schema, checking it's not a simple (but
valid) error message saying "Service off-line today", loading it into
the DB (any old table), then loading it into the DB (the right tables,
structure, and fully indexed and keyed). _Any_ of these steps can
fail, and in a large decoupled near-real-time system like this, they
will do so from time to time.

7. Time your downloads. Time the intervals between them. Watch out for
whether the data is actually changing. Use some variant on an
exponential back-off algorithm so that you catch the changes promptly,
but don't hammer on their servers every few minutes (try measuring the
actual change rate and re-loading at twice this frequency) Maybe they
publish metadata suggest how often to try it, or they offer a
lightweight notification service to tell you when to grab the big one.

8. Log everything. Log when you tried, when it worked, when it failed,
and how long it took. If it takes twice as long to download at 3am,
then don't try and grab it at 3am! They're probably running some
cron'ed batch job at that time of night and you're hitting it. Making
reliable systems means caring about this level of trivia.

9. Be polite. Hammering a feed service's servers too much will tend to
get your IP blocked.
 
Z

Zamdrist

Andy said:
Looks like a reasonable place to start. Of course it might not give
you the table structure you want, but consider loading into an ugly
scratch table, then fixing that through the SQL you're familiar with.

Some notes:

Thank you Andy, looks like some good advice.

All I really wanna do is grab the data and slam it into SQL Server,
work with the data from there, I don't want to do anything fancy with
the XML. Doesn't look like SQLXMLBulkLoad will work for me as the feed
as an inline schema. I'll have to find another way.

Your point about javascript is well taken, and I've just been lazy
using vbscript.

Incidently MS Access 2003 sucks in this feed in a blink of an eye,
structure and data or just structure. Interesting, but ultimately not
useful as I would want to do directly to MS SQL Server.

Thank you for your help, advice & patience.
 
Z

Zamdrist

Here is what I've learned so far, and a review of what I wish to
accomplish.

First off what I want to accomplish: Import the SDN (Special Designated
Names) List from:
http://www.treas.gov/offices/enforcement/ofac/sdn/sdn.xml to SQL
Server. Preferrably 'as is' without having to modify the file at all.

The U.S. Treasury makes the xml file and an xsd file avialble also
here: http://www.treas.gov/offices/enforcement/ofac/sdn/index.shtml

It appears as though the xml file contains an in-line schema. Quite
obviously the xsd also provides a scheme definition. Because
SQLXMLBulkLoad *does not* support in-line schemas, I cannot use it that
functionality. I have searched exhausitively online for some clue as
how I could import this data into SQL Server as is.

Could someone *please* take a brief look at the xml & xsd, and let me
know if it's even possible to 'bulk' load this data as is, or if it has
to be copied locally first? Or what the recommendation is? I can write
DTS packages, stored procedures, vb or javascripts, including some .Net
also.

Additionally I have search Usenet, including this group, others too
have struggled with this, and this data in particular. Unfortunately
there do not appear to be any clear answers.

Thank you!
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,006
Messages
2,570,265
Members
46,860
Latest member
JeremiahCo

Latest Threads

Top