jkugler
Hello,
I am trying to load a large amount of data from XML files into a
MySQL database. The XML files are all in this format:
<?xml version="1.0" encoding="UTF-8"?>
<snp_submission xsi:schemaLocation="http://www.hapmap.org http://hapmap.cshl.org/xml-schema/2003-11-04/hapmap.xsd"
lsid="urn:LSID:ncbi.nlm.nih.gov:dbSNP/DCC_batch:34_chrom11_DELETE:005"
xmlns="http://www.hapmap.org"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<batch_info>
<class>DELETE</class>
<group lsid="urn:lsid:dcc.hapmap.cshl.org:LabgroupBSNP:1"/>
<date_created>2005-09-15</date_created>
<contact>
<name>Steve Sherry</name>
<email>[email protected]</email>
</contact>
</batch_info>
<snps>
<snp
lsid="urn:lsid:ncbi.nlm.nih.gov:dbSNP;refSNP:rs2461544:111">
<snp_class>
<bac-overlap/>
</snp_class>
<sequence>
<flank_5>caaaggaatataaatcattctattataaagatacatgcacAGGgctgggtgcagtggctcacacctgtaatccc
agcactttgggaggccaaggcgggtggatcacctgaggacaggagtttgagaccagcctagccaacatggggaaactccatctctactaaaaatacaaa
aattagccaggtatagtggtgcacacctgtaataccagctactttggaggctgaggcaggagaatcgctggaacccaggaggcagaggtcaaagtgagc
caagatcataccattgcactccagcctgggcaacaagagcaaaactccatcttaaaaaaatatatatatatacatatacatacatatatatacacatat
atatacatatatacagatattatatatgtaaatgtatatatatgtgtatatatatacacatatatatacatattataactacatatatatacacacaca
catacatatacatgcacacatatgtttattgcagcactatttacgatagaaaatacatggaatcctcccaaatgcccatcaatgatatattggataaag
aaaatgtgatatatattcaccatggaatactatgcagccgttaaaataaatgagatcatgttctttgcagggacatggatgaagctggaagccatcacc
ctcagcaaactaacacaggaaaagaaaaccaaacaccacatgttctcagtcgtaagagggagttgaacaatgagagcaaacacatggatacatggaggg
gaacaacacacaccagggcctctcagcgggacaggggtaggagaCCATCAGGACAaacacgtggatacatggaggggaacaacacacaccagggcctct
cagggggacagggggtaggagaccatcaagacaaacacgtggatacatggaggggaacaacacacaccagggcctctcagggggacagggggtaggaga
ccatcaggacaaacacgtggatacatggaggggaa</flank_5>
<variation>
<allele base="C"/>
<allele base="T"/>
</variation>
<flank_3>aacacacaccagggcctctcagggggacagggggtaggagaccatcaRgacaaacacgtggRtacatggagggg
aacaacacacaccagggcctctcagggggacggggggtagRagaccatcaggacaaatagctaatgcatgcagggcctcatacctaggtgatgggttga
tgggtgcagcaaaccaccatggcacacatttacctatgtatcaaacctaYactttctgcacgtgtatcccagaacataaaataaaatttaaaaaatata
taCACTGATTCATGATCTCCTttctctccttctgaaacactctttaaaactttttagcatttccccctctgtcttccatgtctcctaactacatgtttc
ttattttccatgtctttattcctgtgttcattttggatagccccttctgacctatattacagtttactagttcactcttcaactgcttctaacatacta
atattctgttaaaaccattcatttgggtttaaatttcaattatgttattctctatggacattctatttgttttcttttaatcttcttggccattctcta
gagtttcctgttccattatgatatttttaattttttgttttactttaaacatactaaatatagttattttattttattttctgtatctgatactttcaa
taactgcagtctttgctagtcttttttctgtgctcttgctcatagtttttttcatttgttttCATGATTagaaaaacagagagagaagaaggagagtaa
agggaggaggcggaggaggagaaaagaagaaagcagagaagaagggacagagaaaaaaaggaagTTGGTTCTAACGTTTCTCTAACAACTGGCTTCAGT
GAAACACTCCCACCTTGTGGATTTTTAGGTTATTGAAATTAACCAGTCTTCtgggtgcagcacaccaacatggcacatgtatacatatgtaacaaacct
gcactttgtgcacatgtaccctaaaacttaaagta</flank_3>
<length_class value="full-length"/>
</sequence>
<genomic_locations>
<genomic_location active="true" type="exact">
<assembly_version>ncbi_build34</assembly_version>
<coordinates chrom="11" start="60749" stop="60749"
strand="forward"/>
</genomic_location>
</genomic_locations>
<neighbour_snps>
<neighbour_snp
lsid="urn:lsid:ncbi.nlm.nih.gov:dbSNP;refSNP:rs2461545:101"/>
<neighbour_snp
lsid="urn:lsid:ncbi.nlm.nih.gov:dbSNP;refSNP:rs2461546:101"/>
<neighbour_snp
lsid="urn:lsid:ncbi.nlm.nih.gov:dbSNP;refSNP:rs2461547:101"/>
<neighbour_snp
lsid="urn:lsid:ncbi.nlm.nih.gov:dbSNP;refSNP:rs2686858:100"/>
</neighbour_snps>
</snp>
I am trying to extract the following information: the snp lsid, the
allele base values, and the coordinates chrom, start, and strand. I would
ideally like to use a Perl script to pull this data out and put it in
MySQL, but I have had no luck with XML::Parser, since the information I
am looking for is stored in the elements' attributes rather than in
their text.
If anyone has any ideas on how to get this data out of xml and into
MySQL, it would be much appreciated.
Thanks so much,
--James
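For reference, here is a minimal sketch of the extraction described above. It is written in Python with the standard library's xml.etree.ElementTree rather than Perl, so treat it as an illustration of the approach and not as the XML::Parser solution asked for. The function name iter_snp_rows, the shortened SAMPLE document, and the choice to join alleles with "/" are all assumptions made for the example. It streams the file with iterparse so large inputs are not loaded at once, and it reads the values from the attributes of each element.

```python
import io
import xml.etree.ElementTree as ET

# Default namespace declared on <snp_submission>; every tag is qualified by it.
NS = "{http://www.hapmap.org}"

def iter_snp_rows(source):
    """Yield (lsid, alleles, chrom, start, strand) for each <snp> element.

    Hypothetical helper: the row layout is an assumption, not a known schema.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != NS + "snp":
            continue  # <neighbour_snp> and other elements are skipped here
        alleles = [a.get("base") for a in elem.iter(NS + "allele")]
        coords = elem.find(
            f"{NS}genomic_locations/{NS}genomic_location/{NS}coordinates"
        )
        if coords is not None:
            yield (
                elem.get("lsid"),
                "/".join(alleles),
                coords.get("chrom"),
                int(coords.get("start")),
                coords.get("strand"),
            )
        # Release the children and text of the finished <snp>.
        elem.clear()

# Shortened stand-in for one of the files (flanking sequences omitted).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<snp_submission xmlns="http://www.hapmap.org">
<snps>
<snp lsid="urn:lsid:ncbi.nlm.nih.gov:dbSNP;refSNP:rs2461544:111">
<sequence>
<variation>
<allele base="C"/>
<allele base="T"/>
</variation>
</sequence>
<genomic_locations>
<genomic_location active="true" type="exact">
<assembly_version>ncbi_build34</assembly_version>
<coordinates chrom="11" start="60749" stop="60749" strand="forward"/>
</genomic_location>
</genomic_locations>
<neighbour_snps>
<neighbour_snp lsid="urn:lsid:ncbi.nlm.nih.gov:dbSNP;refSNP:rs2461545:101"/>
</neighbour_snps>
</snp>
</snps>
</snp_submission>
"""

rows = list(iter_snp_rows(io.BytesIO(SAMPLE)))
print(rows)
```

Each tuple could then be passed to a parameterized INSERT through whichever MySQL driver is in use (for example with the DB-API executemany method); the table and column names would depend on the schema, which is not given here. In Perl, the same idea should carry over to XML::Parser, whose Start handler receives the element name followed by the attribute name/value pairs.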