ProvoWallis
Hi,
I'm trying to write a script that will extract the value of an
attribute from an element using the attribute value of another element
as the basis for extraction.
For example, I have a pre-defined list of main sections. I want to
extract the id attribute of each form element and build a dictionary of
graphic ID and section number pairs, but only for the sections on that
list; id values from any section not on the list should be excluded.
I.e., I want to know the id values for the forms that appear in
sections 1 and 3 but not in 2.
Boiled down, my SGML looks something like this:
<main-section no="1">
<form id="graphic_1.tif">
<form id="graphic_2.tif">
<main-section no="2">
<form id="graphic_3.tif">
<main-section no="3">
<form id="graphic_4.tif">
<form id="graphic_5.tif">
<form id="graphic_6.tif">
This is what I have come up with on my own so far. My problem is that I
can't seem to pick up the value of the id attribute.
Any advice appreciated.
Greg
###
import os, re, csv
root = raw_input("Enter the path where the program should run: ")
fname = raw_input("Enter name of the CSV file containing the section numbers: ")
sgmlname = raw_input("Enter name of the SGML file to search: ")
print
given,ext = os.path.splitext(fname)
root_name = os.path.join(root,fname)
n = given + '.new'
outputName = os.path.join(root,n)
reader = csv.reader(open(root_name, 'r'), delimiter=',')
sections = []
for row in reader:
    sections.append(row[0])
inputFile = open(os.path.join(root,sgmlname), 'r')
illoList ={}
while 1:
    lines = inputFile.readlines()
    if not lines:
        break
    for line in lines:
        main = re.search(r'(?i)(?m)(?s)<main-section no=\"(\w+)\"', line)
        id = re.search(r'(?i)id=\"(.*?tif)\"', line)
        if main is not None and main.group(1) in sections:
            if id is not None:
                illoList[illo.group(1)] = main.group(1)
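
One possible reason the id is never picked up: in the sample, the main-section tag and the form tags sit on different lines, so `main` and `id` are never both matched for the same `line`, and the assignment (which also refers to an undefined name, `illo`) is never reached. A minimal sketch of one way around this is to remember the most recent section number while walking the lines. It assumes each tag sits on its own line, as in the sample; the names `collect_graphics`, `SECTION_RE` and `FORM_ID_RE` are made up for illustration, and the code is written for Python 3 rather than the Python 2 used in the post.

```python
import re

# Hypothetical patterns, modelled on the two regexes in the post.
SECTION_RE = re.compile(r'<main-section\s+no="(\w+)"', re.IGNORECASE)
FORM_ID_RE = re.compile(r'<form\s+id="([^"]*?tif)"', re.IGNORECASE)


def collect_graphics(lines, wanted_sections):
    """Map each form id to its section number, for wanted sections only.

    Assumes one tag per line, so a section tag always precedes its forms.
    """
    wanted = set(wanted_sections)
    current_section = None  # number of the most recent <main-section> seen
    graphics = {}
    for line in lines:
        section_match = SECTION_RE.search(line)
        if section_match:
            # Entering a new section: remember its number for later lines.
            current_section = section_match.group(1)
            continue
        form_match = FORM_ID_RE.search(line)
        if form_match and current_section in wanted:
            graphics[form_match.group(1)] = current_section
    return graphics


# The boiled-down SGML from the post, with sections 1 and 3 on the list.
sample = """<main-section no="1">
<form id="graphic_1.tif">
<form id="graphic_2.tif">
<main-section no="2">
<form id="graphic_3.tif">
<main-section no="3">
<form id="graphic_4.tif">
<form id="graphic_5.tif">
<form id="graphic_6.tif">""".splitlines()

result = collect_graphics(sample, ["1", "3"])
for graphic_id in sorted(result):
    print(graphic_id, result[graphic_id])
```

With the sample above, graphic_3.tif is skipped because section 2 is not on the list. In the original script, `sections` (read from the CSV file) could be passed as `wanted_sections` and the open SGML file as `lines`.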