Bulba!
OK. I have reworked this program (below) to use just the data
manipulation capabilities of Python (training was largely the
motivation). I've tried to manipulate the data with Python's own
data structures rather than in the typical explicit loops. One thing
that may not be entirely crazy in this, IMHO, is the attempt to use
the built-in capabilities of the language as much as possible instead
of doing it "manually".
Anyway, Python is the only language I've seen where this is possible
(apart from functional languages probably, but I have yet to get
unconventional enough to get my feet wet there).
Still, it was not as easy as I wish it were: I've had to use 3
types of data structures (dictionaries, sets, lists) and arguably 6
"stages" (marked in comments below) to get it done.
And it still dies when the source terms are not unique. And
I haven't figured out a way of producing a list of dictionaries
in which this particular key is unique across all the
dictionaries in the list.
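One way to get a list of dictionaries in which a given key is unique is to pass the rows through a dictionary keyed on that field. This is only a minimal sketch, written in current Python 3 syntax rather than the Python 2 used in the program below. The helper name `unique_by`, the keep-the-first-row policy, and the third sample row are assumptions made up for illustration.

```python
def unique_by(rows, key):
    """Return rows with repeated values of row[key] dropped, keeping the first."""
    seen = {}
    for row in rows:
        # setdefault stores the row only if this key value has not been seen yet
        seen.setdefault(row[key], row)
    # dicts keep insertion order (Python 3.7+), so the input order survives
    return list(seen.values())


rows = [
    {'TermID': '5', 'English': 'Administration', 'Polish': 'Zarzadzanie'},
    {'TermID': '4', 'English': 'System Administration',
     'Polish': 'Zarzadzanie systemem'},
    # hypothetical duplicate source term, not taken from the real data
    {'TermID': '9', 'English': 'Administration', 'Polish': 'Administracja'},
]
unique_rows = unique_by(rows, 'English')
```

Whether the first or the last duplicate should win depends on the data; keeping the last one would be a plain assignment into `seen` instead of `setdefault`.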
[Also, for some reason the advice by another poster, to
use:
oldl=list(orig)
instead of:
oldl=[x for x in orig]
...somehow didn't work. The first statement produced only empty
lists.]
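A possible explanation, offered as a guess rather than a diagnosis: a csv reader is a one-shot iterator, so whichever statement walks it first gets the rows, and any later pass over the same object gets nothing. Note that the commented-out lines in step 1 below build both lists from `orig`. The sketch uses current Python 3 and an in-memory sample; the `sample` text is made up for illustration.

```python
import csv
import io

# hypothetical two-row file in the same semicolon-separated layout
sample = ("TermID;English;Polish\r\n"
          "5;Administration;Zarzadzanie\r\n"
          "4;Testing;Testowanie\r\n")
reader = csv.DictReader(io.StringIO(sample), delimiter=';')

first = list(reader)   # consumes every row of the reader
second = list(reader)  # the reader is exhausted by now, so this is empty
```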
#---------Code follows-----------
import sys
import csv
from sets import Set as set
class excelpoldialect(csv.Dialect):
    delimiter=';'
    doublequote=True
    lineterminator='\r\n'
    quotechar='"'
    quoting=0
    skipinitialspace=False
epdialect=excelpoldialect()
csv.register_dialect('excelpol',epdialect)
try:
    ofile=open(sys.argv[1],'rb')
except IOError:
    print "Old file %s could not be opened" % (sys.argv[1])
    sys.exit(1)
try:
    tfile=open(sys.argv[2],'rb')
except IOError:
    print "New file %s could not be opened" % (sys.argv[2])
    sys.exit(1)
titles=csv.reader(ofile, dialect='excelpol').next()
orig=csv.DictReader(ofile, titles, dialect='excelpol')
transl=csv.DictReader(tfile, titles, dialect='excelpol')
cfile=open('cmpfile.csv','wb')
titles.append('New')
titles.append('RowChanged')
cm=csv.DictWriter(cfile,titles, dialect='excelpol')
cm.writerow(dict(zip(titles,titles)))
print titles
print "-------------"
# 1. creating lists of old & new translations
#oldl=list(orig)
#newl=list(orig)
oldl=[x for x in orig]
newl=[x for x in transl]
# oldl is a list of dictionaries like:
# [{'Polish': 'Zarzadzanie', 'TermID': '5',
#   'English': 'Administration'},
#  {'Polish': 'Zarzadzanie systemem', 'TermID': '4',
#   'English': 'System Administration'},
#  {'Polish': 'Testowanie', 'TermID': '5', 'English': 'Testing'}]
# 2. creation of intersection of sets of old and new English strings
#    to find the common source strings
oldeng=set([item['English'] for item in oldl])
neweng=set([item['English'] for item in newl])
matcheng = oldeng & neweng
# 3. eliminating items not containing the common source strings
oldl=[x for x in oldl if x['English'] in matcheng]
newl=[x for x in newl if x['English'] in matcheng]
# 4. sorting lists
oldl.sort(lambda a, b: cmp(a['English'], b['English']))
newl.sort(lambda a, b: cmp(a['English'], b['English']))
# 5. defining comparison function
def matchpol(old,new):
    retval={'TermID': old['TermID'], 'English': old['English'],
            'Polish': old['Polish'], 'New': new['Polish'], 'RowChanged': ''}
    if old['Polish'] != new['Polish']:
        retval['RowChanged']='CHANGED'
    return retval
# 6. Constructing list of target dictionaries
chglist=map(matchpol, oldl, newl)
# 7. Writing to a target file
cm.writerows(chglist)
# the end..
cfile.close()
ofile.close()
tfile.close()
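
The failure on non-unique source terms most likely comes from pairing the two sorted lists position by position: once one side has an extra row for the same English string, the pairs drift apart or one list runs out. A sketch of one alternative, in current Python 3, looks each old row up in a dictionary of new rows instead. The function name `compare_rows`, the last-duplicate-wins policy, and the sample rows are assumptions for illustration, not part of the program above.

```python
def compare_rows(oldl, newl):
    """Pair old and new rows by their 'English' value instead of by position."""
    # If the new list repeats an English string, the last such row wins.
    new_by_eng = {row['English']: row for row in newl}
    result = []
    for old in oldl:
        new = new_by_eng.get(old['English'])
        if new is None:
            continue  # source string missing from the new file: skip it
        changed = 'CHANGED' if old['Polish'] != new['Polish'] else ''
        result.append({'TermID': old['TermID'], 'English': old['English'],
                       'Polish': old['Polish'], 'New': new['Polish'],
                       'RowChanged': changed})
    return result


# hypothetical sample rows; 'Testing' appears twice in the old list
oldl = [
    {'TermID': '5', 'English': 'Administration', 'Polish': 'Zarzadzanie'},
    {'TermID': '4', 'English': 'Testing', 'Polish': 'Testowanie'},
    {'TermID': '6', 'English': 'Testing', 'Polish': 'Testowanie'},
]
newl = [
    {'TermID': '5', 'English': 'Administration', 'Polish': 'Administracja'},
    {'TermID': '4', 'English': 'Testing', 'Polish': 'Testowanie'},
    {'TermID': '7', 'English': 'Reports', 'Polish': 'Raporty'},
]
chglist = compare_rows(oldl, newl)
```

The resulting dictionaries carry the same keys as those returned by `matchpol`, so a list built this way could be handed to `cm.writerows` in the same manner. The set intersection and the two sorts would no longer be needed for the pairing itself.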