Bruce Eckel
Thanks! The stringcmp.py module not only has nicely-documented
function calls, but also gives examples of how to use difflib.
It has also occurred to me that many of my examples will have text
that is "position consistent" between the control sample and test
sample -- that is, the stuff that matches will often be in exactly the
same place. So I might just be able to march through
position-by-position and do a simple comparison. (But I had to ask the
question on the newsgroup before I could think of the answer myself.
Funny how that works).
Anyway, I now seem to have some good footholds.
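
A minimal sketch of the position-by-position idea described above, assuming the control and test samples are plain strings. The function names `positional_mismatches` and `positional_similarity` are hypothetical, chosen only for illustration. Padding the shorter sample with `None` is likewise an assumption about how a length difference should be scored.

```python
from itertools import zip_longest


def positional_mismatches(control, test):
    """Return (index, control_char, test_char) for each position that differs.

    If one sample is shorter, its missing positions are reported as None.
    """
    return [
        (i, c, t)
        for i, (c, t) in enumerate(zip_longest(control, test))
        if c != t
    ]


def positional_similarity(control, test):
    """Fraction of positions that match, measured against the longer sample."""
    longest = max(len(control), len(test))
    if longest == 0:
        return 1.0  # two empty samples are treated as identical
    return 1.0 - len(positional_mismatches(control, test)) / longest
```

This only works when matching text really does sit at the same offsets. A single inserted or deleted character shifts everything after it and makes the rest of the sample count as mismatched, which is the case difflib handles better.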
Bruce Eckel http://www.BruceEckel.com mailto:[email protected]
Contains electronic books: "Thinking in Java 3e" & "Thinking in C++ 2e"
Web log: http://www.mindview.net/WebLog
Subscribe to my newsletter:
http://www.mindview.net/Newsletter
My schedule can be found at:
http://www.mindview.net/Calendar
"The whole problem with the world is that fools and fanatics are always
so certain of themselves, and wiser people so full of doubts."
--Bertrand Russell
Python implementations of a range of such algorithms can be found in
Febrl - see section 9.2 of the manual:
http://datamining.anu.edu.au/projects/linkage.html#prototype_software
I suspect that a simple bigram comparison would meet your needs best. Or
just use the Python difflib module in the standard Python library which
implements the Ratcliff-Obershelp comparator.
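
The two suggestions above might be sketched roughly as follows. This is an illustration, not the Febrl code: the bigram comparison is assumed here to be a Dice coefficient over character bigrams, and the names `bigrams`, `bigram_similarity` and `difflib_similarity` are hypothetical. The difflib part uses only `difflib.SequenceMatcher` and its `ratio()` method from the standard library.

```python
from collections import Counter
from difflib import SequenceMatcher


def bigrams(text):
    """Count the overlapping two-character substrings of text."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def bigram_similarity(a, b):
    """Dice coefficient over character bigrams, from 0.0 to 1.0."""
    ba, bb = bigrams(a), bigrams(b)
    total = sum(ba.values()) + sum(bb.values())
    if total == 0:
        # Neither string is long enough to contain a bigram.
        return 1.0 if a == b else 0.0
    shared = sum((ba & bb).values())  # multiset intersection
    return 2.0 * shared / total


def difflib_similarity(a, b):
    """Similarity ratio from the standard library, from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()


if __name__ == "__main__":
    print(bigram_similarity("night", "nacht"))
    print(difflib_similarity("abcd", "bcde"))
```

The bigram score ignores where in the string a match occurs, while `SequenceMatcher` looks for matching blocks in order, so the two can rank the same pair of strings differently.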
--
Tim C
PGP/GnuPG Key 1024D/EAF993D0 available from keyservers everywhere
or at http://members.optushome.com.au/tchur/pubkey.asc
Key fingerprint = 8C22 BF76 33BA B3B5 1D5B EB37 7891 46A9 EAF9 93D0