Vlastimil Brom
Hi all,
I'd like to ask about the most reasonable (or recommended) way to
modify the functionality of a standard library module (if this is
recommended at all).
I'm using difflib.SequenceMatcher for character-wise comparisons of
texts; although this might not be a usual use case, the results
are fine for the given task. However, there were some corner cases
where the reported differences were clearly larger than needed. As it
turned out, this is due to a kind of special-casing of relatively
frequent ("popular") items; cf.
http://bugs.python.org/issue1528074#msg29269
http://bugs.python.org/issue2986
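The popularity test referred to above can be sketched in a few lines. This is a reimplementation that assumes the rule found in CPython's difflib source (only sequences of at least 200 items are checked, and an item counts as "popular" if it occurs more than len(b) // 100 + 1 times); details may differ between versions, and popular_items is a hypothetical name, not part of difflib:

```python
from collections import Counter


def popular_items(b):
    """Hypothetical re-implementation of difflib's "popular item" test.

    Assumed rule (from CPython's SequenceMatcher source): sequences shorter
    than 200 items are never checked; otherwise an item is popular if it
    occurs more than len(b) // 100 + 1 times.
    """
    n = len(b)
    if n < 200:
        return set()
    threshold = n // 100 + 1
    return {item for item, count in Counter(b).items() if count > threshold}


# 200 characters, so the threshold is 200 // 100 + 1 == 3 occurrences
sample = "a" * 4 + "b" * 3 + "c" * 193
print(sorted(popular_items(sample)))  # ['a', 'c']
```

For character-wise comparison of ordinary prose this means that common letters and the space character easily exceed the threshold once a text is a few hundred characters long, which fits the oversized differences described above.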
The solution (or workaround) for me was to modify the SequenceMatcher
class by adding another parameter, checkpopular=True, which controls
the behaviour of the __chain_b method accordingly. The possible
speed penalty with this optimisation turned off (checkpopular=False)
doesn't really matter here, and the comparison results are much better
for my use cases.
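Later Python versions (2.7.1 and 3.2 onward) added an autojunk parameter to SequenceMatcher, which appears to cover the same need as the checkpopular=False modification, without patching the module. A minimal sketch, assuming one of those versions; char_matcher is a hypothetical helper that only maps the checkpopular flag from this post onto autojunk, and bpopular is the documented attribute holding the items the heuristic discarded:

```python
import difflib


def char_matcher(a, b, checkpopular=True):
    # Hypothetical wrapper: checkpopular=False disables the popularity
    # heuristic via the standard autojunk parameter (Python 2.7.1+ / 3.2+).
    return difflib.SequenceMatcher(None, a, b, autojunk=checkpopular)


text = "a" * 100 + "xyz" * 50  # 250 characters; every character is frequent

print(sorted(char_matcher(text, text).bpopular))                      # ['a', 'x', 'y', 'z']
print(sorted(char_matcher(text, text, checkpopular=False).bpopular))  # []
```

With autojunk=False every character stays available for matching, at the cost of the speed optimisation the heuristic was meant to provide.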
However, I'd like to ask how best to maintain this modified
functionality in the source code.
I tried some possibilities, which seem to work, but I'd appreciate
suggestions on the preferred way in such cases.
- It is simply possible to have a modified source file difflib.py in
the script directory.
- Furthermore, one can subclass difflib.SequenceMatcher and override its
__chain_b method (however, the name doesn't look like a "public"
method...).
- I guess it wouldn't be recommended to directly replace
difflib.SequenceMatcher._SequenceMatcher__chain_b ...
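One caveat on the subclassing option: Python mangles double-underscore method names with the defining class's name. In the CPython source, SequenceMatcher.set_seq2 calls self.__chain_b(), which is compiled as self._SequenceMatcher__chain_b(), so a subclass method literally named __chain_b would never be called by the base class; only one named _SequenceMatcher__chain_b would be, and that name is a private implementation detail that may change between versions. A toy sketch of the mechanism with hypothetical classes (Base, NaiveOverride, MangledOverride are illustrative names, not difflib code):

```python
class Base:
    def __init__(self):
        self.log = []
        self.__helper()  # compiled as self._Base__helper()

    def __helper(self):
        self.log.append("Base.__helper")


class NaiveOverride(Base):
    def __helper(self):  # stored as _NaiveOverride__helper, so Base never calls it
        self.log.append("NaiveOverride.__helper")


class MangledOverride(Base):
    def _Base__helper(self):  # single leading underscore: not mangled again
        self.log.append("MangledOverride._Base__helper")


print(NaiveOverride().log)    # ['Base.__helper']
print(MangledOverride().log)  # ['MangledOverride._Base__helper']
```

So the subclass variant and the direct-replacement variant end up depending on the same mangled name; the subclass at least keeps the change local instead of altering difflib for every other user in the process.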
In all cases I have either a copy of the whole file or of the respective
function as part of my source.
I'd appreciate comments or suggestions on this, or perhaps other, better
approaches to this problem.
Thanks in advance,
vbr