Fuzzyman
I've written an anagram finder that produces anagrams from a
dictionary of words. The user can load their own dictionary.
( http://www.voidspace.org.uk/atlantibots/nanagram.html )
In order to ensure it is able to find anagrams properly I wanted to
strip characters like punctuation etc from words in the dictionary and
words the user entered. I test(ed) against the 26 English letters (
string.ascii_lowercase ).
I now have someone who wants to use a French dictionary, with words
containing accented characters! I have two choices: either map the
accented characters to their unaccented equivalents (slightly
inaccurate) or treat each accented character as a separate letter (very
few anagrams). However, at the moment I can't experiment with either,
because my default codec is 7-bit ASCII and the program crashes
(sometimes!) when it meets accented characters.
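For illustration, the two choices could be sketched roughly as below. This assumes Python 3 and words already held as Unicode strings; `strip_accents` and `anagram_key` are hypothetical helper names, not part of the anagram finder itself, and the standard `unicodedata` module does the accent handling.

```python
import string
import unicodedata

def strip_accents(word):
    # Decompose each character (NFD), then drop the combining marks,
    # so that e-acute becomes a plain "e".
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def anagram_key(word, fold_accents=True):
    # Hypothetical helper: words with equal keys are anagrams of each other.
    word = word.lower()
    if fold_accents:
        # Choice 1: map accented letters to their unaccented equivalents.
        # Caveat: ligatures such as "oe" (U+0153) do not decompose and
        # are silently dropped by the ASCII filter in this sketch.
        word = strip_accents(word)
        letters = [ch for ch in word if ch in string.ascii_lowercase]
    else:
        # Choice 2: keep each accented letter as a distinct letter.
        word = unicodedata.normalize("NFC", word)
        letters = [ch for ch in word if ch.isalpha()]
    return "".join(sorted(letters))
```

With accents folded, "\u00e9t\u00e9" and "tee" share a key and count as anagrams; with `fold_accents=False` they do not. Punctuation is stripped in both modes.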
Has anyone any advice, or can anyone point me to any resources, for
handling these characters effectively? I guess it's a latin-1 encoding
I want to use... I can't even work out how to change the default
codec.
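One common alternative to changing the default codec is to decode the dictionary bytes explicitly when loading them. The sketch below assumes Python 3 and a one-word-per-line file; the latin-1 encoding is only the guess made above, and `decode_words` and `load_words` are hypothetical names.

```python
def decode_words(raw, encoding="latin-1"):
    # Turn raw bytes into Unicode text using an explicit codec,
    # instead of relying on the interpreter's default encoding.
    text = raw.decode(encoding)
    return [line.strip() for line in text.splitlines() if line.strip()]

def load_words(path, encoding="latin-1"):
    # Read the file as bytes and decode in one place, so the encoding
    # (an assumption here) is easy to change for another dictionary.
    with open(path, "rb") as handle:
        return decode_words(handle.read(), encoding)
```

Latin-1 maps every byte to a character, so decoding never raises an error, but a file in a different encoding would yield wrong letters rather than a crash.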
Thanks,
Fuzzy
http://www.voidspace.org.uk/atlantibots/pythonutils.html