I looked for "VAV" in the files in the "encodings" directory
(/usr/lib/python2.4/encodings/*.py on my machine). I found that the following
character encodings seem to include hebrew characters:
cp1255
cp424
cp856
cp862
iso8859-8
A file containing hebrew text might be in any one of these encodings, or
any unicode-based encoding.
To open an encoded file for reading, use
f = codecs.open(file, 'r', encoding='...')
Now, calls like 'f.readline()' will return unicode strings.
Here's an example, using a file in UTF-8 I have laying around:...
u'UTF-8 encoded sample plain-text file\n'
u'\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\n'
u'\n'
u'Markus Kuhn [\u02c8ma\u02b3k\u028as ku\u02d0n] <
[email protected]> \u2014 1999-08-20\n'
u'\n'
Jeff
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
iD8DBQFDU7SmJd01MZaTXX0RAsKzAJsFV94dRovEucFI0lzmrmjduiYsmQCfX7/F
NZ1jDK/UudrQmYgxFE/Ur0k=
=J63I
-----END PGP SIGNATURE-----