Reading a Hebrew text file


hagai26

I have a Hebrew text file, which I want to read in Python.
I don't know which encoding I need to use or how to do that.

thanks,
hagai
 

Alex Martelli

I have a Hebrew text file, which I want to read in Python.
I don't know which encoding I need to use or how to do that.

As for the "how", look to the codecs module -- but if you don't know
which codec the text file is written in, I know of no way to guess from
here!-)


Alex
 

jepler

I looked for "VAV" in the files in the "encodings" directory
(/usr/lib/python2.4/encodings/*.py on my machine). I found that the following
character encodings seem to include Hebrew characters:
cp1255
cp424
cp856
cp862
iso8859-8
A file containing Hebrew text might be in any one of these encodings, or
in any Unicode-based encoding.
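
The search described above can also be approximated from Python itself, by asking each codec whether it can encode the Hebrew letter VAV (U+05D5). This is only a sketch, written for Python 3 rather than the Python 2.4 used in this thread; the helper name can_encode is hypothetical and the candidate list is simply the one given above.

```python
# Hebrew letter VAV, the character searched for in the codec tables.
VAV = "\u05d5"

# Candidate codec names, copied from the list in the post above.
CANDIDATES = ["cp1255", "cp424", "cp856", "cp862", "iso8859-8"]


def can_encode(codec_name, char):
    """Return True if the named codec can represent the given character.

    Hypothetical helper for illustration only.
    """
    try:
        char.encode(codec_name)
    except UnicodeEncodeError:
        return False
    return True


# Keep only the candidates that can actually represent VAV.
supported = [name for name in CANDIDATES if can_encode(name, VAV)]
print(supported)
```

A codec passing this check only shows that it covers Hebrew letters; it says nothing about which encoding a particular file was actually written in.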

To open an encoded file for reading, use
f = codecs.open(file, 'r', encoding='...')
Now, calls like 'f.readline()' will return unicode strings.

Here's an example, using a UTF-8 file I have lying around:...
u'UTF-8 encoded sample plain-text file\n'
u'\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\n'
u'\n'
u'Markus Kuhn [\u02c8ma\u02b3k\u028as ku\u02d0n] <[email protected]> \u2014 1999-08-20\n'
u'\n'
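
For a self-contained illustration of the same idea, the sketch below writes a small Hebrew file and reads it back with codecs.open. It assumes Python 3 and picks cp1255 purely for demonstration; the temporary file and the sample words are made up and are not the file used in the example above.

```python
import codecs
import os
import tempfile

# Assumed encoding for the demonstration; a real file may use a different one.
ENCODING = "cp1255"

# Two sample lines ("shalom" and "olam"), written with escapes so the source
# file itself stays plain ASCII.
lines = ["\u05e9\u05dc\u05d5\u05dd\n", "\u05e2\u05d5\u05dc\u05dd\n"]

# Create a scratch file to stand in for the Hebrew text file.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    # Write the text, encoding it to cp1255 bytes on the way out.
    with codecs.open(path, "w", encoding=ENCODING) as f:
        f.writelines(lines)

    # Read it back; readline() and read() return decoded text, not raw bytes.
    with codecs.open(path, "r", encoding=ENCODING) as f:
        first = f.readline()
        rest = f.read()
finally:
    os.remove(path)

print(repr(first), repr(rest))
```

In Python 3 the built-in open(path, encoding=...) does the same job, so codecs.open is mainly of interest for code that follows the Python 2 approach shown in this thread.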

Jeff

 

Fredrik Lundh

I have a Hebrew text file, which I want to read in Python.
I don't know which encoding I need to use

that's not a good start. but maybe it's one of these:

http://sites.huji.ac.il/tex/hebtex_fontsrep.html

?
how I do that

f = open(myfile)
text = f.readline()

followed by one of

text = text.decode("iso-8859-8")
text = text.decode("cp1255")
text = text.decode("cp862")

alternatively, use:

f = codecs.open(myfile, "r", encoding)

to get a stream that decodes things on the fly.
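
When the encoding is unknown, the "try one of these" step above can be roughly automated: decode the raw bytes with each candidate and see which result looks most like Hebrew. The sketch below assumes Python 3 (where the bytes read from a binary file have .decode); the scoring heuristic and the names hebrew_ratio and rank_encodings are illustrative assumptions, not an established detection method.

```python
# Candidate encodings taken from the post above.
CANDIDATES = ["iso-8859-8", "cp1255", "cp862"]


def hebrew_ratio(text):
    """Fraction of non-whitespace characters that are Hebrew letters
    (U+05D0 through U+05EA)."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    hebrew = sum(1 for c in chars if "\u05d0" <= c <= "\u05ea")
    return hebrew / len(chars)


def rank_encodings(raw, candidates=CANDIDATES):
    """Decode raw bytes with each candidate and rank by Hebrew-letter ratio."""
    results = []
    for name in candidates:
        try:
            text = raw.decode(name)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding
        results.append((hebrew_ratio(text), name))
    # Highest ratio first; the sort is stable, so ties keep candidate order.
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results


# Made-up sample: "shalom olam" encoded as cp862, standing in for file bytes.
sample = "\u05e9\u05dc\u05d5\u05dd \u05e2\u05d5\u05dc\u05dd".encode("cp862")
ranking = rank_encodings(sample)
print(ranking)
```

Note that iso-8859-8 and cp1255 place the Hebrew letters at the same byte values, so this heuristic cannot tell those two apart from the letters alone; it mainly separates them from cp862.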

</F>
 
