Reading a Hebrew text file

hagai26 · Oct 17, 2005

I have a Hebrew text file that I want to read in Python.
I don't know which encoding I need to use or how to do that.

thanks,
hagai

Alex Martelli · Oct 17, 2005

I have a Hebrew text file that I want to read in Python.
I don't know which encoding I need to use or how to do that.

As for the "how", look to the codecs module -- but if you don't know
what codec the textfile is written in, I know of no ways to guess from
here!-)

Alex

jepler · Oct 17, 2005

I looked for "VAV" in the files in the "encodings" directory
(/usr/lib/python2.4/encodings/*.py on my machine). I found that the following
character encodings seem to include Hebrew characters:
cp1255
cp424
cp856
cp862
iso8859-8
A file containing Hebrew text might be in any one of these encodings, or
in any Unicode-based encoding (such as UTF-8 or UTF-16).
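
The search described above can also be approximated from inside Python, without grepping the encodings directory. This is only a sketch, written for Python 3 rather than the 2.4 of the thread: the helper name encodings_supporting and the two extra candidates (latin-1, ascii) are my own additions for contrast, not anything from the standard library or the post.

```python
# Sketch: check which candidate codecs can encode HEBREW LETTER VAV (U+05D5).
VAV = "\u05d5"

def encodings_supporting(char, candidates):
    """Return the candidate codec names that can encode `char`."""
    supported = []
    for name in candidates:
        try:
            char.encode(name)
        except (UnicodeEncodeError, LookupError):
            # Either the codec has no mapping for the character,
            # or this Python build does not ship the codec at all.
            continue
        supported.append(name)
    return supported

# The five names from the list above, plus two non-Hebrew codecs for contrast.
CANDIDATES = ["cp1255", "cp424", "cp856", "cp862", "iso8859-8", "latin-1", "ascii"]
hebrew_capable = encodings_supporting(VAV, CANDIDATES)
print(hebrew_capable)
```

Being able to encode one letter only shows that a codec covers Hebrew; it says nothing about which codec a particular file was actually written in.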

To open an encoded file for reading, use
import codecs
f = codecs.open(filename, 'r', encoding='...')
Now, calls like f.readline() will return unicode strings.

Here's an example, using a file in UTF-8 I have lying around:...
u'UTF-8 encoded sample plain-text file\n'
u'\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\n'
u'\n'
u'Markus Kuhn [\u02c8ma\u02b3k\u028as ku\u02d0n] <[email protected]> \u2014 1999-08-20\n'
u'\n'
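
A comparable round trip with Hebrew text might look like the sketch below. It assumes Python 3, where the built-in open() takes an encoding argument and plays the same role as codecs.open; the sample word and the choice of cp1255 are illustrative only, since the encoding of the original poster's file is unknown.

```python
import os
import tempfile

# Illustrative sample: the word "shalom" (shin, lamed, vav, final mem).
sample = "\u05e9\u05dc\u05d5\u05dd\n"

# Write the sample as cp1255 bytes to a temporary file (assumed encoding).
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(sample.encode("cp1255"))
    path = tmp.name

try:
    # Reading with the matching encoding yields decoded text, not raw bytes.
    with open(path, "r", encoding="cp1255") as f:
        first_line = f.readline()
finally:
    os.remove(path)

print(repr(first_line))
```

If the encoding passed to open() does not match the one the file was written in, the read either raises UnicodeDecodeError or silently returns the wrong characters.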

Jeff


Fredrik Lundh · Oct 17, 2005

I have a Hebrew text file that I want to read in Python.
I don't know which encoding I need to use

that's not a good start. but maybe it's one of these:

http://sites.huji.ac.il/tex/hebtex_fontsrep.html

?

or how to do that

f = open(myfile)
text = f.readline()

followed by one of

text = text.decode("iso-8859-8")
text = text.decode("cp1255")
text = text.decode("cp862")

alternatively, use:

import codecs
f = codecs.open(myfile, "r", encoding)

to get a stream that decodes things on the fly.
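
When the encoding really is unknown, trying each candidate in turn can be automated. The following is a rough heuristic sketch in Python 3, not a reliable detector: the function name rank_hebrew_encodings and the scoring rule (count decoded characters in the Hebrew letter range U+05D0 to U+05EA) are my own assumptions.

```python
def rank_hebrew_encodings(raw, candidates=("iso-8859-8", "cp1255", "cp862")):
    """Decode `raw` bytes with each candidate and rank by Hebrew letter count."""
    scores = {}
    for name in candidates:
        try:
            text = raw.decode(name)
        except UnicodeDecodeError:
            # Some byte has no mapping in this codec: rule the candidate out.
            continue
        # Count characters in the basic Hebrew letter range (alef to tav).
        scores[name] = sum(1 for ch in text if "\u05d0" <= ch <= "\u05ea")
    # Highest score first; ties keep the order of `candidates`.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Illustrative input: the word "shalom" encoded as cp862 bytes.
raw = "\u05e9\u05dc\u05d5\u05dd".encode("cp862")
ranking = rank_hebrew_encodings(raw)
print(ranking)
```

Note that iso-8859-8 and cp1255 place the Hebrew letters at the same byte values, so for plain letters they will usually tie; telling them apart takes other clues, such as vowel points or punctuation that only cp1255 defines.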

</F>

hagai26 · Oct 18, 2005

Thank you very much.

hagai
