How to know if a file is a text file

L

Luca Fabbri

Hi all.

I'm looking for a way to be able to load a generic file from the
system and understand if he is plain text.
The mimetype module has some nice methods, but for example it's not
working for file without extension.

Any suggestion?
 
N

Nobody

I'm looking for a way to be able to load a generic file from the
system and understand if he is plain text.
The mimetype module has some nice methods, but for example it's not
working for file without extension.

Any suggestion?

You could use the "file" command. It's normally installed by default on
Unix systems, but you can get a Windows version from:

http://gnuwin32.sourceforge.net/packages/file.htm
 
C

Chris Rebert

You could use the "file" command. It's normally installed by default on
Unix systems, but you can get a Windows version from:

FWIW, IIRC the heuristic `file` uses to check whether a file is text
or not is whether it contains any null bytes; if it does, it
classifies it as binary (i.e. not text).

Cheers,
Chris
 
N

Nobody

FWIW, IIRC the heuristic `file` uses to check whether a file is text
or not is whether it contains any null bytes; if it does, it
classifies it as binary (i.e. not text).

"file" provides more granularity than that, recognising many specific
formats, both text and binary.

First, it uses "magic number" checks, checking for known signature bytes
(e.g. "#!" or "JFIF") at the beginning of the file. If those checks fail
it checks for common text encodings. If those also fail, it reports "data".

Also, UTF-16-encoded text is recognised as text, even though it may
contain a high proportion of NUL bytes.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,995
Messages
2,570,226
Members
46,815
Latest member
treekmostly22

Latest Threads

Top