Check for text file

Alin Popa

Hi guys,

After some research I still cannot find a way to tell whether a file is
plain text or binary. I want to detect plain text regardless of which
characters the file contains.
Is this possible in Ruby?

Thanks,

Alin
 
Alex Young

Alin said:
Hi guys,

After some research I still cannot find a way to tell whether a file is
plain text or binary. I want to detect plain text regardless of which
characters the file contains.
Is this possible in Ruby?

I think so, but it's a little unclear exactly what you're trying to
achieve. Do you have an example?
 
Alin Popa

Alex said:
I think so, but it's a little unclear exactly what you're trying to
achieve. Do you have an example?

I'm trying to do a search-and-replace across files, but I want to skip
archives and other binary files.
 
Alin Popa

Alin said:
I'm trying to do a search-and-replace across files, but I want to skip
archives and other binary files.

Of course, on Windows I can go by the file extension and ignore specific
ones (e.g. .exe, .zip, .jar, .rar, .anything_i_want), but I don't know
how to do it on Linux/Unix, where a file extension is not mandatory.
 
Robert Klemme

Alin said:
Of course, on Windows I can go by the file extension and ignore specific
ones (e.g. .exe, .zip, .jar, .rar, .anything_i_want), but I don't know
how to do it on Linux/Unix, where a file extension is not mandatory.

You could read the file (or portion of the file), create a histogram of
byte (or groups of bytes) occurrences and compare that to what you
expect for text files (e.g. most chars are "0-9a-zA-Z" and punctuation).
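A rough sketch of the histogram idea follows. The helper names, the 4096-byte sample size, and the 0.7 threshold are assumptions chosen for illustration, not something fixed by the suggestion itself; it also only treats printable ASCII plus tab, newline, and carriage return as "text".

```ruby
# Hypothetical helper: fraction of bytes in the sample that look like text.
def text_byte_ratio(sample)
  return 1.0 if sample.empty?

  # Histogram of byte values: byte => number of occurrences.
  histogram = Hash.new(0)
  sample.each_byte { |b| histogram[b] += 1 }

  # Sum the counts of printable ASCII plus tab, LF, and CR.
  textual = histogram.sum do |byte, n|
    printable = byte.between?(0x20, 0x7e) || [0x09, 0x0a, 0x0d].include?(byte)
    printable ? n : 0
  end
  textual.to_f / sample.bytesize
end

# Hypothetical helper: sample the start of the file and apply a threshold.
# sample_size and threshold are assumed defaults, tune them for your data.
def probably_text?(path, sample_size: 4096, threshold: 0.7)
  sample = File.binread(path, sample_size) || ""
  text_byte_ratio(sample) >= threshold
end
```

Reading with File.binread keeps the sample as raw bytes, so the check does not depend on the file's encoding being valid.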

You could also run the "file" command and parse its output.

Kind regards

robert
 
George Malamidis

Hello,

On a *nix system, you can do

file_type = `file my_file`
puts file_type

but this will not work on Windows.
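If you want something easier to parse than the free-form description, one option is to ask file for a MIME type. This is only a sketch: the helper names are made up for illustration, it assumes a file(1) that supports the --brief and --mime-type options, and some text-like formats may be reported under a non-"text/" type.

```ruby
require "open3"

# Hypothetical helper: decide from a MIME type string such as "text/plain".
def text_mime?(mime)
  mime.strip.start_with?("text/")
end

# Hypothetical helper: ask file(1) for the MIME type of path.
# Returns nil if the command fails or is not installed.
def mime_type_of(path)
  # Arguments are passed separately, so no shell quoting of path is needed.
  out, status = Open3.capture2("file", "--brief", "--mime-type", path)
  status.success? ? out.strip : nil
rescue Errno::ENOENT
  nil # file(1) is not on PATH
end
```

A caller could then skip a file unless mime_type_of(path) is non-nil and text_mime? returns true for it.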

George
 
Robert Klemme

George said:
Hello,

On a *nix system, you can do

file_type = `file my_file`
puts file_type

but this will not work on Windows.

robert@fussel ~
$ file .inputrc
.inputrc: ASCII English text

robert@fussel ~
$ uname -a
CYGWIN_NT-5.1 fussel 1.5.24(0.156/4/2) 2007-01-31 10:57 i686 Cygwin

:)

robert
 
Alin Popa

George said:
Hello,

On a *nix system, you can do

file_type = `file my_file`
puts file_type

but this will not work on Windows.

George

Thanks guys, your pointers solved the problem ;)

As for the file command, I can use it on Windows too, since the
GnuWin32 tools provide it :)

Best regards,

Alin
 
Daniel DeLorme

Nobuyoshi said:
You can use String#count:

def File.binary?(path)
  s = read(path, 4096) and
    !s.empty? and
    (/\0/n =~ s or s.count("\t\n -~").to_f/s.size<=0.7)
end

In any case, it doesn't work for non-ascii files.

Pedantic correction: it doesn't work for non-Western scripts. French uses
accents here and there but it would pass the test above.

Still, I have to say I was surprised; I didn't know that a hyphen in
String#count had the same effect as in a regexp character class. Talk
about an undocumented feature!

Daniel
 
Nobuyoshi Nakada

Hi,

At Wed, 20 Jun 2007 08:22:51 +0900,
Daniel DeLorme wrote in [ruby-talk:256241]:
Still, I have to say I was surprised; I didn't know that a hyphen in
String#count had the same effect as in a regexp character class. Talk
about an undocumented feature!

It's documented.

It can be
s.count("^\t\n -~").to_f/s.size>0.3
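For reference, a leading "^" in the String#count argument negates the set, so the expression above counts bytes outside tab, newline, and the range from space to "~". It is the complement of the earlier test, except at exactly 30% non-printable bytes, where "<= 0.7" and "> 0.3" disagree. A small sketch that works on a string sample rather than a path; the method name is made up for illustration:

```ruby
# Hypothetical helper mirroring the check above on an in-memory sample.
def binary_sample?(s)
  s = s.b                    # work on raw bytes, whatever the source encoding
  return false if s.empty?

  # A NUL byte, or more than 30% bytes outside tab/newline/printable ASCII.
  s.include?("\0") || s.count("^\t\n -~").to_f / s.size > 0.3
end

sample = "caf\u00e9 au lait".b
# The negated and plain sets partition the bytes of the sample.
total = sample.count("^\t\n -~") + sample.count("\t\n -~")
raise "counts should cover every byte" unless total == sample.size
```

An occasional accented character stays well under the threshold, while text made up mostly of multibyte characters would be flagged as binary, which matches the caveat raised earlier in the thread.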
 
