Document identification

M. Eteum

Dear Ruby Guru:
Is there a way to identify a document from its header? I have a
bunch of documents collected over the years from multiple platforms
(Mac, Windows, and various Unix/Linux variants), and some of them
do not have a file extension. Is there a list that tells us what header
to expect for certain document types, e.g. txt, rtf, pdf, jpg, mpg,
Word, Excel, Visio, etc.?

Thanks
 
Robin Stocker

M. Eteum said:
Is there a way to identify a document from its header? I have a
bunch of documents collected over the years from multiple platforms
(Mac, Windows, and various Unix/Linux variants), and some of them
do not have a file extension. Is there a list that tells us what header
to expect for certain document types, e.g. txt, rtf, pdf, jpg, mpg,
Word, Excel, Visio, etc.?

Hi,

On a Unix system you could use the "file" command; it can detect
file types even when there is no extension.
I don't know whether a Ruby module exists for this purpose, though.

Regards,
Robin
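
A minimal sketch of calling the "file" command from Ruby, assuming a "file" binary that accepts the --brief and --mime-type options is on the PATH (typical on Linux and Mac OS X, not on a stock Windows install). The method name mime_type_via_file is hypothetical, chosen for this illustration only.

```ruby
require 'open3'

# Hypothetical helper: asks the external "file" command for a MIME type.
# Returns a string such as "application/pdf", or nil when the path is not
# a regular file, the command fails, or "file" is not installed.
def mime_type_via_file(path)
  return nil unless File.file?(path)

  # Passing arguments separately avoids shell quoting problems with odd filenames.
  output, status = Open3.capture2e('file', '--brief', '--mime-type', path)
  status.success? ? output.strip : nil
rescue Errno::ENOENT
  # Raised when no "file" executable can be found.
  nil
end
```

Because the helper returns nil when "file" is missing, the caller can fall back to another detection method on Windows.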
 
Austin Ziegler

On a Unix system you could use the "file" command; it can detect
file types even when there is no extension.
I don't know whether a Ruby module exists for this purpose, though.

Not yet. ;) I do plan on adding it to MIME::Types in the future.

-austin
--
Austin Ziegler
 
M. Eteum

Robin said:
Hi,

On a Unix system you could use the "file" command; it can detect
file types even when there is no extension.
I don't know whether a Ruby module exists for this purpose, though.

Regards,
Robin

Thanks for the reply.

I'm running on Windows as well as Mac. We exchange files between the two
operating systems. A Ruby module that handles this would be nice, but
I'll take anything for now.

Thanks again
 
M. Eteum

Austin said:
Not yet. ;) I do plan on adding it to MIME::Types in the future.

-austin

Super! Oh, by the way, do you know if Perl or Python has it? I'm quite
desperate for a solution, so I'll take anything while
waiting for the Ruby module.

Thanks
 
Ilmari Heikkinen

On Wed, 2005-06-01 at 19:00, M. Eteum wrote:
Dear Ruby Guru:
Is there a way to identify a document from its header? I have a
bunch of documents collected over the years from multiple platforms
(Mac, Windows, and various Unix/Linux variants), and some of them
do not have a file extension. Is there a list that tells us what header
to expect for certain document types, e.g. txt, rtf, pdf, jpg, mpg,
Word, Excel, Visio, etc.?

Thanks

Hello,

If you have the shared-mime-info database installed
( http://freedesktop.org/wiki/Software_2fshared_2dmime_2dinfo )
you can use this: http://www.code-monkey.de/projects/mimeInfoRb.html
Or my extended version: http://dark.fhtr.org/mime_info_rb.tar.gz
From the README:

MimeInfo class provides an interface to query freedesktop.org's
shared-mime-info database. It can be used to guess a filename's
Mimetype and to get the description for the Mimetype.

require 'mime_info'

info = MimeInfo.get('foo.xml') #=> Mimetype['text/xml']
info.description
#=> "eXtensible Markup Language document"
info.description("de") #=> "XML-Dokument"

info2 = MimeInfo.get('foo.rb') #=> Mimetype['application/x-ruby']
info2.description #=> "Ruby script"
info2.is_a? Mimetype['text/plain'] #=> true

t = Mimetype['audio/x-mp3'] #=> Mimetype['audio/x-mp3']
t.description #=> "MP3 audio"
t.description('cy') #=> "Sain MP3"
t.descriptions['fr'] #=> "audio MP3"
t == Mimetype['audio']['x-mp3'] #=> true
t.is_a? Mimetype['audio'] #=> true
t.ancestors #=> [Mimetype['audio/x-mp3'], Mimetype['audio'],
# Mimetype['application/octet-stream'], Mimetype,
# Module, Object, Kernel]


HTH,

Ilmari
 
Austin Ziegler

On Wed, 2005-06-01 at 19:00, M. Eteum wrote:

Most of this is covered by MIME::Types on RubyForge. However, the OP
indicated that the problem was related to NOT having proper filename
extensions. The OP wants to look for magic numbers and strings.

-austin
--
Austin Ziegler
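
To illustrate what looking for magic numbers and strings means, here is a minimal pure-Ruby sketch that compares the first bytes of a file against a few widely documented signatures. The SIGNATURES table, its labels, and the sniff method are hypothetical names for this example, not part of MIME::Types or any other library, and the table is far from complete.

```ruby
# Illustrative table: label => list of [byte offset, expected bytes] checks.
# Every check for a label must match. Only the byte patterns are standard;
# the labels and layout are this sketch's own.
SIGNATURES = {
  'pdf'     => [[0, '%PDF-']],
  'rtf'     => [[0, '{\rtf']],
  'jpeg'    => [[0, "\xFF\xD8\xFF"]],
  'png'     => [[0, "\x89PNG\r\n\x1A\n"]],
  'gif'     => [[0, 'GIF8']],
  'zip'     => [[0, "PK\x03\x04"]],
  # Legacy Word, Excel and Visio files share this OLE2 container header.
  'ole2'    => [[0, "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"]],
  'mpeg-ps' => [[0, "\x00\x00\x01\xBA"]],
  'avi'     => [[0, 'RIFF'], [8, 'AVI ']]
}.freeze

# Returns the first matching label, or nil when nothing matches.
def sniff(path)
  # Read only the leading bytes; binread yields nil for an empty file.
  head = File.binread(path, 16) || ''.b
  SIGNATURES.each do |label, checks|
    matched = checks.all? do |offset, bytes|
      # Compare as raw bytes so string encodings do not interfere.
      head[offset, bytes.bytesize] == bytes.b
    end
    return label if matched
  end
  nil
end
```

Two limits are worth noting: plain text files carry no signature at all, and the 'ole2' header alone cannot tell Word, Excel, and Visio documents apart, since all three use the same container format.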
 
Ilmari Heikkinen

On Wed, 2005-06-01 at 23:33, Austin Ziegler wrote:
Most of this is covered by MIME::Types on RubyForge. However, the OP
indicated that the problem was related to NOT having proper filename
extensions. The OP wants to look for magic numbers and strings.

Shared-mime-info does this as well, though it may fare worse than "file" in
some cases.

kig@bauhaus:~$ mv fire.avi fire
kig@bauhaus:~$ irb
irb(main):001:0> require 'mime_info'
=> true
irb(main):002:0> MimeInfo.get('fire')
=> Mimetype['video/x-msvideo']
 
Martin DeMello

M. Eteum said:
Super! Oh, by the way, do you know if Perl or Python has it? I'm quite
desperate for a solution, so I'll take anything while
waiting for the Ruby module.

Your best bet would be to find a Windows port of Unix's 'file' (Mac OS X
is bound to have it). Sadly, it's a very hard thing to google
for :)

martin
 
M. Eteum

Ilmari said:
On Wed, 2005-06-01 at 23:33, Austin Ziegler wrote:
Most of this is covered by MIME::Types on RubyForge. However, the OP
indicated that the problem was related to NOT having proper filename
extensions. The OP wants to look for magic numbers and strings.


Shared-mime-info does this as well, though it may fare worse than "file" in
some cases.

kig@bauhaus:~$ mv fire.avi fire
kig@bauhaus:~$ irb
irb(main):001:0> require 'mime_info'
=> true
irb(main):002:0> MimeInfo.get('fire')
=> Mimetype['video/x-msvideo']

Thanks, but where do you get mime_info.rb? I'm running "ruby 1.8.2
(2004-12-25) [i386-mswin32]" and it does not seem to include the necessary
files.

Thanks
 
