Ilmari Heikkinen
Quoth Ilmari Heikkinen: Yeah, I'm having some trouble. I have latest metadata (0.2). It should at least. If you're having trouble, lemme know
[snip]
Any ideas?
Yeah, I failed at using git. Jeez. Sorry about that.
Here's 0.3, it oughta work:
tarball: http://dark.fhtr.org/repos/metadata/metadata-0.3.tar.gz
git: http://dark.fhtr.org/repos/metadata
Hi Ilmari!
Just wanted to mention that despite the name, wmainfo will parse anything
wrapped in an ASF audio/video container format[0], so, you could use it to
parse wmv movies as well if your user didn't have mplayer installed.
[0] http://en.wikipedia.org/wiki/Advanced_Systems_Format
Thanks for the pointer!
I made it merge the wmainfo output into the mplayer output for wmv and asf files.
Description
-----------
This package `Metadata' comes with a library called `metadata' and
a small program called `mdh'.
The library probes files for their metadata (e.g. jpeg dimensions
and camera make, mp3 artist, pdf word count) and returns the metadata
as a Hash.
Mdh can print out file metadata as YAML and package the metadata
with the file.
This package has many dependencies since there is no single universal
metadata header format that all files use. Blame resource forks, filename
extensions, bags of bytes and mimetypes.
Usage
-----
# print out metadata header
mdh -p myfile.jpg
# create myfile.jpg.mdh, which consists of metadata header + myfile.jpg
mdh myfile.jpg
# print out metadata header from mdh file
mdh -e -p myfile.jpg.mdh
# strip out metadata header from mdh file and save it to myfile.jpg
mdh -e myfile.jpg.mdh
irb> Metadata.extract('myfile.jpg')
irb> Metadata.extract_text('myfile.jpg')
irb> Pathname.new("myfile.jpg").metadata
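Since the library returns the metadata as a plain Hash, it can be handled like any other Ruby Hash. The sketch below is only an illustration: it uses a hand-written sample Hash in place of a real `Metadata.extract' call so that it runs without any of the dependencies, and the key names (`width', `height', `camera_make') and the helper `pixel_count' are made up for the example, not the library's documented output.

```ruby
require 'yaml'

# Hypothetical sample standing in for the Hash returned by
# Metadata.extract('myfile.jpg'). The key names are assumptions made
# for illustration; inspect the real output to see what is reported.
SAMPLE = {
  'width'       => 1600,
  'height'      => 1200,
  'camera_make' => 'ExampleCam'
}

# Hypothetical helper: compute a pixel count from the dimension
# fields, or return nil when either dimension is missing.
def pixel_count(meta)
  w = meta['width']
  h = meta['height']
  return nil unless w && h
  w * h
end

# Dump the Hash as YAML and print one derived value.
puts SAMPLE.to_yaml
puts "pixels: #{pixel_count(SAMPLE)}"
```

Replacing SAMPLE with the result of an actual extract call should work the same way, provided the real key names are substituted.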
List of supported formats
-------------------------
Audio:
Successfully tested with:
mp3, flac, ogg, wav
Should also work:
wma, m4a
Video:
Whatever you manage to make mplayer play, which can be just about anything.
Then again, title and author data, etc. are missing (do videos even have those?)
Successfully tested with:
wmv, mov, divx, xvid, flv, ogm, mpg
Images:
Should handle pretty much anything (apart from XCF and ORF.)
Successfully tested with:
jpeg, png, gif, nef, dng, crw, pef, psd
Documents:
Successfully tested with:
pdf, ppt, odp, sxi, ps, ps.gz, html, txt
Should work:
- OpenOffice docs work to some degree (personally, I'm using unoconv to
convert OO docs to temp PDFs for the text & dimensions extraction, so
those bits of data are missing.)
- MS Office docs to some degree (ppt at least; doc and xls should work too,
with dimensions missing due to the temp PDF conversion mentioned above.)
Others:
Whatever extract spits out on the five or six bits of metadata I'm using
from it. Archive contents at least.
Requirements
------------
* Ruby 1.8
* Tons of metadata extraction programs and libs,
list of gems:
flacinfo-rb
wmainfo-rb
MP4info
list of debian packages:
dcraw
libimlib2-ruby
extract
libimage-exiftool-perl
poppler-utils
mplayer
html2text
imagemagick
unhtml
pstotext
antiword
catdoc
shared-mime-info
vorbis-tools
* You do want to install the latest versions of dcraw and
shared-mime-info to be able to handle camera raw images.
http://cybercom.net/~dcoffin/dcraw/
http://freedesktop.org/wiki/Software/shared-mime-info
* Python + chardet library
http://chardet.feedparser.org/
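With this many external programs involved, it can help to check which ones are actually on the PATH before probing files. The following is a minimal sketch, not part of the package: the helper names `which' and `check_tools' are hypothetical, and the command names in TOOLS are the binaries I assume the packages above provide (e.g. pdftotext from poppler-utils, exiftool from libimage-exiftool-perl, identify from imagemagick, ogginfo from vorbis-tools). Adjust the list for your distribution.

```ruby
# Command names assumed to be provided by the packages listed above.
# This list is an assumption; adjust it for your system.
TOOLS = %w[dcraw extract exiftool pdftotext mplayer html2text
           identify unhtml pstotext antiword catdoc ogginfo].freeze

# Return the full path of +name+ if an executable file with that name
# exists in one of the directories in +path+, otherwise nil.
def which(name, path = ENV['PATH'].to_s)
  path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Split the tool list into [found, missing].
def check_tools(tools = TOOLS, path = ENV['PATH'].to_s)
  tools.partition { |tool| which(tool, path) }
end

found, missing = check_tools
puts "found:   #{found.join(', ')}"
puts "missing: #{missing.join(', ')}"
```

This only checks the command-line programs; the gems and the Python chardet library would need to be checked separately.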
Install
-------
Decompress the archive and enter its top directory.
Then type:
($ su)
# ruby setup.rb
These simple steps install this program under the default
location of Ruby libraries. You can also install files into
your favorite directory by supplying setup.rb some options.
Try "ruby setup.rb --help".
License