[ANN] metadata extractor

Ilmari Heikkinen

url: http://dark.fhtr.org/repos/metadata
tarball: http://dark.fhtr.org/repos/metadata/metadata-0.1.tar.gz

Description
-----------

The `Metadata' package contains a library called `metadata' and
a small command-line program called `mdh'.

The library probes files for their metadata (e.g. jpeg dimensions
and camera make, mp3 artist, pdf word count) and returns the metadata
as a Hash.

mdh can print a file's metadata as YAML and package the metadata
together with the file.
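
To make the Hash-to-YAML step concrete, here is a minimal sketch using only Ruby's standard `yaml` library. The keys and values in the Hash are made up for illustration and were not produced by the metadata library; the real output of mdh may be laid out differently.

```ruby
require 'yaml'

# Hypothetical metadata Hash. The keys and values are illustrative only;
# they are assumptions, not output from the metadata library.
metadata = {
  'File.Name'    => 'myfile.jpg',
  'Image.Width'  => 1600,
  'Image.Height' => 1200
}

# Serialize the Hash to YAML text, roughly what a printed header could
# look like.
yaml_text = metadata.to_yaml
puts yaml_text

# Parsing the YAML text again gives back an equal Hash.
restored = YAML.load(yaml_text)
puts restored == metadata
```

String keys are used so that the round trip also works with newer YAML loaders that refuse to load symbols by default.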

This package has many dependencies since there is no single universal
metadata header format that all files use. Blame resource forks, filename
extensions, bags of bytes and mimetypes.

The metadata hash mostly follows the shared-filemetadata-spec naming.
http://wiki.freedesktop.org/wiki/Specifications/shared-filemetadata-spec

Usage
-----

# print out metadata header
mdh -p myfile.jpg

# create myfile.jpg.mdh, which consists of metadata header + myfile.jpg
mdh myfile.jpg

# print out metadata header from mdh file
mdh -e -p myfile.jpg.mdh

# strip out metadata header from mdh file and save it to myfile.jpg
mdh -e myfile.jpg.mdh

irb> Metadata.extract('myfile.jpg')
irb> Metadata.extract_text('myfile.pdf')
irb> Pathname.new("myfile.jpg").metadata
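
For more than one file, the library call above could be wrapped in a small batch helper. The following is only a sketch: `collect_metadata` is a hypothetical helper that is not part of the package, and the default branch assumes the library has already been loaded (presumably with `require 'metadata'`). The extractor is passed in as a lambda so the sketch also runs without the library installed.

```ruby
# Sketch only: collect metadata for several files into one Hash keyed by
# path. `collect_metadata` is a hypothetical helper, not a library method.
def collect_metadata(paths, extractor = nil)
  # Default to the library call from the usage examples. This assumes the
  # metadata library has already been required by the caller.
  extractor ||= lambda { |path| Metadata.extract(path) }

  results = {}
  paths.each do |path|
    begin
      results[path] = extractor.call(path)
    rescue StandardError => e
      # Record the failure for this file instead of aborting the batch.
      results[path] = { 'error' => e.message }
    end
  end
  results
end

# Stand-in extractor so the example runs without the library installed.
fake_extractor = lambda { |path| { 'File.Name' => File.basename(path) } }
p collect_metadata(['photos/one.jpg', 'docs/two.pdf'], fake_extractor)
```

Catching the error per file means one unreadable file, or one missing external program, does not lose the results for the rest.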


Requirements
------------

* Ruby 1.8

* Many metadata extraction programs; the Debian packages are:
dcraw
libimlib2-ruby
extract
libimage-exiftool-perl
poppler-utils
mplayer
html2text
imagemagick
unhtml
pstotext
antiword
catdoc
shared-mime-info

* You do want to install the latest versions of dcraw and
shared-mime-info to be able to handle camera raw images.
http://cybercom.net/~dcoffin/dcraw/
http://freedesktop.org/wiki/Software/shared-mime-info

* Python + chardet library
http://chardet.feedparser.org/
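
With this many external programs, a quick check of what is on the PATH can save some guessing. The sketch below is not part of the package. The command names are assumptions about what each Debian package installs (for example `exiftool` for libimage-exiftool-perl, `pdftotext` for poppler-utils, `identify` for imagemagick), and the library may call different programs. The Ruby bindings, shared-mime-info and the Python chardet library are not covered.

```ruby
# Assumed command names for the Debian packages listed above. These are
# guesses at the installed executables, not taken from the library source.
ASSUMED_COMMANDS = %w[
  dcraw extract exiftool pdftotext mplayer html2text
  identify unhtml pstotext antiword catdoc
]

# Return the full path of an executable found in search_path, or nil.
def find_executable(name, search_path = ENV['PATH'].to_s)
  search_path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Return the names that could not be found in search_path.
def missing_commands(names, search_path = ENV['PATH'].to_s)
  names.reject { |name| find_executable(name, search_path) }
end

missing = missing_commands(ASSUMED_COMMANDS)
if missing.empty?
  puts 'All assumed commands were found on PATH.'
else
  puts "Not found on PATH: #{missing.join(', ')}"
end
```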

License
-------

Ruby's


Ilmari Heikkinen <ilmari.heikkinen gmail com>
 
Konrad Meyer

Quoth Ilmari Heikkinen on Monday 10 September 2007 04:18:25 pm:
> [snip]

Any chance this could be expanded to add FLAC and OGG support?

Thanks!
--
Konrad Meyer <[email protected]> http://konrad.sobertillnoon.com/
