[ANN] Metadata 0.3

Ilmari Heikkinen · Sep 15, 2007

Quoth Ilmari Heikkinen:

It should at least. If you're having trouble, lemme know

Yeah, I'm having some trouble. I have latest metadata (0.2).

[snip]

Any ideas?

Yeah, I failed at using git. Jeez. Sorry about that.
Here's 0.3, it oughta work:

tarball: http://dark.fhtr.org/repos/metadata/metadata-0.3.tar.gz
git: http://dark.fhtr.org/repos/metadata

Hi Ilmari!

Just wanted to mention that despite the name, wmainfo will parse anything
wrapped in an ASF audio/video container format[0], so, you could use it to
parse wmv movies as well if your user didn't have mplayer installed.

[0] http://en.wikipedia.org/wiki/Advanced_Systems_Format

Thanks for the pointer!
I made it merge the wmainfo output to the mplayer output for wmv and asf.

Description
-----------

This package `Metadata' comes with a library called `metadata' and
a small program called `mdh'.

The library probes files for their metadata (e.g. jpeg dimensions
and camera make, mp3 artist, pdf word count) and returns the metadata
as a Hash.

Mdh can print out file metadata as YAML and package the metadata
with the file.

This package has many dependencies since there is no single universal
metadata header format that all files use. Blame resource forks, filename
extensions, bags of bytes and mimetypes.

Usage
-----

# print out metadata header
mdh -p myfile.jpg

# create myfile.jpg.mdh, which consists of metadata header + myfile.jpg
mdh myfile.jpg

# print out metadata header from mdh file
mdh -e -p myfile.jpg.mdh

# strip out metadata header from mdh file and save it to myfile.jpg
mdh -e myfile.jpg.mdh

irb> Metadata.extract('myfile.jpg')
irb> Metadata.extract_text('myfile.jpg')
irb> Pathname.new("myfile.jpg").metadata

List of supported formats
-------------------------

Audio:
Successfully tested with:
mp3, flac, ogg, wav
Should also work:
wma, m4a

Video:
What you manage to make mplayer play, which can be just about anything.
Then again, missing title and author data, etc. (do videos even have those?)
Successfully tested with:
wmv, mov, divx, xvid, flv, ogm, mpg

Images:
Should handle pretty much anything (apart from XCF and ORF.)
Successfully tested with:
jpeg, png, gif, nef, dng, crw, pef, psd

Documents:
Successfully tested with:
pdf, ppt, odp, sxi, ps, ps.gz, html, txt
Should work:
- OpenOffice docs work to some degree (personally, I'm using unoconv to
convert OO docs to temp PDFs for the text & dimensions extraction, so
those bits of data are missing.)
- MS Office docs to some degree (ppt at least, doc and xls should work too,
dimensions missing due to the above temp PDF -thing.)

Others:
Whatever extract spits out on the five or six bits of metadata I'm using
from it. Archive contents at least.

Requirements
------------

* Ruby 1.8

* Tons of metadata extraction programs and libs,
list of gems:
flacinfo-rb
wmainfo-rb
MP4info
list of debian packages:
dcraw
libimlib2-ruby
extract
libimage-exiftool-perl
poppler-utils
mplayer
html2text
imagemagick
unhtml
pstotext
antiword
catdoc
shared-mime-info
vorbis-tools

* You do want to install the latest versions of dcraw and
shared-mime-info to be able to handle camera raw images.
http://cybercom.net/~dcoffin/dcraw/
http://freedesktop.org/wiki/Software/shared-mime-info

* Python + chardet library
http://chardet.feedparser.org/

Install
-------

De-compress archive and enter its top directory.
Then type:

($ su)
# ruby setup.rb

These simple steps install this program under the default
location of Ruby libraries. You can also install files into
your favorite directory by supplying setup.rb some options.
Try "ruby setup.rb --help".

License
-------

Ruby's

Konrad Meyer · Sep 15, 2007

Quoth Ilmari Heikkinen:

Hmm, am I not seeing it (just using 'mdh -p') or can metadata.rb extract
stuff like artist, title, album, track, and whatnot from ogg/flac?

It should at least. If you're having trouble, lemme know

Any chance you could wrap this up as a gem? It's not something I care
strongly about, and I don't know how complicated the process is, but I think
it would help ease installation for some users.

--
Konrad Meyer http://konrad.sobertillnoon.com/

Konrad Meyer · Sep 15, 2007

Er, I'm still not getting information out of ogg files:

$ mdh -p ~/music/bowling_for_soup_-_1985.ogg
---
Video.Duration: 192.78
Audio.Samplerate: 44100
Audio.Bitrate: 192.0
Image.DimensionUnit: px
Video.Codec: ""
File.Size: 4618665
Audio.Codec: vrbs
File.Modified: 2007-01-03T22:10:11-08:00
File.Format: video/x-theora+ogg

$ mplayer ~/music/bowling_for_soup_-_1985.ogg
...
Clip info:
Genre: Pop
Name: 1985
Artist: Bowling for Soup
Creation Date: 2004
Album: A Hangover You Don't Deserve
Track: 03

Thanks for your quick responses!

--
Konrad Meyer http://konrad.sobertillnoon.com/

Ilmari Heikkinen · Sep 15, 2007

Er, I'm still not getting information out of ogg files:

$ mdh -p ~/music/bowling_for_soup_-_1985.ogg
---
Video.Duration: 192.78
Audio.Samplerate: 44100
Audio.Bitrate: 192.0
Image.DimensionUnit: px
Video.Codec: ""
File.Size: 4618665
Audio.Codec: vrbs
File.Modified: 2007-01-03T22:10:11-08:00

File.Format: video/x-theora+ogg

^- That's the problem there. It thinks it's a video file.

<technical blather>
Why? Probably because I hacked the mimetype guesser to _not_ assume
things based on the filename extension, and the shared-mime-info db
assumes that the guesser _is_ assuming things based on the filename
extension.

Which is something I'd rather not do with downloaded files (which, by
their very nature, have wild disparities between the extension and the
real mimetype.) And the header content-type is often totally wrong or
doesn't match shared-mime-info's naming (e.g.
application/octet-stream vs. image/gif, audio/x-mp3 vs. audio/mpeg,
video/divx vs. video/x-msvideo, video/x-ms-asf vs. video/vnd.ms-asf...)

And this magic-over-extension sometimes leads to me getting generic
lesser-magic guesses instead of more specific filename extension
guesses (e.g. zip instead of OO document.) So, I have a list of
generic formats that defer to the extension rather than rely on
the lesser-magic.

Anyhow, it's ugly, hacky magic.
Just like the rest of mimetype guessing.
</technical blather>
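
The policy described in the blather above can be sketched roughly like
this. It is only an illustration under stated assumptions, not the code
in metadata: MAGIC_TABLE, EXTENSION_TABLE, GENERIC_TYPES and
guess_mimetype are made-up names, and the tiny tables stand in for the
shared-mime-info database.

```ruby
# Hypothetical sketch: trust content magic first, and fall back to the
# filename extension only when the magic guess is a generic container type.

# Stand-in for the magic database (assumption: a tiny hand-picked subset).
MAGIC_TABLE = [
  ["PK\x03\x04".b, "application/zip"],
  ["GIF8".b,       "image/gif"],
  ["OggS".b,       "application/ogg"],
  ["%PDF".b,       "application/pdf"],
].freeze

# Stand-in for the filename extension database.
EXTENSION_TABLE = {
  ".odp" => "application/vnd.oasis.opendocument.presentation",
  ".ogg" => "audio/x-vorbis+ogg",
  ".gif" => "image/gif",
  ".jpg" => "image/jpeg",
}.freeze

# Generic formats whose magic says little about the actual content.
GENERIC_TYPES = ["application/zip", "application/ogg",
                 "application/octet-stream"].freeze

# Guess from the first bytes of the file only.
def guess_by_magic(header)
  MAGIC_TABLE.each do |magic, type|
    return type if header.start_with?(magic)
  end
  "application/octet-stream"
end

# Magic wins, unless it only yields a generic type; then the extension
# may refine the guess if it is a known one.
def guess_mimetype(filename, header)
  by_magic = guess_by_magic(header.b)
  return by_magic unless GENERIC_TYPES.include?(by_magic)
  EXTENSION_TABLE.fetch(File.extname(filename).downcase, by_magic)
end

puts guess_mimetype("fake.jpg", "GIF89a")
puts guess_mimetype("slides.odp", "PK\x03\x04")
```

With these stand-in tables, a GIF named fake.jpg is still reported by
its magic, while a zip container named slides.odp is refined by its
extension.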

But! Fixing this instance of the problem in the next thirty seconds.
... There!

And now, adding ogginfo metadata to video/x-theora+ogg.

Ok, try this:

http://dark.fhtr.org/repos/metadata/metadata-0.4.tar.gz

Thanks for your quick responses!

Thanks for the bug reports! They really help in making this thing
more robust.

Ilmari Heikkinen · Sep 15, 2007

$ mplayer ~/music/bowling_for_soup_-_1985.ogg
...
Clip info:
Genre: Pop
Name: 1985
Artist: Bowling for Soup
Creation Date: 2004
Album: A Hangover You Don't Deserve
Track: 03

Oh, nice, mplayer does give out metadata fields. I better augment
the mplayer info parser to grab those

0.5 here we come!
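
A minimal sketch of what grabbing those fields could look like, assuming
the output has the shape quoted above (a "Clip info:" line followed by
"Key: value" lines). parse_clip_info is a hypothetical name, not
metadata's parser, and real mplayer output may be indented or differ
between versions.

```ruby
# Hypothetical sketch: collect the "Key: value" lines that follow
# "Clip info:" in captured mplayer output into a Hash.
def parse_clip_info(output)
  info = {}
  in_clip_info = false
  output.each_line do |line|
    line = line.strip
    if line == "Clip info:"
      in_clip_info = true
    elsif in_clip_info
      key, value = line.split(/:\s*/, 2)
      # Stop at the first line that is not a "Key: value" pair.
      break if value.nil? || key.empty?
      info[key] = value
    end
  end
  info
end

# Sample text copied from the output quoted in this thread.
sample = <<~OUTPUT
  ...
  Clip info:
  Genre: Pop
  Name: 1985
  Artist: Bowling for Soup
  Creation Date: 2004
  Album: A Hangover You Don't Deserve
  Track: 03
OUTPUT

clip = parse_clip_info(sample)
puts clip["Artist"]
```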

Konrad Meyer · Sep 15, 2007

Quoth Ilmari Heikkinen:

Oh, nice, mplayer does give out metadata fields. I better augment
the mplayer info parser to grab those

0.5 here we come!

Another bug (sorry):
$ mdh -p ~/music/Limp\ Bizkit\ -\ Rollin\'\ \(edited\).ogg
sh: -c: line 0: syntax error near unexpected token `('
sh: -c: line 0: `ogginfo '/home/konrad/music/Limp Bizkit - Rollin\'
(edited).ogg''

(Last line was broken up to email length.) You're already escaping single
quotes for the shell, need to escape start-parens and end-parens as well.
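
A note on the error above: in a POSIX shell a backslash is not special
inside single quotes, so the \' ends the quoted string and the
parentheses land outside it. Rather than escaping more characters by
hand, one option is Ruby's standard Shellwords module, or skipping the
shell altogether. This is a sketch, not what metadata actually does;
ogginfo_command and run_without_shell are hypothetical helper names.

```ruby
require "shellwords"

# Hypothetical helper: build a shell-safe command line for ogginfo.
# Shellwords.escape backslash-escapes every character the shell might
# interpret, including quotes, spaces and parentheses.
def ogginfo_command(path)
  "ogginfo #{Shellwords.escape(path)}"
end

path = "/home/konrad/music/Limp Bizkit - Rollin' (edited).ogg"
puts ogginfo_command(path)

# Alternative: pass an argument array so that no shell is involved and
# no quoting is needed at all.
def run_without_shell(*argv)
  IO.popen(argv, &:read)
end
```

The array form would be called as run_without_shell("ogginfo", path).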

Thanks,

--
Konrad Meyer http://konrad.sobertillnoon.com/

Konrad Meyer · Sep 15, 2007

Quoth Ilmari Heikkinen:

Oh, nice, mplayer does give out metadata fields. I better augment
the mplayer info parser to grab those

0.5 here we come!

Also:
For mp3 id3v2 tags, the binary string "\xCB\x99\xC5\xA3" is being inserted
at the front of all the string fields.

$ mdh -p ~/music/Snoop\ Dogg\ -\ Gin\ \&\ Juice.mp3
---
Audio.Album: "\xCB\x99\xC5\xA3Death Row's Snoop Doggy Dogg Greatest Hits (2001)"
...
Audio.Genre: "\xCB\x99\xC5\xA3Hip-Hop"
Audio.Title: "\xCB\x99\xC5\xA3Gin & Juice"
...
Audio.Artist: "\xCB\x99\xC5\xA3Snoop Dogg"

I *think* this is an id3v2 thing. Also, it happens in more than one file and
amaroK sees the tags "correctly", so I'm thinking it's on the metadata's
end. Thanks!
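
An unconfirmed guess at the cause, for what it's worth: those four bytes
are the UTF-8 encoding of U+02D9 and U+0163, which is what the bytes
0xFF 0xFE turn into when read as ISO-8859-2. 0xFF 0xFE is the UTF-16
little-endian byte order mark, and ID3v2 text frames start with it when
their encoding byte is 1. The sketch below assumes a modern Ruby with
string encodings (not the Ruby 1.8 this package targets), and
decode_id3v2_text is a hypothetical name, not part of metadata.

```ruby
# Hypothetical sketch: decode the payload of an ID3v2 text frame to UTF-8.
# The first byte names the encoding; 1 means UTF-16 with a byte order mark.
def decode_id3v2_text(payload)
  payload = payload.b
  body = payload.byteslice(1..-1).to_s
  text =
    case payload.getbyte(0)
    when 0 # ISO-8859-1
      body.force_encoding("ISO-8859-1").encode("UTF-8")
    when 1 # UTF-16, byte order given by the BOM
      if body.start_with?("\xFF\xFE".b)
        body.byteslice(2..-1).force_encoding("UTF-16LE").encode("UTF-8")
      elsif body.start_with?("\xFE\xFF".b)
        body.byteslice(2..-1).force_encoding("UTF-16BE").encode("UTF-8")
      else
        raise ArgumentError, "UTF-16 text frame without a BOM"
      end
    when 2 # UTF-16BE without BOM (ID3v2.4)
      body.force_encoding("UTF-16BE").encode("UTF-8")
    when 3 # UTF-8 (ID3v2.4)
      body.force_encoding("UTF-8")
    else
      raise ArgumentError, "unknown ID3v2 text encoding byte"
    end
  text.delete("\u0000") # drop terminating NULs
end

# How the stray prefix can arise: the BOM bytes read as ISO-8859-2 text.
stray = "\xFF\xFE".b.force_encoding("ISO-8859-2").encode("UTF-8")
p stray.bytes.map { |b| format("%02X", b) }

puts decode_id3v2_text("\x01\xFF\xFEH\x00i\x00p\x00".b)
```
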
--
Konrad Meyer http://konrad.sobertillnoon.com/
