Request for comments on a JPEG metadata Perl module

Stefano Bettelli

Hi,

I recently became interested in designing a Perl
library for reading and modifying JPEG image metadata (Exif
info, IPTC info, comments, thumbnails and so on). This kind of
additional data, stored in the image itself, is very useful for
organising digital photo collections. For various reasons, the
existing Perl libraries and programs do not fully satisfy me,
so I decided to enter the arena and write a Perl module (this is
also a good way to learn the language better...).

I would like to ask for your suggestions
on how to design this module:

1) Do you think that submitting this module to CPAN is worthwhile,
or do you think that what is available is already sufficient?
2) What is the best name for the module?
I am currently using Image::MetaInfo::JPEG.
3) How can I decide what the minimum Perl version
for running the module should be?
4) Do you have any ideas on how it could be extended? Are
there interesting features I did not think about?
Do you have any suggestions on code style?

Any other suggestion is of course welcome. You can download
the module (for the time being, let's say for the next two weeks)
at the following address:

http://82.229.136.165/IMJ/

Below I list the main features supplied
by my module. The purpose of this module is to read/modify/
rewrite meta-data segments in JPEG files, which can contain
comments, thumbnails, Exif information (photographic parameters),
IPTC information (editorial parameters) and similar data.

Each JPEG file is made of consecutive segments (data blocks
prefixed by a 2-byte segment marker and a 2-byte segment length),
with the exception of the actual picture data (the so-called
entropy-coded segment(s), which contain raw data). Most of these
segments specify parameters for decoding the picture data into a
bitmap; some of them, however, namely the COMment and APPlication
segments, contain meta-data, i.e., information about how the photo
was shot (usually added by a digital camera) and additional notes
from the photographer. These additional pieces of information are
especially valuable for picture databases, since the meta-data
can be saved together with the picture without resorting to
additional database structures.
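A minimal Python sketch of the layout just described, assuming the whole file is in memory. Each marker is 0xFF followed by a code byte; the big-endian length counts its own two bytes but not the marker; a few standalone markers (SOI, EOI, RSTn, TEM) carry no length at all. The walk stops at SOS because the entropy-coded data after it has no length field. The function name `list_segments` is hypothetical.

```python
import struct

# Markers with no length field: TEM, RST0-RST7, SOI, EOI.
STANDALONE = {0x01, 0xD8, 0xD9, *range(0xD0, 0xD8)}
SOS = 0xDA  # start of scan: entropy-coded picture data follows


def list_segments(data: bytes) -> list[tuple[int, bytes]]:
    """Return (marker code, payload) pairs up to and including SOS."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing SOI marker)")
    segments = []
    pos = 2
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker: skip it
            pos += 1
            continue
        pos += 2
        if marker in STANDALONE:
            segments.append((marker, b""))
            if marker == 0xD9:  # EOI
                break
            continue
        # The 2-byte big-endian length includes itself but not the marker.
        (length,) = struct.unpack(">H", data[pos:pos + 2])
        segments.append((marker, data[pos + 2:pos + length]))
        pos += length
        if marker == SOS:
            break  # the entropy-coded data is not walked here
    return segments
```

Segments the reader does not understand are simply skipped by their length, which is what allows well-behaved programs to ignore unknown APPn data.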

This module works by breaking a JPEG file into individual segments.
Each file is associated with an Image::MetaInfo::JPEG structure
object, which contains one Image::MetaInfo::JPEG::Segment object
for each segment. Segments with a known format are then parsed,
and their content can be accessed in a structured way for display.
Some of them can even be modified and then rewritten to disk.
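The same "structure of segments" idea can be sketched in Python. The class names `Segment` and `JpegStructure` are hypothetical and only mirror the roles of Image::MetaInfo::JPEG and Image::MetaInfo::JPEG::Segment; they are not the module's API. In this sketch only COM payloads are decoded, every other segment is kept as opaque bytes, and writing the structure back recomputes each length field.

```python
import struct
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One JPEG segment: marker code plus raw payload (length field excluded)."""
    marker: int
    payload: bytes

    @property
    def name(self) -> str:
        if self.marker == 0xFE:
            return "COM"
        if 0xE0 <= self.marker <= 0xEF:
            return f"APP{self.marker - 0xE0}"
        return f"0x{self.marker:02X}"

    def parsed(self):
        """Decode known formats; here only COM (as Latin-1 text) is handled."""
        if self.marker == 0xFE:
            return self.payload.decode("latin-1")
        return None  # unknown formats stay opaque

    def to_bytes(self) -> bytes:
        length = len(self.payload) + 2  # the length field counts itself
        if length > 0xFFFF:
            raise ValueError("segment payload too large")
        return bytes([0xFF, self.marker]) + struct.pack(">H", length) + self.payload


@dataclass
class JpegStructure:
    """Ordered metadata segments plus the untouched picture data."""
    segments: list = field(default_factory=list)
    image_data: bytes = b""  # from SOS to EOI, copied verbatim

    def to_bytes(self) -> bytes:
        body = b"".join(s.to_bytes() for s in self.segments)
        return b"\xff\xd8" + body + self.image_data
```

Keeping unknown segments as bytes is what makes "parse some, rewrite all" safe: nothing the program does not understand is altered.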

The current status is as follows:

Segment  Possible content             Status
-------------------------------------------------------------------
COM      User comments                parse/read/write
APP0     JFIF data (+ thumbnail)      parse/read
APP1     Exif or XMP data             parse
APP2     FPXR data or ICC profiles    parse
APP3     additional EXIF-like data    parse
APP4     HPSC                         nothing
APP12    PreExif ASCII meta           parse [devel.]
APP13    IPTC and Photoshop data      parse/read/(write IPTC [devel.])
APP14    Adobe tags                   parse
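The "Possible content" column reflects that several formats share one APPn number and are told apart by an identifier string at the start of the payload. A partial Python lookup table follows, assuming the commonly used identifiers; APP3, APP4 and APP12 are left out of this sketch, and the function name `identify` is hypothetical.

```python
# (marker code, payload prefix, label): a partial list of common identifiers.
APP_SIGNATURES = [
    (0xE0, b"JFIF\x00", "JFIF data"),
    (0xE0, b"JFXX\x00", "JFIF extension (thumbnail)"),
    (0xE1, b"Exif\x00\x00", "Exif data"),
    (0xE1, b"http://ns.adobe.com/xap/1.0/\x00", "XMP data"),
    (0xE2, b"ICC_PROFILE\x00", "ICC profile"),
    (0xE2, b"FPXR\x00", "FPXR data"),
    (0xED, b"Photoshop 3.0\x00", "Photoshop/IPTC data"),
    (0xEE, b"Adobe", "Adobe tags"),
]


def identify(marker: int, payload: bytes) -> str:
    """Guess what a COM/APPn segment contains from its marker and payload prefix."""
    if marker == 0xFE:
        return "comment"
    for code, prefix, label in APP_SIGNATURES:
        if marker == code and payload.startswith(prefix):
            return label
    return "unknown"
```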

"Parse" means that the segment content is decoded and stored
in low-level records. "Read" means that the data are also available,
in a more organised way, through a higher-level interface. The
package contains a fairly detailed perldoc page with further
information. This is its index:

1) STRUCTURE OF JPEG PICTURES
2) MANAGING A JPEG STRUCTURE OBJECT
3) MANAGING A JPEG SEGMENT OBJECT
4) MANAGING A JPEG RECORD OBJECT
5) COMMENTS ("COM" segments)
6) JFIF data ("APP0" segments)
7) IPTC DATA (from "APP13" segments)
8) CURRENT STATUS
-) Known Problems
-) References
-) OTHER PACKAGES (the "competitors")

I plan to add read/write support for Exif data in a few weeks.
The module already contains a test suite with 67 tests.

Thank you in advance for any suggestions,
best regards,
Stefano Bettelli
 
GreenLight

Stefano Bettelli said:
Hi,

I recently became interested in designing a Perl
library for reading and modifying JPEG image metadata (Exif
info, IPTC info, comments, thumbnails and so on). This kind of
additional data, stored in the image itself, is very useful for
organising digital photo collections. For various reasons, the
existing Perl libraries and programs do not fully satisfy me,
so I decided to enter the arena and write a Perl module (this is
also a good way to learn the language better...).

I have been wanting to create a catalog of my photos for quite some
time. I have thousands of photos that I have taken over the past five
years, and it looks like your module could help me quite a bit.
4) Do you have any ideas on how it could be extended? Are
there interesting features I did not think about?

I guess that I should read the information regarding the JPEG format
to get the answer, but maybe you know this: would it be possible to
add segments of information to the file that were of my own design? I
would like to add some flags to each photo that would show that I have
completed cataloging it.

I used your module to parse a file from my camera (Casio QV-2000UX).
Here is part of the info that was returned:

********** APP1 --> IFD0 ********** (11 records)
[ Make]<0x010f> = [ ASCII] "CASIO\00"
[ Model]<0x0110> = [ ASCII] "QV-2000UX\00"
[ Orientation]<0x0112> = [ SHORT] 1
[ XResolution]<0x011a> = [ RATIONAL] 72/1
[ YResolution]<0x011b> = [ RATIONAL] 72/1
[ ResolutionUnit]<0x0128> = [ SHORT] 2
[ Software]<0x0131> = [ ASCII] "99.09.07.11.08\00\00\00\00\00\00\00\00\00"
[ DateTime]<0x0132> = [ ASCII] "2001:03:07 13:53:27\00"
[ YCbCrPositioning]<0x0213> = [ SHORT] 1
[ ExifOffset]<0x8769> = [ LONG] 210
[ SubIFD]<......> = [REFERENCE] --> 19692dc
********** APP1 --> IFD0 --> SubIFD ********** (21 records)
[ ExposureTime]<0x829a> = [ RATIONAL] 10000/653167
[ FNumber]<0x829d> = [ RATIONAL] 20/10
[ ExposureProgram]<0x8822> = [ SHORT] 2
[ ExifVersion]<0x9000> = [ UNDEF] 30 32 31 30
[ DateTimeOriginal]<0x9003> = [ ASCII] "2001:03:07 13:53:27\00"
[ DateTimeDigitized]<0x9004> = [ ASCII] "2001:03:07 13:53:27\00"
[ ComponentsConfiguration]<0x9101> = [ UNDEF] 01 02 03 00
[ CompressedBitsPerPixel]<0x9102> = [ RATIONAL] 2048000/480000
[ ExposureBiasValue]<0x9204> = [SRATIONAL] 0/3
[ MaxApertureValue]<0x9205> = [ RATIONAL] 20/10
[ MeteringMode]<0x9207> = [ SHORT] 5
[ Flash]<0x9209> = [ SHORT] 1
[ FocalLength]<0x920a> = [ RATIONAL] 126865/10000
[ MakerNote]<0x927c> = [ UNDEF] 00 14 00 01 00 03 00 00 ... (238 more values)
[ FlashPixVersion]<0xa000> = [ UNDEF] 30 31 30 30
[ ColorSpace]<0xa001> = [ SHORT] 1
[ PixelXDimension]<0xa002> = [ LONG] 800
[ PixelYDimension]<0xa003> = [ LONG] 600
[ InteroperabilityOffset]<0xa005> = [ LONG] 790
[ FileSource]<0xa300> = [ UNDEF] 03
[ Interop]<......> = [REFERENCE] --> 196a728
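The RATIONAL and UNDEF values above are shown in raw form. A minimal Python sketch of how they turn into the familiar camera readouts; the helper names are hypothetical and not part of Image::MetaInfo::JPEG, and the display formats (one decimal place) are an arbitrary choice.

```python
from fractions import Fraction


def ascii_value(raw: bytes) -> str:
    """Exif ASCII strings are NUL-terminated (sometimes NUL-padded)."""
    return raw.rstrip(b"\x00").decode("ascii")


def exposure_time(num: int, den: int) -> str:
    """Show an exposure time the way cameras usually print it, e.g. '1/65.3 s'."""
    t = Fraction(num, den)
    if t >= 1:
        return f"{float(t):g} s"
    return f"1/{float(1 / t):.1f} s"


def f_number(num: int, den: int) -> str:
    return f"f/{num / den:.1f}"


def focal_length(num: int, den: int) -> str:
    return f"{num / den:.1f} mm"


def undef_as_ascii(hex_bytes: str) -> str:
    """Version tags such as ExifVersion are UNDEF bytes holding ASCII digits."""
    return bytes.fromhex(hex_bytes).decode("ascii")
```

Applied to the dump: ExposureTime 10000/653167 gives "1/65.3 s", FNumber 20/10 gives "f/2.0", FocalLength 126865/10000 gives "12.7 mm", and ExifVersion 30 32 31 30 gives "0210".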

This is just what I need: the date & time of the photo, etc. I can use
this info to stick a record in a database that holds basic info & the
filesystem location of the photo. I would like to be able to set some
kind of flag in the file, then, so that when I did a subsequent sweep
of the disk for image files, I could easily skip photos that had
already been processed.
 
Josef Moellers

GreenLight said:
Stefano Bettelli wrote:
I guess that I should read the information regarding the JPEG format
to get the answer, but maybe you know this: would it be possible to
add segments of information to the file that were of my own design? I
would like to add some flags to each photo that would show that I have
completed cataloging it.

It depends. Obviously the standard is designed such that any application
can skip those tags that it doesn't know. However, you cannot guarantee
that all software is written properly.
I used your module to parse a file from my camera (Casio QV-2000UX).
Here is part of the info that was returned:
[ ... ]

This is just what I need: the date & time of the photo, etc. I can use
this info to stick a record in a database that holds basic info & the
filesystem location of the photo. I would like to be able to set some
kind of flag in the file, then, so that when I did a subsequent sweep
of the disk for image files, I could easily skip photos that had
already been processed.

I, too, did some work on this subject.
I have a Kodak DC240 which stores the images in EXIF format.

I store all my photos in the "Exif" directory.
Also, there exist "Photo", "Info" and "Thumb" directories.
When I scan the photos, I scan the Exif directory (File::Find), then, if
no entry exists in Photo, I extract the large image, if no entry exists
in Info, I extract the information, if no entry exists in Thumb, I
extract the thumbnail.
Then I have .alb files which describe what photos belong together and I
create HTML pages with the thumbnails that have links to the large images.
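The incremental scan described above (only produce what is missing) can be sketched in Python, with `pathlib` standing in for Perl's File::Find. The assumption that each derived file keeps the same relative name as its source in Exif is mine, as is the function name `pending_work`; the extraction steps themselves are left out.

```python
from pathlib import Path

TARGETS = ("Photo", "Info", "Thumb")


def pending_work(root: Path) -> dict:
    """Map each JPEG under root/Exif to the target directories lacking an entry.

    Assumption: a derived file has the same relative path as its source.
    """
    exif_dir = root / "Exif"
    todo = {}
    for src in sorted(exif_dir.rglob("*")):
        if not src.is_file() or src.suffix.lower() not in (".jpg", ".jpeg"):
            continue
        rel = src.relative_to(exif_dir)
        missing = [t for t in TARGETS if not (root / t / rel).exists()]
        if missing:
            todo[rel] = missing
    return todo
```

Because completion is judged by the presence of output files, a photo that has been fully processed is skipped on the next sweep without storing any flag in the JPEG itself.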
 
Gisle Aas

Stefano Bettelli said:
I recently became interested in designing a Perl
library for reading and modifying JPEG image metadata (Exif
info, IPTC info, comments, thumbnails and so on). This kind of
additional data, stored in the image itself, is very useful for
organising digital photo collections. For various reasons, the
existing Perl libraries and programs do not fully satisfy me,
so I decided to enter the arena and write a Perl module (this is
also a good way to learn the language better...).

Could you name what existing Perl libraries you have looked at and why
they don't satisfy you?

I'm the author of Image::Info which seems to already do a lot of the
same as you try to do. One difference is that I don't plan to make
Image::Info able to update the meta info. I think that would
complicate the module too much and I don't have that need personally.
 
Martin Herrmann

I have been wanting to create a catalog of my photos for quite some
time. I have thousands of photos that I have taken over the past five
years, and it looks like your module could help me quite a bit.


I guess that I should read the information regarding the JPEG format
to get the answer, but maybe you know this: would it be possible to
add segments of information to the file that were of my own design? I
would like to add some flags to each photo that would show that I have
completed cataloging it.

That flag already exists:
I would use the IPTC "Edit Status" field for this purpose.
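In the IPTC IIM format each field is a dataset: a 0x1C tag marker, a record number, a dataset number, a 2-byte big-endian length and the data; Edit Status is dataset 2:07. A minimal Python sketch that encodes and looks up such datasets; the function names and the "catalogued" value are hypothetical. It does not handle extended-length datasets, nor the wrapping used in JPEG files, where the IIM block usually sits inside an APP13 "Photoshop 3.0" segment as image resource 0x0404.

```python
import struct

EDIT_STATUS = (2, 7)  # IIM record 2 (application record), dataset 7


def iim_dataset(record: int, dataset: int, value: bytes) -> bytes:
    """Encode one IIM dataset: 0x1C tag marker, record, dataset, length, data."""
    if len(value) > 0x7FFF:
        raise ValueError("extended-length datasets are not handled in this sketch")
    return bytes([0x1C, record, dataset]) + struct.pack(">H", len(value)) + value


def iim_find(block: bytes, record: int, dataset: int):
    """Return the value of the first matching dataset in an IIM block, or None."""
    pos = 0
    while pos + 5 <= len(block) and block[pos] == 0x1C:
        rec, ds = block[pos + 1], block[pos + 2]
        (size,) = struct.unpack(">H", block[pos + 3:pos + 5])
        if size & 0x8000:
            raise ValueError("extended-length datasets are not handled in this sketch")
        if (rec, ds) == (record, dataset):
            return block[pos + 5:pos + 5 + size]
        pos += 5 + size
    return None
```

A cataloguing sweep could then skip any photo whose Edit Status already holds the chosen marker value, for example `block = iim_dataset(2, 0, b"\x00\x04") + iim_dataset(2, 7, b"catalogued")`, where 2:00 is the record version.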
This is just what I need: the date & time of the photo, etc. I can use
this info to stick a record in a database that holds basic info & the
filesystem location of the photo. I would like to be able to set some
kind of flag in the file, then, so that when I did a subsequent sweep
of the disk for image files, I could easily skip photos that had
already been processed.

If you are looking for a GUI to manage your photos, you may want to
have a look at Mapivi (http://mapivi.de.vu): it's free, runs on Windows
and UNIX, and the next version will use Stefano Bettelli's new module
Image::MetaInfo::JPEG.

Bye,
Martin
 
Martin Herrmann

Josef Moellers said:
It depends. Obviously the standard is designed such that any application
can skip those tags that it doesn't know. However, you cannot guarantee
that all software is written properly.

I'm sure that this approach will cause nothing but trouble. There are
so many picture applications which e.g. will only handle the first
comment segment and throw away the rest ...

As noted in the other post, I strongly recommend using "standard"
segments for storing such information, like the IPTC info
(http://www.iptc.org).
I used your module to parse a file from my camera (Casio QV-2000UX).
Here is part of the info that was returned:
[ ... ]

This is just what I need: the date & time of the photo, etc. I can use
this info to stick a record in a database that holds basic info & the
filesystem location of the photo. I would like to be able to set some
kind of flag in the file, then, so that when I did a subsequent sweep
of the disk for image files, I could easily skip photos that had
already been processed.

I, too, did some work on this subject.
I have a Kodak DC240 which stores the images in EXIF format.

I store all my photos in the "Exif" directory.
Also, there exist "Photo", "Info" and "Thumb" directories.
When I scan the photos, I scan the Exif directory (File::Find), then, if
no entry exists in Photo, I extract the large image, if no entry exists
in Info, I extract the information, if no entry exists in Thumb, I
extract the thumbnail.
Then I have .alb files which describe what photos belong together and I
create HTML pages with the thumbnails that have links to the large images.

What are .alb files?

I'm not exactly sure that I understand everything you wrote, but it
seems to me that most of this (including the HTML export and the handling
of Exif info and thumbnails) can be done with Mapivi
(http://mapivi.de.vu).

Bye,
Martin
 
Josef Moellers

Martin said:
I'm sure that this approach will cause nothing but trouble. There are
so many picture applications which e.g. will only handle the first
comment segment and throw away the rest ...

If they threw away the rest, that would be fine, but some applications
will say "Unknown tag XXXX found in blabla.jpg, terminating".
What are .alb files?

Photo _alb_ums, a personal text file format that describes which
pictures make up an album and should be included in the web pages
generated (they specify background, title, subtitle and picture ranges).
I'm not exactly sure that I understand everything you wrote, but it
seems to me that most of this (including the HTML export and the handling
of Exif info and thumbnails) can be done with Mapivi
(http://mapivi.de.vu).

Although I often prefer software that I have written myself (it does
_exactly_ what I want/need), I'll have a look.

Thanks.
 
Stefano Bettelli

Hi Gisle,

On Thu, 01 Jul 2004 09:18:44 -0700, Gisle Aas wrote:
Could you name what existing Perl libraries you have
looked at and why they don't satisfy you?

The libraries and scripts that I looked at are listed in the
perldoc manpage of the module, together with a description and
my comments: "ExifTool" and "Image::ExifTool" by Phil Harvey,
"Image::IPTCInfo" by Josh Carter, "JPEG::JFIF" by Marcin
Krzyzanowski, "Image::Exif" by Sergey Prozhogin and "exiftags"
by Eric M. Johnston, "Image::Info" and "Image::TIFF" by you,
"exif" by Martin Krzywinski and "exifdump.py" by Thierry Bousch,
"exifprobe" by Duane H. Hesser, "libexif" by Lutz Müller,
"jpegrdf" by Norman Walsh and "OpenExif" by Eastman Kodak
Company [some of these are not written in Perl].
I'm the author of Image::Info which seems to already do a lot
of the same as you try to do. One difference is that I don't
plan to make Image::Info able to update the meta info. I think
that would complicate the module too much and I don't have
that need personally.

Actually, one of my goals is to be able to modify and rewrite
to disk almost all the information I parse from APP* segments.
I read your library, but modifying it with this goal in mind
is not an easy task (at least, no easier than starting from
scratch). Another point is that the goal of your library is to
read a set of common tags from various graphic formats, while
I want to read/modify all tags from a specific graphic format
(namely JPEG, and maybe TIFF in the future).

What I cannot see is how to achieve "all" along both the
"format axis" and the "tag axis". Do you think there is
a possibility of integrating the two modules?

Bye,
Stefano
 
Stefano Bettelli

Hi,

On Thu, 01 Jul 2004 06:34:25 -0700, GreenLight wrote:
I have been wanting to create a catalog of my photos
for quite some time.

This is exactly my problem :). I believe that, in order to
manage a catalogue without much complication, one should be
able to enter comments and additional info directly into the
image, and then use a Perl script to generate dynamic web
pages with the required fields.

For the first task, I think that a winning combination is a
specialised library capable of parsing/modifying the JPEG
structure together with a GUI program allowing you to interact
with your photos more easily. Maybe you could have a look at
the following program:

http://herrmanns-stern.de/software/mapivi/mapivi.shtml

Bye,
Stefano
 
Stefano Bettelli

On Sat, 26 Jun 2004 19:57:26 +0200, Stefano Bettelli wrote:
2) What is the best name for the module?
I am currently using Image::MetaInfo::JPEG.

Since I am getting paranoid about the correct name-space,
what do you think about the following:

Physics Meta-physics
Language Meta-Language

Meta is a prefix meaning (in current British English)
"at a level above". In our case, MetaInfo would imply that
there is info somewhere at a lower level. But this does
not appear to be the case. In fact, Gisle Aas' module is
named Image::Info, not Image::MetaInfo, and it obviously
refers to the same level as we do. But Image::Info::JPEG is
already taken. So what about:

Image::MetaInfo --> Image::MetaData ?
 
Michael Meissner

l v said:
The closest I could get was using the Image::IPTCInfo module's SetAttribute,
AddKeyword, and Keywords functions to build my catalog, although I would much
rather save the information in EXIF than in IPTC. I then use jhead
(http://www.sentex.net/~mwandel/jhead) to autorotate the image, which
automatically updates the EXIF orientation flag for me. ImageMagick is also
used to create my thumbnails (convert) and a contact sheet (montage).

If you are using jhead already, check out the -cl string option which allows
you to update the comment field. I use that for storing my copyright string in
all of my photos. I suspect there may be a way to set this under Perl.
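The effect of jhead's -cl option (one comment per file) can be sketched in Python; this is not how jhead itself is implemented. The sketch drops every existing COM segment and inserts a single new one before the first non-APPn segment, so that JFIF APP0 or Exif APP1 stay directly after SOI. It assumes Latin-1 comment text and no fill bytes or standalone markers before SOS, which is typical but not guaranteed; the function name `set_comment` is hypothetical.

```python
import struct


def set_comment(jpeg: bytes, text: str) -> bytes:
    """Return a copy of jpeg whose COM segments are replaced by a single new one."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing SOI marker)")
    payload = text.encode("latin-1")  # assumption: comment text is Latin-1
    if len(payload) + 2 > 0xFFFF:
        raise ValueError("comment too long for a single COM segment")
    new_com = b"\xff\xfe" + struct.pack(">H", len(payload) + 2) + payload

    out = [b"\xff\xd8"]
    pos, inserted = 2, False
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg[pos + 1]
        if marker in (0xDA, 0xD9):  # SOS or EOI: copy the rest verbatim
            break
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        segment = jpeg[pos:pos + 2 + length]
        pos += 2 + length
        if marker == 0xFE:  # drop every existing comment
            continue
        if not inserted and not (0xE0 <= marker <= 0xEF):
            out.append(new_com)  # first non-APPn segment: put the comment here
            inserted = True
        out.append(segment)
    if not inserted:
        out.append(new_com)
    return b"".join(out) + jpeg[pos:]
```

Running it twice with the same text leaves the file unchanged the second time, which makes it safe to apply to a whole collection.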
 
Ben Morrow

Quoth Stefano Bettelli said:
On Sat, 26 Jun 2004 19:57:26 +0200, Stefano Bettelli wrote:

Since I am getting paranoid about the correct name-space,
what do you think about the following:

Physics Meta-physics
Language Meta-Language

Meta is a prefix meaning (in current British English)
"at a level above". In our case, MetaInfo would imply that
there is info somewhere at a lower level. But this does
not appear to be the case. In fact, Gisle Aas' module is
named Image::Info, not Image::MetaInfo, and it obviously
refers to the same level as we do. But Image::Info::JPEG is
already taken. So what about:

Image::MetaInfo --> Image::MetaData ?

Yes, I would say that was good. 'Metainfo' is not an English expression;
and as you say, 'info' implies that it is describing the data without
the need for 'meta'.

Ben
 
