Reading image dimensions with PIL

Will McGugan · May 17, 2005

Hi,

I'm writing an app that downloads images. It rejects images that are
under a certain size - without downloading them completely. I've
implemented this using PIL, by downloading the first K and trying to
create a PIL image with it. PIL raises an exception because the file is
incomplete, but the image object is initialised with the image
dimensions, which is what I need. It actually works well enough, but I'm
concerned about side-effects - since it seems an unconventional way of
working with PIL. Can anyone see any problems with doing this? Or a
better method?

Thanks,

Will McGugan

Dave Brueck · May 17, 2005

Will said:
I'm writing an app that downloads images. It rejects images that are
under a certain size - without downloading them completely. I've
implemented this using PIL, by downloading the first K and trying to
create a PIL image with it. PIL raises an exception because the file is
incomplete, but the image object is initialised with the image
dimensions, which is what I need. It actually works well enough, but I'm
concerned about side-effects - since it seems an unconventional way of
working with PIL. Can anyone see any problems with doing this? Or a
better method?

If you're tossing images that are too _small_, is there any benefit to not
downloading the whole image, checking it, and then throwing it away?

Checking just the first 1K probably won't save you too much time unless you're
over a modem. Are you using a byte-range HTTP request to pull down the images or
just a normal GET (via e.g. urllib)? If you're not using a byte-range request,
then all of the data is already on its way so maybe you could go ahead and get
it all.
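
For readers who haven't met byte-range requests: a client can ask for just the first part of a resource by sending a Range header. A server that honours it replies with status 206 and only those bytes; one that doesn't replies 200 with the whole body. A minimal sketch in modern Python 3 using urllib.request follows. The helper names build_range_request and fetch_prefix are made up for this illustration and do not come from the thread.

```python
import urllib.request


def build_range_request(url, num_bytes=1024):
    """Build a GET request asking for only the first num_bytes bytes."""
    # Byte ranges are inclusive, so the first 1024 bytes are 0-1023.
    headers = {"Range": "bytes=0-%d" % (num_bytes - 1)}
    return urllib.request.Request(url, headers=headers)


def fetch_prefix(url, num_bytes=1024):
    """Return (status, data) with at most num_bytes bytes of the body."""
    request = build_range_request(url, num_bytes)
    with urllib.request.urlopen(request) as response:
        # read(num_bytes) caps the data even if the server ignores the
        # Range header and answers 200 with the full image.
        return response.status, response.read(num_bytes)
```

Checking the returned status for 206 versus 200 tells you whether the server actually honoured the range.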

But hey, if your current approach works... It _is_ a bit unconventional, so
to reduce the risk you could test it on a decent mix of image types (normal
JPEG, progressive JPEG, normal & progressive GIF, png, etc.) - just to make sure
PIL is able to handle partial data for all different types you might encounter.

Also, if PIL can't handle the partial data, can you reliably detect that
scenario? If so, you could detect that case and use the
download-it-all-and-check approach as a failsafe.
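
The failsafe described above could look roughly like this. It is only a sketch: probe_size stands for whatever header check is in use (assumed here to return a (width, height) tuple, or None when it cannot tell), and fetch_all is a hypothetical callable that downloads the complete image.

```python
def meets_minimum(prefix, probe_size, fetch_all, min_width, min_height):
    """Decide whether an image is at least min_width x min_height."""
    size = probe_size(prefix)
    if size is None:
        # The partial data was not enough: fall back to the full download.
        size = probe_size(fetch_all())
    if size is None:
        # Still unreadable, so treat it as not meeting the minimum.
        return False
    width, height = size
    return width >= min_width and height >= min_height
```

The full download only happens when the cheap check is inconclusive, so the common case still costs a single small read.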

-Dave

Will McGugan · May 18, 2005

Dave said:
If you're tossing images that are too _small_, is there any benefit to
not downloading the whole image, checking it, and then throwing it away?

It's a 'webscraper' app that downloads images based on search criteria.
The user may want only images above 640x480, although the general case
will be something like 200x200 to avoid downloading thumbnails.

Checking just the first 1K probably won't save you too much time unless
you're over a modem. Are you using a byte-range HTTP request to pull
down the images or just a normal GET (via e.g. urllib)? If you're not
using a byte-range request, then all of the data is already on its way
so maybe you could go ahead and get it all.

I'm not familiar with byte-range requests. Is this a standard feature of
webservers? I know there will be more than one K in the pipeline if I do
a read, but if I close the file object from urllib it will stop the
download if there is data remaining - won't it?

But hey, if your current approach works... It _is_ a bit
unconventional, so to reduce the risk you could test it on a decent mix
of image types (normal JPEG, progressive JPEG, normal & progressive GIF,
png, etc.) - just to make sure PIL is able to handle partial data for
all different types you might encounter.

Also, if PIL can't handle the partial data, can you reliably detect that
scenario? If so, you could detect that case and use the
download-it-all-and-check approach as a failsafe.

The PIL code worked with most of the images I threw at it (just jpegs).
If there was no 'size' attribute then I just continued to download the
entire image. It may have caused a memory leak though: with this code in,
memory usage increased continuously.

Actually, this may all be moot now. Originally I looked at reading the
image dimensions from the jpeg header, but that turned out to be
non-trivial and I gave up. Fortunately I found some Perl code that does
it, and converted it to Python (and I don't even know Perl!). Here's the
code if anyone is interested:

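import struct

def GetJpegSize(data):
    # Python 2 code: data is a str holding the first bytes of the file.

    idata = iter(data)

    width = None
    height = None

    try:

        # Every JPEG starts with the SOI marker, FF D8.
        B1 = ord(idata.next())
        B2 = ord(idata.next())

        if B1 != 0xFF or B2 != 0xD8:
            return -1, -1

        while True:

            byte = ord(idata.next())

            # Scan forward to the next marker, then skip 0xFF fill bytes.
            while byte != 0xFF:
                byte = ord(idata.next())

            while byte == 0xFF:
                byte = ord(idata.next())

            if byte >= 0xc0 and byte <= 0xc3:
                # SOF0..SOF3: skip length (2 bytes) and precision (1 byte),
                # then read height and width as big-endian 16-bit values.
                idata.next()
                idata.next()
                idata.next()
                height, width = struct.unpack( '>HH',
                    "".join(idata.next() for b in range(4)) )
                break
            else:
                # Any other segment: read its length and skip the payload.
                offset = struct.unpack('>H', idata.next() +
                    idata.next())[0] - 2
                for _ in xrange(offset):
                    idata.next()

    except (StopIteration, struct.error):
        # Ran out of data before the dimensions could be read. struct.error
        # covers data that ends in the middle of the height/width bytes.
        pass

    return width, height

if __name__ == "__main__":

    first_k = file("test.jpg","rb").read(1024)

    print GetJpegSize(first_k)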

Returns (-1, -1) for a non-jpeg, or (None, None) if the size wasn't
contained in the data supplied (some jpegs have embedded thumbnails), or
(width, height) if the dimensions were found.
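
The code above targets Python 2. A rough Python 3 equivalent, working on a bytes object with index arithmetic instead of an iterator, might look like the following. jpeg_size is a name chosen for this sketch; the return conventions are the ones described above, and like the original it only recognises the SOF0 to SOF3 markers.

```python
import struct


def jpeg_size(data):
    """Return (width, height), (None, None) if not found, (-1, -1) if not JPEG."""
    if len(data) < 2:
        return None, None
    if data[:2] != b"\xff\xd8":
        return -1, -1
    pos = 2
    try:
        while True:
            # Scan forward to the next marker, then skip 0xFF fill bytes.
            while data[pos] != 0xFF:
                pos += 1
            while data[pos] == 0xFF:
                pos += 1
            marker = data[pos]
            pos += 1
            if 0xC0 <= marker <= 0xC3:
                # Segment layout: length (2), precision (1), height (2), width (2).
                height, width = struct.unpack(">HH", data[pos + 3:pos + 7])
                return width, height
            # Any other segment: its length field counts itself, so jump past it.
            (length,) = struct.unpack(">H", data[pos:pos + 2])
            pos += length
    except (IndexError, struct.error):
        # The supplied data ended before a size could be read.
        return None, None
```

Usage would be along the lines of jpeg_size(open("test.jpg", "rb").read(1024)).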

And the original source: http://wiki.tcl.tk/757

Thanks,

Will

Fredrik Lundh · May 18, 2005

Will said:
I'm writing an app that downloads images. It rejects images that are
under a certain size - without downloading them completely. I've
implemented this using PIL, by downloading the first K and trying to
create a PIL image with it. PIL raises an exception because the file is
incomplete, but the image object is initialised with the image
dimensions, which is what I need. It actually works well enough, but I'm
concerned about side-effects - since it seems an unconventional way of
working with PIL. Can anyone see any problems with doing this? Or a
better method?

the "right" way to do this is to use the ImageFile.Parser class. see the
last snippet on this page for an example:

http://effbot.org/zone/pil-image-size.htm
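
For reference, the ImageFile.Parser approach feeds data to the parser piece by piece and stops as soon as the parser has identified the image. Below is a sketch of that idea in Python 3 with Pillow, not a copy of the linked snippet. size_from_chunks is an illustrative name, and the chunks may come from any source, such as repeated read(1024) calls on an open URL.

```python
from PIL import ImageFile


def size_from_chunks(chunks):
    """Return (width, height) once the header is parsed, else None."""
    parser = ImageFile.Parser()
    for chunk in chunks:
        parser.feed(chunk)
        # parser.image stays None until enough data has arrived for
        # PIL to identify the file and read its header.
        if parser.image:
            return parser.image.size
    return None
```

Because the function returns as soon as the size is known, the caller can stop reading from the network at that point.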

</F>

Will McGugan · May 18, 2005

Fredrik said:
the "right" way to do this is to use the ImageFile.Parser class. see the
last snippet on this page for an example:

http://effbot.org/zone/pil-image-size.htm

Excellent, thanks.

Will
