Finding MIME type for a data stream

Tobiah

I'm pulling image data from a database blob, and serving
it from a web2py app. I have to send the correct
Content-Type header, so I need to detect the image type.

Everything I've found on the web so far needs a file
name on disk, but I only have the data.

It looks like the 'magic' package might be of use, but
I can't find any documentation for it.

Also, it seems like image/png works for other types
of image data, while image/foo does not, yet I'm afraid
that not every browser will play along as nicely.

Thanks!

Tobiah
 
Dave Angel

First step, ask the authors of the database what format of data this
blob is in.

Failing that, write the same data locally as a binary file, and see what
application can open it. Or if you're on a Linux system, run the "file"
utility on it. It can identify most data formats (not just images) just by
looking at the data.

That assumes, of course, that there's any consistency in the data coming
out of the database. What happens if next time this blob is an Excel
spreadsheet?
 
Tobiah

I should simplify my question. Let's say I have a string
that contains image data called 'mystring'.

I want to do

mime_type = some_magic(mystring)

and get back 'image/jpeg' or 'image/png' or whatever is
appropriate for the image data.

Thanks!

Tobiah
 
Tobiah

Also, I realize that I could write the data to a file
and then use one of the modules that want a file path.
I would prefer not to do that.

Thanks
 
Dave Angel

I have to assume you're talking python 2, since in python 3, strings
cannot generally contain image data. In python 2, characters are pretty
much interchangeable with bytes.

Anyway, I don't know any way in the standard lib to distinguish
arbitrary image formats. (There very well could be one.) The file
program I referred to was an external utility, which you could run with
the subprocess module.
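
As a rough sketch of running the external utility without writing a temp file: the standard subprocess module can pipe the bytes to file on standard input. This assumes a Unix-like system with the file command on the PATH and Python 3.5 or later; the function name mime_from_file_command is made up for illustration.

```python
import subprocess


def mime_from_file_command(data):
    """Pipe `data` to the external `file` utility and return its MIME guess."""
    # "-" tells file to read from standard input; --brief drops the
    # filename prefix, and --mime-type asks for a type like "image/png".
    result = subprocess.run(
        ["file", "--brief", "--mime-type", "-"],
        input=data,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii").strip()
```

Spawning a process per image is comparatively slow, so this probably suits a one-time check at storage time better than a per-request lookup.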

If you're looking for a specific, small list of file formats, you could
make yourself a signature list. Most (not all) formats distinguish
themselves in the first few bytes. For example, a standard zip file
starts with "PK" for Phil Katz. A Windows exe starts with "MZ" for
Mark Zbikowski. And I believe a jpeg file starts with hex ff d8 ff.
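
A minimal sketch of such a signature list, limited to PNG, GIF and JPEG. The names SIGNATURES and sniff_image_mime, and the fallback type, are illustrative choices rather than anything from a library. It is written for Python 3 bytes but should behave the same on Python 2 strings.

```python
# Leading byte signatures for the formats of interest.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
]


def sniff_image_mime(data, default="application/octet-stream"):
    """Return a MIME type based on the first few bytes of `data`."""
    for signature, mime_type in SIGNATURES:
        if data.startswith(signature):
            return mime_type
    # Nothing matched: fall back to a generic binary type.
    return default
```

Usage would be something like mime_type = sniff_image_mime(mystring). Only the header is inspected, so a truncated or corrupt image with an intact signature would still be reported as that type.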

If you'd like to see a list of available modules, help() is your
friend. You can start with help("modules") to see quite a long list.
And I was surprised how many image related things already are there. So
maybe there's something I don't know about that could help.
 
Tobiah

Dave Angel said:
I have to assume you're talking python 2, since in python 3, strings
cannot generally contain image data. In python 2, characters are pretty
much interchangeable with bytes.

Yeah, python 2

Dave Angel said:
if you're looking for a specific, small list of file formats, you could
make yourself a signature list. Most (not all) formats distinguish
themselves in the first few bytes.

Yeah, maybe I'll just do that. I'm allowing users to paste
images into a rich-text editor, so I'm pretty much looking
at .png, .gif, or .jpg. Those should be pretty easy to
distinguish by looking at the first few bytes.

Pasting images may sound weird, but I'm using a jquery
widget called cleditor that takes image data from the
clipboard and replaces it with inline base64 data.
The html from the editor ends up as an email, and the
inline images cause the emails to be tossed in the
spam folder for most people. So I'm parsing the
emails, storing the image data, and replacing the
inline images with an img tag that points to a
web2py app that takes arguments that tell it which
image to pull from the database.

Now that I think of it, I could use php to detect the
image type, and store that in the database. Not quite
as clean, but that would work.

Tobiah
 
Dennis Lee Bieber

Tobiah said:
Pasting images may sound weird, but I'm using a jquery
widget called cleditor that takes image data from the
clipboard and replaces it with inline base64 data.

In Windows, I'd expect "device independent bitmap" to be the result
of a clipboard image...
 
Irmen de Jong

Tobiah said:
Also, I realize that I could write the data to a file
and then use one of the modules that want a file path.
I would prefer not to do that.

Thanks

Use StringIO then, instead of a file on disk

Irmen
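
To illustrate the suggestion: an in-memory stream gives the bytes a file interface, so code written for open files can consume them directly. The sketch below uses io.BytesIO, the Python 3 counterpart of Python 2's StringIO for binary data. The read_signature helper and the blob value are made up for the example.

```python
import io


def read_signature(fileobj, length=8):
    """Read the first `length` bytes from any file-like object."""
    return fileobj.read(length)


# Hypothetical blob standing in for the bytes pulled from the database.
blob = b"GIF89a" + b"\x00" * 10

# BytesIO wraps the bytes with read/seek/tell without touching the disk.
stream = io.BytesIO(blob)
signature = read_signature(stream)
```

This only helps with modules that accept a file object. A module that insists on a path string would still need a real file.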
 
Jon Clements

Something like the following might be worth a go:
(untested)

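from StringIO import StringIO  # Python 2; on Python 3 use io.BytesIO
from PIL import Image

# Wrap the raw bytes in a file-like object so PIL can read them
img = Image.open(StringIO(blob))
print img.format  # e.g. 'PNG'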

HTH
Jon.

PIL: http://www.pythonware.com/library/pil/handbook/image.htm
 
Peter Otten

Tobiah said:
I'm pulling image data from a database blob, and serving
it from a web2py app. I have to send the correct
Content-Type header, so I need to detect the image type.

Everything that I've found on the web so far, needs a file
name on the disk, but I only have the data.

It looks like the 'magic' package might be of use, but
I can't find any documentation for it.

After some trial and error and a look into example.py:
'image/png'
 
Tobiah

Dennis Lee Bieber said:
In Windows, I'd expect "device independent bitmap" to be the result
of a clipboard image...

This jquery editor seems to detect the image data and
translate it into an inline image like:

<img src="...

I'm parsing those out with regular expressions and decoding
the base64, and putting the resulting image data into a blob.
Hmm... there's the mime type right there.
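
A sketch of pulling the type out during that parsing step. The src value is cut off in the post, so this assumes the editor emits standard data: URIs of the form data:image/png;base64,... inside the img tag. The pattern and the function name extract_inline_images are illustrative, and a regular expression like this is only a rough match for real-world HTML.

```python
import base64
import re

# Assumed shape of the editor output: <img src="data:image/png;base64,....">
DATA_URI_IMG = re.compile(
    r'<img[^>]*\bsrc="data:(?P<mime>[\w.+-]+/[\w.+-]+);base64,(?P<payload>[^"]+)"',
    re.IGNORECASE,
)


def extract_inline_images(html):
    """Return a list of (mime_type, raw_bytes) for each inline base64 image."""
    images = []
    for match in DATA_URI_IMG.finditer(html):
        raw = base64.b64decode(match.group("payload"))
        images.append((match.group("mime").lower(), raw))
    return images
```

Storing the captured type next to the blob would let the serving app send it back as the Content-Type without inspecting the image data at all.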
 
Tobiah

Jon Clements said:
Something like the following might be worth a go:
(untested)

from PIL import Image
img = Image.open(StringIO(blob))
print img.format

This worked quite nicely. I didn't
see a list of all returned formats though
in the docs. The one image I had returned

PNG

So I'm doing:

mime_type = "image/%s" % img.format.lower()

I'm hoping that will work for any image type.

Thanks,

Tobiah
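
On whether lowercasing works for any image type: it gives the right answer for PNG, GIF and JPEG, but some format names do not lowercase to a usable subtype. A hedged sketch of a lookup with explicit exceptions follows. The two override entries reflect my understanding of the names PIL reports and are not a complete list, and mime_for_format is a made-up name. PIL also appears to keep its own table in Image.MIME, which may be worth checking first.

```python
# Explicit table for formats whose names do not lowercase to a usable subtype.
# The entries are illustrative, not a complete list of what PIL can report.
FORMAT_OVERRIDES = {
    "ICO": "image/x-icon",
    "JPEG2000": "image/jp2",
}


def mime_for_format(pil_format):
    """Map a PIL format name such as 'PNG' to a Content-Type value."""
    if pil_format in FORMAT_OVERRIDES:
        return FORMAT_OVERRIDES[pil_format]
    # Default rule from the post above: lowercase the format name.
    return "image/%s" % pil_format.lower()
```

With this, the line in the post would become mime_type = mime_for_format(img.format).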
 
