Philip Tucker
Problem:
I'm implementing an active vision image analyzer using artificial
neural networks. By "active vision", I mean the neural net is able to
view an image and control vertical/horizontal movement, zoom, and
rotation to move about the image. The neural net's perspective is
defined by x,y coordinates, a zoom factor, and a rotation angle, which
together determine the area of the image it can see. This visible
area - its "eye" - is much smaller than the entire image. For
example, I might have a 500x500 image and a 10x10 eye. If the
eye is zoomed out, it can see a blurry version of the entire image.
If it is zoomed all the way in, it can see pixel-for-pixel a 10x10
area of the image.
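To make the geometry concrete, the viewer state amounts to something
like this (a sketch; the class and field names are mine, not from any
API):
// Sketch of the viewer state; names are illustrative.
class EyeState {
    double x, y;    // eye position in source-image coordinates
    double zoom;    // zoomed out: whole 500x500 image squeezed into the
                    // 10x10 eye; zoomed in: one eye pixel per source pixel
    double theta;   // rotation angle, in radians
}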
My first solution:
I'm using the Java imaging API to represent the original image
(sourceImage) and eye image (eyeImage) as BufferedImage objects. I
call eyeImage.getRGB() to get the image data I input to the neural
network. Given the original image and the location, zoom, and
rotation of the neural net viewer, I compute the eye image. I used
AffineTransform and AffineTransformOp to translate (i.e., move x,y),
scale, and rotate the image. Then, I called
BufferedImage.getSubimage() to crop the result and produce my eye
image.
This all worked okay, except the scaling wasn't very satisfactory.
TYPE_BILINEAR was better than TYPE_NEAREST_NEIGHBOR, but still lost
too much information when zoomed out. TYPE_BICUBIC was better, but
still not quite what I wanted.
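For reference, the core of that first approach looked roughly like
this (a sketch; the scale factor is folded into the same transform,
and the names anticipate the summary code below):
// translate, rotate, and scale in one AffineTransform
AffineTransform transform =
    AffineTransform.getTranslateInstance( x, y );
transform.preConcatenate(
    AffineTransform.getRotateInstance( theta, anchorX, anchorY ) );
transform.preConcatenate(
    AffineTransform.getScaleInstance( scale, scale ) );
// apply it with the chosen interpolation type, then crop to the eye
AffineTransformOp op = new AffineTransformOp(
    transform, AffineTransformOp.TYPE_BICUBIC );
BufferedImage transformed = op.filter( sourceImage, null );
eyeImage = transformed.getSubimage( xStart, yStart, eyeWidth, eyeHeight );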
My second solution:
Same as the first, except I use Image.getScaledInstance() (which
BufferedImage inherits) with the SCALE_AREA_AVERAGING hint instead of
AffineTransform to do the scaling. This produces the kind of image I
want, but since it returns an Image rather than a BufferedImage, I
have to draw the result back into eyeImage via Graphics2D. The output
is right; I'm just worried the process isn't very efficient.
In summary, my process to transform sourceImage into eyeImage is:
// translate
AffineTransform transform =
    AffineTransform.getTranslateInstance( x, y );
// rotate
transform.preConcatenate(
    AffineTransform.getRotateInstance( theta, anchorX, anchorY ) );
// transform (translate, rotate)
AffineTransformOp transformOp = new AffineTransformOp(
    transform, AffineTransformOp.TYPE_NEAREST_NEIGHBOR );
BufferedImage postTransform =
    transformOp.filter( sourceImage, null );
// crop
BufferedImage postCrop =
    postTransform.getSubimage( xStart, yStart, xSize, ySize );
// scale and update eyeImage (note: getScaledInstance takes width first)
eyeImage = new BufferedImage(
    eyeWidth, eyeHeight, BufferedImage.TYPE_INT_ARGB );
Graphics2D g = eyeImage.createGraphics();
g.drawImage( postCrop.getScaledInstance( eyeWidth, eyeHeight,
    Image.SCALE_AREA_AVERAGING ), 0, 0, null );
g.dispose();
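Once eyeImage is updated, I pull its pixels out for the network along
these lines (a sketch; the grayscale normalization is just an example
of how the ARGB ints might be fed to the net):
// grab the eye's pixels as packed ARGB ints, one per pixel
int[] pixels = eyeImage.getRGB( 0, 0, eyeWidth, eyeHeight, null, 0, eyeWidth );
// example normalization: collapse each pixel to a grayscale input in [0,1]
double[] inputs = new double[ pixels.length ];
for ( int i = 0; i < pixels.length; i++ ) {
    int red   = ( pixels[i] >> 16 ) & 0xFF;
    int green = ( pixels[i] >> 8 ) & 0xFF;
    int blue  = pixels[i] & 0xFF;
    inputs[i] = ( red + green + blue ) / ( 3.0 * 255.0 );
}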
My question:
Is there a more efficient way of doing this? Every time the viewer
moves and I have to update eyeImage, I'm creating 8 new objects (2
AffineTransform, 1 AffineTransformOp, 3 BufferedImage, the scaled
Image, 1 Graphics2D),
performing 3 separate image transformations (transformOp.filter(),
postTransform.getSubimage(), postCrop.getScaledInstance()), and 1
image copy (g.drawImage()). It seems like there should be an easier
way.
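For what it's worth, the kind of consolidation I was hoping for would
look roughly like this - one reusable transform drawing the source
straight into a preallocated eyeImage (a sketch; as far as I know
Graphics2D only offers nearest-neighbor, bilinear, and bicubic
interpolation hints, so this may not match SCALE_AREA_AVERAGING
quality when zoomed far out):
// allocated once, up front, and reused every update
BufferedImage eyeImage = new BufferedImage(
    eyeWidth, eyeHeight, BufferedImage.TYPE_INT_ARGB );
AffineTransform transform = new AffineTransform();

// then, per viewer move: rebuild the transform in place (no new
// AffineTransform objects) and render source -> eye in a single pass
transform.setToIdentity();
transform.scale( scale, scale );              // zoom
transform.rotate( theta, anchorX, anchorY );  // rotation
transform.translate( x, y );                  // position
Graphics2D g = eyeImage.createGraphics();
g.setRenderingHint( RenderingHints.KEY_INTERPOLATION,
    RenderingHints.VALUE_INTERPOLATION_BICUBIC );
g.drawImage( sourceImage, transform, null );
g.dispose();
That would cut the per-update allocations down to the Graphics2D, but
I don't know whether a single bicubic pass can preserve as much
information as area averaging.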
Thanks in advance for any help,
Philip