Most pictures taken these days are digital, and include information inside the picture’s exif tags about the correct orientation. For these types of pictures you can simply use exifautotran and you’re done. Easy Peasy.
What about pictures that don’t have exif info? Maybe you’ve got an old/cheap camera, scanned images, or another source that doesn’t include the orientation information. I’m in the middle of scanning about 5000 pictures, and I didn’t want to manually rotate all the images…so I started looking for solutions.
OpenCV: Open Source Computer Vision
I figured if I could detect a face in a photo, then it was right side up. I started searching for solutions and the Internet seemed to agree that the best free solution for facial detection was OpenCV, but I couldn’t find a script that did just what I wanted.
OpenCV “is a library of programming functions for real time computer vision.” It can be used in robotics, for doing cool stuff with web cams, motion detection, and more. I was mainly interested in it because of its face detection features.
Then I found this script by Jo Vermeulen from way back in 2008. It’s made for use with webcams, and the particular script is just a toy. He does have another script which interacts with DBus to provide present/away status changes for your IM client. With just a little bit of work I was able to repurpose his script to do what I needed.
Whatsup: An OpenCV Python Script To Detect Correct Photo Orientation
Prerequisites:
- Install libcv2.1
- Install python-opencv
- Download one or more Haar Cascades and save them to /usr/local/share/ (more on this below)
The script I made is called whatsup.
Usage: whatsup [--debug] path_to_file
whatsup returns the number of degrees it thinks your photo needs to be rotated clockwise to be right side up. You can then use that number of degrees in whatever program you want, be it imagemagick, jpegtran, gd, or whatever. whatsup assumes that your image is squared up (not square) and only returns multiples of 90 (0, 90, 180, 270).
Using the --debug option will show the image with a box around the feature that gave whatsup its final result.
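The search strategy is simple enough to sketch without OpenCV: test the current orientation, rotate 90 degrees clockwise, and test again, up to four times. The snippet below is only an illustration written for Python 3. The names `rotate_cw`, `degrees_to_upright` and the toy detector `looks_upright` are hypothetical and do not appear in whatsup, and a nested list stands in for the image.

```python
def rotate_cw(grid):
    """Rotate a 2D list 90 degrees clockwise: transpose, then mirror each row."""
    return [list(row)[::-1] for row in zip(*grid)]


def degrees_to_upright(grid, detector):
    """Return the clockwise rotation (0, 90, 180, 270) at which detector fires.

    Returns None if the detector never fires in any of the four orientations.
    """
    for tries in range(4):
        if detector(grid):
            return tries * 90
        grid = rotate_cw(grid)
    return None


def looks_upright(grid):
    """Toy stand-in for a face detector: 'upright' means the marker 9 is in the top-left cell."""
    return grid[0][0] == 9


# A 2x3 "image" whose marker sits in the bottom-left corner.
sideways = [
    [0, 0, 0],
    [9, 0, 0],
]
print(degrees_to_upright(sideways, looks_upright))
```

In the real script the detector is a Haar cascade and "not found" falls through to a brightness check instead of returning None.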
#!/usr/bin/env python
# This script reads in a file and tries to determine which orientation is correct
# by looking for faces in the photos
# It starts with the existing orientation, then rotates it 90 degrees at a time until
# it has either tried all 4 directions or until it finds a face
# INSTALL: Put the xml files in /usr/local/share, or change the script. Put whatsup somewhere in your path
# Usage: whatsup [--debug] filename
# Returns the number of degrees it should be rotated clockwise to orient the faces correctly
# Some code came from here: http://blog.jozilla.net/2008/06/27/fun-with-python-opencv-and-face-detection/
# The rest was cobbled together by me from the documentation here [1] and from snippets and samples found via Google
# [1] http://opencv.willowgarage.com/documentation/python/core_operations_on_arrays.html#createmat
import sys
import os
import cv
def detectFaces(small_img, loadedCascade):
    tries = 0 # 4 shots at getting faces.
    while tries < 4:
        faces = cv.HaarDetectObjects(small_img, loadedCascade, cv.CreateMemStorage(0), scale_factor=1.2, min_neighbors=2, flags=cv.CV_HAAR_DO_CANNY_PRUNING)
        if len(faces) > 0:
            if sys.argv[1] == '--debug':
                # Draw a box around every detected feature, then show the image briefly
                for i in faces:
                    cv.Rectangle(small_img, (i[0][0], i[0][1]), (i[0][0] + i[0][2], i[0][1] + i[0][3]), cv.RGB(255, 255, 255), 3, 8, 0)
                cv.NamedWindow("Faces")
                cv.ShowImage("Faces", small_img)
                cv.WaitKey(1000)
            return tries * 90
        # The rotation routine:
        tmp_mat = cv.GetMat(small_img)
        tmp_dst_mat = cv.CreateMat(tmp_mat.cols, tmp_mat.rows, cv.CV_8UC1) # Create a Mat that is rotated 90 degrees in size (3x4 becomes 4x3)
        dst_mat = cv.CreateMat(tmp_mat.cols, tmp_mat.rows, cv.CV_8UC1) # Same rotated size, holds the flipped result
        # To rotate 90 clockwise, we transpose, then flip on Y axis
        cv.Transpose(small_img, tmp_dst_mat) # Transpose it
        cv.Flip(tmp_dst_mat, dst_mat, flipMode=1) # flip it
        # put it back in small_img so we can try to detect faces again
        small_img = cv.GetImage(dst_mat)
        tries = tries + 1
    return False
# Detect which side of the photo is brightest. Hopefully it will be the sky.
def detectBrightest(image):
    image_scale = 4 # This scale factor doesn't matter much. It just gives us fewer pixels to iterate over later
    newsize = (cv.Round(image.width/image_scale), cv.Round(image.height/image_scale)) # find new size
    small_img = cv.CreateImage(newsize, 8, 1)
    cv.Resize(image, small_img, cv.CV_INTER_LINEAR)
    # Take the top 1/3, right 1/3, etc. to compare for brightness
    width = small_img.width
    height = small_img.height
    top = small_img[0:height/3, 0:width]
    right = small_img[0:height, (width/3*2):width]
    left = small_img[0:height, 0:width/3]
    bottom = small_img[(height/3*2):height, 0:width]
    sides = {'top': top, 'left': left, 'bottom': bottom, 'right': right}
    # Find the brightest side
    greatest = 0
    winning = 'top'
    for name in sides:
        sidelum = 0
        side = sides[name]
        # Sum every pixel in this strip, then average
        for x in range(side.rows):
            for y in range(side.cols):
                sidelum = sidelum + side[x, y]
        sidelum = sidelum/(side.rows*side.cols)
        if sidelum > greatest:
            greatest = sidelum # remember the best average seen so far
            winning = name
    if sys.argv[1] == '--debug':
        if winning == 'top':
            first = (0, 0)
            second = (width, height/3)
        elif winning == 'left':
            first = (0, 0)
            second = (width/3, height)
        elif winning == 'bottom':
            first = (0, (height/3*2))
            second = (width, height)
        elif winning == 'right':
            first = ((width/3*2), 0)
            second = (width, height)
        cv.Rectangle(small_img, first, second, cv.RGB(125, 125, 125), 3, 8, 0)
        cv.NamedWindow("Faces")
        cv.ShowImage("Faces", small_img)
        cv.WaitKey(3000)
    returns = {'top': 0, 'left': 90, 'bottom': 180, 'right': 270}
    # return the winner
    if sys.argv[1] == '--debug':
        print "The " + winning + " side was the brightest!"
    return returns[winning]
# Try a couple different detection methods
def trydetect():
    # Load some things that we'll use during each loop so we don't keep re-creating them
    grayscale = cv.LoadImageM(os.path.abspath(sys.argv[-1]), cv.CV_LOAD_IMAGE_GRAYSCALE) # the image itself
    # Get more at: https://code.ros.org/svn/opencv/tags/latest_tested_snapshot/opencv/data/haarcascades/
    cascades = ( # Listed in order most likely to appear in a photo
        '/usr/local/share/haarcascade_frontalface_alt.xml',
        '/usr/local/share/haarcascade_profileface.xml',
        '/usr/local/share/haarcascade_fullbody.xml',
    )
    for cascade in cascades:
        loadedCascade = cv.Load(cascade)
        image_scale = 4
        while image_scale > 0: # Try 4 different sizes of our photo
            newsize = (cv.Round(grayscale.width/image_scale), cv.Round(grayscale.height/image_scale)) # find new size
            small_img = cv.CreateImage(newsize, 8, 1)
            cv.Resize(grayscale, small_img, cv.CV_INTER_LINEAR)
            returnme = detectFaces(small_img, loadedCascade)
            if returnme is not False: # 0 is a valid answer, so compare against False explicitly
                return returnme
            image_scale = image_scale - 1
    return detectBrightest(grayscale) # no faces found, use the brightest side for orientation instead
# Usage Check
if (len(sys.argv) != 2 and len(sys.argv) != 3) or (len(sys.argv) == 3 and sys.argv[1] != '--debug'):
    print "USAGE: whatsup [--debug] filename"
    sys.exit(-1)
# Sanity check
if not os.path.isfile(sys.argv[-1]):
    print "File '" + sys.argv[-1] + "' does not exist"
    sys.exit(-1)
# Make it happen
print str(trydetect()),
About OpenCV Feature Detection
In order to detect features, like faces, OpenCV needs to be trained. You then use the training file, called a Haar Cascade, to define the detection. OpenCV provides lots of different ready-made training files (the URL is in the comment above the cascades list in the script).
To use whatsup you’ll need to download one or more haarcascade*.xml files and put them in /usr/local/share (or edit whatsup to point at the place you decide to save them).
What you need to be aware of is that detection works best at the resolution the training file was created for. So if you’re using haarcascade_frontalface_default.xml then you want to be giving the detector faces that are roughly 24×24 pixels.
In order for this to happen, whatsup tries scaling the images to different sizes and tries detecting faces in those different sizes. I am using scans that are roughly 1200×800 and so I start with a scaling factor of 4 so that the first image tried is 1/4th the size of the original. If your images are larger then you probably need to start with a larger scaling factor.
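To make the size arithmetic concrete, here is a small Python 3 sketch of the sizes that loop produces, plus one rough way to pick a starting factor. `candidate_sizes` and `suggest_start_scale` are hypothetical helpers, not part of whatsup. The rule "typical face size divided by the cascade's training size" is an assumption for illustration, not something OpenCV prescribes.

```python
def candidate_sizes(width, height, start_scale=4):
    """List the (scale, (w, h)) pairs tried when shrinking by start_scale, ..., 2, 1."""
    sizes = []
    for scale in range(start_scale, 0, -1):
        new_size = (int(round(width / float(scale))), int(round(height / float(scale))))
        sizes.append((scale, new_size))
    return sizes


def suggest_start_scale(typical_face_px, trained_px=24):
    """Guess a starting scale so a typical face shrinks to about trained_px pixels."""
    return max(1, int(round(typical_face_px / float(trained_px))))


# The 1200x800 scans mentioned above, starting at a scale factor of 4.
for scale, size in candidate_sizes(1200, 800):
    print(scale, size)

# If faces in your scans are usually about 96 pixels tall (a made-up figure):
print(suggest_start_scale(96))
```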
What If No Faces/Features Are Detected
If no features are detected in any of the image sizes, then whatsup determines which side of the photo is brightest, and returns the number of degrees needed to rotate the brightest side upwards.
The assumption is that if no people are found, then maybe it’s a landscape photo and the sky should go at the top.
There are lots of times where this is incorrect (e.g. a lit ski hill at night), but for my photos it will be true more often than not.
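The brightest-side fallback can be illustrated in plain Python 3. This is a simplified sketch, not the code whatsup runs: a nested list of 0-255 values stands in for the grayscale image, and `mean` and `brightest_side_rotation` are hypothetical names. The mapping from side to degrees is the same one the script uses.

```python
def mean(values):
    """Average of a non-empty sequence of numbers."""
    values = list(values)
    return sum(values) / float(len(values))


def brightest_side_rotation(gray):
    """Return the clockwise rotation (0, 90, 180, 270) that puts the brightest third on top.

    gray is a list of rows of 0-255 brightness values, at least 3x3.
    """
    height, width = len(gray), len(gray[0])
    third_h, third_w = height // 3, width // 3
    sides = {
        'top': [v for row in gray[:third_h] for v in row],
        'bottom': [v for row in gray[height - third_h:] for v in row],
        'left': [v for row in gray for v in row[:third_w]],
        'right': [v for row in gray for v in row[width - third_w:]],
    }
    degrees = {'top': 0, 'left': 90, 'bottom': 180, 'right': 270}
    winner = max(sides, key=lambda name: mean(sides[name]))
    return degrees[winner]


# Bright "sky" along the right edge, so the photo needs a 270 degree clockwise turn.
photo = [
    [10, 10, 200],
    [10, 10, 220],
    [10, 10, 210],
]
print(brightest_side_rotation(photo))
```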
How Accurate Is It / Improving Accuracy
I am getting better than 80% accuracy, but probably not 90%.
The better you know your photos the smarter you can make the script for your use. If you choose Haar Cascade files that are more applicable to your photo set you are less likely to get false positives.
If you know what sizes your faces typically are you can choose appropriate scales or order the image scaling to happen in the most likely order. You could even make your own haarcascade.xml files if you have certain features you want to look for.
Whatsup in Daily Use
I have actually incorporated whatsup into a script that gets run every time I scan something on my scanner, but this is the bash script I used for testing and developing it.
Make sure whatsup is in your path, and that you have jpegtran and jpegexiforient installed. Jpegtran does lossless jpeg rotations, and jpegexiforient sets that missing Exif flag that lets programs know which way to display a photo.
Save this script next to the directory of jpegs you want to test it on (the loop expects them in a subdirectory named saved) and run it.
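#!/bin/bash
for i in saved/*
do
    echo -n "Processing $i : "
    # whatsup prints the clockwise rotation needed (0, 90, 180 or 270)
    degrees=$(whatsup "$i")
    if [ "$degrees" -gt 0 ]
    then
        echo "$degrees"
        # jpegtran can't rotate in place, so work from a temporary copy
        cp "$i" /tmp/tmp.jpg
        jpegtran -rotate "$degrees" /tmp/tmp.jpg > "$i"
        jpegexiforient -1 "$i"
        sleep 1
    else
        echo ""
    fi
done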
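If you would rather drive this from Python than bash, the same steps can be sketched as follows (Python 3). `rotation_commands` and `fix_orientation` are hypothetical names. The sketch assumes whatsup, jpegtran and jpegexiforient are on your path and that whatsup prints a bare number. Unlike the bash script, it writes the rotated image to a temporary file beside the original and only then replaces the original. That is a design choice of this sketch, not something whatsup requires.

```python
import os
import subprocess


def rotation_commands(path, degrees, tmp_path):
    """Build the command lines that rotate path losslessly into tmp_path.

    Returns an empty list when no rotation is needed.
    """
    if degrees not in (90, 180, 270):
        return []
    return [
        # jpegtran writes the rotated copy to tmp_path
        ['jpegtran', '-rotate', str(degrees), '-outfile', tmp_path, path],
        # set the Exif orientation of the rotated copy to 1 (upright)
        ['jpegexiforient', '-1', tmp_path],
    ]


def fix_orientation(path):
    """Ask whatsup for the rotation, apply it, and replace the original file."""
    degrees = int(subprocess.check_output(['whatsup', path]).decode().strip())
    tmp_path = path + '.rotated.tmp'
    commands = rotation_commands(path, degrees, tmp_path)
    if commands:
        rotate, set_flag = commands
        subprocess.check_call(rotate)  # stop here if the rotation itself fails
        subprocess.call(set_flag)      # best effort, like the bash script
        os.replace(tmp_path, path)
    return degrees
```

Keeping the command building separate from the part that runs things makes it easy to print the commands first and check them before letting the script touch any files.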
Disclaimers
While whatsup doesn’t modify your photo, any program you would use it with does, including the bash script above. Please make responsible use of backups and testing as I disclaim any liability for any lost data.
I’m not a pro Python coder, so the script could probably be optimized somehow.