Debugging MongoDB Problems In PHP

Coming from the MySQL world, MongoDB presented some challenges to my workflow. With MySQL I usually would create a class or at least a function to pass my queries to so that if I had SQL problems, I had a single place to go and put a print statement to see the exact query I was trying to run.

It also gave me a single place to put a try/catch block with a debug_print_backtrace to figure out where the offending query was coming from. In my admittedly brief experience with MongoDB so far, Mongo doesn’t lend itself to such practices quite as easily.

The project I’ve been working on at work has 34,000 lines of code. We have a global $mongo object that we run queries against. When a Mongo query chokes up our site I need to know exactly which call was made, with what arguments, and where it came from.

So I wrote this shim, MongoLogger, that pretends to be Mongo. During development I replace the $mongo object with my MongoLogger object, like so:

$mongodb = new Mongo("mongodb://un:pw@host/db"); // Connect to Mongo
$mongo = $mongodb->db;                   // pick your database
$mongo = new MongoLogger($mongo); // Replace the $mongo

MongoLogger then uses the PHP magic methods __call and __get and the almost-magic function call_user_func_array to do the needed logging, and to pass the arguments on to the real $mongo object.

Note: Magic methods, and call_user_func_array in particular, are slow. Not recommended for production use!

This is simpler than what I wrote for work, but the concept is the same and the important bits are here. It’s what I wish I could’ve found when I first started worrying about how to debug MongoDB calls.

<?php

class MongoLogger{
    /**
     * @class MongoLogger
     *
     * @brief A debugging shim between your code and MongoDB. Should be disabled for production
     *
     * Usage:
     * $mongodb = new Mongo("mongodb://un:pw@host/db"); // Connect to Mongo
     * $mongo = $mongodb->db;                           // pick your database
     * $mongo = new MongoLogger($mongo);                // and all calls made to the original $mongo just keep working...
     *
     * YAY MAGIC METHODS! http://php.net/manual/en/language.oop5.magic.php
     * Makes use of magic methods __construct, __call and __get and the almost magic method call_user_func_array.
     */

    var $collections = Array();

    function __construct($mongo){
        global $real_mongo;
        $real_mongo = $mongo;
    }

    public function __call($name,$args){
        global $real_mongo;

        error_log("Mongo->$name called with " . print_r($args,TRUE));

        try {
            $res = call_user_func_array(Array($real_mongo,$name),$args);
            return $res;
        } catch (Exception $e){
            debug_print_backtrace();
            error_log(print_r($e,TRUE));
            throw $e;
        }
    }

    /**
     * @brief Return a MongoLoggerCollection for the requested collection
     * This is the main function that gets called in this class. We just pretend that
     * we've got whatever we're asked for (just like Mongo does)
     *
     */
    public function __get($name){
        if(!array_key_exists($name,$this->collections)){
            $this->collections[$name] = new MongoLoggerCollection($name);
        }
        return $this->collections[$name];
    }
}

class MongoLoggerCollection {
    /**
     * @class MongoLoggerCollection
     *
     * @brief Represents a collection in Mongo. Most of your calls will come through here.
     */
    var $collection;

    function __construct($name){
        $this->collection = $name;
    }

    public function __call($name,$args){
        global $real_mongo;

        // Logging!
        $debug = debug_backtrace();
        $caller = $debug[1];
        error_log($caller['file'] . ':' . $caller['line']);
        error_log("\$mongo->{$this->collection}->$name(");
        foreach($args as $arg){ error_log(print_r($arg,TRUE)); }
        error_log(")");

        try {
            $res = call_user_func_array(Array($real_mongo->{$this->collection},$name),$args);
            return $res;
        } catch (Exception $e){
            debug_print_backtrace();
            error_log(print_r($e,TRUE));
            throw $e;
        }
    }
}
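
The forwarding idea is not specific to PHP. As a rough illustration only, here is how the same kind of shim could be sketched in Python, where __getattr__ plays the role of both __call and __get. LoggingProxy and its label argument are names made up for this sketch, and it assumes that any non-callable attribute is a sub-object (such as a collection) that should be wrapped as well.

```python
import logging
import traceback

log = logging.getLogger("db_logger")


class LoggingProxy(object):
    """Wrap an object, log each method call, then forward it to the real object."""

    def __init__(self, target, label="db"):
        self._target = target
        self._label = label

    def __getattr__(self, name):
        # Python only calls __getattr__ for names the proxy itself does not have.
        attr = getattr(self._target, name)
        if not callable(attr):
            # Assumption: non-callable attributes are sub-objects (e.g. collections).
            return LoggingProxy(attr, "%s.%s" % (self._label, name))

        def logged_call(*args, **kwargs):
            # The second-to-last stack entry is the code that called the proxy.
            caller = traceback.extract_stack(limit=2)[0]
            log.debug("%s:%s %s.%s%r", caller[0], caller[1], self._label, name, args)
            try:
                return attr(*args, **kwargs)
            except Exception:
                log.exception("%s.%s failed", self._label, name)
                raise

        return logged_call
```

As with the PHP version, the extra attribute lookups and logging add overhead, so a wrapper like this belongs in development only.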
Posted in Computers, Programming

PHP Protocol Buffer to MySQL (and back!) bridge

Protocol Buffers are a binary data transfer protocol from Google. Google officially supports C++, Java and Python. There are 3rd party libraries that support other languages. I previously mentioned several that support PHP, including the one that we’re using at work, protoc-gen-php.

One challenge that we faced was storing our data. Should we store our data and convert to Protocol Buffers every time we sent it, or should we just work in Protocol Buffers and store it to the database directly?

We decided to store it in the database in a format compatible with the Protocol Buffer classes so we could easily access it as a Protocol Buffer object again later.

The following classes and scripts were written to help make that bridge between the Protocol Buffer classes generated by protoc-gen-php and MySQL.

They perform two main functions:

  1. Generate MySQL table create statements to build tables to hold the protocol buffer data
  2. Make classes which extend the protoc-gen-php classes with extra functions for database storage and retrieval (and a few bonuses)

Generating The MySQL

We’ll start by generating some tables for our database. You’ll need php-cli installed, and you’ll need protoParser.php and protoMySQL.php in your PHP include path (or current directory) and makeMysql.php in your $PATH (or current directory).

Edit protoMySQL.php’s preferences (lines 17-36) to suit your configuration and needs.

Now something as simple as:

php ./makeMysql.php *.proto

should generate the MySQL table create statements you will need.

Generate The DB Classes

With protoc-gen-php, each .proto object gets a corresponding class, e.g. list.proto.php is created from list.proto.

makeClasses.php creates listDB.php which extends the classes generated by protoc-gen-php. Each proto object gets its own protoDB class and file.

php ./makeClasses.php *.proto

Those classes should then be used instead of their original non-database supporting proto classes.

DB Class Functions

__construct($id_or_object = NULL, $limit = PHP_INT_MAX)

$id_or_object
If it’s an object, we assume it’s a non-database variant, and use its members to populate this object.
If it’s numeric, we fetch the object from the database
Otherwise, we pass it up to the parent object.

$limit
Used in the parent constructor

get($id = NULL, $args = NULL)
$id
The database ID of the object to fetch.

$args
Enough MySQL arguments to uniquely identify the proto to retrieve.
If an array, each field is added to the query.
If a string, the string is appended to the query as is, after the WHERE clause.

Returns object if found, NULL if not found (or if multiples found)

unique()
Calls array_unique on any repeating elements.
Note: Arrays of objects are compared using their __toString methods

Returns nothing

load($object)

If you have an equivalent object (eg. a non-database or database version of the object) you can load it into the current object with this method.

Returns nothing

nullOrVal($val)
Determine if we should append NULL or an escaped string. Used in MySQL queries to ensure safe values.

Returns “NULL” or a mysql_real_escape’d string

delete()
Shallow delete from database. Since sub-objects could be shared/referenced by other proto objects this only deletes this object’s entry in the database

Returns nothing

put()
INSERT or UPDATE this object in the database.

Returns the insert ID of the object

toJSON($asArray = FALSE)
Returns a JSON representation of the current object, or an array appropriate for use in json_encode.

fromJSON($json)
Load the variables in the object from a JSON string

purge()
Like delete, but also deletes referenced proto objects from the database.
Returns number of sub-objects deleted.
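
To make the lookup rule of get() concrete (an array of fields narrows the query, and the result only counts if exactly one row matches), here is a small sketch. It is not the generated PHP code: it uses Python with an in-memory SQLite database instead of MySQL, handles only the array form of $args, and fetch_unique plus the list table and its columns are invented for the example.

```python
import sqlite3


def fetch_unique(conn, table, conditions):
    """Return the single row matching all conditions, or None if zero or several match."""
    if not conditions:
        return None
    fields = sorted(conditions)
    # Table and field names are interpolated directly, so they must come from trusted code.
    where = " AND ".join('"%s" = ?' % field for field in fields)
    sql = 'SELECT * FROM "%s" WHERE %s LIMIT 2' % (table, where)
    rows = conn.execute(sql, [conditions[field] for field in fields]).fetchall()
    return rows[0] if len(rows) == 1 else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE list (id INTEGER PRIMARY KEY, owner TEXT, title TEXT)")
conn.executemany("INSERT INTO list (owner, title) VALUES (?, ?)",
                 [("ann", "groceries"), ("ann", "chores"), ("bob", "groceries")])

print(fetch_unique(conn, "list", {"owner": "bob"}))                     # one match
print(fetch_unique(conn, "list", {"owner": "ann"}))                     # two matches
print(fetch_unique(conn, "list", {"owner": "ann", "title": "chores"}))  # narrowed to one
```

The LIMIT 2 is enough to tell "exactly one" apart from "more than one" without fetching every matching row.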

License, Warranty and Support

My employers have been kind enough to let me release these scripts under an Open Source license. They are released under the GPL v.2 without any warranty.

We are actually switching away from MySQL on this particular project, and so these scripts are unlikely to receive any further updates.

I will provide such support as I have time for through the comments on this blog post.

Happy programming!

Posted in Computers, Digitization, Projects, Something Interesting

Automatically Orient Scanned Photos With OpenCV

Most pictures taken these days are digital, and include information inside the picture’s exif tags about the correct orientation. For these types of pictures you can simply use exifautotran and you’re done. Easy Peasy.

What about pictures that don’t have exif info? Maybe you’ve got an old/cheap camera, scanned images, or another source that doesn’t include the orientation information. I’m in the middle of scanning about 5000 pictures, and I didn’t want to manually rotate all the images… so I started looking for solutions.

OpenCV: Open Source Computer Vision

I figured if I could detect a face in a photo, then it was right side up. I started searching for solutions and the Internet seemed to agree that the best free solution for facial detection was OpenCV, but I couldn’t find a script that did just what I wanted.

OpenCV “is a library of programming functions for real time computer vision.” It can be used in robotics, for doing cool stuff with web cams, motion detection, and more. I was mainly interested in it because of its face detection features.

Then I found this script by Jo Vermeulen from way back in 2008. It’s made for use with webcams, and the particular script is just a toy. He does have another script which interacts with DBus to provide present/away status changes for your IM client. With just a little bit of work I was able to repurpose his script to do what I needed.

Whatsup: An OpenCV Python Script To Detect Correct Photo Orientation

Prerequisites:

  • Install libcv2.1
  • Install python-opencv
  • Download one or more Haar Cascades and save them to /usr/local/share/ (more on this below)

The script I made is called whatsup.

Usage: whatsup [--debug] path_to_file

whatsup returns the number of degrees it thinks your photo needs to be rotated clockwise to be right side up. You can then use that number of degrees in whatever program you want, be it imagemagick, jpegtran, gd, or whatever. whatsup assumes that your image is squared up (not square) and only returns multiples of 90 (0, 90, 180, 270).

Using the --debug option will show the image with a box around the feature that gave whatsup its final result.

#!/usr/bin/env python

# This script reads in a file and tries to determine which orientation is correct
# by looking for faces in the photos
# It starts with the existing orientation, then rotates it 90 degrees at a time until
# it has either tried all 4 directions or until it finds a face

# INSTALL: Put the xml files in /usr/local/share, or change the script. Put whatsup somewhere in your path

# Usage: whatsup [--debug] filename
# Returns the number of degrees it should be rotated clockwise to orient the faces correctly

# Some code came from here: http://blog.jozilla.net/2008/06/27/fun-with-python-opencv-and-face-detection/
# The rest was cobbled together by me from the documentation here [1] and from snippets and samples found via Google
# [1] http://opencv.willowgarage.com/documentation/python/core_operations_on_arrays.html#createmat

import sys
import os
import cv

def detectFaces(small_img,loadedCascade):
    tries = 0 # 4 shots at getting faces. 

    while tries < 4:
	faces = cv.HaarDetectObjects(small_img, loadedCascade, cv.CreateMemStorage(0), scale_factor =1.2, min_neighbors =2, flags =cv.CV_HAAR_DO_CANNY_PRUNING)
	if(len(faces) > 0):
	    if(sys.argv[1] == '--debug'):
		for i in faces:
		    cv.Rectangle(small_img, (i[0][0],i[0][1]),(i[0][0] + i[0][2],i[0][1] + i[0][3]), cv.RGB(255,255,255), 3, 8, 0)
		cv.NamedWindow("Faces")
		cv.ShowImage("Faces",small_img)
		cv.WaitKey(1000)
	    return tries * 90

	# The rotation routine:
	tmp_mat = cv.GetMat(small_img)
	tmp_dst_mat = cv.CreateMat(tmp_mat.cols,tmp_mat.rows,cv.CV_8UC1) # Create a Mat that is rotated 90 degrees in size (3x4 becomes 4x3)
	dst_mat = cv.CreateMat(tmp_mat.cols,tmp_mat.rows,cv.CV_8UC1) # Create a Mat that is rotated 90 degrees in size (3x4 becomes 4x3)

	# To rotate 90 clockwise, we transpose, then flip on Y axis
	cv.Transpose(small_img,tmp_dst_mat) # Transpose it
	cv.Flip(tmp_dst_mat,dst_mat,flipMode=1) # flip it

	# put it back in small_img so we can try to detect faces again
	small_img = cv.GetImage(dst_mat)
	tries = tries + 1
    return False 

# Detect which side of the photo is brightest. Hopefully it will be the sky.
def detectBrightest(image):
    image_scale = 4 # This scale factor doesn't matter much. It just gives us less pixels to iterate over later
    newsize = (cv.Round(image.width/image_scale), cv.Round(image.height/image_scale)) # find new size
    small_img = cv.CreateImage(newsize, 8, 1)
    cv.Resize( image, small_img, cv.CV_INTER_LINEAR )

    # Take the top 1/3, right 1/3, etc. to compare for brightness
    width = small_img.width
    height = small_img.height
    top = small_img[0:height/3,0:width]
    right = small_img[0:height,(width/3*2):width]
    left = small_img[0:height,0:width/3]
    bottom = small_img[(height/3*2):height,0:width]

    sides = {'top':top,'left':left,'bottom':bottom,'right':right}

    # Find the brightest side
    greatest = 0
    winning = 'top'
    for name in sides:
        sidelum = 0
        side = sides[name]
        for x in range(side.rows): # visit every row and column, including the last
            for y in range(side.cols):
                sidelum = sidelum + side[x,y]
        sidelum = sidelum/(side.rows*side.cols)
        if sidelum > greatest:
            greatest = sidelum # remember the best average so far
            winning = name

    if(sys.argv[1] == '--debug'):
	if winning == 'top':
	    first = (0,0)
	    second = (width,height/3)
	elif winning == 'left':
	    first = (0,0)
	    second = (width/3,height)
	elif winning == 'bottom':
	    first = (0,(height/3*2))
	    second = (width,height)
	elif winning == 'right':
	    first = ((width/3*2),0)
	    second = (width,height)
	cv.Rectangle(small_img,first,second,cv.RGB(125,125,125),3,8,0)
	cv.NamedWindow("Faces")
	cv.ShowImage("Faces",small_img)
	cv.WaitKey(3000)

    returns = {'top':0,'left':90,'bottom':180,'right':270}

    # return the winner
    if sys.argv[1] == '--debug':
	print "The " + winning + " side was the brightest!"
    return returns[winning]

# Try a couple different detection methods
def trydetect():
    # Load some things that we'll use during each loop so we don't keep re-creating them
    grayscale = cv.LoadImageM(os.path.abspath(sys.argv[-1]),cv.CV_LOAD_IMAGE_GRAYSCALE) # the image itself

    # Get more at: https://code.ros.org/svn/opencv/tags/latest_tested_snapshot/opencv/data/haarcascades/
    cascades = ( # Listed in order most likely to appear in a photo
	    '/usr/local/share/haarcascade_frontalface_alt.xml',
	    '/usr/local/share/haarcascade_profileface.xml',
	    '/usr/local/share/haarcascade_fullbody.xml',
	    )

    for cascade in cascades:
	loadedCascade = cv.Load(cascade)
	image_scale = 4
	while image_scale > 0: # Try 4 different sizes of our photo
	    newsize = (cv.Round(grayscale.width/image_scale), cv.Round(grayscale.height/image_scale)) # find new size
	    small_img = cv.CreateImage(newsize, 8, 1 )
	    cv.Resize( grayscale, small_img, cv.CV_INTER_LINEAR )
	    returnme = detectFaces(small_img,loadedCascade)
	    if returnme is not False:
		return returnme

	    image_scale = image_scale - 1
    return detectBrightest(grayscale) # no faces found, use the brightest side for orientation instead

# Usage Check
if ((len(sys.argv) != 2 and len(sys.argv) != 3) or (len(sys.argv) == 3 and sys.argv[1] != '--debug')):
    print "USAGE: whatsup [--debug] filename"
    sys.exit(-1)

# Sanity check
if not os.path.isfile(sys.argv[-1]):
    print "File '" + sys.argv[-1] + "' does not exist"
    sys.exit(-1)

# Make it happen
print str(trydetect()),
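
The rotation routine in the script relies on the fact that a transpose followed by a flip around the vertical axis is a 90 degree clockwise turn. The following sketch shows the same trick, and the try-up-to-four-orientations loop around it, on plain nested lists so it can be checked without OpenCV. The function names and the looks_upright callback are made up for this illustration; in whatsup the face detector plays that role.

```python
def rotate_90_clockwise(grid):
    """Rotate a list-of-rows grid 90 degrees clockwise: transpose, then mirror each row."""
    transposed = [list(column) for column in zip(*grid)]  # rows become columns
    return [row[::-1] for row in transposed]              # flip around the vertical axis


def rotations_needed(grid, looks_upright):
    """Return 0, 90, 180 or 270, or None if no orientation satisfies looks_upright."""
    for tries in range(4):
        if looks_upright(grid):
            return tries * 90
        grid = rotate_90_clockwise(grid)
    return None


print(rotate_90_clockwise([[1, 2, 3],
                           [4, 5, 6]]))  # [[4, 1], [5, 2], [6, 3]]
```

Returning None rather than False for "nothing found" avoids any confusion with a legitimate result of 0 degrees.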

About OpenCV Feature Detection

In order to detect features, like faces, OpenCV needs to be trained. You then use the training file, called a Haar Cascade, to define the detection. OpenCV provides lots of ready-made training files; the URL in the script’s comments above points to them.

To use whatsup you’ll need to download one or more haarcascade*.xml files and put them in /usr/local/share (or edit whatsup to point at the place you decide to save them).

What you need to be aware of is that detection works best at the resolution the training file was created for. So if you’re using haarcascade_frontalface_default.xml then you want the faces you hand to the detector to be about 24×24 pixels.

In order for this to happen, whatsup tries scaling the images to different sizes and tries detecting faces in those different sizes. I am using scans that are roughly 1200×800 and so I start with a scaling factor of 4 so that the first image tried is 1/4th the size of the original. If your images are larger then you probably need to start with a larger scaling factor.
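
As a quick illustration of the scaling described above, this sketch lists the image sizes that get tried for a given starting factor, using the same integer division as the script. The starting_scale helper is only a guess at how one might pick a larger factor for larger scans, taking the 1200 pixel wide, factor 4 case as the reference; it is not part of whatsup.

```python
def sizes_to_try(width, height, start_scale=4):
    """List the (width, height) pairs tried, from the smallest image up to full size."""
    return [(width // scale, height // scale) for scale in range(start_scale, 0, -1)]


def starting_scale(width, reference_width=1200, reference_scale=4):
    """Hypothetical heuristic: keep the first attempt about as wide as in the 1200px case."""
    return max(1, int(round(reference_scale * width / float(reference_width))))


print(sizes_to_try(1200, 800))  # [(300, 200), (400, 266), (600, 400), (1200, 800)]
print(starting_scale(2400))     # 8
```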

What If No Faces/Features Are Detected

If no features are detected in any of the image sizes, then whatsup determines which side of the photo is brightest, and returns the number of degrees needed to rotate the brightest side upwards.

The assumption is that if no people are found, then maybe it’s a landscape photo and the sky should go at the top.

There are lots of times where this is incorrect (e.g. a lit ski hill at night), but for my photos it will be true more times than not.
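
The brightest-side fallback can be sketched without OpenCV as well. The version below works on a grayscale image given as a plain list of rows (at least 3×3), averages each outer third, and returns the clockwise rotation that would bring the brightest third to the top. The function names are invented for this sketch, and it measures the bottom and right thirds slightly differently from the script.

```python
def mean_brightness(pixels, rows, cols):
    """Average the values of pixels[r][c] over the given row and column ranges."""
    total = sum(pixels[r][c] for r in rows for c in cols)
    return total / float(len(rows) * len(cols))


def brightest_side_rotation(pixels):
    """Return the clockwise rotation (0, 90, 180, 270) that puts the brightest third on top."""
    height, width = len(pixels), len(pixels[0])
    regions = {
        'top': (range(0, height // 3), range(0, width)),
        'left': (range(0, height), range(0, width // 3)),
        'bottom': (range(height - height // 3, height), range(0, width)),
        'right': (range(0, height), range(width - width // 3, width)),
    }
    rotation = {'top': 0, 'left': 90, 'bottom': 180, 'right': 270}
    winner = max(regions, key=lambda name: mean_brightness(pixels, *regions[name]))
    return rotation[winner]


sky_on_the_left = [[9, 0, 0],
                   [9, 0, 0],
                   [9, 0, 0]]
print(brightest_side_rotation(sky_on_the_left))  # 90
```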

How Accurate Is It / Improving Accuracy

I am getting better than 80% accuracy, but probably not 90%.

The better you know your photos the smarter you can make the script for your use. If you choose Haar Cascade files that are more applicable to your photo set you are less likely to get false positives.

If you know what sizes your faces typically are you can choose appropriate scales or order the image scaling to happen in the most likely order. You could even make your own haarcascade.xml files if you have certain features you want to look for.

Whatsup in Daily Use

I have actually incorporated whatsup into a script that gets run every time I scan something on my scanner, but this is the bash script I used for testing and developing it.

Make sure whatsup is in your path, and that you have jpegtran and jpegexiforient installed. Jpegtran does lossless jpeg rotations; jpegexiforient sets that missing Exif flag that lets programs know which way to display a photo.

Save this script in the directory that holds a saved/ subdirectory of the jpegs you want to test this on (the loop reads saved/*), then run it.

#!/bin/bash
# Quote every expansion so file names with spaces survive
for i in saved/*
do
    echo -n "Processing $i : "
    degrees=$(whatsup "$i")
    if [ "$degrees" -gt 0 ]
    then
        echo "$degrees"
        cp "$i" /tmp/tmp.jpg
        jpegtran -rotate "$degrees" /tmp/tmp.jpg > "$i"
        jpegexiforient -1 "$i"
        sleep 1
    else
        echo ""
    fi
done
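
If you would rather drive jpegtran from Python than from bash, the rotation step of the loop above could look roughly like this. It is a sketch, not part of whatsup: rotation_command and rotate_in_place are hypothetical helpers, the -copy all switch (keep the existing metadata) is an addition the bash script does not use, and the jpegexiforient step is left out.

```python
import os
import subprocess
import tempfile


def rotation_command(degrees, source, destination):
    """Build the jpegtran argument list for a lossless clockwise rotation."""
    if degrees not in (90, 180, 270):
        raise ValueError("jpegtran only rotates by 90, 180 or 270 degrees")
    return ["jpegtran", "-copy", "all", "-rotate", str(degrees),
            "-outfile", destination, source]


def rotate_in_place(path, degrees):
    """Rotate path via a temporary file so a failed run leaves the original untouched."""
    if degrees == 0:
        return
    # Create the temporary file next to the photo so the final rename stays on one filesystem.
    handle, temp_path = tempfile.mkstemp(suffix=".jpg", dir=os.path.dirname(path) or ".")
    os.close(handle)
    try:
        subprocess.check_call(rotation_command(degrees, path, temp_path))
        os.rename(temp_path, path)  # on POSIX this replaces the original file
    except Exception:
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
```

Writing to a temporary file first serves the same purpose as the cp to /tmp in the bash version: jpegtran never reads from and writes to the same file at once.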

Disclaimers

While whatsup doesn’t modify your photo, any program you would use it with does, including the bash script above. Please make responsible use of backups and testing as I disclaim any liability for any lost data.

I’m not a pro python coder, so the script could probably be optimized somehow.

Posted in Computers, Programming