I’m Looking for a Microfilm Digitization Quote

I’m looking for a microfilm digitization quote. If you or someone you know provides microfilm digitization, please have them send me a quote.

I’ve got 113 reels of microfilm I’d like to digitize and I’m looking for a ballpark estimate for the project.

Here’s the info I know; please let me know if you need anything else:

  • There is an average of 600 images per reel (about 67,800 images in total)
  • I’d like to scan at 300 dpi, as 8-bit grayscale lossless images (TIFF? PNG?)
  • I have the copyright on the images on these reels
  • The reels are lightly used duplicates of the master reels. The master reels are unfortunately unavailable
  • The images are all scanned newspapers
  • I don’t need any OCR done
  • The only metadata I need is which reel each image came from, e.g., one directory per reel with incremental file names would be just fine.

Project Background: I would like to put 128 years of Iron County Miner newspaper archives online. They would be freely available (no subscription or account required) and there’s no plan to make money from them. Since there’s no revenue expected I’m looking for ways to reduce costs while still putting something out there to benefit genealogists and historians.

The master reels are held by the Wisconsin Historical Society, which wants nearly $10,000 ($0.145/image) for the project, or $80 per reel to send us fresh copies of the reels. From their perspective, I think that’s probably fair; they aren’t in the digitization business and they probably aren’t set up to do this sort of project in a streamlined manner. They also can’t amortize their digitization equipment costs across as many clients as a commercial company can.
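For anyone checking the numbers, here is a quick sketch of the arithmetic behind those figures. It only uses the values quoted in this post; the image count is an estimate because 600 images per reel is an average.

```javascript
// Figures quoted in the post.
var reels = 113;
var imagesPerReel = 600;   // average, so the totals are estimates
var perImageQuote = 0.145; // dollars per image for digitization
var perReelCopy = 80;      // dollars per reel for fresh duplicate reels

var totalImages = reels * imagesPerReel;
var digitizationCost = Math.round(totalImages * perImageQuote);
var freshCopiesCost = reels * perReelCopy;

console.log(totalImages + ' images');                   // 67800 images
console.log('$' + digitizationCost + ' to digitize');   // $9831 to digitize
console.log('$' + freshCopiesCost + ' for fresh reels'); // $9040 for fresh reels
```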

For me though, $10,000 means that I can’t pursue this project right now.

Most digitization companies I have contacted have been reluctant to provide even a ballpark quote without seeing test reels, and I understand that that is a factor. Right now though, I just need a gauge to determine if this project is viable. If $10,000 is the real cost for this sort of project it will have to wait till I’m rich, but if I can get a cheaper quote I hope to make it happen this summer.

Pre-Announcing NewspaperCMS

I have been working on a CMS (Content Management System) called NewspaperCMS to host the scanned images and make them easily navigable. It is licensed under the GPLv2, so anybody needing to host newspaper archives can use it.

Here’s its page on Google Code: http://code.google.com/p/newspapercms/

I would classify it as in late Alpha or early Beta stages right now. I’ll do an official post on it as it matures and as I get a publicly accessible test site set up. As a teaser, features include:

  • Browse collection by microfilm, newspaper or date
    • Drill down within those categories by newspaper, issue, year or month
  • Access-driven generation of midsized images. No need to generate 60,000 midsized images ahead of time.
  • Valid HTML5/CSS3
  • HTML5/Canvas based client-side image viewer. The user can zoom, rotate, invert, sharpen and change the contrast of the image (uses the JavaScript libraries from http://www.pixastic.com/)
    • Falls back to a static image if they don’t have Canvas or JavaScript support
  • Built-in search engine
  • Support for the Tesseract OCR engine
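To give a feel for what a client-side viewer does when it inverts an image, here is a minimal sketch that works directly on canvas pixel data. This is not Pixastic's API or the NewspaperCMS code; invertPixels and invertCanvas are hypothetical helper names used only for illustration.

```javascript
// Invert RGBA pixel data in place. `data` is a flat array of
// [r, g, b, a, r, g, b, a, ...] values in the range 0-255, which is
// the layout used by canvas ImageData.data.
function invertPixels(data) {
    for (var i = 0; i < data.length; i += 4) {
        data[i]     = 255 - data[i];     // red
        data[i + 1] = 255 - data[i + 1]; // green
        data[i + 2] = 255 - data[i + 2]; // blue
        // data[i + 3] is alpha and is left untouched
    }
    return data;
}

// Browser-only usage: read the pixels, invert them, write them back.
function invertCanvas(canvas) {
    var ctx = canvas.getContext('2d');
    var img = ctx.getImageData(0, 0, canvas.width, canvas.height);
    invertPixels(img.data);
    ctx.putImageData(img, 0, 0);
}
```

Because the work happens on pixels already in the browser, no extra request to the server is needed for this kind of adjustment.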

As I said, it’s still in development, but if you need something like it, you can play with it now. It’s at the point where more development doesn’t make sense until I know I can get the microfilms scanned.

Posted in Computers, Digitization, Programming, Projects, Something Interesting

Create a Side-to-Side Draggable HTML5 Canvas in a Div

I have been playing with Pixastic and a little bit of HTML5 canvas image manipulation for a site I’m working on. I load an image into an HTML5 canvas and let the user do some basic manipulation, including zooming in on the image.

Zooming in on the image quickly causes the canvas to outgrow my browser window. The div around the canvas is set to use overflow: auto so that the growing canvas doesn’t disrupt the rest of the page flow.

<div id="pageimgdiv" style="max-width: 100%; overflow: auto;">
   <canvas>
      Your browser doesn't support HTML5 Canvas
   </canvas>
</div>

The overflowed div gains horizontal scrollbars (but not vertical ones, since there’s no max-height in my case). Unfortunately, many people, including myself, don’t have horizontal scrolling configured for their mouse, which means scrolling down to the scrollbar, moving over, then scrolling back up.

jQuery to the Rescue

I was able to use jQuery and the scrollLeft() function to make the canvas draggable within the div. The canvas itself doesn’t change size or pan (which would require a second canvas used as a buffer, I think). Instead we record the mouse position and the current scrollLeft setting when the mouse button is pressed, and then scroll further as the user moves the mouse, until they release the button or leave the wrapper div.

I’m using jQuery 1.7.2.

$(document).ready(function(){
    var $div = $('#pageimgdiv');

    // Stop tracking the mouse when the drag ends.
    function stopDrag(){
        $div.off('mousemove');
    }

    $div.on({
        mousedown: function(clicke){
            // Drag origin in content coordinates (page position + scroll offset).
            var origX = clicke.pageX + $div.scrollLeft();
            stopDrag(); // never stack more than one mousemove handler
            $div.on('mousemove', function(e){
                // New scroll offset = drag origin minus current mouse position.
                var newpos = origX - e.pageX;
                // Look the canvas up on each move in case it has been replaced.
                var maxpos = $div.find('canvas').width() - $div.width();
                if(newpos > maxpos){
                    newpos = maxpos;
                }
                if(newpos < 0){
                    newpos = 0;
                }
                $div.scrollLeft(newpos);
            });
        },
        mouseleave: stopDrag,
        mouseup: stopDrag,
        click: stopDrag
    });
});

Embrace and Extend

This code only scrolls horizontally. You could easily extend it to use scrollTop() and enable vertical scrolling as well.
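Here is a rough sketch of what that extension might look like. It assumes the same #pageimgdiv wrapper as above, and that the wrapper has a height or max-height set so there is something to scroll vertically (mine doesn’t). The clampScroll helper is a name I made up for this example; treat it as a starting point rather than tested production code.

```javascript
// Clamp a scroll offset to the range the content allows.
function clampScroll(pos, contentSize, viewportSize) {
    var max = Math.max(0, contentSize - viewportSize);
    return Math.min(Math.max(pos, 0), max);
}

// Only wire up the handlers when jQuery (and so a DOM) is available.
if (typeof jQuery !== 'undefined') {
    jQuery(function ($) {
        var $div = $('#pageimgdiv');

        function stopDrag() {
            $div.off('mousemove');
        }

        $div.on({
            mousedown: function (down) {
                // Drag origin in content coordinates, for both axes.
                var origX = down.pageX + $div.scrollLeft();
                var origY = down.pageY + $div.scrollTop();
                stopDrag();
                $div.on('mousemove', function (e) {
                    var $canvas = $div.find('canvas');
                    $div.scrollLeft(clampScroll(origX - e.pageX,
                        $canvas.width(), $div.width()));
                    $div.scrollTop(clampScroll(origY - e.pageY,
                        $canvas.height(), $div.height()));
                });
            },
            mouseleave: stopDrag,
            mouseup: stopDrag
        });
    });
}
```

Pulling the clamping into one function keeps the horizontal and vertical cases identical, so the only per-axis differences are which mouse coordinate and which scroll function are used.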

Posted in Computers, Programming

Setting up Xdebug with NetBeans on Windows, with a Remote Apache Server

I fought with Xdebug and NetBeans enough to necessitate a post about it, if only so I don’t forget.

Most Xdebug/NetBeans tutorials assume that you’re doing development on your local machine. That’s a fine setup, but not what I needed for this project.

Environment

Server: A typical Linux server — Debian, Apache2 and PHP.

Debugger: Xdebug

Client: Firefox on Windows, etc.

IDE: NetBeans 7.1.1

Other Tools: PuTTY

Setting Up Xdebug

Install xdebug with the command:

pecl install xdebug

The final lines of output should say something like:

Build process completed successfully
Installing '/usr/lib/php5/20090626/xdebug.so'
install ok: channel://pecl.php.net/xdebug-2.2.

Take note of the install path, /usr/lib/php5/20090626/xdebug.so, in this case. Now add the following to your PHP configuration. I created a new .ini file at

/etc/php5/conf.d/xdebug.ini

Its contents should be as follows (change the zend_extension to match the install path found above):

[xdebug]
zend_extension = "/usr/lib/php5/20090626/xdebug.so"

xdebug.remote_enable = on

; Most users won't want autostart. More on this later.
; xdebug.remote_autostart = on
xdebug.remote_autostart = off
xdebug.remote_handler = dbgp
xdebug.remote_port = 9000
xdebug.remote_host = localhost
xdebug.remote_mode = req

; Most users won't want a hard coded idekey. More on this later.
; xdebug.idekey = netbeans-xdebug
output_buffering = off

xdebug.remote_log = "/var/log/xdebug.log"

Restart Apache to complete the installation.

xdebug.remote_host = localhost. Localhost? Wait a second, I thought this was for working with a remote server? Yes, but xdebug needs to be able to connect to your computer. The easy option is to have xdebug connect to the server’s localhost, then use PuTTY to create an SSH tunnel so that NetBeans can listen on your computer’s localhost.

The harder option is to configure your home or office router to forward port 9000 to you, be sure to never change your IP address, and open port 9000 on your Windows firewall. You could use xdebug.remote_connect_back so that xdebug would connect to whichever IP made the web request, but then someone who isn’t you could access your code.

In my opinion, SSH tunneling is the cleanest option. You can use it anywhere that you have SSH access, xdebug access is restricted to those who have SSH access, and you don’t have to worry about your IP address changing.

Creating an SSH Tunnel for Xdebug

Using PuTTY.exe, create and save a new SSH session which connects to your server. In this saved session, configure a tunnel. Set the source port to 9000, the destination to localhost:9000, and choose the Remote and Auto radio buttons. Click the Add button to add that port forwarding configuration, then save the session so you can use it every time you want to use xdebug.

PuTTY xdebug SSH tunnel

Go ahead and connect to that saved PuTTY session now. Once you’re connected to your server you should be able to run netstat to verify that it’s working correctly. It should show something like this:

netstat -a -n | grep 9000
tcp        0      0 127.0.0.1:9000          0.0.0.0:*               LISTEN
tcp6       0      0 ::1:9000                :::*                    LISTEN

Setting up NetBeans

Set up your NetBeans project like you usually would. Verify that the PHP Debugging Settings (in the Tools->Options menu) are set to debugger port 9000 and Session ID netbeans-xdebug. I had problems with the Watches and Balloon Evaluation options. YMMV.

NetBeans Debugging Settings

In your project properties (Right click your project, click Properties) edit the Run Configuration’s Advanced properties. Select “Do Not Open Web Browser”.

A Note on Path Mappings

Based on the tutorials I read, most users don’t seem to need Path Mappings. NetBeans seems to figure out the path based on the upload directory, the URL and the Web Root settings. That didn’t work for me. I needed Path Mapping because I had a symlink on the server.

PHP (and so Xdebug) dereferences symlinks and NetBeans needs the mapping between the dereferenced path on the server and the local sources directory.

This is a known issue, but not a very well known one. Symptoms that you need to use mapping include NetBeans not breaking in symlinked files, and NetBeans not opening the file, or opening the wrong file, when the Xdebug connection is made. Or it may work only if you use “Debug File” instead of “Debug Project”.
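To make the idea concrete, here is a small sketch of what a path mapping does conceptually: it swaps a server-side prefix for a local one. This is an illustration only, not how NetBeans implements it, and the paths and the mapServerPath function are hypothetical.

```javascript
// Translate a path reported by the server into a local project path using
// a list of { server, local } prefix pairs. The first matching prefix wins.
function mapServerPath(serverPath, mappings) {
    for (var i = 0; i < mappings.length; i++) {
        var m = mappings[i];
        if (serverPath.indexOf(m.server) === 0) {
            return m.local + serverPath.slice(m.server.length);
        }
    }
    return null; // no mapping: the IDE cannot find a matching local file
}

// Hypothetical layout: /var/www/site is a symlink to /srv/projects/site,
// so PHP reports the resolved /srv/projects/site/... path to the debugger.
var mappings = [
    { server: '/srv/projects/site/', local: 'C:/projects/site/' }
];

console.log(mapServerPath('/srv/projects/site/lib/page.php', mappings));
console.log(mapServerPath('/var/www/site/lib/page.php', mappings));
```

The first call finds the local file; the second returns null because only the dereferenced path is mapped. That is why the mapping has to use the real path on the server rather than the symlinked one.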

Debugging With Xdebug and NetBeans

Here’s where you have some options. The manual way to activate Xdebug is to append XDEBUG_SESSION_START=netbeans-xdebug to your query string when you request a page in the browser. With the current setup, this should work. Go ahead and test it.

Choose Debug -> Debug Project from the NetBeans menu. Then, in your browser, go to http://example.com/path/to/page.php?XDEBUG_SESSION_START=netbeans-xdebug (replacing the URL with your own, of course).
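Appending the parameter by hand gets fiddly when the URL already has a query string. As a small convenience, here is a sketch of a helper that adds it correctly either way. It assumes the standard URL API, which is available in current browsers and Node.js; withXdebugSession is just a name I picked for the example.

```javascript
// Return `url` with XDEBUG_SESSION_START set to the given IDE key.
// Existing query parameters are preserved.
function withXdebugSession(url, ideKey) {
    var u = new URL(url);
    u.searchParams.set('XDEBUG_SESSION_START', ideKey || 'netbeans-xdebug');
    return u.toString();
}

console.log(withXdebugSession('http://example.com/path/to/page.php'));
console.log(withXdebugSession('http://example.com/page.php?id=5'));
```

The first call adds the parameter after a "?", and the second adds it after an "&" because the URL already has a query string.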

If that doesn’t work then something isn’t set up correctly. It might be NetBeans, it might be Xdebug, it might be the SSH tunnel. Figure it out, get it fixed, then keep reading (you can leave comments here, and I’ll help as I can. Google is pretty helpful too).

If that does work, then GREAT!

Now you have some options.

Browser Extensions

Firefox and Chrome both have Xdebug extensions that trigger a debugging session for you, typically by sending an XDEBUG_SESSION cookie in the HTTP headers, so you don’t have to type the XDEBUG_SESSION_START parameter yourself. Yay Firefox. Yay Chrome.
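If you’d rather not install an extension, the same effect can be had by hand, because Xdebug also starts a session when a request carries an XDEBUG_SESSION cookie. The sketch below builds that cookie; that the extensions work the same way is my assumption, and xdebugCookie is a made-up helper name. Paste it into the browser console on your site, or turn it into a bookmarklet.

```javascript
// Build the cookie string that asks Xdebug to start (or stop) a session.
function xdebugCookie(ideKey, enable) {
    if (enable) {
        return 'XDEBUG_SESSION=' + encodeURIComponent(ideKey) + '; path=/';
    }
    // An expiry date in the past removes the cookie.
    return 'XDEBUG_SESSION=; expires=Thu, 01 Jan 1970 00:00:00 GMT; path=/';
}

// Only touch document.cookie when running in a browser.
if (typeof document !== 'undefined') {
    document.cookie = xdebugCookie('netbeans-xdebug', true);
}
```

Call xdebugCookie('netbeans-xdebug', false) and assign the result to document.cookie to turn debugging back off.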

Always Auto-start Xdebug

If you are testing embedded webkit, or from mobile devices, or something else that’s not a normal browser, then appending XDEBUG_SESSION_START is going to be difficult. This is where I send you back to your xdebug.ini file to change some settings. If you need to debug requests from these sorts of devices, then you’re going to want to edit xdebug.ini and set:

xdebug.remote_autostart = on
xdebug.idekey = netbeans-xdebug

This will cause Xdebug to attempt to connect on port 9000 on every request. The idekey setting is so that NetBeans will know that the connection is for it.

Other PHP Debugging Tools

If you are looking for good PHP debugging tools, you may also want to try out KCacheGrind, which visualizes the profiling data Xdebug can generate, giving you an idea of what’s taking up time and memory. HipHop-PHP, a tool from Facebook of all places, compiles PHP into C++. In the process it spits out all sorts of helpful errors and notices that will help you find errors in your code.

That concludes one more NetBeans Xdebug tutorial that will hopefully get you that much closer to doing some serious PHP debugging. Happy coding!

Posted in Computers, Programming, Something Interesting