Sunday, June 6, 2010

1.a) Scan Photos

In the last post, I described six major steps in the plan for building a Media Archive.  This post details the first step: Scanning Photos.  This will be a bit technical and most likely a dry read for most readers.  The purpose of highly detailed, technical posts like this is to document the process I followed for each step and how each step fits into the project as a whole.

Scanner Settings
When I began this project, the first thing I did was experiment with a scanner I already had lying around.  I chose a picture and made some scans at various values of Dots per Inch (DPI), up to the scanner's maximum of 300 DPI.  While this setting might be fine for hosting the resulting images on the web or possibly re-printing them, I wanted to be sure that even cropped, zoomed-in, or blown-up versions of my scans would be usable without artifacts.  So I bought a Canon CanoScan 8800f scanner with a maximum optical resolution of 4800 x 9600 DPI.  Scanning equipment is usually distributed with software drivers that allow your image processing programs to use all the features of your particular model of scanner.  The next sections describe the scan settings I chose for this project.

Dots Per Inch (DPI)
The new scanner provided quite an increase in resolution.  In fact, 4800 DPI scans produced images much larger than the amount of memory in my little old laptop (which the computer didn't like much, because it caused thrashing).  The extreme number of pixels resulting from larger prints forced me to run some experiments to determine the point at which increases in DPI resulted in no noticeable increase in captured detail (even at extreme zooms).  This was a bit of an eyeball estimate based on a few sample pictures, but for this effort I decided on 2400 DPI.

Yes, I know.  This setting results in images that are beyond overkill for viewing on a computer screen.  They're also well beyond the typically accepted resolution for re-printing (300 DPI).  However, imagine a photo that was not well composed.  Let's say most of the scene was empty background and you wanted a full-size print of just the 1/8 of the image containing the subject's face.  If the master scan was at 2400 DPI, that interesting portion of the image could be blown up and printed (at 300 DPI) with very little, if any, pixelation visible.
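
The arithmetic behind that claim can be sketched in a few lines.  This is only an illustration: the function name and the 0.75 inch example are assumptions made for the sketch, not measurements from any actual print.

```python
def max_print_inches(crop_inches: float, scan_dpi: int, print_dpi: int = 300) -> float:
    """Largest print dimension (in inches) that a cropped region can fill
    at print_dpi without inventing pixels."""
    pixels = crop_inches * scan_dpi  # pixels captured across the cropped region
    return pixels / print_dpi        # inches those pixels cover when printed


# Hypothetical example: a face 0.75 inches wide on the original print.
print(max_print_inches(0.75, 2400))  # 6.0
```

In other words, a 2400 DPI master allows up to an 8x linear enlargement (2400 / 300) before the print drops below 300 DPI.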

This theme runs through every aspect of this project: store more than is generally of practical use now to allow future uses without the need to re-scan the original prints. It is trivial for software to downscale a property of a master for a specific use but the reverse is hard (and often impossible).  For example, I even scanned "Black and White" prints as color rather than grayscale.  This provides a consistent output for the master images and allows for color correction later for images scanned from faded prints.

Color Depth
Similarly, I selected more than is needed for the Color Depth of the scans.  One of the features of the CanoScan 8800f is 48-bit Color Depth scanning.  Even though very little computer display hardware (video cards, cables, monitors...) supports color depths above 24-bit, there may be future applications that take advantage of colors of such precision.  For almost all practical uses of the images, a reduction of resolution will be required anyway.  The color depth can be dropped for those copies at the same time with little extra effort.
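
To get a feel for how DPI and color depth together drive the size of a scan, here is a rough sketch.  It estimates only the uncompressed pixel data (no file headers or compression), and the function name is made up for illustration.

```python
def raw_size_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size of the pixel data for a print scanned at the given settings."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# A 4x6 print at a few combinations of DPI and color depth.
for dpi, bits in [(2400, 24), (2400, 48), (4800, 48)]:
    size = raw_size_bytes(4, 6, dpi, bits)
    print(f"{dpi} DPI, {bits}-bit: {size / 1e9:.2f} GB")
```

For a 4x6 print this works out to roughly 0.41 GB, 0.83 GB, and 3.32 GB respectively, which is why the 4800 DPI scans overwhelmed an older laptop.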

Other Settings
Scanner software and drivers often provide automatic features such as dust removal and color correction.  I turned all of these extra filters off except "Auto Tone" (color correction).  The others attempt to perform automatic "touch-ups" that should not really be applied to the master scans.  Other 3rd party photo editing applications offer similar tools with finer control over the effects.

Scan Profiles
The scanner drivers provided a way to save settings to named profiles.  This was helpful because it meant I could define the scan settings I wanted once and call up that profile for each image to be scanned.  This provided consistency so that I would never accidentally forget to change one of the settings from the default.  This is the profile I used for most prints:

Profile: Archive Photo
DPI: 2400
Color Mode: "Color(48-bit)"
Auto Tone: ON
[All Other Features]: OFF

As I moved through images, I created a set of profiles to use under various conditions.  I found that some prints had potentially interesting captions or dates hand-written onto the back.  Recording these didn't require the full quality that I had defined for the images.  So I created a second profile:

Profile: Back Photo
DPI: 300
Color Mode: "Color" (meaning 24-bit color depth)
Auto Tone: ON
[All Other Features]: OFF

The DPI and Color Depth settings I chose resulted in massive numbers of pixels for even moderately sized prints.  For example, a 4x6 print scanned at 2400 DPI results in 9,600 x 14,400 = 138,240,000 pixels.  The CanoScan 8800f has a limit of 10,000 x 30,000 pixels when scanning in 48-bit mode (something I did not see in the declared specifications!).  When scanning a 4x6 print, it is easy to stay within this limit by simply laying the print on the scanner bed in the correct orientation.  But my initial batch of photos had many prints of non-standard sizes, some of which were over the limit in both dimensions.  For these cases, I created a third profile:

Profile: Big Archive Photo
DPI: 2400
Color Mode: "Color" (meaning 24-bit color depth)
Auto Tone: ON
[All Other Features]: OFF

In cases where I was forced to use the "Big Archive Photo" profile, I also attempted to perform a second scan using the "Archive Photo" profile, with the image cropped to fit within the scanner limits.  For prints where too much had to be cropped in order to fit within the limits (e.g. an 8x10 print), I resigned myself to using only the "Big Archive Photo" profile.
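
The choice between the two photo profiles can be sketched as a simple size check.  This is a minimal illustration, assuming the 10,000 x 30,000 pixel limit applies to whichever way the print is laid on the bed (so the print may be rotated to fit); the constant and function names are invented for the sketch.

```python
LIMIT_48BIT = (10_000, 30_000)  # pixel limit in 48-bit mode, as observed on the 8800f


def scan_pixels(width_in: float, height_in: float, dpi: int) -> tuple:
    """Pixel dimensions of a print scanned at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def fits_48bit_limit(width_in: float, height_in: float, dpi: int = 2400) -> bool:
    """True if the print fits the 48-bit limit in at least one orientation."""
    short_px, long_px = sorted(scan_pixels(width_in, height_in, dpi))
    return short_px <= min(LIMIT_48BIT) and long_px <= max(LIMIT_48BIT)


def choose_profile(width_in: float, height_in: float) -> str:
    """Pick the photo profile for a print of the given size in inches."""
    if fits_48bit_limit(width_in, height_in):
        return "Archive Photo"
    return "Big Archive Photo"


print(choose_profile(4, 6))   # Archive Photo
print(choose_profile(8, 10))  # Big Archive Photo
```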

The photo set I was working through also contained some copies of patents owned by my grandfather.  Similar to the backs of the prints, these texts did not require the full resolution provided by the "Archive Photo" profile.  For these, I created a fourth profile:

Profile: Archive Text
DPI: 600
Color Mode: "Color" (meaning 24-bit color depth)
Auto Tone: ON
[All Other Features]: OFF
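
For anyone who would rather keep these settings in a script than in the scanner driver, the four profiles above could be recorded as plain data.  This is just one possible representation; the class and field names are assumptions and do not correspond to anything in the Canon software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanProfile:
    name: str
    dpi: int
    bits_per_pixel: int           # 48 for "Color(48-bit)", 24 for "Color"
    auto_tone: bool = True        # the only automatic filter left on
    other_features: bool = False  # dust removal etc. stay off


PROFILES = {
    p.name: p
    for p in [
        ScanProfile("Archive Photo", dpi=2400, bits_per_pixel=48),
        ScanProfile("Back Photo", dpi=300, bits_per_pixel=24),
        ScanProfile("Big Archive Photo", dpi=2400, bits_per_pixel=24),
        ScanProfile("Archive Text", dpi=600, bits_per_pixel=24),
    ]
}

print(PROFILES["Archive Text"].dpi)  # 600
```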

Scanning Software Tools
In the last post, I outlined a process in which raw master scans are kept unmodified.  Since this post is about the scanning step, I'll focus here on the software tools used to perform the scans.  I'll devote a future post to creating touched-up versions of the master scans and the image processing tools I used.

Even though photo manipulation is performed later, the scanner drivers must be called from within some kind of image processing software (Windows now has scanning capability built in but this is still a kind of software).  Scanners are usually shipped with a bundle of image software tools to use with your scanner.  These are intended to provide the buyer everything they need to begin using their scanner right away.  These third party applications can be sufficient for casual scanning and photo manipulation tasks.  But, since different manufacturers include products developed by different vendors, it is difficult to speak to the quality of these bundled applications in general.  Personally, I usually stay away from them.   

Usually when I want to view or perform simple edits on images, I reach for my favorite image processing tool: Irfanview.  However, I found that when it was used for scanning, the resulting images (even before saving to a file) were being automatically reduced to 24-bit color.  My favorite photo organizer, Google Picasa, wouldn't allow me to scan at 48-bit color and stored the images in the lossy JPEG file format.

I finally settled on using an old version of Adobe Photoshop (I used an old version because it is what I had; while Photoshop is incredibly powerful, it is also priced to match).  The scans were not stored in an intermediate lossy format or reduced in color depth.  And, since I am not performing any of the actual image manipulation at this stage, an older version of the tool works just fine for scanning and saving.

I only performed one type of manipulation on the raw scans before saving them.  Many of the photos had a border area that was not printed on.  I cropped these borders out as close as possible.  Most of the prints turned out to not be perfectly square at high magnification (I used 300% zoom to set the crop lines), so small amounts of border are sometimes visible.

I even took extra effort to place images at right angles on the scanner bed.  This sometimes required two or three test scans at lower resolution to set the prints just right.  This is because the scanner "eye" does not "see" all the way to the edge of the scanner bed.  Placing a print with no border along any edge of the scanner bed would miss scanning a sliver of the print along that edge.  By taking the extra effort required to place prints squarely (by trial and error), I was able to avoid digitally rotating the images before saving them, further ensuring that the information stored in the master images is the exact unmodified set of pixels captured by the scanner.

Saving the Master Images
After the print is scanned, the master image must be saved to a file.  Future posts discuss the storage (1.b) and indexing (1.c) of the master image files.  For now, assume that there is a place to save the images and that each print is assigned an index name.  Saving the scanned image requires two more decisions: What to name the file and in which image file type to save the image data.

File Names
The basic approach is to save each master image with a file name that is based on the index assigned to the print.  This way, the print or the master image can be found by using the index marked on the other.  However, some prints produced multiple master images (See 'Scan Profiles' above).  For these, file name modifiers are added after the index to indicate the part of the print scanned.

Part of Print Scanned                                   | Scan Profile      | Modified Image Name
Main print image with the border cropped off            | Archive Photo     | [Index]
Dates / Captions from the back                          | Archive Back      | [Index]_back
Dates / Captions from the border                        | Archive Back      | [Index]_border
Cropped scan of images too large for Archive Photo      | Archive Photo     | [Index]_crop48
Full scan of images too large for Archive Photo profile | Big Archive Photo | [Index]_full24
Page of text                                            | Archive Text      | [Index]_text_p#
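
The naming scheme above is simple enough to capture in a small helper.  This is a sketch only: the function name, the part keywords, and the "P0001" index format are made up for illustration, and it assumes the "#" in the text suffix stands for a page number.

```python
from typing import Optional

# Suffix appended to the index for each part of the print scanned.
SUFFIXES = {
    "main": "",
    "back": "_back",
    "border": "_border",
    "crop48": "_crop48",
    "full24": "_full24",
}


def master_name(index: str, part: str = "main", page: Optional[int] = None) -> str:
    """Build the master image name (without extension) for one scan of a print."""
    if part == "text":
        if page is None:
            raise ValueError("text scans need a page number")
        return f"{index}_text_p{page}"
    return f"{index}{SUFFIXES[part]}"


print(master_name("P0001"))                 # P0001
print(master_name("P0001", "back"))         # P0001_back
print(master_name("P0001", "text", page=2)) # P0001_text_p2
```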

Image File Type
This section caused some trouble for me.  There are many image file types, with many differences between them.  For saving the master images, the file type must be a well-accepted standard and supported by many different software tools.  This increases the chances that the files saved now will still be readable by software whenever they may be needed in the future.  The image file type must not use any kind of lossy compression (like JPEG).

I narrowed the selection down to three different formats:
JPEG2000 is a newer version of the JPEG standard which includes a lossless compression option.  In my testing, this format compressed the images the most.  However, it never seemed to catch on as a common file type.  Not many image editors support it.  The ones that do usually need some kind of plug-in in order to obtain that support.

Portable Network Graphics (PNG) is a format that is intended to replace Graphics Interchange Format (GIF).  In my testing, this format did not compress quite as well as JPEG2000.  However, it is supported by almost all image viewers, editors, and web browsers.

Photoshop Document (PSD) is the native file format of Adobe Photoshop.  It supports the editing features provided by Photoshop (including layers).  However, the resulting files are very large. 

I eventually decided to store the master images in the PSD format.  It allows for more options for saving during later editing and touch-up steps.  Plus, storage is so inexpensive that the space saved by the compression of the other formats is not worth the extra effort and limitations.

This was a long and rambling post.  The next post will be a step-by-step procedure to follow while scanning.