Saturday, May 2, 2009

1) Media Archive Plan

In the last post, I described the three major components of the project. This post describes my general plan for constructing the first component: The Media Archive.

1.a) Scan Photos

The first step in the whole project is to scan family photos into a digital format. My wife and I have a few prints from when we were growing up, but they are newer than most of those held by other members of the family. I have decided to begin with photo sets passed down from my grandparents. Thanks to my aunt Linda, who lent me her collection, I have been able to begin experimenting with scanners and software.

I'll save the technical details of the scanning tools, settings, and procedures for a future post (you can't wait, can you?). The general idea is to capture the images at a very high resolution and color depth with all of the "extras" provided by the scanner software (e.g. sharpening, dust & scratch removal) turned off.

The resulting digital image is saved in a file format that uses either no compression or lossless compression. Many popular image file formats (including JPEG) use lossy compression, which sacrifices some of the information contained in the file in order to greatly reduce the file size. While most of the time the loss of information goes unnoticed (even with more than 90% of the information removed), deleting information runs counter to the entire purpose of the archive: to preserve as much information as possible.
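To make the lossless vs. lossy distinction concrete, here is a minimal sketch in Python. It is only an illustration: the "scan" is a made-up byte pattern, zlib stands in for whatever lossless format ends up being used, and throwing away the low bits of each sample is a crude stand-in for lossy compression (it is not how JPEG actually works).

```python
import zlib

# Stand-in for raw scan data (an assumption for illustration):
# a repeating gradient of 8-bit samples.
raw = bytes(range(256)) * 64

# Lossless: decompressing gives back exactly the bytes that went in.
packed = zlib.compress(raw, 9)
restored = zlib.decompress(packed)

# Crude "lossy" stand-in: discard the low 4 bits of every sample.
# Once those bits are gone, the original cannot be reconstructed.
quantized = bytes(b & 0xF0 for b in raw)

lossless_ok = restored == raw   # the round trip preserved everything
lossy_ok = quantized == raw     # information was thrown away

print(len(raw), len(packed), lossless_ok, lossy_ok)
```

The lossless round trip returns the original bytes exactly, which is the property the master files need.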

All of this is done so that I obtain digital versions that are as close to the original print as possible.

1.b) Store Master Images

Because of all of the choices made in the scanning step, the resulting files are so large as to be completely impractical for direct uses (especially for web-based purposes). Instead, the raw scans are kept unmodified as master versions. Only copies of the master files are "touched-up" and scaled down to practical sizes. This way, all the information that was captured during the scan is always maintained at the master.

Besides, storage is cheap! Today, you can get hard drives at about $0.08 per gigabyte, and it will only get cheaper. Even if each image took up a whole gigabyte, you could store 2,500 images for only $200.
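The arithmetic behind that figure can be sketched as follows. The price and the one-gigabyte image size are the numbers quoted above; the `copies` parameter is my own addition to show what happens if every image is stored more than once. Prices are kept in whole cents to avoid floating-point rounding.

```python
PRICE_CENTS_PER_GB = 8      # $0.08 per gigabyte, the figure quoted above
BUDGET_CENTS = 200 * 100    # $200
IMAGE_SIZE_GB = 1           # deliberately pessimistic size per image


def images_for_budget(budget_cents, price_cents_per_gb, image_size_gb, copies=1):
    """How many images fit in the budget if each is stored `copies` times."""
    cost_per_image_cents = price_cents_per_gb * image_size_gb * copies
    return budget_cents // cost_per_image_cents


single = images_for_budget(BUDGET_CENTS, PRICE_CENTS_PER_GB, IMAGE_SIZE_GB)
mirrored = images_for_budget(BUDGET_CENTS, PRICE_CENTS_PER_GB, IMAGE_SIZE_GB, copies=2)

print(single)    # 2500
print(mirrored)  # 1250
```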

Of course, it's not that simple. Hard drives have moving parts which eventually wear out. A hard drive lures you into a sense of security until you have all of your important information stored on it with no backups and then fails catastrophically. Automatic data redundancies and regular backups are critical components of the Media Archive. A future post will explain how I plan to store

1.c) Index Originals

Occasionally, it may be necessary to retrieve the master file for an image, perhaps to create a new print or a blown-up version. Similarly, the original print might be required. An image indexing system is required for either of these situations.

Each image created from the archive should be linked back to the master image and the original print. The name of the file could be used to associate an image with its master, but file names are easily and often changed. Some image file formats contain fields for storing metadata, which could be used to store the indexing information. However, with this method, some table would be required to look up the file name of the master from the information contained in the metadata of an image.

Additionally, the original prints must be stored in a way that allows them to be retrieved through the index. An index number could be written in archival ink onto the back of each print as it is scanned. Better yet, index cards or table of contents pages could be used to identify the index for small groups of prints.
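One way the lookup table might look is sketched below. Everything here is hypothetical: the index ID format, the `archive_index_id` metadata field, the file paths, and the box/card locations are placeholders, not decisions I have made.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    master_file: str     # where the untouched master scan lives
    print_location: str  # where the original print is physically stored


# Hypothetical index table keyed by the ID written on the print or index card.
index = {
    "P0001": IndexEntry("masters/P0001.tif", "Box 1, card 1"),
    "P0002": IndexEntry("masters/P0002.tif", "Box 1, card 1"),
    "P0003": IndexEntry("masters/P0003.tif", "Box 1, card 2"),
}


def index_id_from_metadata(metadata):
    """Read the index ID from an image's metadata (field name is an assumption)."""
    return metadata["archive_index_id"]


def lookup(index_id):
    """Find the master file and original print for an index ID."""
    return index[index_id]


# A downscaled copy can be renamed freely; the ID travels in its metadata.
web_copy_metadata = {"archive_index_id": "P0002"}
entry = lookup(index_id_from_metadata(web_copy_metadata))
print(entry.master_file, "|", entry.print_location)
```

Because the ID lives in the metadata rather than the file name, renaming a published copy does not break the link back to its master.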

1.d) Touch up & Downscale

With the master copies and original prints indexed and safely stored, the next step is to create usable versions. Any filters to improve the image are applied at this step (e.g. dust & scratch removal, sharpening, further cropping ...). If extensive work is required, the product is first saved in the same format as the master and archived with it. This way the touch-ups do not need to be reapplied for any future versions made from that master.

There are many ways that the images can be scaled down to practical sizes. For example, black and white images can be converted to grayscale. The resolution can be reduced and the image stored in a file format that includes compression (even lossy compression if desired).
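To get a feel for how much each of these choices saves, here is a rough size estimate for uncompressed image data. The scan settings are assumptions picked purely for illustration (a 4x6 inch print at 1200 dpi in 16-bit color for the master, a 1200x1800 pixel 8-bit grayscale copy for the web); the real settings are still to be decided.

```python
def raw_size_bytes(width_px, height_px, channels, bits_per_channel):
    """Size of the uncompressed pixel data, ignoring file headers."""
    return width_px * height_px * channels * bits_per_channel // 8


# Assumed master: 4x6 inch print at 1200 dpi, 3 color channels, 16 bits each.
master = raw_size_bytes(4 * 1200, 6 * 1200, channels=3, bits_per_channel=16)

# Assumed web copy: 1200x1800 pixels, 1 grayscale channel, 8 bits.
web = raw_size_bytes(1200, 1800, channels=1, bits_per_channel=8)

print(master)         # 207360000 bytes, roughly 207 MB
print(web)            # 2160000 bytes, roughly 2 MB
print(master // web)  # 96
```

Under these assumptions the downscaled copy is about 96 times smaller before any compression is applied at all.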

1.e) Publish

The downscaled images should then be employed in an accessible way. This might include digital photo frames, photo organization software, or online photo streams for sharing with the world.

1.f) Tag

The last step for a photo in the Media Archive is to be tagged with descriptive information. Generally, photo organization software and websites provide a slot for a text description of what is depicted.

However, most modern options provide a tag cloud feature. This allows photos to be assigned a set of descriptive phrases. This feature is used to identify individuals, places, or things in pictures and automatically associate that image with others tagged similarly.
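Under the hood, this kind of tagging can be thought of as a simple lookup from each tag to the photos that carry it. The sketch below is my own illustration, not how any particular photo site implements it, and the photo IDs and tags are made up.

```python
from collections import defaultdict


def build_tag_index(photo_tags):
    """Map each tag (case-insensitive) to the set of photos carrying it."""
    tag_index = defaultdict(set)
    for photo, tags in photo_tags.items():
        for tag in tags:
            tag_index[tag.lower()].add(photo)
    return tag_index


def related_photos(photo, photo_tags, tag_index):
    """Photos that share at least one tag with the given photo."""
    related = set()
    for tag in photo_tags[photo]:
        related |= tag_index[tag.lower()]
    related.discard(photo)
    return related


# Made-up example data.
photo_tags = {
    "P0001": {"Grandpa", "Lake house"},
    "P0002": {"grandpa"},
    "P0003": {"Lake house", "Picnic"},
    "P0004": {"Birthday"},
}

tag_index = build_tag_index(photo_tags)
print(sorted(related_photos("P0001", photo_tags, tag_index)))  # ['P0002', 'P0003']
```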

I will not recognize the people and places depicted in all of the pictures. But, by publishing them on the web, I can leverage the collective knowledge of the whole family in tagging media in the archive.

Ok, I know that was a lot of information all at once but it provides a general overview of the design of the Media Archive.

Thursday, April 30, 2009

Project Overview

The general goal of this project is to collect and safely preserve the history of my family. In the beginning, this project will focus on previous generations. However, it will be designed to be a living archive, growing with the family as time goes on.

What do I mean by family history? The specifics of this project continue to change as it is executed but there are a few obvious starting points for this project:

1) A Media Archive
Sure, digital cameras have made it easier to take and enjoy family photos. They'll never degrade and if you're good about backing up your data, you can even be reasonably sure that they are safely stored for the long term.

As amazing as this sounds to all you kids Tweeting and swapping pics with cell phones, photos used to only come printed on paper. Sure, you got the negatives too, but those never seem to make it through spring cleaning. This leaves collections of prints stashed away in albums, boxes, envelopes, and bags. When not cared for, they often fade, scratch, crack, stick together, rip, and generally self-destruct over time. While there are steps that can be taken to safely store prints, it's usually a losing battle against time.



Then there are the unexpected forces of nature such as fires and floods. You almost want to keep your originals in a climate-controlled vault to protect them. Of course, they're of little use locked away from view.

The solution is to scan the originals into a digital format. Not only does it drastically increase the chances that the photos will survive, it also allows people to share them more easily. Since the digital versions of the images are so easily accessible and shareable, the originals can then be stored safely away.

At the beginning of this project I will focus on images but I have already done preliminary work capturing old home movies to a digital format as well.

2) A Family Tree
A media archive is one part of the preservation of family history. But, without knowing who the people are and how they are related, the images convey little meaning. A family tree represents this type of information nicely. And the act of building a family tree can be a family activity that teaches younger generations about the family's past.


Manuel Turlin, "Turlin family tree", June 21 2009 via Wikimedia Commons, Creative Commons License

But, a family tree by itself doesn't provide much of a connection to who the people are (or were). My goal is to integrate the Media Archive with the Family Tree. Imagine the following scenario:

You are browsing the Media Archive and run across an interesting picture of your grandfather as a young man. He's standing with someone you don't recognize so you check the image's metadata to find out who he is. He turns out to be a member of the family so his name links into the Family Tree where you see his relationship to your grandfather (and to you). You read about where he lived and maybe what he did for a living. From there, you click through to his gallery on the Media Archive and begin browsing other images depicting him.
Rinse, repeat.

You've moved back and forth between the two systems seamlessly. The person shown standing with your grandfather now feels like a real person to you.
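The click path in that scenario can be sketched with two small tables, one for the tree and one for who appears in each photo. All of the IDs, names, and fields below are placeholders I made up for illustration; the real systems have not been chosen yet.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    parents: tuple = ()
    bio: str = ""


# Hypothetical Family Tree, keyed by person ID.
tree = {
    "gg": Person("Great-grandparent"),
    "gf": Person("Grandfather", parents=("gg",)),
    "gu": Person("Great-uncle", parents=("gg",), bio="(biography would go here)"),
}

# Hypothetical Media Archive metadata: photo ID -> person IDs depicted.
photo_people = {
    "P0001": ["gf", "gu"],
    "P0007": ["gu"],
    "P0009": ["gf"],
}


def people_in(photo_id):
    """Step 1: from a photo to the Family Tree entries of the people in it."""
    return [tree[pid] for pid in photo_people[photo_id]]


def are_siblings(a, b):
    """Step 2: one simple relationship check, two people sharing a parent."""
    return a != b and bool(set(tree[a].parents) & set(tree[b].parents))


def gallery(person_id):
    """Step 3: from a person back to every photo depicting them."""
    return sorted(p for p, ids in photo_people.items() if person_id in ids)


print([p.name for p in people_in("P0001")])  # ['Grandfather', 'Great-uncle']
print(are_siblings("gf", "gu"))              # True
print(gallery("gu"))                         # ['P0001', 'P0007']
```

The important design point is that both systems refer to people by the same ID, which is what makes jumping between them possible.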

3) The Stories
In the scenario described in the last section, a fairly significant chunk of information was conveyed. But even with images and biographic information, you still don't get the whole picture. What was really going on in that picture?

You MIGHT know where it was taken if the image was tagged with that information. But why was your grandfather there? Why was he standing with that other member of the family? Was it a family reunion or did they spend every summer together?

These types of information don't typically fit in categories of metadata. They make up the story that led up to the picture being taken. You might expect this kind of information to be in the caption of each picture. But, usually a caption is fairly short and after writing enough of them, they tend to take the form: "Jon and Jenn with their Grandparents." Not very helpful...

This is the part of the project I have put the least amount of research into, but I expect it to be the hardest to implement. Even if I am able to find the right technology to support it (comment with suggestions), gathering the stories from those who remember them is going to be tough. They are likely to be sparse and require a lot of time to capture, and I know we all have very busy lives.

The plan is to put together a system that allows information to be entered by many people over time. I can't possibly sit down with everybody to get pictures, biographic info, and stories. But, if the system allows everybody to contribute what they know, the whole idea becomes much more feasible.

Sunday, April 26, 2009

What is this blog for?

In April of 2009, I finally began an effort to digitize old family photos, documents, and memories. This was a project that I had been intending to tackle for quite a while. In fact, I had previously made a handful of attempts. Each time, the project got reshuffled lower on the priority list when things got busy (as they always seem to do) and I would eventually stop working on it altogether. A future post will explain some of the motivations and mechanisms I am employing to stay more focused this time around.

I plan to use this blog to document the project as I work through it. There are many reasons for creating this type of documentation. As a software engineer, I recognize the general need to document your work. The rest of this post will explain how I plan to use this blog.

One of the major consumers of this blog will be myself. I have a terrible, terrible memory and will record details of my decision-making process here. Basically, I want to make sure I know what the heck I was thinking when I look back at certain decisions along the way. Hopefully, this will help me avoid doing something that makes a planned step or intended use of the archive impossible.

I also plan to use this blog as a collection of technical instructions and procedures that I follow. This way, when I return to the project after a break (hey, life happens), I will remember where I left off and what to do next. This also enables me to accept help offered by others (post a comment to get involved) while ensuring consistency across the whole project.

Lastly, this blog will be used to discuss the project with others. It will keep those interested up to date on my progress and help coordinate the efforts of those participating directly in the project. It will also help me connect to those who have experience working on similar projects and allow me to benefit from their expertise.

These are some of the major reasons this blog exists. The next post will describe in more detail what type of archive I am attempting to create and the components I have selected so far.