Over the years, my wife and I have accumulated a large volume of digital photos. Nary a vacation has gone by that didn’t result in 1000+ photos of every landmark, vista, and museum artifact that we discovered during our travels.
Unfortunately, the majority of these photos have ended up haphazardly organized in various folders on my media server, all with different directory structures and naming methodologies, which makes it difficult to lay my hands on a particular photo from some time in the past.
One of the projects that I’ve decided to tackle this year is to come up with some method by which to rein in this madness and restore some semblance of order to the realm.
I’m not sure exactly what tools or processes I will use to tackle this problem, but I figure that the best way to start is to split it up into smaller, more manageable sub-problems.
This, then, is the first in what I hope to be a series of posts about organizing my family photo collection. Subsequent posts will (hopefully) deal with a pipeline for importing new photos, establishing a reliable offsite backup solution, and maybe even experimenting with some deep learning to automatically tag photos that contain recognizable faces.
Why not use [Insert Cloud Service Here]?
“But,” I hear you protest in an exasperated tone of voice, “you could just upload all of your photos to Google Photos and let The Cloud™ solve this problem for you.”
While it’s true that there are a number of cloud providers that offer reasonably-priced storage solutions, and that some of them even use the metadata in your photos to impose some kind of organization solution, I have a few concerns with these products:
- They cost money: Nothing in life is free, at least not if you have more than a few gigabytes of it. The largest of my photo folders contains around 70GB of files, and with the recent arrival of our son, we take new photos at a heretofore unimaginable clip. I already have a media server, and storage on it is effectively free.
- My metadata isn’t complete/correct: Garbage in, garbage out, as they say. Most (all?) of the cloud storage solutions that I’ve seen that purport to organize your photos will do a messy job of the task if the metadata on your photos is incorrect or missing. Any tool that I use will need to correct for this problem.
- Google is a creep: The same is true of Facebook et al. I’d rather generate less digital data for the multinational companies that control our modern world to indiscriminately slurp up and process in hopes of getting me to click on a single ad, thank you very much. Doubly so if the storage provider in question is going to use facial recognition to tie a name to faces in my photos, especially photos of my son.
Organizing Files with Elodie
My initial inclination, when considering the problem of sorting many gigabytes of photos into a reasonably logical folder structure was to write a script that would read the EXIF data out of the file, interpret it, and use it to determine where the file belongs.
In one of the better decisions that I’ve made so far this year (hey, it’s only January), I decided to take a look around GitHub to see if somebody had already written a script to do this work, and man, am I glad that I did.
As with most things that seem simple, it turns out that the task of reading EXIF data can get really complicated in a hurry. There are a bunch of different historical format differences, every camera manufacturer implements some number of custom extensions to the format, etc.
Elodie is a cutely-named tool with an even cuter mascot that uses exiftool to read the metadata out of your photos, and then sorts them into a configurable folder structure based on that data. If your camera or phone wrote GPS coordinates indicating where the photo was taken, Elodie can even query the MapQuest API to translate those coordinates into a human-readable place name that is added to the directory structure.
The documentation is comprehensive, albeit brief, but I’ll include a handful of the commands that I used in this post just to demonstrate the workflow that I ended up with.
There are basically two major operations in Elodie:
update. The former is used to move pictures from some source directory into the target directory, reading their metadata, renaming them, and organizing them within a configurable directory structure along the way. The latter, meanwhile, is essentially used to correct mistakes that were made during the
import process. It lets you correct the date, location, or album metadata for previously imported files, and appropriately re-sorts them into the directory hierarchy.
The command for importing existing files into your collection is simple:
~/elodie $ ./elodie.py import --trash --album-from-folder --destination=/mnt/media/Pictures/ /mnt/media/Pictures.old/Montreal\ Trip/
In this case, I’m importing vacation photos from
/mnt/media/Pictures.old/Montreal Trip. The destination folder is
/mnt/media/Pictures, and I’m using the
--album-from-folder option to tell Elodie that I want it to keep all of the pictures in this batch together in a folder called
Montreal Trip in the destination directory. We went on this trip in September of 2010, so the resulting folder structure looks like this:
mnt/ ├─ media/ │ ├─ Pictures/ │ │ ├─ 2010/ │ │ │ ├─ September/ │ │ │ │ ├─ Montreal Trip/ │ │ │ │ │ ├─ 2010-09-08_19-29-44-dsc00279.jpg │ │ │ │ │ ├─ 2010-09-09_20-15-44-dsc00346.jpg │ │ │ │ │ ├─ ...
There may be other pictures that were taken in September of 2010 in the
September folder, but they won’t be sorted into the
Montreal Trip folder unless they are marked as being a part of the Montreal Trip album.
It should be noted at this point that Elodie goes to great lengths to avoid maintaining a database of imported files. Instead, it reads metadata from and writes metadata to the image files that it is organizing. This ensures that the resulting directory structure is as cross-platform and broadly compatible as possible without locking your photo collection into a proprietary file format.
Elodie does offer one database-lite feature that helps to detect bit rot: The
generate-db operation records a cryptographic hash of every photo in your collection into a file. Months or years down the road, you can check if any of the files in your collection have become corrupted by running the
verify operation. This will recompute the hashes of all of the files in your collection, compare them against the previously recorded values, and let you know if anything has changed.
One place where not having a database falls short is if a small handful of images within a directory that contains hundreds or thousands of vacation photos have incorrect or missing EXIF data. In this case, it’s possible for those photos to be written to the wrong place in the target directory structure, and if you don’t catch the problem during the import, you’re not likely to find and fix the mistake. If Elodie were to maintain a database that tracked the source and destination locations for every imported photo, these mistakes would be easy to find. This in turn means that importing a large existing photo collection with Elodie becomes a job that requires human supervision.
Here’s a brief rundown of some of the issues that I ran into while importing a large collection of existing files:
- Missing metadata: Some photos, particularly those taken by pre-smartphone cameras, didn’t have the date on which the photo was taken in their EXIF data. When it encounters this problem, Elodie falls back to the lesser of the file created or file updated date. Because of the way that I managed my import, all of these files ended up in
Pictures/2021/January/Unknown Location/, but if you didn’t accidentally overwrite the file created date on all of your photos as a part of your import process, Elodie may put them into an unexpected location in your target directory tree.
If you happen to be importing an album (i.e. a bunch of photos that were taken as a part of a named event) and your target directory structure includes the album name, you can find the improperly organized photos by running
find /path/to/photos -type d "album name" -printto find directories in your collection that have the same name as the album. Once found, you can use Elodie’s
updateoperation to fix the problem:
~/elodie $ ./elodie.py /update --time="2014-10-22" --album="album name" /path/to/photos
- Incorrect metadata: In many ways, incorrect metadata is worse than missing metadata, because it causes photos to be organized into unexpected locations. As far as I can tell, the cause of this problem was a pre-smartphone camera that had its date and time set incorrectly. Remember when you used to have to set the date and time on all of your electronics? Boy, that sucked.
You’ll know this is the problem if you are importing photos from a vacation that you took in 2012, but they’re inexplicably being put into the 2008 directory. In this case, you can use
exiftool, the underlying library that Elodie uses to read and write metadata, to add or subtract some number of years, months, days, or hours to or from the dates recorded in the photos’ EXIF data.
- Geotag lookup doesn’t appear to work: As mentioned above, Elodie has the ability to convert GPS coordinates found in the EXIF data of photos into human-readable place names by way of a call to the MapQuest API. This, of course, only works if your camera was aware of its location at the time that the photo was taken, which wasn’t really a thing in the pre-smartphone era.
This isn’t really a problem for vacations, as I can import all photos from a trip into an album, and that album name will be used in the folder structure that Elodie creates. For example, if you import photos from a folder called
--album-from-folderoption, they’ll end up in a directory structure like this:
It does, however, get annoying for photos that were taken closer to home with a non-GPS-aware camera. These all get sorted into a
year/month/Unknown Location/directory. I can’t find anything in the Elodie docs or source code that allows this behaviour to be changed. I would rather that these photos end up in the root of the
year/month/folder, because I think that the extra
Unknown Locationdirectory is, well, less than helpful, but I recognize that this is a matter of preference. For now, I think I’ll solve this problem by writing a quick script to move these photos as I see fit.
Even with the help of a tool like Elodie, reorganizing tens of gigabytes worth of photos is not a quick task. I can hear the fans on my Drobo humming, and am only too aware that this kind of mass file migration is a great way to find the breaking point of old hard drives.
Once I’m done importing all of the pictures that I already have, I’ll move on to figuring out an import pipeline for photos that I haven’t yet taken. Off the top of my head, it needs to be mostly automatic, work well with the iPhones that my wife and I use to take most photos these days, and should ideally email me the Elodie log so that I can see what photos ended up where and whether or not I need to manually correct any mistakes.
I’m hopeful that once I get around to importing photos that were taken with a smartphone that has internet time and some notion of its location, the metadata problems that I catalogued above will become the exception instead of the rule.
Once I figure out that next step, I’ll write about it here. In the meantime, go organize all of those photos that are cluttering up your media server. You’ll feel better once you do. I promise.