The MIT 5k Dataset 1: Introduction

by Dan Margulis on November 11, 2017

This is the introduction to a series of posts I will make based on my work with files that are part of a remarkable archive.

Researchers at MIT and Adobe have recently made available the data from a massive project they have undertaken to study what people look for when they correct color. The researchers’ objective seems to be to improve methods of automated correction, and also to come up with a program that “learns” the preferences of an individual.

It is not clear whether they seek a routine that would emulate a top professional. If so, I’ve long said that I don’t expect such a program to be able to run in real time until around 2025, because today’s computers lack the power. Conceivably, one that could process a gang of images overnight could be written today. The problem of developing a program that can equal the best human retoucher is similar to that of developing one that can equal the best human chess player (a problem that was solved a decade ago), but it has several complications absent from chess.

That’s not my main interest here. Instead, I focus on the tremendous contribution the team has made by releasing its original files, together with broad permission to use them in research. The problem this addresses is that anybody can find an image or two that purports to demonstrate anything they like. When you go to a presentation by me or anyone else trying to make a point about a workflow, you are looking at images carefully chosen to prove that point.

This MIT study obliterates this objection by offering five thousand images. They come from many sources, professional and amateur. They represent every conceivable kind of scene. Many cameras were used, heavily biased in favor of SLRs. The compositions range from simple to complex, as does the technical difficulty of the corrections.

That’s not all. They hired five photography students, who nominally had broad experience in image preparation, and paid them to adjust each of the 5,000 images to their liking. That gives 25,000 more images to consider. The retouchers received the originals in DNG format and worked on them in Lightroom. The results are also available online.

Such a massive archive enables investigation of two important points for those interested in PPW.

*How often is PPW actually useful? It is easy to find images on which it has an enormous advantage, but it is also easy to find ones on which the advantage is slim to nonexistent. If we randomly choose a large enough sample of these 5,000 images, we can estimate how often the advantage makes a difference in the real world.

*Experimentation over the years has shown that blending separate corrections into one final version, even when done “stupidly” (i.e., a straight 50-50 blend), is surprisingly effective. This has led to my controversial position that people should work faster and produce multiple versions rather than spend the time trying to perfect a single file. But there has been no convincing way to explain what is happening or to quantify how often it works. Having five corrected versions of each image lets us average them and find out how often the result surpasses the parents.
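For readers who want to see what a “stupid” blend amounts to numerically, here is a minimal sketch of an equal-weight average of N versions, computed pixel by pixel and channel by channel. It assumes a hypothetical representation (each image is a list of rows of (R, G, B) values, every version has the same dimensions and is in the same color space); the function name `average_versions` is illustrative, not part of the MIT archive or any tool. With two versions it is the straight 50-50 blend, the same mix you get in Photoshop by placing one version on a layer at 50% opacity in Normal mode over the other.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: an image is a list of rows, each row a list of
# (R, G, B) channel values; all versions are assumed to share one color space.
Pixel = Tuple[float, float, float]
Image = List[List[Pixel]]


def average_versions(versions: Sequence[Image]) -> Image:
    """Equal-weight blend of N corrected versions of the same image.

    With two versions this is a straight 50-50 blend; with five it is the
    plain average of all five retouchers' results.
    """
    if not versions:
        raise ValueError("need at least one version")
    height = len(versions[0])
    width = len(versions[0][0]) if height else 0
    for v in versions:
        if len(v) != height or any(len(row) != width for row in v):
            raise ValueError("all versions must have the same dimensions")

    n = len(versions)
    blended: Image = []
    for y in range(height):
        row: List[Pixel] = []
        for x in range(width):
            # Average each channel independently across all versions.
            r = sum(v[y][x][0] for v in versions) / n
            g = sum(v[y][x][1] for v in versions) / n
            b = sum(v[y][x][2] for v in versions) / n
            row.append((r, g, b))
        blended.append(row)
    return blended


# Two 1x1 "images": a 50-50 blend.
print(average_versions([[[(100, 50, 200)]], [[(200, 150, 0)]]]))
# -> [[(150.0, 100.0, 100.0)]]
```

The sketch leaves out rounding back to integer channel values, and averaging the same versions in a different color space (for example LAB instead of RGB) can produce a different result.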

Originally I felt that a sample of 50 images would be sufficient, but found that this number was too small to draw conclusions from, so I increased the number to 100. We should first acknowledge certain drawbacks and limitations.

*The preparation of the archive took a long time. Most of these images are ten years old. Therefore to some extent they aren’t typical of photographs a retoucher might receive today. For example, no files were produced by smartphones.

*We know neither how the original DNGs were produced nor whether they got accurate white balance information from the camera.

*For these reasons, the original files have more gross color casts or weight issues than we would see today; on the other hand there are fewer files produced by obviously inferior equipment.

*The five persons correcting these files were not professionals, and it shows. They almost always succeeded in improving the originals and occasionally did really good work. There were, however, many more outright mistakes than one would find from professionals. This makes a fair comparison with my own corrections difficult.
