Subjective IQA of the interpolated images from the Middlebury benchmark

Current benchmarks for optical flow algorithms evaluate the estimation either directly, by comparing the predicted flow fields with the ground truth, or indirectly, by using the predicted flow fields for frame interpolation and then comparing the interpolated frames with the actual frames. In the latter case, objective quality measures such as the mean squared error are typically employed. However, it is well known that for image quality assessment, the actual quality experienced by the user cannot be fully deduced from such simple measures. Hence, we conducted StudyMB 2.0, a subjective quality assessment crowdsourcing study for the interpolated frames provided by one of the optical flow benchmarks, the Middlebury interpolation benchmark.
StudyMB 2.0 contains interpolated frames from 155 methods applied to each of 8 contents. We collected forced-choice paired comparisons between interpolated images and the corresponding ground truth. To increase the sensitivity of observers when judging minute differences in paired comparisons, we introduced a new method to the field of full-reference quality assessment, called artefact amplification. From the crowdsourcing data (3720 comparisons of 20 votes each) we reconstructed absolute quality scale values according to Thurstone's model.
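To illustrate the reconstruction step, the sketch below estimates Thurstone (Case V) scale values from a matrix of pairwise win counts using the classical least-squares solution (row means of the inverse-normal-transformed preference probabilities). This is a minimal illustration, not the study's actual reconstruction code; the function name, the clipping bounds, and the zero-anchoring of the scale are our own assumptions.

```python
import numpy as np
from statistics import NormalDist

def thurstone_case_v(wins):
    """Estimate Thurstone Case V scale values from pairwise comparisons.

    wins[i, j] = number of times stimulus i was preferred over stimulus j.
    Returns one scale value per stimulus, anchored so the minimum is 0.
    """
    total = wins + wins.T  # votes cast for each pair, in either direction
    # Empirical preference probabilities; pairs with no votes default to 0.5.
    p = np.where(total > 0, wins / np.maximum(total, 1), 0.5)
    # Clip away from 0/1 so the inverse normal CDF stays finite
    # (clipping bounds chosen for illustration).
    p = np.clip(p, 0.01, 0.99)
    z = np.vectorize(NormalDist().inv_cdf)(p)
    np.fill_diagonal(z, 0.0)
    # Least-squares Case V solution: scale value = row mean of z.
    scores = z.mean(axis=1)
    return scores - scores.min()

# Toy example: 3 stimuli, 20 votes per pair, with a clear quality ordering.
wins = np.array([[0, 18, 19],
                 [2,  0, 15],
                 [1,  5,  0]])
scores = thurstone_case_v(wins)
```

With the toy matrix above, stimulus 0 wins most of its comparisons and receives the highest scale value, matching the intuition that the scale orders stimuli by preference strength.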
Cite us

StudyMB 2.0 is freely available to the research community. If you use our database in your research, please cite the following reference:
@article{men2020subjective,
  title={Subjective Annotation for a Frame Interpolation Benchmark using Artifact Amplification},
  author={Men, Hui and Hosu, Vlad and Lin, Hanhe and Bruhn, Andr{\'e}s and Saupe, Dietmar},
  journal={arXiv preprint arXiv:2001.06409},
  year={2020}
}
Downloads
346.3 MB download
23.3 kB download
43 kB download
The scores file