The Video Multimethod Assessment Fusion (VMAF) algorithm has recently emerged as a state-of-the-art approach to video quality prediction and now pervades the streaming and social media industries. However, because VMAF requires evaluating a heterogeneous set of quality models, it is computationally expensive. Given recent advances in hardware-accelerated encoding, quality assessment is emerging as a significant bottleneck in video compression pipelines. To alleviate this burden, we propose the novel Fusion of Unified Quality Evaluators (FUNQUE) framework, which enables computation sharing and boosts accuracy by using a transform that is sensitive to visual perception. Further, we expand the FUNQUE framework to define a collection of improved low-complexity fused-feature models that advance the state of the art in video quality prediction with respect to both accuracy and computational efficiency.
Fusion-based quality assessment has emerged as a powerful method for developing high-performance quality models from constituent quality models that individually achieve lower performance. A prominent example of such an algorithm is VMAF, which, along with SSIM, has been widely adopted as an industry standard for video quality prediction. In addition to advancing the state of the art, it is imperative to alleviate the computational burden imposed by evaluating a heterogeneous set of quality models. In this paper, we unify “atom” quality models by computing them on a common transform domain that accounts for the human visual system, and we propose FUNQUE, a quality model that fuses these unified quality evaluators. We demonstrate that, compared to the state of the art, FUNQUE offers significant improvements in both correlation with subjective scores and efficiency, owing to computation sharing.
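A minimal sketch of the computation-sharing idea follows, assuming a plain single-level Haar wavelet as the shared transform (the paper's perceptually sensitive transform additionally incorporates contrast-sensitivity weighting) and two simplified SSIM/VIF-style atom features; haar_dwt, luma_similarity, and detail_similarity are illustrative names, not the paper's exact features.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def haar_dwt(img):
    """Single-level 2D Haar transform; computed once per frame and shared by
    all atom features (this sharing is the source of FUNQUE's efficiency)."""
    h, w = img.shape
    img = img[: h - h % 2, : w - w % 2].astype(np.float64)
    tl, tr = img[0::2, 0::2], img[0::2, 1::2]
    bl, br = img[1::2, 0::2], img[1::2, 1::2]
    approx = (tl + tr + bl + br) / 4
    horiz = (tl - tr + bl - br) / 4
    vert = (tl + tr - bl - br) / 4
    diag = (tl - tr - bl + br) / 4
    return approx, (horiz, vert, diag)

def luma_similarity(a_ref, a_dis, c=1e-3):
    """SSIM-style similarity computed on the shared approximation band."""
    return float(np.mean((2 * a_ref * a_dis + c) / (a_ref**2 + a_dis**2 + c)))

def detail_similarity(d_ref, d_dis, c=1e-3):
    """Structure-style similarity pooled over the shared detail bands."""
    num = sum(np.sum(2 * br * bd) for br, bd in zip(d_ref, d_dis)) + c
    den = sum(np.sum(br**2 + bd**2) for br, bd in zip(d_ref, d_dis)) + c
    return float(num / den)

def funque_features(ref, dis):
    a_r, d_r = haar_dwt(ref)   # one transform per frame ...
    a_d, d_d = haar_dwt(dis)   # ... feeds every atom feature
    return [luma_similarity(a_r, a_d), detail_similarity(d_r, d_d)]

# Fusion stage: regress subjective scores onto the shared-domain features.
# X = np.array([funque_features(r, d) for r, d in training_pairs])
# fusion = LinearRegression().fit(X, mos_scores)
```

Because every atom feature reads from the same wavelet coefficients, the transform cost is paid once per frame rather than once per model, which is the efficiency claim made above.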
Many algorithms have been developed to evaluate the perceptual quality of images and videos, based on models of picture statistics and visual perception. These algorithms attempt to capture user experience better than simple metrics like the peak signal-to-noise ratio (PSNR), and they are widely used on streaming service platforms and in social networking applications to improve users’ Quality of Experience. The growing demand for high-resolution streams and the rapid increase in user-generated content (UGC) have sharpened interest in the computation involved in carrying out perceptual quality measurements. In this direction, we propose a suite of methods to efficiently predict the structural similarity index (SSIM) of high-resolution videos distorted by scaling and compression, using computations performed at lower resolutions. We demonstrate the effectiveness of our algorithms by testing on a large corpus of videos and on subjective data.
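One plausible reading of this pipeline is sketched below: SSIM measured on downscaled frames serves as a cheap feature from which full-resolution SSIM is regressed. The paper's actual predictors may use different features; lowres_ssim_feature and the linear mapping here are illustrative assumptions.

```python
import cv2  # pip install opencv-python
import numpy as np
from skimage.metrics import structural_similarity as ssim
from sklearn.linear_model import LinearRegression

def lowres_ssim_feature(ref, dis, scale=0.25):
    """SSIM computed on downscaled grayscale frames: a cheap proxy for the
    expensive full-resolution measurement."""
    shrink = lambda im: cv2.resize(im, None, fx=scale, fy=scale,
                                   interpolation=cv2.INTER_AREA)
    return ssim(shrink(ref), shrink(dis), data_range=255)

# Training (hypothetical): learn a map from the low-resolution proxy to the
# true high-resolution SSIM over a corpus of (reference, distorted) frames.
# X = np.array([[lowres_ssim_feature(r, d)] for r, d in train_pairs])
# y = np.array([ssim(r, d, data_range=255) for r, d in train_pairs])
# predictor = LinearRegression().fit(X, y)   # then predictor.predict(...)
```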
The Structural Similarity (SSIM) Index is a very widely used image/video quality model that continues to play an important role in the perceptual evaluation of compression algorithms, encoding recipes, and numerous other image/video processing algorithms. Several public implementations of the SSIM and Multiscale-SSIM (MS-SSIM) algorithms have been developed, which differ in efficiency and performance and hence can report different scores on identical content. This “bendable ruler” makes the quality assessment of encoding algorithms unreliable. To address this situation, we studied and compared the functions and performances of popular and widely used SSIM implementations, and we also considered a variety of design choices. Based on our studies and experiments, we have arrived at a collection of recommendations on how to use SSIM most effectively, including ways to reduce its computational burden.
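The abstract does not enumerate the recommendations themselves, but one design choice on which public implementations famously diverge is the viewing-distance-motivated downsampling step of the original MATLAB SSIM release. A minimal sketch of that preprocessing, with illustrative function names:

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def box_downsample(img, f):
    """f x f box filter followed by f-fold subsampling, mirroring the
    preprocessing in the original MATLAB SSIM release."""
    h, w = img.shape
    img = img[: h - h % f, : w - w % f].astype(np.float64)
    return img.reshape(h // f, f, w // f, f).mean(axis=(1, 3))

def ssim_with_scaling(ref, dis):
    # Shrink so the smaller dimension is roughly 256 pixels; this ties the
    # measurement to an assumed viewing setup and cuts computation.
    f = max(1, round(min(ref.shape) / 256))
    if f > 1:
        ref, dis = box_downsample(ref, f), box_downsample(dis, f)
    return ssim(ref, dis, data_range=255)
```

Implementations that skip or alter this step report different scores on the same content, which is one way the “bendable ruler” arises.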
Fourier Ptychography (FP) is a computational imaging technique that artificially increases the effective numerical aperture of an imaging system. In FP, the object is imaged under an array of Light Emitting Diodes (LEDs), each of which illuminates it from a different angle, producing a stack of low-resolution images. A high-resolution image is synthesized from this low-resolution stack, typically using iterative phase retrieval algorithms, since a phase retrieval problem lies at the crux of FP. However, such algorithms are time consuming and fail when the overlap between the spectra of the captured images is low, leading to high data requirements. In this paper, we propose a Deep Learning (DL) algorithm that performs this synthesis under low spectral overlap between samples, and we show a significant improvement in phase reconstruction over existing DL algorithms.
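For context, here is a minimal numpy sketch of the FP forward model that any reconstruction, whether iterative or learned, must invert; the rectangular pupil crop and the function and variable names are illustrative assumptions.

```python
import numpy as np

def fp_capture(obj, pupil, shift):
    """Simulate one FP measurement: tilting the illumination shifts which
    part of the object's spectrum passes through the (small) pupil, and the
    camera records only the intensity of the resulting low-resolution field.

    obj   : complex high-resolution object (H x W)
    pupil : low-pass mask of the sensor's size (h x w), h <= H, w <= W
    shift : (row, col) spectral offset induced by the LED angle
    """
    spectrum = np.fft.fftshift(np.fft.fft2(obj))
    H, W = spectrum.shape
    h, w = pupil.shape
    r0 = H // 2 - h // 2 + shift[0]
    c0 = W // 2 - w // 2 + shift[1]
    patch = spectrum[r0:r0 + h, c0:c0 + w] * pupil
    return np.abs(np.fft.ifft2(np.fft.ifftshift(patch))) ** 2
```

Widely spaced LED angles yield spectral patches with little mutual overlap, which is exactly the low-overlap regime where classical iterative retrieval struggles and the proposed DL synthesis is aimed.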
In the real world, agents often have to operate with incomplete information, limited sensing capabilities, and inherently stochastic environments, making individual observations incomplete and unreliable. Moreover, in many situations it is preferable to delay a decision rather than run the risk of making a bad one. In such situations it is necessary to aggregate information before taking an action; however, most state-of-the-art reinforcement learning (RL) algorithms are biased towards taking actions at every time step, even if the agent is not particularly confident in its chosen action. This lack of caution can lead the agent to make critical mistakes, regardless of prior experience and acclimation to the environment. Motivated by theories of dynamic resolution of uncertainty during decision making in biological brains, we propose a simple accumulator module that accumulates evidence in favor of each possible decision, encodes uncertainty as a dynamic competition between actions, and acts on the environment only when it is sufficiently confident in the chosen action. The agent makes no decision by default, and the burden of proof falls on the policy to accrue evidence strongly in favor of a single decision. Our results show that this accumulator module achieves near-optimal performance on a simple guessing game, far outperforming deep recurrent networks that use traditional, forced action selection policies.
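A minimal sketch of such an accumulator, with illustrative names and parameters; the paper's module sits on top of a learned RL policy, whereas the interface here is bare.

```python
import numpy as np

class AccumulatorModule:
    """Accumulates evidence for each action and commits only once one
    action's accumulated evidence crosses a confidence threshold; otherwise
    it abstains (the 'no decision by default' behavior)."""

    def __init__(self, n_actions, threshold=5.0, decay=0.9):
        self.evidence = np.zeros(n_actions)
        self.threshold = threshold
        self.decay = decay  # leaky integration: stale evidence fades

    def step(self, action_logits):
        # Dynamic competition: each action's evidence grows with its own
        # support and is suppressed by the support for competing actions.
        support = action_logits - action_logits.mean()
        self.evidence = self.decay * self.evidence + support
        winner = int(np.argmax(self.evidence))
        if self.evidence[winner] >= self.threshold:
            self.evidence[:] = 0.0  # reset after committing to an action
            return winner
        return None  # not confident enough: make no decision this step
```

Each environment step feeds the policy's action preferences into step(); a return of None means the agent abstains and keeps gathering evidence.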
We present a transfer learning framework for no-reference image quality assessment (NRIQA) of tone-mapped High Dynamic Range (HDR) images. This work is motivated by the observation that quality assessment databases in general, and HDR image databases in particular, are “small” relative to the typical requirements for training deep neural networks. Transfer learning approaches have been successful in such scenarios, where learning from a related but larger database is transferred to the smaller database. Specifically, we propose a framework in which the successful AlexNet model is used to extract image features. This is followed by the application of Principal Component Analysis (PCA) to reduce the dimensionality of the feature vector (from 4096 to 400), given the small database size. A linear regression model is then fit to Mean Opinion Scores (MOS) using L2 regularization to prevent overfitting. We demonstrate state-of-the-art performance of the proposed approach on the ESPL-LIVE database.
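A minimal PyTorch/scikit-learn sketch of the described pipeline; the choice of AlexNet's second fully connected (4096-D) layer as the feature source and the ridge regularization strength are assumptions not pinned down by the abstract.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge

# Feature extractor: ImageNet-pretrained AlexNet up to (but excluding) the
# final 1000-way classifier, leaving a 4096-D output per image.
alexnet = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1).eval()
extractor = torch.nn.Sequential(
    alexnet.features, alexnet.avgpool, torch.nn.Flatten(),
    *list(alexnet.classifier.children())[:-1])

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

def features(pil_images):
    """4096-D AlexNet features for a list of PIL images."""
    batch = torch.stack([preprocess(im) for im in pil_images])
    with torch.no_grad():
        return extractor(batch).numpy()

# X_train: features of the training images; y_train: their MOS values.
# pca = PCA(n_components=400).fit(X_train)          # 4096 -> 400
# model = Ridge(alpha=1.0).fit(pca.transform(X_train), y_train)  # L2-regularized
# predicted_mos = model.predict(pca.transform(features(test_images)))
```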