Audiovisual Zooming: What You See Is What You Hear

Arun Asokan Nair, Austin Reiter, Changxi Zheng, Shree Nayar
Event: ACM Multimedia 2019
Research Areas: Computational Imaging

Best Paper Award at ACM Multimedia 2019

Abstract: When capturing videos on a mobile platform, often the target of interest is contaminated by the surrounding environment. To exclude visually irrelevant content, camera panning and zooming provide the means to isolate a desired field of view (FOV). However, the captured audio is still contaminated by signals outside the FOV. This effect is unnatural—for human perception, visual and auditory cues must go hand-in-hand. We present the concept of Audiovisual Zooming, whereby an auditory FOV is formed to match the visual one. Our framework is built around the classic idea of beamforming, a computational approach to enhancing sound from a single direction using a microphone array. Yet, beamforming on its own cannot accommodate the auditory FOV, as the FOV may include an arbitrary number of directional sources. We formulate audiovisual zooming as a generalized eigenvalue problem and propose an algorithm for efficient computation on mobile platforms. To inform the algorithmic and physical implementation, we offer a theoretical analysis of our algorithmic components as well as numerical studies for understanding various design choices of microphone arrays. Finally, we demonstrate audiovisual zooming on two different mobile platforms: a smartphone and a 360° spherical imaging system for video conferencing settings.
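The abstract mentions casting the beamforming objective as a generalized eigenvalue problem. The sketch below illustrates this general idea only (it is not the paper's actual algorithm): a max-SINR-style beamformer that maximizes the energy ratio between sound from inside the FOV and sound from outside it, where the optimal weight vector is the principal generalized eigenvector of the two spatial covariance matrices. The matrices here are random placeholders standing in for covariances estimated from a real microphone array.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
M = 8  # hypothetical number of microphones in the array

# Placeholder Hermitian positive-definite covariance matrices at one
# frequency bin (in practice, estimated from multichannel recordings):
#   R_in  - covariance of sound arriving from inside the desired FOV
#   R_out - covariance of sound and noise from outside the FOV
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R_in = A @ A.conj().T + np.eye(M)
R_out = B @ B.conj().T + np.eye(M)

# The weight vector w maximizing the Rayleigh quotient
#   (w^H R_in w) / (w^H R_out w)
# is the generalized eigenvector of (R_in, R_out) with the largest
# generalized eigenvalue. scipy's eigh returns eigenvalues ascending.
eigvals, eigvecs = eigh(R_in, R_out)
w = eigvecs[:, -1]

# Applying w^H to the multichannel spectrum x yields the enhanced
# single-channel output for this frequency bin.
x = rng.standard_normal(M) + 1j * rng.standard_normal(M)
enhanced = np.vdot(w, x)
```

In a full pipeline this solve would run per frequency bin of a short-time Fourier transform, with the enhanced bins inverse-transformed back to a waveform.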

CCS Concepts: Information systems → Multimedia content creation

Keywords: audiovisual zooming, beamforming, audio enhancement