Award Abstract # 1647559
SBIR Phase I: Low-latency polyphonic coding for interactive immersive applications

NSF Org: TI (Translational Impacts)
Recipient: INFOCODING LABS, INC.
Initial Amendment Date: December 14, 2016
Latest Amendment Date: December 14, 2016
Award Number: 1647559
Award Instrument: Standard Grant
Program Manager: Peter Atherton
  patherto@nsf.gov
  (703)292-8772
  TI, Translational Impacts
  TIP, Directorate for Technology, Innovation, and Partnerships
Start Date: December 15, 2016
End Date: April 30, 2018 (Estimated)
Total Intended Award Amount: $225,000.00
Total Awarded Amount to Date: $225,000.00
Funds Obligated to Date: FY 2017 = $225,000.00
History of Investigator:
  • Tejaswi Nanjundaswamy (Principal Investigator)
    tejaswi@infocodinglabs.com
Recipient Sponsored Research Office: INFOCODING LABS INC
12650 OJAI SANTA PAULA RD
OJAI
CA  US  93023-8327
(805)845-3997
Sponsor Congressional District: 26
Primary Place of Performance: INFOCODING LABS INC
12650 Ojai Santa Paula Rd
Ojai
CA  US  93023-8327
Primary Place of Performance Congressional District: 26
Unique Entity Identifier (UEI): R3Q8NHLRPMQ6
Parent UEI:
NSF Program(s): SBIR Phase I
Primary Program Source: 01001718DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 5371, 8032
Program Element Code(s): 537100
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.084

ABSTRACT

The broader impact/commercial potential of this Small Business Innovation Research (SBIR) Phase I project, and of its underlying technological innovation, lies in enabling a wide variety of applications involving interactive and immersive media, which are central to several sectors poised for substantial growth in the coming years. Specifically, the low-latency audio technology developed in the project is a critical enabler for future products of significant value to several sectors of the technology industry, most importantly fully immersive interactive media products for augmented reality games and related applications. Additional impact is expected in supporting musical collaboration over the internet and enabling remote music education, both with clear cultural and educational implications. Another significant impact is enabling a truly realistic teleconferencing experience, with considerable implications for both business and social networks; the latter can provide a realistic, fully interactive alternative to gatherings of groups and families without recourse to costly travel.

This Small Business Innovation Research (SBIR) Phase I project develops a novel paradigm for coding and networking polyphonic audio content at low latency via efficient prediction, which is critical to numerous applications in the emerging field of interactive, immersive, hyper-realistic multimedia. Polyphonic audio, i.e., a mixture of multiple periodic components plus noise, has long resisted effective prediction. State-of-the-art coders have therefore been forced either to employ long transforms, which incur substantial delay and are incompatible with applications requiring low latency, low complexity and low bitrate, or to accept significantly degraded performance. This project develops technologies that approach optimal performance under constraints on latency, complexity and bitrate by effectively exploiting the temporal redundancies in all periodic components of polyphonic audio signals. Specifically, the coding paradigm builds on the novel technique of cascaded long term prediction, which jointly predicts all periodic components in the mixture at low delay. This prediction approach is complemented by powerful low-complexity parameter estimation techniques to minimize resource requirements, effective adaptation to fundamental frequency changes, side information optimization to minimize bitrate costs, a practical redesign of all coder modules to fully exploit the prediction capabilities, and enhanced error resilience for streaming over lossy packet networks.
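To make the cascade idea concrete, here is a minimal Python/NumPy sketch of cascaded long term prediction under strong simplifying assumptions: two periodic components with known integer periods (40 and 57 samples), oracle gains of 1.0, and no quantization. The function names and the test signal are hypothetical, and the parameter estimation that the project's coder must perform is skipped entirely. The sketch compares a single LTP stage, which can track only one period, against a cascade with one stage per periodic component.

```python
import numpy as np

def ltp_stage(x, lag, gain):
    """One long-term prediction (LTP) stage: e[n] = x[n] - gain * x[n - lag].

    Samples before the start of the block are taken as zero (causal filter).
    """
    e = x.copy()
    e[lag:] -= gain * x[:-lag]
    return e

def cascaded_ltp_residual(x, params):
    """Run one LTP stage per periodic component, in cascade.

    The overall filter is prod_i (1 - g_i z^{-T_i}); each stage removes the
    component with period T_i. Because the stages are linear and
    time-invariant, their order does not change the result.
    """
    e = np.asarray(x, dtype=float)
    for lag, gain in params:
        e = ltp_stage(e, lag, gain)
    return e

# Hypothetical polyphonic test signal: two periodic "notes" plus weak noise.
rng = np.random.default_rng(0)
n = np.arange(4000)
note_a = np.sin(2 * np.pi * n / 40) + 0.5 * np.sin(4 * np.pi * n / 40)  # period 40
note_b = 0.8 * np.sin(2 * np.pi * n / 57 + 0.3)                          # period 57
x = note_a + note_b + 0.01 * rng.standard_normal(n.size)

skip = 40 + 57  # ignore the start-up transient of the cascade
single = cascaded_ltp_residual(x, [(40, 1.0)])              # one LTP stage only
cascade = cascaded_ltp_residual(x, [(40, 1.0), (57, 1.0)])  # one stage per note

def energy(sig):
    """Mean power after the start-up transient."""
    return float(np.mean(sig[skip:] ** 2))

print(f"signal energy:    {energy(x):.4f}")
print(f"1-stage residual: {energy(single):.4f}")
print(f"cascade residual: {energy(cascade):.6f}")
```

Because each stage is a linear filter, the second stage still removes the period-57 note even though it only sees the output of the first stage; what remains is essentially filtered noise.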

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The project focused on coding and networking polyphonic audio content at low latency via efficient prediction, which is critical to numerous applications in the emerging fields of interactive hyper-realistic multimedia, virtual and augmented reality, multimedia content delivery and next-generation wireless headphones. The project set out to satisfy the conflicting central objectives of low latency, low complexity and low bitrate by effectively exploiting the redundancies implicit in polyphonic audio. The core technology leverages a novel prediction paradigm called cascaded long term prediction (CLTP), which jointly predicts all periodic components of the audio signal from the immediately preceding segment of samples, and hence at low delay. The project achieved significant practical enhancements to the CLTP paradigm, aimed at enabling successful commercialization, through the following main technical outcomes:
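The low-delay property can be illustrated from the decoder side of a CLTP-style cascade. In this minimal sketch the names and the (lag, gain) values are hypothetical, and the residual is passed losslessly; in an actual codec the residual would be quantized and the predictor would run on the decoded signal, which is omitted here. Each synthesis stage uses only samples that are already reconstructed, so decoding a prefix of the residual yields the same prefix of the signal.

```python
import numpy as np

def cltp_analysis(x, params):
    """Encoder side: residual of the cascade prod_i (1 - g_i z^{-T_i})."""
    e = np.asarray(x, dtype=float).copy()
    for lag, gain in params:
        prev = e.copy()
        e[lag:] -= gain * prev[:-lag]
    return e

def cltp_synthesis(e, params):
    """Decoder side: undo the stages in reverse order.

    Each inverse stage y[n] = u[n] + g * y[n - T] depends only on the current
    input and earlier outputs, so the cascade could equally run sample by
    sample as residual samples arrive: the predictor adds no look-ahead.
    """
    y = np.asarray(e, dtype=float).copy()
    for lag, gain in reversed(params):
        for k in range(lag, len(y)):
            y[k] += gain * y[k - lag]
    return y

rng = np.random.default_rng(1)
x = rng.standard_normal(1000)
params = [(40, 0.95), (57, 0.9)]  # hypothetical (lag, gain) per stage

e = cltp_analysis(x, params)
x_hat = cltp_synthesis(e, params)
print("max reconstruction error:", float(np.max(np.abs(x - x_hat))))

# Decoding only the first 500 residual samples reproduces the first 500
# output samples: no future residual samples are needed.
x_first = cltp_synthesis(e[:500], params)
print("prefix matches:", np.allclose(x_first, x_hat[:500]))
```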

(i) A critical obstacle to practical deployment was excessive encoder and decoder complexity: for the target wireless-headphone application, CLTP increased complexity by a factor of nearly 4000. Conventional code optimization could not have delivered such a drastic reduction, so unconventional algorithmic methods tailored to polyphonic audio coding were developed. Encoder complexity was reduced by a factor of 80 via a low-complexity methodology that circumvents the extensive computations of the parameter estimation module. The more critical decoder complexity was reduced even more dramatically, by a factor of nearly 1100, through a forward adaptive prediction approach, in which the encoder sends the decoder the prediction parameters as side information, completely eliminating the decoder's main computational burden of parameter estimation. Further complexity reduction is expected once the code is optimized for a target embedded platform.
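The forward adaptive division of work in (i) can be sketched as follows: the encoder does all the searching and sends each stage's lag plus a quantized gain index as side information, while the decoder only runs synthesis filters. The greedy exhaustive least-squares search, the lag range of 20 to 120 samples, the 1/16 gain step and all function names are illustrative assumptions; this is not the project's low-complexity estimation method, and the residual is left unquantized for brevity.

```python
import numpy as np

GAIN_STEP = 1.0 / 16  # hypothetical gain quantizer step size

def ltp_filter(x, lag, gain):
    """Causal LTP analysis stage: e[n] = x[n] - gain * x[n - lag], zero history."""
    e = x.copy()
    e[lag:] -= gain * x[:-lag]
    return e

def search_stage(sig, start, stop, min_lag, max_lag):
    """Encoder-only exhaustive search: for every candidate lag, fit the
    least-squares gain on the frame sig[start:stop] and keep the lag with the
    smallest prediction error. Requires start >= max_lag."""
    target = sig[start:stop]
    energy = float(np.dot(target, target))
    best_lag, best_gain, best_err = min_lag, 0.0, energy
    for lag in range(min_lag, max_lag + 1):
        past = sig[start - lag:stop - lag]
        denom = float(np.dot(past, past))
        if denom == 0.0:
            continue
        corr = float(np.dot(target, past))
        err = energy - corr * corr / denom
        if err < best_err:
            best_lag, best_gain, best_err = lag, corr / denom, err
    return best_lag, best_gain

def encode_frame(x, start, stop, n_stages=2, min_lag=20, max_lag=120):
    """Greedy cascade estimation. Returns (side_info, residual); the residual is
    computed with the quantized gains so the decoder can mirror it exactly."""
    side_info = []
    e = np.asarray(x, dtype=float).copy()
    for _ in range(n_stages):
        lag, gain = search_stage(e, start, stop, min_lag, max_lag)
        q = int(round(gain / GAIN_STEP))  # integer gain index sent as side info
        side_info.append((lag, q))
        e = ltp_filter(e, lag, q * GAIN_STEP)
    return side_info, e

def decode(residual, side_info):
    """Decoder: no parameter search at all, just the synthesis filters
    y[n] = u[n] + g * y[n - T], applied in reverse stage order."""
    y = np.asarray(residual, dtype=float).copy()
    for lag, q in reversed(side_info):
        gain = q * GAIN_STEP
        for k in range(lag, len(y)):
            y[k] += gain * y[k - lag]
    return y

# Hypothetical two-note test signal; the frame is the last 320 of 600 samples.
rng = np.random.default_rng(2)
n = np.arange(600)
x = (np.sin(2 * np.pi * n / 40) + 0.5 * np.sin(4 * np.pi * n / 40)
     + 0.8 * np.sin(2 * np.pi * n / 57 + 0.3) + 0.01 * rng.standard_normal(n.size))

side_info, residual = encode_frame(x, start=280, stop=600)
x_hat = decode(residual, side_info)
print("side info (lag, gain index):", side_info)
print("max reconstruction error:", float(np.max(np.abs(x - x_hat))))
```

The asymmetry is the point of the design: the nested lag loop in `search_stage` runs only at the encoder, while `decode` costs one multiply-add per sample per stage.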

(ii) To maximize coding efficacy with forward adaptive prediction, novel side information encoding approaches were developed that explicitly account for inter-frame parameter dependencies. This was achieved by matching the filters of consecutive frames, predicting each frame's parameters from those of the previous frame, and transmitting only parameter corrections to the decoder. The side information rate was further reduced by redesigning the entropy coding module and adjusting parameter estimation to optimize the overall rate-quality tradeoff. Further approaches were developed to handle rapid variations in prediction parameters due to non-stationary statistics. These advances open the door to commercialization of a new generation of low-delay audio coding technology that addresses major bottlenecks faced by several multimedia and content delivery sectors. As a preliminary indication of the broader impacts, a major Bluetooth chip manufacturer for wireless headphones and a leading satellite content delivery service provider are currently in discussions with the company regarding integration of the technology into their next-generation product lines.
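The principle behind (ii) can be illustrated for the lag parameters alone: match each current filter to a previous-frame filter, send the match plus a small lag correction, and compare the bit cost with coding each lag independently. The brute-force matcher, the order-0 signed Exp-Golomb code (a generic stand-in for the project's redesigned entropy coder), the lag range and the example lags are all assumptions made for this sketch.

```python
import math
from itertools import permutations

MIN_LAG, MAX_LAG = 20, 120  # hypothetical lag range for fixed-length coding

def exp_golomb_bits(v):
    """Length in bits of the order-0 signed Exp-Golomb codeword for integer v
    (mapping 0, 1, -1, 2, -2, ... to code numbers 0, 1, 2, 3, 4, ...)."""
    u = 2 * v - 1 if v > 0 else -2 * v
    return 2 * (u + 1).bit_length() - 1

def match_to_previous(prev_lags, cur_lags):
    """Pair each current filter with a distinct previous filter so that the
    total absolute lag change is minimal (brute force; assumes equal counts)."""
    best = min(permutations(range(len(prev_lags))),
               key=lambda p: sum(abs(c - prev_lags[i]) for c, i in zip(cur_lags, p)))
    return [(i, c - prev_lags[i]) for c, i in zip(cur_lags, best)]

def decode_lags(prev_lags, pairs):
    """Decoder: rebuild current lags from previous lags plus corrections."""
    return [prev_lags[i] + d for i, d in pairs]

def side_info_bits(pairs):
    """Bits for the matching (a uniformly coded permutation index) plus the
    Exp-Golomb-coded lag corrections."""
    perm_bits = (math.factorial(len(pairs)) - 1).bit_length()
    return perm_bits + sum(exp_golomb_bits(d) for _, d in pairs)

prev = [40, 57]  # lags sent for the previous frame
cur = [57, 41]   # lags estimated for the current frame (stage order swapped)
pairs = match_to_previous(prev, cur)
delta_bits = side_info_bits(pairs)
fixed_bits = len(cur) * (MAX_LAG - MIN_LAG).bit_length()
print(pairs, delta_bits, fixed_bits)  # [(1, 0), (0, 1)] 5 14
```

Here the current frame's filters appear in swapped order, so matching before differencing is what keeps the corrections small; naive position-by-position differencing would send corrections of 17 and -16.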

 


Last Modified: 06/29/2018
Modified by: Tejaswi Nanjundaswamy

