
NSF Org: | TI Translational Impacts |
Recipient: |
|
Initial Amendment Date: | December 14, 2016 |
Latest Amendment Date: | December 14, 2016 |
Award Number: | 1647559 |
Award Instrument: | Standard Grant |
Program Manager: | Peter Atherton, patherto@nsf.gov, (703)292-8772, TI Translational Impacts, TIP Directorate for Technology, Innovation, and Partnerships |
Start Date: | December 15, 2016 |
End Date: | April 30, 2018 (Estimated) |
Total Intended Award Amount: | $225,000.00 |
Total Awarded Amount to Date: | $225,000.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: | 12650 OJAI SANTA PAULA RD OJAI CA US 93023-8327 (805)845-3997 |
Sponsor Congressional District: |
|
Primary Place of Performance: | 12650 Ojai Santa Paula Rd Ojai CA US 93023-8327 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | SBIR Phase I |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.084 |
ABSTRACT
The broader impact/commercial potential of this Small Business Innovation Research (SBIR) Phase I project, and of its underlying technological innovation, is in enabling a wide variety of applications involving interactive and immersive media, which are central to several sectors poised to grow substantially in the coming years. Specifically, the low-latency audio technology developed in the project is a critical enabler of future products of significant value to several sectors of the technology industry, most importantly fully immersive interactive media products for augmented reality games and related applications. Additional impact is expected in advancing and supporting musical collaboration over the internet and in enabling remote music education, both with clear cultural and educational implications. Another significant impact is in enabling a truly realistic teleconferencing experience, with considerable implications for both business and social networks; for the latter, it provides a realistic alternative to fully interactive social gatherings of groups and families, without recourse to costly travel.
This Small Business Innovation Research (SBIR) Phase I project develops a novel paradigm for coding and networking of polyphonic audio content at low latency via efficient prediction, which is critical to numerous applications in the emerging field of interactive immersive hyper-realistic multimedia. Polyphonic audio, a mixture of multiple periodic components plus noise, has long resisted effective prediction. State-of-the-art coders are therefore forced either to employ a long transform, which incurs substantial delay and is incompatible with applications requiring low latency, low complexity and low bitrate, or to accept significantly degraded performance. This project develops technologies that approach optimal performance despite constraints on latency, complexity and bitrate, by effectively exploiting temporal redundancies in all periodic components of polyphonic audio signals. Specifically, the coding paradigm builds on the novel technique of cascaded long term prediction, which enables joint prediction of all periodic components in the mixture at low delay. This prediction approach is complemented by powerful low-complexity parameter estimation techniques to minimize resource requirements, effective adaptation to fundamental frequency changes, side information optimization to minimize bitrate costs, practical redesign of all coder modules to fully exploit the prediction capabilities, and enhanced error resilience for streaming over lossy packet networks.
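The idea of cascading long term predictors can be illustrated with a small sketch. It is not the project's implementation: it assumes single-tap stages of the form e[n] = x[n] - gain * x[n - lag], with lags and gains known in advance, and a noise-free mixture of two integer-period components. All function names and signal values here are hypothetical.

```python
def ltp_stage(signal, lag, gain):
    """One long term prediction error filter: e[n] = x[n] - gain * x[n - lag]."""
    return [x - (gain * signal[n - lag] if n >= lag else 0.0)
            for n, x in enumerate(signal)]


def cascaded_ltp_residual(signal, stages):
    """Pass the signal through one prediction error filter per periodic component."""
    residual = list(signal)
    for lag, gain in stages:
        residual = ltp_stage(residual, lag, gain)
    return residual


def periodic(pattern, length):
    """Repeat a pattern to build a strictly periodic component."""
    return [pattern[n % len(pattern)] for n in range(length)]


# Hypothetical polyphonic mixture: two components with periods 5 and 7.
a = periodic([3, -1, 4, 1, -5], 40)
b = periodic([2, 7, -1, 8, -2, 8, -1], 40)
mix = [x + y for x, y in zip(a, b)]

# A single predictor removes only one component; the cascade removes both.
single = cascaded_ltp_residual(mix, [(5, 1.0)])
cascade = cascaded_ltp_residual(mix, [(5, 1.0), (7, 1.0)])

print("max |residual|, single stage:", max(abs(v) for v in single[12:]))
print("max |residual|, cascade:     ", max(abs(v) for v in cascade[12:]))
```

In this toy setting each stage needs only the previous `lag` samples, so the cascade settles after 5 + 7 = 12 samples, which is the sense in which prediction from the immediately preceding samples keeps delay low. A real coder would also have to estimate the lags and gains and cope with noise and non-integer periods.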
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
The project focus was on coding and networking of polyphonic audio content at low latency via efficient prediction, which is critical to numerous applications in the emerging fields of interactive hyper-realistic multimedia, virtual and augmented reality, multimedia content delivery and next-generation wireless headphones. The project set out to satisfy the conflicting central objectives of low latency, low complexity and low bitrate by effectively exploiting the redundancies implicit in polyphonic audio. The core technology leverages a novel prediction paradigm called cascaded long term prediction (CLTP), which enables joint prediction of all periodic components of the audio signal from the immediately preceding segment of samples, and hence at low delay. The project achieved significant practical enhancements of the CLTP paradigm, to enable successful commercialization, through the following main technical outcomes:
(i) A critical technical obstacle on the way to practical deployment was excessive encoder and decoder complexity, which CLTP increased by a factor of nearly 4000 for the wireless headphones target application. Clearly, run-of-the-mill code optimization could not have delivered the drastic complexity reduction needed, and hence creative, unconventional algorithmic methods were developed, tailored to the polyphonic audio coding scenario. The encoder complexity was reduced by a factor of 80 via a low-complexity methodology that circumvents the extensive computations of the parameter estimation module. The (more critical) decoder complexity was reduced even more dramatically, by a factor of nearly 1100, through the development of a forward adaptive prediction approach, wherein the encoder provides the decoder with the prediction parameters as side information, thereby completely eliminating the decoder's main computational burden of parameter estimation. Further complexity reduction is expected when code optimization is ultimately performed for a target embedded platform.
(ii) To maximize the coding efficacy with forward adaptive prediction, novel side information encoding approaches were developed, which explicitly account for inter-frame parameter dependencies. This was achieved by matching the various filters from consecutive frames, predicting parameters and ultimately transmitting only parameter corrections to the decoder. The side information rate was further reduced by redesigning the entropy coding module and adjusting parameter estimation to optimize the overall rate-quality tradeoff. Further approaches were developed to handle rapid variations in prediction parameters due to non-stationary statistics.
The above advances open the door for commercialization of a new generation of low-delay audio coding technology that offers solutions to major bottlenecks faced by several multimedia and content delivery sectors. As a preliminary indication of the broader impacts, it is noteworthy that a major manufacturer of Bluetooth chips for wireless headphones, and a leading satellite content delivery service provider, are in current discussions with the company regarding integration of the technology in their next-generation line of products.
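The forward adaptive scheme of items (i) and (ii) can be sketched in a few lines. This is a toy under stated assumptions, not the project's coder: it uses a single one-tap predictor rather than a cascade, sends the residual losslessly, and lets plain delta coding of the lag and gain index stand in for the filter matching and parameter prediction described above. `GAIN_STEP` and all function names are hypothetical.

```python
GAIN_STEP = 0.125  # hypothetical quantizer step for the predictor gain


def estimate_params(signal, start, stop, min_lag, max_lag):
    """Encoder-only search: lag and quantized gain index with least residual energy."""
    frame = signal[start:stop]
    best_lag, best_index, best_energy = min_lag, 0, float("inf")
    for lag in range(min_lag, max_lag + 1):
        past = [signal[n - lag] if n >= lag else 0.0 for n in range(start, stop)]
        past_energy = sum(p * p for p in past)
        gain = sum(x * p for x, p in zip(frame, past)) / past_energy if past_energy else 0.0
        index = round(gain / GAIN_STEP)
        energy = sum((x - index * GAIN_STEP * p) ** 2 for x, p in zip(frame, past))
        if energy < best_energy:
            best_lag, best_index, best_energy = lag, index, energy
    return best_lag, best_index


def encode(signal, frame_length, min_lag, max_lag):
    """Per frame, emit (lag correction, gain index correction, residual)."""
    frames, prev_lag, prev_index = [], 0, 0
    for start in range(0, len(signal), frame_length):
        stop = min(start + frame_length, len(signal))
        lag, index = estimate_params(signal, start, stop, min_lag, max_lag)
        gain = index * GAIN_STEP
        residual = [signal[n] - (gain * signal[n - lag] if n >= lag else 0.0)
                    for n in range(start, stop)]
        # Side information: only corrections relative to the previous frame.
        frames.append((lag - prev_lag, index - prev_index, residual))
        prev_lag, prev_index = lag, index
    return frames


def decode(frames):
    """The decoder runs no search: it applies the transmitted parameters directly."""
    out, lag, index = [], 0, 0
    for lag_delta, index_delta, residual in frames:
        lag += lag_delta
        index += index_delta
        gain = index * GAIN_STEP
        for r in residual:
            n = len(out)
            out.append(r + (gain * out[n - lag] if n >= lag else 0.0))
    return out


# Hypothetical test signal: one component with period 9, four frames of 16 samples.
pattern = [4, -2, 7, 1, -6, 3, 0, 5, -3]
signal = [pattern[n % len(pattern)] for n in range(64)]
frames = encode(signal, frame_length=16, min_lag=4, max_lag=20)
decoded = decode(frames)
print([(lag_delta, index_delta) for lag_delta, index_delta, _ in frames])
```

The lag search loop appears only in `encode`, which mirrors why moving parameter estimation to the encoder relieves the decoder. For a stationary signal the per-frame corrections settle to zero once the predictor has locked on, which is what makes them cheap to entropy code.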
Last Modified: 06/29/2018
Modified by: Tejaswi Nanjundaswamy