Context awareness technology is not new. However, the contextual inference of existing services is usually static and based on single-purpose, signal-based data such as GPS coordinates, Wi-Fi connections, device proximity, and gyroscope output. Such services not only lack the ability to infer abstract and dynamic context, but also fail to capture fine-grained environmental data such as ambience, social interaction, and spontaneous events.
For instance, GPS-based systems stop working inside buildings; Wi-Fi-based systems fail when the network configuration changes; proximity-based systems require complementary detectors or inter-device communication; and gyroscope-based systems only infer gestures and individual movement.
To address the limitations of existing context recognition methods, we need a more adaptive, multi-purpose type of data for inferring more abstract, complex, and dynamic context. Image, speech, sound, and video are the most promising and widely available data types that satisfy this requirement. However, image and video data are constrained by line of sight: users need to point their devices in a specific direction to recognize the context. And since speech recognition captures only verbal content, it does not carry enough information about the environment. This leaves general sound as the most suitable data type for our purpose.
Zamplify is an integrated system consisting of five major components. The core of our project is an API, powered by a sound recognition model, that provides context awareness capability to connected applications. The other parts, including the Android app and the IoT device, are peripherals built around the core API. They serve as channels of audio input and demonstrate the use cases and potential of our API.
Sound Recognition Model
Our machine learning model takes processed audio as input and outputs the predicted context. The model processes data in two steps: a pretrained model first embeds the input audio into a vector representation of audio features; the classifier then transforms that representation into a predicted context. The pretrained model only accepts raw, uncompressed audio, so compressed audio such as MP3 must first be converted into a raw format before being fed into the model.
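The two-step flow described above can be sketched roughly as follows. This is only an illustration, not the project's actual model: `embed` is a hypothetical stand-in for the pretrained embedding model (it computes just two hand-made features, RMS energy and zero-crossing rate), `classify` is a toy nearest-centroid classifier, and the context labels and centroid values are invented for the example. The input is assumed to be 16-bit little-endian mono PCM.

```python
import math
import struct
from typing import List, Tuple


def pcm16_to_floats(raw: bytes) -> List[float]:
    """Decode 16-bit little-endian mono PCM bytes into floats in [-1, 1)."""
    count = len(raw) // 2  # ignore a trailing odd byte, if any
    samples = struct.unpack(f"<{count}h", raw[: count * 2])
    return [s / 32768.0 for s in samples]


def embed(samples: List[float]) -> Tuple[float, float]:
    """Hypothetical stand-in for the pretrained model.

    Returns a tiny 2-D "embedding": (RMS energy, zero-crossing rate).
    """
    if not samples:
        return (0.0, 0.0)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    zcr = crossings / max(len(samples) - 1, 1)
    return (rms, zcr)


# Invented centroids for two made-up context labels (illustration only).
CENTROIDS = {
    "quiet_room": (0.01, 0.05),
    "busy_street": (0.40, 0.30),
}


def classify(embedding: Tuple[float, float]) -> str:
    """Toy classifier: pick the label whose centroid is closest to the embedding."""
    return min(CENTROIDS, key=lambda label: math.dist(embedding, CENTROIDS[label]))


def predict_context(raw: bytes) -> str:
    """Step 1: embed the raw audio. Step 2: classify the embedding."""
    return classify(embed(pcm16_to_floats(raw)))


silence = b"\x00\x00" * 100
loud = struct.pack("<200h", *([16000, -16000] * 100))
print(predict_context(silence), predict_context(loud))
```

Keeping the embedding step and the classification step as separate functions mirrors the split in the text: the embedding model can stay fixed while only the classifier changes.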
The role of the Zamplify API is to handle recognition requests from client applications and to provide complementary application functionality, such as a user system, push notifications, and third-party integration. In fact, it is a collection of APIs that deal with different aspects of Zamplify. Its functionality includes, but is not limited to, the following.
Our Android app serves as a channel for audio data collection and, at the same time, demonstrates the use cases of our API. It records sound automatically every minute and sends it to the server for context recognition. It also allows users to define trigger-action pairs, similar to those in IFTTT.
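The trigger-action idea can be illustrated with a small sketch. It is written in Python for brevity rather than as Android code, and the names (`TriggerActionRules`, `add_rule`, `on_context`) and the example context labels are hypothetical; the app's real rule format is not described here.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Action = Callable[[str], None]


class TriggerActionRules:
    """Maps a recognized context label (the trigger) to user-defined actions."""

    def __init__(self) -> None:
        self._rules: DefaultDict[str, List[Action]] = defaultdict(list)

    def add_rule(self, trigger: str, action: Action) -> None:
        """Register an action to run whenever `trigger` is recognized."""
        self._rules[trigger].append(action)

    def on_context(self, context: str) -> int:
        """Run every action registered for `context`; return how many ran."""
        actions = self._rules.get(context, [])
        for action in actions:
            action(context)
        return len(actions)


# Example: two actions attached to a made-up "doorbell" context.
log: List[str] = []
rules = TriggerActionRules()
rules.add_rule("doorbell", lambda ctx: log.append(f"notify: {ctx} detected"))
rules.add_rule("doorbell", lambda ctx: log.append("turn on hallway light"))

fired = rules.on_context("doorbell")  # both actions run
ignored = rules.on_context("rain")    # no rule registered, nothing runs
```

In a setup like this, `on_context` would be called with each context label returned by the recognition server, so rules fire only when their trigger is actually recognized.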
The IoT device that we built is an Internet-connected, always-on sound recognizer that can be used in homes and offices. It demonstrates the power of sound recognition in the context of a smart home or smart office.
Our project results demonstrate that a sound-based context recognition system has considerable potential, either as a standalone service or as a complementary part of an integrated context awareness API. We look forward to seeing further applications of our system. A brief demo of our project is available below.