Vocal Joystick Engine Diagram

The Vocal Joystick (VJ) system consists of three main components: acoustic signal processing, pattern recognition and motion control. First, the signal processing module extracts short-term acoustic features such as energy, autocorrelation coefficients, linear prediction coefficients and mel-frequency cepstral coefficients (MFCC). Accurate estimation of these features requires signal conditioning and analysis techniques. Next, the features are passed to the pattern recognition module, which performs energy smoothing, pitch and formant tracking, vowel classification and discrete sound recognition. This stage relies on statistical learning techniques such as neural networks and dynamic Bayesian networks. Finally, the resulting energy, pitch, vowel quality and discrete sound estimates serve as acoustic parameters, which the motion control module transforms into direction, speed and other motion-related parameters. The application driver receives these motion control parameters and triggers the corresponding actions.
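As a rough illustration of the feature-extraction stage, the sketch below computes per-frame short-term energy, autocorrelation coefficients, and linear prediction coefficients (via the standard Levinson-Durbin recursion) using NumPy. It is not the VJ engine's code: the function names, frame length, hop size and LPC order are illustrative assumptions, windowing and other signal conditioning are left out, and MFCC computation is omitted.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_term_energy(frames):
    """Sum of squared samples in each frame."""
    return np.sum(frames ** 2, axis=1)

def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of a single frame."""
    n = len(frame)
    return np.array([np.dot(frame[: n - k], frame[k:]) for k in range(max_lag + 1)])

def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation r.

    Returns (a, err): a[0] = 1 and the prediction-error filter is
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order; err is the final
    prediction error energy.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # acc = r[i] + sum_{j=1}^{i-1} a[j] * r[i-j]
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

For example, `levinson_durbin(autocorrelation(frame, 10), 10)` would give 10th-order LPC coefficients for one frame; in practice each frame is usually windowed (e.g. with `np.hamming`) before the autocorrelation is taken.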
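The last step, turning acoustic parameters into motion parameters, can be pictured with a small hypothetical mapping. This sketch assumes that the recognized vowel class selects a direction and that frame energy above a floor scales speed linearly. The vowel labels, angles, `energy_floor` and `gain` values, and the `acoustic_to_motion` function are all illustrative inventions, not the VJ system's actual mapping, and pitch and discrete sounds are ignored.

```python
import math
from dataclasses import dataclass

# Hypothetical vowel-to-direction table (angles in degrees, 0 = right, 90 = up).
# The real VJ vowel set and its direction layout are not given here.
VOWEL_DIRECTIONS = {
    "a": 90.0,
    "e": 0.0,
    "i": 270.0,
    "o": 180.0,
}

@dataclass
class MotionCommand:
    dx: float  # horizontal velocity component
    dy: float  # vertical velocity component

def acoustic_to_motion(vowel, energy, energy_floor=0.01, gain=10.0):
    """Map a vowel class and frame energy to a 2-D velocity.

    The vowel picks the direction; energy above energy_floor scales the
    speed linearly. Unknown vowels or quiet frames produce no motion.
    """
    if vowel not in VOWEL_DIRECTIONS or energy <= energy_floor:
        return MotionCommand(0.0, 0.0)
    speed = gain * (energy - energy_floor)
    theta = math.radians(VOWEL_DIRECTIONS[vowel])
    return MotionCommand(speed * math.cos(theta), speed * math.sin(theta))
```

A linear energy-to-speed rule keeps the sketch simple; a real controller would more likely smooth the energy over time and use a nonlinear curve so that small and large movements are both easy to produce.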

VJ architecture flowchart
