Participation in Interspeech 2007 (Antwerp, August 27-31)
The first day was dedicated to tutorials. I attended one on voice transformation, which is directly related to my thesis. The presenter was Yannis Stylianou, from Crete. He introduced the basics of voice conversion and voice morphing, in the time domain as well as in the frequency domain. Future prospects for voice transformation were also discussed.
The next four days were devoted to plenary talks (in the morning), oral sessions and poster sessions. I have to admit first that I was impressed by the scale of the event: a full week devoted only to speech processing, with 7 sessions running in parallel… All the names we usually cite in our bibliographies were present.
Most of the time, I preferred to follow the poster sessions, because the oral sessions inherently required a strong background in the field concerned. Even if all these people work in the same domain (speech processing), it is impossible to know enough to understand everything.
I paid particular attention to the oral session on Multimodal Speech Recognition, because two presentations were very close to my master's thesis. This session and the discussions that ensued were very fruitful.
I also followed the sessions dealing with two topics of my PhD:
The Lombard effect and speech in noise
Speech Synthesis using Hidden Markov Models.
For the first subject, three presentations were quite interesting:
- « Lombard Speech Impact on Perceptual Speaker Recognition », Ikeno, Hansen, University of Texas. Hansen is known for having written a famous paper on the importance of the Lombard effect in speech recognition.
- « Two-Stage System for Robust Neutral/Lombard Speech Recognition », Boril, Fousek, Hoge, University of Prague. The authors belong to one of the few groups that have built a Lombard database.
- « Speech Synthesis enhancement in noisy environments », Bonardo, Zovato, Loquendo in Torino.
The authors use a dynamic range controller to improve speech intelligibility.
As for the second subject, two presentations caught my attention:
- « An HMM-based Speech Synthesis System Applied to German and Its Adaptation to a Limited Set of Expressive Football Announcements », Krstulovic, Hunecke, Schroeder.
The authors managed to reach a good voice quality even with little training data.
- « Implementation and Evaluation of an HMM-based Thai Speech Synthesis System », Chomphan, Kobayashi, Tokyo Institute of Technology.
I had the opportunity to talk at length with the author, which gave me many ideas for implementing such a system for French.
Participation in MMSP 2007 (International Workshop on Multimedia Signal Processing, Chania, Crete, October 1-3)
I was the first author of a paper dealing with feature selection (I wrote it at EPFL, Lausanne, Switzerland). Unfortunately, the date coincided with the beginning of my FNRS grant. One of my colleagues in Switzerland had the opportunity to go in my place to give a 20-minute oral presentation (and to enjoy the Greek beaches and sun). Here are the paper details:
Image & Video I
Tuesday, October 2, 10:13 - 10:26
RELEVANT FEATURE SELECTION FOR AUDIO-VISUAL SPEECH RECOGNITION
Thomas Drugman; Faculte Polytechnique de Mons
Mihai Gurban; Ecole Polytechnique Federale de Lausanne (EPFL)
Jean-Philippe Thiran; Ecole Polytechnique Federale de Lausanne (EPFL)
We present a feature selection method based on information theoretic measures, targeted at multimodal signal processing, showing how we can quantitatively assess the relevance of features from different modalities. We are able to find the features with the highest amount of information relevant for the recognition task, and at the same time having minimal redundancy. Our application is audio-visual speech recognition, and in particular selecting relevant visual features. Experimental results show that our method outperforms other feature selection algorithms from the literature by improving recognition accuracy even with a significantly reduced number of features.
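To make the idea of "maximal relevance, minimal redundancy" concrete, here is a minimal sketch of a greedy selection based on mutual information. It is only an illustration under stated assumptions, not the exact criterion of the paper: features are assumed to be discrete, the score (relevance minus mean redundancy) is one common choice among several, and the function names `mutual_information` and `select_features` are hypothetical.

```python
import numpy as np

def mutual_information(x, y):
    """Mutual information (in bits) between two discrete 1-D arrays."""
    x = np.asarray(x)
    y = np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        px = np.mean(x == xv)
        for yv in np.unique(y):
            py = np.mean(y == yv)
            pxy = np.mean((x == xv) & (y == yv))
            if pxy > 0:
                mi += pxy * np.log2(pxy / (px * py))
    return mi

def select_features(features, labels, k):
    """Greedily pick k columns of `features` (assumed criterion, see lead-in).

    Each step adds the column with the largest value of
    relevance(feature, labels) - mean redundancy(feature, already selected).
    """
    n_features = features.shape[1]
    relevance = [mutual_information(features[:, j], labels)
                 for j in range(n_features)]
    # Start from the single most relevant feature.
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = np.mean([mutual_information(features[:, j], features[:, s])
                                  for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```

With two identical columns and one independent column that are all equally informative about the labels, this sketch keeps one copy of the duplicate and then prefers the independent column, which is the behaviour the abstract describes.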
Seminar at the Information Technology research center (FPMs, Mons, Belgium, October 11th)
The Lombard effect: analysis and applications. The Lombard effect refers to the changes in speech caused by immersing the speaker in a noisy environment. These modifications are observed from an acoustic, a phonetic and an articulatory point of view. Through hyper-articulation (most of the time unconscious), a speaker placed in a communicative context aims at maximizing the intelligibility of his utterances. After an analysis of the different changes produced, the hindrances they cause in automatic speech recognition and potential future applications in speech synthesis will be discussed.
Lecture at the Computational Intelligence and Learning doctoral school (Louvain-la-Neuve, November 5th)
Kernels on graph nodes and their application to link analysis: In this elementary tutorial, the speaker presented an interpretation of Kandola et al.'s von Neumann kernels in the context of link analysis, with an emphasis on their relationship to the HITS importance ranking method. He then talked about the effect of 'topic drift', a problem which was first observed with HITS but affects the von Neumann kernels as well. The properties of the von Neumann kernels were also compared with those of kernels based on the Laplacian matrix.
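The relationship between the von Neumann kernel and HITS can be illustrated with a small numerical sketch. It assumes the usual formulation in which the kernel is built on the co-citation matrix M = A^T A of the adjacency matrix A, as K = M (I - gamma M)^{-1}, with gamma below the inverse of the spectral radius of M. Ranking nodes by the row sums of K is an illustrative choice, and the function names and the `decay_ratio` parameter are hypothetical; the graph is assumed to contain at least one link.

```python
import numpy as np

def von_neumann_kernel(adjacency, decay_ratio):
    """Return K = M (I - gamma M)^{-1} with M = A^T A (co-citation matrix).

    decay_ratio in [0, 1) sets gamma = decay_ratio / spectral_radius(M),
    which keeps the underlying series sum_n gamma^n M^(n+1) convergent.
    """
    A = np.asarray(adjacency, dtype=float)
    M = A.T @ A
    rho = np.max(np.abs(np.linalg.eigvalsh(M)))  # M is symmetric
    gamma = decay_ratio / rho
    return M @ np.linalg.inv(np.eye(M.shape[0]) - gamma * M)

def hits_authority(adjacency):
    """HITS authority scores: principal eigenvector of A^T A, scaled to sum to 1."""
    A = np.asarray(adjacency, dtype=float)
    _, eigvecs = np.linalg.eigh(A.T @ A)  # eigenvalues in ascending order
    v = np.abs(eigvecs[:, -1])
    return v / v.sum()

# Toy graph: adjacency[i, j] = 1 if node i links to node j.
adjacency = np.array([[0, 1, 1, 1],
                      [0, 0, 1, 1],
                      [0, 0, 0, 1],
                      [0, 0, 0, 0]])

kernel_near_limit = von_neumann_kernel(adjacency, decay_ratio=0.999)
kernel_ranking = np.argsort(-kernel_near_limit.sum(axis=1))
hits_ranking = np.argsort(-hits_authority(adjacency))
```

With `decay_ratio` set to 0 the kernel reduces to the co-citation matrix itself, while values close to 1 let the principal eigenvector dominate, so that the induced ranking approaches the HITS authority ranking. This dominance of the principal eigenvector is also what makes both methods sensitive to topic drift.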
Introduction to conditional random fields and other discriminative sequence labeling methods: In recent years, conditional random fields (CRFs) have become a popular method in natural language processing. They have not only served as an effective alternative to the hidden Markov model in sequence labeling problems, but also provide a generic framework that is applicable to a wide range of applications. This lecture started with a tutorial on the basics of CRFs and on alternative algorithms that are more lightweight. Some natural language processing tasks to which these algorithms have been applied were then described.
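As a reminder of what a linear-chain CRF computes, the sketch below evaluates the conditional log-probability of a label sequence as its unnormalised score minus the log-partition function, the latter obtained with the forward algorithm. It is a minimal illustration, not the material of the lecture: the `emissions` and `transitions` score tables are hypothetical stand-ins for the weighted feature functions of a real CRF, and no training is shown.

```python
import numpy as np

def sequence_score(emissions, transitions, labels):
    """Unnormalised log-score of one label sequence.

    emissions[t, y]   : score of label y at position t (assumed precomputed)
    transitions[i, j] : score of moving from label i to label j
    """
    score = emissions[0, labels[0]]
    for t in range(1, len(labels)):
        score += transitions[labels[t - 1], labels[t]] + emissions[t, labels[t]]
    return score

def log_partition(emissions, transitions):
    """log Z over all label sequences, via the forward algorithm in log space."""
    alpha = emissions[0].copy()
    for t in range(1, emissions.shape[0]):
        # scores[i, j] = alpha[i] + transitions[i, j]
        scores = alpha[:, None] + transitions
        m = scores.max(axis=0)
        # Numerically stable log-sum-exp over the previous label i.
        alpha = m + np.log(np.exp(scores - m).sum(axis=0)) + emissions[t]
    m = alpha.max()
    return m + np.log(np.exp(alpha - m).sum())

def log_probability(emissions, transitions, labels):
    """log p(labels | input) for a linear-chain CRF."""
    return sequence_score(emissions, transitions, labels) - log_partition(emissions, transitions)
```

The forward recursion costs O(T L^2) for a sequence of length T with L labels, instead of enumerating all L^T label sequences, which is what makes exact normalisation practical for sequence labeling.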
Tutorial on Quartz Composer and Isadora (FPMs, Nov 28th and 29th PM)
Raphaël Sebbe and Celine Mancas-Thillou, who both hold a PhD in Image Processing, presented tutorials on well-known visual programming environments.
Quartz Composer is a node-based visual programming language provided as part of the Xcode development environment in Mac OS X v10.5 "Leopard" for processing and rendering graphical data.
Isadora is a proprietary graphic programming environment for Mac OS X and Microsoft Windows, with emphasis on real-time manipulation of digital video. It supports Open Sound Control.
Tutorial on Max-MSP (FPMs, December 6th PM)
Nicolas D’Alessandro, PhD student in Singing Voice Synthesis, presented a tutorial on Max-MSP, a real-time sound processing program.
Max is a graphical development environment for music and multimedia developed and maintained by San Francisco-based software company Cycling '74. It has been used for over fifteen years by composers, performers, software designers, researchers and artists interested in creating interactive software.
Blender is a free software 3D animation program. It can be used for modeling, UV unwrapping, texturing, rigging, skinning, animating, rendering, particle and other simulations, non-linear editing, compositing, and creating interactive 3D applications.