Umit Guz Receives the TUBITAK Career Award

January 5, 2008
Umit Guz, a Visiting Postdoctoral Researcher in 2007, received the Scientific and Technological Research Council of Turkey (TUBITAK) CAREER Award for his project, "Extracting and Using Prosodic Information for Turkish Spoken Language Processing". Guz has returned to Turkey, where he will conduct this research over the next two years, advised by ICSI's Dilek Hakkani-Tur and SRI's Gokhan Tur and Murat Akbacak. Here is a description of the project:

In general, this project aims to extract the prosodic and lexical features of spoken Turkish and use them in spoken language processing. More specifically, this includes sentence segmentation of automatic speech recognizer output.

The text output of an Automatic Speech Recognition (ASR) system lacks punctuation and capitalization in particular, as well as speech-related cues such as stress, tone, pitch, and pauses, and the absence of these can change the meaning. Enriching this output, in other words restoring these features to it, will make it easier for humans to read and understand and for machines to process. The aim of this project is to perform this enrichment and restoration by using the prosodic features of the spoken language.

There have been studies and results on this subject, especially for English and similar languages. This project focuses on Turkish because it is an understudied language and our native language. Furthermore, we aim to share the results and the database obtained at the end of this research with other academics and researchers working on Turkish language processing, so that they can be used in further studies and applications in this area.

Many useful results have been obtained by using prosodic information for language understanding tasks in English (and similar languages), for example for sentence and topic segmentation and emotion detection. However, languages that behave substantially differently from English, such as Turkish and Hungarian (in that they have agglutinative or inflectional morphology and relatively free constituent order), have not been examined extensively. In this proposal, we would like to examine the extraction and use of prosodic information in addition to lexical features for spoken language processing of Turkish. Specifically, we would like to research the use of prosodic features for sentence segmentation of Turkish speech. Another outcome of the project will be a database of prosodic features at the word and morpheme level, which can be used for other purposes such as morphological disambiguation or word sense disambiguation.

Turkish is an agglutinative language; that is, given a word in its root form, we can derive a new word by adding an affix (usually a suffix) to this root form, then derive another word by adding another affix to this new word, and so on. This iterative process may continue for several levels. A single word in an agglutinative language may correspond to a phrase made up of several words in a non-agglutinative language. Thus, the text should be analyzed morphologically to determine the root forms and the suffixes of the words before further analysis.
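
As a small illustration of this iterative suffixation, the Python sketch below builds a Turkish word form one suffix at a time. The root "ev" (house) and the suffix sequence are chosen only for illustration, and the helper name derive_forms is hypothetical. Real morphological analysis must also handle vowel harmony and ambiguity, which this sketch ignores.

```python
from itertools import accumulate


def derive_forms(root, suffixes):
    """Return every intermediate form produced by attaching suffixes one at a time."""
    # accumulate concatenates left to right: root, root+s1, root+s1+s2, ...
    return list(accumulate([root, *suffixes]))


# Illustrative derivation: ev (house) -> evler (houses)
# -> evlerimiz (our houses) -> evlerimizde (in our houses)
forms = derive_forms("ev", ["ler", "imiz", "de"])
print(forms)
```

The final form, "evlerimizde", is a single word that corresponds to the English phrase "in our houses".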

Within the framework of this project, we would also like to examine the interaction of prosodic features with morphological information.

The role of sentence segmentation is to detect sentence boundaries in the stream of words provided by the Automatic Speech Recognition module for further downstream processing. This is helpful for various language processing tasks, such as parsing, machine translation and question answering. We formulate sentence segmentation as a binary classification task. For each position between two consecutive words the system must decide if the position marks a boundary between two sentences or if the two neighboring words belong to the same sentence.
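
The binary formulation can be sketched in Python as follows: every gap between two consecutive words gets a label, 1 for a sentence boundary and 0 otherwise. The function names and the example sentences are assumptions made for illustration; they are not taken from the project.

```python
def to_boundary_labels(sentences):
    """Flatten sentences into a word stream plus one binary label per gap between words."""
    words, labels = [], []
    for sentence in sentences:
        for i, word in enumerate(sentence):
            if words:
                # Label the gap before this word: 1 if the word starts a new sentence.
                labels.append(1 if i == 0 else 0)
            words.append(word)
    return words, labels


def segment(words, labels):
    """Rebuild sentences from a word stream and per-gap boundary decisions."""
    if not words:
        return []
    sentences = [[words[0]]]
    for word, is_boundary in zip(words[1:], labels):
        if is_boundary:
            sentences.append([word])
        else:
            sentences[-1].append(word)
    return sentences


# Illustrative input: two short Turkish sentences without punctuation.
reference = [["bugün", "hava", "güzel"], ["dışarı", "çıkalım"]]
words, labels = to_boundary_labels(reference)
print(words)
print(labels)
print(segment(words, labels))
```

A stream of N words has N - 1 gaps, so a classifier for this task makes N - 1 decisions; applying segment to the reference labels recovers the original sentences.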

Sentence segmentation will be performed by combining Hidden Event Language Models (HELMs) with discriminative classification methods. The HELM takes into account the sequence of words together with the output of a discriminative classifier, such as a decision tree, that is based on prosodic features such as pause durations. The new approach combines HELMs, which exploit lexical information, with maximum entropy and boosting classifiers that tightly integrate lexical, prosodic, speaker-change, and syntactic features. The boosting-based classifier (using words and all prosodic features as input) alone performs better than all the other classification schemes. When it is combined with a hidden event language model, the improvement is even more pronounced.
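
The description above does not say how the two model outputs are merged. One simple possibility, shown here purely as an assumption, is to linearly interpolate the boundary posteriors of the HELM and of the discriminative classifier for each word gap and then apply a threshold. The function names, the interpolation weight, the threshold, and the probability values below are all made up for illustration.

```python
def combine_posteriors(p_helm, p_clf, weight=0.5):
    """Linearly interpolate two boundary posteriors for each word gap.

    weight is the share given to the HELM score (an assumed tuning parameter).
    """
    if len(p_helm) != len(p_clf):
        raise ValueError("both models must score the same word gaps")
    return [weight * a + (1.0 - weight) * b for a, b in zip(p_helm, p_clf)]


def decide_boundaries(posteriors, threshold=0.5):
    """Mark a gap as a sentence boundary when its combined posterior reaches the threshold."""
    return [1 if p >= threshold else 0 for p in posteriors]


# Made-up boundary posteriors for four word gaps.
p_helm = [0.10, 0.30, 0.60, 0.20]  # from the hidden event language model
p_clf = [0.05, 0.10, 0.90, 0.40]   # from a prosody-based classifier
combined = combine_posteriors(p_helm, p_clf, weight=0.5)
print(decide_boundaries(combined))
```

In practice the weight and threshold would be tuned on held-out data; other combination schemes are equally plausible.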