This paper and the projects it describes would not have been possible without the many talents of Hiromi Takahashi Yampol, who served as both translator and voice. Thanks also to Todd Yampol and James Giangola for their help with this paper.
[2] Mangajin's Basic Japanese through Comics, Part 2. 1996, pp. 54-59.
Given names, titles, and ambiguity
Despite our attention to the format of full names, Japanese business people do not use a person's given name. The family name and title (Fig. 1) are used instead, so callers would be unlikely to give a person's full name even if they knew it. There was no plausible way to ask a caller politely for a full name: it would take so much explanation and apology that the call would become quite long. Even allowing for the fact that Japanese callers tolerate wordier explanations from machines and tend to be more cooperative, we decided that we could not expect people to break with tradition. Instead, we asked for the name in the simplest terms possible, accepting that callers would often omit the given name. Because the dialer was designed to handle thousands of employees, the logic for resolving ambiguity was already in place, but the Japanese version would exercise it more often. In the English version, the grammar contained only full names; in the Japanese version, the given name was an optional part of the name. The English version assumed that, since callers would say full names, any remaining ambiguity could be resolved by asking for the person's department or location. In the Japanese version, the given name could be the most common disambiguating factor.
Figure 1.
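The lookup behavior described above can be sketched as follows. This is a minimal illustration, not the dialer's actual code: the in-memory `DIRECTORY`, the `Employee` record, and the `lookup` and `next_question` helpers are all hypothetical. The given name is an optional argument, mirroring the Japanese grammar, and when several employees remain the sketch asks for the given name first, falling back to department.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Employee:
    family: str
    given: str
    department: str


# Hypothetical directory: two employees share the family name Sato.
DIRECTORY = [
    Employee("Sato", "Hiroshi", "Sales"),
    Employee("Sato", "Yuki", "Engineering"),
    Employee("Tanaka", "Kenji", "Sales"),
]


def lookup(family: str, given: Optional[str] = None,
           department: Optional[str] = None) -> list[Employee]:
    """Return every employee consistent with what the caller has said so far.

    The given name is optional, as in the Japanese grammar; a result with
    more than one entry means the dialer must ask a follow-up question.
    """
    matches = [e for e in DIRECTORY if e.family == family]
    if given is not None:
        matches = [e for e in matches if e.given == given]
    if department is not None:
        matches = [e for e in matches if e.department == department]
    return matches


def next_question(matches: list[Employee]) -> Optional[str]:
    """Name the slot to ask about next, or None if nothing is left to disambiguate."""
    if len(matches) <= 1:
        return None
    # Prefer the given name when it separates every candidate, since Japanese
    # callers usually omit it at first; otherwise fall back to department.
    if len({e.given for e in matches}) == len(matches):
        return "given"
    return "department"
```

Saying only "Sato" yields two matches, so `next_question` asks for the given name; "Sato" plus "Yuki" resolves to a single employee.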
[3] Kodansha Encyclopedia of Japan. 1983, Vol. 5, pp. 324-325.
To make matters worse, the most common Japanese family names (Fig. 2) are extremely common. In Japan, the names Sato and Suzuki each account for more than 1.5% of the population, so a Japanese company of just 72 employees is more likely than not to have two Satos or two Suzukis! Certain German names, both family and given names, are also quite common, so there were many German employees with identical full names. Japanese custom offers some help, but our database did not. Although Japanese business people omit the given name, they include the person's title, for example "President Tanaka" or "Department Head Yamamoto." Fortunately, these titles are shorter in Japanese than in English, just a couple of syllables. Unfortunately, title information was unavailable: my client was not a Japanese company and did not assign these traditional titles. A Japanese company would use these titles, and a properly designed auto-attendant ought to use them to distinguish people who share a family name but hold different corporate positions. Asking callers which President Tanaka they mean would not be a good idea.
Figure 2.
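The 72-employee figure can be checked with a short calculation. The sketch below assumes that each employee independently has the surname Sato with probability p and Suzuki with probability p, taking p = 0.015 (the text says "more than 1.5%", so this is a conservative value). It computes the probability of at least two of either name through the complement: at most one Sato and at most one Suzuki.

```python
def prob_shared_surname(n: int, p: float = 0.015) -> float:
    """Probability that n people include at least two with surname A or at
    least two with surname B, where each person independently has A with
    probability p and B with probability p (a multinomial model)."""
    q = 1.0 - 2.0 * p  # probability of having neither surname
    # Complement event: at most one A and at most one B.
    at_most_one_each = q ** n                          # no A and no B
    at_most_one_each += 2 * n * p * q ** (n - 1)       # exactly one A and no B, or one B and no A
    at_most_one_each += n * (n - 1) * p * p * q ** (n - 2)  # exactly one A and exactly one B
    return 1.0 - at_most_one_each


def smallest_company(p: float = 0.015) -> int:
    """Smallest headcount at which a shared surname is more likely than not."""
    n = 2
    while prob_shared_surname(n, p) <= 0.5:
        n += 1
    return n
```

Under these assumptions `smallest_company()` returns 72; since the real shares exceed 1.5%, the true threshold would be somewhat lower.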
Numbers
Like many Asian languages, Japanese does not distinguish between singular and plural for nouns and verbs, which simplifies some things for both speech input and output. In other respects, Japanese numbers are much more complicated, especially for natural-sounding speech output. Japanese has multiple ways of counting. For counts of less than ten there is a generic system, but otherwise each class of object being counted has its own counting suffix, and fluent-sounding Japanese requires it. There is one set of numbers for long, slender things like pencils, another for people, and yet another for the days of the month (Fig. 3). As a result, many versions of the numbers must be recorded, even for small applications. For speech input, on the other hand, the counter can help resolve ambiguity. For example, in a hotel reservation the number of people is easy to distinguish from the number of nights: even when both represent the same number, they are distinct words with distinct sounds.
[4] Lampkin, Rita L. Japanese Verbs and Essentials of Grammar. 1997, pp. 110-117.
Figure 3. |
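Both effects of counters can be shown with a single lookup table. In this sketch, `COUNTED_FORMS` is a hypothetical recording inventory that covers only the numbers 1 to 5, using one common romanization per form (some forms have variant readings). For output, `prompt_word` picks the recording for a number and counter class; for input, the reverse index lets `fill_slots` assign "two people" and "two nights" to different slots, because they are different words.

```python
# Hypothetical recording inventory: one romanized form per counter for 1-5.
COUNTED_FORMS = {
    "generic": ["hitotsu", "futatsu", "mittsu", "yottsu", "itsutsu"],
    "long_thin": ["ippon", "nihon", "sanbon", "yonhon", "gohon"],  # pencils, bottles
    "people": ["hitori", "futari", "sannin", "yonin", "gonin"],
    "nights": ["ippaku", "nihaku", "sanpaku", "yonhaku", "gohaku"],
    "day_of_month": ["tsuitachi", "futsuka", "mikka", "yokka", "itsuka"],
}


def prompt_word(counter: str, n: int) -> str:
    """Return the recording to play for n objects of the given counter class."""
    forms = COUNTED_FORMS[counter]
    if not 1 <= n <= len(forms):
        raise ValueError(f"no recording for {n} with counter {counter!r}")
    return forms[n - 1]


# Reverse index for recognition: every counted word maps to (counter, number).
WORD_TO_MEANING = {
    word: (counter, i + 1)
    for counter, words in COUNTED_FORMS.items()
    for i, word in enumerate(words)
}


def fill_slots(words: list[str]) -> dict[str, int]:
    """Assign recognized counted words to slots named after their counter."""
    slots: dict[str, int] = {}
    for word in words:
        meaning = WORD_TO_MEANING.get(word)
        if meaning is not None:  # ignore words outside the counter vocabulary
            counter, n = meaning
            slots[counter] = n
    return slots
```

Because the counter identifies the slot, a caller can say the number of nights and the number of people in either order.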
Phone Numbers
Japanese digits differ from the counting forms of numbers. The traditional form of the digit four, shi, sounds the same as the word for death and is the first syllable of the word for seven, shichi. The form shi is therefore almost never heard in phone numbers; yon is used instead. Both shichi and nana are used for seven, even though shichi contains shi and sounds similar to ichi (one) and hachi (eight). Zero is heard as both zero and rei. These alternatives make the recognition grammar more complex. When saying a phone number, many Japanese speakers insert the particle no between the parts of the number (Fig. 4), more often between the prefix and the last four digits than after the area code. Japanese area codes vary in length, and the length of the phone number varies with the area code. Although it is tempting to allow variable-length phone numbers with an optional no in any position, variable-length digit strings are recognized less accurately than more constrained grammars. Moreover, no sounds like go, the digit five. When the dialer read phone numbers back, we omitted the no and used a pause instead. This practice is also common and sounds modern and efficient, and we hoped callers would imitate it. For the US, a simple grammar for ten-digit numbers is sufficient, and it does not need to change as new area codes appear. For Japan, the grammar must account for each area code, and it must be maintained, since area codes are sometimes split and additional digits are added as cities grow. The same is true in the United Kingdom and elsewhere.
(03) 3224-5000, US Embassy, Tokyo
(025) 245-3331, Hotel Niigata, Niigata
(0476) 28-1010, English directory, Narita airport
Figure 4.
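The constraints above can be sketched as a validator over a recognized word sequence. This is an illustration of the logic, not a recognizer grammar format: `AREA_CODES` is a hypothetical table built only from the three Fig. 4 examples, and `DIGITS` is an illustrative, incomplete vocabulary that accepts both forms for zero and seven but leaves out shi. The particle no is accepted only at a boundary between parts of the number, which keeps it from floating freely where it could be confused with go.

```python
from typing import Optional

# Illustrative digit vocabulary: "shi" is omitted for four, since callers say
# "yon"; both forms are accepted for zero (zero/rei) and seven (nana/shichi).
DIGITS = {
    "zero": "0", "rei": "0", "ichi": "1", "ni": "2", "san": "3",
    "yon": "4", "go": "5", "roku": "6", "nana": "7", "shichi": "7",
    "hachi": "8", "kyuu": "9",
}

# Hypothetical area-code table from the Fig. 4 examples:
# area code -> (exchange digits, subscriber digits).
AREA_CODES = {"03": (4, 4), "025": (3, 4), "0476": (2, 4)}


def parse_phone(words: list[str]) -> Optional[str]:
    """Return the formatted number, or None if the words fit no area code."""
    digits = ""
    no_positions = []  # digit offsets at which the caller said "no"
    for word in words:
        if word == "no":
            no_positions.append(len(digits))
        elif word in DIGITS:
            digits += DIGITS[word]
        else:
            return None  # out-of-grammar word
    for area, (exchange, subscriber) in AREA_CODES.items():
        if not digits.startswith(area):
            continue
        if len(digits) != len(area) + exchange + subscriber:
            continue
        a, e = len(area), len(area) + exchange
        # "no" is allowed only at a part boundary, at most once per boundary.
        if set(no_positions) <= {a, e} and len(no_positions) == len(set(no_positions)):
            return f"({digits[:a]}) {digits[a:e]}-{digits[e:]}"
    return None
```

For example, "zero san san ni ni yon no go zero zero zero" yields "(03) 3224-5000", while moving the no to a position inside the exchange is rejected.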
Recommendations
Good software engineering practice calls for separating language-dependent code and data from language-independent components. The language-dependent parts include the outgoing speech (prompt) logic, the recognition targets (grammars or vocabulary lists), and even the application logic itself. Early attention to all target languages gives you the best chance of avoiding problems.
Some systems allow outgoing speech to be customized by filling in a series of blanks. Unless the order of these blanks can be changed for each language, this approach will surely be insufficient; it could not even handle Japanese names, for example. Even filling in blanks is sometimes not enough. In Japanese, the type of object being counted is needed to produce a natural-sounding number, so it becomes an additional parameter to the prompting logic. Furthermore, context information may need to be carried from one question to the next; for example, the grammatical gender of an item may need to be known in order to ask "how many of them?"
The parsing (or natural language) capabilities of the underlying system go a long way toward abstracting the differences among languages and deriving the meaning in a form the rest of the application can use. Since the parser deals with ambiguity, keep in mind that a form may be ambiguous in one country but always clear in another. The 12-hour clock, for example, produces ambiguous times, but it is less common outside the US, where the 24-hour clock is widely used.
The flow of an application may also need to be adjusted for localization. In the dialer, Japanese callers are asked more often to disambiguate employees with matching family names. For a smaller company, this disambiguation logic would not have been necessary in English. Even so, extra effort to accommodate users can benefit speakers of many languages and allow the software to adapt to more situations.
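One way to meet both requirements (reorderable blanks and a counter parameter) is to give each language its own templates with named slots and its own number renderer. The sketch below is an assumption-laden illustration, not the dialer's design: the `Locale` class, the `LOCALES` table, the `prompt` function, and the prompt ids are hypothetical, and the romanized Japanese strings stand in for recorded prompt segments. Named fields in Python's `str.format` let the Japanese template put the family name first, and the renderer receives the counter class, so English can apply plural agreement while Japanese selects a counter form.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical romanized counter forms (1-3 only) standing in for recordings.
JA_COUNTED = {
    "people": ["hitori", "futari", "sannin"],
    "nights": ["ippaku", "nihaku", "sanpaku"],
}
# English needs singular/plural agreement instead of counters.
EN_NOUNS = {"people": ("person", "people"), "nights": ("night", "nights")}


def render_count_en(counter: str, n: int) -> str:
    singular, plural = EN_NOUNS[counter]
    return f"{n} {singular if n == 1 else plural}"


def render_count_ja(counter: str, n: int) -> str:
    return JA_COUNTED[counter][n - 1]


@dataclass
class Locale:
    templates: dict[str, str]                # prompt id -> template with named slots
    render_count: Callable[[str, int], str]  # (counter class, n) -> spoken form


LOCALES = {
    "en": Locale({"confirm_name": "Did you say {given} {family}?",
                  "confirm_stay": "{people} for {nights}, is that right?"},
                 render_count_en),
    "ja": Locale({"confirm_name": "{family} {given} desu ne?",
                  "confirm_stay": "{people}, {nights} desu ne?"},
                 render_count_ja),
}


def prompt(lang: str, prompt_id: str,
           names: Optional[dict[str, str]] = None,
           counts: Optional[dict[str, int]] = None) -> str:
    """Fill a prompt template; counts are keyed by counter class."""
    locale = LOCALES[lang]
    fields = dict(names or {})
    for counter, n in (counts or {}).items():
        fields[counter] = locale.render_count(counter, n)
    return locale.templates[prompt_id].format(**fields)
```

The application logic calls `prompt` with the same arguments for every language; only the locale data decides slot order and how a count is spoken.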
Finally, since a particular language may require context information that other languages do not use (Japanese titles, for example), the languages you target may affect the design of the underlying database and the cost of collecting, encoding, or maintaining the data (as with Japanese area codes). Knowing which languages and cultures you are targeting, and knowing something about them, must be part of the early design stages rather than postponed until the first language is complete. Since improved customer service and the promise of friendly applications are driving the adoption of speech recognition applications, developers need to pay special attention to the finer points of language and culture. Comfortable callers are likely to be more cooperative. Not only will they be more successful using the application, but they will also choose it over more expensive alternatives or competitors' services.