One of the difficulties in computer translation, says Donia Scott, a computational linguist who is head of the Information Technology Research Institute (ITRI) at Brighton University, is that of ambiguity. A document to be translated is a representation of information that must be expressed in the second language, Scott says. Not only can the same piece of information be expressed in many different ways in either language, but the same sentence can frequently have several different meanings.

In spite of this, modern machine translation systems can be very reliable. At least one multinational computer and electronics company uses computers to translate its instruction manuals into foreign languages. The ambiguity problem is solved by writing the documents in a controlled language: a subset of the original language that contains no ambiguities. A sentence in a controlled language can have only one meaning, so it can be translated directly into a foreign controlled language without errors.
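The idea of a controlled language can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two sentence patterns, the small noun table and the function name `translate` are hypothetical and are not taken from any real company's system. The point is only that when every approved sentence has exactly one reading, translation reduces to a lookup, and anything outside the approved subset is rejected rather than guessed at.

```python
import re

# Hypothetical controlled-language patterns. Each approved English template
# maps to exactly one French template, so there is nothing to disambiguate.
PATTERNS = [
    (re.compile(r"^Press the (\w+) button\.$"), "Appuyez sur le bouton {0}."),
    (re.compile(r"^Disconnect the (\w+) cable\.$"), "Débranchez le câble {0}."),
]

# Hypothetical approved vocabulary, one French rendering per English term.
NOUNS = {
    "power": "d'alimentation",
    "reset": "de réinitialisation",
}


def translate(sentence):
    """Translate one controlled-English sentence into controlled French.

    Raises ValueError if the sentence or its vocabulary is not approved.
    """
    for pattern, template in PATTERNS:
        match = pattern.match(sentence)
        if match:
            noun = match.group(1)
            if noun not in NOUNS:
                raise ValueError(f"word not in controlled vocabulary: {noun}")
            return template.format(NOUNS[noun])
    raise ValueError(f"sentence not in controlled language: {sentence}")


print(translate("Press the power button."))
```

A writer who strays outside the approved patterns gets an error instead of a doubtful translation, which is the trade-off a controlled language makes: less freedom for the author in exchange for predictable output.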

However, producing a high-quality translation remains time-consuming and costly. Companies that need manuals in many languages are turning to computational linguists for new ways to reduce the time spent on translation and still achieve good-quality manuals in several languages.

More problems occur when manuals need to be modified. They are particularly acute in the aircraft industry. Aircraft have huge quantities of documentation. According to Phil Marchant of British Aerospace Defence at Warton in Lancashire, the bid to produce the replacement for the Nimrod maritime patrol aircraft weighed more than 3 tonnes.

Any modifications to an aircraft make it necessary to change the manuals. Sometimes the changes must be made decades after the manuals were written. The Canberra, still in service after more than 50 years, has undergone three modifications in the last two years, Marchant says.

Finding all the parts of the manuals that need to be changed is a huge problem which British Aerospace Defence are trying to solve with innovation and technology, says Marchant. They recently collaborated with the French company Dassault Aviation and Edinburgh University to produce a prototype system called Ghostwriter, which enables a computer to do a large part of the work of producing and modifying manuals.

The same problem is being addressed at ITRI, who are collaborating with software companies and technical writers to produce Drafter, which generates drafts of software user manuals in French and English.

Ghostwriter and Drafter each consist of two main parts: a domain model, which represents the technical information the manuals must convey (in Ghostwriter's case, everything needed to service the aircraft); and a text generator, which translates the appropriate parts of the model into human language.

Different text generators are used to produce manuals in English and French. Ghostwriter and Drafter use well-established techniques for representing knowledge in computers and for turning knowledge into natural language. Their novelty lies in the fact that the different language versions are produced without translation.

Systems like Ghostwriter and Drafter may not save much time in producing the first version of a manual. Expert document writers are still needed to help build the domain model. However, it is easier to keep manuals up to date, because any change made to the domain model automatically causes changes in all relevant manuals. New language versions can be produced easily by adding text generators.
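The split between a domain model and per-language text generators can be sketched in a few lines. This is not how Ghostwriter or Drafter is actually implemented: the `Step` class, the two lexicons and the function `render_manual` are all hypothetical names invented for the sketch, and real systems use far richer knowledge representations and grammars. It shows only why a single change to the model updates every language version, and why adding a language means adding a generator rather than translating text.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One language-neutral instruction in the (hypothetical) domain model."""
    action: str  # identifier for what to do, e.g. "remove"
    part: str    # identifier for what to do it to, e.g. "fuel_pump"


# Hypothetical lexicons: one per language, keyed by the model's identifiers.
ENGLISH = {
    "actions": {"remove": "Remove", "inspect": "Inspect"},
    "parts": {"fuel_pump": "the fuel pump", "filter": "the filter"},
}
FRENCH = {
    "actions": {"remove": "Déposer", "inspect": "Inspecter"},
    "parts": {"fuel_pump": "la pompe à carburant", "filter": "le filtre"},
}


def render_manual(model, lexicon):
    """Generate numbered instructions from the model in one language."""
    return [
        f"{number}. {lexicon['actions'][step.action]} {lexicon['parts'][step.part]}."
        for number, step in enumerate(model, start=1)
    ]


# The model is written once; neither manual is a translation of the other.
model = [Step("remove", "fuel_pump")]

# A later modification is made to the model only, and both manuals follow.
model.append(Step("inspect", "filter"))

for line in render_manual(model, ENGLISH) + render_manual(model, FRENCH):
    print(line)
```

Supporting a third language in this sketch would mean supplying one more lexicon and calling `render_manual` with it; the model itself would not change.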

The text generators in Ghostwriter and Drafter depend on analyses of large bodies of appropriate text to produce, among other things, a lexicon of the appropriate words and situations for their use. The same analyses can be applied to different bodies of language, sometimes with interesting results.

Adam Kilgarriff and Roger Evans of ITRI analysed the 5m words of conversation in the British National Corpus, a collection of spoken and written English gathered in the UK in the last few years. They looked for words that men use more than women and vice versa. The recorded conversations were made by volunteers drawn equally from different age groups, from social groupings AB, C1, C2 and DE, and from 38 areas round the UK.

The men's top 10 words were: grand (meaning 1,000), that, bloke, against, a, fast, da (as in da di da), the, Jesus and engine. The women's list was: she, her, cooking, shopping, lovely, kitchen, likes, apples, thought and made. Vive la différence.
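A comparison of this kind can be sketched as follows. The article does not say which statistic Kilgarriff and Evans used, so the measure here is an assumption: a simple ratio of smoothed relative frequencies between two groups of speakers. The function name `distinctive_words` and the toy utterances are hypothetical and are not drawn from the British National Corpus.

```python
from collections import Counter


def distinctive_words(group_a, group_b, top_n=10):
    """Return the words group A uses most disproportionately versus group B.

    Each group is a list of utterances (strings). Words are ranked by the
    ratio of their add-one-smoothed relative frequencies, an assumed measure
    chosen for simplicity, with ties broken alphabetically.
    """
    counts_a = Counter(w.lower() for utterance in group_a for w in utterance.split())
    counts_b = Counter(w.lower() for utterance in group_b for w in utterance.split())
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    vocabulary = set(counts_a) | set(counts_b)

    def ratio(word):
        # Add-one smoothing keeps the ratio finite for words one group never uses.
        rate_a = (counts_a[word] + 1) / (total_a + len(vocabulary))
        rate_b = (counts_b[word] + 1) / (total_b + len(vocabulary))
        return rate_a / rate_b

    return sorted(vocabulary, key=lambda word: (-ratio(word), word))[:top_n]


# Toy data, invented purely to exercise the function.
speakers_a = ["the engine is fast", "fast engine"]
speakers_b = ["lovely kitchen", "the kitchen is lovely"]

print(distinctive_words(speakers_a, speakers_b, top_n=2))
print(distinctive_words(speakers_b, speakers_a, top_n=2))
```

Words both groups use at similar rates, such as "the" and "is" in the toy data, score close to 1 and fall out of the top of either list.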


First Published: Feb 28 1997 | 12:00 AM IST
