Van Wijngaarden's "Linguist" in 1952

Dated: October 1952

The father of Dutch computing, Adriaan van Wijngaarden, was an engineering student during World War II. With his electric Marchant calculator stored in his attic, he ground out numbers to solve a turbulence problem in aerodynamics. Becoming increasingly aware of the social relevance of his numerical calculations, he joined the newly founded Mathematical Centre in Amsterdam in 1947 [1] and became the head of its Computing Department. Well aware of the rapid technological developments that were taking place in England and the USA, van Wijngaarden spent most of 1947 visiting computer experts in both those countries [2, p.106].

As early as 1946, the British researcher Andrew Booth and the American scientist Warren Weaver had met to discuss the idea of mechanically translating one language into another. Weaver had discovered that, for a wide variety of languages, the basic logical structures have important common features; that is, languages share certain invariant properties. Weaver believed that all languages contain basic elements, and that code-breaking technology (from World War II) could help detect those basic elements. He expressed these ideas in writing in his 1949 memorandum Translation [3, Chapter 1] which, in turn, sparked intense American interest in the possibility of Machine Translation (i.e. automatic language translation).

Weaver advocated going “deeply into the structure of languages as to come down to the level where they exhibit common traits” [3, p.23]. Instead of trying to directly translate Chinese to Arabic or Russian to Portuguese, Weaver supported the idea of an indirect route: translate from the source language into an “as yet undiscovered universal language” and then translate from that language into the target language. In the words of Locke & Booth from 1955:

[There has] been speculation about the possibility of a multilingual machine in which the input would be translated into any of a number of output languages or vice versa. This line of thought usually involves an interlanguage into which the input is first translated. [3, p.12, my emphasis]
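One practical attraction of such an interlanguage can be made concrete with a little counting. The sketch below is my own illustration, not something taken from Weaver or from Locke & Booth: it assumes that every ordered pair of languages needs its own translator in the direct scheme, and that every language needs one translator into and one out of the interlanguage in the indirect scheme. The function names are hypothetical.

```python
def direct_translators(n):
    """Translators needed when every language is translated directly
    into every other language (one per ordered pair)."""
    return n * (n - 1)


def interlanguage_translators(n):
    """Translators needed when every language is translated into and
    out of a single shared interlanguage."""
    return 2 * n


# The four languages named above: Chinese, Arabic, Russian, Portuguese.
languages = ["Chinese", "Arabic", "Russian", "Portuguese"]
n = len(languages)
print(direct_translators(n), interlanguage_translators(n))
```

Under these assumptions the indirect route only pays off once more than three languages are involved; for exactly three languages both schemes need six translators.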

From an epistemological perspective, one can say that the “universal interlanguage” of the linguists was a precursor to the programmer's “intermediate machine-independent” programming language — a claim that I will substantiate in a forthcoming publication.

Andrew Booth and Aad van Wijngaarden were linguistic friends. Although van Wijngaarden was not a professional linguist, he did demonstrate his close acquaintance with automatic language translation in his 1952 inaugural lecture, entitled “Computing and Translating” [4]. In his speech, van Wijngaarden discussed a scenario in which two Dutch-speaking people converse with each other by means of a telegraph cable or a letter. In the interest of time or money, the message should be as short as possible. Therefore, the message should preferably be expressed in what van Wijngaarden called “the minimal Dutch language” [5].

In reality, of course, Dutch speakers do not use such an ideal language. Therefore van Wijngaarden suggested considering the more realistic scenario in which two people communicate with each other in the normal (“redundant”) Dutch language but with the help of two computing machines. The message of the first person is translated by the first machine into the “machine's language” and subsequently sent via a digital communication channel to the machine of the second person. That machine, in turn, translates the received message back into the redundant Dutch language for the second person. In such a setting it is, in the interest of time and money (but at the expense of two machines), important that the machines use the minimal Dutch language to communicate.

In an even more realistic setting, the two people do not necessarily speak the same language. Consider, then, the more general problem of communication between a Dutch speaker and an English speaker. In order to have a computing machine automatically translate a Dutch text into English, a Dutch-to-English dictionary was required. Such a dictionary, while necessary, was insufficient to guarantee automatic translation. For, as van Wijngaarden noted, several “semantic ambiguities” arise during a word-to-word translation from Dutch to English: many Dutch words have more than one English translation, and it is often not clear which translation should be preferred. For example, the Dutch word “bank” can be translated either to the English word “bank” or to “bench”, depending on the context in which the Dutch word is used.

One way to resolve the problem of semantic ambiguity was to include the human in the translation process. For example, one could let the computer generate all possible translations, along with some notes, and then have a human post-editor manually produce the correct translation from the notes. Another possibility was to have a human pre-editor take the necessary manual steps such that semantic ambiguities could not arise during the automatic processing. But, to solve the problem in a completely automatic manner, van Wijngaarden suggested that the machine should be able to “learn” which word-to-word translation is the right one given the context of the word at hand; that is, given the neighboring words of the word under consideration. To do this, syntactic knowledge about both the source language (Dutch) and the target language (English) had to be encoded into the machine. This task, in turn, conflicted with the limited state of the art in 1952. In van Wijngaarden's words:
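To make the idea of choosing a translation from the neighboring words concrete, here is a minimal modern sketch. It is in no way van Wijngaarden's method: the toy dictionary, the hand-written cue words, the window size and all function names are my own assumptions, and nothing is "learned" here, since the cues are simply written down by hand. The function candidates corresponds to the list of possible translations that a post-editor would receive; choose picks one of them from the context.

```python
# Hypothetical toy dictionary: Dutch word -> candidate English translations.
DICTIONARY = {
    "bank": ["bank", "bench"],
    "geld": ["money"],
    "park": ["park"],
    "de": ["the"],
    "het": ["the"],
    "in": ["in"],
    "op": ["on"],
}

# Hypothetical cues: neighboring Dutch words that favor one translation.
CONTEXT_CUES = {
    ("bank", "bench"): {"park", "zitten"},
    ("bank", "bank"): {"geld", "rekening"},
}


def candidates(word):
    """All dictionary translations; unknown words pass through unchanged."""
    return DICTIONARY.get(word, [word])


def choose(word, neighbors):
    """Pick the candidate whose cue words overlap most with the neighbors.

    On a tie (including no cue at all) the first dictionary entry wins.
    """
    def score(translation):
        return len(CONTEXT_CUES.get((word, translation), set()) & neighbors)

    return max(candidates(word), key=score)


def translate(sentence, window=3):
    """Word-to-word translation using up to `window` neighbors on each side."""
    words = sentence.lower().split()
    result = []
    for i, word in enumerate(words):
        left = words[max(0, i - window):i]
        right = words[i + 1:i + 1 + window]
        result.append(choose(word, set(left + right)))
    return " ".join(result)


print(translate("de bank in het park"))
print(translate("geld op de bank"))
```

The output remains a word-to-word rendering, so word order and idiom are untouched; the sketch addresses only the choice between competing dictionary entries.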

Linguistics today is, from a mathematical point of view, still in its infancy. We have some intuitive ideas of syntax, but we hardly have any idea about the rules that we apply. [4, my translation]

In order to derive the rules and to use them in mechanical translation, van Wijngaarden suggested that the linguist and the programmer should work closely together.

From the 1950s onwards, van Wijngaarden's appeal to linguistic beauty inspired Edsger W. Dijkstra and other computer programmers at the Centre, paving the way for the design and implementation of the programming languages ALGOL 60 and ALGOL 68.

 

[1] G. Alberts, Jaren van berekening. PhD thesis, Universiteit van Amsterdam, 1998.

[2] G. Alberts and H.T. de Beer. De AERA. Gedroomde machines en de praktijk van het rekenwerk aan het Mathematisch Centrum te Amsterdam. Studium, 2:101-127, 2008.

[3] W.N. Locke and A.D. Booth, editors. Machine Translation of Languages. The Technology Press of The Massachusetts Institute of Technology and John Wiley & Sons, Inc. New York, 1955.

[4] A. van Wijngaarden. Rekenen en vertalen. Technical Report DR 8, Mathematisch Centrum Amsterdam, 1952.

[5] My footnote:

Such an ideal language amounts to having the 26 most frequently used words be one letter long, the following 676 most frequently used words be two letters long, etc. Moreover, van Wijngaarden noted that words of at most 4 letters suffice to capture all Dutch conversations. Such a language would be minimal in the sense that it discards all redundant combinations of letters which prevail in real spoken languages (cf. [4]). I hasten to remark here that van Wijngaarden was well aware that this was only half of the story. He knew that adding well-chosen redundant bits to an encoded message could also help increase reliability.
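The arithmetic behind this footnote can be checked with a short sketch. It assumes a 26-letter alphabet, as the numbers 26 and 676 suggest; the function names and the three sample words are my own and purely illustrative (the frequency ranking of the sample words is assumed, not taken from [4]).

```python
from itertools import count, product
from string import ascii_lowercase


def capacity(max_len, alphabet_size=26):
    """Number of distinct letter combinations of length 1 up to max_len."""
    return sum(alphabet_size ** k for k in range(1, max_len + 1))


def code_words(alphabet=ascii_lowercase):
    """Yield code words shortest first: a..z, aa..zz, aaa..zzz, and so on."""
    for length in count(1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


def assign_codes(words_by_frequency):
    """Pair each word (most frequent first) with the next shortest code word."""
    return dict(zip(words_by_frequency, code_words()))


print(capacity(1), capacity(2) - capacity(1), capacity(4))
print(assign_codes(["de", "van", "een"]))
```

With four letters there is room for 26 + 676 + 17,576 + 456,976 = 475,254 distinct words, which gives a sense of how large a vocabulary the minimal language could accommodate.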
