Mummy's Little Grammatical Genius

How babies learn to use language.

A special note of caution to students. My description of how a baby probably acquires language is speculative and is not part of any well-established theory. If you are studying for exams, you should not rely on any of my 'facts' except where I have given links to external documentation.

How Do Babies Learn Language?

A baby is born with no language abilities, other than the ability to make a few simple sounds. Within a few years it has learned to hold a conversation. How is it that a baby is able to learn such a skill? Any scientific explanation of how a baby learns language must incorporate explanations in at least three areas of linguistics: phonology, syntax and semantics. At a higher level - the level of socially effective communication - other factors such as psychology and pragmatics need to be incorporated into the theory.

Many of the technical terms used by linguists are used differently in different schools of linguistic thought. For that reason, I shall clarify my own specialist use of technical terms as they arise. Phonology deals with the sounds of speech, syntax deals with the sequences of sounds in speech and semantics deals with the transmission of information by the use of speech. Human language is an information transmission system. Language does not have meaning - it carries meaning.

What is language, that a baby may learn it?

Human language is a socially developed tool. Every sound, every word, every meaning assigned to a sound, word or phrase is merely a social convention, a social habit. We are born into a language-using society and so we automatically and unthinkingly accept our society's conventional ways of using language. Those conventions arose over the course of centuries. In any human group, people tend to copy some aspects of each other's behaviour so as to form a group identity. As we conform to social norms, by that very act we reinforce social norms. An aspect of language evolution that is too often overlooked needs to be mentioned: a language can never evolve through adult use into such a complex mechanism that babies are unable to learn it. In a very real sense, a new-born baby is the final arbiter of 'correct' use of language.

Language is just one specific form of socially normalised human behaviour. But it is so different from other forms of behaviour that we tend to see it as a 'thing' having a separate existence from its users. Language is also seen, not as a tool of society or a product of society but as the identifying characteristic of a society. Human groups identify members and non-members primarily by the use of language, for good or ill. Social groups might use a special password, but for language-using groups, language use is itself the password, or shibboleth.

Then said they unto him, Say now Shibboleth: and he said Sibboleth: for he could not frame to pronounce it right. Then they took him, and slew him at the passages of Jordan: and there fell at that time of the Ephraimites forty and two thousand.

The fact that different groups speak different languages, or at least different dialects, is strong evidence that babies are not 'hard-wired' to speak any particular variety of language. A genetically coded, hard-wired language would be universal, but there is little of human language that is truly universal. All human babies produce the same range of speech sounds. The sound production mechanism, or phoneme module, is a language universal. A baby doesn't learn the 'extra' sounds of a particular language, but rather, the production of sounds which are not in the parent language falls into disuse. In some cases, sounds which are not heard as different in the parent language may be produced indifferently, the classic example being the sounds of l and r in Japanese. In Japanese, the difference in the sounds of l and r is not of any semantic value - the difference is truly meaningless.

There have been various suggestions about how a baby might have a universal mechanism for acquiring syntax in a similar fashion to phonemes. The suggestion is that a baby has a universal grammar which can learn the word sequencing rules of the parent language and discard or disregard the rules which don't apply. I do not subscribe to that, or any broadly similar theory. I suggest that a baby is born with an ability to acquire the sounds of a language, the regular patterns of those sounds that we call words and the sequences that we call syntax. There is, I believe, no 'hard-wired' grammar.

The bootstrap problem.

If a baby has no hard-wired grammar, how can it ever learn more than mere babbling? How can it learn to put sounds into sequences to form words, and words into sequences to form sentences? We are so used to the idea of learning new words through explanation or demonstration that it appears impossible to learn the meaning and use of any word without using words as a learning tool. Learning language without using any tool of language seems like trying to pick yourself up by your own bootstraps. That is why the problem of a child's first language acquisition steps is called the bootstrap problem.

A baby is not a miniature adult. The gestation period of any animal is optimised by natural selection - optimised as the best match between the needs of a mother and the needs of a baby for survival. A too immature baby is too vulnerable, but an over-mature baby is larger - it imposes a multiple extra burden on the mother. The mother has to carry and internally nurture the baby for longer; the baby is heavier, posing a greater mechanical stress on the mother; the baby is larger, posing a greater birth risk to the mother. A human baby is no exception to the evolutionary rule: it is helplessly immature when born. Most specific to the topic of language acquisition: a baby's speech production organs are not mature enough at birth to make all possible speech sounds.

There are two sides to language use: speaking and hearing. We know that a baby, even if 'hard-wired', could not produce many of the sounds of adult speech. What about hearing? A baby cannot tell us what it finds interesting, but a baby's sucking rate correlates interestingly with its attention to novelty in the environment. Various experiments based on this High Amplitude Sucking observation suggest that a baby's hearing is adequate to discriminate meaningfully different speech sounds, that is to say, phonemes. This suggests an approach to the bootstrap problem.


A stochastic approach to bootstrapping.

Building a sentence is a matter of selecting words according to the information to be conveyed and assembling them to form syntactic units - conventional sound strings. One of the simplest facts to grasp about language is that all people tend to use some words much more frequently than they use others. George Kingsley Zipf used an analogy with tools - tools which are used most frequently are kept within easy reach, and tend to be smaller. There is a feedback mechanism here - words which are heard more frequently tend to get used more frequently. This introduces an element of predictability or probability into language use. Language is not just a matter of syntax and semantics - it is also a stochastic process.

In all languages the distribution of words by frequency of occurrence lies within a closely bounded area. In general, if we list all of the words in a large enough sample by frequency of occurrence, there is a strong correlation between position in the list, or rank, and frequency. This property of human language is known as Zipf's Law. It has proven to be a valid observation about all human languages. It was once thought not to apply to Chinese, but this was due to a false assumption that each Chinese character is a stand-alone word. That error has since been corrected - the Chinese language conforms to Zipf's Law.
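The rank and frequency listing described above can be sketched in a few lines of Python. This is only an illustration, not a part of any program described in this article: the function name rank_frequency, the sample sentence and the simple letters-only tokeniser are all my assumptions. A sample this small cannot show the regularity itself; a whole book is needed for that.

```python
import re
from collections import Counter

def rank_frequency(text):
    """Return (rank, word, count) triples, most frequent word first."""
    # Assumption: a 'word' is any run of letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # most_common() sorts by descending count; ties keep first-seen order.
    return [(rank, word, count)
            for rank, (word, count) in enumerate(counts.most_common(), start=1)]

sample = "the cat saw the dog and the dog saw the cat run"
table = rank_frequency(sample)
for rank, word, count in table:
    # For a large corpus, Zipf's Law predicts that rank * count
    # stays roughly constant down the list.
    print(rank, word, count, rank * count)
```

Run on a large text, the last column is the one to watch: the closer it stays to a constant, the better the sample fits the idealised form of Zipf's Law.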

I suggest that any computer model aimed at simulating a child's acquisition of language must use a stochastic model. This does not imply the use of mathematical functions to determine probabilities - I very much doubt that the human brain performs sophisticated mathematical functions during ordinary language processing at any level.

Buckets of words.

There is a standard test of language comprehension which uses templates, a sort of 'words and slots' pattern. For example, the incomplete sentence "The girl went to the shop to __ a newspaper." might be used. The examinee fills in the blank. The examiner expects the examinee to provide a verb. This template method can be used to analyse text corpora to find nouns, adjectives, verbs, adverbs etc.

Using the template method it is possible to have a computer program produce highly accurate lists of parts of speech from unedited text corpora. The program can do this job unsupervised, that is to say, with no user intervention. The reason it can work unsupervised is that the templates, provided by the human programmer, function in the place of overt human supervision.
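A minimal sketch of the template idea follows, assuming that a template is nothing more than a pair of fixed words with one slot between them. The two templates, the labels and the function name fill_slots are hypothetical and chosen only to fit the example sentence above; a real program of this kind would need many more templates, and this sketch ignores sentence boundaries.

```python
import re
from collections import defaultdict

# Hypothetical hand-written templates:
# (word before the slot, word after the slot) -> category label.
TEMPLATES = {
    ("to", "a"): "verb",
    ("the", "went"): "noun",
}

def fill_slots(text, templates=TEMPLATES):
    """Collect every word found in a template slot, grouped by label."""
    words = re.findall(r"[a-z']+", text.lower())
    found = defaultdict(set)
    # Slide a three-word window over the text: left word, slot, right word.
    for left, slot, right in zip(words, words[1:], words[2:]):
        label = templates.get((left, right))
        if label is not None:
            found[label].add(slot)
    return dict(found)

corpus = ("The girl went to the shop to buy a newspaper. "
          "The boy went home to read a book.")
result = fill_slots(corpus)
print(result)
```

Note where the knowledge lives: the program never inspects the slot word itself. Everything it 'knows' about verbs and nouns was put into TEMPLATES by the programmer.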

Babies are, I am certain, not born with their tiny brains filled with templates. And yet they sift through a huge amount of input language and sort the words into appropriate syntactic categories. We know that they sort words into mental buckets, but where do they get the buckets? I conclude from my experiments with computer programs that the answer lies mainly in the stochastic element of language.

All research on child acquisition of words focuses on the child's overt use of, and focus on, the content words of language - the nouns, verbs, adjectives and adverbs. I suggest that a baby can only discriminate these words from a stream of speech sounds by first finding the grammar words. This is counterintuitive - children have no intuitive grasp of the distinction between grammar words and content words. However, we know that children group words into the categories which we might call nouns, verbs etc. They have the categories - it is only the labelling of those categories that they lack. There is a valid reason for this - a baby can have a set of word buckets into which words are dropped according to simple stochastic rules, and any labelling of the buckets is irrelevant.
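The idea of unlabelled buckets can be illustrated with a small Python sketch. It rests on assumptions of my own making for this example: that the most frequent words in a sample can stand in for grammar words, and that a content word is dropped into the bucket of the grammar word that immediately precedes it. The name build_buckets and the toy text are hypothetical.

```python
import re
from collections import Counter, defaultdict

def build_buckets(text, n_grammar=2):
    """Bucket each lower-frequency word under the frequent word before it."""
    words = re.findall(r"[a-z']+", text.lower())
    # Assumption: the n_grammar most frequent words act as grammar words.
    grammar = {w for w, _ in Counter(words).most_common(n_grammar)}
    buckets = defaultdict(Counter)
    for prev, word in zip(words, words[1:]):
        # A bucket is identified only by the frequent word; it has no label.
        if prev in grammar and word not in grammar:
            buckets[prev][word] += 1
    return grammar, buckets

text = "the dog can run. the cat can jump. the bird can sing."
grammar, buckets = build_buckets(text)
for key in sorted(buckets):
    print(key, sorted(buckets[key]))
```

In this toy sample the bucket after 'the' collects what an adult would call nouns and the bucket after 'can' collects verbs, yet nothing in the code names either category.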

Building the buckets.

The human short-term memory can hold only limited data. It appears to operate in such a way that the most recently stored items are the most easily accessed. I suggest that, for language, there exists a short term memory for storing phoneme strings as words and a separate short term memory for storing word strings as phrases. In each of these memories, it is the last part of the string that is first matched as a pattern with items in long-term memory.


The LAD program - a Language Acquisition Device

A computer simulation of the stochastic method matches patterns of sentence-final word pairs and the letters at the ends of those words. A book in ASCII text format is read into computer memory. The words in the book are read into a lexicon and a counter for each word keeps track of the frequency of occurrence of that word. The lexicon is sorted by frequency and only the few highest frequency words are retained as an ordered list. These high-frequency words are used with the end of sentence marker - the stop or period - to find high frequency sentence-final word pairs. A second program function searches for high-frequency word-final letter sequences - the suffixes of the target language.
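The steps just described can be sketched roughly as follows. This is not the LAD program itself, only a simplified illustration under stated assumptions: a sentence-final pair is taken to be the last two words before a stop, with the first of the two being a high-frequency word; endings are counted at a fixed length over distinct words; and the names top_words, final_pairs and final_letters, the parameters and the sample text are all hypothetical.

```python
import re
from collections import Counter

def top_words(text, keep=3):
    """Build a lexicon of word counts and keep only the most frequent words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w, _ in Counter(words).most_common(keep)]

def final_pairs(text, frequent):
    """Count sentence-final word pairs whose first word is a frequent word."""
    pairs = Counter()
    # Assumption: every stop ends a sentence (abbreviations are ignored).
    for sentence in text.lower().split("."):
        words = re.findall(r"[a-z']+", sentence)
        if len(words) >= 2 and words[-2] in frequent:
            pairs[(words[-2], words[-1])] += 1
    return pairs

def final_letters(text, length=2, min_word=4):
    """Count word-final letter sequences of fixed length over distinct words."""
    distinct = set(re.findall(r"[a-z']+", text.lower()))
    return Counter(w[-length:] for w in distinct if len(w) >= min_word)

book = ("She walked to the shop. He talked to the girl. "
        "They jumped over the wall. We looked at the door.")
frequent = top_words(book)
pairs = final_pairs(book, frequent)
endings = final_letters(book)
print(frequent)
print(pairs.most_common())
print(endings.most_common(1))
```

Even in this tiny sample the ending 'ed' comes out as the most frequent word-final sequence, and every sentence-final pair begins with 'the'. The code contains no list of words or suffixes; both results come from counting alone.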

The method has been tested for English, French, German, Spanish, Italian and Latin. In each case, it creates word categories which a human observer would identify as components of the grammar of the target language. But the program itself is purely stochastic - it has no built-in words or parts of words, no syntax and absolutely no semantics.

A baby has a tremendous advantage over a computer. A baby has eyes and hands. A baby can see an object and frequently can grasp the object. The combination of vision with the haptic senses is exceedingly powerful. A baby can build an internal model of an object. Having already acquired some words and stored them, the baby can begin to learn how to use these words to label internal models. The internal model is visuo-haptic, so when a baby uses a word it knows what the word 'means'. 'Cup' means the visuo-haptic sensation aroused by handling a cup, or the mere memory trace of having handled a cup.

Concluding remarks:

I conclude from my studies and my computing experiments that the first-stage acquisition of language is a stochastic process of first distinguishing the words and suffixes which convey grammatical information. This creates the templates, the word-buckets into which content words naturally fall. Stochastic analysis of input speech samples is sufficient to account for the acquisition of content words. It is not, however, sufficient to account for the semantic application of content words.

A baby can do something that a computer program cannot yet do - it can put words to use as tools. All that is needed is to match a string of phonemes, already stored in a category, to an object, action or property observed in the environment. The matching of a word to a meaning is a semantic function. A baby is not just a stochastic language-analysis machine. A baby is a world-class expert in semantics!