A Thinking Machine: On metaphors for mind

"The real question is not whether machines think but whether men do. The mystery which surrounds a thinking machine already surrounds a thinking man." – B. F. Skinner

The study of mind begins with a metaphor.

In the 20th century (and now on into the 21st) the metaphor that has dominated our study of mind is the computational metaphor.  The mind, they say, is like a computer.

But in what way like a computer?  In what respect, and in which dimensions?

The answer that has often been given – by Chomsky, say, and others – is that the mind is like a spreadsheet: a prefabricated architecture that follows a strict, rule-based program, that rigidly structures its inputs and outputs in just such a way.

If you should open a spreadsheet on your computer, you will find that all the functions it performs have already been coded in by a team of programmers.  Once you, as the user, have learned what these functions are, you can type in a series of inputs, press a button (or three), and the spreadsheet will spit you back an output.

There is nothing unpredictable about a spreadsheet.  To the contrary, everything is determinate.  Should you ask the spreadsheet to calculate a square root for you, or give you a statistical mean, it will give you the precise answer every time.  But, should you accidentally type “w” when you meant to type “2,” the program will fail.  (#EH?!#)
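
To make the contrast concrete, here is the spreadsheet's determinacy in miniature – a toy Python illustration, not actual spreadsheet internals:

```python
import math

def spreadsheet_sqrt(cell_value):
    """A 'spreadsheet-style' function: fully determinate, no tolerance for noise."""
    return math.sqrt(float(cell_value))

print(spreadsheet_sqrt("2"))  # 1.4142135623730951, the same answer every time

try:
    spreadsheet_sqrt("w")  # the accidental "w" typed in place of "2"
except ValueError:
    print("#EH?! the program fails")  # a determinate machine does not guess
```

The same input always yields the same output, and an input outside the preprogrammed rules yields no output at all.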

To read Chomsky is to understand that he envisions human language in just such a way.

“It is fair to say,” he writes, “that in any domain in which we have any understanding about the matter, specific and often highly structured capacities enter into the acquisition and use of belief and knowledge.”

The parallel is even clearer when he writes on universal grammar:

“We may think of the language faculty as a complex and intricate network of some sort associated with a switch box consisting of an array of switches that can be in one of two positions.  Unless the switches are set one way or another, the system does not function.  When they are set in one of the permissible ways, then the system functions in accordance with its nature, but differently, depending on how the switches are set.  The fixed network is the system of principles of universal grammar; the switches are the parameters to be fixed by experience.”

Human language, as so described, is underpinned by a universal structure: a pre-programmed grammar that organizes its inputs accordingly.  For a child, learning a language is akin to flipping a series of switches – or defining a set of variables for a spreadsheet.  The spreadsheet already contains the logical structure; all that is being input are the particulars.

By this suggestion, when we hear or read language, the computational principles of this innate grammar conduct a series of logical operations that parse the incoming stream into its component parts, and so yield understanding.

But what if this is simply the wrong metaphor?

What if – say – language is more like a search engine?

This is the subversive and potentially revolutionary idea posed by Michael Ramscar in this month’s edition of Cognitive Science.

A search engine is a probabilistic, predictive learning machine.  Unlike spreadsheets, search engines do not engage with their input in a determinate, preprogrammed manner.  Instead of rigidly structuring incoming information according to some prefabricated set of rules, they discover structure within information.

To use a search engine, you need not have memorized a laundry-list of rules.  Instead, you can learn to better your search over time, by narrowing down which inputs are likely to yield the results you want.  At the same time, a search engine learns to optimize outputs for its users by determining (probabilistically) what results are expected or desired given a particular search term.  For example, should you search for “mldy dye,” Google will still find me for you.

A search algorithm is powerful because of its ability to learn from – and sift through – a massive stream of data.  But it relies on relatively simple learning algorithms to accomplish this, rather than a sophisticated, rule-governed architecture.  And it produces probabilistic, rather than determinate output.
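
The "mldy dye" example can be caricatured in a few lines.  The sketch below is emphatically not Google's algorithm – the index, the counts, and the scoring rule are all invented – but it shows the basic idea: rank known terms by similarity to the query, weighted by what users have chosen before, and return a probabilistic best guess rather than a rule-checked parse.

```python
import difflib

# A toy "search index": terms the engine has seen, with how often users chose them.
# (Terms and counts are invented for illustration.)
index = {"melody dye": 120, "moldy rye": 3, "melody": 40}

def best_guess(query):
    """Return the most probable intended term: string similarity to the query,
    nudged by observed popularity -- a bare-bones 'did you mean'."""
    def score(term):
        similarity = difflib.SequenceMatcher(None, query, term).ratio()
        return similarity * (1 + index[term]) ** 0.1  # popularity breaks near-ties
    return max(index, key=score)

print(best_guess("mldy dye"))  # melody dye
```

A real engine learns these weights from billions of queries; the point here is only that the output is a graded best guess, and a misspelled input degrades the answer gracefully instead of crashing it.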

This metaphor gives rise to a fundamentally different view of language: one in which language acquisition relies on relatively simple – yet powerful – learning mechanisms, and in which language comprehension and production are fundamentally predictive, rather than determinate.  In marked contrast to the traditional metaphor of mind, it does not suggest that complex innate hardware (a “universal grammar” or “language acquisition device”) must be assumed to account for how children learn language.  Rather, it suggests that structure resides in the linguistic environment and is there to be discovered.

Of course, as Michael readily notes, which metaphor works best cannot be resolved by fiat; it is an open empirical question.

Even posing the question, however, goes against the grain of the last five decades of linguistic and psychological research.  Since the dawn of the “cognitive revolution” led by Chomsky in the fifties, many psychologists and linguists have assumed that a universal grammar (UG) is necessary to account for language acquisition, development and use.  Popular books such as “The Language Instinct” would lead you to believe that the debate is long since settled.  But this is far from the case.

The supposition that there is a universal grammar has been repeatedly challenged by findings in comparative linguistics, in computational linguistics (and in particular, natural language processing), in corpus analyses, in philosophy, in neural network modeling, in work on learning theory, and in experimental psychology.  And yet, the approach and assumptions that characterize the Chomskian approach to language, and the old computational metaphor, continue to pervade research into language and learning, particularly in psychology.  The old guard has not given up the ghost.

Why is this important?

The ways in which we set up the problems that we research are governed by the assumptions that we bring to bear on them.  And when we set up the problems in fundamentally the wrong way, it becomes near-impossible to make substantive progress.  Think, for a moment, about how hard it would be to prepare a tasty meal if you completely disregarded the quality and freshness of your ingredients.  Even with the best of recipes, it’s hard to get around the fact that you’re working with rotten meat and spoiled milk.  Similarly, we can throw all the math and all the money we want at research, but if we’re tooling around with bad ideas, it’s unclear what we’ve set out to accomplish.

The decades of research into the supposed mental architecture of language have yielded little in the way of practical import.  As Chomsky himself was quick to admit, he didn’t think that “modern linguistics [could] tell you very much of practical utility.”  And yet, it is clear that applying the simple principles of learning theory to language can not only yield great insight into the workings of language – which may be of interest to philosophers and linguists – but can also substantially contribute to our understanding of how children learn language, and point to practical interventions to speed language learning.

For example, in the lab I work in at Stanford with Michael, we have found that a simple property of word-learning can help children rapidly learn color and number words.  Both colors and numbers are notoriously hard for children to grasp, and typically take many years to master.  Yet in both studies, we found that we could effect improvements that would normally take place over a time scale of months within a fifteen-minute training period.  We have applied the same principles to help three-year-olds learn to pass the DCCS and tests of false-belief understanding, and have shown how learning models can explain how infants come to succeed at the A-not-B task.  We have similarly applied these principles to show why perfect pitch is so rare in the general population; why adults have so much difficulty learning new languages; and why children go through a stage in which they tend to over-regularize irregular plurals.  If all of that sounds like “so much Greek” to you, let’s just say that we’ve found answers to, and solutions for, a number of seemingly insoluble learning problems that have long vexed developmental psychologists.  And it's not just because we're clever: we’ve done it with simple models and simple behavioral interventions that anyone could adopt.
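
The “simple models” here are error-driven learning models in the Rescorla-Wagner tradition (see the Rescorla reference below).  A minimal sketch of the update rule – the cues, outcomes, and numbers are invented for illustration:

```python
def rescorla_wagner(trials, all_cues, rate=0.1):
    """Error-driven learning: on each trial, the associative weight from every
    present cue to the outcome shifts in proportion to the prediction error."""
    w = {c: 0.0 for c in all_cues}
    for cues, outcome_present in trials:
        prediction = sum(w[c] for c in cues)
        error = (1.0 if outcome_present else 0.0) - prediction
        for c in cues:
            w[c] += rate * error
    return w

# 'blue' and 'cup' both co-occur with the label, but 'blue' also appears
# without it -- so 'cup' wins the cue competition and ends up the better predictor.
trials = [(["blue", "cup"], True), (["blue"], False)] * 50
w = rescorla_wagner(trials, ["blue", "cup"])
print(w["cup"] > w["blue"])  # True
```

Cue competition of this kind is the mechanism our training studies exploit: reordering what predicts what can make an informative cue win out, without any change to the rules the child is assumed to know.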

In theory, the models we use could be applied to a near-endless list of problems and questions in language learning.  And yet, we are one of the only labs in the States to apply learning theory to language.  For the most part, learning theory has been completely forgotten by modern developmental psychology.  This is a shame, given that its practical utility is undeniable.  Claims that language as such is “unlearnable” continue to dissuade most researchers from looking into the question.  And yet, these claims stem from a computational metaphor that is outdated and, for the most part, wholly ignored by modern research into natural language processing (when you dial GOOG-411 or any other automated voice processor, you can bet they’re not using generative, rule-based models of language to parse what you’re saying).

If you're interested in the debate that's been raging, I've listed some introductory reading below.  But in the upcoming weeks, I'm less interested in covering the blow-by-blow of nativist-empiricist arguments than in introducing you to what I think are some of the exciting new research avenues in child language research.  The practical import of these discoveries is far more compelling than even the most persuasive arguments I might think to make.

Recommended Reading

Chomsky, N. (2000). New horizons in the study of language and mind. Cambridge, England: Cambridge University Press.

Cowie, F.  (2008)  Innateness and Language.  Stanford Encyclopedia of Philosophy.

Evans, N., & Levinson, S. C. (2009). The myth of language universals: Language diversity and its importance for cognitive science. Behavioral and Brain Sciences, 32(5). PMID: 19857320

Ramscar, M. (2010). Computing Machinery and Understanding. Cognitive Science, 34(6), 966-971.

Rescorla, R. (1988). Pavlovian conditioning: It's not what you think it is. American Psychologist, 43(3), 151-160. DOI: 10.1037/0003-066X.43.3.151

Roediger, R.  (2004)  What Happened to Behaviorism?  APS Observer.

Scholz, Barbara C. and Geoffrey K. Pullum (2006) Irrational nativist exuberance. In Robert Stainton (ed.), Contemporary Debates in Cognitive Science, 59-80. Oxford: Basil Blackwell.

24 responses so far

  • physioprof says:

    Cool-ass post! When I was a grad student in the late 1980s, we argued about this shit all the fucken time. I'm glad to see people still arguing about it!

  • Ryan says:

    As a linguistics undergrad with an intense interest in theory, I have to say that I really enjoyed this post, and I do hope there's more like this to come!

  • Jason G. Goldman says:

    It isn't clear to me how this probabilistic model is necessarily contrary to an innate mechanism. A mind could be prepared with a very small set of rules that it can then use to extract the linguistic patterns from its environment. That is, there could be an innate mechanism in place that then undergoes "tuning" on the basis of experience.

    • melodye says:

      @JG. Let me clarify: it's definitely not contrary to an innate mechanism. It's contrary to the supposition of a universal grammar, which is a fairly specific idea about what that innate endowment amounts to. The claim that there is an innate grammar also typically leads to the claim that there are innate concepts (cf. Chomsky or Fodor).

      Here's some more classic Chomsky on the topic:

      "It is often argued that experience, rather than innate capacity to handle information in certain specific ways, must be the factor of overwhelming dominance in determining the specific character of language acquisition, since a child speaks the language of the group in which he lives. But this is a superficial argument. As long as we are speculating, we may consider the possibility that the brain has evolved to the point where, given an input of observed Chinese sentences, it produces (by an induction of apparently fantastic complexity and suddenness) the rules of Chinese grammar, and given an input of observed English sentences, it produces (by, perhaps, exactly the same process of induction) the rules of English grammar; or that given an observed application of a term to certain instances, it automatically predicts the extension to a class of complexly related instances. If clearly recognized as such, this speculation is neither unreasonable nor fantastic; nor, for that matter, is it beyond the bounds of possible study. There is of course no known neural structure capable of performing this task in the specific ways that observation of the resulting behavior might lead us to postulate; but for that matter, the structures capable of accounting for even the simplest kinds of learning have similarly defied detection."

      "there is good reason to suppose that the [nativist] argument is at least in substantial measure correct even for such words as carburetor and bureaucrat, which, in fact, pose the familiar problem of poverty of stimulus if we attend carefully to the enormous gap between what we know and the evidence on the basis of which we know it. The same is often true of technical terms of science and mathematics, and it surely appears to be the case for the terms of ordinary discourse. However surprising the conclusion may be that nature has provided us with an innate stock of concepts, and that the child’s task is to discover their labels, the empirical facts appear to leave open few other possibilities."

      These are arguments (or claims) that are very much in line with the traditional spreadsheet metaphor.

      • Note, however, that the position in the first paragraph in no way implies the claim in the second (to my eye, the second paragraph is a wild extrapolation of the first, not really even suggested by it), nor the existence of a fixed 'Universal Grammar' that is like a grammar of an ordinary language such as Spanish, but supposedly universal. A bias to discover certain kinds of rules in the presence of evidence does not imply that there's any particular set of rules that will always be discovered (if the evidence for them is not there), and a list of innate concepts need not include things like 'carburetor'. For example Anna Wierzbicka's 'Natural Semantic Metalanguage' has about 60 innate concepts, but one like 'carburetor' would be defined in terms of these.

        That there must be a bias is a logical necessity, since there’s no single way to go from a finite body of experience to a general system that applies to an infinite range of situations. But how best to describe whatever it is that’s producing the bias is, of course, a very hard problem, and current ideas about UG might well be severely underpowered, like trying to explain chemical valence with hooks and eyes.

        • melodye says:

          Reasonable as your last paragraph is, there's another alternative -- which is that UG is simply asking the wrong questions, because its characterization of language is wrong. Pullum and Scholz have done some excellent scholarship on the claims that underpin UG (see e.g., Recursion and the Infinitude Claim; Irrational Nativist Exuberance; Empirical assessment of stimulus poverty arguments, and so on).

          • One problem with the empirical assessment paper is that it seems to ignore the point that no finite amount of data can determine an infinite system without some sort of bias, which is I think why Baker et al. talked about the "logical problem of language acquisition" in the early '80s, an issue that P&S seem to have trouble with. It's a 'logical problem' because you can't fix it by showing that more and better data is available to the child: no matter how much there is and how high its quality, it's still not enough to determine the actual system acquired without some kind of bias (a sort of point made by Wittgenstein in the early parts of the Investigations, iirc).

        • melodye says:

          Here's a direct quote pulled from Pullum's article "Creation myths of generative grammar..."

          "It is very widely believed that Syntactic Structures gives a proof that English is not finite-state. This is not true. A few informal suggestions are made to support the assertion that ‘English is not a finite state language’ so that ‘it is impossible, not just difficult, to construct a device of the [finite automaton] type . . . which will produce all and only the grammatical sentences of English’ (p. 23). But there was no proof; and it is not clear that a proof anything like the one Chomsky seems to have had in mind can succeed." (p. 242)

          • Good paper, but it's about creation myths, not current work. Even Howard Lasnik's current work isn't dependent on them, let alone some of the more radical reformulations of generative theory such as LFG and HPSG (both centered at Stanford), many forms of categorial/type-logical grammar, Role and Reference Grammar, etc. Well, that there are so many of these is itself a problem, but a different one ...

          • As for finite state grammars, the real reason they're not enough for syntax is that they don't provide a basis for generalizing from data such as (glosses for some imaginary SOV language similar to, say, Lakhota):

            (a) Mary dog the chased `Mary chased the dog'
            (b) cat the dog the chased `the cat chased the dog'
            (c) cat the Mary chased `the cat chased Mary'

            The finite state word-class grammar for the first two sentences would be {CN Det|PN} CN Det V, providing no basis for expecting the third sentence, whose pattern is CN Det PN V. Any decent field linguist would expect (c), and usually they would be right (they would also check for it, to make sure that nothing weird was going on).

            If you try to do a finite state grammar for a simple fragment of English, you wind up with a fairly horrible mess in which a fairly complicated pattern gets repeated in about 5 or 6 places (the main positions where NPs occur), so it looks pretty bad, and is clearly a hopeless account of what is learned. Also, it's not at all clear how to get decent semantic interpretations for recursive NP structures like the ones on the entertaining 'lakes and islands' page:

            (hope the comment software doesn't break the link)

  • Paul Murray says:

    This is really astonishing, that scientists and philosophers would discount the "growth" part of, well, of growth. Our DNA doesn't have a map of every nerve and blood vessel encoded into it. Our retina and its nerves grow into place, and our brains *learn* to interpret the sensations from them as sight. We *learn* to walk - working out which micro acts of will cause our limbs to move as we wish.

    The idea of a universal grammar - surely it must have occurred to them that part of the reason human languages have underlying similarities is that we all deal with the same external reality: the physical universe, and other people. If that's true, then we would expect that languages invented by beings in completely different nonphysical realities would have structural differences. If a computer program ever achieved sci-fi "sentience", it would think in ways very different to us, because its universe is not this physical universe of objects that we inhabit. If it invented a language (let's say these sentient programs started communicating), that language would potentially be structurally different. Nouns, for instance. Whenever we talk about anything - well, my use of the word "thing" gives it away. We use "physical object" as a metaphor for *everything*. What would a language built by communicators that didn't learn to pick up and handle objects in babyhood look like?

  • GrayGaffer says:

    Makes a lot of sense. Clearly we do not all speak precisely the same language, nor share precisely the same understandings. This underlies the continued failure of 100% machine speech recognition, alongside the elusive contributions of "common sense" in speech understanding.

    I don't think one can construct a probabilistic machine on top of a deterministic rule-based machine. I think it is more of an associative machine with shades of input-output relationships (aka meanings). Auto-correlation, not if-then. Which makes it all the more amazing (to me) that we humans have managed to build these remarkable disciplines of thought and knowing (like Logic, Mathematics, Scientific Method) on top of such shifting sand foundations.

    Search engine. So one of these days is Google going to achieve self-awareness? Asking and answering its own questions? I feel a story coming on ... (but in a way already done; see "Stand On Zanzibar" by John Brunner - Jeez! what an imagination I've got!) (and let's not forget Dwar Ev's experience ;)

  • Will says:

    I find this post strange to read. I think I have a more flexible view of the 'universal grammar', and that means that I see no contradiction between that 'Universal Grammar' and learning - it is a false dichotomy. The confusion extends to the examples as well: I could program a spreadsheet to implement a search engine or other learning rule. 'Learning' can be a deterministic process: given a particular set of inputs (an agent's entire history) you might always get the same output (what that agent 'knows'). (Yes, you can make non-deterministic learning rules too, but your discussion above doesn't rule out deterministic ones.)

    If one interprets the 'universal grammar' as a learning bias, or Bayesian prior, over the space of possible languages, then learning and 'universal grammar' cleanly fit together. The questions then become 'What prior do humans use for learning language structure?' and 'How do humans do their language processing efficiently?'

    Your quote from Chomsky in bold in the comments seems to be completely consistent with this viewpoint. There is a prior over the space of languages (which will include a prior over the space of language structures). If you give someone examples of one particular language then they'll converge to knowing/using the corresponding language structure.

    • melodye says:

      Will -- darling. A Bayesian! On *my* blog. This is really something. I feel that I am a 19th century imperialist and have just caught myself a rare and exotic peacock. (I may have been drinking Pernod in advance of this posting).

      But in all seriousness: Your comment (above) is a classic example of how the way in which we set up the question dictates the answers we expect. Psychologists and linguists have long thought of language learning as a question of how a child learns the syntax of a language, while ignoring outright the question of how that same child learns meaning. "Universal grammar" assumes that language is like a spreadsheet, and that learners converge on the same communication system based on the same determinate syntax.

      But the article referenced above is not about whether learning is deterministic -- it's about whether communication is; and the article argues that it isn't. If children don't learn a determinate syntax, and if communication isn't deterministic, then many of the strong claims about innateness that have been made (which are based on the assumption that communication is deterministic), well -- they may be somewhat beside the point!

      To put it another way, the kind of prior that you assume a child has with respect to learning is going to depend (quite a lot) on what you think that child has to learn. In language learning, this will hinge on what you take language to be in the first place. The article is about this question. To be fair, this question is rarely considered at all -- the usual debate in psycho-linguistics seems to take determinate syntax for granted, and go round and round about whether it is learnable or not.

      *I <3 Bayesians. But it's not the case that pretty math helps solve problems when they're not set up right. Hence the ingredients rant you'll find above.

  • Dorid says:

    The idea of a universal grammar – surely it must have occurred to them that part of the reason human languages have underlying similarities is that we all deal with the same external reality: the physical universe, and other people. If that’s true, then we would expect that languages invented by beings in completely different nonphysical realities would have structural differences.

    I haven't read Chomsky, but I'd be fairly willing to wager that not only is the concept of "universal grammar" flawed because language similarities are a natural outgrowth of similarities in culture, but that the languages studied didn't come from a terribly diverse cultural environment... nor was there any attempt to correlate the differences in grammar to differences in culture.

    I'll be pulling some of his papers now to test that theory.

  • skagedal says:

    Thank you, interesting post and nice blog! You introduce the post with a Skinner quote. I'd love to hear more about how Ramscar's theory relates to Skinner's "Verbal Behavior", and also to relational frame theory.

    A note: The link to the DOI for Ramscar's article under "Recommended Reading" is broken.

  • Andrew says:

    For what it's worth I applaud the sentiment in this post. But as a perception-action researcher I feel obliged to add 'and all this implies getting serious about how perception actually works'. If you're going to tell a story about how a child can get to the structure of a language, you're going to need a good story about how perception could possibly provide you access to it.

    Unfortunately, cognitive psychology theories about perception suffer from claims about the 'poverty of stimulus' as much as language did for Chomsky; killing that idea off will be very hard work.

    Given your angle with respect to language, I'd be interested to hear your take on perception - do you have a take? Or do you not take a stand? My hunch is the latter, because most of the cognitive people (whatever flavour) I know never think about how perception affects what problems their clever representational structures need to solve, let alone whether they're even required. But I'd be interested to hear.

  • Efrain says:

    Great post, Andrew. I just wrote an honors thesis on topics like this called "Neuroscience and Ethics," where I took the perception-action (or, as I called it, embodied dynamicist) perspective and argued against the cognitivist tradition for many reasons. I am now a neuroscientist, having finished my undergraduate work.


    All I can really say is that if you're still reading and arguing based on Chomsky, you're not going to go too far in the research world, since you're basically limiting yourself to the ideas of the past. Neuroscience research is about making new ideas! Hence all of my work is based on philosophy done within the last 5-10 years, not decades ago. If you expanded your horizons a little, you may begin to read about why the Chomskian framework won't take you far. But then again, cognitivists aren't too willing to move beyond the archaic ideals laid down by Descartes.

    read more contemporary philosophy!

    • Andrew says:

      I would never advocate only reading the recent stuff; I firmly believe you need to know your history or else be doomed to repeat it. But yes, you need to know when it's time to move on as well 🙂

  • Ian Leslie says:

    Very interesting to this non-expert. Would this be relevant? http://bbc.in/9TBRqr
