Epic Failures in Language as Prediction

Nov 20 2010 | From the Melodye Files

Am happy to announce that my month-long hiatus from blogging -- spent on a whirlwind tour of Chicago, New York, Portland & St. Louis -- is finally coming to an end. Before jetting off for Psychonomics, Prof Plum and I filmed a spot for Science Saturdays, which is now online! (Good!) It has also been more or less universally panned by a clique of rabid commenters over at BloggingHeads. (Less happy-making -- mrarm -- but possibly deserved.)

So, with some hesitation, I am linking to the video, with time notations listed below. If you do end up watching, please be aware that there were some serious technical difficulties midway through the conversation, particularly between the 37- and 51-minute marks. Plum's headset was -- how shall I say? -- utter crap, and made a very noisy channel all the noisier. And then there were the construction crews and trash trucks. And surely things weren't helped by my undeniably terrible listening-comprehension skills; in grade school, I routinely scored below the 10th percentile on listening comp, alongside the ESL kids.*

What's interesting about all this is that we didn't spend much time talking about language as prediction -- but the video is, in its own gentle way, a study in how comprehension is a function of prediction (and how failure to predict = failure to comprehend). There is research to suggest that in most phone conversations, people use highly predictable, repetitive speech -- recycling canned phrases, talking about well-worn (familiar) topics, and so on. The reason? The limits of the channel. In a phone call, not only is the sound wave attenuated, or even broken down and delayed in places, but we miss all of the attendant visual cues -- the movement of the lips, subtle changes in expression, gesticulation with the hands, and so on. All of which is to say that BloggingHeads is an interesting experiment in communication, because it demands conversation of a certain caliber and complexity, while imposing real limits on the two communicators.
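To make the noisy-channel point concrete, here is a toy sketch (entirely my own illustration -- the vocabulary and its frequency counts are invented): a listener who catches only part of a word can still recover it by combining what survived the channel with a prior over likely words. The more predictable the word, the more degradation it survives -- which is one reason noisy phone lines push us toward predictable speech.

```python
# Toy noisy-channel decoder (illustrative only; the vocabulary and
# counts are made up). The listener fills in unheard characters by
# picking the most frequent word consistent with what got through.
from collections import Counter

prior = Counter({"dinner": 50, "dimmer": 2, "sinner": 1})

def decode(partial):
    """partial: the word as heard, with missed characters as '_'."""
    candidates = [w for w in prior
                  if len(w) == len(partial)
                  and all(p in ("_", c) for p, c in zip(partial, w))]
    # The best-predicted (highest-prior) consistent word wins.
    return max(candidates, key=lambda w: prior[w], default=None)

print(decode("di__er"))  # -> 'dinner': prediction fills in what the channel lost
```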

In the video we recorded, Prof Plum decided to focus on a topic (capacity limits) that we had actually never spoken about before, in the three years we've worked together. For my part, I thought he was going someplace completely different with the conversation, which, as you'll see, leads to all sorts of predictive errors. "So you mean... duh duh duh" "No, no, actually I mean... x y z" "Huh, but I thought..." and so on. I'm not going to argue that this makes for compelling video, because a lot of these communication breakdowns get in the way of either one of us getting to the point. However, having rewatched the video -- with undistorted audio and intact visuals -- I actually understand what Plum's trying to get at, and it's pretty fascinating stuff.

The annotations I've provided below are meant to help you understand what's going on as you're watching. Unfortunately, a lot of the topics get picked up but never make it past second base, mired in "what did you say?"s and "pardon?"s. So if you have any topics or questions that you'd like addressed, succinctly, feel free to use the comments section of this post, and I'll do my best to respond!

Time Notations

01:12 "How do we learn?": One of the most interesting questions in cognitive science
02:10 The unpopularity of learning theory in psychology
02:49 How learning works and what we know about it
03:10 We accept our biological continuity with animals, but not our cognitive continuity
03:54 Darwin and "The Expression of the Emotions in Man and Animals"
04:36 Using animal models to understand how humans learn
06:00 By ignoring animal models, we misestimate the potential of human learning
06:21 The forgotten revolution in learning theory in the late 1960s
07:18 Rescorla's classic "background rate" experiment and error-driven learning
10:38 Learning meaningful relationships in the world
11:32 Universal grammar versus general learning mechanisms
13:11 Most arguments for universal grammar come out of a misunderstanding of how learning works
13:26 Divergent literatures: the behavioral neurosciences versus developmental psychology
15:20 Our contemporary computational view of the mind is based on outdated computer systems
16:35 Informativity and the workings of cognition
17:34 Conscious attention and unconscious processing
19:09 Capacity limitations? Or learned 'inattention'?
20:45 How do we predict what's for dinner?
21:54 Information comes in systems
22:31 When we talk about "capacity limitations" we often confuse input with output capacity
23:52 Learning is about filtering for a purpose
24:33 We distort the world in service to our cognitive needs
25:50 How do we choose the right computational metaphor for understanding language and cognition?
27:53 Imposing 'truth' on the world
28:33 In the history of ideas, it's not unreasonable to want to see truth in the world
29:52 The view that the world is rule-based arises out of early notions of computation
32:24 Is the brain finely, innately structured or is it good at discovering structure within the environment?
33:34 Which metaphor is better for mind: spreadsheet or search engine?
37:19 Children's delay in color learning
39:10 Understanding word learning in terms of "informativity" rather than as a "mapping" between words and the world
41:26 Folk psychological confusions
42:27 The ubiquity of color is precisely what makes it difficult to learn
47:23 You don't learn a "word," you learn a system
51:18 The lack of detail in psychology means that many "basic" concepts never get cashed out
51:47 Equality of access to early learning
52:40 The tension between what's optimal in an established communicative system and what's necessary for learning
54:21 The practical import of learning models in education
56:28 How do we learn to discriminate wines? Learning is driven by difference, not similarity
59:30 The Coke / Pepsi Test
1:02:18 How does your name influence your personality?

*Curious about why? I had severe, recurrent ear infections as a toddler, which meant that for many of the years in which I was developing language, my hearing was significantly impaired. In fact, to this day, while I'm no longer hard of hearing, I still show auditory-comprehension deficits. For example, in group conversations, I've trained myself to look for social cues as to when to laugh or react, because I often mishear crucial details in stories, or punchlines in jokes. (When it comes to song lyrics, it gets all the more absurd.) This also manifests in everything from my steady avoidance of books-on-tape to my (seemingly bizarre) habit of watching movies -- in English -- with subtitles. Often I wish the world came subtitled; I like reading so much better.

15 responses so far

  • John Smith says:

    Hi all,

    Very nice post, despite the technical problems. The idea that language processing is heavily dependent upon prediction is intuitively appealing to most of us. But has anyone actually attempted to articulate an explicit, computational account of what exactly prediction is doing during sentence processing?

    Any papers/blog posts that someone might share would be of inestimable help to me, as I'm contemplating this at the moment.

    Is the poster of this article the first to make this argument about prediction being necessary to process a noisy/degraded signal? If not, could you point me to some paper(s) that discuss(es) this?

    John Smith

    • melodye says:

      There are many papers you might look to that show predictive processes at work (neurally) during sentence processing.

      Jos van Berkum's lab is a great place to start -- he published a helpful review in 2008.

      You might also look to work done in the visual world paradigm and sentence processing fields, by e.g., Gerry Altmann, Yuki Kamide, Mike Tanenhaus, Delphine Dahan, Jim Magnuson, Richard Aslin, Sarah Brown-Schmidt, Seana Coulson, Kara Federmeier, Marta Kutas, Thomas Urbach, and so on (by no means an exhaustive list). Here are a few examples:

        Kutas, M., & Federmeier, K. D. (2007). Event-related brain potential (ERP) studies of sentence processing. In G. Gaskell (Ed.), Oxford Handbook of Psycholinguistics (pp. 385-406). Oxford: Oxford University Press.

      In terms of offering a computational account of how prediction operates in language processing, Prof Plum (seen in the video) and I have several papers demonstrating (and modeling, computationally) that predictive, temporal processes are at work in language learning. The ideas of prediction and prediction-error in language are not new (see e.g., Elman, 1990), but most of the work that's been done has been built within connectionist architectures, which (to my mind, anyway) load quite a number of questionable assumptions about learning into their models. By contrast, we work with a simple, error-driven learning model for which there is abundant neural evidence (see e.g., Waelti, Dickinson & Schultz, 2001).
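      If it helps to see what "error-driven" means here, below is a minimal sketch of a Rescorla-Wagner-style update -- a toy of my own devising, with invented cue names and parameters, not code from any of our papers:

      ```python
      # Minimal error-driven (Rescorla-Wagner-style) update. Cue names,
      # learning rate, and trial structure are illustrative only.
      def rw_update(weights, cues, outcome_present, alpha=0.1, lam=1.0):
          """Adjust each cue->outcome weight in proportion to prediction error."""
          prediction = sum(weights.get(c, 0.0) for c in cues)
          error = (lam if outcome_present else 0.0) - prediction
          for c in cues:
              weights[c] = weights.get(c, 0.0) + alpha * error

      weights = {}
      for _ in range(100):
          rw_update(weights, ["tone", "context"], outcome_present=True)
          # Rescorla's "background rate" manipulation: the outcome also
          # occurs when the tone is absent.
          rw_update(weights, ["context"], outcome_present=True)
      print(weights)  # 'tone' ends up near zero: it adds nothing beyond the context
      ```

      The toy run makes the point from the video (the Rescorla discussion, around 07:18): the tone is paired with the outcome on every compound trial, yet accrues almost no weight, because the context already predicts the outcome. Learning tracks informativity, not mere co-occurrence.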

      Our work, broadly construed, is information-theoretic. If you are not too familiar with work in that vein, I would recommend looking at some of the recent work by Florian Jaeger & Roger Levy. Florian provides a nice overview in his recent Cog Psych paper.

      Hope this is helpful! 🙂

      • The work you've cited here is evidence that people do some predictive processing during comprehension. There are lots of ways this might be helpful: allowing the listener to have more information earlier, providing some redundancy given the noisiness of the channel, helping check comprehension, etc.

        Is that what you are talking about when you talk about "language as prediction"?

        • melodye says:

          I mean, no. We want to make the stronger claim that prediction (how well predicted something is) is actually what comprehension amounts to, and that language learning is driven, in large part, by prediction and prediction-error. We know that learning and decision-making in other domains are; I don't think it's so much of a logical leap to think that language might be too.

          To be fair, however, I don't think I could possibly do justice to that kind of claim in a comment; it's why we're writing a lengthy BBS-style article on the topic, at the moment.

          One thing I would say is that our interest is in taking general learning mechanisms (e.g., error-driven learning) and looking at the extent to which they can explain certain learning phenomena. As you know, my fundamental problem with a lot of nativist arguments is that they set up the problem as if error-driven learning couldn't possibly (i.e., logically) explain a range of complex phenomena. But that's an empirical question -- and I do think a lot of the 'logical' arguments come out of a limited view of the explanatory power of basic learning mechanisms.

          Too often psychologists use 'universal grammar' the way some believers use 'God' -- as a catch-all for what they don't understand. "Here's this thing that doesn't make sense, therefore let's posit X." My approach to this is very nearly the opposite -- let's not worry about the space of what we don't yet understand; instead, let's look to what we can fruitfully explain. I think the basic Chomskian approach is plagued by uncertainty, and by discomfort with that uncertainty; it tries to impose an order on language -- but in so doing, I think, it trades away science for bad philosophy.

          Then again, all of this depends on your description of what language is (i.e., what the learned end state is and what communication amounts to). We emphasize that it's predictive and probabilistic on both dimensions, and that many of the theoretical suppositions about language are flawed. This goes back to Wittgenstein --

          "We want to say that there can't be any vagueness in logic. The idea now absorbs us, that the ideal 'must' be found in reality. Meanwhile we do not as yet see how it occurs there, nor do we understand the nature of this "must." We think it must be in reality; for we think we already see it there. ...It is like a pair of glasses on our nose through which we see whatever we look at. It never occurs to us to take them off.

          It's worth noting, by the way, that a lot of these predictive processing studies fly in the face of, for example, Miller & Chomsky (1963). If Chomsky had been right, people shouldn't be able to track the probabilistic distributions of words in a given stream. This was one of his basic arguments against general learning. Yet there is ample evidence that they do, and that children do (see e.g., Bannard & Matthews, 2008 and our replication of it).
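          (Tracking those distributions is computationally trivial, by the way. Here's a toy sketch of the kind of statistic in question -- my own illustration, with an invented "corpus":)

          ```python
          # Toy demonstration (my own; the 'corpus' is invented) of tracking
          # the distributional statistics of a word stream.
          from collections import Counter

          stream = "a cup of tea a cup of coffee a glass of milk a cup of tea".split()

          bigrams = Counter(zip(stream, stream[1:]))
          totals = Counter(stream)

          def transition_prob(w1, w2):
              """P(w2 | w1), estimated from the stream."""
              return bigrams[(w1, w2)] / totals[w1]

          print(transition_prob("cup", "of"))  # -> 1.0: 'of' always follows 'cup'
          print(transition_prob("of", "tea"))  # -> 0.5: 'tea' follows 'of' half the time
          ```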

          • "predictions is what comprehension actually amounts to"

            That's what I figured the claim was, based on previous papers from the lab. I admit I have no guesses as to what this is supposed to mean, so it'll be nice to see a full-length treatment.

            I know your beef is with your definition of nativism. I'm skeptical that anyone advocates the kind of nativism you have a problem with. Certainly, nobody approaches nativism by saying "hey, this doesn't make sense" and positing some innate feature; rather, the approach is to take an ending state, take a learning mechanism, and work out what must necessarily be in the starting state. This is pure scientific method.

          • melodye says:

            @gameswithwords Read the Cognitive Psychology paper we just published. My disagreements are not illusory. Moreover, unless you've proven that something needs to be posited (i.e., some form of innate structure), it's not clear what positing innate structure does, other than displace the problem. If you say, "we couldn't figure out how this could be learned, so it must be innate," what does that advance?

          • *Everybody* agrees there is innate structure, even you. Either that, or you think the only difference between yourself and a lump of coal is the environment in which you were raised. So how does addressing the question of what differences there are between you and a lump of coal advance science? I'll let you figure that out for yourself.

            And stop it with this claim that anyone goes around saying "we couldn't figure out how this could be learned, so it must be innate." It's unsophisticated and silly. I agree that it's useful to present your own ideas in the strongest possible terms in order to generate debate. Mischaracterizing others' ideas in order to create straw men is just wasting everyone's time.

            Learning is a four-variable problem. We must determine the starting state, the ending state, the input, and the learning algorithm. If you know any three of these, you can prove what the other must be. These days, we have a pretty good handle on the input as well as -- Tomasello-like claims to the contrary -- the ending state, so most of the debate concerns the learning algorithms and the starting state. So yes, you can prove, given a particular learning algorithm, what the starting state must be.
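            (To make the logic concrete, here is a toy illustration -- entirely made up, using a deterministic delta-rule learner: once the algorithm and the input sequence are fixed, each update can be inverted, so the ending state pins down the starting state exactly.)

            ```python
            # Toy, deterministic delta-rule learner. Knowing the algorithm, the
            # input sequence, and the ending state lets you solve for the start.
            def step(w, x, y, lr=0.1):
                return w + lr * (y - w * x) * x

            def unstep(w_next, x, y, lr=0.1):
                return (w_next - lr * y * x) / (1 - lr * x * x)

            data = [(1.0, 2.0), (0.5, 1.0), (2.0, 4.0)]  # invented (input, outcome) pairs
            w = 0.3                                      # the "unknown" starting state
            for x, y in data:
                w = step(w, x, y)                        # run learning forward
            for x, y in reversed(data):
                w = unstep(w, x, y)                      # invert the updates
            print(round(w, 10))                          # -> 0.3: starting state recovered
            ```

            Real learners are noisier than this, of course, but the solve-for-the-missing-variable logic is the same.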

            This is science 101. Refusing to engage in the process... well, it sounds a lot like a lump of coal.

          • dan says:

            "These days, we have a pretty good handle on the input as well as โ€” Tomasello-like claims to the contrary โ€” the ending state, so most of the debate concerns the learning algorithms and the starting state. So yes, you can prove, given a particular learning algorithm, what the starting state must be."

            the cognitive psychology article is a nice demonstration of the fact that you (in particular) have no idea what the end state is.

            which in turn means that if you follow your own recipe, your "proofs" are guaranteed to be empty.

            seriously: why not try reading and thinking a little bit more, and posing a little bit less.

  • John Smith says:

    (Aside from the paper that's in preparation, of course!)

  • razib says:

    screw the haterz 😉

    • melodye says:

      Thanks, doll. Apparently I'm "annoying" and "inexperienced" and he's an "arrogant asshole." It's kind of like -- actually, why don't you try explaining the last ten years of your research to a lay audience across a crap phone line, with ten minutes to spare to catch a flight?

      I simply don't understand why online commenters feel entitled to be so nasty -- although, I suppose, it all goes back to the sniveling empowerment of anonymity. I can only imagine the academic who would be brave enough to call Prof Plum an "arrogant asshole" to his face. It's not something one would say lightly to a 6'3" former rugby player.

      • razib says:

        BHTV has an unfortunate comment culture IMO. a large minority think they'd be better "heads" than the heads. the personal comments are childish, and are the main reason i never engaged in the comments when i had some appearances, and probably the main reason i'm not inclined to want to do that again. it was fun while it lasted 😉

  • "Shecky R." says:

    Way back in the '70s, when I closely followed psycholinguistics, there were a number of language 'shadowing' studies which, I think, nicely demonstrated the predictive nature of speech processing. (I believe William Marslen-Wilson did several of them; I haven't kept up with the field, or with his work in particular, but I thought he did excellent work at the time.)

  • VMartin says:

    I listened to the video for a couple of minutes but became perplexed. I wonder what words like "cognition" really mean, and what the pretty woman and the guy with tousled hair are talking about.

    Firstly, one needs to define basic concepts. Cognition? "Cogito ergo sum," said Descartes. On the other hand, there is obviously a difference between "cogito" and "intellego," because I can "cogito" that I am not -- even though I "intellego" that I am (Anselm vs. Gaunilo).

    These differences and nuances between "cogito," "intellego," and "scio" are pretty important, I would say. Now, those Latin words, which the scholastics (or even linguists at the beginning of the 20th century, like the neo-Kantian Anton Marty with his "innere Sprache" ("inner language"), or partly Husserl) distinguished, have obviously fused into "cognition," and the word is so "rude" nowadays that it no longer has the capacity for subtle reasoning. Unless you have subtle surgical devices, do not operate; and if we don't have precise words, we shouldn't speculate about such a subtle problem as "memory." Otherwise it may end up like some of Pinker's neo-Darwinian fancies about how "our brains process irregular verbs" -- English verbs, not Latin, of course.