Intelligent Nihilism

Sep 30 2010 Published by under Forget What You've Read!

I wanted to register a quick reply to some of the comments on last week's post "The question is : are you dumber than a rat?"  In the comments there, and in posts on other blogs, our research program has been accused of intelligent nihilism.  By one such characterization, our position is that "we don't know how the brain could give rise to a particular type of behavior, so humans must not be capable of it."  Though I think the label is quite witty -- and would love to have badges made for the lab! -- I think this misrepresents our stance rather badly; our argument is that many of the properties that linguists have attributed to language are either empty theoretical constructs (hypotheses that are not supported by the empirical evidence) or are conceptually confused (and have been shown to be so by Wittgenstein, Quine and many others).  We are not denying that language -- and linguistic behavior -- are complex; rather, we are rejecting a particular stance towards language that we think is theoretically and empirically vacuous.  This does not lead us to nihilism, but rather to a different conception of language and how language is learned.

In any case, the comments on last week's post prove to be fertile ground for discussion, so I've posted them (in pared down fashion) along with a brief response.  The full comment thread can be found at the original post.

"One more time for the road: there are some simple learning models that can (maybe) account for learning concrete nouns. I’ll even give you some concrete adjectives like blue. I’m not sure whether that counts as complex linguistic behavior, but it certainly isn’t “much of complex linguistic behavior.” It’s *certainly* not what motivates Chomsky’s or Pinker’s work.

This is a little like saying you don’t need all that fancy quantum mechanics to explain perfect spheres moving in a vacuum — classical mechanics will do just fine. That’s essentially true, but it misses the point. If you’ve got a “simple learning model” that can account for things like verbs and quantifiers, let’s see it."

And then a follow-up:

"I’m sorry, but there is no model of general learning that captures human language. For example, if you look at the best speech recognition systems, as when you phone the bank and the “lady” apologizes because she could not understand that and would you please say something or hold for the operator, it is via a model of mapping acoustic patterns onto pre-determined lookup tables; albeit in a probabilistic, fuzzy way.

Sure there are teeny models for some horribly obvious relations in the world and in some speech tokens; but beyond that all of general learning is just a promissory note. Of COURSE parts of language interface with other cognitive systems and of COURSE they rely on perception. But equally, this is not enough, as a striking lack of talking rats might suggest.

In the end I care about which theory makes the better predictions – not for learning simple word meanings, but for accounting for the organization of the core grammatical properties of language like constituency, categories, recursion and systematicity."

The question I would pose is : How does assuming that semantics and "syntax" are somehow "innate" explain anything?

To date, the output of both Chomsky and Pinker  has been a great deal of rhetorical flourish tacked on to bad faith arguments, many of which violate the most basic principles of scientific reasoning.  I'm not sure what motivates their contemporary research program or those of their adherents; at times, I'm tempted to think that it's one of the most brilliant academic Ponzi schemes to be mounted in modern times.

Unfortunately, the usual response -- and indeed both of your responses -- are illustrative of a dubious line of reasoning that far too often rears its head on that side of the debate.  The response, of course, is never to list the great empirical discoveries of Chomsky or Pinker or linguistic nativism at large.  Instead, it is to argue that because an alternative theory doesn't explain everything, it must be far worse than a theory that explains nothing.  The hope seems to be that the finer details of this logic will get lost amidst the scuffle.  (That this is the kind of reasoning used is not surprising; as I mentioned in "But Science Doesn't Work That Way," the logic underpinning Miller and Chomsky (1963) is similar; compare "we haven't got one model working, therefore it is categorically impossible for any such model to work" to "these models haven't explained everything yet, which means they never will").

The aims of our research program are straightforward : Ramscar's models start out with basic perceptual representations, and explain and predict some very puzzling aspects of the way nouns and adjectives are learned. They also predict previously unobserved phenomena. I find this interesting.  I also find little reason to believe that these kinds of models couldn't be productively applied to a much broader range of linguistic phenomena.

Compare the concrete models in Ramscar's papers, and the concrete psychological predictions they make, to Pinker's "Words and Rules" theory -- perhaps the most obvious example of linguistic theory as intelligent design. When it was first put forth, the theory was that "only irregulars are stored" and that regulars were computed by rules. Then, when it was clear that this wouldn't allow the "model" (which has no implementational detail whatsoever) to account for the true scope of regular data, bing! suddenly some regulars -- just enough to fit the data -- were allowed to be stored.  Of course, this has had the handy effect of making the model unfalsifiable; it simply incorporates whatever the latest learning model says it has to, plus a rule that is somehow supposed to "explain" everything else. Each time it can be shown that another aspect of how inflection works is better fit by a learning model, the story simply switches to how there are still "phenomena that haven't been accounted for," and the claim is made that somehow (somewhere) this counts as evidence for hardwired rules.  Of course, in the Pinker model, these rules have no computational or representational properties other than those that are immediately convenient; Pinker is, of course, the ultimate intelligent designer of his theories.

Is this, I wonder, to be our preferred model of "science"?

As for "the core grammatical properties of language like constituency, categories, recursion and systematicity" -- these are descriptive hypotheses about the way language works, not actual (demonstrable) properties of language. The history of the last 50 years of linguistics and psychology is that, like Pinker's "rules," "constituency, categories, recursion and systematicity" are fun buzz-words to type into an argument, but Lord help anyone who has to actually cash out what they mean or how they work; these theoretical constructs hide a horrible mess of confused intuitions that don't seem to fit with how people actually process language at all.

What does it even mean to believe in "constituency, categories, recursion and systematicity" anyway? Geoff Pullum and Barbara Scholz have an amusing (and quite brilliant) paper about systematicity claims, that points out that most claims about systematicity are neither well defined, nor even -- dare I say -- systematic. Recursion is a property of one way of modeling language, but it begs a number of questions, not the least of which is how 'categorical' language really is.  Consider, for starters, the observable patterns of how words distribute in natural language.  These distributional patterns make clear that there are no naturally occurring categories.  To make a grammatical category meaningful, you would need to be able to enumerate a set of rules that applied (i.e., generalized) across every word in it, but the linguistic data show these kinds of categories are little more than useful fictions; indeed, much of the careful analysis that has been done in corpus linguistics shows (quite plainly) that such rules simply don't exist.  Positing these kinds of linguistic categories is a useful shorthand when talking about language, but I think it's a mistake to forget that what they actually amount to is a kind of theoretical squinting.

All of which leads me into the most frustrating part about all of this -- which is that most researchers who want to claim that language depends on "constituency, categories, recursion and systematicity," prefer to ignore the fact that these are such vague notions that no one can actually tell you what "constituency, categories, recursion and systematicity" actually are.

Are we supposed to have faith that someday, someone out there will figure it all out for us?

The problem with faith-based linguistics is that it's about as friendly to open scientific inquiry as the inquisition, and people who take the faith-based approach spend a lot of time tuning out (or shouting over) the 'noise' of reasonable dissent.  I suppose it's easier than figuring out what "constituency, categories, recursion and systematicity" are, or whether these ideas even make sense.

Why do you look at the speck of sawdust in your brother's eye and pay no attention to the plank in your own?

Intelligent Eliminativism

Pullum, Geoffrey K., & Scholz, Barbara C. (2010). Recursion and the infinitude claim. In Harry van der Hulst (ed.), Recursion in Human Language (Studies in Generative Grammar 104), 113-138 DOI: 10.1515/9783110219258.111

Pullum, Geoffrey K., & Scholz, Barbara C. (2007). Systematicity and natural language syntax. Croatian Journal of Philosophy, 7 (21), 375-402

Pullum, G., & Rawlins, K. (2007). Argument or no argument? Linguistics and Philosophy, 30 (2), 277-287 DOI: 10.1007/s10988-007-9013-y

Gross, M. (1979). On the Failure of Generative Grammar. Language, 55 (4) DOI: 10.2307/412748

Culicover, P. (1999) Syntactic Nuts: Hard Cases in Syntax (Oxford University Press)

A Mess of Clarifications

"If you think the Chomskyan model is like a spreadsheet, the Ramscar model like a search engine, or even if you think in terms of neural nets, for now these are all just variants of computer programs. I don’t see the paradigm shift in comparing your notions of language to computer program A vs program B."

If you want a complete answer to this, you should read my post on this subject and the original papers, Ramscar, Yarlett, Dye, Denny & Thorpe (2010) and Ramscar (2010).  I think you would be hard-pressed to conflate the two 'paradigms,' particularly if you read the discussion section in Ramscar et al.  But for now, here's a brief reply : While both of these are computational metaphors of mind, they suggest very different things about how humans process information.  A spreadsheet rigidly structures the information it's given according to a strict, rule-based program; a search engine is probabilistic and discovers structure within its environment.  To bring this back to language : Chomsky argues that we have sophisticated preprogrammed mechanisms for learning and processing language, and that children use this hardwired template to impose structure on verbal input ; by contrast, Ramscar suggests that that structure already exists in the available linguistic information (and our environments) and that children use domain-general learning mechanisms to discover it.  I'm not sure how you can argue that imposing structure and discovering structure are the same thing; that would be a funny sort of language game.

"Mo’s argument is that what you can learn is constrained by the representations you have. I’m not sure which part of that computational “metaphor” is out of “date”.  Are you arguing that you can have a learning theory that doesn’t have representations? Certainly all the Ramscar models have representations (a perceptual representation is still a representation), and it’s hard to imagine what such a model would look like.  Or are you arguing that you have a model that can learn *anything* regardless of the representations it uses? For instance, it could learn differential equations entirely in terms of colors. That’d be one hell of a model and I’d like to see it!"

Representations have to be learned either by the individual or by the species, via evolution.  We have a good idea of how the mechanisms that give rise to perceptual representations have evolved.  These are different in kind from the made-up representations that populate Chomsky's fantasy world of linguistic perfection (which, as far as I can tell, change their fundamental character at least once a decade -- no mean evolutionary trick!)  I certainly think that what you learn is constrained by the representations you have ; that's why I think that understanding has to be probabilistic.

19 responses so far

  • Sean says:

    Wow. I consider this article to be a strong synchronicitous event, since I happen to run the website Hope all is well for everyone.

  • GuessHandsOn says:

    String theory currently has no empirical support either. It's also 'theoretically and empirically vacuous.'

    When Albert Einstein came up with the theory of general and special relativity, there was no empirical support for it either. Scientists kept clamoring about how Einstein was wrong. See here:

    And yes, the entire discipline of linguistics is wrong about language acquisition. Thanks to Michael Ramscar for saving us all.

    [An ad hominem in this comment has been redacted. All joking aside, implying that I -- or any of my collaborators -- are anti-semitic is way out of line.]

  • If you're going to appropriate my terminology and accuse me of being part of a Ponzi scheme in the same post, you could at least do me the favor of linking to my post.

    Comment: It's interesting that you think that building on a theory in response to new empirical findings is bad. Just sayin'.

    Correction 1: Formal linguistic models make tons of predictions. So does Word & Rules. That's why they change from time to time: some of the predictions turn out to be wrong, and the theories must be amended.

    Correction 2: The Ramscar models you cite don't use basic perceptual representations -- at least, the representations aren't part of low-level or even mid-level vision. They are part of what people in the community would call "high-level vision," that is, merging into the kind of stuff "concepts" people study.

    Nihilism: I think your argument above meets my definition of intelligent nihilism pretty well. You say there are no complex representations and deny the existence of the behavior (such as recursion) they are meant to describe. But it's a denial by stipulation: let's see you take a typical sentence used as an example of recursion, or of a verb argument alternation, etc., and explain the phenomenon without recursion or argument structure. Maybe you can do it, but it won't be by using a model that has only primitive perceptual information (assuming such a model even exists).

    • melodye says:

      Re: Correction 1 : So, "Words and Rules" actually makes an interesting test case, because Pinker starts off by acknowledging that irregulars have to be learned via an associative learning mechanism. At the same time, he wants to distinguish regulars from irregulars, and claim that regulars (special class that they are) are rule-based. Here's from the abstract :

      "The distinction may be seen in the difference between regular inflection (e.g., walk-walked), which is productive and open-ended and hence implicates a rule, and irregular inflection (e.g., come-came), which is idiosyncratic and closed and hence implicates individually memorized words."

      The problem is that if it can't be shown that regulars are a closed class and have the properties he's claiming they do (and he keeps backing up on this one), then his entire theory is shot through. In the recent Huang & Pinker (2010) paper, the objections he raises to Ramscar (2002,2003) and Moscoso del Prado Martin & Baayen (2005) are a feeble grabbing at straws. Worse for him, many of the objections he raises re: the problems of learning don't apply to discriminative models. These days, it's no longer clear just what his argument is.

      Maybe I should do a brief post about "Words and Rules" -- if that would be helpful? I mean, some of Pinker's statements (e.g., "The meaning of a sentence is computed from the meanings of the individual words and the way they are arranged") have been widely accepted as false since the 1950's -- by cognitivists like George Miller, no less.

      Re: Correction 2: We use Rescorla-Wagner, which is one of the most widely used (and studied) models in animal learning. And of course we study concepts! (See e.g., the first fifty pages of Ramscar et al (2010))
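For readers who have not met the Rescorla-Wagner rule mentioned above, here is a minimal sketch of its error-driven update. It is not the simulation code from Ramscar et al (2010); the function name, the toy cues and labels, and the parameter values are all assumptions made for illustration.

```python
def rescorla_wagner(trials, cues, outcomes, rate=0.1, lam=1.0):
    """Toy Rescorla-Wagner learner (illustrative, not the published model).

    trials   -- list of (present_cues, present_outcomes) tuples
    rate     -- combined learning rate (alpha * beta in the usual notation)
    lam      -- maximum associative strength an outcome can support
    """
    weights = {(c, o): 0.0 for c in cues for o in outcomes}
    for present_cues, present_outcomes in trials:
        for o in outcomes:
            # Summed prediction of outcome o from all cues present on this trial.
            prediction = sum(weights[(c, o)] for c in present_cues)
            target = lam if o in present_outcomes else 0.0
            error = target - prediction
            # Every present cue shares the same prediction error.
            for c in present_cues:
                weights[(c, o)] += rate * error
    return weights


# Hypothetical toy data: a red ball is labeled "RED", a blue ball is not.
cues = ["red", "blue", "ball"]
outcomes = ["RED"]
trials = [(("red", "ball"), ("RED",)), (("blue", "ball"), ())] * 200
w = rescorla_wagner(trials, cues, outcomes)
```

In this toy setup the uninformative cue "ball" ends up with a weaker link to the label than "red" does, because it also occurs on trials where the label is absent and is penalized there. The cue "blue" ends up with a negative weight for the same reason.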

      But more to the point : I'm confused about why you're still talking about representation in this way. The perceptual priors we start with are nothing like the ones assumed by those who would espouse a generative grammar. For instance, the representations in Ramscar et al (2010) can be derived from simple visual receptive fields that link to color maps. That they were modeled at a higher level so readers could understand them is irrelevant... In any case, it's hardly the same as assuming that concepts like carburetor are innate, a la Chomsky. (I know Pinker wrote a tediously long book about why he doesn't believe that -- the problem is, his "solution" doesn't do what a generative grammar needs; instead, it just distances Pinker from some of the more crackpot consequences of the position he has committed himself to.)

      As a nativist yourself, you might want to consider that these are problems that haven't been solved -- it seems that you should be more concerned with the theoretical underpinnings of your chosen school of thought and less with the representational details of learning models you're going to reject out of hand anyway. But that's just my two cents...

      Re : Nihilism : Eliminativism is not the same as Nihilism.

      • Regulars are an open class. That's the phenomenon. But perhaps you meant something else by "open class" other than "class with an unbounded number of entries," which is how it's usually used. The gradual change in Pinker-like theories is that they originally gave a low estimate for how much redundant information would be stored. There's some neat Bayesian modeling work in progress now that confirms that it actually *is* optimal to store a fair amount of composed forms redundantly, even when you could compose them on the fly. But it's still not optimal to store everything.

        You should check your references, btw, the carburetor example comes originally from Fodor. If Chomsky ever used it, it was a quote. Just the way the argument about negative evidence simply *doesn't* come from Chomsky, it comes from Braine. You quoted Pinker making exactly this statement, but still you continue to attribute the argument to Chomsky.

        Color maps will only work for your 2010 model as long as you live in a world in which nothing changes orientation or position or is ever occluded. As soon as you let that into your world, your model will either require the kind of high-level constructs you included in the paper version, or it won't work. As I've said before, I have no particular interest in concrete nouns, which are possibly the easiest problem in language (this isn't backtracking from anything -- remember that generative grammar wasn't postulated to explain concrete nouns). The point is that even when you deal with the simplest phenomenon in language, you still end up having to postulate complex structure.

        This is why I find this insistence by some people in the literature to argue that you don't need structure simply odd. Let alone that such accounts don't work; I don't even think the people who are putting them forward actually believe them. Structure is everywhere. It may be highly abstract and complex. It may be relatively simple and constrained. The interesting question is: what structures must we postulate to explain the data. That's what generative grammar has always been about, and that's why it changes: new data comes along arguing for or against the necessity of various structures. That's why Pinker's theory has changed over time. The fact that they don't get all the data right immediately isn't really an argument against the approach. At least they are trying to account for complex data of the sort no connectionist or associationist or discriminative model has ever been applied to.

        • dan says:

          "At least they are trying to account for complex data of the sort no connectionist or associationist or discriminative model has ever been applied to"

          Some connectionist models work functionally as association networks, and some work as discriminative models. Theoretically and computationally they are worlds apart, even if the people who make them haven't always gotten that completely figured out. Which means I'm not sure what you are trying to say with this blanket claim. Are you?

          Let's stick to specifics:

          The learning model in Ramscar et al 2010 is a perceptron. It is employed in a discrimination network based on prediction. The learning model in this is a perceptron. It is employed in a discrimination network based on prediction (the training set is much richer, but it is still sparse compared to the training data a child gets). The learning model in this is a perceptron too, although the training set is more abstract. (And yes, there are "high-level representations" in use in these models, but there's nothing about them that couldn't in principle be learned.)

          I suppose you could say that these are just parsing models. However, when it comes to accounting for complex data, the Pinker "model" (which has no formal properties at all) is a sad joke in comparison to even the simplest.

          One last point relating to this: No one is arguing that language and learning are unstructured. There is structure in time and in space, for example, and once you introduce learning, these can be translated into a lot of structured information. The big questions are what *kind* of structures in this information are people sensitive to, and what/how do they learn from it?

          Do you really think that making stuff up about models you don't understand somehow helps shed light on these questions?

          • melodye says:

            Oh, stop showing off, Dan. Busting out the comp ling papers left and right. You'll scare the psychologists! 😉

          • Actually, these aren't even "just parsing models" -- one is a word-identification model. I'll give that it probably does a better job of word-identification than a generative grammar, but only in the sense that a lawnmower makes a bad skillet.

            The other one is slightly more interesting, in that it's learning the dependencies between abstract grammatical structures. One modeling assumption is that "input sentences have been automatically [POS]-tagged in a pre-processing step." To the extent that the goal of the model appears to be learning tree structure and it assumes abstract nodes already, it's in many ways a model of generative grammar, with one part of the computation being handled by a perceptron.

            They're nice papers, but I'm not sure how they support your argument. Perhaps you can explain more.

          • yourfriendlyneighborhoodcompscidept says:

            People are still using perceptrons in comp ling? Maybe Minsky will come along and kill the entire field for 20 years again 😀

        • What did you mean "open class" to mean? I wasn't nitpicking; I simply have no idea what you're talking about.

      • VMartin says:

        “The distinction may be seen in the difference between regular inflection (e.g., walk-walked), which is productive and open-ended and hence implicates a rule, and irregular inflection (e.g., come-came), which is idiosyncratic and closed and hence implicates individually memorized words.”

        I wrote this last time and I would like to repeat it. This is a special case of English verbs, as everybody knows. Pinker should also address Latin, the language educated European scholars were thinking in for a millennium. Perhaps he could elucidate some "rules" not only for the simplistic go - went - gone vs. walk - walked - walked, but for the Latin irregular "to go" - eo, ire, ii, itum vs. the regular audio, audire, audivi, auditum.

        Sorry, but maybe there is someone here who doesn't know:

        present/imperfect/future/perfect/pluperfect/future perfect

        eo, ibam, ibo, ivi, iveram, ivero for Active Indicative first person
        eam, irem, -,iverim, ivissem,- for Active Subjunctive first person

        is,ibas,ibis,ivisti,iveras,iveris second person
        eas,ires,-,iveris,ivisses,- second person Subjunctive

        ( I am not going to write all persons here.)

        and compare it using some "rule" against a regular verb of the fourth class:
        then multiply it by 3 - I mean Subjunctive, verbs passive Indicative/Subjunctive.

        How are these rules learned, or better - are there any rules at all which the intellect is aware of? Either way, the way a Latin scholar processed verbs was different from how Pinker processes English verbs in his mind (or thinks he processes them).

        Don't get me wrong, but I am kind of a "creotard". Pinker is a hard-core Darwinist, and this reductionist view of verbs reminds me of Dawkins' "gene-centric" view, where for instance pleiotropy and epistasis are neglected (although they probably play a crucial role in gene expression). Complexity is obviously an unwelcome phenomenon in the thinking of a reductionist. It may be mentioned but not dealt with.

  • Dorothea says:

    I'm enjoying this series, Melody. In a past life I was a graduate student in historical Iberian linguistics. I gave up on Chomsky when a Chomskybot prof in the linguistics department told us flat out that utterances I had heard clear as day were impossible because they didn't fit Chomskyan linguistic theory.

    I refuse to deny my own ears for Chomsky's sake.

  • Dan says:


    did i just miss the part in melodye's post where she said there were "no complex representations"?

    it's quite clear that there are complex representations in rats and pigeons and dogs. i don't see you falling over yourself to affirm that fido has an innate concept of carburetor, or that fido's ability to understand "dubleyoo ay el kay" is evidence for universal canine grammar.

    i also didn't see melodye "deny the existence of the behavior (such as recursion) they are meant to describe" either. recursion is a way of describing the data. it's neat and elegant when we use it in computer science, and it works because we get to define our data types. but it's ugly and problematic in linguistics because every data type you decide on goes 'phut' as soon as you try to generalize it to other data. this is why even recursion doesn't do a very good job of explaining the linguistic behavior some people like to call "recursion."

    what i did see melodye say was this:

    "The problem with faith-based linguistics, is that it’s about as friendly to open scientific inquiry as the inquisition, and people who take the faith-based approach spend a lot of time tuning out (or shouting over) the ‘noise’ of reasonable dissent. I suppose it’s easier than figuring out what ”constituency, categories, recursion and systematicity” are, or whether these ideas even make sense."

    now that sounds like nihilism to me.

  • KEN says:

    gee, heaven forbid somebody misrepresent somebody's stance. as if chomsky really thought children didn't learn the word "door" from exposure. ridic.

    • melodye says:

      Cool, well --

      Here are two illustrative Chomsky quotes from The Managua Lectures:

      "Evidently, the language faculty incorporates quite specific principles that lie well beyond any “general learning mechanisms,” and there is good reason to suppose that it is only one of a number of such special faculties of mind. It is, in fact, doubtful that “general learning mechanisms,” if they exist, play a major part in the growth of our systems of knowledge and belief about the world in which we live–our cognitive systems. …It is fair to say that in any domain in which we have any understanding about the matter, specific and often highly structured capacities enter into the acquisition and use of belief and knowledge." (47-8)

      "Now that can only mean one thing. Namely, human nature gives us the concept "climb" for free. That is, the concept "climb" is just part of the way in which we are able to interpret experience available to us before we even have the experience. That is probably true for most concepts that have words for them in language. This is the way we learn language. We simply learn the label that goes with the preexisting concept. So in other words, it is as if the child, prior to any experience, has a long list of concepts like "climb," and then the child is looking at the world to figure out which sound goes with which concept." (191)

      Ridic? Yeah, I'd say so.

  • Avery Andrews says:

    "To make a grammatical category meaningful, you would need to be able to enumerate a set of rules that applied (i.e., generalized) across every word in it"

    This ceases to be true once there is a notion of 'subclass' available. So for example 'nouns' can be divided into count and mass, with some properties in common and others different, with the possibility of further subdivisions and cross-classifications. This is why inheritance hierarchies have become so popular. Any individual rule could be evaded by some subclass. It's also unclear to me what the relevant notion of 'rule' would be in construction grammar or even LFG.
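The subclass idea above can be sketched as a small inheritance hierarchy. This is only an illustration: the class names and the properties attached to them are hypothetical and are not meant as a serious analysis of English nouns.

```python
class Noun:
    # Properties shared by every noun unless a subclass overrides them.
    takes_determiner = True
    pluralizes = True

    def __init__(self, form):
        self.form = form


class CountNoun(Noun):
    # Adds a property of its own and inherits the rest from Noun.
    takes_numeral = True


class MassNoun(Noun):
    # Evades the general "pluralizes" rule while keeping the other defaults.
    pluralizes = False
    takes_numeral = False


dog = CountNoun("dog")
water = MassNoun("water")
```

Here `water` still inherits `takes_determiner` from `Noun` but overrides `pluralizes`, which is the sense in which any individual rule can be evaded by some subclass without the category itself becoming meaningless.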

    And I have not a clue what would replace recursion in any reasonable account of what's going on in the 'lakes and islands' web-page I mentioned once before, or the complex prenominal modifiers of Ancient Greek, where you can get a genitive NP, or a PP, in between the article and the noun:

    to epi to:i lopho:i megaron
    the on the hill palace
    'the palace on the hill'

    ho tou Kurou hippos
    the the(G) Cyrus(G) horse
    'Cyrus' horse'

    The fact that it's quite clear that the famous Pirahã don't do this, or anything like it, makes it even clearer that we do and the ancient Athenians did (it doesn't seem to happen in Homer, or in Modern Greek; not sure when it disappeared).

  • What I wanna know is whether Ramscar's hair is anywhere near as fancy as Pinker's!

  • Avery Andrews says:

    The Pullum & Scholz infinitude paper, which can be gotten off his web-page, doesn't challenge recursion at all in the way it is standardly used in ordinary linguistics, i.e., defining NPs in such a way that they can contain NPs, with the result that the only way to make the generated language finite is to slap on some extraneous numerical constraints (cf. the last paragraph).
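A toy sketch of what "NPs that can contain NPs" amounts to, assuming a made-up two-rule grammar (NP -> "the" N, or "the" N "on" NP) and a made-up noun list. The `embeddings` argument plays the part of the extraneous numerical constraint: the rule itself sets no limit, so finiteness has to be imposed from outside.

```python
NOUNS = ["palace", "hill", "island", "lake"]  # hypothetical vocabulary


def noun_phrase(embeddings, level=0):
    """Build an NP with the given number of embedded NPs.

    NP -> "the" N            (when no embeddings remain)
    NP -> "the" N "on" NP    (the rule refers back to NP itself)
    """
    noun = NOUNS[level % len(NOUNS)]
    if embeddings == 0:
        return f"the {noun}"
    # The recursive step: an NP appears inside the NP being built.
    return f"the {noun} on {noun_phrase(embeddings - 1, level + 1)}"


simple = noun_phrase(0)
nested = noun_phrase(2)
```

With these assumptions, `simple` is "the palace" and `nested` is "the palace on the hill on the island"; any larger value of `embeddings` just yields a longer phrase of the same shape.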

    It does however contain some stuff that strikes me as rather odd (the claim that transducers and constraint-based grammars aren't generative, and the vaguely described possibility of a category-theoretic remake of Harris' Transformational Analysis), along with some quite good stuff, such as the discussion of Amazonian languages.