The Observer's Paradox

Mar 12 2011 Published under Forget What You've Read!

If you have been following the debate on acceptability judgments and other linguistic methods, you may want to check out computational linguist Mark Liberman's (mini)-argument against self-observation on Language Log.  The tongue-in-cheek response is in reply to comments on Bill Poser's post about the pronunciation of the word "tsunami."  Mark writes:

"This gap between phonetic intuition and phonetic fact is a special form of the observer's paradox. Just as we behave differently when we're aware of being observed by others, we also behave differently when we imagine observing ourselves."

In doing some follow-up reading on the history of the paradox, I found a brilliant essay by the famous sociolinguist William Labov on the declining methodological standards in linguistics research (see pp. 105-8 for what he thinks of the role of 'intuition').  The essay was published in 1972.  Labov wrote then:

"If new data has to be introduced, we usually find that it has been barred for ideological reasons, or not even been recognized as data at all, and the new methodology must do more than develop techniques.  It must demolish the beliefs and assumptions which rules its data out of the picture.  Since many of these beliefs are held as a matter of deep personal conviction, and spring from the well-established habits of a lifetime, this kind of criticism is seldom accomplished without hard feelings and polemics, until the old guard gradually dissolves into academic security and scientific limbo."

He could well have been writing about corpora.

For those of you who read my post last week about language change, you may be amused to note that in the same essay, Labov wrote: "We are forced to ask whether the growth of literacy and mass media are new factors affecting the course of linguistic change that did not operate in the past."  Well, Mr. Greene and I certainly never claimed to be the first to articulate these ideas!

Cheers, William.

On a different note, I have been extremely troubled by the reports about what is going on in Japan.  One of my best friends left the country a day before the earthquake.  To everyone there - or with friends and family there - my heart goes out to you.  If any readers are interested in giving to the disaster relief fund, you can find more information here.

9 responses so far

  • daedalus2u says:

    That is a very nice essay. While I was reading it, I remembered something from a book I read a long time ago, The Right Stuff.

    What I remembered is that essentially all airline pilots have adopted a kind of southern twang that Chuck Yeager speaks with. That might be something you could get data on. If it is still going on, you might be able to get prospective data from military recruits before they start training to become pilots.

  • daedalus2u says:

    Steve Novella over at Neurologica has a recent post on intuition.

    It can be useful but it should not replace rational thought.

    • Avery Andrews says:

      My belief is that you can't use yourself as an informant for anything more than the simplest stuff (box 1 in that thought experiment article from a few posts ago is correct), but that you can ask other people informally even if they might understand your theory (box 2 doesn't usually apply to linguistic work; even if they understand it, that doesn't mean they want it to be true, and even if they could understand it if it was explained to them, that doesn't mean that they immediately perceive its application to your sentences), and, in most cultures, they won't tell you what you want to hear (box 3 is especially inapplicable to linguistics, Clever Hans is AWOL, possibly especially far away when you're presenting sentences to Aussie students in a 'Seppo' accent).

      For example, my original claims about nonnominative subjects in Icelandic, presented in the mid seventies, were based on a very small number of informant interviews, not very well managed, but they held up well in small surveys done a few years later, and seem to have stood the test of time. (The Modern Icelandic Syntax volume (Academic Press 1990) has the first paper, and a later one presenting some of the survey results.)

      The practice of doing small experiments when a dispute arises about the data seems to be catching on, e.g.:

    • Avery Andrews says:

      Addendum: The Labov article is quite old, and I think a lot of people got the message; it could have been among the reasons I did my surveys.

      NYT science just tweeted an interactive map of the tsunami affected area:

  • Aviad says:

    A nice way to connect your interest in generative methodology with variation would be to look at various cases where interspeaker variation has been reported in the generative syntax literature. The variation, as you may know, is typically either tossed aside or attributed to the existence of different grammars (without any supporting evidence).

    I bring up this type of data, exhibiting interspeaker variation, because the whole discussion around the Gibson & Fedorenko paper has overlooked it. Nice examples include weak crossover and inverse scope, which have been known to show variation since the first work on these topics. Ask anyone who's taught syntax to undergrads, and they'll tell you that they get different responses from students regarding these phenomena. Interestingly, WCO and inverse scope are not mentioned in the Adger textbook which Sprouse & Almeida tested.

    In my humble opinion, phenomena like WCO and inverse scope illustrate the true Achilles' Heel of generative syntax. The fact that the variation in these domains has essentially been ignored does not mean, of course, that simply testing judgments among multiple speakers will solve the problem; we'll just get a distribution. However, it does point to the lack of seriousness on the part of generative syntacticians towards their data.

    The same lack of seriousness leads syntacticians to assert that the judgments they find in these cases reflect ungrammaticality. In this context, I found particularly amusing a claim by Karthik Durvasula from March 5th whereby "no respectable linguist is going to immediately infer ungrammaticality from sentences that are unacceptable". This is, in fact, what *every* syntactician does - ungrammaticality (i.e. a syntactic problem) is always taken for granted.

    I highly recommend reading Newmeyer's various publications, which have consistently pointed out the problems with generative syntax from the point of view of an insider. Already in 1983 he raised the possibility that "all hypothesized idiosyncratic dialects are merely reflections of speakers' differing contextualizations of possible readings for sentences that are ambiguous in their grammar" (p. 57). Unfortunately, almost 30 years have passed and no syntactician has bothered to check whether he might have been right.

  • Avery Andrews says:

    It hasn't really been ignored; Guy Carden wrote a thesis about variation in quantifier scope a long time ago, and Joan Bresnan has changed direction to spend most of her time working on variation and related topics in spoken syntax, and the theory of Stochastic OT is largely motivated by its ability to provide an account of it. But it is fair to say that it hasn't been integrated into the rest of syntax in a generally accepted way.

    A thought occasioned by Aviad's posting is that it might be useful to distinguish what might be systematic, persistent variation, as with WCO or parasitic gap phenomena, from variation that reflects a change in progress, as with the "New Passive" (or "New Impersonal", or just "New Construction") in Icelandic, whose investigation started with Maling and Sigurjónsdóttir's work in 2000, 2001, involving a survey of 1736 10th graders, and at least two other large surveys since. To establish the relevant facts about a change in progress clearly does require large surveys, while most basic, stable facts about a language do not (although using only yourself as an informant/experimental subject clearly does not work). Maling gave a plenary at the most recent LSA about her work, so that's pretty high profile.

    • Alex Clark says:

      Yes, the methodological concern is deeper. Going back to the debate with Gibson and Almeida/Sprouse here

      Diogo says "In this example of faulty judgment data, Fillmore (1965) states that sentences like (1), in which the first object in a double object construction is questioned, are ungrammatical:

      (1) Who did you give this book?

      Langendoen, Kalish-Landon & Dore (1973) tested this hypothesis in two experiments, and found many participants (“at least one-fifth”) who accepted these kinds of sentences as completely grammatical. Wasow & Arnold note that this result has had little impact on the syntax literature." (pp. 13-4)

      And it shouldn't. If only one fifth of the sample in Langendoen et al. (1973) failed to show the expected contrast, the results are not problematic at all. In fact, they are actually highly significant, and overwhelmingly support the original proposal: A simple one-tailed sign test here would give you a p-value of 1.752e-09 and a 95% CI for the probability of finding the result in the predicted direction of (0.7-1)."

      This is a case of what Aviad is talking about. The conclusion drawn isn't that there are two idiolects in this population and one judges this acceptable and one not.
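For readers unfamiliar with the sign test mentioned in the quoted passage, here is a minimal sketch of the one-tailed version. It asks how likely it would be to see at least this many participants showing the predicted contrast if each participant were equally likely to go either way. The sample size and counts below are hypothetical (the thread only says "at least one-fifth" accepted the sentences), so the sketch is not expected to reproduce the p-value quoted above, and the function name is made up for illustration.

```python
from math import comb


def sign_test_one_tailed(successes: int, n: int) -> float:
    """Exact one-tailed sign test.

    Returns P(X >= successes) for X ~ Binomial(n, 0.5), i.e. the chance
    of seeing at least `successes` results in the predicted direction
    if both directions were equally likely.
    """
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    # Count the outcomes with at least `successes` hits, out of 2**n total.
    tail = sum(comb(n, k) for k in range(successes, n + 1))
    return tail / 2 ** n


# Hypothetical numbers, NOT taken from Langendoen et al. (1973):
# 100 participants, 80 of whom show the expected contrast.
n_participants = 100
n_predicted = 80
p_value = sign_test_one_tailed(n_predicted, n_participants)
print(f"one-tailed sign test p = {p_value:.3e}")
```

Note that a tiny p-value here only says that the majority pattern is not chance; it says nothing about whether the minority forms a separate idiolect, which is the point at issue in this thread.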

      • Avery Andrews says:

        Hmm yes, actual variation in the language needs to be distinguished from variable responses to a test, with large surveys presumably required in all but the most obvious cases, such as syntactic differences between Australian and American English. Lots of interesting stuff in that discussion!

      • Diogo Almeida says:

        I apologize in advance for the length of this comment 🙂

        I frankly think that a lot of the distrust regarding the methodology in linguistics is actually a spillover from differences in intuitions about the meaning of variation in acceptability judgments rather than on the status of the acceptability data itself.

        For instance, Aviad mentions that the presence of interspeaker variability is somehow (1) a real problem and (2) this problem is systematically ignored in linguistics (or just tossed into the "different dialect" bin).

        I think this kind of opinion, which is surprisingly widespread, presupposes a very strong view of what a theory of grammar is a theory of. The intuition behind this view seems to go something like this: We use acceptability judgments to inform our grammatical theories, and therefore the goal of grammatical theories is to explain acceptability judgments; moreover, most grammars are formulated as a categorical system, but acceptability data is continuous and variable. Therefore, the theory of grammar is generally wrong, because it can only predict categorical data, and it will systematically fail in its primary role of explaining acceptability data.

        I find that most syntacticians I have talked to don't subscribe to this view at all. They think that (i) although acceptability judgments are used to inform the theory, grammatical theories are not theories of acceptability judgments, and (ii) variation is something to be expected by the very nature of what an acceptability judgment is, i.e., the output of a performance system that takes into account not only what the grammar has to say, but also all kinds of other things, such as memory limitations, world knowledge, parsing strategies and whatnot.

        Now, under this view, variation in acceptability judgment is something to be expected, but not necessarily something deserving of a theoretical explanation, the same way that variation in reaction time data is expected, but not necessarily something that needs an explanation beyond "sampling error".

        Others might have a different opinion, and think that at least some part of the variation in acceptability judgments is meaningful, and deserving of scientific study. This could be due to the fact that the variation seems systematic in some way, indicating that a different dialect, sociolinguistic variable, or performance issue, rather than just random noise, might be at play.

        Crucially, the two views are not mutually exclusive. However, there seems to be the perception that they are.

        To give a concrete example, in the paper by Langendoen et al (1973) mentioned by Alex, the question was how to best account for the fact that questions out of the first object of a double object construction are generally considered bad. Jackendoff & Culicover (1971) had already argued that a grammatical account was problematic, purely on the basis of theory internal considerations, and they tried to provide a parsing strategy explanation. Langendoen et al (1973) built on that same logic and provided not only a processing explanation, but also tried to tie the observed variation in acceptability (the fact that one fifth of their sample seemed to be ok with the supposedly unacceptable questions) to different parsing strategies (two parsing "dialects", if you will). Wasow & Arnold (2005) and Gibson & Fedorenko (2010) interpreted the same data differently: the fact that one fifth of the sample seemed to accept the supposedly unacceptable questions was taken as evidence that the original claim by Fillmore (1965) was wrong.

        None of these proposed alternatives, however, has anything to do with the status of the data. Nobody ever presented any evidence disputing the fact that speakers of English find these questions by and large to be unacceptable. The whole issue is how to interpret the central tendency of the data and the observed variation around it. Do you favor a grammatical (Fillmore, 1965) or a non-grammatical explanation (Jackendoff & Culicover, 1971)? If you favor a grammatical explanation, and you think that the only possible prediction a grammar can make about acceptability judgements is a categorical one, then the fact that one fifth of the sample found these questions to be ok is probably going to be a problem. If, on the other hand, you think that variation in acceptability is something to be expected just by virtue of sampling error, you could reach the exact opposite conclusion. As for the variation, is it something that can be ignored as sampling error, or is there some regularity that is deserving of an explanation, like Langendoen et al (1973) suggested for their own data?

        These are all legitimate questions, and it is important to acknowledge that there are no easy answers here. Crucially, the real work to be done is not in establishing what the data is (in fact, the data is pretty uncontroversial), but rather in figuring out the most reasonable way of interpreting it.