The Development of Causal Reasoning: On Optimal Search in the A-not-B task

Aug 16 2010 | Published under Forget What You've Read!, From the Melodye Files

I am a horribly forgetful girl.

Which is a funny thing to say, really, because I’m not quite sure whether it’s my memory that’s bad or my attention.  Recently, for instance, I spent several hours searching for my phone to no avail, only to find (the following morning) that I had left it in my underwear drawer.  It reminded me of when I left my driver’s license in the refrigerator with my passport; or the time I put a bowl of ice-cream in the oven for safekeeping.

It is desperately hard to ‘find’ things again once I’ve committed such an error, because there is simply no logical way to retrace my steps.  “Ah yes, the oven!  A perfect place to stow the ice-cream…”

It’s demented, really (or early-stage dementia, quite probably).

But in any case, living with myself – and my frightful follies – has entailed learning strategies to find things again.  My preferred strategy is asking my friends and flatmates where they last saw X – since wherever I’ve put X is almost invariably not a place that X’s go.  Sneakers?  “On the kitchen table.”  Purse?  “In the bathroom, on the floor.”  Sunglasses?  “In the downstairs pantry—again, Melody?— ”

Looking for an object in the last place it was seen is often a good way to find it.  But there are other methods.  For instance, sometimes it’s better to look where I typically scatter my things: the entryway and living room seem to be frequent collectors of sandals and overdue library books; the bathroom collects keychains and mail; the passenger seat of my car seems to be a favored spot to leave my wallet…

What’s interesting about this is that there isn’t simply one cookie-cutter recipe for finding what I’ve lost.  Which strategy I adopt depends both on what it is I’m missing and what my mental state was when I lost it.  Was I mostly alert?  Spaced-out?  Spacey with a chance of meatballs?

Choosing the best search strategy is usually a matter of context.  This much is obvious.  However, the simple fact that this is true for me – as it is for much higher-functioning adults – can give us a rather interesting insight into how children learn to search.

In 1954, Piaget first described what are classically termed ‘A-not-B’ errors.  What Piaget found was that eight- to twelve-month-old infants do not yet appear capable of ‘rational’ search.  In his experiments, a researcher would first entice the infant with an attractive object – say, a bright red, shiny apple.  The researcher would then show the object disappearing at a particular location (‘A’) and reappearing from that same location (for instance, the researcher might put the apple behind a screen and then bring it back out again).  This game would be repeated several times, until the infant was gamely searching for the apple at A.  Then, the researcher would switch up the game, and hide the apple at a new location (‘B’).  What was surprising was that while the infant would happily follow the hiding and reappearing act at A, when the apple was hidden at B, the infant would continue to look (or ‘search’) at A, as if it should reappear there.  This was the case even though the infant had physically seen the experimenter re-hide the apple at B.

In light of this, Piaget theorized that the infant had not yet developed an understanding of ‘object permanence,’ which must ‘dawn’ at a later stage of development.  In the years since, theoretical accounts of A-not-B have tried to pin the errors on slow neurological development: for example, it has been suggested that A-not-B errors may result from immature executive function or limited working memory, and may resolve over time as a result of neural maturation.

Another possibility, however, is that children need to learn which search strategies are appropriate in which context [1].  This may seem counterintuitive: shouldn’t it be obvious that if an object is hidden at X, it reappears at X?  The crazy thing is – probably not!

As adults, we reason ‘causally’ about the world all the time.  Given how unconscious and implicit much of our knowledge of the world is, it’s not surprising that we take the various physical (and temporal) workings of reality for granted.  But the surprising behavior of infants in the A-not-B task may be our best evidence to date that our understanding of those relations is not – as we might imagine – a given, but rather, is governed by what experience teaches us about the world.

Think for a moment about the sheer number of different relationships between objects and events that a young child needs to learn about.  Here are a handful of examples:

When Mittens ‘hides’ outside, he scampers out through the back door, but ‘reappears’ through the window.

When Mommy ‘hides’ batter in the oven, it ‘reappears’ as cake.

When Daddy ‘hides’ batter in the oven, the smoke alarm goes off and Mommy screams.

To my knowledge, no psychologist has ever posited ‘batter impermanence’ as a developmental stage. But it's not clear that ‘object permanence’ is so different, after all.  What is clear from these examples is that while we may generalize certain causal relations across instances, context will always play a role, and we need to learn to discriminate which rules apply across which instances.

Which is just another way of saying that when babies commit A-not-B errors, they may simply be misapplying an otherwise logical search strategy.

It’s easy enough to see why this would be the case.  As an infant, you’ve seen the apple hidden at A four times and (more critically) you’ve seen it reappear at A four times.  That means you have strong evidence that you will find it again at A.  Now it gets hidden at B.  You have zero evidence for it reappearing at B – and haven’t yet learned that context counts.  Best bet’s on A.

So far, so good.  That explains why kids fail the task.  But how do they eventually learn to pass it?  The easiest way to frame this is in terms of expectation.  If you expect something will happen, and it doesn’t (repeatedly), you will begin to revise that expectation.  So if the researcher keeps playing the game with you, and the apple keeps reappearing at B (five times, say), then that’s five times that your expectation that it would appear at A was violated, and five times that you saw it reemerge from B.  By now you should have a lot of ‘negative evidence’ for A, and a lot of ‘positive evidence’ for B – meaning that you’ll have ‘unlearned’ A as the best search spot.  Now you switch.  Bet’s on B! you decide, and get the next one right.
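
If you like a sketch to make that concrete, here is a toy version of that kind of error-driven expectation-updating.  The learning rate and trial counts below are purely illustrative assumptions on my part, not calibrated to real infants or to any particular study.

    # A toy sketch of error-driven expectation updating (illustrative numbers only).
    alpha = 0.3                  # learning rate: how much each surprise moves you
    V = {"A": 0.0, "B": 0.0}     # running 'expectation' that the apple reappears at each spot

    def update(found_at):
        for loc in V:
            target = 1.0 if loc == found_at else 0.0
            V[loc] += alpha * (target - V[loc])   # nudge each expectation toward what happened

    for _ in range(4):           # four hide-and-reappear rounds at A
        update("A")
    print(V)                     # about {'A': 0.76, 'B': 0.0}: strong bet on A

    for _ in range(5):           # five rounds at B: A gets 'unlearned', B takes over
        update("B")
    print(V)                     # about {'A': 0.13, 'B': 0.83}: the bet switches to B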

“Ah, but there’s a hitch!” exclaim the developmental psychologists…  Now that you’re set on B, if we switch back to A, you’ll be stuck again.  You’ll have to ‘unlearn’ B as the best response, and slowly cycle back again to A.  But by then, we’ll switch it up on you again!  You’ll never learn!  Learning is impossible!  BATTER IMPERMANENCE!!  [2]

…Ahem.

Not so fast.  This would be true if all that children were learning were the two possible relationships ‘Hiding game = look at A’ and ‘Hiding game = look at B.’  But what if they took context into account?  What if, for example, in addition to ‘H = look at A’ and ‘H = look at B,’ they also learned about ‘H at A = look at A’ and ‘H at B = look at B’?  In other words, what if they were trying to figure out which strategy worked best?

If you think back to the search dilemma I described at the beginning, I mentioned that I’ve had to learn when it’s best to apply a “look in the most frequent spot” strategy (H = look at A) versus a “look in the last spot I saw it” strategy (H at A = look at A).  Might infants be faced with the same puzzle?
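
Here is one way that competition could play out in a simple error-driven learning model.  This is only an illustrative sketch, not the model from the paper I describe below: the cue names ('game', 'hidden_at_A', 'hidden_at_B'), the learning rate, and the trial schedule are all my own assumptions.  The idea is that a frequency cue (the hiding game itself, present on every trial) and context cues (where the object was just hidden) compete to predict where the object will be found.

    from collections import defaultdict

    alpha = 0.2                                    # learning rate (illustrative)

    # Associative strength V[cue][outcome]; the outcomes are "found at A" / "found at B".
    V = defaultdict(lambda: {"A": 0.0, "B": 0.0})

    def trial(hide_at):
        # One round of the hiding game: the apple is hidden, and later found, at hide_at.
        present = ["game", "hidden_at_" + hide_at]     # frequency cue + context cue
        for outcome in ("A", "B"):
            prediction = sum(V[c][outcome] for c in present)
            target = 1.0 if outcome == hide_at else 0.0
            error = target - prediction
            for c in present:                          # cues that are present share the error
                V[c][outcome] += alpha * error

    # Four hide-at-A rounds, then the experimenter keeps switching the hiding spot.
    for loc in ["A"] * 4 + ["B", "A"] * 20:
        trial(loc)

    # By now 'game' has some pull toward both locations, so it no longer settles the question;
    # the context cues carry the discrimination.
    print(V["game"], V["hidden_at_A"], V["hidden_at_B"])

    # On a fresh hide-at-B round, the summed predictions clearly favor searching at B.
    present = ["game", "hidden_at_B"]
    print({o: round(sum(V[c][o] for c in present), 2) for o in ("A", "B")})

Notice that on the very first hide-at-B round, only the 'game' cue has any strength, and it points at A, which is exactly the A-not-B error; it takes repeated rounds before the context cues win out.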

Three of my colleagues – graduate student Hanna Muenke Popick and professors Michael Ramscar and Natasha Kirkham – decided to investigate this possibility by modeling learning with a simple, widely used reinforcement rule [3].  They found that the model could easily account both for why infants initially adopt frequency-based strategies in the A-not-B task and for why context-based strategies eventually win out.  In short, the weight of evidence the child gets over the course of the game ultimately favors context-based strategies.

An eye-tracking study of nine-month-olds offered support for this hypothesis, showing that children who initially searched at A incrementally switched their search to B over the course of learning trials, in line with the model’s predictions [4].

So in other words: yes.  There is good evidence that A-not-B is explicable in terms of simple learning mechanisms [5].  I should be quick to add that this certainly does not rule out the contribution of other developmental factors, such as working memory or inhibitory control [6].  Nor does it show that A-not-B is necessarily resolved via learning (though the empirical results provide strong evidence in that direction).

However, there is good reason to think that learning should help explain performance in A-not-B.  Our brains appear to be wired to learn about (and predict) complex relations within our environment.  But there is little reason to suppose that the ‘content’ of those relations is already hardwired.  For example, the conditional ‘if Daddy has been drinking, Mommy will be sad’ has to be learned, because it might just as easily be ‘if Daddy has been drinking, Mommy will put Rick James on the tapedeck and dance foolishly.’  If it’s possible for children to learn these kinds of complex causal relations – and react accordingly – why rule out the contribution of learning in A-not-B a priori?

I mean, I can wager a guess... It's unnerving to think that 'object permanence' isn't part of our innate endowment.  It suggests we could have just as easily learned to apprehend a world where teleportation and telekinesis were the norm, and in which the physical laws we take to be self-evident were flipped on their heads.  But I prefer to think that that speaks to the incredible power of our learning architecture, rather than to the strange (and possibly incidental) quality of our reality.

The Daily Fact Check

[1]  Popick et al.’s article is certainly not the only one to look into context.  Here’s a brief excerpt from the literature review: “…infant behavior in these tasks is still, in many ways, context-dependent. For instance, while many 9 month old infants can successfully complete [a] towel pulling task (Aguiar & Baillargeon, 2000), they still fail the standard A-not-B task (Piaget, 1954), even though these tasks appear structurally similar. Further, Adolpho (2000) found that what an infant learns in one context does not always extend easily to another (see also Thelen, Schoner, Scheier, & Smith, 2001; Smith & Thelen, 2003). Thus, infants do not initially appear to learn abstract, generalized “search.”  Rather, infant search learning is sensitive both to kind (pulling, reaching, etc) and context.”

[2]  As usual, I'm fully exaggerating.  However, it has been widely suggested that learning models cannot explain how infants progress in the A-not-B task.  For example, the Wiki page on the task has this to say: “There are also behaviorist accounts that explain the behavior in terms of reinforcement… However, this account does not explain the shift in behavior that occurs around 12 months.”  From what I can make out, statements like these stem from a misunderstanding of how learning models work (whether this misunderstanding is on the part of the ‘behaviorists’ or the critics, it’s hard to say).  If you don’t know much about learning models, I would highly recommend reading Rescorla (1988), Pavlovian Conditioning: It’s Not What You Think It Is.

[3]  For the technically minded, the model is Rescorla-Wagner (1972) and can be implemented with a single free parameter – learning rate.  Critically, the model makes the same prediction – that context trumps frequency-based cues – regardless of how you set the parameter; the question is simply how long it takes.  If you set a slow learning rate, it could take eons; a fast learning rate, and it might take a handful of trials.  In Popick et al.’s paper they set the parameter to reflect the speed at which one might plausibly expect infants to learn in the task.
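
To give a feel for what that parameter does, here is a much-simplified, single-cue toy, with numbers of my own choosing rather than anything from the paper: the learning rate changes how many trials it takes to build a strong association, not where the association ends up.

    # How many trials does it take a single cue to reach a 90%-strength association?
    def trials_to_criterion(alpha, criterion=0.9):
        v, n = 0.0, 0
        while v < criterion:
            v += alpha * (1.0 - v)    # Rescorla-Wagner update for a single cue, with lambda = 1
            n += 1
        return n

    for alpha in (0.05, 0.2, 0.5):
        print(alpha, trials_to_criterion(alpha))    # 45, 11, and 4 trials respectively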

[4]  You may be wondering why we didn’t have the infants sit through round after round of games until they mastered A-not-B, as predicted by the model.  There’s a very simple reason for that: they’re infants.  Getting infants to sit through a twenty-minute study is difficult enough, let alone a study of double or even triple that length.  A multi-day training study might make for an interesting follow-up.

[5]  Unlike other mathematical models of human learning – which may or may not be psychologically plausible – reinforcement learning has been studied widely in both humans and animals.  Models of such learning have been found to accurately predict a diverse range of learning phenomena and behavior, and have also been shown to reflect real neural processes (specifically, dopaminergic patterns of response to error and reward; cf. Waelti, Dickinson, & Schultz, 2001).  There’s no question that humans can and do employ reinforcement learning.  What’s debatable is in which domains and to what extent.  This question becomes particularly contentious with regard to language.

[6]  For various maturation accounts, see e.g., Baillargeon, Graber, Devos, & Black, 1990; Diamond, 1988; Diamond, Cruttenden, & Neiderman, 1994; Munakata, 1997; Thelen, Schöner, Scheier, & Smith, 2001.


  • Sam says:

    And genies in the sky.

  • Talktome says:

    Thanks for sharing it.

  • Andrew says:

    It seems disingenuous to describe the Thelen et al. account of the A-not-B error as being just about 'context'; and downright incorrect to lump it in with 'maturational' accounts from people like Baillargeon and Diamond, given that Thelen et al. spent so much time directly opposing exactly those accounts. Could you be more specific about what you mean by those terms?

    You haven't addressed any of the numerous empirical results from the field model work, either; how well does a simple reinforcement model account for these robust effects? Given that the error is intrinsically about reach behaviour, any model that doesn't address that seems of limited use.

    • melodye says:

      Andrew -- thanks for the comment! I really like Smith and Thelen's work and shouldn't have taken a bite-size chunk out of the paper's literature review without fleshing it out better.

      I do think a dynamic systems approach is one of the most compelling ways to date of modeling and understanding behavior in the A-not-B task (and in other developmental domains). I would encourage any interested reader, expert or no, to peruse the Smith & Thelen 2003 TiCS article (open access PDF), which is accessible to a general audience.

      There is nothing incompatible between a dynamic systems approach and a learning-based one. Learning is, of course, one element in a dynamic system. What the work I describe above shows is that reinforcement learning is a powerful and simple means of modeling how children interact with A-not-B (first in an errorful way and later in a 'strategic' way). As I mentioned, this does not rule out the influence of other vectors in a maturing system. What it shows is that you do not need to take these elements into account to produce a model that accounts --with some precision-- for the behavior. Since we know children make use of reinforcement learning, this seems like a logical fit with the problem.

      The choice of an eye-tracking rather than a reaching paradigm was deliberate; it allowed Hanna to more thoroughly and precisely track attentional shifts (instead of the binary reach here / reach there). If you're interested, I can send you a copy of the paper -- it goes into some detail about the choice of paradigm and how it relates to the reach literature. Would be curious to hear your thoughts!

  • Andrew says:

    I'd certainly be interested in the paper, thanks. I have some other thoughts but I'd need to read what was actually done to see if they're worth mentioning 🙂

  • Justin says:

    I feel obnoxious for pointing this out as your article was very informative and well written, but:

    "When Mommy ‘hides’ batter in the oven, it ‘reappears’ as cake."

    "When Daddy ‘hides’ batter in the oven, the smoke alarm goes off and Mommy screams."

    Is there not some degree of oblique sexism inherent in that excerpt? I know you intended no malice (the example may not even be yours), and I am in NO way offended, however any writer would be castigated for making a remark about how when "Mommy" drives she crashes into the mailbox and "Daddy" screams.

    Sorry to hijack the conversation for something so silly, but I've been on a little bit of a crusade lately after seeing so many TV shows and commercials routinely cast some insensate clod as the father. Anyway, it was a really wonderful article.

    • melodye says:

      For much of my life, I was raised in a single parent family with my father. And back in the day, it was my father who taught my mother how to cook. The line was meant as a play on a bad stereotype.. : )

  • Jack says:

    I like the way this article was written. It has made me a cheerful person. A matter of seriousness has been handled in a light manner.