I am a horribly forgetful girl.
Which is a funny thing to say, really, because I’m not quite sure whether it’s my memory that’s bad or my attention. Recently, for instance, I spent several hours searching for my phone to no avail, only to find (the following morning) that I had left it in my underwear drawer. It reminded me of when I left my driver’s license in the refrigerator with my passport; or the time I put a bowl of ice-cream in the oven for safekeeping.
It is desperately hard to ‘find’ things again once I’ve committed such an error, because there is simply no logical way to retrace my steps. “Ah yes, the oven! A perfect place to stow the ice-cream…”
It’s demented, really (or early-stage dementia, quite probably).
But in any case, living with myself – and my frightful follies – has entailed learning strategies to find things again. My preferred strategy is asking my friends and flatmates where they last saw X – since wherever I’ve put X is almost invariably not a place that Xs go. Sneakers? “On the kitchen table.” Purse? “In the bathroom, on the floor.” Sunglasses? “In the downstairs pantry—again, Melody?—”
Looking for an object in the last place it was seen is often a good way to find it. But there are other methods. For instance, sometimes it’s better to look where I typically scatter my things: the entryway and living room seem to be frequent collectors of sandals and overdue library books; the bathroom collects keychains and mail; the passenger seat of my car seems to be a favored spot to leave my wallet…
What’s interesting about this is that there isn’t simply one cookie-cutter recipe for finding what I’ve lost. Which strategy I adopt depends both on what it is I’m missing and what my mental state was when I lost it. Was I mostly alert? Spaced-out? Spacey with a chance of meatballs?
Choosing the best search strategy is usually a matter of context. This much is obvious. However, the simple fact that this is true for me – as it is for much higher-functioning adults – can give us a rather interesting insight into how children learn to search.
In 1954, Piaget first described what are classically termed ‘A-not-B’ errors. What Piaget found was that eight- to twelve-month-old infants do not yet appear capable of ‘rational’ search. In his experiments, a researcher would first entice the infant with an attractive object – say, a bright red, shiny apple. The researcher would then show the object disappearing at a particular location (‘A’) and reappearing from that same location (for instance, the researcher might put the apple behind a screen and then bring it back out again). This game would be repeated several times, until the infant was gamely searching for the apple at A. Then, the researcher would switch up the game, and hide the apple at a new location (‘B’). What was surprising was that while the infant would happily follow the hiding and reappearing act at A, when the apple was hidden at B, the infant would continue to look (or ‘search’) at A, as if it should reappear at A. This was the case even though the infant had physically seen the experimenter re-hide the apple at B.
In light of this, Piaget theorized that the infant had not yet developed an understanding of ‘object permanence,’ which must ‘dawn’ at a later stage of development. In the years since, theoretical accounts of A-not-B have tried to pin the errors on slow neurological development: for example, it has been suggested that A-not-B errors may result from immature executive function or limited working memory, and may resolve over time as a result of neural maturation.
Another possibility, however, is that children need to learn which search strategies are appropriate in which context. This may seem counterintuitive: shouldn’t it be obvious that if an object is hidden at X, it reappears at X? The crazy thing is – probably not!
As adults, we reason ‘causally’ about the world all the time. Given how unconscious and implicit much of our knowledge of the world is, it’s not surprising that we take the various physical (and temporal) workings of reality for granted. But the surprising behavior of infants in the A-not-B task may be our best evidence to date that our understanding of those relations is not – as we might imagine – a given, but rather, is governed by what experience teaches us about the world.
Think for a moment about the sheer number of different relationships between objects and events that a young child needs to learn about. Here are a handful of examples:
When Mittens ‘hides’ outside, he scampers out through the back door, but ‘reappears’ through the window.
When Mommy ‘hides’ batter in the oven, it ‘reappears’ as cake.
When Daddy ‘hides’ batter in the oven, the smoke alarm goes off and Mommy screams.
To my knowledge, no psychologist has ever posited ‘batter impermanence’ as a developmental stage. But it's not clear that ‘object permanence’ is so different, after all. What is clear from these examples is that while we may generalize certain causal relations across instances, context will always play a role, and we need to learn to discriminate which rules apply across which instances.
Which is just another way of saying – the fact that babies commit A-not-B errors may simply be the misapplication of an otherwise logical strategy.
It’s easy enough to see why this would be the case. As an infant, you’ve seen the apple hidden at A four times and (more critically) you’ve seen it reappear at A four times. That means you have strong evidence that you will find it again at A. Now it gets hidden at B. You have zero evidence for it reappearing at B – and haven’t yet learned that context counts. Best bet’s on A.
So far, so good. That explains why kids fail the task. But how do they eventually learn to pass it? The easiest way to frame this is in terms of expectation. If you expect something will happen, and it doesn’t (repeatedly), you will begin to revise that expectation. So if the researcher keeps playing the game with you, and the apple keeps reappearing at B – five times, say – then that’s five times that your expectation that it might appear at A was violated and five times that you saw it reemerge from B. By now you should have a lot of ‘negative evidence’ for A, and a lot of ‘positive evidence’ for B – meaning that you’ll have ‘unlearned’ A as the best search spot. Now you switch your line. Bet’s on B! you decide, and get the next one right.
“Ah, but there’s a hitch!” exclaim the developmental psychologists… Now that you’re set on B, if we switch back to A, you’ll be stuck again. You’ll have to ‘unlearn’ B as the best response, and slowly cycle back again to A. But by then, we’ll switch it up on you again! You’ll never learn! Learning is impossible! BATTER IMPERMANENCE!! 
Not so fast. This would be true if we thought that all that children were learning about were the two possible relationships ‘Hiding game = look at A’ and ‘Hiding game = look at B.’ But what if they took context into account? What if, for example, in addition to ‘H = look at A’ and ‘H = look at B’ they learned about ‘H at A = look at A’ and ‘H at B = look at B’? In other words, what if they were trying to figure out which strategy worked best?
If you think back to the search dilemma I described at the beginning, I mentioned that I’ve had to learn when it’s best to apply a “look in the most frequent spot” (H = look at A) versus a “look in the last spot” (H at A = look at A) strategy. Might infants be faced with the same puzzle?
Three of my colleagues – graduate student Hanna Muenke Popick and professors Michael Ramscar and Natasha Kirkham – decided to investigate this possibility by modeling learning using a simple, widely used reinforcement rule. They found that the model could easily account both for why infants initially adopt frequency-based strategies in the A-not-B task, and for why context-based strategies eventually win out over frequency-based strategies in the long run. In short, the weight of evidence the child gets over the course of the game ultimately favors context-based strategies.
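To make the idea concrete, here is a toy sketch of this kind of model – my own illustrative reconstruction, not the actual code from Popick and colleagues’ paper, so the cue names and numbers are all assumptions. Two kinds of cue compete, via simple error-driven updating, to predict where the apple will reappear: a frequency cue (“we’re playing the hiding game”), present on every trial, and context cues (“it was hidden at A” / “it was hidden at B”).

```python
# Toy error-driven (Rescorla-Wagner-style) simulation of the A-not-B task.
# NOT the model code from the paper -- a hypothetical sketch for illustration.

ALPHA = 0.2  # learning rate: the model's single free parameter

cues = ["game", "hid_at_A", "hid_at_B"]
outcomes = ["found_at_A", "found_at_B"]
# associative strength from each cue to each outcome, all starting at zero
weights = {c: {o: 0.0 for o in outcomes} for c in cues}

def trial(hiding_spot):
    """One round of the game: predict where to search, then learn from the outcome."""
    present = ["game", f"hid_at_{hiding_spot}"]
    # the model 'searches' wherever its summed prediction is strongest
    prediction = {o: sum(weights[c][o] for c in present) for o in outcomes}
    search = max(prediction, key=prediction.get)[-1]  # 'A' or 'B'
    # error-driven update: every present cue shifts toward what actually happened
    for o in outcomes:
        observed = 1.0 if o == f"found_at_{hiding_spot}" else 0.0
        error = observed - prediction[o]
        for c in present:
            weights[c][o] += ALPHA * error
    return search

# four hidings at A, then eight at B -- the classic A-not-B setup
searches = [trial("A") for _ in range(4)] + [trial("B") for _ in range(8)]
print("".join(searches))  # a run of A's (including the classic error on the
                          # first B trial), then B's once context wins out
```

Note that the ‘hid_at_A’ weights sit untouched during the B trials, so the context-based strategy, once learned, doesn’t need to be painfully unlearned if the experimenter switches back – which is precisely why context beats raw frequency in the long run.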
An eye-tracking study of nine-month-olds offered support for this hypothesis, showing that children who initially searched at A incrementally switched their search to B over the course of learning trials, in line with the model’s predictions.
So in other words: yes. There is good evidence that A-not-B is explicable in terms of simple learning mechanisms. I should be quick to add that this certainly does not rule out the contribution of other developmental factors, such as working memory or inhibitory control. Nor does it show that A-not-B is necessarily resolved via learning (though the empirical results provide strong evidence in that direction).
However, there is good reason to think that learning should help explain performance in A-not-B. Our brains appear to be wired to learn about (and predict) complex relations within our environment. But there is little reason to suppose that the ‘content’ of those relations is already hardwired. For example, the conditional ‘if Daddy has been drinking, Mommy will be sad’ has to be learned, because it might just as easily be ‘if Daddy has been drinking, Mommy will put Rick James on the tape deck and dance foolishly.’ If it’s possible for children to learn these kinds of complex causal relations – and react accordingly – why rule out the contribution of learning in A-not-B a priori?
I mean, I can hazard a guess... It's unnerving to think that 'object permanence' isn't part of our innate endowment. It suggests we could just as easily have learned to apprehend a world where teleportation and telekinesis were the norm, and in which the physical laws we take to be self-evident were flipped on their heads. But I prefer to think that that speaks to the incredible power of our learning architecture, rather than to the strange (and possibly incidental) quality of our reality.
The Daily Fact Check
 Popick et al’s article is certainly not the only one to look into context. Here’s a brief excerpt from the literature review: “…infant behavior in these tasks is still, in many ways, context-dependent. For instance, while many 9 month old infants can successfully complete [a] towel pulling task (Aguiar & Baillargeon, 2000), they still fail the standard A-not-B task (Piaget, 1954), even though these tasks appear structurally similar. Further, Adolpho (2000) found that what an infant learns in one context does not always extend easily to another (see also Thelen, Schoner, Scheier, & Smith, 2001; Smith & Thelen, 2003). Thus, infants do not initially appear to learn abstract, generalized “search.” Rather, infant search learning is sensitive both to kind (pulling, reaching, etc) and context.”
 As usual, I'm fully exaggerating. However, it has been widely suggested that learning models cannot explain how infants progress in the A-not-B task. For example, the Wikipedia page on the task has this to say: “There are also behaviorist accounts that explain the behavior in terms of reinforcement… However, this account does not explain the shift in behavior that occurs around 12 months.” From what I can make out, statements like these stem from a misunderstanding of how learning models work (whether this misunderstanding is on the part of the ‘behaviorists’ or the critics, it’s hard to say). If you don’t know much about learning models, I would highly recommend reading Rescorla (1988), “Pavlovian Conditioning: It’s Not What You Think It Is.”
 For the technically minded, the model is Rescorla-Wagner (Rescorla & Wagner, 1972) and can be implemented with a single free parameter – learning rate. Critically, the model makes the same prediction – that context trumps frequency-based cues – regardless of how you set the parameter; the question is simply how long it takes. If you set a slow learning rate, it could take eons; a fast learning rate, and it might take a handful of trials. In Popick et al.’s paper they set the parameter to reflect the speed at which one might plausibly expect infants to learn in the task.
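 You can see this with a hypothetical sketch (again a toy of my own devising, not the paper’s code – the 0.9 training criterion and the particular learning rates are my assumptions): train the model on A until it firmly expects the apple there, then hide at B and count how many trials pass before the first search at B.

```python
def trials_until_B(alpha, max_b_trials=100_000):
    """Toy Rescorla-Wagner sketch: how many B trials before the model switches?"""
    # Phase 1: hide (and find) at A until the model firmly expects A.
    # Present cues: 'game' + 'hidden at A'; both come to predict 'found at A'.
    game_A = ctxA_A = 0.0
    while game_A + ctxA_A < 0.9:           # train to a fixed criterion
        error = 1.0 - (game_A + ctxA_A)
        game_A += alpha * error
        ctxA_A += alpha * error
    # Phase 2: hide at B. Present cues: 'game' + 'hidden at B'.
    game_B = ctxB_A = ctxB_B = 0.0
    for t in range(1, max_b_trials + 1):
        pred_A = game_A + ctxB_A           # expected reappearance at A
        pred_B = game_B + ctxB_B           # expected reappearance at B
        if pred_B > pred_A:
            return t                       # first trial on which it searches B
        game_A += alpha * (0.0 - pred_A)   # A keeps failing to pay off...
        ctxB_A += alpha * (0.0 - pred_A)
        game_B += alpha * (1.0 - pred_B)   # ...while B keeps delivering the apple
        ctxB_B += alpha * (1.0 - pred_B)
    return None                            # never switched (doesn't happen here)

for alpha in (0.3, 0.1, 0.01):
    print(alpha, trials_until_B(alpha))
```

The switch to B always comes; a smaller learning rate just pushes it later.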
 You may be wondering why we didn’t have the infants sit through round after round of games until they mastered A-not-B, as predicted by the model. There’s a very simple reason for that: they’re infants. Getting infants to sit through a twenty-minute study is difficult enough, let alone a study of double or even triple that length. A multi-day training study might make for an interesting follow-up.
 Unlike other mathematical models of human learning – which may or may not be psychologically plausible – reinforcement learning has been studied widely in both humans and animals. Models of such learning have been found to accurately predict a diverse range of learning phenomena and behavior, and have also been shown to reflect real neural processes (specifically, dopaminergic patterns of response to error and reward; cf. Waelti, Dickinson, & Schultz, 2001). There’s no question that humans can and do employ reinforcement learning. What’s debatable is in which domains and to what extent. This question becomes particularly contentious with regard to language.
 For various maturation accounts, see e.g., Baillargeon, Graber, DeVos, & Black, 1990; Diamond, 1988; Diamond, Cruttenden, & Neiderman, 1994; Munakata, 1997; Thelen, Schöner, Scheier, & Smith, 2001.