How working memory works: What you need to know

A New Yorker cartoon has a man telling his glum wife, “Of course I care about how you imagined I thought you perceived I wanted you to feel.” There are a number of reasons you might find that funny, but the point here is that it is very difficult to follow all the layers. This is a sentence in which mental attributions are made to the 6th level, and this is just about impossible for us to follow without writing it down or breaking it down into chunks.

According to one study, while we can comfortably follow a long sequence of events (A causes B, which leads to C, thus producing D, and so on), we can only comfortably follow four levels of intentionality (A believes that B thinks C wants D). At the 5th level (A wants B to believe that C thinks that D wants E), error rates rose sharply to nearly 60% (compared to 5-10% for all levels below that).

Why do we have so much trouble following these nested events, as opposed to a causal chain?

Let’s talk about working memory.

Working memory (WM) has evolved over the years from a straightforward “short-term memory store” to the core of human thought. It’s become the answer to almost everything, invoked for everything related to reasoning, decision-making, and planning. And of course, it’s the first and last port of call for all things memory — to get stored in long-term memory an item first has to pass through WM, where it’s encoded; when we retrieve an item from memory, it again passes through WM, where the code is unpacked.

So, whether or not the idea of working memory has been over-worked, there is no doubt at all that it is utterly crucial for cognition.

Working memory has also been equated with attentional control, and working memory and attention are often used almost interchangeably. And working memory capacity (WMC) varies among individuals. Those with a higher WMC have an obvious advantage in reasoning, comprehension, remembering. No surprise then that WMC correlates highly with fluid intelligence.

So let’s talk about working memory capacity.

The idea that working memory can hold 7 (+/-2) items has passed into popular culture (the “magic number 7”). More recent research, however, has circled around the number 4 (+/-1). Not only that, but a number of studies suggest that in fact the true number of items we can attend to is only one. What’s the answer? (And where does it leave our high- and low-capacity individuals? There’s not a lot of room to vary there.)

Well, in one sense, 7 is still fine — that’s the practical sense. Seven items (5-9) is about what you can hold if you can rehearse them. So those who are better able to rehearse and chunk will have a higher working memory capacity (WMC). That will be affected by processing speed, among other factors.

But there is a very large body of evidence now pointing to working memory holding only four items, and a number of studies indicating that most likely we can only pay attention to one of these items at a time. So you can envision this either as a focus of attention, which can only hold one item, and a slightly larger “outer store” or area of “direct access” which can hold another three, or as a mental space holding four items of which only one can be the focus at any one time.

A further tier, which may be part of working memory or part of long-term memory, probably holds a number of items “passively”. That is, these are items you’ve put on the back burner; you don’t need them right at the moment, but you don’t want them to go too far either. (See my recent news item for more on all this.)

At present, we don’t have any idea how many items can be in this slightly higher state of activation. However, the “magic number 7” suggests that you can circulate 3 (+/-1) items from the back burner into your mental space. In this regard, it’s interesting to note that, in the case of verbal material, the amount you can hold in working memory with rehearsal has been found to equate more accurately to about 2 seconds of speech than to 7 items. That is, you can remember as much as you can verbalize in about 2 seconds (so, yes, fast speakers have a distinct advantage over slower ones). You can see why processing speed affects WMC.
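As a rough illustration of the 2-second idea, here is a minimal sketch that treats verbal span as speech rate multiplied by the rehearsal window. The function name and the two speech rates are assumptions made up for the example, not measured values.

```python
REHEARSAL_WINDOW_S = 2.0  # the roughly 2-second window mentioned in the text


def estimated_verbal_span(items_per_second: float,
                          window_s: float = REHEARSAL_WINDOW_S) -> float:
    """Estimate how many items fit in the rehearsal window.

    Assumes (for illustration only) that span is simply the number of
    items that can be spoken within the window.
    """
    if items_per_second < 0 or window_s < 0:
        raise ValueError("rate and window must be non-negative")
    return items_per_second * window_s


# Hypothetical speakers, chosen only to show the arithmetic.
slow = estimated_verbal_span(2.5)  # 2.5 items/s * 2 s
fast = estimated_verbal_span(4.5)  # 4.5 items/s * 2 s
print(slow, fast)
```

With these made-up rates the two speakers land at 5 and 9 items, the ends of the familiar 7 (+/-2) range, which is one way to see how a fixed time window and a variable speaking rate could produce that spread.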

Whether you think of WM as a focus of one and an outer store of 3, or as a direct access area with 4 boxes and a spotlight shining on one, it’s a mental space or blackboard where you can do your working out. Thinking of it this way makes it easier to conceptualize and talk about, but these items are probably not going into a special area as such. The thought now is that these items stay in long-term memory (in their relevant areas of association cortex), but they are (a) highly activated, and (b) connected to the boxes in the direct access area (which is possibly in the medial temporal lobe). This connection is vitally important, as we shall see.
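For readers who like to see a model written down, the picture above can be sketched as a toy data structure: a handful of boxes that hold links to items in long-term memory, plus a spotlight that rests on one box at a time. Everything here (the class name, the method names, the fixed capacity of 4) is an assumption for illustration, not a claim about how the brain implements it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

CAPACITY = 4  # the "4 +/- 1" direct-access limit discussed in the text


@dataclass
class MentalSpace:
    """Toy model: boxes hold links to long-term memory, not copies."""
    long_term: Dict[str, str]                        # key -> stored content
    boxes: List[str] = field(default_factory=list)   # keys currently bound
    focus: Optional[int] = None                      # index of the box in the spotlight

    def bind(self, key: str) -> None:
        """Bind a long-term memory item to a free box."""
        if key not in self.long_term:
            raise KeyError(f"{key!r} is not in long-term memory")
        if len(self.boxes) >= CAPACITY:
            raise OverflowError("no free box in the direct-access area")
        self.boxes.append(key)

    def attend(self, key: str) -> str:
        """Move the spotlight to one bound item and read its content."""
        self.focus = self.boxes.index(key)  # raises ValueError if not bound
        return self.long_term[key]


space = MentalSpace(long_term={"A": "Anna", "B": "Ben", "C": "Cara",
                               "D": "Dev", "E": "Eve"})
for key in "ABCD":
    space.bind(key)

content = space.attend("C")  # spotlight now rests on the third box

try:
    space.bind("E")          # a fifth item has nowhere to go
    fifth_fits = True
except OverflowError:
    fifth_fits = False
```

Note that the items themselves never move: the boxes only store keys, which is the sketch's way of expressing the idea that an item is "in" working memory by virtue of a temporary connection.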

Now four may not seem like much, but WM is not quite as limited as it seems, because we have different systems for verbal (includes numerical) and visuospatial information. Moreover, we can probably distinguish between the items and the processing of them, which equates to a distinction between declarative and procedural memory. So that gives us three working memory areas: verbal declarative; visuospatial declarative; procedural.

Now all of this may seem more than you needed to know, but breaking down the working memory system helps us discover two things of practical interest. First, which particular parts of the system are the parts that make a task more difficult. Second, where individual differences come from, and whether they are in aspects that are trainable.

For example, this picture of a mental space with a focus of one and a maximum of three eager-beavers waiting their turn points to an important aspect of the working memory system: switching the focus. Experiments reveal that there is a large focus-switching cost, incurred whenever you have to switch the item in the spotlight. And the extent of this cost has been surprising: around 240 ms in one study, which is about six times the length of time it takes to scan an item in a traditional memory-search paradigm.

But focus-switch costs aren’t a constant. They vary considerably depending on the difficulty of the task, and they also tend to increase with each item in the direct-access area. Indeed, just having one item in the space outside the focus causes a significant loss of efficiency in processing the focused item.

This may reflect increased difficulty in discriminating one highly activated item from other highly activated items. This brings us to competition, which, in its related aspects of interference and inhibition, is a factor probably more crucial to WMC than whether you have 3 or 4 or 5 boxes in your direct access area.

But before we discuss that, we need to look at another important aspect of working memory: updating. Updating is closely related to focus-switching, and it’s easy to get confused between them. But it’s been said that working memory updating (WMU) is the only executive function that correlates with fluid intelligence, and updating deficits have been suggested as the reason for poor comprehension (also correlated with low-WMC). So it’s worth spending a little time on.

To get the distinction clear in your mind, imagine the four boxes and the spotlight shining on one. Any time you shift the spotlight, you incur a focus-switching cost. If you don’t have to switch focus, if you simply need to update the contents of the box you’re already focusing on, then there will be an update cost, but no focus-switching cost.

Updating involves three components: retrieval, transformation, and substitution. Retrieval simply involves retrieving the contents from the box. Transformation involves an operation on the contents of the box to get a new value (e.g., when you have to add a certain number to an earlier number). Substitution involves replacing the contents with something different.
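To make the distinction between updating and focus-switching concrete, here is a minimal sketch of a single update step. The function, the box names, and the numbers are all hypothetical; the only things taken from the text are the three components and the rule that a switch cost arises only when the spotlight moves.

```python
from typing import Callable, Dict, Optional, Tuple


def update_box(boxes: Dict[str, int],
               focus: Optional[str],
               target: str,
               transform: Callable[[int], int]) -> Tuple[str, bool]:
    """Update one box in place and return (new_focus, switched)."""
    switched = focus != target        # a switch cost arises only if the spotlight moves
    old_value = boxes[target]         # retrieval
    new_value = transform(old_value)  # transformation
    boxes[target] = new_value         # substitution
    return target, switched


# Three boxes, as in an updating task with three items (values are made up).
boxes = {"left": 3, "middle": 7, "right": 1}

focus, switched_1 = update_box(boxes, None, "left", lambda v: v + 4)
focus, switched_2 = update_box(boxes, focus, "left", lambda v: v - 2)
focus, switched_3 = update_box(boxes, focus, "right", lambda v: v + 5)
```

The second call updates the box already in the spotlight, so it incurs an update but no switch; the first and third calls move the spotlight and incur both.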

Clearly the difficulty in updating working memory will depend on which of these components is involved. So which of these processes is most important?

In terms of performance, the most important component is transformation. While all three components contribute to the accuracy of updating, retrieval apparently doesn’t contribute to speed of updating. For both accuracy and speed, substitution is less important than transformation.

This makes complete sense: obviously having to perform an operation on the content is going to be more difficult and time-consuming than simply replacing it. But it does help us see that the most important factor in determining the difficulty of an updating task will be the complexity of the transformation.

The finding that retrieval doesn’t affect speed of updating sounds odd, until you realize the nature of the task used to measure these components. The number of items was held constant (always three), and the focus switched from one box to another on every occasion, so focus-switching costs were constant too. What the finding says is that once you’ve shifted your focus, retrieval takes no time at all — the spotlight is shining and there the answer is. In other words, there really is no distinction between the box and its contents when the spotlight is on it — you don’t need to open the box.

However, retrieval does affect accuracy, and this implies that something is degrading or interfering in some way with the contents of the boxes. Which takes us back to the problems of competition / interference.

But before we get to that, let’s look at this issue of individual differences, because like WMC, working memory updating correlates with fluid intelligence. Is this just a reflection of WMC?

Differences in transformation accuracy correlated significantly with WMC, as did differences in retrieval accuracy. Substitution accuracy didn’t vary enough to have measurable differences. Neither transformation nor substitution speed differences correlated with WMC. This implies that the reason why people with high WMC also do better at WMU tasks is because of the transformation and retrieval components.

So what about the factors that aren’t correlated with WMC? The variance in transformation speed is argued to primarily reflect general processing speed. But what’s going on in substitution that isn’t going on when WMC is measured?

Substitution involves two processes: removing the old contents of the box, and adding new content. In terms of the model we’ve been using, we can think of unbinding the old contents from the box, and binding new contents to it (remember that the item in the box is still in its usual place in the association cortex; it’s “in” working memory by virtue of the temporary link connecting it to the box). Or we can think of it as deleting and encoding.

Consistent with substitution not correlating with WMC, there is some evidence that high- and low-WMC individuals are equally good at encoding. Where high- and low-WMC individuals differ is in their ability to prevent irrelevant information being encoded with the item. Which brings me to my definition of intelligence (from 30 years ago, before these ideas had even been invented, so I came at it from quite a different angle): the ability to (quickly) select what’s important.

So why do low-WMC people tend to be poorer at leaving out irrelevant information?

Well, that’s the $64,000 question, but related to that it’s been suggested that those with low working memory capacity are less able to resist capture by distracting stimuli than those with high WMC. A new study, however, provides evidence that low- and high-WMC individuals are equally easily captured by distracters. What distinguishes the two groups is the ability to disengage. High-capacity people are faster in putting aside irrelevant stimuli. They’re faster at deleting. And this, it seems, is unrelated to WMC.

This is supported by another recent finding, that when interrupted, older adults find it difficult to disengage their brain from the new task and restore the original task.

So what’s the problem with deleting / removing / putting aside items in focus? This is about inhibition, which takes us once again to competition / interference.

Now interference occurs at many different levels: during encoding, retrieval, and storage; with items, with tasks, with responses. Competition is ubiquitous in our brain.

In the case of substitution during working memory updating, it’s been argued that the contents of the box are not simply removed and replaced, but instead gradually over-written by the new contents. This fits in with a view of items as assemblies of lower-level “feature-units”. Clearly, items may share some of these units with other items (reflected in their similarity), and clearly the more they compete for these units, the greater the interference there will be between the items.

You can see why it’s better to keep your codes (items) “lean and mean”, free of any irrelevant information.
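One way to picture this is to treat each item as a set of feature-units and measure how much two items overlap. The sketch below uses Jaccard overlap (shared features divided by all features) purely as a stand-in for "potential interference"; that choice of measure, and the example features, are my assumptions for illustration rather than anything proposed in the studies.

```python
def feature_overlap(item_a: frozenset, item_b: frozenset) -> float:
    """Jaccard overlap: shared features / all features (0.0 to 1.0)."""
    union = item_a | item_b
    if not union:
        return 0.0
    return len(item_a & item_b) / len(union)


# Lean codes: only the features that matter.
lean_cow = frozenset({"red", "cow"})
lean_car = frozenset({"blue", "car"})

# Bloated codes: the same items with irrelevant context encoded too.
bloated_cow = frozenset({"red", "cow", "in a field", "on a sunny day"})
bloated_car = frozenset({"blue", "car", "in a field", "on a sunny day"})

lean_overlap = feature_overlap(lean_cow, lean_car)
bloated_overlap = feature_overlap(bloated_cow, bloated_car)
print(lean_overlap, bloated_overlap)
```

The lean codes share nothing, while the bloated codes share a third of their features, even though the relevant content is identical in both cases. Irrelevant detail is what gives the two items something to compete over.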

Indeed, some theorists completely discard the idea of number of items as a measure of WMC, and talk instead in terms of “noise”, with processing capacity being limited by such factors as item complexity and similarity. While there seems little justification for discarding our “4+/-1”, which is much more easily quantified, this idea does help us get to grips with the concept of an “item”.

What is an item? Is it “red”? “red cow”? “red cow with blue ribbons round her neck”? “red cow with blue ribbons and the name Isabel painted on her side”? You see the problem.

An item is a fuzzy concept. We can’t say, “it’s a collection of 6 feature units” (or 4 or 14 or 42). So we have to go with a less defined description: it’s something so tightly bound that it is treated as a single unit.

Which means it’s not solely about the item. It’s also about you, and what you know, and how well you know it, and what you’re interested in.

To return to our cases of difficulty in disengaging, perhaps the problem lies in the codes being formed. If your codes aren’t tightly bound, then they’re going to start to degrade, losing some of their information, losing some of their distinctiveness. This is going to make them harder to re-instate, and it’s going to make them less distinguishable from other items.

Why should this affect disengagement?

Remember what I said about substitution being a gradual process of over-writing? What happens when your previous focus and new focus have become muddled?

This also takes us to the idea of “binding strength” — how well you can maintain the bindings between the contents and their boxes, and how well you can minimize the interference between them (which relates to how well the items are bound together). Maybe the problem with both disengagement and reinstatement has to do with poorly bound items. Indeed, it’s been suggested that the main limiting factor on WMC is in fact binding strength.

Moreover, if people vary in their ability to craft good codes, if people vary in their ability to discard the irrelevant and select the pertinent, to bind the various features together, then the “size” (the information content) of an item will vary too. And maybe this is what is behind the variation in “4 +/-1”, and experiments which suggest that sometimes the focus can be increased to 2 items. Maybe some people can hold more information in working memory because they get more information into their items.

So where does this leave us?

Let’s go back to our New Yorker cartoon. The difference between a chain of events and the nested attributions is that chaining doesn’t need to be arranged in your mental space because you don’t need to keep all the predecessors in mind to understand it. On the other hand, the nested attributions can’t be understood separately or even in partitioned groups — they must all be arranged in a mental space so we can see the structure.

We can see now that “A believes that B thinks C wants D” is easy to understand because we have four boxes in which to put these items and arrange them. But our longer nesting, “A wants B to believe that C thinks that D wants E”, is difficult because it contains one more item than we have boxes. No surprise there was a dramatic drop-off in understanding.
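The contrast between chains and nesting can be sketched in a few lines. Here a nested attribution is written as a tuple of (agent, verb, inner), and a causal chain as a plain list. The representation, the function names, and the assumption that a chain only ever needs its current cause and effect held in mind are illustrative simplifications, not part of the study.

```python
BOXES = 4  # direct-access capacity assumed in the text


def count_items(attribution) -> int:
    """Count the items that must be held together in a nested attribution."""
    if isinstance(attribution, str):
        return 1                      # innermost item
    _agent, _verb, inner = attribution
    return 1 + count_items(inner)     # this agent plus everything nested inside


def fits_in_mental_space(attribution) -> bool:
    """True if every item in the nesting can have its own box."""
    return count_items(attribution) <= BOXES


def peak_load_chain(events: list) -> int:
    """A chain is read one link at a time: only cause and effect are held."""
    return min(len(events), 2)


# "A believes that B thinks C wants D"
four_levels = ("A", "believes", ("B", "thinks", ("C", "wants", "D")))
# "A wants B to believe that C thinks that D wants E"
five_levels = ("A", "wants", ("B", "believes", ("C", "thinks", ("D", "wants", "E"))))

print(count_items(four_levels), fits_in_mental_space(four_levels))
print(count_items(five_levels), fits_in_mental_space(five_levels))
print(peak_load_chain(["A", "B", "C", "D", "E", "F", "G"]))
```

However long the chain grows, its load in this sketch stays at two, whereas each extra layer of nesting adds one more item that has to be held alongside all the others.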

So given that you have to fill your mental space, what is it that makes some tasks more difficult than others?

The complexity and similarity of the items (making it harder to select the relevant information and bind it all together).
The complexity of the operations you need to perform on each item (the longer the processing, the more tweaking you have to do to your item, and the more time and opportunity for interference to degrade the signal).
Changing the focus (remember our high focus-switching costs).

But in our 5th level nested statement, the error rate was nearly 60%, not 100%, meaning a number of people managed to grasp it. So what’s their secret? What is it that makes some people better than others at these tasks?

They could have 5 boxes (making them high-WMC). They could have sufficient processing speed and binding strength to unitize two items into one chunk. Or they could have the strategic knowledge to enable them to use the other WM system (transforming verbal data into visuospatial). All these are possible answers.

This has been a very long post, but I hope some of you have struggled through it. Working memory is the heart of intelligence, the essence of attention, and the doorway to memory. It is utterly critical, and cognitive science is still trying to come to grips with it. But we’ve come a very long way, and I think we now have sufficient theoretical understanding to develop a model that’s useful for anyone wanting to understand how we think and remember, and how they can improve their skills.

There is, of course, far more that could be said about working memory (I’ve glossed over any number of points in an effort to say something useful in less than 50,000 words!), and I’m planning to write a short book on working memory, its place in so many educational and day-to-day tasks, and what we can do to improve our skills. But I hope some of you have found this enlightening.

References

Clapp, W. C., Rubens, M. T., Sabharwal, J., & Gazzaley, A. (2011). Deficit in switching between functional brain networks underlies the impact of multitasking on working memory in older adults. Proceedings of the National Academy of Sciences. doi:10.1073/pnas.1015297108

Ecker, U. K. H., Lewandowsky, S., Oberauer, K., & Chee, A. E. H. (2010). The Components of Working Memory Updating: An Experimental Decomposition and Individual Differences. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36(1), 170-189. doi:10.1037/a0017891

Fukuda, K., & Vogel, E. K. (2011). Individual Differences in Recovery Time From Attentional Capture. Psychological Science, 22(3), 361-368. doi:10.1177/0956797611398493

Jonides, J., Lewis, R. L., Nee, D. E., Lustig, C. A., Berman, M. G., & Moore, K. S. (2008). The mind and brain of short-term memory. Annual Review of Psychology, 59, 193-224. doi:10.1146/annurev.psych.59.103006.093615

Kinderman, P., Dunbar, R. I. M., & Bentall, R. P. (1998). Theory-of-mind deficits and causal attributions. British Journal of Psychology, 89, 191-204.

Lange, E. B., & Verhaeghen, P. (in press). No age differences in complex memory search: Older adults search as efficiently as younger adults. Psychology and Aging.

Oberauer, K., Süß, H.-M., Schulze, R., Wilhelm, O., & Wittmann, W. W. (2000). Working memory capacity - facets of a cognitive ability construct. Personality and Individual Differences, 29(6), 1017-1045. doi:10.1016/S0191-8869(99)00251-2

Oberauer, K. (2005). Control of the Contents of Working Memory--A Comparison of Two Paradigms and Two Age Groups. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31(4), 714-728. doi:10.1037/0278-7393.31.4.714

Oberauer, K. (2006). Is the Focus of Attention in Working Memory Expanded Through Practice? Journal of Experimental Psychology: Learning, Memory, and Cognition, 32(2), 197-214. doi:10.1037/0278-7393.32.2.197

Oberauer, K. (2009). Design for a Working Memory. Psychology of Learning and Motivation, 51, 45-100.

Verhaeghen, P., Cerella, J., & Basak, C. (2004). A Working Memory Workout: How to Expand the Focus of Serial Attention From One to Four Items in 10 Hours or Less. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30(6), 1322-1337.