Getting to ‘flow’ state while studying: simulating mental effort while reading

Apr 7, 2021

Like many self-directed adult learners, when I become interested in a new topic, I try to start with an introductory article or book. This works fine if you happen to be the exact target audience. But the author of that text assumes a certain profile for the reader: familiarity with certain concepts and skills. Sometimes the text covers a bunch of concepts I already know well, and before I get to the meat of it, my eyes glaze over. At other times, the text contains terms I have to keep looking up, and I become overwhelmed. I end up playing Goldilocks in an endless attempt to find the right things to read (this book is too easy, this article is too difficult). I spend an enormous amount of energy on switching contexts and selecting material, and not enough on actually engaging with the content in a way that’s productive and enjoyable.

Is there a way to pick the best material for my background from several candidate documents, without trial and error or spending valuable time skimming each one? This article explores automated solutions to this problem using principles from educational psychology.

You can skip ahead to the ‘Simulation and Results’ section for a demonstration of how a personalized learning path, based on individual background knowledge, can be built for the Wikipedia page on stars used in previous posts. For those who want more context, the following paragraphs describe cognitive load theory and the working memory model used to simulate cognitive load in readers.

Working memory

In cognitive psychology, various models have been proposed for how humans make sense of new information and learn. Many are modeled after computer systems architecture, dividing memory roughly into working memory (analogous to RAM) and long-term memory (analogous to disk storage). Working memory is the part of memory that holds a small amount of newly arriving information and processes it, trying to fit it with what you already know. It is limited in both capacity and duration. The capacity of working memory is widely believed to be around 7 ± 2 items (Miller, 1956), although more recent research puts it as low as 4 (Cowan, 2001). An item is believed to stay in working memory for around 10–15 seconds; after that, if it hasn’t contributed to making meaning, it is forgotten.

Figure: Information processing cognitive architecture

How we get better at a subject

As you become familiar with concepts, they are stored in long-term memory as schemas. If concepts are strongly related, they become ‘chunked’ together. Long-term memory is effectively unlimited in capacity and can store increasingly complex schemas involving thousands of concepts. When we next need to work with them, the entire group of related concepts can be retrieved into working memory as a single piece: a chunk. Thus, as we learn, we can process more and more complex information with the same working memory resources. This builds a strong case against teaching attitudes like ‘throw ’em in the deep end’. If we are fed chunks at a suitable pace (not too easy, not too hard), personalized to what is already in our long-term memory, learning is much more effective.
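To make the chunking argument concrete, here is a toy sketch in Python. It assumes, for illustration only, that every item in working memory costs one slot, and that a group of items already stored together as a chunk in long-term memory costs a single slot when all of its members are present. The function name `slots_needed` and the example concepts are hypothetical.

```python
from collections.abc import Iterable


def slots_needed(items: list[str], chunks: Iterable[frozenset[str]] = ()) -> int:
    """Count working-memory slots needed to hold `items`.

    Toy assumption: a known chunk whose members are all present is retrieved
    as one piece and occupies a single slot; every other item needs its own.
    """
    remaining = set(items)
    slots = 0
    for chunk in chunks:
        if chunk and chunk <= remaining:  # every member of the chunk is present
            remaining -= chunk
            slots += 1
    return slots + len(remaining)


items = ["force", "mass", "acceleration", "temperature"]
newtons_second_law = frozenset({"force", "mass", "acceleration"})

print(slots_needed(items))                        # novice, no chunks → 4
print(slots_needed(items, [newtons_second_law]))  # chunk retrieved as one piece → 2
```

With the chunk in place, the same four items fit in half the slots, which is the sense in which learning lets us handle more complex information with the same working memory resources.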

Cognitive load

According to Cognitive Load Theory (Sweller et al., 2011), cognitive load is the amount of working memory resources used during an activity. Too low a cognitive load (cognitive underload) leads to low cognitive presence, a state that invites distraction, whereas cognitive overload leads to anxiety and giving up (cf. the Yerkes-Dodson law). In between lies a sweet spot where learning and task performance are highest, similar to the state of flow (Csikszentmihalyi et al., 2014), in which working memory load matches working memory capacity (Klingberg, 2009). Studies on text comprehension support this theory. One example is the reverse cohesion effect: low-knowledge readers gained more from texts with more sentence-to-sentence cohesion, while high-knowledge readers gained more from low-cohesion texts (O’Reilly and McNamara, 2007). Cohesion in this context means using nouns instead of pronouns, defining unfamiliar concepts, adding connectives between sentences to clarify their relationships, and increasing argument overlap between sentences.

Cognitive load is categorized as intrinsic, extraneous and germane based on the source of the load.

Intrinsic Cognitive Load

As you read, each new concept you encounter adds to the cognitive load on your working memory. If unfamiliar concepts must be processed together because the interactions between them matter (high element interactivity), intrinsic cognitive load is high. Intrinsic load is managed by pacing your study so that core concepts become familiar before you tackle the concepts that build on them. A previous article shows how to determine the most important relationships for a concept. We can find out which concepts a student already knows through pretesting, and select reading material that keeps the number of new, unfamiliar concepts at an optimal level.
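The selection step described above can be sketched as a simple filter. This assumes, hypothetically, that pretesting produces a set of known concepts and that each candidate passage has already been tagged with the concepts it uses; the bounds `min_new` and `max_new` are illustrative stand-ins for ‘an optimal level’, not values from the original model.

```python
def select_passages(
    passages: dict[str, set[str]],
    known: set[str],
    min_new: int = 1,
    max_new: int = 2,
) -> list[str]:
    """Return ids of passages whose count of unfamiliar concepts is in range.

    Passages with too few new concepts are too easy; passages with too many
    risk overloading working memory.
    """
    selected = []
    for passage_id, concepts in passages.items():
        n_new = len(concepts - known)  # concepts the pretest marks as unfamiliar
        if min_new <= n_new <= max_new:
            selected.append(passage_id)
    return selected


known = {"star", "mass", "light"}
passages = {
    "intro": {"star", "mass"},
    "winds": {"star", "stellar wind", "mass loss"},
    "corona": {"corona", "convective zone", "plasma", "interstellar medium"},
}
print(select_passages(passages, known))  # → ['winds']
```

Counting unfamiliar concepts ignores element interactivity, so this is only a first approximation of intrinsic load.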

Extraneous Cognitive Load

Extraneous cognitive load comes from information presented in an unnecessarily complicated way. It can take many forms, such as crowded presentation slides or a professor who goes off on an interesting but long-winded tangent.

When reading to learn, verbose sentences and meandering explanations force the student to make many micro-decisions about what is relevant, using up precious working memory resources. Readability indices are statistical metrics widely used by writers to check that their writing is at the right reading level for the target audience, and they serve as an indirect measure of the cognitive load on a reader. Typical inputs are the average sentence length (in words) and the average number of syllables per word. Readers can also use these metrics when selecting reading material. Even if we can read at a high reading level, clarity and simplicity are desirable while we are learning new concepts.
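As an illustration of how such an index combines these two averages, here is a rough implementation of the Flesch-Kincaid grade level, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The vowel-group syllable counter and the regex tokenization are crude heuristics assumed for this sketch; dedicated readability libraries handle tokenization and syllable counting more carefully.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic: one syllable per group of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent 'e' usually adds no syllable ("stone", "large").
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level of `text` (higher means harder to read)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


easy = "The cat sat. The dog ran."
hard = ("A large portion of the star's angular momentum is dissipated "
        "as a result of mass loss through the stellar wind.")
print(round(flesch_kincaid_grade(easy), 2))  # → -2.62
print(round(flesch_kincaid_grade(hard), 2))  # → 9.91
```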

Germane Cognitive Load

The mental processes of encoding new, relevant information and reorganizing the schemas in your long-term memory contribute to actual learning; the load they generate is called germane cognitive load. We make meaning out of information by acquiring schemas: a schema is a structure that organizes and classifies concepts and the relationships between them, based on how the information is used. New information might require us to modify existing schemas and to practice recalling the information in useful chunks when certain cues are triggered.

Since intrinsic, extraneous, and germane load are additive, the key to learning well is to pace your learning to manage intrinsic cognitive load and to reduce extraneous cognitive load. This frees up working memory resources for germane cognitive load.
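The additivity assumption can be written as a simple budget. The symbols are introduced here for illustration: L_I, L_E and L_G stand for intrinsic, extraneous and germane load, and C for working memory capacity.

```latex
L_I + L_E + L_G \le C
\quad\Longrightarrow\quad
L_G \le C - L_I - L_E
```

Lowering either L_I (through pacing) or L_E (through clearer material) raises the ceiling on L_G, the load that goes into actual learning.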

Optimizing Cognitive Load: selecting material based on pre-existing knowledge and capacity

Given a snapshot of the concepts a reader already knows, we can estimate the cognitive load generated by a sentence or paragraph. Based on the learning objective and the generated learning paths, material can then be selected so that the cognitive load is optimized. We also assess how suitable each text is for extending our current understanding.

Working memory model for simulation

As a student reads a sentence, its main concepts populate working memory. Depending on their working memory capacity, the student may be able to hold those concepts and process the relationships between them. A familiar concept (retrieved from long-term memory) takes up very little working memory space; an unfamiliar one takes up a larger portion. As a baseline, every word whose IDF (inverse document frequency) value, computed from the Reuters corpus, falls below a certain threshold is assumed to be familiar to the student. The threshold can be adjusted as a parameter.
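A minimal sketch of this familiarity baseline and the resulting sentence load, under stated assumptions: IDF is taken as log(N / df), a tiny toy corpus stands in for the Reuters corpus (which is available, for example, through NLTK’s `nltk.corpus.reuters` reader), and the per-concept costs `familiar_cost` and `unfamiliar_cost` are hypothetical values rather than the original model’s parameters.

```python
import math
from collections import Counter


def idf_table(corpus: list[list[str]]) -> dict[str, float]:
    """Inverse document frequency, log(N / df), for every term in `corpus`."""
    n_docs = len(corpus)
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def sentence_load(
    concepts: list[str],
    idf: dict[str, float],
    threshold: float,
    familiar_cost: float = 0.2,    # hypothetical: familiar concepts take little space
    unfamiliar_cost: float = 1.0,  # hypothetical: unfamiliar concepts take a full slot
) -> float:
    """Working memory load of one sentence under the IDF familiarity baseline."""
    load = 0.0
    for concept in concepts:
        # Common (low-IDF) terms are assumed familiar; terms absent from the
        # reference corpus are treated as maximally rare, hence unfamiliar.
        familiar = idf.get(concept, math.inf) < threshold
        load += familiar_cost if familiar else unfamiliar_cost
    return load


corpus = [
    ["star", "light", "sky"],
    ["star", "mass"],
    ["star", "light"],
    ["corona", "star"],
]
idf = idf_table(corpus)
print(sentence_load(["star", "light", "corona"], idf, threshold=1.0))
```

Here ‘star’ (in every document) and ‘light’ (in half of them) fall below the threshold and count as familiar, while ‘corona’ does not; raising the threshold models a reader with broader background knowledge.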

The model includes two ways for a concept to leave working memory: it is forgotten because it is underused, or it is pushed out because working memory is overloaded. If a concept is not used in making meaning and is not encountered again within a certain span of time, it is forgotten. On the other hand, if a sentence contains many unfamiliar concepts, concepts currently in working memory are ‘pushed out’ to make space for the new ones. Which concepts get pushed out is assumed to depend on how long ago each was last read, how familiar it is, and how many times it has been encountered while reading.
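The two exit mechanisms might be sketched as follows. This is an illustrative reading of the description above, not the original implementation: the `Slot` fields mirror the three factors named (recency, familiarity, encounter count), but the `retention_score` weighting, the time unit, and the parameter names `capacity` and `decay_span` are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Slot:
    concept: str
    size: float         # space occupied: small if familiar, large if not
    last_seen: int      # time step of the most recent encounter
    encounters: int     # times the concept has been read so far
    familiarity: float  # 0.0 (brand new) to 1.0 (fully familiar)


def retention_score(slot: Slot, now: int) -> float:
    """Hypothetical weighting: recent, familiar, often-seen concepts stay."""
    recency = 1.0 / (1 + now - slot.last_seen)
    return recency + slot.familiarity + math.log1p(slot.encounters)


def step(memory: list[Slot], incoming: list[Slot], now: int,
         capacity: float, decay_span: int) -> list[Slot]:
    """Advance working memory by one sentence whose concepts are `incoming`.

    The caller builds `incoming` with up-to-date slots (last_seen == now).
    """
    incoming_names = {s.concept for s in incoming}
    # 1. Forgetting: drop concepts not encountered within `decay_span` steps,
    #    plus stale copies of concepts that were just re-encountered.
    old = [s for s in memory
           if now - s.last_seen <= decay_span and s.concept not in incoming_names]
    # 2. Overload: push out the lowest-scoring old concepts until all fit.
    old.sort(key=lambda s: retention_score(s, now), reverse=True)
    used = sum(s.size for s in incoming)
    while old and used + sum(s.size for s in old) > capacity:
        old.pop()  # after the descending sort, the last slot scores lowest
    return incoming + old


memory = [
    Slot("star", 0.2, last_seen=0, encounters=5, familiarity=1.0),
    Slot("mass", 1.0, last_seen=1, encounters=1, familiarity=0.0),
    Slot("galaxy", 1.0, last_seen=0, encounters=1, familiarity=0.0),
]
incoming = [Slot("corona", 1.0, 3, 1, 0.0), Slot("plasma", 1.0, 3, 1, 0.0)]
result = step(memory, incoming, now=3, capacity=3.5, decay_span=10)
print([s.concept for s in result])  # → ['corona', 'plasma', 'star', 'mass']
```

The familiar, frequently seen ‘star’ survives the overload even though it was read early, while the rarely seen ‘galaxy’ is pushed out to make room for the new concepts.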

A concept is assumed to become ‘familiarized’ once it has been encountered a certain number of times while cognitive load is at an optimal level. Once familiarized, a concept occupies less space in working memory and reduces the cognitive load of learning other concepts that build on it. This forms the basis for building a personalized learning path.
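The familiarization rule can be sketched as a counter that only advances when a sentence’s load falls inside the optimal window. The window bounds `low` and `high`, the default `needed=3`, and the function name are assumptions for illustration, not values taken from the original model.

```python
from collections import Counter


def familiarize(
    sentence_concepts: list[str],
    load: float,
    known: set[str],
    counts: Counter,
    low: float,
    high: float,
    needed: int = 3,
) -> set[str]:
    """Record encounters and return concepts familiarized by this sentence.

    Updates `known` and `counts` in place. Encounters only count while the
    sentence's load is inside the optimal window [low, high].
    """
    newly_familiar: set[str] = set()
    if not low <= load <= high:
        return newly_familiar  # under- or overloaded: no familiarization
    for concept in sentence_concepts:
        if concept in known:
            continue
        counts[concept] += 1
        if counts[concept] >= needed:
            known.add(concept)
            newly_familiar.add(concept)
    return newly_familiar


known, counts = {"star"}, Counter()
for load in [2.5, 4.8, 2.5, 2.5]:  # the 4.8 sentence is overloaded
    gained = familiarize(["star", "corona"], load, known, counts, low=2.0, high=3.5)
print(gained, counts["corona"])  # → {'corona'} 3
```

The overloaded sentence does not count, so ‘corona’ needs a fourth sentence to reach three qualifying encounters.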

Parameter settings

The following are parameters used in the model:

Simulation and Results

We return to the graph of important relationships gathered for the concept ‘star’ from its Wikipedia article; a previous article outlines how these were obtained. Using the central concept ‘star’ as a starting point, we look at other concepts in relation to it. For the pair ‘star’ and ‘stellar wind’, text blurbs are first extracted from the article, and the working memory load is then simulated based on a set of concepts considered familiar to the student.
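The article does not spell out how blurbs are extracted, so the following is only one plausible sketch: split the text into sentences with a naive regular expression, and for every sentence that mentions the related concept, take it together with up to `context` preceding sentences, keeping the blurb if it also mentions the central concept. The function name, its parameters, and the plain substring matching are all hypothetical simplifications.

```python
import re


def extract_blurbs(text: str, central: str, related: str, context: int = 1) -> list[str]:
    """Collect short passages mentioning both `central` and `related`."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    blurbs: list[str] = []
    for i, sentence in enumerate(sentences):
        if related.lower() not in sentence.lower():
            continue
        # The matching sentence plus up to `context` preceding sentences.
        blurb = " ".join(sentences[max(0, i - context): i + 1])
        if central.lower() in blurb.lower() and blurb not in blurbs:
            blurbs.append(blurb)
    return blurbs


article = ("Stars are hot. The Sun is a star. A stellar wind flows outward from "
           "the star. Planets orbit the Sun.")
for blurb in extract_blurbs(article, "star", "stellar wind"):
    print(blurb)
```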

Let’s first look at a few sample blurbs:

Blurb 5:
"A large portion of the star's angular momentum is dissipated as a result of mass loss through the stellar wind."
Working memory load: 2.4, never overloaded.
Concepts familiarized: none
Unfamiliar concepts: angular momentum

Blurb 2:
"Since the lifespan of such stars is greater than the current age of the universe (13.8 billion years), no stars under about 0.85 M☉ are expected to have moved off the main sequence. Besides mass, the elements heavier than helium can play a significant role in the evolution of stars. Astronomers label all elements heavier than helium 'metals', and call the chemical concentration of these elements in a star, its metallicity. A star's metallicity can influence the time the star takes to burn its fuel, and controls the formation of its magnetic fields, which affects the strength of its stellar wind."
Working memory load: 4.0, overloaded many times.
Concepts familiarized: none (although metallicity comes close, and may become familiarized on further reading)
Unfamiliar concepts: main sequence, chemical concentration, metallicity, magnetic fields

Blurb 3:
"The existence of a corona appears to be dependent on a convective zone in the outer layers of the star. Despite its high temperature, and the corona emits very little light, due to its low gas density. The corona region of the Sun is normally only visible during a solar eclipse. From the corona, a stellar wind of plasma particles expands outward from the star, until it interacts with the interstellar medium."
Working memory load: 4.8, overloaded many times.
Concepts familiarized: none (corona was not familiarized because the cognitive load was higher than optimal)
Unfamiliar concepts: corona, convective zone, gas density, plasma particles, interstellar medium, outer layer

If all concepts are familiar to the student, most blurbs fall within the optimal cognitive load range (shown between the two horizontal black lines). If this were not the case, we would reject these blurbs as reading material for learning about stellar wind, and move on to another relationship, or perhaps another article. However, given the student’s current knowledge, most blurbs involving ‘star’ and ‘stellar wind’ impose too high a load to be understood.
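The accept/reject decision can be expressed as a small rule over the simulated blurb loads. In this sketch the window bounds `low` and `high` stand in for the two horizontal black lines, whose actual values are not given here; requiring at least three blurbs inside the window (`min_optimal`) is an assumption, and the function names are hypothetical.

```python
from collections import Counter


def classify(load: float, low: float, high: float) -> str:
    """Place a blurb's load below, inside, or above the optimal window."""
    if load < low:
        return "underload"
    if load > high:
        return "overload"
    return "optimal"


def assess(loads: list[float], low: float, high: float,
           min_optimal: int = 3) -> tuple[Counter, bool]:
    """Tally blurbs by zone and accept the material if enough are optimal."""
    tally = Counter(classify(x, low, high) for x in loads)
    return tally, tally["optimal"] >= min_optimal


tally, accepted = assess([2.4, 4.0, 4.8, 3.9, 1.2], low=2.0, high=3.5)
print(dict(tally), accepted)
# → {'optimal': 1, 'overload': 3, 'underload': 1} False
```

Running the same rule on loads simulated with every concept marked familiar tells us whether the text could ever be suitable; running it on the student’s actual knowledge tells us whether it is suitable now.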

How many concepts would the student need to learn first to bring the cognitive load down to around the optimal range for at least 3 of the blurbs (3 being the assumed familiarization count parameter)? The answer is 13.

By contrast, ‘nuclear fusion’ needs 17 concepts familiarized before 3 blurbs drop into the optimal cognitive load range, while ‘main sequence’ needs only 2.
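The article does not say how these counts were computed. One way to approximate such a count is a greedy search, sketched below under loudly stated assumptions: each blurb is reduced to a set of concepts, its load is a simple sum of per-concept costs (hypothetical `fam_cost` and `unfam_cost` values), and at each step the student ‘learns’ the unfamiliar concept that appears in the most still-overloaded blurbs. Greedy choice is a heuristic and need not find the true minimum.

```python
from collections import Counter


def blurb_load(concepts: set[str], known: set[str],
               fam_cost: float = 0.2, unfam_cost: float = 1.0) -> float:
    """Hypothetical load model: familiar concepts are cheap, unfamiliar ones costly."""
    return sum(fam_cost if c in known else unfam_cost for c in concepts)


def concepts_to_learn(blurbs: list[set[str]], known: set[str],
                      low: float, high: float, k: int = 3) -> list[str]:
    """Greedily pick concepts to learn until at least k blurbs sit in [low, high]."""
    known = set(known)  # work on a copy; the caller's set is left untouched
    learned: list[str] = []
    while sum(low <= blurb_load(b, known) <= high for b in blurbs) < k:
        # How often each unknown concept occurs in blurbs that are still
        # overloaded (sorted so that ties are broken deterministically).
        freq = Counter(c for b in blurbs if blurb_load(b, known) > high
                       for c in sorted(b - known))
        if not freq:
            break  # no overloaded blurb left to fix; return what we have
        concept, _ = freq.most_common(1)[0]
        known.add(concept)
        learned.append(concept)
    return learned


blurbs = [
    {"star", "stellar wind", "corona", "plasma"},
    {"star", "stellar wind", "corona", "metallicity"},
    {"star", "stellar wind", "corona"},
]
print(concepts_to_learn(blurbs, {"star"}, low=1.0, high=2.5))  # → ['corona']
```

The length of the returned list plays the role of the counts quoted above, and the list itself is a candidate ordering for a personalized learning path.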

Comparison with the Simple English Wikipedia star page

The material above gives the student few options. Adding more material increases the number of options, and readability indices offer a quick first-pass filter.

The table below shows the difference in readability metrics between the regular Wikipedia page and the Simple English Wikipedia page on stars, computed using the py-readability-metrics library. Simple English Wikipedia was created for children and adults learning English. By most metrics, the Simple English page is easier to read.

Looking at the relationship between ‘star’ and ‘supernova’:

The results for the regular star Wikipedia page:

Figure: Regular star Wikipedia page results for star → supernova

The results for the Simple English Wikipedia page for the same relationship are below:

Figure: Simple English Wikipedia page results for star → supernova

Conclusion

The working memory model shows how many blurbs, on average, fall below the optimal cognitive load window on the Simple English Wikipedia page, and how many fall above it on the regular star Wikipedia page. This simple working memory model provides an automated way to select or reject reading material for a student based on their individual background knowledge, saving a novice much time and effort. It also provides a basis for constructing a personalized learning path. Adjusting the memory capacity parameters for an individual student helps set an optimal learning pace.

References

Cowan, N. (2001). The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain Sciences, 24(1), 87–114. https://doi.org/10.1017/S0140525X01003922

Csikszentmihalyi, M., Abuhamdeh, S., & Nakamura, J. (2014). Flow. In M. Csikszentmihalyi (Ed.), Flow and the Foundations of Positive Psychology: The Collected Works of Mihaly Csikszentmihalyi (pp. 227–238). Springer Netherlands. https://doi.org/10.1007/978-94-017-9088-8_15

Klingberg, T. (2009). The overflowing brain: Information overload and the limits of working memory. Oxford University Press.

Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81–97. https://doi.org/10.1037/h0043158

O’Reilly, T., & McNamara, D. S. (2007). Reversing the Reverse Cohesion Effect: Good Texts Can Be Better for Strategic, High-Knowledge Readers. Discourse Processes, 43(2), 121–152. https://doi.org/10.1080/01638530709336895

Paas, F., Renkl, A., & Sweller, J. (2004). Cognitive Load Theory: Instructional Implications of the Interaction between Information Structures and Cognitive Architecture. Instructional Science, 32(1/2), 1–8.

Sweller, J., Ayres, P., & Kalyuga, S. (2011). Cognitive Load Theory. Springer Science & Business Media.