Research: Teaching to the test: Unraveling the consequences for student motivation

Title: Teaching to the test: Unraveling the consequences for student motivation
Authors: Joy Muth,Marko Lüftenegger

Paper summary

This research article examines the impact of “teaching to the test” (TTT), a practice focusing on test preparation, on secondary school students’ motivation in Austria. While commonly viewed negatively, this study, using longitudinal data and Expectancy-Value Theory, found that students perceived an increase in TTT closer to graduation. Contrary to expectations, this perceived increase was linked to positive changes in intrinsic motivation, perceived importance, and utility of the subject, suggesting TTT may not be inherently detrimental. The authors acknowledge limitations and call for further research to understand these complex relationships.

If teachers are to remember one thing from this study, it should be…

Teaching to the test (TTT), contrary to widespread concerns, might not have the detrimental effects on student motivation that many researchers and educators fear, and can even enhance some aspects of it when used strategically before exams.

Paper Deep Dive

Define any technical terms used in the paper

Here are definitions of some of the technical terms used in the paper:

p-value: A statistical measure used to determine if a result is statistically significant, indicating the probability of observing the data (or something more extreme) if the null hypothesis were true (e.g., if there were no real relationship or difference). Typically, a p-value below 0.05 is considered statistically significant.

Teaching to the Test (TTT): Defined as “a conglomerate of teaching practices especially focused on (excessive) test preparation”. It’s a term often negatively discussed in education and is frequently mentioned as a consequence of standardized testing. TTT practices can include familiarizing students with test formats and strategies (like how to fill in bubble sheets or practice under time pressure), repetitive exercising (using past exam questions), teachers referencing the test during lessons (explaining how new content will be tested), and providing test-like material for practice. While often seen as undesirable and suspected of negative consequences like making teaching monotonous, inhibiting meaningful learning, and killing student motivation, this study examines its actual effects on motivation.

Washback effects: Refers to the influence of tests, particularly standardized ones, on teaching practices and student outcomes.

Expectancy-Value Theory (EVT) / Situated Expectancy-Value Theory (SEVT): A theoretical framework used to investigate students’ motivational self-beliefs. The modern version (SEVT) explains motivation based on expectancy beliefs (perceived ability to succeed) and value beliefs (reasons for engaging in a task or subject). It posits that teachers’ instructional behavior influences student motivation through students’ interpretation of that behavior. The study uses SEVT to examine how TTT might affect student motivation.

Expectancy beliefs: The subjective perception of one’s ability to succeed in achievement-oriented tasks. In the context of school, this is often measured as self-efficacy.

Self-efficacy: The belief that a specific task can be successfully accomplished at a designated level. High self-efficacy is associated with high engagement, persistence, and academic performance. In this study, it was measured as students’ confidence in achieving certain passing grades on their next class test.

Value beliefs: The reasons students have for engaging with a task, domain, or school subject. The SEVT differentiates four types:

Intrinsic value: The sense of joy experienced when performing a task. This is considered a more intrinsic aspect of motivation.

Attainment value: The personal importance placed on doing well in a task or domain, and the personal relevance of the task/domain. This is also a more intrinsic aspect of motivation and is related to a student’s identity and self-schema.

Utility value: The perceived usefulness of engaging in a task or domain. This is considered a more extrinsic aspect of motivation.

Cost: The undesirable effort required to engage in a task or domain. This is another more extrinsic aspect of motivation.

Motivational development: Refers to how student motivation changes over time. The study longitudinally investigated this in the context of perceived TTT.

Longitudinal study/data: Research that collects data from the same group of participants at multiple points in time. This allows researchers to examine changes in variables over time.

Latent change score modeling (LCSM): A statistical method used in the study to analyze change in variables between measurement points. It allows researchers to examine how individuals change over time (intra-individual change) and how the amount of change differs between individuals (interindividual differences in change). In LCSM, change is represented as a latent factor.

Latent change: The change in a variable over time, which is treated as a variable that is not directly observed but inferred from the measured data at different time points.

Level-to-change path: A specific type of relationship examined in LCSM where the value of a variable at an earlier time point predicts the amount of change in an outcome variable over time. The study tested if initial perceived TTT predicted motivational change.

Change-to-change path: A relationship examined in LCSM where the amount of change in one variable predicts the amount of change in another variable over time. The study tested if the increase in perceived TTT predicted changes in motivation.
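
The latent change, level-to-change, and change-to-change ideas above can be written compactly as equations. The following is only a simplified sketch: the symbols are assumptions introduced here for illustration (x for perceived TTT, y for one motivational facet, subscripts 1 and 2 for the two waves, i for a student), and the model estimated in the paper may include further paths and covariates.

```latex
\[
\begin{aligned}
x_{2i} &= x_{1i} + \Delta x_i \\
y_{2i} &= y_{1i} + \Delta y_i \\
\Delta y_i &= \alpha + \beta_{\text{level}}\, x_{1i} + \beta_{\text{change}}\, \Delta x_i + \zeta_i
\end{aligned}
\]
```

Here \Delta x_i and \Delta y_i are the latent changes, \beta_{\text{level}} corresponds to the level-to-change path, \beta_{\text{change}} corresponds to the change-to-change path, and \zeta_i is the residual change not explained by the predictors.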

Secondary education (Austria): In the context of Austria, this refers to schooling that starts after four years of primary school. The academic track, called “Gymnasium,” typically lasts eight years and concludes with the standardized school leaving examination.

Matura: The standardized school leaving examination taken at the end of upper secondary education (Gymnasium) in Austria. Passing the Matura allows students to apply to any university in Austria, giving it low to medium stakes.

Confirmatory factor analysis (CFA): A statistical method used to assess the underlying structure of measured variables (e.g., to confirm that items designed to measure a construct like “intrinsic motivation” actually group together). It helps determine the dimensionality of scales.

Composite reliability: A measure used to estimate the reliability of a scale (how consistently it measures a construct). It is denoted as “Omega” in the paper’s table.
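
As a rough illustration of how such a reliability coefficient can be computed, the sketch below applies the common omega formula for a one-factor model with standardized loadings and uncorrelated residuals. The function name and the loadings are hypothetical and are not taken from the paper, which may have computed the coefficient differently.

```python
def composite_reliability(loadings):
    """Omega for a one-factor model with standardized loadings.

    Assumes uncorrelated residuals, so each item's residual
    variance is 1 - loading**2.
    """
    total_loading = sum(loadings)
    residual_variance = sum(1 - loading ** 2 for loading in loadings)
    return total_loading ** 2 / (total_loading ** 2 + residual_variance)


# Hypothetical loadings for a three-item scale (not values from the paper).
omega = composite_reliability([0.7, 0.7, 0.7])
print(round(omega, 3))
```

Higher loadings leave less residual variance, so the coefficient moves closer to 1.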

Longitudinal measurement invariance: A statistical property indicating whether a measurement scale (like the one for intrinsic motivation) has the same meaning and structure across different time points when data is collected longitudinally. Establishing this is important for comparing scores over time.

Robust maximum likelihood estimator (MLR): An estimation method available in the analysis software (Mplus) that produces standard errors and test statistics robust to non-normally distributed data. It was used here in conjunction with a correction for the hierarchical structure of the data.

Hierarchical structure of the data: Refers to data organized in levels, such as students nested within classrooms or schools. The analysis controlled for this structure.

Full information maximum likelihood (FIML): A statistical technique used to handle missing data by using all available information in the model estimation process.

Goodness-of-fit indices (CFI, TLI, RMSEA, SRMR): Statistical measures used to evaluate how well the specified statistical model (e.g., CFA, LCSM) fits the observed data. Common indices include Comparative Fit Index (CFI), Tucker-Lewis Index (TLI), Root Mean Square Error of Approximation (RMSEA), and Standardized Root Mean Square Residual (SRMR). Guidelines exist for interpreting these values.
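
To make the chi-square-based indices more concrete, here is a minimal sketch of the textbook formulas for RMSEA, CFI, and TLI. The function names and input values are hypothetical and not taken from the paper. The sketch also ignores the scaling corrections that robust estimators apply, and some programs use N rather than N - 1 in the RMSEA denominator.

```python
import math


def rmsea(chi2, df, n):
    """Root Mean Square Error of Approximation (N - 1 convention)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative Fit Index relative to a baseline (independence) model."""
    model_misfit = max(chi2_model - df_model, 0.0)
    baseline_misfit = max(chi2_baseline - df_baseline, model_misfit)
    if baseline_misfit == 0:
        return 1.0
    return 1.0 - model_misfit / baseline_misfit


def tli(chi2_model, df_model, chi2_baseline, df_baseline):
    """Tucker-Lewis Index based on chi-square / df ratios."""
    ratio_baseline = chi2_baseline / df_baseline
    ratio_model = chi2_model / df_model
    return (ratio_baseline - ratio_model) / (ratio_baseline - 1.0)


# Hypothetical chi-square values for illustration only.
print(round(rmsea(100, 50, 501), 3))
print(round(cfi(100, 50, 1066, 66), 3))
print(round(tli(100, 50, 1066, 66), 3))
```

SRMR is omitted because it is computed from the residual correlation matrix rather than from the chi-square statistic.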

Item parceling: A technique used in structural equation modeling where multiple items are combined into fewer “parcels” to reduce the complexity of the model, used here for the TTT measure in the LCSMs.

Intraclass correlation (ICC): A statistic that indicates how much variability in a variable is due to the grouping structure (e.g., how much students’ scores within the same class are more similar to each other than to scores from students in different classes).
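
One common way to estimate this statistic is ICC(1) from a one-way analysis of variance. The sketch below assumes equally sized classes and uses made-up scores. The function name is hypothetical, and the paper may have estimated the ICC differently (for example, within its modeling software).

```python
def icc1(groups):
    """ICC(1) from a one-way ANOVA, assuming equally sized groups."""
    k = len(groups[0])  # students per class
    n_groups = len(groups)
    if any(len(group) != k for group in groups):
        raise ValueError("this sketch assumes equally sized groups")

    grand_mean = sum(sum(group) for group in groups) / (n_groups * k)
    group_means = [sum(group) / k for group in groups]

    # Mean squares between and within classes.
    ms_between = k * sum((m - grand_mean) ** 2 for m in group_means) / (n_groups - 1)
    ms_within = sum(
        (score - m) ** 2
        for group, m in zip(groups, group_means)
        for score in group
    ) / (n_groups * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Made-up scores for three classes of two students each.
print(round(icc1([[1, 2], [3, 4], [5, 6]]), 3))
```

A value near 1 means most of the variability lies between classes, while a value near 0 means classmates are no more alike than students from different classes.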

Latent means: The estimated average value of a latent variable (like a latent change score or a latent factor representing a construct like TTT) in the statistical model.

Standardized estimate (β): A standardized regression coefficient, which indicates the strength and direction of the relationship between a predictor and an outcome variable, expressed in standard deviation units. This allows for comparison of effect sizes across different variables.

Standard error (SE): A measure of the precision of a statistical estimate (e.g., the mean, a regression coefficient). A smaller standard error indicates a more precise estimate.
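
The estimate, its standard error, and the p-value are linked: dividing an estimate by its standard error gives a z statistic, from which a two-sided p-value follows under a normal approximation. The sketch below illustrates that relationship with hypothetical numbers. The function name is an assumption, and the values are not results from the paper.

```python
import math


def wald_test(estimate, standard_error):
    """Return the z statistic and two-sided p-value (normal approximation)."""
    z = estimate / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical estimate and standard error (not values from the paper).
z, p = wald_test(0.20, 0.05)
print(round(z, 2), p < 0.05)
```

With the same estimate, a smaller standard error gives a larger z statistic and therefore a smaller p-value.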

What does this paper add to the current field of research?

This paper adds to the current field of research in several key ways:

Focus on Student Perception: The study shifts the focus from teachers’ reports of using Teaching to the Test (TTT) to students’ perceptions of TTT practices. Previous research largely focused on teachers’ statements and experiences, leaving a gap in understanding the effects on the learners themselves. The authors explicitly state that surprisingly little is known about if and how students perceive their teachers applying TTT practices, and this study addresses that gap.
Investigation of the TTT-Motivation Link: To the authors’ knowledge, this was the first study to directly examine the association and direct effects of (perceived) TTT on student motivation. Despite widespread concerns and speculation that TTT negatively impacts motivation, empirical research on this specific link was previously lacking.
Longitudinal Examination of Change: The study is among the first to longitudinally investigate whether perceived TTT changes over time as a standardized exam approaches. It also examines how this change in perceived TTT is associated with changes in student motivation. Prior studies mainly adopted cross-sectional approaches, which couldn’t capture these nuances of change over time.
Challenging Common Negative Assumptions: Contrary to the prevailing negative view of TTT in the literature, the study provides empirical evidence that challenges the belief that TTT inherently harms motivation. The findings suggest that when perceived TTT increases closer to an exam, it can actually be associated with positive changes in aspects of motivation like intrinsic motivation, importance, and utility. This is a significant departure from the assumption that TTT “kills students’ motivation”.
Use of Advanced Statistical Methods: By employing Latent Change Score Modeling (LCSM) with longitudinal data, the study utilizes a method that effectively captures nuanced changes within individuals over time and interindividual differences in change, accounting for measurement error. This is a more sophisticated approach compared to the correlational or regression-based methods commonly used before.

In essence, this paper makes a novel contribution by being the first longitudinal study to specifically investigate secondary students’ perceptions of Teaching to the Test and its direct influence on their motivational development, providing initial evidence that challenges the long-held assumption that TTT is detrimental to student motivation, particularly in the context of preparing for an upcoming exam.

What are the characteristics of the participants in the study?

The participants in the study had the following characteristics:

Location: The study was conducted in Austria. The students were nearing the standardized school leaving examination (Matura) at the end of upper secondary education in Austria.

Total Sample Size: The final sample consisted of 1855 students.

Grade Level: Students were initially in grade 11 at the first measurement wave and were assessed again in grade 12 in the consecutive wave. The study specifically used data from these two waves.

Age: At wave one, participants were aged between 15 and 20 years, with a mean age (MAge) of 17.09 years and a standard deviation (SD) of 0.79.

Gender Distribution: The sample included 37% male, 60.9% female, and 1.1% diverse students.

School Type and Track: Participants attended Austrian general secondary schools of the highest track. This track is known as “Gymnasium” and typically lasts eight years after four years of primary school. The participating schools represent around 11% of the total population of such schools in Austria.

Subject of Focus: All measurements related to the subject “English as a foreign language” or the students’ English teacher.

Sampling: The study recruited students from 30 Gymnasium schools selected to be representative of the population in terms of achievement and context factors.

Exclusions: The final sample of 1855 students was derived from an initial cohort of 1935. Students who only participated in wave one and had missing information needed for controlling the hierarchical data structure (17 students) were excluded. Additionally, students who experienced a change in their base-class composition between grade 11 and grade 12 (63 students) were excluded to avoid confounding motivational changes.

What are the key implications for teachers in the classroom?

The study offers several key implications for teachers in the classroom, particularly regarding their use of Teaching to the Test (TTT) strategies:

Reduced Anxiety about Test Preparation: The findings suggest that teachers might not need to feel as anxious about using test preparation strategies as previously assumed. The study challenges the long-held, negative view that TTT is inherently harmful to student motivation.
TTT May Not Be Inherently Detrimental: Contrary to widespread concerns and the negative connotation often associated with TTT, the study provides empirical evidence that TTT, at least as perceived by students, does not necessarily have the detrimental effects that researchers and educators fear.
Potential Positive Motivational Effects (Especially When Increasing): The study found that when students perceived an increase in TTT practices from grade 11 to grade 12 (as the standardized exam approached), this was significantly associated with positive changes in certain aspects of their motivation. Specifically, it predicted positive changes (either a steeper increase or a slower decline) in:
- Intrinsic motivation (liking the subject).
- Importance (personal importance of doing well).
- Utility (perceived usefulness for daily life/future).
This suggests that TTT may potentially enhance student motivation, particularly when used strategically in preparation for an upcoming exam.
Use TTT with a “Clear Conscience”: Based on these findings, teachers may be able to use TTT practices with a clear conscience without worrying that it will instantly kill their students’ motivation.
Moderation is Key: While the findings are more positive than expected, the authors caution that this does not mean teachers should abandon other teaching methods. They suggest that, as in many areas, “the dose makes the poison” and teachers should strive to provide students with a variety of teaching methods, using TTT “in moderation”.
Don’t Abandon Student-Centered Teaching: The results do not imply that teachers should “throw student-centered teaching strategies overboard” when preparing for an exam.

It’s important to note that these implications are based on students’ perceptions of TTT, which might not perfectly align with teachers’ reported practices. However, the study emphasizes that students’ subjective perceptions of their psychological environment are crucial when examining factors affecting motivation. The study focuses on perceived TTT in the context of preparing for a standardized school leaving examination in English in Austrian general secondary schools of the highest track.

Why might teachers exercise caution before applying these findings in their classroom?

Teachers should exercise caution before broadly applying the findings of this study in their classrooms for several key reasons:

Cautious Interpretation Advocated by the Authors: The authors themselves explicitly state that their findings should be interpreted cautiously, especially concerning practical implications, because the criteria for evidence-based practice in education are high, and these results come from a single study.
Use TTT in Moderation: While the study suggests TTT might not be inherently detrimental to motivation, the authors do not recommend excessive use. They emphasize that “the dose makes the poison” and advise teachers to continue providing a variety of teaching methods, using TTT “in moderation”. This implies that overreliance on TTT could still have negative consequences not fully explored here.
Focus on Student Perception: The study examined students’ perceptions of TTT practices, not necessarily teachers’ actual reported use. While student perception is crucial for understanding motivation, there might be a difference between what students perceive and what teachers are consciously doing, or the quality of that practice.
Specific Context of the Study: The research was conducted in a very specific context:
- Subject: English as a foreign language.
- School Type: Austrian general secondary schools of the highest track (Gymnasium).
- Exam Stakes: The study participants were preparing for the standardized school leaving examination (Matura) in Austria, which is described as having low to medium stakes. The perceived value or stakes of the test can significantly influence how students react to test preparation strategies. Findings might differ for high-stakes exams in other contexts.
- Cultural/Educational Context: The study is set within the Austrian education system. The generalizability of these findings to other cultural contexts or educational systems with different testing cultures and teacher practices is not guaranteed.
Limitations in Measurement: The study notes limitations regarding the longitudinal measurement invariance for utility value and self-efficacy, and low reliability for the utility scale. This suggests that the observed effects for utility value and self-efficacy might be an “under-representation of the complex interplay” and potentially less robust findings.
Achievement Not Examined: The study focused on motivation and did not investigate the effects of TTT on students’ actual achievement in the final exam. While motivation is important, teachers are also often responsible for ensuring students achieve learning outcomes, including performance on tests.
Underlying Mechanisms Unclear: While the study found positive associations between increasing perceived TTT and certain motivational aspects, the authors note that the underlying psychological mechanisms explaining these links are not yet fully understood. It’s unclear precisely why students might feel more motivated by increasing TTT (e.g., is it purely because they see it as useful for a valued goal, or are other factors involved?).
Limited Number of Measurement Waves: The study used data from only two measurement waves over a relatively short period (end of grade 11 to beginning of grade 12). More waves could provide a deeper understanding of how perceived TTT and motivation interact over a more extended period.

In summary, while the study offers promising initial evidence that challenges the assumption that TTT is inherently bad for motivation, particularly when increasing before an exam, it is a first step in a previously under-researched area. The specific context, focus on perception, methodological limitations, and the need for further research into underlying mechanisms all warrant a cautious approach before translating these findings into widespread classroom practice. Teachers should feel less anxious about using some TTT but should not abandon a varied pedagogical approach based solely on these findings.

What is a single quote that summarises the key findings from the paper?

These findings challenge the belief that TTT inherently harms motivation, suggesting it may enhance some aspects of motivation when used strategically before exams.