Understanding the Adaptive Scoring Algorithm of the Digital SAT


In the name of science, we went to the source and tinkered with the new Digital SAT practice tests released by the College Board. Our goal was to find out exactly what’s going on under the hood to determine scoring on this new test. In this post, we’ll lay out the specific questions we asked and what our research revealed.

Before we get into the details, a little background on the general way in which the Digital SAT algorithm works is in order. The new Digital SAT is a section-adaptive test. There are two types of sections on the tests: Reading and Writing, and Math. You’ll see two R&W sections in a row, then the two Math sections. How well you do on the first of each type of section determines which of two second sections you’ll encounter. One is easier and is seen if you miss too many questions on the first section; the other is harder and is accessed by getting most of the questions on the first section right. The sections of each type have the same number of questions (27 R&W, 22 Math).
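The routing step described above can be sketched in a few lines of Python. This is only an illustration: the College Board does not publish the cutoff, so the `cutoff_fraction` value and the function name `route_second_section` are assumptions made up for this example.

```python
def route_second_section(num_correct, num_questions, cutoff_fraction=0.6):
    """Pick the second section from first-section performance.

    cutoff_fraction is a hypothetical placeholder, not a published value.
    """
    if not 0 <= num_correct <= num_questions:
        raise ValueError("num_correct must be between 0 and num_questions")
    share_correct = num_correct / num_questions
    return "harder" if share_correct >= cutoff_fraction else "easier"


# First sections have 27 (R&W) and 22 (Math) questions.
print(route_second_section(25, 27))  # strong first R&W section
print(route_second_section(6, 22))   # weak first Math section
```

The sketch captures only the either/or structure of the routing. Whether the real test counts raw correct answers or something more elaborate is not something our experiments can confirm.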

We repeated Practice Test 1 of the Digital SAT (available through the College Board’s Bluebook app) dozens of times, exploring how mistakes on different questions in different sections affect the resulting score.

There are a number of things that we (as tutors) and you (as test-takers or their parents or guardians) need to know about this format in order to prepare effectively. Below are the questions that we addressed with our research:

  1. Are questions of different difficulty within a section equally important to your score?
  2. If questions are weighted differently, are the harder questions “worth” more?
  3. Are questions on the first R&W (or Math) section more or less important than those on the second section?
  4. How does the number of questions missed affect scoring on R&W vs Math?

1. Are questions of different difficulty equally important to your score?

The first question we had was whether the difficulty of individual questions matters. In other words, does missing one Reading and Writing question affect your score just as much as missing any other question from the same section? Or are different questions weighted differently within the section?

Answer: Question difficulty matters!

For example, missing an easy Reading and Writing question can drop your score on that section by 30 points; missing a different, harder question on the same section drops the score by only 20 points. Conclusion: Some questions are worth more than others.

To get a sense of how much this matters in the aggregate, we then performed a more nuanced experiment. Using the second Math section as a testbed, we answered every question on the first Math section correctly in both trials. In the first trial, we got the first, easier half (#1-11) of the second Math section wrong and the last, harder half (#12-22) right. The second trial reversed the pattern, with #1-11 right and #12-22 wrong on the second section. Here are the results:

As the above demonstrates, when the number of right and wrong answers is held constant, which questions you get right and which you get wrong does affect your score, though the difference is perhaps not as large as one might expect. The WAY in which the score is affected is a little surprising (and complicated), as described below.
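For readers who want to reproduce the two trials, the answer patterns can be written down explicitly. The helper name `answer_pattern` is our own invention for this sketch; it simply records which of the 22 questions on the second Math section were answered correctly.

```python
def answer_pattern(num_questions, wrong):
    """Return one True/False per question; True means answered correctly.

    `wrong` holds the 1-based numbers of the questions missed on purpose.
    """
    return [q not in wrong for q in range(1, num_questions + 1)]


# Trial 1: miss the easier half (#1-11), get the harder half (#12-22) right.
trial_1 = answer_pattern(22, wrong=range(1, 12))
# Trial 2: the reverse pattern.
trial_2 = answer_pattern(22, wrong=range(12, 23))

# Both trials have the same number of correct answers.
print(sum(trial_1), sum(trial_2))
```

Because both patterns contain the same number of correct answers, any gap between the two resulting scores has to come from which questions were missed rather than how many.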

2. If questions are weighted differently, are the harder questions “worth” more?

Given that question difficulty does impact one’s score, we then sought to determine whether harder questions are “worth” more. (This is a belief that many people, including those in the test prep industry, seem to hold.)

Answer: Generally, no, harder questions are not “worth” more!

The above question is surprisingly difficult to answer! To begin with, the data from the second Math section experiment show that missing the easiest half of the questions and getting the hardest half correct actually results in a lower score than the reverse pattern. This will surprise many people who believe that the algorithm uses a simple weighting system in which harder questions are “worth” more. The same misunderstanding is common among test-takers of other adaptive tests such as the GMAT or GRE, and it leads people to spend a lot of time on the hardest questions, often at the expense of the easiest ones, even though such a strategy does not yield a higher score on those tests. We have some theories as to why missing the easiest questions can be more punishing to your score, based in part on our experience with the GMAT and GRE (which are both adaptive tests as well), but the truth is that the algorithm is complicated and resists simple interpretation, as the experiments below demonstrate.

To confirm that the above would hold true on Reading and Writing and across both the first and second sections of R&W and Math, we ran MANY more experiments, varying the number of easy and hard questions right and wrong and distributing these differences across both the first and second sections of the test. What we found is that the results vary depending on whether one misses easy or hard questions on the first section vs the second section, and that the results are also different on Math vs Reading and Writing. So in some cases getting harder questions right at the expense of easier ones does yield a higher score, but in many other cases the reverse is true.
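One mechanism that could make missed easy questions more costly is a scoring model that allows for lucky guesses. The toy sketch below assumes a three-parameter logistic model from item response theory with a 25% guessing floor; our experiments cannot confirm that the Digital SAT works this way, and the item difficulties, the function names, and the ability scale are all invented for illustration. It illustrates one possible effect only and does not reproduce the mixed results we saw across sections.

```python
import math


def p_correct(theta, difficulty, guess=0.25):
    """Chance of a correct answer: a guessing floor plus a logistic curve."""
    return guess + (1 - guess) / (1 + math.exp(-(theta - difficulty)))


def estimate_ability(responses, difficulties):
    """Posterior-mean ability on a coarse grid with a standard normal prior."""
    grid = [i / 10 for i in range(-40, 41)]
    weights = []
    for theta in grid:
        weight = math.exp(-theta ** 2 / 2)  # unnormalized prior
        for correct, difficulty in zip(responses, difficulties):
            p = p_correct(theta, difficulty)
            weight *= p if correct else 1 - p
        weights.append(weight)
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)


# 22 hypothetical items ordered from easiest to hardest.
difficulties = [-2.5 + 5 * i / 21 for i in range(22)]
miss_easy_half = [False] * 11 + [True] * 11
miss_hard_half = [True] * 11 + [False] * 11

print(estimate_ability(miss_easy_half, difficulties))
print(estimate_ability(miss_hard_half, difficulties))
```

In this toy model, correct answers on hard questions are partly discounted as possible guesses, while wrong answers on easy questions count as strong evidence of lower ability, so the pattern that misses the easy half receives the lower estimate even though both patterns have 11 correct answers.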

Because of the ambiguity and confusion of the above, we’d like to try to clarify and simplify some of the takeaways…

First, the score differences are actually pretty minor. Even switching out the easiest half of the Math questions on the second section for the hardest half of the questions only results in a 30-point score difference. And in many of our experiments in which 5 easy questions were substituted for 5 hard questions, the difference was only 10 points.

Second, and more importantly, it should be noted that the above experiments are a little unrealistic, so one should consider what would be likely to actually happen to a test-taker on the SAT. For example, it’s obviously very unlikely that someone who would get the easiest half of the questions right but miss the hardest half could, by spending more time on the harder questions, get the hardest half right. Obviously, the harder questions are harder, so people are more likely to miss them no matter how much time they spend on them.

Key Takeaway: Given the above, test-takers should NOT rush through the easiest questions and risk making mistakes just to get to the hardest questions! Getting the hardest questions right at the expense of the easiest ones often does not result in a better score. More importantly, test-takers are more likely to miss those harder questions anyway, so rushing through the easiest ones and risking making careless mistakes only to then get many of the hard questions wrong anyway is a pretty foolish strategy.

3. Are questions on the first R&W (or Math) section more or less important than those on the second section?

The next thing we wanted to know was the relative importance of the two sections within each type. For example, could a poor performance on the first Math section be made up for by doing well in the second Math section? If you do badly enough to see the easier second section, how much does that bring down your maximum possible score (even if you go on to get everything right on the second section)?

Answer: The first section is more important!

The way we probed this was as follows. We first got everything wrong in the first section and then right in the (easier) second section. Then we repeated the test, getting everything right in the first section and missing all the questions in the second section. Since the two sections of each type are the same length, each case represented the same number of total questions missed (50% of the total). Here are the results for Math:

There’s a real 20-point difference. That’s significant, but not as bad as we might have thought. One wonders whether it’s the same for Reading and Writing, which brings us to our next question.
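As a quick check on the design of this experiment, the arithmetic behind “50% of the total” can be spelled out. The section lengths come from the test format described earlier; the function name `fraction_missed` is just a label for this sketch.

```python
SECTION_LENGTHS = {"Reading and Writing": 27, "Math": 22}


def fraction_missed(first_missed, second_missed, section_length):
    """Share of all questions of one type (both sections) answered wrong."""
    return (first_missed + second_missed) / (2 * section_length)


for name, length in SECTION_LENGTHS.items():
    first_wrong = fraction_missed(length, 0, length)   # all of section 1 missed
    second_wrong = fraction_missed(0, length, length)  # all of section 2 missed
    print(name, first_wrong, second_wrong)
```

Both scenarios miss exactly half of the questions, so the score gap between them reflects where the misses fall, not how many there are.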

4. How does scoring compare between the R&W sections and the Math sections? Is missing questions more damaging on R&W or Math?

We wanted to know if the scoring is more or less the same for the different types of sections. For example, is missing a single question a bigger deal for your score on one section (R&W vs Math) than the other?

Answer: Missing 1 R&W question has a much bigger effect than missing 1 Math question!

Math is scored more generously: missing up to two Math questions, even early/easy ones, was only observed to drop the Math score by 10 points. Meanwhile, as mentioned above, a single incorrect response to a Reading and Writing question was seen to cause a 20- or 30-point drop in one’s score. That’s worth keeping in mind!

As a further wrinkle, we were interested in repeating the “1st right, 2nd wrong” / “1st wrong, 2nd right” experiment described above for the Reading and Writing sections. Not only do the scores differ between Math and R&W, but the spread in scores is also larger for R&W. The table below repeats the scores for Math shown above alongside the scores for R&W obtained in this way.

In the table above, “1st Right 2nd Wrong” means all questions on the first of the relevant kind of section were answered correctly, while all the questions on the second section were answered incorrectly. And so on. This suggests that the relative importance of the first section compared to the second is even greater for Reading and Writing than for Math.

One important thing to note about the relative significance of the first section: because we have many years of experience coaching test-takers for the GRE, which is section-adaptive in the same way that the SAT is, we know that stressing the relative importance of the first section is not necessarily helpful. Telling test-takers that they need to do well on the first section in order to bump up into the harder section is just not actionable advice and probably only serves to put more pressure on people.

Key Takeaway: It’s worth understanding that it’s important to be focused during the first sections of the R&W and Math so that you can bring your A-game, but it’s not helpful to stress unnecessarily about them.

Conclusions

The above information is intended to shine a light on how the scoring of the Digital SAT works and to help clarify some of the points of misunderstanding out there. Clearly, the first sections are more important and clearly the “curve” is more punishing on R&W than on Math. And question difficulty certainly seems to matter, but in unpredictable ways. That said, it’s definitely NOT the case that the harder questions are simply “worth” more.

It’s helpful to remember in all of this that adaptive tests are generally more accurate than non-adaptive ones because they can really assess how a person performs at different levels of difficulty, and there are pretty predictable patterns for what happens to test-takers as they encounter harder and harder questions, patterns that adaptive tests leverage in their scoring algorithms. However, since many of our trials replicated very unlikely scenarios that violate many of these patterns, the results may have made the scoring algorithm appear to be a little wacky or confusing.

In the real-world application of the test, however, the Digital SAT will draw on these predictable patterns to arrive at what, in most cases, is likely to be a very accurate score. Years of working with students on the GMAT and GRE (again, both adaptive tests) have taught us that these adaptive algorithms tend to be very accurate in predicting a test-taker’s level and in producing an accurate score. So test-takers, parents, and educators should be fairly confident that the Digital SAT will produce scores that are accurate in most cases and probably more accurate than those of the paper-and-pencil SAT and ACT.

(Readers can easily explore the new Digital SAT for themselves by creating a student account with the College Board and getting free access to the practice tests through the Bluebook app. As a friendly tip, we had to sign out and then back in again after each practice test to see the new score. You can take each test an unlimited number of times; newer scores appear to the left of older scores.)