Jonathan's Junkyard: A tale of two bell curves

I realised that most of my readers are students looking for information about undergraduate life. The best-performing posts on this blog are those about academics. So here's one more that should be useful to you if you're a student or soon-to-be student. It's about something quite familiar yet mysterious: the bell curve.

Most Singaporean students would have heard of the bell curve. It's a grading system that is applied at major national exams like the Primary School Leaving Examination (PSLE) and General Certificate of Education (GCE) exams. The majority of courses at the local autonomous universities grade students on a curve too.

This has led to generations of students building a cult of mystique around the bell curve, developing time-honoured traditions like praying to the bell curve god in the hope of getting good grades. But the bell curve isn't a supernatural force. Of course it isn't: it was developed by humans like you and me. So there's no need to be overawed by its supposed power. By understanding what the bell curve is and the mechanism behind it, you will be better able to manage your anxieties surrounding it.

A few years ago, the National University of Singapore (NUS) Provost Office published a blog post attempting to demystify the bell curve grading process. The post has since become a much-cited resource when the bell curve is being discussed. You can read it here.

I want to emphasise a few key takeaways from the Provost's post.

The most important one is that the term "bell curve" refers to the shape that occurs naturally when you graph the scores. All it means is that many students score marks in the middle of the range, and few students score very high or very low marks.

Credit: Corporate Finance Institute.
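If you'd like to see the shape for yourself, here is a minimal Python sketch that tallies a list of scores into 10-mark bands and prints a text bar chart. The scores, the `histogram` helper, and the band width are all made up for illustration; they are not real module data.

```python
from collections import Counter

# Made-up total scores for a small hypothetical class (not real data).
SCORES = [13, 22, 25, 28, 31, 32, 33, 34, 35, 36, 37, 38, 41, 43, 44, 47, 54]
BIN_WIDTH = 10  # each bar covers a 10-mark band, an arbitrary choice


def histogram(scores, bin_width=BIN_WIDTH):
    """Return {band_start: number_of_scores} with bands in ascending order."""
    counts = Counter((score // bin_width) * bin_width for score in scores)
    return dict(sorted(counts.items()))


# One row of '#' marks per band; longer rows mean more students.
for start, count in histogram(SCORES).items():
    print(f"{start}-{start + BIN_WIDTH - 1}: {'#' * count}")
```

With these invented numbers, the longest row is the 30-39 band and the rows get shorter towards both ends, which is the "many in the middle, few at the extremes" pattern described above.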

The bell curve is a statistical phenomenon that occurs in almost every facet of everyday life. It was not invented for the purpose of grading students. Rather, it is a pattern that was discovered to hold true time and again across a wide variety of scenarios. If you graph the height of humans or the number of goals scored by each player in a soccer league season in a similar way, you will likely find that the shape of your graph is a bell curve.

One condition that must be fulfilled is that your sample size must be large enough. Statisticians commonly use a rule of thumb of n = 30. In plain English, that means that if you want to see a bell curve when you graph the number of goals scored by each player in a soccer league season, you should have data from at least 30 players. Otherwise, your graph may not have the bell curve shape.

In NUS, this guideline means that for modules where the enrolment is fewer than 30 students, the bell curve grading system is usually not applied and the professor awards grades based on absolute performance or their discretion.

So to repeat myself, the bell curve occurs naturally. It is not something that the professors force onto the scores. They don't do anything to the raw scores. They don't need to, because the raw scores fall into the bell curve shape on their own.

What, then, do the professors do in the bell curve grading system? To help me explain, let me bring in an example from my recent semester results.

What's in a B+?

I was given a grade of B+ for the modules JS1101E Introduction to Japanese Studies and NM2203 Social Media in Communication Management.

For NM2203, I answered all of the questions during the final exam in what I thought was a comprehensive manner. I felt confident that I knew the content well. For the assignments, I handed everything in on time and all my submissions were complete and immaculate. Overall, it was an easy module.

In contrast, JS1101E was hard. The big essay assignment was an academic paper about some aspect of Japan, and I struggled. The final exam, worth 50% of the total grade, was the most diabolical multiple-choice question (MCQ) paper I have seen in my entire student life. I definitely knew the answers to a paltry 20% of the questions at best, had a good feeling about my intuitions for another 30% or so, and resorted to blind guessing for the remaining half of the paper.

I got the same grade for both modules despite having very different experiences with them, and that can be explained by the bell curve.

Let's think about each module separately.

Here is a hypothetical diagram showing the distribution of scores for JS1101E. Each icon of a person represents one student, and there are 64 of them in total arranged into a perfect bell curve shape. Hey, fictionalised textbook illustrations are always perfect, right?

In this diagram, we assume that the highest total score for the module happened to be 54 and the lowest was 13. Most people scored in the middle of that range, maybe around the 30s. Like I said, it was a difficult module. The lecturer, Dr Scot Hislop, said so himself. He often proudly told the story of an exchange student from America who said to him after taking the final exam a few semesters ago: "That was the hardest f*cking exam I've ever taken."

What does the professor do to apply the bell curve grading system? Basically, he draws grade boundaries. Look at the following diagram to see an example of this.

Essentially, grade boundaries are imaginary lines denoting the scores within which one must fall to get a certain grade. In our fictional JS1101E, the grade boundary for B+ is 34 to 38 and 6 students fell into this range. This means that if I were to have taken JS1101E in our fictional world, I would have scored between 34 and 38 marks in total, falling into the blue area in the diagram above.

In other words, even though I scored less than half of the total marks available, I still got a pretty decent grade. Although the exact figures I used in this illustration are made up, the mechanism is true to real life and it is entirely plausible that I could indeed have failed in terms of absolute score but passed thanks to the bell curve. That's why it's often said that the bell curve takes into account cohort performance when determining grades. Even if the entire cohort scores less than 50%, not everyone will fail. The top scorers will still be duly rewarded with the coveted A+, those slightly further down will be awarded A's and A-'s, and so on. This is the situation that people sometimes describe as "the bell curve shifting to the left".

Contrast this with NM2203, an easy module. Everyone loves easy modules and everyone does well in absolute terms. The raw scores are high, and as a result, "the bell curve shifts to the right". It literally does. Compare the diagram below with those above and you'll see that all the little people are now gathered on the right side rather than the left.

In our fictional NM2203, let's say the scores range between 68 and 91, with most people scoring in the high 70s and low 80s. The professor might draw the grade boundaries as follows.

To get B+ in our fictional world, I must have scored between 82 and 84, as demarcated by the blue box.

In other words, even though my final grades for the two modules were the same, they mean quite different things. I had to score a lot higher to qualify for a B+ in NM2203, and would have had to score higher still to get an A. Moral of the story: Easy modules aren't necessarily easy to excel in! It's that much harder to be outstanding when everyone else is doing the same.
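To make the comparison concrete, here is a small Python sketch of how a score gets turned into a grade once the boundaries are drawn. Only the two B+ ranges (34 to 38 and 82 to 84) come from my fictional examples; the A- and B rows, and the `grade_for` helper itself, are invented here just so the lookup has something to run against.

```python
# Grade boundaries as (grade, lowest score, highest score), inclusive.
# Only the B+ rows match the fictional examples in this post; the A- and B
# rows are hypothetical fillers added for illustration.
JS1101E = [("A-", 39, 43), ("B+", 34, 38), ("B", 29, 33)]
NM2203 = [("A-", 85, 86), ("B+", 82, 84), ("B", 80, 81)]


def grade_for(score, boundaries):
    """Return the grade whose range contains `score`, or None if none does."""
    for grade, low, high in boundaries:
        if low <= score <= high:
            return grade
    return None


print(grade_for(36, JS1101E))  # B+
print(grade_for(83, NM2203))   # B+
print(grade_for(36, NM2203))   # None: 36 is below every NM2203 range listed
```

The same letter comes out of both lookups, but the lowest score that earns it is 34 in one module and 82 in the other.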

One thing to note: In my hypothetical examples, I only drew 64 students and arranged them in perfect symmetry. This caused the grade boundaries for A+ and D to contain only one student each. In reality, multiple students may get A+ and D, especially for modules with more than 100 students.

How are grade boundaries determined?

So we've seen that when professors "curve the grades", what they do is decide where the grade boundaries go, and then look at who falls within the various categories and give out the corresponding grades.

But what is the art or science behind that decision? The truth is: there isn't a strict procedure.

In NUS, there is a recommended grade distribution that advises professors how many A's, B's, C's, and D's they should be giving out. The actual distribution is secret but there is a hypothetical one given in the blog post by the Provost.

It's likely that professors adhere to this recommended distribution quite closely when drawing the grade boundaries. For example, if the recommended distribution says that no more than 25% of students shall get A-range grades (i.e., A+, A, and A-), and a professor has 200 students in his module, he might give A+'s to the top 10 scorers, A's to the next 15, and A-'s to the following 25 for a total of 50 students, or 25% of the total enrolment, with A-range grades.

So setting the grade boundaries for the module would simply be an exercise in ranking the students in order of their scores, using the grade distribution to award the grades in the manner demonstrated above, then noting the highest and lowest scores that were awarded a certain grade such as B-.

Note that in the example, more students are given A-'s than A's, and more students are given A's than A+'s. This follows the bell curve pattern: A+ is right at the far right side of the curve where there are fewer people, while A- is closer to the middle of the curve where the bulk of the people are massed.
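The ranking exercise can be sketched in a few lines of Python. This is only my guess at the mechanics, not an official NUS procedure: the `assign_by_quota` function is hypothetical, the 200 scores are made up (evenly spaced so that there are no ties), and the head counts of 10, 15, and 25 are the ones from my example above.

```python
def assign_by_quota(scores, quotas):
    """Rank scores from highest to lowest and hand out grades by head count.

    `quotas` is a list of (grade, number_of_students) pairs, best grade first.
    Returns {grade: (lowest_score, highest_score)} for each grade awarded.
    """
    ranked = sorted(scores, reverse=True)
    boundaries = {}
    start = 0
    for grade, count in quotas:
        group = ranked[start:start + count]
        if group:
            # The group is sorted high to low, so the last entry is the cutoff.
            boundaries[grade] = (group[-1], group[0])
        start += count
    return boundaries


# 200 made-up students scoring 0.5, 1.0, ..., 100.0 (distinct, so no ties).
scores = [i * 0.5 for i in range(1, 201)]
quotas = [("A+", 10), ("A", 15), ("A-", 25)]  # 50 students, 25% of 200

for grade, (low, high) in assign_by_quota(scores, quotas).items():
    print(f"{grade}: {low} to {high}")
```

Real scores will have ties, and a tie that straddles a cutoff is exactly the kind of situation where a professor would have to use judgement rather than a strict head count.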

But the recommended grade distribution is exactly that: a recommendation. It is not a hard-and-fast rule. Professors are free to use their judgement to adjust the grade boundaries accordingly, as long as they can justify their actions to the university management.

Press F to pay respects

A last point I wanted to make is about failing. This isn't in the Provost blog. It's something I've learned directly from lecturers and seniors, fleshed out by a bit of common sense.

You will notice that I didn't include F grades in any of my examples above. This was a deliberate decision. In NUS, it's almost impossible to get an F grade unless you're the kind of student who really deserves it: you don't turn up for class, you don't submit your assignments, and you don't turn up for exams. Lecturers have to consciously decide to give someone an F, and they are expected to produce documented justifications written in black and white when they do so. That means they have more work to do, which they will obviously try to avoid as far as possible.

So you need not be afraid of the letter F appearing on your transcript if you put effort into your studies. What you do need to fear is the D grade. If you don't have a good understanding of the module content, get help quickly because otherwise it's entirely plausible that a D grade is in your near future. Lecturers don't have to write justifications for D grades and it's left up to the bell curve to decide who gets the D's. That's right: someone, or a few someones, always gets D. Don't let yourself get on the wrong end of the bell curve, which is the left side when you draw it out as per the diagrams above, because then you'll be that someone.

And D's are treated like failures in NUS. When you apply the Satisfactory/Unsatisfactory (S/U) option to a D grade, it gets converted to a U grade, meaning Unsatisfactory, rather than an S for Satisfactory. This means the modular credits from the module you got a D in will not be counted towards your graduation requirements. Getting a D will also immediately disqualify you from most special programmes such as overseas exchanges.

3 comments:

DOE Query Department14 June 2019 at 17:50
Hi, can I ask for your permission to reproduce this article on our Singapore education portal http://www.domainofexperts.com? Explicit mention shall be made about it having first appeared on your site Jonathan's Junkyard, and Jonathan Tiong cited as the original author. Hope to hear from you again! 🙂
DOE Query Department15 June 2019 at 11:59
Most grateful to you for granting permission! :)

Monday, 10 June 2019