2011 A/L Results and the Z-Score method
January 19, 2012, 7:59 pm
By Dr. S. Arivalzahan
In recent days much attention has been focused on the Z-score calculation for the 2011 A/L results. This article examines the 2011 A/L Z-score calculation fiasco and the need for further research in developing a more appropriate scaling method.
Since 2000, the Z-score has been used as the scaling method for ranking students at the G.C.E. (A/L) examinations for university admission. The Z-score is considered a better scaling method than the aggregate of raw marks used previously, because it allows student performance to be compared across different subject combinations. For a particular subject, the Z-score is calculated using the formula Z = (raw marks - mean marks)/(standard deviation of marks).
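The formula can be illustrated with a short Python sketch. The marks below are hypothetical, and the use of the population standard deviation is an assumption; the article does not say which form the Department of Examinations uses.

```python
from statistics import mean, pstdev

def z_scores(raw_marks):
    """Return the Z-score of every mark in one subject."""
    subject_mean = mean(raw_marks)
    # Assumption: population standard deviation (divide by N).
    subject_sd = pstdev(raw_marks)
    return [(mark - subject_mean) / subject_sd for mark in raw_marks]

# Hypothetical raw marks for one subject.
marks = [35, 50, 65, 80]
zs = z_scores(marks)

for mark, z in zip(marks, zs):
    print(f"raw mark {mark:3d} -> Z = {z:+.3f}")
```

Whatever the raw marks are, the resulting Z-scores have mean 0 and standard deviation 1, which is what makes marks from different subjects comparable.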
In the above formula, mean is a measure of location and standard deviation is a measure of dispersion.
In 2011, two different G.C.E. (A/L) examinations were conducted, one for the old syllabus and one for the new. Repeat candidates sat for the old syllabus examination, while fresh candidates sat for the new one. Consequently, for a particular subject, the Department of Examinations had two different sets of marks, one for each syllabus. Thus, when the Z-score had to be calculated to rank both groups of candidates on a single list and find a common cut-off for university admission, the Department of Examinations was in a dilemma.
The interesting point is that the Z-Score was introduced in 2000 by Prof. R.O. Thattil precisely as a tool to solve such a problem. It should therefore not have been a problem for the Department of Examinations. For a particular subject, they should have treated the two sets of marks separately and calculated the Z-score for each examination separately. Then, as usual, the average of the Z-scores of a student's three subjects could have been used for ranking.
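The separate calculation described above can be sketched as follows. All marks, subject names and the helper names `z_score` and `average_z` are hypothetical, chosen only to illustrate the idea; the population standard deviation is again an assumption.

```python
from statistics import mean, pstdev

# Hypothetical marks, grouped by examination and then by subject.
MARKS = {
    "old": {"Physics": [40, 50, 60], "Chemistry": [30, 50, 70], "Maths": [45, 55, 65]},
    "new": {"Physics": [50, 65, 80], "Chemistry": [40, 60, 80], "Maths": [55, 70, 85]},
}

def z_score(mark, cohort):
    """Z-score of one mark relative to the cohort that sat the same paper."""
    return (mark - mean(cohort)) / pstdev(cohort)

def average_z(exam, student_marks):
    """Average Z-score over a student's subjects, using only that exam's marks."""
    return mean([z_score(m, MARKS[exam][subject]) for subject, m in student_marks.items()])

# A repeat candidate (old syllabus) and a fresh candidate (new syllabus).
repeat_candidate = average_z("old", {"Physics": 60, "Chemistry": 70, "Maths": 65})
fresh_candidate = average_z("new", {"Physics": 80, "Chemistry": 80, "Maths": 85})

print(f"repeat candidate: {repeat_candidate:.4f}")
print(f"fresh candidate:  {fresh_candidate:.4f}")
```

In this made-up data the two candidates have different raw marks but hold the same relative position within their own examinations, so their average Z-scores are equal and both can be placed on a single ranking list.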
An argument that has been put forward against this is that the population of repeat candidates is smaller than the population of fresh candidates, and that the repeat candidates are filtered students (those who did not qualify in one or more previous A/L examinations). Treating the repeat examination marks separately might therefore give an undue advantage to repeat candidates. The same problem will arise at the G.C.E. (A/L) 2012 examination too, since a set of candidates will again sit as repeat candidates, and their number will again be much smaller.
Though the Department of Examinations has not yet revealed the method it used to calculate the Z-score at the last A/L examination, Prof. Thattil has mentioned in his article that, for the 2011 A/L examinations, the means and variances of the two sets of examination marks were pooled to calculate the Z-Score for a particular subject. In his article he has given the equations that were used to obtain the pooled mean and variance.
Let us consider the same pooling problem in a more familiar setting. Suppose a person (say A) has 80 Canadian dollars and 70 British pounds, and another person (say B) has 75 Australian dollars and 65 Euros, and we want to compare the wealth of A and B. To measure A's wealth, we would normally convert the Canadian dollars to US dollars and the British pounds to US dollars separately, and then add the two amounts. Would we instead add the number of Canadian dollars to the number of British pounds and convert the total to US dollars using an average of the two exchange rates? Everyone knows that such pooling is wrong. Similarly, the marks of two different examinations should be regarded as coming from two different populations. It is therefore invalid to pool the parameters of two different examinations for the calculation of the Z-Score. Prof. Thattil, in his recent article, clearly illustrated this problem with a numerical example.
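The effect of pooling can be seen in a small sketch. The equations used by the Department are not reproduced in this article, so the code below assumes the textbook formulas for the combined mean and variance of two groups; the candidate numbers, means and variances are hypothetical.

```python
from math import sqrt

def pooled_mean_var(n1, m1, v1, n2, m2, v2):
    """Combined mean and variance of two groups (textbook formulas, assumed here)."""
    n = n1 + n2
    m = (n1 * m1 + n2 * m2) / n
    v = (n1 * (v1 + (m1 - m) ** 2) + n2 * (v2 + (m2 - m) ** 2)) / n
    return m, v

# Hypothetical figures for one subject: number of candidates, mean, variance.
old = {"n": 1000, "mean": 40.0, "var": 100.0}   # repeat candidates
new = {"n": 4000, "mean": 60.0, "var": 225.0}   # fresh candidates

pooled_mean, pooled_var = pooled_mean_var(
    old["n"], old["mean"], old["var"], new["n"], new["mean"], new["var"]
)
pooled_sd = sqrt(pooled_var)

# Two candidates, each scoring exactly the mean of their own examination.
separate_old = (40.0 - old["mean"]) / sqrt(old["var"])
separate_new = (60.0 - new["mean"]) / sqrt(new["var"])
pooled_old = (40.0 - pooled_mean) / pooled_sd
pooled_new = (60.0 - pooled_mean) / pooled_sd

print(f"separate: old {separate_old:+.3f}, new {separate_new:+.3f}")
print(f"pooled:   old {pooled_old:+.3f}, new {pooled_new:+.3f}")
```

With separate calculation both candidates receive a Z-score of 0, since each is average within their own examination. With pooled parameters the same two candidates receive different scores, purely because the two papers had different means.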
Therefore, if the Department of Examinations wants to use the Z-Score as a scaling method, it should not pool the means and variances of the different examinations. If the Department feels it is appropriate to pool the means and variances of the different examinations, it should use some other scaling method (not the Z-Score) for ranking purposes.
There is no perfect scaling method available and Z-Score is a widely accepted scaling method. However, there might be some drawbacks in the Z-Score method. Therefore, further research is needed in finding a better scaling method. Let us examine this in detail.
For the calculation of the Z-score, we do not need to assume any particular probability distribution for the raw marks of a subject. The mean is a good measure of location and the standard deviation is a good measure of dispersion for symmetric distributions. For skewed (non-symmetric) distributions, however, the mean is no longer a good measure of location, nor is the standard deviation a good measure of dispersion. We therefore have to be careful in using the Z-score for scaling when the raw marks follow a non-symmetric distribution.
For non-symmetric distributions, the median (which is the 50th percentile) is a better measure of location than the mean, and the Inter Quartile Deviation (IQD) is a better measure of dispersion than the standard deviation. The Inter Quartile Deviation is half the difference between the 75th and 25th percentiles.
We could define a new scaling method, the Median Centered Score (MCS), as MCS = (raw marks - median marks)/(IQD of the marks). The MCS is robust to extreme values, as the median and the IQD are less sensitive to extreme values than the mean and the standard deviation respectively. However, the MCS is yet to be validated on real-world data. Moreover, further research is needed to develop a scaling method for non-symmetric distributions.
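A minimal sketch of the MCS, compared with the Z-score, is given below. The marks are hypothetical, and the way the 25th and 75th percentiles are estimated (the "inclusive" method of Python's `statistics.quantiles`) is an assumption, as the article does not fix a particular percentile convention.

```python
from statistics import mean, median, pstdev, quantiles

def inter_quartile_deviation(marks):
    """Half the difference between the 75th and 25th percentiles."""
    q1, _, q3 = quantiles(marks, n=4, method="inclusive")
    return (q3 - q1) / 2

def mcs(mark, marks):
    """Median Centered Score of one mark within a set of marks."""
    return (mark - median(marks)) / inter_quartile_deviation(marks)

def z_score(mark, marks):
    """Ordinary Z-score, for comparison (population standard deviation assumed)."""
    return (mark - mean(marks)) / pstdev(marks)

# Hypothetical marks; the second set differs only in one extreme value.
marks = [20, 30, 40, 50, 60, 70, 80]
marks_with_extreme = [20, 30, 40, 50, 60, 70, 100]

for label, data in [("original", marks), ("with extreme value", marks_with_extreme)]:
    print(f"{label}: Z(60) = {z_score(60, data):.3f}, MCS(60) = {mcs(60, data):.3f}")
```

In this example, changing the top mark from 80 to 100 alters the Z-score of a candidate who scored 60, while that candidate's MCS stays the same, because neither the median nor the quartiles move.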
The Writer is President of the Jaffna University Science Teachers’ Association