Gender in assessment

Hilary Whitehouse argues that, if we look closely at the way subjects are written and assessed, we will see that there is no such thing as neutrality. The research shows that assessment itself is gendered.

About the author
Hilary Whitehouse is a science and environmental studies teacher and a research officer for gender equity projects with the Senior Secondary Assessment Board of South Australia. She lives in Townsville.

Gender in assessment

Assessment is the tail that wags the dog of curriculum.
~ Teacher comment recorded by Vieth (1991)

Jan Harding (1991) has written that the interaction of assessment and gender is one of the most important and neglected areas of education. No gender analysis of curriculum can be considered complete without a corresponding investigation of the impact of assessment.

This is particularly so in the senior secondary school, where the assessment requirements of a course syllabus can be of overriding concern for teachers and students (Vieth 1991). Research interest in the performance of the two sexes in different types and constructions of assessment is growing in Australasia. The subject of this article is the research into gender and public assessment at Year 12 level conducted by the Senior Secondary Assessment Board of South Australia (SSABSA) between 1990 and 1993.

Background issues
SSABSA is a statutory authority responsible for overseeing curriculum and assessment practices in senior secondary education in South Australia. The board collects a large amount of data annually for the purposes of Year 12 assessment and certification. This database provided the basis for a quantitative analysis of student performance across a range of subject areas and assessment methods. The subjects investigated were Physics, Chemistry, Biology, Geology, Geography, English, Economics, Mathematics 1 and 2, and Mathematics IS, over six years from 1986 to 1991. Research focused on the performance of female and male students sitting for examinations and school assessments, which determined access to further education.

SSABSA also hosted the national Gender Equity in Senior Secondary School Assessment Project (colloquially known as ESSSA), which is funded by the federal Department of Employment, Education and Training. The ESSSA project explored the participation and performance of females and males in a number of selected subjects across all States in Australia (where data was available).

One outcome of both the South Australian and the national project was that the work highlighted how little we really know about the process of assessment, particularly within schools. Much work is needed if research is to more fully inform assessment policies within education systems.

Some of the South Australian research findings could have been predicted, but many were unexpected and challenged some prevailing mythologies. Assumptions which often underpin the traditional interpretation of data in education measurement were also challenged.

In much of the literature on sex differences in assessment item performance, when females are shown to do worse than males, it is considered a valid proposition to suggest that females are essentially less able than males to do the particular tasks (computational tasks in Mathematics, for example, or spatial tasks in Geography). In a number of past research papers, females have been regarded as ‘deficient’, rather than the assessment items as being problematic. (Specific examples are available from the author.) It is my observation that if males are shown to perform more poorly than females, male ‘deficiency’ is never suggested as contributing to that outcome; instead, the problem is always with the test items or assessment tasks.

The paradigms used to interpret education measurement studies on sex differences can be gendered and it always pays to scrutinise conclusions drawn from any study data. In education, many women and girls have no personal experience of being ‘less able’ than males and it is usually not women who suggest that lesser female ability is a valid interpretation of any data which shows a sex difference in assessment outcomes to the disadvantage of females. Underlying the South Australian research is a consistent philosophy that upholds the capacity of females to achieve. If Year 12 assessment cannot be seen to test the knowledge, skills and experiences of both sexes adequately, then the validity of the test item should be questioned, not the characteristics of the candidate population.

Data was subject to a number of statistical analyses. Statistical significance is different to educational significance: the latter is a matter for educational opinion, which is often divided. For example, the South Australian research showed that statistically significant differences in examination performance occur consistently within some subjects: boys did better than girls in Mathematics, Economics, Geography and Chemistry examinations; girls did better in English and, recently, Physics examinations. The reasons why such differences occur are not well understood and need to be further researched. Debate continues on the meaning of the examination findings, the nature of assessment at Year 12, and the assessed curriculum.

Stobart et al. (1992) point out that the way in which a subject is assessed is likely to permeate the way it is organised and taught. For students, assessment outcomes are a result of interactions between the given tasks, their experiences and their expectations. If these are different for large groups of females and males, then it is not unexpected that examinations assess the sexes unequally, particularly as most Year 12 curricula cannot presently be characterised as gender-inclusive (Collins 1990).

For teachers, assessment is really a matter of held beliefs as to what constitutes a subject and what should be asked of students in assessing their performance within that subject. It is not uncommon for assessors to be gender blind; that is, unaware that various assessment styles have different outcomes for females and males and that changing assessment methods can significantly change the pattern of results.

If assessment is to be equitable, then there has to be congruence between what the curriculum purports to be and what it is, as defined by the assessment demands made on students. Congruence needs to exist on many levels. If the curriculum places emphasis on process, then the major assessment should not primarily test recall. For women educators, congruence means that moves towards gender-inclusive curricula should include the development of gender-inclusive assessment strategies. What constitutes gender-inclusive assessment still needs to be defined. The function of research is to inform that process.

Research findings
In South Australia there are two types of Year 12 courses available: wholly school-assessed courses, known as SAS; and subjects in which the final assessment is composed of a school-assessed component and a public-examination component, known as PES. Thus it was possible to compare student performance in the different types of courses. With the exception of English, it was found that the female candidature of a subject in any one year always performed better in the school assessment than in the public examinations. Conversely, the male candidature always performed better in the examinations. Rennie and Parker (1991) noted a similar pattern in Western Australia.

In subjects such as Mathematics, Economics and Geography, where the shift between examination performance and school assessment was quite marked, it can be assumed from a reading of the syllabuses that the content assessed within the school setting and in the examination was very similar. What was it about those examinations that enhanced the performance outcomes for males as a group? Was the difference due to the context and construction of the assessment? If we assume that school-assessed marks are a realistic appraisal of student learning (and we have to assume this or else call into question the professionalism of the whole Year 12 teaching force), then why didn’t the examinations equally reflect that learning for both sexes?

Only in PES English do we find girls significantly outperforming boys in all aspects of Year 12 assessment, school-based and examination. In the SAS English subject, which had the largest enrolment in South Australia by 1991, the difference in grade distributions was highly significant, with chi-square values in excess of 300. Females consistently received double the proportion of A and B grades compared with males. The ESSSA project demonstrated that females do significantly better than males in Year 12 English in all States in Australia.
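The scale of a chi-square value like this can be illustrated with a short sketch. The grade counts below are hypothetical, chosen only to mimic the reported pattern of females receiving roughly double the proportion of A and B grades; they are not the SSABSA data.

```python
# Pearson chi-square test of independence for a grade-by-sex
# contingency table, using illustrative (not actual) counts.

grades = ["A", "B", "C", "D", "E"]
observed = {
    "female": [420, 780, 900, 350, 150],  # hypothetical grade counts
    "male":   [180, 360, 820, 500, 240],
}

def chi_square(table):
    """Pearson chi-square statistic for a dict of row -> counts."""
    rows = list(table.values())
    col_totals = [sum(col) for col in zip(*rows)]
    row_totals = [sum(row) for row in rows]
    grand = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(rows, row_totals):
        for obs, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand
            stat += (obs - expected) ** 2 / expected
    return stat

print(f"chi-square = {chi_square(observed):.1f}")  # 4 degrees of freedom
```

With two rows and five grade columns there are four degrees of freedom, for which the 0.1% critical value is about 18.5; a statistic in the hundreds therefore indicates a grade distribution that differs very strongly by sex.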

Why males as a group do so poorly in English compared with their female counterparts, and compared with their performances in other subjects, is a matter for speculation. It may be that what is assessed in English reflects more what females are expected to be good at: for instance, examining the relationships between concepts. It may be a constellation of factors not yet well identified. Much male failure in school assessment may be related to students not completing the required work. While much gender equity work has focused on girls’ participation and performance in mathematics and science, the nationally poorer performance of boys as a group in the educationally valued subject, English, may mean it’s time to turn some attention to the situation of boys in the arts and humanities.

Girls and boys do perform differently in different types of examination questions. Males consistently perform statistically significantly better than females on multiple-choice questions across a range of subjects. However, because multiple-choice items represent a considerable saving in marking effort, especially in subjects with large student enrolments, it can be questioned whether such a finding is in fact educationally significant for examination setters with budget limitations.

Teachers should be aware that girls find it more difficult than boys to isolate a ‘right’ answer. Boys, as a group, find it easier to judge a problem in isolation and tend to treat the context of an activity as idiosyncratic (Murphy 1989). Care needs to be taken in setting multiple-choice items to ensure that each response is distinguishable from all the others in all perceived contexts, remembering that it is likely the female students will be paying close attention to context effects and can be undecided on a choice of answers if the intention of the question or each answer is unclear, or both.

No sex differences were observed for short-response items on the basis of question type alone. Where significant differences were observed these were linked to context and content effects. The same findings are true for extended-response questions.

Many teachers believe that girls do better than boys on essay questions. The research showed that this is not true in all cases, and factors other than the question type influence outcomes in the essay sections of examinations.

Not all essay questions are alike, and models differ between subjects. In Biology, a subject with an enrolment of 65% females, boys did much better in the essays, which required a highly structured factual response uninhibited by the need to provide an introduction or conclusion; indeed, these elements were actively discouraged by the marking system. In contrast, in Physics, a subject with an enrolment of 30% females, girls significantly outperformed boys on the essays, which required the stylistic elements of an introduction, well-developed argument and conclusion. Marks were awarded for the logical discussion of ideas and appropriate selection of content. Questions in Physics were more open-ended, in many ways resembling the style of English essay questions, on which girls are known to perform much better. Pure reiteration of learned facts was not valued in Physics responses, in contrast to what was marked favourably in Biology. The marking emphasis given to papers is suggested as a contributing factor to gender outcomes in summative assessment.

Evidence is emerging that examination construction plays a large role in how females perform overall. How an examination is constructed depends on a number of factors such as historical precedent, the assessment focus of the syllabus, how the subject is interpreted, the philosophy of the people responsible for writing the examination, logistical constraints such as the marking effort needed, and teacher acceptance of examination elements.

Changes in Physics
An interesting case study emerged in Physics in South Australia, where a chief examiner instigated a number of quantifiable changes to the paper in keeping with her philosophy of learning in science. Before her tenure, boys and girls performed equivalently on the examination despite the girls being an apparently able and select population of students. The chief examiner believed that Physics examinations should demand more of a candidate than a demonstration of mathematical skills and reduced the number of calculation questions from two-thirds to one-third of the paper. She directed that questions elicit more detailed explanations of phenomena, placing greater emphasis on literacy and graphics skills. Candidates were expected to demonstrate a relational understanding of syllabus topics in order to score highly. She also changed the marking schemes for essay questions. By 1991 these changes had the effect of significantly improving girls’ performance in all sections of the examination relative to that of the boys.

Question-by-question analysis of subject examinations showed that a large number of individual items were biased towards one sex or the other. This was an unexpected finding because we commonly expect set questions to be neutral. Unfortunately there is little guidance in the literature on how to avoid gender bias within tests and examinations beyond advice on the use of non-sexist language. The ESSSA project did initiate work on analysing the language of questions which were shown to advantage one sex over the other. Preliminary findings suggest this is a fruitful avenue of research to pursue.
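A question-by-question screen of the kind described above can be sketched as follows. The per-item scores, the effect-size measure (Cohen's d) and the flagging threshold are all illustrative assumptions for the sketch, not the method or data used by the ESSSA project.

```python
# Screen examination items for sex differences in performance by
# computing a standardized mean difference (Cohen's d) per item and
# flagging items above an arbitrary threshold. All data illustrative.

import statistics as stats

def cohens_d(group_a, group_b):
    """Standardized mean difference between two lists of item scores."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * stats.variance(group_a)
        + (n_b - 1) * stats.variance(group_b)
    ) / (n_a + n_b - 2)
    return (stats.mean(group_a) - stats.mean(group_b)) / pooled_var ** 0.5

def flag_biased_items(item_scores, threshold=0.2):
    """Return ids of items whose |d| exceeds the threshold."""
    flagged = []
    for item, (female, male) in item_scores.items():
        if abs(cohens_d(female, male)) > threshold:
            flagged.append(item)
    return flagged

# Hypothetical per-item scores (out of 5) for small candidate samples.
scores = {
    "Q1": ([4, 5, 3, 4, 5, 4], [3, 2, 3, 2, 3, 3]),  # favours females
    "Q2": ([3, 3, 4, 3, 3, 4], [3, 4, 3, 3, 4, 3]),  # roughly neutral
}
print(flag_biased_items(scores))  # → ['Q1']
```

In practice an analysis like this would use the full candidature and a formal significance test rather than a fixed threshold, and a flagged item prompts scrutiny of its language and context, not a conclusion about the candidates.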

The National Action Plan for the Education of Girls 1993-1997 states that curriculum reform “requires a fundamental reworking of what knowledge is valued within the curriculum, how that knowledge is made available and how it is taught”. How this knowledge is assessed needs to be added to the reform requirements.

Research has revealed that assessment cannot be considered to be a value-free activity. The findings call into question the notion that student ability is fixed and objectively determinable through public assessment. It has been demonstrated that the performances of large groups of students, divided on the basis of sex, can change when alterations are made to the construction of the assessment instrument and to the methods of assessment (Whitehouse and Sullivan 1992).

There is hard data that assessment can be gendered. Attention to assessment methods should be a part of any discussions on curriculum reform. As Murphy (1989) points out, unless we tackle the complex issue of gender and assessment, a question mark will remain over the assessment process and its validity.


  1. Harding, J., “Can assessment be gender fair?”. Paper delivered at seminar to the School Programs Division of the Ministry of Education and Training, Victoria, 23 August 1991.
  2. Murphy, P., “Gender and Assessment”, Curriculum, v.93 (1989), 165-171.
  3. Rennie, L. and Parker, L., “Assessment of learning in science: the need to look closely at item characteristics”, Australian Science Teachers Journal, v.37(4) (1991), 56-59.
  4. Stobart, G., Elwood, J. and Quinlan, M., “Gender bias in examinations: how equal are the opportunities?”, British Educational Research Journal, v.18(3) (1992), 261-276.
  5. Vieth, E., “Dilemmas in theory and practice: a case study of English teachers and school-based curriculum development and assessment”, English in Australia, v.98 (1991), 67-79.
  6. Whitehouse, H. and Sullivan, M., Girls and Year 12 Science Examinations, Senior Secondary Assessment Board of South Australia, Adelaide, 1992.