Current Issues and Concerns in Language Testing

Theories have advanced, but testing practice still lags behind. The shift from views that emphasized reading as a collection of specific skills to views of reading as a total process, with skills interrelated and individual strategies effectively directing learning, greatly affects both research and practice.

Both research and practice are increasingly emphasizing cognitive processing, meaning-making, activation and use of prior knowledge, levels of questioning in comprehension, active response to what is read, and similar dimensions of cognition. Yet assessment of reading performance remains mired at the skills level. Increasingly we seem to be trying to assess process-oriented learning with product-oriented measures (Squire, 1987).

Valencia and Pearson (1987) raise several issues and concerns about reading assessment in relation to current reading theory:

A major contribution of recent research has been to articulate a strategic view of the process of reading (e.g., Collins, Brown, and Larkin, 1980; Pearson and Spiro, 1980). This view emphasizes the active role of readers as they use print clues to “construct” a model of the text’s meaning.

It deemphasizes the notion that progress toward expert reading is the aggregation of component skills. Instead, it suggests that at all levels of sophistication, from kindergarten to research scientist, readers use available resources (e.g., text, prior knowledge, environmental clues, and potential helpers) to make sense of the text.

Reading assessment has not kept pace with advances in reading theory, research, or practice (Valencia and Pearson, 1987). The time has come to change the way we assess reading. The advances of the last 15-20 years in our knowledge of basic reading processes have begun to impact instructional research (Pearson, 1985) and are beginning to find a home in instructional materials and classroom practice (Pearson, 1986).

Yet the tests used to monitor the abilities of individual students and to make policy decisions have remained remarkably impervious to advances in reading research (Farr and Carey, 1986; Johnston, in press; Pearson and Dunning, 1985).

What has happened, of course, is that with reading conceptualized as the mastery of small, separate enabling skills, there has been a great temptation to operationalize “skilled reading” as an aggregation – not even an integration – of all these skills; “instruction” becomes operationalized as opportunities for students to practice these discrete skills on worksheets, workbook pages, and Ditto sheets.

As long as reading research and instructional innovations are based upon one view of the reading process while reading assessment instruments are based upon a contradictory point of view, we will nurture tension and confusion among those charged with the dual responsibility of instructional improvement and monitoring student achievement.

Current practices in reading assessment that run contrary to current reading theory pose several hidden dangers.

One danger lies in a false sense of security if we equate skilled reading with high scores on our current reading tests. A close inspection of the tasks involved in these tests would cast doubt upon any such conclusion.

A second danger stems from the potential insensitivity of current tests to changes in instruction motivated by a strategic view of reading.

A third danger is that given the strong influence of assessment on curriculum, we are likely to see little change in instruction without an overhaul in tests. Conscientious teachers want their students to succeed on reading tests; not surprisingly, they look to tests as guides for instruction. In the best tradition of schooling, they teach to the test, directly or indirectly. Tests that model an inappropriate concept of skilled reading will foster inappropriate instruction.

A fourth danger stems from the aura of objectivity associated with published tests and the corollary taint of subjectivity associated with informal assessment. For whatever reasons, teachers are taught that the data from either standardized or basal tests are somehow more trustworthy than the data that they collect each day as a part of teaching. The price we pay for such a lesson is high; it reduces the likelihood that teachers will use their own data for their own decision making.

We live in a time of contradictions. The speed and impressiveness of technological advance suggest an era of great certainty and confidence. Yet at the same time current social theories undermine our certainties, and have engendered a profound questioning of existing assumptions about the self and its social construction. Aspects of these contradictory trends also define important points of change in language testing.

Rapid developments in computer technology have had a major impact on test delivery. Already, many important national and international language tests, including TOEFL, are moving to computer-based testing (CBT).

Stimulus texts and prompts are presented not in examination booklets but on the screen, with candidates being required to key in their responses. The advent of CBT has not necessarily involved any change in the test content.

The use of computers for the delivery of test materials raises questions of validity, as we might expect. For example, different levels of familiarity with computers will affect people’s performance with them, and interaction with the computer may be a stressful experience for some.

Attempts are usually made to reduce the impact of prior experience by the provision of an extensive tutorial on relevant skills as part of the test (that is, before the test proper begins). Nevertheless, the question about the impact of computer delivery still remains.

While computers represent the most rapid point of technological change, other less complex technologies, which have been in use for some time, have led to similar validity questions.

Tape recorders can be used in the administration of speaking tests. Candidates are presented with a prompt on tape and are asked to respond as if they were talking to a person, the response being recorded on tape. This performance is then scored from the tape. Such a test is called a semi-direct test of speaking, as compared with a direct test format such as a live face-to-face interview. But not everybody likes speaking to tapes! We all know the difficulty many people experience in leaving messages on answering machines. Most test-takers prefer a direct rather than a semi-direct format if given the choice.

The speed of technological advances affecting language testing sometimes gives an impression of a field confidently moving ahead, notwithstanding the issues of validity raised above. But concomitantly the change in perspective from the individual to the social nature of test performance has provoked something of an intellectual crisis in the field.

Developments in discourse analysis and pragmatics have revealed the essential interactivity of all communication. This is especially clear in relation to the assessment of speaking. The problem is that of isolating the contribution of a single individual (the candidate) in a joint communicative activity. As soon as you try to test use (as opposed to usage), you cannot confine yourself to the single individual. So whose performance are we assessing?

The paradigm shift in language acquisition theory has led language instruction to focus on the learner, so as to understand the relationships among three variables: knowledge, thinking, and behavior. Current research in testing argues for a more direct connection between teaching and testing. The same kinds of activities designed for classroom interaction can serve as valid testing formats, with instruction and evaluation more closely integrated.

References and Other Readings:

  • Alderson, J. Charles, et al. 1995. Language test construction and evaluation. Cambridge: Cambridge University Press.
  • Brown, H. Douglas. Language assessment: principles and classroom practices. New York: Pearson Education, Inc.
  • Hughes, Arthur. Testing for language teachers. Cambridge: Cambridge University Press.
  • Lynch, Brian K. 2003. Language assessment and programme evaluation. Edinburgh: Edinburgh University Press.
  • McNamara, Tim. Measuring second language performance. Essex: Addison Wesley Longman Ltd.
  • McNamara, Tim. 2000. Language testing. Oxford: Oxford University Press.
  • Weir, Cyril J. 2005. Language testing and evaluation. New York: Palgrave Macmillan.
  • Weir, Cyril J. 1993. Understanding and developing language tests. Hertfordshire: Prentice Hall International (UK) Ltd.
  • Weir, Cyril J. 1988. Communicative language testing. Hertfordshire: Prentice Hall International (UK) Ltd.