Report No SR4166/1



Dec 1996


The initiative by the UK regulatory agencies (Environment Agency, Scottish Environment Protection Agency and Department of the Environment, Northern Ireland) to control certain toxic effluent discharges by direct toxicity assessment places great emphasis on the performance of aquatic toxicity tests. This project was undertaken to assess the performance (specifically the precision) of ecotoxicity tests to be used in the Toxicity-Based Licensing (TBL) Programme and to recommend ways in which the quality of test data, and regulatory decisions made on the basis of those data, can be maintained or improved.

Like all biological and chemical measurements, determinations of toxicity are liable to both random and non-random error which, in turn, can introduce uncertainty into decisions made on the basis of test results. Therefore, the precision of data arising from toxicity tests has important implications for the derivation of toxicity-based limits (TBLs) and subsequent compliance monitoring. This report describes the results of a major UK ring-test to determine the precision of test data arising from four 'priority' test methods (Microtox, Daphnia magna 48 h immobilisation, Acartia tonsa 48 h lethality and Pacific oyster (Crassostrea gigas) 24 h embryo-larval development) when the test organisms were exposed to two reference toxicants: 3,4-dichloroaniline (3,4-DCA) and zinc sulphate.

The ring-test showed that the variability of the test methods is sufficiently large to warrant being taken into account when making regulatory decisions based on ecotoxicity test data. In addition, laboratories which may have a problem with poor repeatability or possible bias, and laboratories achieving good precision which may therefore be regarded as achieving 'best practice', were also identified. It has not, however, been possible to convincingly identify the causes of variability in the ring-tested methods.

The test methods fell into the following rank order of diminishing precision: Microtox > Daphnia > Acartia = oyster embryo-larval development (OEL). This rank order was independent of the toxicant used, but results with zinc sulphate were consistently more variable than those for 3,4-DCA. The greater precision of the Microtox test is probably associated with the highly prescriptive procedure used, which has the effect of minimising variability between operators, batches of test organisms and environmental conditions. There appears to be little to choose between the two marine invertebrate tests evaluated (OEL and Acartia) in terms of their precision, although it was clear that within-test variability made a greater contribution to the variability of the OEL test than of the Acartia test. This suggests that variability of OEL tests may be more readily reduced by measures to reduce or compensate for random error, e.g. by promoting 'best practice' and increasing replication.

Because participation in the ring-test was not selective, it seems likely that the test laboratories involved did not represent a homogeneous group in terms of their experience. This probably had the effect of overestimating the variability of the 'higher organism' test methods compared to the variability that would be obtained by experienced participants only.

It was possible to partition the observed variability into within-test, within-laboratory and between-laboratory sources of error. This analysis enabled criteria for the repeatability and reproducibility of these aquatic toxicity test methods to be developed, which could be used in a Quality Control Scheme aimed at constraining their variability, as described below. This is a generic procedure and may be applied to any test method for which sufficient data are available on variability of toxicity estimates from repeated tests in different laboratories.
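As an illustration of this kind of partitioning, the sketch below estimates within-laboratory and between-laboratory variance components from repeated toxicity estimates. It is a simplified stand-in and not the analysis used in the report: it uses a balanced one-way random-effects ANOVA (method of moments) rather than REML, it does not separate within-test error from other within-laboratory error, and the function name, the log10 EC50 scale and the example figures are all assumptions made for illustration.

```python
import math
from statistics import mean


def variance_components(log_ec50_by_lab):
    """Split variability of log10 EC50 values into two components.

    log_ec50_by_lab maps a laboratory label to its repeated test results.
    Assumes a balanced design (same number of tests in every laboratory).
    """
    labs = list(log_ec50_by_lab.values())
    k = len(labs)          # number of laboratories
    n = len(labs[0])       # repeated tests per laboratory
    if k < 2 or n < 2 or any(len(lab) != n for lab in labs):
        raise ValueError("need >= 2 labs, each with the same number (>= 2) of tests")

    grand_mean = mean(v for lab in labs for v in lab)
    lab_means = [mean(lab) for lab in labs]

    # Mean squares from the one-way ANOVA table.
    ms_between = n * sum((m - grand_mean) ** 2 for m in lab_means) / (k - 1)
    ms_within = sum(
        (v - m) ** 2 for lab, m in zip(labs, lab_means) for v in lab
    ) / (k * (n - 1))

    var_within = ms_within
    # Method-of-moments estimate; truncated at zero if negative.
    var_between = max(0.0, (ms_between - ms_within) / n)

    return {
        "repeatability_sd": math.sqrt(var_within),
        "between_lab_sd": math.sqrt(var_between),
        "reproducibility_sd": math.sqrt(var_within + var_between),
    }


# Hypothetical log10 EC50 values: three laboratories, two tests each.
example = {"A": [1.0, 1.2], "B": [1.4, 1.6], "C": [0.9, 1.1]}
components = variance_components(example)
```

In this sketch the repeatability standard deviation reflects only within-laboratory scatter, while the reproducibility standard deviation adds the between-laboratory component; candidate Quality Control criteria could be set as limits on either quantity.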

The undesirable effects of variability in toxicity tests can be minimised in two ways:

  1. by making allowances for variability in the decision-making process, especially when monitoring compliance with a TBL;
  2. by constraining variability through the use of externally imposed limits on variability ('Quality Control' criteria).

We recommend that both approaches be adopted and describe procedures that might be employed. These include (a) simple statistical procedures for controlling the incidence of false positive or false negative conclusions in regulatory decision-making and (b) a Quality Control scheme that monitors and assesses variability and recommends remedial action to reduce it, by comparing results from repeated toxicity tests with a reference toxicant against the Quality Control criteria.
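One way of making an allowance for variability in a compliance decision can be sketched as follows. This is an illustrative rule under stated assumptions, not the procedure set out in the report: it assumes that log10 EC50 estimates are approximately normally distributed with a known standard deviation (for example, one derived from ring-test data), that a lower EC50 indicates greater toxicity, and that the regulator wishes to hold the false positive rate (wrongly declaring a failure) at a chosen level. The function name and all numerical values are hypothetical.

```python
import math
from statistics import NormalDist


def complies_with_limit(ec50, limit, sd_log10, alpha=0.05):
    """Return False only if the EC50 is significantly below the limit.

    ec50     -- estimated EC50 of the effluent sample (e.g. % effluent)
    limit    -- toxicity-based limit on the same scale
    sd_log10 -- assumed standard deviation of log10 EC50 estimates
    alpha    -- accepted probability of a false positive (false 'fail')
    """
    # One-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha)
    # Shift the decision threshold below the limit to allow for variability.
    threshold = math.log10(limit) - z * sd_log10
    return math.log10(ec50) >= threshold


# Hypothetical figures: limit of 10% effluent, SD of 0.15 log10 units.
marginal_sample = complies_with_limit(8.0, 10.0, 0.15)
toxic_sample = complies_with_limit(5.0, 10.0, 0.15)
```

Under these assumptions, a sample whose true EC50 equals the limit would be declared a failure with probability alpha. Controlling false negatives instead would require the threshold to be shifted in the opposite direction.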

The use of limit ('pass/fail') tests for monitoring compliance with TBLs warrants particular attention. These are simpler and cheaper to operate than conventional concentration-response tests and, in addition, simple procedures are available for taking variability into account when judging whether or not an effluent sample has complied with a TBL. It is also possible to incorporate into the Quality Control scheme described above, measures to constrain the variability of limit tests.
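A simple treatment of a limit test might look like the sketch below. It is offered as an assumption-laden illustration rather than the report's method: organisms are taken to be exposed at a single concentration (the limit), the sample is judged to fail only if the number affected is significantly greater than an allowed fraction, and an exact one-sided binomial test is used. Control responses and variability between replicates are ignored, and the function names, the allowed fraction of 0.5 and the significance level are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Probability of k or more affected organisms out of n, each with chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def limit_test_fails(affected, exposed, allowed_fraction=0.5, alpha=0.05):
    """Return True if the observed effect significantly exceeds the allowed fraction.

    affected         -- number of organisms showing the effect at the limit concentration
    exposed          -- total number of organisms exposed
    allowed_fraction -- assumed maximum acceptable effect at the limit
    alpha            -- accepted probability of a false 'fail'
    """
    p_value = binomial_upper_tail(affected, exposed, allowed_fraction)
    return p_value < alpha


# Hypothetical results with 20 organisms exposed at the limit concentration.
clear_failure = limit_test_fails(15, 20)
borderline = limit_test_fails(14, 20)
```

With 20 organisms and these settings, 15 affected would be judged a failure whereas 14 would not, which shows how the allowance for random variation moves the pass/fail boundary away from a simple 50% cut-off.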

Finally, factors which influence the estimation of toxicity have been investigated, some of which have led to additional recommendations about the statistical procedures used to analyse test data and also changes to some of the test protocols.


Precision, toxicity, accuracy, within-test, within-laboratory, between-laboratory, repeatability, reproducibility, Quality Control, ring-test, performance, REML (Residual Maximum Likelihood)

Copies of the report are available from FWR, price £30.00, less 20% to FWR Members.