REASON OUTPREDICTS CLAIRVOYANCE

Dutch skeptics, in cooperation with the Netherlands Association for Parapsychology (NVP) and the daily newspaper De Telegraaf, have carried out a nationwide test named "Predict the 1995 News". The first results are now available. What follows is an informal account of the main conclusions; a formal report is in preparation.

In December 1994, the paper's readers were invited to estimate the probability that each of 25 given predictions about news in the coming year would come true, to indicate what had guided their judgement, and to venture three predictions of their own. When 1995 had passed and the outcome was known, participants received 10 points for each false prediction to which they had assigned a probability of 0-20%, 9 for 20-40%, 7 for 40-60%, 4 for 60-80% and 0 for 80-100%, and the reverse (0, 4, 7, 9 and 10 points respectively) for each prediction that came true. (It can be shown that this "fair score" yields the highest expected reward when the probabilities are judged correctly.) The maximum possible total was thus 250 points; the highest score actually achieved was 231, the lowest 51 and the average 172. By coincidence, the median, 175, was exactly the score of the participant who placed all 25 predictions in the 40-60% interval.
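
The fairness claim can be checked numerically. The Python sketch below is our own illustration, not the procedure used by the investigators: it computes the expected points for each probability range when the true probability of a prediction coming true is p, and picks the range with the highest expectation. The names SCORES, expected_score and best_range are hypothetical.

```python
# Points per chosen probability range (in %), as
# (points if the prediction turns out false, points if it comes true).
SCORES = {
    (0, 20): (10, 0),
    (20, 40): (9, 4),
    (40, 60): (7, 7),
    (60, 80): (4, 9),
    (80, 100): (0, 10),
}


def expected_score(rng, p):
    """Expected points for choosing range `rng` when the true probability is p."""
    if_false, if_true = SCORES[rng]
    return (1 - p) * if_false + p * if_true


def best_range(p):
    """Range with the highest expected points for true probability p."""
    return max(SCORES, key=lambda rng: expected_score(rng, p))


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5, 0.7, 0.9):
        lo, hi = best_range(p)
        print(f"p = {p:.1f}: best range {lo}-{hi}%, "
              f"expected {expected_score((lo, hi), p):.1f} points")
    # 25 predictions: at most 25 * 10 = 250 points; all in 40-60% gives 25 * 7 = 175.
    print(25 * 10, 25 * SCORES[(40, 60)][0])
```

Running it shows each range winning exactly for the values of p it contains, for example 20-40% (7.5 expected points) at p = 0.3.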

As to the grounds for their judgement, participants could indicate whether they attached no, minor or major significance to each of six possible sources of knowledge: chance, reason, intuition, clairvoyance, astrology, and aids such as cards and pendulums. The aim of the test was to find out what difference, if any, these sources of knowledge make to the quality of the estimates and predictions.

Each participant's score was distributed over the six sources by a weighting scheme in which "major" significance counted three times as much as "minor", while every participant's response received the same total weight. Table 1 lists the average weights and the weighted mean scores.
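
The weighting scheme can be sketched in Python as follows. This is a minimal illustration under stated assumptions: "no" counts as 0, "minor" as 1 and "major" as 3 (the text fixes only the 3:1 ratio), each participant's weights are scaled to sum to 1, and a source's weighted mean score is the weight-averaged score over all participants. The function names and the sample answers are hypothetical.

```python
# Assumed numeric coding; only the ratio major:minor = 3:1 is given in the text.
LEVEL = {"no": 0, "minor": 1, "major": 3}
SOURCES = ("chance", "reason", "intuition",
           "clairvoyance", "astrology", "cards/pendulum")


def weights(ratings):
    """Turn one participant's ratings into weights that sum to 1 (None if all 'no')."""
    raw = {s: LEVEL[ratings.get(s, "no")] for s in SOURCES}
    total = sum(raw.values())
    if total == 0:
        return None  # no usable rating; how such answers were treated is not stated
    return {s: v / total for s, v in raw.items()}


def table1(participants):
    """participants: list of (score, ratings).
    Returns (average weight per source, weighted mean score per source)."""
    weight_sum = dict.fromkeys(SOURCES, 0.0)
    score_sum = dict.fromkeys(SOURCES, 0.0)
    n = 0
    for score, ratings in participants:
        w = weights(ratings)
        if w is None:
            continue
        n += 1
        for s in SOURCES:
            weight_sum[s] += w[s]
            score_sum[s] += w[s] * score
    if n == 0:
        raise ValueError("no participant gave a usable rating")
    avg_weight = {s: weight_sum[s] / n for s in SOURCES}
    mean_score = {s: (score_sum[s] / weight_sum[s] if weight_sum[s] else None)
                  for s in SOURCES}
    return avg_weight, mean_score


# Hypothetical answers from three participants (not data from the test).
sample = [
    (190, {"reason": "major", "intuition": "minor"}),
    (160, {"intuition": "major", "clairvoyance": "major"}),
    (175, {"chance": "minor", "reason": "minor"}),
]
avg_weight, mean_score = table1(sample)
print(mean_score["reason"], mean_score["intuition"])  # 184.0 170.0
```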

                         TABLE 1

source                     weight, %    score
total                         100        172
chance                         14        172
reason                         48        176
intuition                      30        169
clairvoyance                    4.5      163
astrology                       0.5      165
cards/pendulum                  3.5      164
paranormal (last three)         8.5      164

Because of the small numbers involved, the clairvoyance, astrology and cards/pendulum categories should not be judged individually. Aggregated into a single "paranormal" category, however, they yield a score of 164, which is significantly below the overall average of 172 (probable error 0.8).
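
The combined figure can be reproduced from the rounded entries of Table 1 by weighting each category's mean score by its average weight. The short Python check below assumes that this is how the categories were pooled; the last lines express the gap between the reported paranormal score and the overall average in units of the quoted probable error.

```python
# Rounded Table 1 entries: (average weight in %, weighted mean score).
SMALL_CATEGORIES = {
    "clairvoyance": (4.5, 163),
    "astrology": (0.5, 165),
    "cards/pendulum": (3.5, 164),
}


def pooled(categories):
    """Combine categories: weights add, scores are weight-averaged."""
    total_weight = sum(w for w, _ in categories.values())
    mean_score = sum(w * s for w, s in categories.values()) / total_weight
    return total_weight, mean_score


weight, score = pooled(SMALL_CATEGORIES)
print(f"paranormal: weight {weight:.1f}%, score {score:.1f}")  # weight 8.5%, score 163.5

OVERALL_AVERAGE, PROBABLE_ERROR, REPORTED_PARANORMAL = 172, 0.8, 164
gap = (OVERALL_AVERAGE - REPORTED_PARANORMAL) / PROBABLE_ERROR
print(f"gap: {gap:.0f} probable errors")  # gap: 10 probable errors
```

The pooled value of 163.5 from the rounded table entries is consistent with the reported 164.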

The differences become even more apparent when the weights are examined as a function of score. Table 2 lists the average weights in each quarter of the participants, ranked by ascending score.

                         TABLE 2

quarter        chance   reason   intuition   paranormal
1 (lowest)      0.14     0.39      0.35         0.12
2               0.13     0.44      0.31         0.11
3               0.15     0.51      0.28         0.05
4 (highest)     0.12     0.56      0.27         0.06
total           0.14     0.48      0.30         0.09
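
Table 2 can be produced by ranking participants by score, cutting the ranking into four equal groups and averaging each group's weights. The Python sketch below assumes per-participant weights are already available (for instance from the normalization of the ratings described earlier, with the three paranormal sources summed); the function name and the eight sample participants are hypothetical.

```python
from statistics import mean

SOURCES = ("chance", "reason", "intuition", "paranormal")


def quarter_weights(rows):
    """rows: list of (score, weights). Returns the average weights in each
    quarter of the participants, ranked by ascending score (quarter 1 lowest)."""
    if len(rows) < 4:
        raise ValueError("need at least four participants")
    ranked = sorted(rows, key=lambda row: row[0])
    n = len(ranked)
    quarters = []
    for q in range(4):
        group = ranked[q * n // 4:(q + 1) * n // 4]
        quarters.append({s: mean(w[s] for _, w in group) for s in SOURCES})
    return quarters


# Hypothetical participants: (score, weights summing to 1).
rows = [
    (200, {"chance": 0.0, "reason": 0.75, "intuition": 0.25, "paranormal": 0.0}),
    (150, {"chance": 0.0, "reason": 0.25, "intuition": 0.5, "paranormal": 0.25}),
    (180, {"chance": 0.25, "reason": 0.5, "intuition": 0.25, "paranormal": 0.0}),
    (160, {"chance": 0.25, "reason": 0.25, "intuition": 0.5, "paranormal": 0.0}),
    (210, {"chance": 0.0, "reason": 1.0, "intuition": 0.0, "paranormal": 0.0}),
    (170, {"chance": 0.0, "reason": 0.5, "intuition": 0.5, "paranormal": 0.0}),
    (185, {"chance": 0.5, "reason": 0.5, "intuition": 0.0, "paranormal": 0.0}),
    (165, {"chance": 0.0, "reason": 0.5, "intuition": 0.25, "paranormal": 0.25}),
]
for q, avg in enumerate(quarter_weights(rows), start=1):
    print(q, {s: round(v, 3) for s, v in avg.items()})
```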

Altogether, the 1080 participants made 2368 free predictions, the largest category of which (14%) concerned Dutch royalty, followed by wars, politics, sports, weather and disasters. After eliminating trivial, impossible and ambiguous statements, the NVP investigators could classify most of the remainder as based either on reasonable estimates and extrapolations or on emotional factors, such as the wish to see one's favorite soccer team win a championship. Approximately one third came true; none of these, however, contained details that would have been unpredictable on rational grounds, so none could count as evidence of a paranormal source of knowledge. Of the predictions that did contain such details, none turned out to be fully correct. Spectacular events such as the Kobe earthquake, the assassination of Rabin and the detailed course of events in Bosnia were foreseen by no one.

We are satisfied that this test has yielded an unambiguous outcome. In addition, it drew the attention of the newspaper's readers to the subject, enabled them to test themselves and made them aware of the need for statistical analysis. We would be pleased to exchange experiences with colleagues in other countries who are in a position to launch a similar project. Please contact us for further details.

Some comments by "Skepsis" on the project "Predict the 1995 News"

The research groups of Skepsis and the NVP, which worked essentially without financial support or staff, could not have carried out a study like this on their own. Cooperation with the newspaper allowed us to take advantage of its access to the public and of its expertise and facilities. De Telegraaf also had the answers entered into a digital file.

The success of this venture depended on each of the cooperating parties understanding the others' potentially conflicting interests, in particular with respect to the content and timing of publications on progress and results. The rights and responsibilities of the three parties were laid down in an agreement that dealt in particular with intellectual ownership and rules for publication. Thus, science and journalism worked together satisfactorily.

The scoring system described above perhaps deserves some comment. It may be understood as a fair betting rule. A participant who rates a prediction at 40-60% is certain to receive 7 points. If the probability of a positive outcome is below 40% (or above 60%), moving to the 20-40% (or 60-80%) range is a fair bet for those who choose it: a possible gain of two points against a possible loss of three. If the probability is below 20% (or above 80%), moving one range further adds a possible gain of one more point against a possible loss of four, which is again fair to those who judged accordingly. The scoring rule follows from the requirement that the statistically expected reward be highest when the true probability lies in the selected range. When the test was announced, it was stated only that the method to be used would not reward gambling; moreover, no competitive element was introduced and no prizes were offered. An analysis of the answers nevertheless suggests that not all participants resisted the temptation to gamble now and then: all but one of the 25 given predictions received the most votes in either the lowest or the highest range. It might therefore have been advisable to publish the scoring table beforehand.
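
The betting argument can be written out explicitly. With p the probability that a prediction comes true, the expected points for each range follow from the scoring table, and each step away from the middle range pays off on average exactly when p lies beyond the corresponding boundary (a worked derivation added here for illustration):

```latex
\begin{aligned}
E_{0\text{-}20}(p)   &= 10(1-p) \\
E_{20\text{-}40}(p)  &= 9(1-p) + 4p = 9 - 5p \\
E_{40\text{-}60}(p)  &= 7 \\
E_{60\text{-}80}(p)  &= 4(1-p) + 9p = 4 + 5p \\
E_{80\text{-}100}(p) &= 10p \\[6pt]
E_{20\text{-}40} - E_{40\text{-}60}  &= 2(1-p) - 3p = 2 - 5p > 0 \iff p < 0.4 \\
E_{0\text{-}20} - E_{20\text{-}40}   &= (1-p) - 4p = 1 - 5p > 0 \iff p < 0.2 \\
E_{60\text{-}80} - E_{40\text{-}60}  &= -3(1-p) + 2p = 5p - 3 > 0 \iff p > 0.6 \\
E_{80\text{-}100} - E_{60\text{-}80} &= -4(1-p) + p = 5p - 4 > 0 \iff p > 0.8
\end{aligned}
```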

Instead of scoring from 0 to 10 points, one could introduce negative scores. Subtracting a flat 7 points per prediction gives scores from -7 to +3 and total scores from -175 to +75. A difference between +1 for reason and -11 for the paranormal certainly looks more spectacular than one between 176 and 164; the question is which the general public finds easier to understand.
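
In formulas, with s the points for one prediction and S the total over the 25 predictions, the shifted scale is:

```latex
\begin{aligned}
s' &= s - 7, & s' &\in [-7, +3] \\
S' &= S - 25 \times 7 = S - 175, & S' &\in [-175, +75] \\
\text{reason: } 176 - 175 &= +1, & \text{paranormal: } 164 - 175 &= -11
\end{aligned}
```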

Asking participants to rate the sources of knowledge as of no, minor or major relevance leaves the investigators with the task of translating into numbers what each participant meant. In a few cases, the method used produced anomalies such as a weight factor of 1 for a category marked as of minor importance (by a participant who rated all other categories as irrelevant), against weight factors of 0.2 for five categories deemed of major importance (by a participant who rated five categories as major). Perhaps participants should be asked to distribute, say, 10 points over the six categories themselves. Alternatively, they could be asked to indicate the source of knowledge for each of the 25 ratings and 3 predictions.
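
The anomaly follows directly from scaling every participant's ratings to the same total. The Python sketch below assumes the 0/1/3 coding for no/minor/major (only the 3:1 ratio is given) and also shows the suggested alternative, in which participants distribute 10 points themselves; the function names and the example participants are hypothetical.

```python
LEVEL = {"no": 0, "minor": 1, "major": 3}  # assumed coding; text gives only major = 3 x minor


def from_ratings(ratings):
    """Current scheme: scale a participant's no/minor/major ratings to sum to 1."""
    raw = {source: LEVEL[level] for source, level in ratings.items()}
    total = sum(raw.values())
    if total == 0:
        raise ValueError("all sources rated 'no'")
    return {source: v / total for source, v in raw.items()}


def from_points(points, budget=10):
    """Suggested alternative: the participant distributes `budget` points directly."""
    if sum(points.values()) != budget:
        raise ValueError(f"points must add up to {budget}")
    return {source: v / budget for source, v in points.items()}


# Participant A rates a single source "minor" and everything else "no".
a = from_ratings({"chance": "no", "reason": "minor", "intuition": "no",
                  "clairvoyance": "no", "astrology": "no", "cards/pendulum": "no"})
# Participant B rates five sources "major" and one "no".
b = from_ratings({"chance": "major", "reason": "major", "intuition": "major",
                  "clairvoyance": "major", "astrology": "major", "cards/pendulum": "no"})
print(a["reason"], b["reason"])  # 1.0 0.2: a "minor" rating outweighs a "major" one

# With the alternative, the participant states the proportions directly.
c = from_points({"reason": 6, "intuition": 3, "chance": 1})
print(c)
```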

We further note that the respondents expected, on average, that nine or ten of the predictions would come true. In fact four did, which may have favored those who are skeptically inclined. It would be interesting to see whether a less improbable set of predictions yields the same correlation between score and reliance on reason as the one we found.

We had secretly included three predictions whose probabilities would be evident to anyone used to thinking in such terms. Those who stated that they relied on rational considerations did indeed score highest, on average, on this hidden rationality scale (correlation coefficient 0.16). Moreover, the hidden rationality test correlated more strongly with the overall scores than did the participants' claim to be guided by reason (0.47 against 0.23).
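
The text does not say which correlation measure was used; assuming the usual Pearson coefficient, the computation looks like the Python sketch below. The per-participant figures are hypothetical, purely to show the mechanics: points earned on the three hidden predictions, each participant's total score, and the weight each gave to reason.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data for eight participants (not taken from the test).
hidden_points = [30, 24, 27, 18, 21, 30, 15, 25]  # out of 3 x 10 = 30
total_scores = [205, 180, 190, 150, 172, 215, 140, 178]
reason_weight = [0.75, 0.5, 0.25, 0.5, 0.0, 1.0, 0.25, 0.5]

print(f"hidden test vs total score: {pearson(hidden_points, total_scores):.2f}")
print(f"claimed reason vs total score: {pearson(reason_weight, total_scores):.2f}")
```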

One can only guess how many people would have responded if the test had been presented as a competition. As it turned out, the 1080 usable answers, from a circulation of some 900 000 copies, were enough to provide decent statistics without overloading the computing power available to the group. A much larger response would have required a larger computing facility, preferably one able to run SPSS; in our case, Excel turned out to be adequate.

Skepsis wishes to encourage colleagues in other countries to undertake similar projects, and will be happy to exchange experiences. A full report (in Dutch) will appear in a few months.