
The advantages and challenges of web-based experiments in behavioral sciences

Web experiments have a few significant advantages that should be considered before discussing their shortcomings. Namely, they are excellent for increasing reach and maximizing the size and diversity of the participant pool. A well-designed web experiment can reach nearly anyone with online access, so one can expect very large and diverse samples, which opens the door to hypotheses that simply couldn’t be tested in a lab environment. But as always, this major advantage has to be weighed against the shortcomings and challenges of setting up and distributing web-based assessment systems. These shortcomings are most severe when it comes to accurate timing of stimulus presentation and response logging, and the jury is still out on the timing accuracy of web experiments. The reader is gently reminded that the choice of research methodology, as always, depends on the nature of the research question.

The challenges and current issues with web experiments can be roughly split into a few groups by their source. As the first source, we will discuss people – the participants as well as the developers. The second source is the experimental environment with its reduced control, and the final significant source of challenges is the user’s system with its software and hardware considerations.

WHO – participants and developers

The problem with participants in web-based research is that the pool is self-selected. This has more significant implications for some types of research than for others, and might even be desirable if the experimenter wishes to gather a large pool of very particular participants in their ‘native’ environment, that is, if the research is explicitly targeted at web users. Another problem is that the pool is relatively uncontrollable; it is difficult to completely exclude the possibility of multiple participation. There is, however, research indicating that this doesn’t contribute significant variance to the data, because it is rather unlikely and can be controlled to some extent by simply asking whether the participant has taken part in the study before.

Drop-outs represent the other side of the issue. Anyone who logs activity on the first sections of their study will notice that a large proportion of participants abandon the study partway. Some of them might return, others might not, but one has to wonder whether these participants are somehow systematically different from those who carried on through the rest of the experimental battery. The final sample might represent a very particular section of the population, and one has to consider whether this is acceptable given the research problem, and whether it is a fair price for the increased reach and sample size.
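One way to get a first look at this is to log the last section each participant reached, and then compare those who dropped out with those who finished on something recorded early on. The following is only a minimal sketch: the record layout, the field names (`last_section`, `age`) and all the values are made up for illustration, and a real analysis would need a proper statistical test rather than a comparison of means.

```python
from statistics import mean

# Hypothetical log: the last section each participant reached (out of 4)
# and an age recorded on the first page. All values are illustrative.
TOTAL_SECTIONS = 4
participants = [
    {"id": "p1", "last_section": 4, "age": 24},
    {"id": "p2", "last_section": 1, "age": 41},
    {"id": "p3", "last_section": 4, "age": 22},
    {"id": "p4", "last_section": 2, "age": 38},
    {"id": "p5", "last_section": 4, "age": 27},
    {"id": "p6", "last_section": 1, "age": 45},
]


def retention_by_section(records, total_sections):
    """Proportion of all participants who reached each section."""
    n = len(records)
    return {
        section: sum(r["last_section"] >= section for r in records) / n
        for section in range(1, total_sections + 1)
    }


def split_by_completion(records, total_sections):
    """Separate participants who finished from those who dropped out."""
    completers = [r for r in records if r["last_section"] == total_sections]
    dropouts = [r for r in records if r["last_section"] < total_sections]
    return completers, dropouts


retention = retention_by_section(participants, TOTAL_SECTIONS)
completers, dropouts = split_by_completion(participants, TOTAL_SECTIONS)
mean_age_completers = mean(r["age"] for r in completers)
mean_age_dropouts = mean(r["age"] for r in dropouts)

print("Retention per section:", retention)
print("Mean age, completers:", round(mean_age_completers, 1))
print("Mean age, drop-outs:", round(mean_age_dropouts, 1))
```

In this toy data the drop-outs are noticeably older than the completers, which is exactly the kind of systematic difference one would want to know about before interpreting the final sample.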

Currently, the developers of web-based experiment platforms also create some challenges. One problem I discovered was that while there were a couple of promising platforms for creating web experiments, their active development and support usually stop after a couple of years. It could be that people simply lose interest in maintaining these projects, or that the system is initially created and maintained by someone with a passing interest, e.g. a graduate student, who then finishes and moves on with their career. A number of competing commercial systems also exist, which might discourage developers from maintaining their own systems (although it is arguable whether these commercial systems are really any better). Finally, development often relies too much on one person or a very small team, so if one of them leaves, it is difficult or even impossible for others to pick up the reins. There is rarely a community behind a web-experiment platform, in contrast to e.g. PEBL and PsychoPy, both of which are platforms for creating offline computerized experiments with large and active communities behind them. This really shows in their active development and maintenance, attractive feature lists and ready-made tests and experiments.

WHERE – the experimental environment

A major shortcoming of web experiments is reduced experimental control. Users all participate at their own locations. In addition to unknown environmental noise, distractions, unrepresentative characteristics of users and unknown psychophysical user states, they all have different operating systems, hardware, software and system load. You never really know what’s happening on the participant’s side, so you can’t control for the obscuring variance it may introduce. A ton of research and optimization has gone into minimizing all of these factors in a laboratory environment, and we simply cannot make use of any of it when we run web-based experiments. Whereas the laboratory environment can be scrutinized, polished and tested until it is relatively noise-free, a web experiment can only rely on the assumption that the noise is distributed in a way that makes it cancel out once the participant pool is sufficiently large and diverse. The fundamental problem with this assumption is that the noise can be non-random and contribute systematic differences. Depending on the research problem, it can inflate either the alpha (false positive) or the beta (false negative) error probability. In the typical scenario, random noise drowns out the weak signal the experimenter is looking for, inflating the beta error probability.

The opposite is also possible. For instance, I’m working on a project where I am building a hierarchical linear model of video game teams’ performance from individual players’ cognitive profiles. If I were to focus only on reaction times, it could be that players with better hardware get better scores on the web experiments because of system speed and responsiveness, and the same players fare better at video games for the same reason. I would be tempted to conclude that these individuals are better players because of faster cognitive processing, whereas the effect is actually due to hardware differences, and I would have discovered an effect of cognition where there was none.
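The hardware scenario can be illustrated with a small simulation. This is a sketch under loudly stated assumptions, not a model of my actual data: every number is invented, game score is made to depend on hardware latency only (never on the player's true reaction time), and the latency is assumed to be known so that it can be adjusted for, which is rarely the case in practice.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_corr(xs, ys, zs):
    """First-order partial correlation of xs and ys, controlling for zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(42)  # fixed seed so the run is repeatable
n = 5000

# Assumed values, in milliseconds: true cognitive reaction time and the
# extra delay added by each player's hardware.
true_rt = [rng.gauss(300, 30) for _ in range(n)]
latency = [rng.uniform(10, 100) for _ in range(n)]

# What the web experiment records: cognition plus hardware delay.
measured_rt = [t + l for t, l in zip(true_rt, latency)]

# Game score depends on hardware latency only, by construction.
game_score = [100 - 0.5 * l + rng.gauss(0, 10) for l in latency]

r_true = pearson(true_rt, game_score)          # no real cognitive effect
r_measured = pearson(measured_rt, game_score)  # spurious "effect"
r_adjusted = partial_corr(measured_rt, game_score, latency)

print("true RT vs score:     %.2f" % r_true)
print("measured RT vs score: %.2f" % r_measured)
print("adjusted for latency: %.2f" % r_adjusted)
```

The measured reaction times correlate clearly with game score even though the simulated cognition has no effect at all, and the correlation vanishes once latency is controlled for. The catch, of course, is that a web experiment usually cannot observe that latency directly.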

Many of the variables contributing to reduced experimental control fall under the scope of the next category of technical considerations and will be discussed there.

WHAT AND HOW – the software and hardware considerations

With web-based experiments, all of the participants work on the tasks through their own computers, which means that the data may be confounded by any number of combinations of hardware, operating systems and software. Hardware considerations involve processor types and clock speeds, as well as input and output devices with their screen refresh rates and keyboard and mouse latencies. Hardware matters most when it comes to reliably timing stimulus presentation and logging reaction times, that is, the time at which the user responds to the experimental stimulus. Output devices also matter when it comes to presenting the stimulus in the same way to all participants. For instance, for tasks utilizing color vision, the color output should be identical for all participants, and for many visual tasks, screen size might be a confounding factor: a stimulus of the same physical size occupies a smaller proportion of a larger screen. When it comes to input, different mouse types and keyboard layouts could act as a psychomotor confound, since giving a simple response might be more cumbersome for some participants than for others.
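To make the refresh rate point concrete: a display can only change what it shows at frame boundaries, so a requested stimulus duration ends up rounded to a whole number of frames, and that number depends on the participant's monitor. The sketch below is plain arithmetic under that simplifying assumption; the function names are my own, and real browsers add further delays on top of this.

```python
import math


def frames_for(duration_ms, refresh_hz):
    """Whole number of frames closest to the requested duration (at least 1)."""
    exact_frames = duration_ms * refresh_hz / 1000.0
    # floor(x + 0.5) rounds halves up, avoiding Python's round-half-to-even.
    return max(1, math.floor(exact_frames + 0.5))


def actual_duration_ms(duration_ms, refresh_hz):
    """Duration the stimulus would really stay on screen, in milliseconds."""
    return frames_for(duration_ms, refresh_hz) * 1000.0 / refresh_hz


requested_ms = 50
for hz in (60, 75, 144):
    shown = actual_duration_ms(requested_ms, hz)
    print(
        "%3d Hz: %d frames, %.2f ms (error %+.2f ms)"
        % (hz, frames_for(requested_ms, hz), shown, shown - requested_ms)
    )
```

A 50 ms stimulus fits exactly into three frames at 60 Hz, but at 75 Hz or 144 Hz it has to be shown for slightly longer or shorter than requested, so participants with different monitors do not see quite the same experiment.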

On the software side, participants have different versions of different operating systems and use different browsers, which also makes for some rather interesting interactions. For instance, a number of web experiment solutions so far have been written in Flash, which will create problems for participants using Linux. Operating system and browser constraints not only confound the data but also need to be taken into account as something that reduces the reach of the study: if the experiment doesn’t run on the user’s system, they will be prevented from participating altogether. Web experiments also suffer from system load, especially when users multitask on weaker hardware, so users should be prompted to close other programs before participating. Compliance, of course, is not guaranteed.

Finally, in contrast to offline computerized experiments, web experiments require some consideration of backend architecture. Typically, the same architecture is used for storing both stimuli and results. This creates some extra work and complications for the experimenter and raises the question of security.
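As a rough idea of what the results side of such a backend might look like, here is a minimal sketch using Python's built-in `sqlite3` module. The table layout, column names and sample values are assumptions made for illustration, and a real deployment would also need a web server, authentication and encrypted transport, none of which is shown here.

```python
import sqlite3


def open_results_db(path=":memory:"):
    """Open (or create) a results database with one row per trial."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS trials (
               participant TEXT NOT NULL,
               trial INTEGER NOT NULL,
               stimulus TEXT NOT NULL,
               response TEXT,
               rt_ms REAL,
               PRIMARY KEY (participant, trial)
           )"""
    )
    return conn


def store_trial(conn, participant, trial, stimulus, response, rt_ms):
    """Insert one trial; placeholders keep user-supplied text out of the SQL."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO trials VALUES (?, ?, ?, ?, ?)",
            (participant, trial, stimulus, response, rt_ms),
        )


conn = open_results_db()
store_trial(conn, "anon-001", 1, "red_square.png", "left", 412.5)
# A hostile "response" is stored as plain text instead of being executed.
store_trial(conn, "anon-001", 2, "blue_circle.png", "x'); DROP TABLE trials; --", 530.0)

rows = conn.execute(
    "SELECT trial, response, rt_ms FROM trials ORDER BY trial"
).fetchall()
for row in rows:
    print(row)
```

The point of the second insert is the security question: anything that arrives from a participant's browser should be treated as untrusted, and parameterized queries are one small part of handling that.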

Don’t despair…

After reading this far, it might seem that web experiments are simply not worth bothering with, but even with all their challenges, they might offer you access to a participant pool that would be absolutely out of reach for a lab study. The key is to keep it simple, minimize noise, make timing as accurate as you can, and thoroughly consider whether web experiments are suitable for your research problem before committing to anything.