I’m writing this blog post on a Raspberry Pi, a £25 / €30 / $40 mini computer that is literally the size of a credit card.
The Pi is an adorable machine (if you’re into that kind of thing): just a small printed circuit board with connectors for a monitor, mouse, keyboard, and an Ethernet cable. The system boots from an SD card, so there is no hard disk. There is a choice of Linux-based operating systems, the most commonly used being Raspbian, a Debian spin-off that has been optimized for the Pi; this is also what I installed. And in case you’re wondering: I didn’t open the Pi up. It just doesn’t come with a casing!
Because the Pi is extremely cheap, some people have wondered whether it could be used to equip low-budget psychology labs. This is also how I came into possession of this diminutive cutie: it’s a gift from Clayton, who wondered how well OpenSesame would fare on the Pi. Thanks Clayton!
If you promise to keep reading, I’ll give you the answer now: Moderately well, with a few caveats.
Snappiness, OpenGL, and v-sync
The operating system on the Pi does not support OpenGL, which is the library used for hardware-accelerated graphics by OpenSesame (or actually by Expyriment and PsychoPy, which are used by OpenSesame). This is a shame, because the Pi comes with a decent graphics unit, which is now essentially unused. However, OpenSesame’s non-hardware-accelerated back-end (legacy, which is PyGame based) works just fine.
Performance-wise, the Pi is pretty snappy. You can run most of the example experiments without any noticeable lag, and even forms (for questionnaires, text input, etc.), which can be slow, run smoothly. I wouldn’t recommend developing your experiments on a Pi (although you can), but as a runtime environment it’s fast enough for most purposes. So that’s the good news.

The bad news is that I haven’t been able to get the Pi to synchronize with the vertical refresh of the monitor (v-sync). A little background, in case you’re not familiar with the term: monitors are refreshed line-by-line from the top down (video). If you start drawing a new image while the refresh cycle is in the middle of the monitor, there is a moment during which the upper part of the monitor shows the old image, whereas the lower part shows the new image. Even though a refresh cycle is very brief (16.7 ms on a typical 60 Hz monitor), under some conditions you can see this in the form of horizontal lines running through the image, a phenomenon called ‘tearing’. Tearing can be prevented by drawing a new image only at the moment that the refresh cycle starts from the top, but the video driver needs to support this. Vision scientists consider tearing a bad thing, because it makes it difficult to present visual stimuli in a clean way. So it’s unfortunate that the Pi suffers from it, at least in my test configuration.
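One rough way to tell whether v-sync is working, independently of any particular toolkit, is to look at the intervals between successive buffer flips: with v-sync, flips are locked to the refresh cycle and the intervals cluster tightly around one refresh duration; without it, flips return whenever they please. Here is a minimal sketch of that logic; the timestamps are simulated rather than taken from real hardware, so treat it as an illustration of the idea, not a ready-made test.

```python
# Sketch: inferring v-sync from a series of buffer-flip timestamps.
# The timestamps below are simulated, not measured on a real display.
import random

REFRESH_MS = 1000 / 60  # one refresh duration on a 60 Hz monitor (~16.7 ms)

def looks_vsynced(flip_times_ms, tolerance_ms=2.0):
    """Return True if successive flip intervals cluster around the
    refresh duration, as they should when v-sync is working."""
    intervals = [b - a for a, b in zip(flip_times_ms, flip_times_ms[1:])]
    return all(abs(i - REFRESH_MS) < tolerance_ms for i in intervals)

# Simulated flips locked to the refresh cycle, with a little jitter:
synced = [i * REFRESH_MS + random.uniform(-0.5, 0.5) for i in range(100)]
# Simulated flips without v-sync: each flip returns almost immediately.
unsynced = [i * 1.0 for i in range(100)]

print(looks_vsynced(synced))    # True
print(looks_vsynced(unsynced))  # False
```

In a real test you would collect the timestamps by repeatedly flipping the display buffer and recording the clock after each flip; the classification logic stays the same.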
I ran a number of benchmarks, using the photodiode in my Boks (the OpenSesame response device that is currently under development). By presenting a white display and recording the response time of the photodiode to it, you can estimate how accurate the display timestamps are; or, phrased more simply, how well OpenSesame knows when things appear on the monitor. The idea is that the photodiode responds immediately, so if the response time is, say, 10 ms, this means that the display appeared only 10 ms after it was timestamped. On good systems, this kind of delay doesn’t occur, and the error is in the order of a millisecond.
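The arithmetic behind the benchmark is simple enough to write down. The only subtlety is that the refresh cycle itself needs some time to reach the photodiode’s position on the screen, so that travel time has to be subtracted before the response time can be read as a timestamping error. A small sketch (the function name and the numbers are mine, purely for illustration):

```python
# Sketch of the benchmark arithmetic: the photodiode responds (near)
# instantly, so its response time to a white display, minus the time the
# refresh cycle needs to reach the diode's position, estimates how late
# the display appeared relative to its timestamp. Illustrative numbers only.

def timestamp_error_ms(response_time_ms, scanout_delay_ms=0.0):
    """Estimated timestamping error: photodiode response time minus the
    scanout delay to the diode's position (0 ms at the top-left,
    ~16.7 ms at the bottom-right of a 60 Hz monitor)."""
    return response_time_ms - scanout_delay_ms

# A diode at the top-left that responds after 10 ms implies the display
# appeared 10 ms after it was timestamped:
print(timestamp_error_ms(10.0))                          # 10.0
# A diode at the bottom-right that responds after exactly one refresh
# duration implies a perfectly timestamped display:
print(timestamp_error_ms(16.7, scanout_delay_ms=16.7))   # 0.0
```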
There is a photodiode integrated in the Boks (on the bottom, not visible), which I used to benchmark the Pi.
So how about the Pi?
I ran two series of tests (see References below for a Figshare link to the data): one while holding the photodiode to the top-left of the monitor, another while holding it to the bottom-right. Because monitors are refreshed from the top down, as explained above, the ideal response time is 0 ms when the photodiode is held to the top-left of the monitor, and 16.7 ms (one refresh duration on my monitor) when it is held to the bottom-right. Both should show extremely little variation. In contrast, when v-sync is completely lacking, you should get a mean response time of half a refresh duration (8.3 ms on my monitor), irrespective of where the photodiode is held, and a fairly large standard deviation.
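The v-sync-less prediction follows from treating display onsets as uniformly distributed over one refresh cycle; the mean then sits at half a refresh duration, and the standard deviation at one refresh duration divided by the square root of 12 (roughly 4.8 ms on a 60 Hz monitor). A quick simulation sketch, with made-up samples rather than benchmark data:

```python
# Sketch: what the photodiode should see when v-sync is completely absent.
# Without v-sync, a new display can appear at any point in the refresh
# cycle, so response times are roughly uniform over one refresh duration.
import random
import statistics

random.seed(0)
REFRESH_MS = 1000 / 60  # ~16.7 ms on a 60 Hz monitor

samples = [random.uniform(0, REFRESH_MS) for _ in range(100_000)]

mean_rt = statistics.mean(samples)   # theory: half a refresh, ~8.3 ms
sd_rt = statistics.pstdev(samples)   # theory: REFRESH_MS / sqrt(12), ~4.8 ms
print(f"mean {mean_rt:.2f} ms, sd {sd_rt:.2f} ms")
```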
The actual results, which you can see below, are somewhere in between. Apparently, the Pi tries to get v-sync straight, but doesn’t always succeed. The reported response times are biased away from the ideal (indicated by the colored lines) towards the 8.3 ms v-sync-less mean.
The results of 40 tests with 50 measurements each. I did multiple small tests to see whether there were systematic differences from one test to another, as I initially suspected; on closer inspection, this doesn’t appear to be the case. I also ran tests with double-buffering both on and off, which doesn’t appear to make a difference either. Error bars indicate standard deviations.
When we look at the measurement distribution, we see something interesting: there is a huge peak of low response times (in the 0–1 ms range) when the photodiode is held to the top-left of the screen. When the photodiode is held to the bottom-right, there is no such peak, which should have emerged around 16–17 ms. This is odd, because you would expect the distributions to be shifted relative to one another, but similar in shape. I think this means that the Pi sometimes timestamps the displays too late, which is somewhat atypical (usually the error is such that display timestamps are too early). Because negative response times are not possible in this test, these errors are all reported as very low response times. There’s also a hint of bi-modality in the distribution for the top-left condition (blue), but not in the bottom-right condition (orange). I’m not sure what this means.
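This clipping effect is easy to reproduce in a simulation. If the same timestamping errors (some positive, some negative) apply at both photodiode positions, the negative errors get floored at zero only where the ideal response time is already 0 ms, producing a peak near zero at the top-left but not at the bottom-right. A sketch with simulated numbers only (the error distribution below is an assumption, not measured data):

```python
# Sketch: why too-late timestamps pile up as very low response times.
# If a display is timestamped *after* it actually appeared, the true
# response time would be negative; since negative times are impossible,
# they are reported as (near-)zero instead. Simulated numbers only.
import random

random.seed(1)

def reported_rt(timestamp_error_ms, scanout_delay_ms):
    """Response time as the benchmark would report it: the scanout delay
    to the diode's position plus the timestamping error, floored at zero."""
    return max(0.0, scanout_delay_ms + timestamp_error_ms)

# Timestamping errors centred on zero; negative = timestamped too late:
errors = [random.gauss(0, 5) for _ in range(10_000)]

top_left = [reported_rt(e, 0.0) for e in errors]       # ideal RT: 0 ms
bottom_right = [reported_rt(e, 16.7) for e in errors]  # ideal RT: 16.7 ms

# At the top-left, late timestamps are clipped into a peak near 0 ms;
# at the bottom-right, the same errors stay well above 0, so no peak:
peak_tl = sum(rt < 1.0 for rt in top_left) / len(top_left)
peak_br = sum(rt < 1.0 for rt in bottom_right) / len(bottom_right)
print(round(peak_tl, 2), round(peak_br, 4))
```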
The distribution of photodiode response times when the diode is held to the top-left (blue, M = 5.85 ms) and the bottom-right (orange, M = 12.46 ms) of the monitor.
So the temporal precision that the Pi offers in this configuration is modest. However, I should stress that I didn’t observe any notable outliers, and the temporal jitter is generally less than one refresh duration. Contrary to what most experimental psychologists think, this precision is sufficient for manual response time studies (see e.g. Damian, 2010). The fact that v-sync doesn’t work properly is more of an issue, although in many situations it’s probably acceptable.
My own opinion: If I were to set up a low-budget lab, I wouldn’t have any problem with running my behavioral studies on a Pi, since this type of temporal jitter has a negligible impact on statistical power. For things that require a higher degree of temporal precision, such as ERP and gaze-contingent studies, I would probably invest in a pricier, more accurate system.
Is Android the future?
Because the Pi runs on an ARM processor, just like most tablets and phones, I was hoping to install Android on it. Unfortunately, according to the Razdroid project, the best Pi-compatible Android build is ‘very slow, barely usable’. This didn’t inspire much confidence, so I haven’t explored this option further.
This is a shame, because now that the OpenSesame runtime for Android is available, running experiments on Android is a serious option. Jarik, a technician from the VU University, conducted some benchmarks on the Nexus 7, which suggested – somewhat to my surprise – that the temporal precision of the OpenSesame runtime for Android is in the millisecond range, on par with desktop computers. Pending further testing, and assuming that these benchmarks generalize to other Android devices, I think that Android-based devices might be(come) viable tools for conducting experiments. And I’m not just talking about tablet-based studies: traditional, cubicle-style experiments could also be done with Android devices. The benefit is obvious: Android devices are cheap, easy to maintain for an IT department, and don’t come with the usual licensing restrictions. One device that comes to mind is the MK802IIIS Android USB dongle, which you can get for about €80.
The future. It’s exciting!
Damian, M. F. (2010). Does variability in human performance outweigh imprecision in response devices such as computer keyboards? Behavior Research Methods, 42(1), 205-211. doi:10.3758/BRM.42.1.205
Mathôt, S. (2013). Raspberry Pi benchmark data. http://dx.doi.org/10.6084/m9.figshare.663652