Two years ago I wrote a post demonstrating Python pre-processing of EEG data using Python chunks in an RMarkdown document. This worked great. But something I mentioned then was that there was a package called reticulate that would allow more direct interfacing with Python in R. That package has been under a lot of development since then, as has RStudio. Plots produced in Python chunks can now be embedded in RMarkdown.
Event-related potentials are one of the simplest ways of representing event-locked EEG data. Imagine a very simple visual experiment in which participants have to respond to pictures of objects. You set up the experiment so that an event trigger is sent to your EEG recording system at the time the picture flashes up on screen. Then, afterwards, you extract epochs around those triggers to get a bunch of separate epochs, each time-locked to the onset of the stimulus.
In the last post, I showed how you can get the EEG data from EEGLAB .set files saved as Matlab v7.3 files, but that there are some limitations on what else you can get from them beyond the data itself. Specifically, you can’t extract channel locations, and there are no labels to tell you which channels the data is from. This is due to a limitation of the available tools for reading HDF5 files, which is the actual format of Matlab v7.
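As a rough illustration of the epoching idea, here is a minimal base R sketch. Everything in it is made up for the example: the `signal` vector, the `trigger_samples` positions, and the `pre`/`post` window sizes are assumptions, not values from any real recording or from the original post.

```r
# Toy continuous "recording": 1000 samples from a single channel.
# Each value is just its own sample index, so the epochs are easy to check.
signal <- seq_len(1000)

# Hypothetical sample indices at which the stimulus trigger arrived
trigger_samples <- c(200, 500, 800)

# Epoch window: 10 samples before to 20 samples after each trigger (assumed)
pre <- 10
post <- 20
offsets <- -pre:post

# One column per epoch, one row per time point relative to the trigger
epochs <- sapply(trigger_samples, function(trig) signal[trig + offsets])

dim(epochs)

# Averaging across epochs at each time point gives a (toy) ERP
erp <- rowMeans(epochs)
```

Each column of `epochs` is time-locked to its trigger, so row `pre + 1` always holds the sample at stimulus onset.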
Like a lot of people, I’ve been using EEGLAB and Fieldtrip for years and have a lot of data already processed using those packages. It can be a bit annoying getting the data from them - in the past I’ve converted the data to text/csv files, which is ok as far as it goes. It’s a bit of a faff getting them in the right format, and EEGLAB’s in-built export function drops useful info like epoch numbers and event codes etc.
As mentioned in my last post, I’ve been working on a package for EEG analysis in R called eegUtils. I’d mostly been focusing on relatively simple visualization tools: topographical plots - ERP Visualization: Creating topographical scalp maps: part 1. But one thing was really bugging me - how the data gets into R in the first place. Sure, it’s nice pre-processing data in other packages - EEGLAB or MNE-Python - and then transferring the processed data across to R.
A few months ago I wrote a post about how there isn’t really a killer EEG analysis package for R, and that many of the things you typically want to do are not really implemented yet. So I’ve started to implement several functions myself and incorporate them into my own package, currently called eegUtils. I’ll maybe come up with a catchier name at some point before I get to the stage of trying to get it on CRAN.
In the last post I loaded in some data from a BDF file and showed how to re-reference and high-pass filter it. Normally when running an EEG experiment we’ll have periods of interest that we want to extract from the recording, typically using triggers sent via a parallel cable or similar. Trigger events A Biosemi system can record up to 16 trigger inputs, representing the resulting numbers as 16-bit (i.e. from 0 to 65535, coded in binary format).
At my recent workshop, several people asked about pre-processing of data in R, and I told them it wasn’t really possible at the moment. What I might have been less clear about is that it is not that it is not possible, but that there isn’t really the wide range of pre-programmed tools as in Python and Matlab. So it’s just not practical to do it all in R as it stands.
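To make the 16-bit coding concrete, here is a small base R sketch that works out which trigger inputs are active for a given trigger value. The value `5` is a made-up example, and numbering the inputs from 1 (lowest bit) to 16 is my assumption for illustration rather than anything specific to the Biosemi file format.

```r
# Hypothetical trigger value read from the status channel (made up)
trigger_value <- 5L

# Bit masks for the 16 trigger inputs: 1, 2, 4, ..., 32768
bit_masks <- as.integer(2^(0:15))

# An input is active when its bit is set in the trigger value
lines_high <- bitwAnd(trigger_value, bit_masks) != 0

# Which inputs (1-16) are active?
active_lines <- which(lines_high)
active_lines

# With all 16 inputs active, the value is the 16-bit maximum
max_value <- sum(bit_masks)
max_value
```

Here 5 is 101 in binary, so inputs 1 and 3 are the active ones.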
As mentioned in my last post, an issue doing EEG analysis in R at the moment is that there’s a distinct lack of tools in R for a lot of the typical processing steps. In the past I’ve done a lot of processing in Matlab (specifically with EEGLAB and Fieldtrip) and shifted things over to R for statistics. But all is not lost. For example, with the following code, I can run a bunch of preprocessing, including automatic artefact rejection, and have nice ERPs in R in the blink of an eye!
An issue doing EEG analysis in R at the moment is that the tools just don’t exist to do a lot of the typical processing steps. It’s an extraordinarily complicated thing to produce working packages that cover even a few of the possible ways to analyse EEG data. The makers of tools like EEGLAB, Fieldtrip, and MNE have been doing it a long time, and not on their own. Essentially, there just isn’t a big community of EEG R users to develop and support dedicated packages at the moment.
In my previous post on plotting topographies in R, ERP Visualization: Creating topographical scalp maps: part 1, I was aiming for maximum comparability with EEGLAB defaults. That meant I used the ‘jet’ colour map, which is what I’m most used to using. You may have noticed that there was no default jet colour map - I had to define one manually. While jet produces nice, punchy looking images, there are a heap of problems associated with it.
As well as ERPs or time-frequency plots from individual channels, it’s always useful to see topographical maps of our data. It’s a nice way to see what’s going on across the whole head, showing us whether effects are broadly or narrowly distributed across the whole scalp. So now I’m going to show you how to do topographical plots in R. I want to first of all thank alexforrance and Harold Cavendish over on Stack Overflow for being the source of much of the code I’ve adapted here.
In an earlier post I took a look at visualizing ERPs from two conditions at a single electrode. This time I’m going to look at three conditions. As in the previous post, I’ll assume a basic familiarity with ERPs. First I’ll load in the full dataset, which contains ERPs for all conditions for all subjects, and whip it into shape. library(ggplot2) library(tidyverse) library(afex) library(Rmisc) library(magrittr) levCatGAall <- read_csv( "https://raw.githubusercontent.com/craddm/ExploringERPs/master/levCatGAall.csv", col_names = c("Object.
Shiny app updated! In my last post I unleashed the Shiny app I’d knocked up in a few hours to do some basic display of different confidence interval types and difference waves. I’ve been hacking away at it on and off and I’ve now added some exciting new features! You can now try loading up your own data. You’ll need a .csv file with the following structure: no header; comma-separated values; each row should be one time-point for one subject; columns should be “condition1”, “condition2”, “Time”, “Subject”. Here’s the first few lines of the example data I include (note this is already after import, so it’s stripped the commas between values).
Shiny app In an unusual fit of enthusiasm, I decided to have a go at writing a little app in Shiny, a simple programming framework to make web-based apps using R. So, as usual, it was all programmed using RStudio, whose developers also make Shiny and various fantastic R packages such as dplyr and ggplot2. It turned out to be pretty simple. I’m planning to add various additional functions as I get time to work on my blog posts, like allowing people to use their own data, for example.
As I mentioned in a previous post, between-subject confidence intervals/standard errors are not necessarily all that useful when your data is within-subjects. What you’re interested in is not really the between-subject variability but the variability of the differences between your conditions within subjects. I’m going to use here the command summarySEwithin from the package Rmisc. This removes between-subject variability for within-subject variables, returning corrected standard deviations, standard errors, and confidence intervals.
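To show roughly what removing between-subject variability means, here is a base R sketch of the general normalization idea: subtract each subject's mean and add back the grand mean, then apply a correction for the number of within-subject conditions. This is a hand-rolled illustration on made-up numbers, not a copy of what summarySEwithin does internally, and the data frame and column names are assumptions.

```r
# Toy data (made up): 3 subjects, 2 within-subject conditions
erp <- data.frame(
  subject   = rep(1:3, each = 2),
  condition = rep(c("A", "B"), times = 3),
  amplitude = c(1, 2, 4, 6, 7, 8)
)

# Remove between-subject variability:
# subtract each subject's own mean, then add back the grand mean
subject_mean <- ave(erp$amplitude, erp$subject)
erp$normed <- erp$amplitude - subject_mean + mean(erp$amplitude)

# Standard deviation per condition, before and after normalization
raw_sd  <- tapply(erp$amplitude, erp$condition, sd)
norm_sd <- tapply(erp$normed, erp$condition, sd)

# Correction factor for the number of within-subject conditions
n_cond <- length(unique(erp$condition))
corrected_sd <- norm_sd * sqrt(n_cond / (n_cond - 1))

# Standard error from the corrected SD
n_subj <- length(unique(erp$subject))
corrected_se <- corrected_sd / sqrt(n_subj)
```

In this toy example the subjects differ a lot in overall amplitude but show similar condition differences, so the normalized standard deviations come out much smaller than the raw ones.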
Running statistical tests using “purrr” Something which puzzled me for a while was how to efficiently perform running (i.e. timepoint-by-timepoint) statistical tests on ERP/EEG in R. That was solved for me when I discovered the purrr package, another of ggplot2 author Hadley Wickham’s projects. Using the split command, you can easily split a data frame into multiple frames by one of its variables. In the EEG/ERP case, that means I can easily split the data into separate data frames for each timepoint and run my test of choice on each timepoint independently using the map command.
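Here is a minimal sketch of the split-then-test pattern on simulated data. To keep it free of package dependencies it uses base R's `lapply` in place of purrr's `map`; the two play the same role here. The data frame, its column names, and the choice of a paired t-test are all assumptions for illustration.

```r
set.seed(1)

# Simulated data (made up): 10 subjects, 3 time points, 2 conditions
erp <- data.frame(
  Time    = rep(c(0, 4, 8), each = 10),
  Subject = rep(1:10, times = 3),
  cond1   = rnorm(30),
  cond2   = rnorm(30, mean = 0.5)
)

# Split into one data frame per time point
by_time <- split(erp, erp$Time)

# Run a paired t-test at each time point independently
# (purrr::map(by_time, ...) would slot in here in place of lapply)
tests <- lapply(by_time, function(d) {
  t.test(d$cond1, d$cond2, paired = TRUE)
})

# Pull out one p-value per time point
p_values <- vapply(tests, function(tt) tt$p.value, numeric(1))
p_values
```

The result is a named vector with one p-value per time point, which can then be plotted alongside the ERP or corrected for multiple comparisons.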
ERP visualization is harder than people think. Often people take the path of least resistance, plotting grand average ERP data as simple traces representing condition means, with no information regarding variability around these means. There are a couple of variations on this simple theme which show regions of significance, but it’s extremely rare to show anything else. A new editorial letter by Rousselet, Foxe, and Bolam in the European Journal of Neuroscience offers some useful guidelines, and Ana Todorovic’s recent post on adding scatterplots to time-series data is also great.