ERP Boot Camp Tip: General Hints for Processing Data
EEG/ERP data are noisy and complicated, and it's easy to make mistakes or miss problems. Here are some hints for avoiding common problems that arise in EEG/ERP data collection and processing.
Start by running one subject and then doing a fairly complete analysis of that subject's data. You will likely find some kind of problem (e.g., a problem with the event codes) that you need to fix before you run any more subjects. Check that the number of trials in each condition exactly matches what you expect. Also, make sure you check the behavioral data, not just the ERPs. If you collect data from multiple subjects before doing a complete analysis, there's about a 50% chance that you will find a problem that requires you to throw out all of the data that you've collected, which will make you very sad. Do not skip this step!
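For concreteness, here is one way a trial-count check might look once you move to a scripted environment. This is only a sketch, assuming an MNE-Python-style workflow; the file name, event codes, and expected counts are hypothetical placeholders, not something prescribed by this tip.

# Sketch of a single-subject sanity check on event counts
# (hypothetical file name, event codes, and expected counts).
from collections import Counter

import mne

EXPECTED = {"standard": 400, "target": 100}
EVENT_ID = {"standard": 1, "target": 2}

raw = mne.io.read_raw_fif("sub-01_raw.fif", preload=True)
events = mne.find_events(raw)  # assumes event codes on a stim channel

# Count how many times each condition's event code actually occurred.
code_counts = Counter(events[:, 2])
for cond, code in EVENT_ID.items():
    observed = code_counts.get(code, 0)
    status = "OK" if observed == EXPECTED[cond] else "MISMATCH"
    print(f"{cond}: expected {EXPECTED[cond]}, found {observed} -> {status}")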
Once you verify that everything in your task, data collection procedures, and analysis scripts is working correctly, you can start collecting data from multiple additional subjects. However, you should do a preliminary analysis of each subject's data within 48 hours of collecting the data (i.e., up to and including the point of plotting the averaged ERP waveforms). This allows you to detect a problem (e.g., a malfunctioning electrode) before you collect data from a bunch of subjects with the same problem. This is especially important if you are not the one collecting the data and are therefore not present to notice problems during the actual recording session.
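A preliminary look of this kind can be very quick. The sketch below (again assuming MNE-Python, with hypothetical file names, event codes, and epoch window) simply epochs the data and plots the averaged waveforms; a dead or wildly noisy electrode usually jumps out of a butterfly plot immediately.

# Sketch of a quick preliminary average for one subject
# (hypothetical file name, event codes, and epoch window).
import mne

raw = mne.io.read_raw_fif("sub-01_raw.fif", preload=True)
events = mne.find_events(raw)

epochs = mne.Epochs(
    raw, events, event_id={"standard": 1, "target": 2},
    tmin=-0.2, tmax=0.8, baseline=(None, 0), preload=True,
)

# Averaged ERP waveforms as butterfly plots; a flat or wildly noisy
# electrode usually stands out immediately.
for cond in ("standard", "target"):
    epochs[cond].average().plot()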
The first time you process the data from a given subject, don't do it with a script! Instead, process the data "by hand" (using a GUI) so that you can make sure that everything is OK with the subject's data. There are many things that can go wrong, and this is your chance to find problems. The most important things to look at are: the raw EEG before any processing, the EEG data after artifact detection, the time course and scalp distribution of any ICA components being excluded, the number of trials rejected in each condition, and the averaged ERP waveforms. We recommend that you set artifact rejection parameters individually for each subject, because different people can have very different artifacts. One size does not fit all. (In a between-subjects design, the person setting the parameters should be blind to group membership to avoid biasing the results.) These parameters can then be saved in an Excel file for future use and for reporting in journal articles.
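If you keep those per-subject parameters in a spreadsheet, writing it out can be as simple as the following pandas sketch. The subject IDs, column names, and threshold values here are all hypothetical; use whatever layout matches your own recordkeeping.

# Sketch of recording per-subject artifact rejection parameters
# (hypothetical subjects and thresholds, in microvolts).
import pandas as pd

params = pd.DataFrame(
    [
        {"subject": "sub-01", "eeg_reject_uV": 100, "eog_reject_uV": 70},
        {"subject": "sub-02", "eeg_reject_uV": 120, "eog_reject_uV": 80},
    ]
)

# Writing .xlsx files requires the openpyxl package.
params.to_excel("artifact_params.xlsx", index=False)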
If you need to re-analyze your data (e.g., with a different epoch length), it's much faster to do this with a script. Your script can read in any subject-specific parameters from the Excel file. Also, it's easy to make a mistake when you do the initial analysis "by hand," so re-analyzing everyone with a script prior to statistical analysis is a good idea. However, it is easy to make mistakes in scripting as well, so it's important to check the results of every step of processing in your script for accuracy. It can also be helpful, especially if you are new to scripting, to have another researcher look through your data processing procedures to check for errors.
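A batch reanalysis in that spirit might look like the sketch below, which reads per-subject rejection thresholds from the hypothetical spreadsheet above and prints the trial counts so that each step can still be checked. MNE-Python is assumed; the file names, event codes, and epoch window are placeholders, not part of any particular lab's pipeline.

# Sketch of a batch reanalysis loop driven by per-subject parameters
# stored in a spreadsheet (all names and values are placeholders).
import mne
import pandas as pd

EVENT_ID = {"standard": 1, "target": 2}

params = pd.read_excel("artifact_params.xlsx")

for row in params.itertuples():
    raw = mne.io.read_raw_fif(f"{row.subject}_raw.fif", preload=True)
    events = mne.find_events(raw)

    # Apply this subject's own rejection threshold (stored in microvolts,
    # converted to volts, which is what MNE expects).
    epochs = mne.Epochs(
        raw, events, event_id=EVENT_ID,
        tmin=-0.2, tmax=0.8, baseline=(None, 0),
        reject=dict(eeg=row.eeg_reject_uV * 1e-6),
        preload=True,
    )

    # Print trial counts per condition so every step can still be checked,
    # then save the averaged waveforms for later inspection.
    for cond in EVENT_ID:
        print(f"{row.subject} {cond}: {len(epochs[cond])} trials kept")
        epochs[cond].average().save(f"{row.subject}_{cond}-ave.fif", overwrite=True)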
Bottom line: Scripts are extremely useful for reanalyzing data, but they should not be used for the initial analysis. Also, don't just borrow someone else's script and apply it to your data. If you don't fully understand every step of a script (including the rationale for the parameters), don't use the script.