ERP Boot Camp Tip: Comparing conditions with different numbers of trials

A common question in ERP research is whether it is legitimate to compare conditions in which different numbers of trials were averaged together (e.g., error trials versus correct trials in an ERN study; oddballs versus standards in an oddball or MMN study).  It turns out that the answer depends on how you're measuring the ERP components.  In a nutshell: if you're measuring mean amplitude, then it's not a problem to compare conditions with different numbers of trials; if you are measuring peak amplitude, then it is a problem.

An extended discussion of this issue can be found in this document. Here, we provide a brief summary.

The figure below shows a clean ERP waveform and the same ERP waveform with noise added. Note that the peak amplitude is higher in the noisy waveform.  This exemplifies a general principle: All else being equal, the peak voltage will be greater in a noisier waveform than in a cleaner waveform.  The reason is that the peak measure picks out the single time point where the signal plus noise happens to be largest, so positive noise deviations inflate the measured peak while negative deviations are simply ignored.  This is why it is not legitimate to compare waveforms with different numbers of trials (and therefore different noise levels) when using peak amplitude.  The usual solution to this problem is to create an averaged ERP waveform using a subsample of trials from the condition with more trials, equating the number of trials in the averages.  However, it is almost always better to stop using peak amplitude and instead use mean amplitude to quantify the amplitude of the component (see Chapter 9 in An Introduction to the Event-Related Potential Technique for a list of reasons why mean amplitude is almost always superior to peak amplitude).

[Figure: a clean ERP waveform and the same waveform with noise added; the peak amplitude is higher in the noisy waveform.]
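
You can see this bias in a minimal simulation sketch (the Gaussian waveform shape, 5 µV true amplitude, 1 µV noise level, and 4-ms sampling used here are illustrative assumptions, not values taken from the figure): adding zero-mean noise and re-measuring the peak yields a value that is reliably larger than the true peak.

```python
# Sketch: peak amplitude is biased upward by noise (illustrative parameters only).
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(0, 800, 4)                                   # time points in ms (4-ms samples)
clean = 5.0 * np.exp(-((t - 400) ** 2) / (2 * 80 ** 2))    # clean waveform, true peak = 5 µV

n_sims = 10_000
peaks = np.empty(n_sims)
for i in range(n_sims):
    noisy = clean + rng.normal(0.0, 1.0, size=t.size)      # add zero-mean noise (1 µV SD)
    peaks[i] = noisy.max()                                  # peak amplitude of the noisy waveform

print(f"True peak amplitude:                 {clean.max():.2f} µV")
print(f"Mean measured peak across waveforms: {peaks.mean():.2f} µV")  # reliably > 5 µV
```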

Mean amplitude (e.g., the average voltage between 300 and 500 ms) is not biased by the noise level.  That is, the mean amplitude will be more variable if the data are noisier, but it is not consistently pushed toward a larger value.  So, you might have more subject-to-subject variability in a condition with fewer trials, but most statistical techniques are robust to modest differences in variance, and this variability will not induce an artificial difference in means between your conditions.  There is no need to subsample from the condition with more trials when you are using mean amplitude.  You are just throwing away statistical power if you do this.
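
A companion sketch, under the same illustrative assumptions as above, shows this: the mean amplitude from 300 to 500 ms becomes more variable when noise is added, but its average across simulated waveforms stays at the clean value rather than being pushed upward.

```python
# Sketch: mean amplitude gains variance from noise but no systematic bias.
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(0, 800, 4)
clean = 5.0 * np.exp(-((t - 400) ** 2) / (2 * 80 ** 2))
window = (t >= 300) & (t <= 500)                            # 300-500 ms measurement window

n_sims = 10_000
means = np.array([
    (clean + rng.normal(0.0, 1.0, size=t.size))[window].mean()
    for _ in range(n_sims)
])

print(f"Clean mean amplitude (300-500 ms): {clean[window].mean():.2f} µV")
print(f"Average of noisy mean amplitudes:  {means.mean():.2f} µV")   # essentially the same
print(f"SD across simulated waveforms:     {means.std():.2f} µV")    # variability, not bias
```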

Bottom line: In almost every case, the best way to deal with the "problem" of different numbers of trials per condition is to do nothing at all, except make sure you're using mean amplitude to quantify the amplitude.