Ojala & Garriga (2010), "Permutation tests for studying classifier performance," Journal of Machine Learning Research.
Al-Rawi (2012), "On using permutation tests to estimate the classification significance of functional magnetic resonance imaging data."

This is quite a simplified simulation; I think it would be prudent to check the null distributions and relabeling schemes in actual use, since non-random label samplings may turn up.
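One way to do such a check, sketched in Python under assumed names (the null accuracies here are a toy stand-in drawn around chance; in practice they would come from the classifier run on relabeled data): verify the null distribution is centered near chance and compute the usual permutation p-value.

```python
import random, statistics

def permutation_p(true_acc, null_accs):
    # One-sided p-value: how often the null meets or beats the true accuracy,
    # counting the true labeling itself (the standard +1 correction).
    ge = sum(a >= true_acc for a in null_accs)
    return (ge + 1) / (len(null_accs) + 1)

rng = random.Random(0)
# Toy null: 1500 permuted-label accuracies centered on chance (0.5); real
# values would be the accuracies of the relabeled datasets.
null = [rng.gauss(0.5, 0.07) for _ in range(1500)]

print("null mean:", round(statistics.fmean(null), 2))  # a centered null sits near 0.5
print("p for accuracy 0.695:", permutation_p(0.695, null))
```

A skewed null (e.g. from only mostly-matching relabelings) would show up here as a mean well above 0.5.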
Using the same relabeling for each cross-validation fold was simply for convenience when coding, but it restricts the number of possible permutations: another example of how the same idea can be implemented multiple ways.

This is a moderately difficult classification: the average accuracy of the true-labeled data (i.e. not permuted) was 0.695, ranging from 0.55 to 0.775 over the 10 repetitions. The accuracy of each dataset is given in the plot titles, and by a reddish dotted line.

For each permutation I recorded both the accuracy and the proportion of labels matching between that permutation and the real labels. When both training and testing labels are permuted, this is an average over the two cross-validation folds. I plotted the classification accuracy of each permutation against the proportion of labels matching in the permutation and calculated the correlation. Histograms of each variable appear along the axes. These graphs are complicated, but enlarge if you click on them.

There is a strong linear relationship between the number of labels changed in a permutation and its accuracy when either the training or the testing set labels alone are shuffled: the more the relabeling resembles the true data labels, the better the accuracy. When all labels are permuted there isn't much of a relationship. Despite the strong correlation, the null distributions resulting from each permutation scheme are quite similar (density plot overlap graph). This makes sense, since the relabelings are chosen at random, so relabelings quite similar and quite dissimilar to the true labeling are both included. The null distributions would be skewed if the labels for the permutations were not chosen at random (e.g. centered above chance if only mostly-matching relabelings were used).

Comments: Intuitively, I prefer the permute-both scheme: more permutations are possible, and the strong correlation is absent. But since the resulting null distributions are so similar, I can't say that permuting just one set of labels or the other is really worse, much less invalid.
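The bookkeeping behind those scatterplots can be sketched as follows. This is a hedged Python sketch, not the post's R code; the "accuracies" are a noisy synthetic stand-in that tracks label agreement (as observed for the train-only and test-only schemes), just to exercise the proportion-matching and correlation calculations.

```python
import random, statistics

def prop_matching(perm, truth):
    # Proportion of labels agreeing between a relabeling and the true labels.
    return sum(p == t for p, t in zip(perm, truth)) / len(truth)

def pearson(xs, ys):
    # Pearson correlation, written out with the standard library only.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

rng = random.Random(0)
truth = ["a"] * 10 + ["b"] * 10  # one run's true labels

props, accs = [], []
for _ in range(200):
    perm = truth[:]
    rng.shuffle(perm)
    p = prop_matching(perm, truth)
    props.append(p)
    # Synthetic stand-in: accuracy that follows label agreement plus noise.
    accs.append(p + rng.gauss(0, 0.05))

print("correlation:", round(pearson(props, accs), 2))
```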
I made the data by sampling from a normal distribution for each class: standard deviation 1, mean 0.15 for one class and -0.15 for the other, 50 voxels. I used R; email me for a copy of the code. I classified with a linear SVM, c=1, partitioning on the runs (so 2-fold cross-validation). I ran the simulation 10 times (10 different datasets), with the same dataset used for each permutation scheme. 1500 label permutations of each sort (training-only, testing-only, both) were run, chosen at random from all those possible.

I coded it up such that the same relabeling was used for each of the cross-validation folds when only the training or testing data labels were permuted (e.g. the classifier was trained on the 1st run of the real data, then the permuted label scheme was applied to the 2nd run and the classifier tested; then the classifier was trained on the 2nd run of the real data, and the SAME permuted label scheme was applied to the 1st run and the classifier tested).
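A rough sketch of that structure, in Python rather than the R actually used, and with a simple nearest-centroid classifier substituted for the linear SVM (so exact accuracies will differ). The data generation (Gaussian, sd 1, means ±0.15, 50 voxels) and the reuse of one relabeling across both folds follow the description above; all function names are mine.

```python
import random, statistics

N_VOX = 50  # voxels, as in the post
rng = random.Random(0)

def make_run(n_per_class=10):
    # One run: examples x voxels; class means +0.15 / -0.15, sd 1.
    data, labels = [], []
    for mean, lab in ((0.15, "a"), (-0.15, "b")):
        for _ in range(n_per_class):
            data.append([rng.gauss(mean, 1.0) for _ in range(N_VOX)])
            labels.append(lab)
    return data, labels

def centroid(rows):
    return [statistics.fmean(col) for col in zip(*rows)]

def classify(train_x, train_y, test_x):
    # Nearest-centroid stand-in for the linear SVM.
    cents = {lab: centroid([x for x, y in zip(train_x, train_y) if y == lab])
             for lab in set(train_y)}
    def dist2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return [min(cents, key=lambda lab: dist2(x, cents[lab])) for x in test_x]

def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

(x1, y1), (x2, y2) = make_run(), make_run()

# One relabeling, reused in BOTH cross-validation folds (test-only scheme):
perm = y1[:]
rng.shuffle(perm)
fold1 = accuracy(classify(x1, y1, x2), perm)  # train run 1, score run 2 against permuted labels
fold2 = accuracy(classify(x2, y2, x1), perm)  # train run 2, score run 1 against permuted labels
print("null accuracy for this relabeling:", (fold1 + fold2) / 2)
```

Repeating the last block over many random relabelings (and the analogous train-only and both variants) builds the null distributions.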
Which labels should be permuted for a permutation test of a single person's classification accuracy? A quick look found examples of MVPA methods papers using all three possibilities: relabel the training set only (Pereira 2011), relabel the testing set only (Al-Rawi 2012), or relabel both training and testing sets (Stelzer 2012). Note that I'm considering class label permutations only, "Test 1" in the parlance of Ojala 2010. This reminds me of the situation with searchlight shape: many different implementations of the same idea are possible, and we really need to be more specific when we report results, since papers often don't specify the scheme they used.

Which permutation scheme is best? As usual, I doubt there is a single, all-purpose answer. I put together this little simulation to explore one part of the effect of the choice: what do the null distributions look like under each label permutation scheme? The short answer is that the null distributions look quite similar (normal and centered on chance), but there is a strong relationship between the proportion of labels permuted and accuracy when only the test or training set labels are permuted, not when both are permuted.

These simulations use a simple MVPA-like dataset for one person: two classes (chance is 0.5), 10 examples of each class in each run, two runs, and full balance (the same number of trials of each class in each run, no missings).
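The three schemes can be made concrete with a minimal sketch. This is Python rather than the R used for the actual simulation, and the function names are hypothetical; only the two-run, 10-per-class label layout comes from the post.

```python
import random

def make_labels(n_per_class=10):
    # Labels for one run: 10 examples of each of two classes.
    return ["a"] * n_per_class + ["b"] * n_per_class

def relabel(labels, rng):
    # A random permutation of the label vector.
    shuffled = labels[:]
    rng.shuffle(shuffled)
    return shuffled

def fold_labels(train, test, scheme, rng):
    # Apply one of the three permutation schemes to a single CV fold.
    if scheme == "train_only":
        return relabel(train, rng), test[:]
    if scheme == "test_only":
        return train[:], relabel(test, rng)
    if scheme == "both":
        return relabel(train, rng), relabel(test, rng)
    raise ValueError(scheme)

rng = random.Random(0)
run1, run2 = make_labels(), make_labels()
for scheme in ("train_only", "test_only", "both"):
    tr, te = fold_labels(run1, run2, scheme, rng)
    # Every scheme preserves the class counts, so the permuted sets stay balanced.
    print(scheme, sorted(tr) == sorted(run1), sorted(te) == sorted(run2))
```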