Kaggle previously ran a competition called Predicting a Biological Response. It has ended, but you can still join it for practice. I’ve written some random forest code that reaches a rank of around 169 on the final leaderboard.

What is interesting is that the PCA-based approach yields extremely poor results (rank 6xx), and I have not yet understood why. Could it be that PCA is unable to cope with non-linearity?
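
Besides non-linearity, one other thing might be worth ruling out: PCA is unsupervised, so it keeps the directions with the most variance whether or not they say anything about the label. The sketch below is only a toy illustration of that effect, not a diagnosis of the competition data. The two synthetic features, the helper names (`first_pc_scores`, `threshold_accuracy`) and the mean-threshold "classifier" are all assumptions made up for the example, and it uses only the Python standard library rather than the code in the repository.

```python
import math
import random

random.seed(0)

# Toy data (an assumption, NOT the competition data):
# x1 is high-variance noise, x2 is low-variance but determines the label.
labels = [random.randint(0, 1) for _ in range(1000)]
x1 = [random.gauss(0.0, 10.0) for _ in labels]
x2 = [random.gauss(1.0 if y else -1.0, 0.3) for y in labels]


def mean(values):
    return sum(values) / len(values)


def first_pc_scores(a, b):
    """Project 2-D points (a[i], b[i]) onto their first principal component."""
    ma, mb = mean(a), mean(b)
    saa = mean([(u - ma) ** 2 for u in a])
    sbb = mean([(v - mb) ** 2 for v in b])
    sab = mean([(u - ma) * (v - mb) for u, v in zip(a, b)])
    # Closed-form angle of the major axis of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sab, saa - sbb)
    return [(u - ma) * math.cos(theta) + (v - mb) * math.sin(theta)
            for u, v in zip(a, b)]


def threshold_accuracy(feature, y):
    """Accuracy of splitting a single feature at its mean (either orientation)."""
    cut = mean(feature)
    hits = sum((f > cut) == bool(label) for f, label in zip(feature, y))
    acc = hits / len(y)
    return max(acc, 1.0 - acc)


acc_raw = threshold_accuracy(x2, labels)
acc_pca = threshold_accuracy(first_pc_scores(x1, x2), labels)
print(f"informative raw feature:   {acc_raw:.3f}")
print(f"first principal component: {acc_pca:.3f}")
```

In this toy setup the first principal component is almost entirely the noisy feature, so a model that sees only that component loses nearly all of the signal. Whether something similar happens on the real data would need checking, for example by comparing scores as the number of retained components is varied.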

https://github.com/log0/predicting_a_biological_response