Stagewise Reset MLP Ensemble

A practical Kaggle experiment using staged neuron resets and freezing to build diverse MLP ensembles for tabular prediction tasks.

6/13/20262 min read

Stagewise Reset MLP Ensemble

This is a short overview of the idea behind my Stagewise Reset MLP Ensemble. I will keep the post brief, since the full implementation is available in the Kaggle notebooks linked below.

I have been experimenting with a lightweight MLP ensemble approach for tabular prediction tasks using staged neuron resets and freezing.

The idea is to train an MLP in several stages: freeze part of the hidden representation, reset the remaining neurons, and let the network relearn around what it has already captured. I then train several of these models and average them as an ensemble.

The ensemble diversity comes from more than just different random initializations. Each model can use different seeds, learning rates, dropout values, and freeze/reset plans, which makes the individual MLPs follow slightly different training paths.

There are several small variants of this idea. For example, each stage can be validated separately, and if a reset/freeze step does not improve the validation score, the model can roll back to the best previous checkpoint. You can also vary how many neurons are frozen, which layers are reset, how long each stage trains, and whether to add a short full-network finetune at the end.

I have tried this method on many different datasets and have seen strong results, especially as a practical way to get more out of simple MLP ensembles on tabular problems.

My intuition is that the gains are not only from standard ensemble variance reduction. The stagewise reset/freeze process also seems to make the individual models stronger.

One way to think about it is as a small boosting-like mechanism inside the neural network: keep part of what the model has learned, reset another part, and force it to focus on patterns it may have missed before. That could help reduce bias, while the final ensemble still adds robustness through averaging.

I also tested a related Parent-Child MLP Ensemble variant, where child models inherit parts of a trained parent network while resetting and retraining the remaining neurons. I think this inheritance idea opens up some interesting directions, since it gives another way to reuse useful representations while still creating diverse child models.

Stagewise Reset MLP Ensemble: https://www.kaggle.com/code/ern711/parent-child-mlp-ensemble
Parent-Child MLP Ensemble: https://www.kaggle.com/code/ern711/parent-child-mlp-ensemble