Performance raises to the level of its training. Data is not exempt from this rule.
When running data experiments, split your data into three sets:
- Training set
- Cross-validation set
- Test set
You train the model with the first set. You cross-validate with the second set. You test with the third.
A split of 80%/10%/10% is a good place to start.
Credit: Will Henry