# Model metrics

Evaluation results for the stacked ensemble and its base learners on the held-out test set, along with the class distribution of the dataset they were trained on.
## Ensemble (stacked) — headline

- Weighted F1: 0.6211
- Macro F1: 0.2688
- Accuracy: 70.85%
## Per-model comparison
| Model | Weighted F1 | Macro F1 | Accuracy |
|---|---|---|---|
| Ensemble (Stacked) ★ | 0.6211 | 0.2688 | 70.85% |
| Random Forest | 0.6329 | 0.3019 | 66.30% |
| Gradient Boosting | 0.6244 | 0.2748 | 70.20% |
| XGBoost | 0.5869 | 0.2902 | 55.20% |
## Class distribution
| Class | Owner range | Tier | Samples |
|---|---|---|---|
| 0 | ≤10K | Common Indie | 4,500 |
| 1 | 35K | Niche | 2,200 |
| 2 | 75K | Growing | 1,500 |
| 3 | 150K | Established | 1,000 |
| 4 | 350K | Popular | 504 |
| 5 | ≥750K | Breakout Hit | 296 |
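The split above is heavily imbalanced, which helps explain the gap between weighted F1 (dominated by the large classes) and macro F1 (dragged down by the rare ones). A quick check of the proportions, using the sample counts copied from the table:

```python
# Sample counts per class, copied from the class-distribution table above.
counts = {0: 4500, 1: 2200, 2: 1500, 3: 1000, 4: 504, 5: 296}

total = sum(counts.values())
for cls, n in counts.items():
    print(f"class {cls}: {n / total:.1%}")

# Class 0 alone covers 45% of the data, so support-weighted metrics are
# dominated by classes 0-2, while macro F1 weighs classes 4 and 5 equally.
print(f"imbalance ratio (largest/smallest): {counts[0] / counts[5]:.1f}x")
```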
## Metric definitions

- **Weighted F1:** Average of per-class F1 scores (each the harmonic mean of precision and recall), weighted by class support. Ranges 0–1; higher is better.
- **Macro F1:** Unweighted average of per-class F1 scores, so every class counts equally. Useful for evaluating performance on minority classes.
- **Accuracy:** Percentage of correctly classified games. A simple overall metric, though it can be inflated by the majority class.
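The three definitions above can be computed from scratch in a few lines. A dependency-free sketch (the labels at the bottom are illustrative, not the project's actual predictions):

```python
from collections import Counter

def per_class_f1(y_true, y_pred, labels):
    """Return {label: F1}, where F1 is the harmonic mean of precision and recall."""
    f1 = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1[c] = 2 * tp / denom if denom else 0.0
    return f1

def summarize(y_true, y_pred):
    labels = sorted(set(y_true))
    f1 = per_class_f1(y_true, y_pred, labels)
    support = Counter(y_true)
    n = len(y_true)
    weighted = sum(f1[c] * support[c] / n for c in labels)  # support-weighted mean
    macro = sum(f1.values()) / len(labels)                  # unweighted mean
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return weighted, macro, accuracy

# Illustrative labels only -- not the project's actual predictions.
y_true = [0, 0, 0, 0, 1, 1, 2, 5]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
print(summarize(y_true, y_pred))
```

Note how a single all-wrong minority class (class 5 here) pulls macro F1 down sharply while barely moving the weighted score, which mirrors the ensemble's 0.62 vs 0.27 split above.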
## About the ensemble

The stacked model combines Random Forest, Gradient Boosting, and XGBoost via an XGBoost meta-learner, leveraging the complementary strengths of each base model.