Decoding the Machine: Feature Importance in Google Earth Engine Classifiers
In complex geospatial machine learning, knowing that a model works is rarely enough; we need to know why. When using popular supervised classifiers like Random Forest (smileRandomForest) in Google Earth Engine (GEE), the "Feature Importance" metric provides a window into the model's decision-making process. It quantifies the contribution of each input variable, be it NDVI, elevation, or a specific spectral band, toward reducing uncertainty during classification. As we move into 2026, with multi-sensor data fusion now the norm, importance scores offer a practical route to dimensionality reduction, letting analysts strip away redundant data and focus on the signals that truly define land cover patterns.
Table of Contents
- Purpose: Beyond the Black Box
- The Methodology: Mean Decrease in Impurity
- Step-by-Step: Extracting Importance Scores in GEE
- Use Case: Sentinel-2 Band Selection for Crop Mapping
- Best Results: Improving Model Generalization
- FAQ
- Disclaimer
Purpose: Beyond the Black Box
Calculating feature importance in GEE serves three critical technical goals:
- Model Interpretability: Validating that the model is using logically sound physical variables (e.g., Water Vapor bands shouldn't be the top predictor for urban sprawl).
- Feature Selection: Removing "noisy" or highly correlated bands to prevent overfitting and reduce computational costs.
- Scientific Insight: Identifying which bio-physical properties (like the Red Edge in vegetation) are the strongest discriminators for specific classes.
The Methodology: Mean Decrease in Impurity
The primary method used by the Earth Engine smileRandomForest and smileGradientTreeBoost is the Gini Importance (also known as Mean Decrease in Impurity).
As each decision tree in the forest is built, the algorithm looks for the variable that best splits the training data into homogeneous groups. Every time a specific feature is used to split a node, the "impurity" (Gini index) of that node decreases. The classifier sums these decreases across all trees in the forest and normalizes them. The higher the score, the more that specific feature helped in creating "pure" classification categories.
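The mechanics described above can be illustrated outside of Earth Engine. The plain-JavaScript sketch below computes the Gini impurity of a set of class labels and the weighted impurity decrease produced by one split; the labels and the split shown are invented for the example.

```javascript
// Conceptual sketch (plain JavaScript, not Earth Engine code):
// Gini impurity of a label set, and the decrease from one node split.

// Gini impurity: 1 - sum(p_k^2) over the class proportions p_k.
function gini(labels) {
  var counts = {};
  labels.forEach(function (l) { counts[l] = (counts[l] || 0) + 1; });
  var sumSq = 0;
  for (var k in counts) {
    var p = counts[k] / labels.length;
    sumSq += p * p;
  }
  return 1 - sumSq;
}

// Weighted impurity decrease when a parent node splits into two children.
function impurityDecrease(parent, left, right) {
  var wLeft = left.length / parent.length;
  var wRight = right.length / parent.length;
  return gini(parent) - (wLeft * gini(left) + wRight * gini(right));
}

// A perfectly separating split removes all impurity.
var parent = ['corn', 'corn', 'soy', 'soy'];
console.log(gini(parent));                                               // 0.5
console.log(impurityDecrease(parent, ['corn', 'corn'], ['soy', 'soy'])); // 0.5
```

A feature's importance score is, in essence, these decreases summed over every node where the feature was chosen, across all trees, then normalized.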
Step-by-Step: Extracting Importance Scores in GEE
1. Train the Classifier
First, initialize your Random Forest model. Use a tree-based classifier such as ee.Classifier.smileRandomForest so that the trained model exposes importance scores through explain().
// Train a 100-tree Random Forest on labeled training points.
var classifier = ee.Classifier.smileRandomForest(100)
.train({
features: trainingData,
classProperty: 'landcover',
inputProperties: ['B2', 'B3', 'B4', 'B8', 'NDVI', 'elevation']
});
2. Call the explain() Function
Earth Engine stores the metadata of the trained model in a dictionary accessible via the explain() method.
var dict = classifier.explain();
print('Model Explanation:', dict);
3. Extract and Plot Importance
The importance values are stored under the key importance. You can convert this to an ee.FeatureCollection to create a visual chart.
// Pull the importance dictionary out of the explain() output.
var variableImportance = ee.Dictionary(dict.get('importance'));
var chart = ui.Chart.array.values({
array: variableImportance.values(),
axis: 0,
xLabels: variableImportance.keys()
}).setChartType('ColumnChart')
.setOptions({
title: 'Random Forest Feature Importance',
vAxis: {title: 'Importance Score'},
hAxis: {title: 'Spectral Bands / Indices'}
});
print(chart);
Use Case: Sentinel-2 Band Selection for Crop Mapping
An analyst is trying to distinguish between Soybeans and Corn in a diverse agricultural landscape.
- The Challenge: Using all 12 Sentinel-2 bands plus 5 indices makes the script run slowly and causes memory errors.
- The Action: The analyst runs a preliminary classification on a small sample and checks variableImportance.
- The Result: They discover that B11 (SWIR) and B8A (Narrow NIR) have scores 10x higher than the visible Blue band (B2).
- The Solution: They retrain the model using only the top 5 features, resulting in a 40% faster execution time with no loss in Overall Accuracy.
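Once the importance dictionary has been printed and inspected, picking the "top 5" is a simple sort. A plain-JavaScript sketch of that step (the scores below are illustrative placeholders, not real Sentinel-2 results):

```javascript
// Return the k feature names with the highest importance scores.
function topFeatures(importance, k) {
  return Object.keys(importance)
    .sort(function (a, b) { return importance[b] - importance[a]; })
    .slice(0, k);
}

// Hypothetical scores copied from a printed importance dictionary.
var importance = { B2: 1.2, B3: 1.5, B4: 2.0, B8: 9.8, B8A: 14.1, B11: 15.6, NDVI: 8.7 };
console.log(topFeatures(importance, 5)); // ['B11', 'B8A', 'B8', 'NDVI', 'B4']
```

In Earth Engine, the reduced list would then be passed as inputProperties when retraining the classifier.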
Best Results: Improving Model Generalization
| Feature Score | Interpretation | Recommended Action |
|---|---|---|
| High Score | Primary Driver | Keep and prioritize for future multi-temporal stacks. |
| Near-Zero Score | Noise / Redundant | Remove to simplify the model and prevent overfitting. |
| Uniform Scores | Feature Correlation | High correlation between inputs; consider PCA or removing one. |
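The triage in the table can be sketched as a small client-side helper that sorts features into "keep" and "drop" buckets by their share of total importance. The cutoff and scores here are illustrative assumptions, not Earth Engine output:

```javascript
// Split features into keep/drop lists by normalized importance share.
function triage(importance, cutoff) {
  var total = Object.keys(importance).reduce(function (s, k) { return s + importance[k]; }, 0);
  var keep = [], drop = [];
  Object.keys(importance).forEach(function (k) {
    (importance[k] / total >= cutoff ? keep : drop).push(k);
  });
  return { keep: keep, drop: drop };
}

// Features contributing under 10% of total importance are flagged for removal.
var result = triage({ B11: 40, B8A: 35, NDVI: 20, B2: 5 }, 0.1);
console.log(result.keep); // ['B11', 'B8A', 'NDVI']
console.log(result.drop); // ['B2']
```

The cutoff is a judgment call; correlated features split their importance, so verify that a "drop" candidate is not simply sharing credit with a near-duplicate band before discarding it.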
FAQ
Does a high importance score mean high accuracy?
Not necessarily. Importance scores only tell you which variables the model used most effectively. If your training data is biased, the model might highly value a feature that is actually a proxy for that bias rather than a real-world signal.
Can I use this for the CART classifier?
Yes, ee.Classifier.smileCart also supports the explain() method, but the results are based on a single tree rather than an ensemble, making the scores less robust than Random Forest.
Why do my importance scores change every time I run the script?
Random Forest is stochastic—it uses random subsets of data. Unless you set a seed in the classifier arguments, the scores will fluctuate slightly with each run.
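In Earth Engine, the fix is to pass the seed parameter, e.g. ee.Classifier.smileRandomForest({numberOfTrees: 100, seed: 42}). The reproducibility idea itself can be shown with a toy seeded generator in plain JavaScript; this linear congruential generator is purely illustrative and is not the RNG Earth Engine uses internally:

```javascript
// A tiny seeded PRNG (linear congruential generator) for illustration only.
function makeRng(seed) {
  var state = seed;
  return function () {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Draw `count` pseudo-random row indices out of n, driven by the seed.
function sampleIndices(n, count, seed) {
  var rng = makeRng(seed);
  var picks = [];
  for (var i = 0; i < count; i++) {
    picks.push(Math.floor(rng() * n));
  }
  return picks;
}

// Same seed, same "random" subset, so importance scores stop fluctuating.
var runA = sampleIndices(1000, 5, 42);
var runB = sampleIndices(1000, 5, 42);
console.log(JSON.stringify(runA) === JSON.stringify(runB)); // true
```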
Disclaimer
Gini Importance can be biased toward continuous variables or features with many unique values (high cardinality). In cases where features have vastly different scales or types, consider validating importance through "Permutation Importance" outside of GEE for a more rigorous statistical check.
March 2026.
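Such a permutation check can be prototyped client-side. The plain-JavaScript sketch below shuffles one feature column and reports the resulting accuracy drop; the threshold "model" and the data are hypothetical stand-ins for a trained classifier:

```javascript
// Fraction of rows where the model's prediction matches the label.
function accuracy(predict, rows, labels) {
  var correct = 0;
  for (var i = 0; i < rows.length; i++) {
    if (predict(rows[i]) === labels[i]) correct++;
  }
  return correct / rows.length;
}

// Permutation importance: accuracy before minus accuracy after
// reassigning one feature's values in a shuffled order.
function permutationImportance(predict, rows, labels, feature, shuffledOrder) {
  var base = accuracy(predict, rows, labels);
  var permuted = rows.map(function (row, i) {
    var copy = Object.assign({}, row);
    copy[feature] = rows[shuffledOrder[i]][feature];
    return copy;
  });
  return base - accuracy(predict, permuted, labels);
}

// Toy stand-in model: classify as 'water' when NDVI is below 0.
var predict = function (row) { return row.NDVI < 0 ? 'water' : 'land'; };
var rows = [{NDVI: -0.4}, {NDVI: -0.2}, {NDVI: 0.3}, {NDVI: 0.6}];
var labels = ['water', 'water', 'land', 'land'];

// A fixed reversal stands in for a random shuffle to keep this deterministic.
console.log(permutationImportance(predict, rows, labels, 'NDVI', [3, 2, 1, 0])); // 1
```

In practice the shuffle is repeated many times with random orders and the drops averaged; a large average drop means the model genuinely relies on that feature.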
Tags: GoogleEarthEngine, Random_Forest, Feature_Importance, Machine_Learning_GIS