Decoding the Machine: Feature Importance in Google Earth Engine Classifiers
In complex geospatial machine learning, knowing that a model works is rarely enough; we need to know why. When using popular supervised classifiers like Random Forest (smileRandomForest) in Google Earth Engine (GEE), the "Feature Importance" metric provides a window into the model's decision-making process. It quantifies the contribution of each input variable, be it NDVI, elevation, or a specific spectral band, toward reducing uncertainty during classification. As we move into 2026, with multi-sensor data fusion now the norm, importance scores offer a practical route to dimensionality reduction, letting analysts strip away redundant data and focus on the signals that truly define land cover patterns.
Table of Contents
- Purpose: Beyond the Black Box
- The Methodology: Mean Decrease in Impurity
- Step-by-Step: Extracting Importance Scores in GEE
- Use Case: Sentinel-2 Band Selection for Crop Mapping
- Best Results: Improving Model Generalization
- FAQ
- Disclaimer
Purpose: Beyond the Black Box
Calculating feature importance in GEE serves three critical technical goals:
- Model Interpretability: Validating that the model is using logically sound physical variables (e.g., Water Vapor bands shouldn't be the top predictor for urban sprawl).
- Feature Selection: Removing "noisy" or highly correlated bands to prevent overfitting and reduce computational costs.
- Scientific Insight: Identifying which bio-physical properties (like the Red Edge in vegetation) are the strongest discriminators for specific classes.
The Methodology: Mean Decrease in Impurity
The primary method used by the Earth Engine smileRandomForest and smileGradientTreeBoost is the Gini Importance (also known as Mean Decrease in Impurity).
As each decision tree in the forest is built, the algorithm looks for the variable that best splits the training data into homogeneous groups. Every time a specific feature is used to split a node, the "impurity" (Gini index) of that node decreases. The classifier sums these decreases across all trees in the forest and normalizes them. The higher the score, the more that specific feature helped in creating "pure" classification categories.
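The mechanics described above can be illustrated outside of Earth Engine. The plain-JavaScript sketch below computes the Gini impurity of a set of class labels and the weighted impurity decrease produced by one split; the labels and the split shown are invented for the example.

```javascript
// Conceptual sketch (plain JavaScript, not Earth Engine code):
// Gini impurity of a label set, and the decrease from one node split.

// Gini impurity: 1 - sum(p_k^2) over the class proportions p_k.
function gini(labels) {
  var counts = {};
  labels.forEach(function (l) { counts[l] = (counts[l] || 0) + 1; });
  var sumSq = 0;
  for (var k in counts) {
    var p = counts[k] / labels.length;
    sumSq += p * p;
  }
  return 1 - sumSq;
}

// Weighted impurity decrease when a parent node splits into two children.
function impurityDecrease(parent, left, right) {
  var wLeft = left.length / parent.length;
  var wRight = right.length / parent.length;
  return gini(parent) - (wLeft * gini(left) + wRight * gini(right));
}

// A perfectly separating split removes all impurity.
var parent = ['corn', 'corn', 'soy', 'soy'];
console.log(gini(parent));                                               // 0.5
console.log(impurityDecrease(parent, ['corn', 'corn'], ['soy', 'soy'])); // 0.5
```

A feature's importance score is, in essence, these decreases summed over every node where the feature was chosen, across all trees, then normalized.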
Step-by-Step: Extracting Importance Scores in GEE
1. Train the Classifier
First, initialize your Random Forest model. Use a tree-based classifier such as ee.Classifier.smileRandomForest so that the trained model exposes importance scores through explain().
// Train a 100-tree Random Forest on labeled training points.
var classifier = ee.Classifier.smileRandomForest(100)
.train({
features: trainingData,
classProperty: 'landcover',
inputProperties: ['B2', 'B3', 'B4', 'B8', 'NDVI', 'elevation']
});
2. Call the explain() Function
Earth Engine stores the metadata of the trained model in a dictionary accessible via the explain() method.
var dict = classifier.explain();
print('Model Explanation:', dict);
3. Extract and Plot Importance
The importance values are stored under the key importance. You can convert this to an ee.FeatureCollection to create a visual chart.
// Pull the importance dictionary out of the explain() output.
var variableImportance = ee.Dictionary(dict.get('importance'));
var chart = ui.Chart.array.values({
array: variableImportance.values(),
axis: 0,
xLabels: variableImportance.keys()
}).setChartType('ColumnChart')
.setOptions({
title: 'Random Forest Feature Importance',
vAxis: {title: 'Importance Score'},
hAxis: {title: 'Spectral Bands / Indices'}
});
print(chart);
Use Case: Sentinel-2 Band Selection for Crop Mapping
An analyst is trying to distinguish between Soybeans and Corn in a diverse agricultural landscape.
- The Challenge: Using all 12 Sentinel-2 bands plus 5 indices makes the script run slowly and causes memory errors.
- The Action: The analyst runs a preliminary classification on a small sample and checks variableImportance.
- The Result: They discover that B11 (SWIR) and B8A (Narrow NIR) have scores 10x higher than the visible Blue band (B2).
- The Solution: They retrain the model using only the top 5 features, resulting in a 40% faster execution time with no loss in Overall Accuracy.
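Once the importance dictionary has been printed and inspected, picking the "top 5" is a simple sort. A plain-JavaScript sketch of that step (the scores below are illustrative placeholders, not real Sentinel-2 results):

```javascript
// Return the k feature names with the highest importance scores.
function topFeatures(importance, k) {
  return Object.keys(importance)
    .sort(function (a, b) { return importance[b] - importance[a]; })
    .slice(0, k);
}

// Hypothetical scores copied from a printed importance dictionary.
var importance = { B2: 1.2, B3: 1.5, B4: 2.0, B8: 9.8, B8A: 14.1, B11: 15.6, NDVI: 8.7 };
console.log(topFeatures(importance, 5)); // ['B11', 'B8A', 'B8', 'NDVI', 'B4']
```

In Earth Engine, the reduced list would then be passed as inputProperties when retraining the classifier.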
Best Results: Improving Model Generalization
| Feature Score | Interpretation | Recommended Action |
|---|---|---|
| High Score | Primary Driver | Keep and prioritize for future multi-temporal stacks. |
| Near-Zero Score | Noise / Redundant | Remove to simplify the model and prevent overfitting. |
| Uniform Scores | Feature Correlation | High correlation between inputs; consider PCA or removing one. |
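The triage in the table can be sketched as a small client-side helper that sorts features into "keep" and "drop" buckets by their share of total importance. The cutoff and scores here are illustrative assumptions, not Earth Engine output:

```javascript
// Split features into keep/drop lists by normalized importance share.
function triage(importance, cutoff) {
  var total = Object.keys(importance).reduce(function (s, k) { return s + importance[k]; }, 0);
  var keep = [], drop = [];
  Object.keys(importance).forEach(function (k) {
    (importance[k] / total >= cutoff ? keep : drop).push(k);
  });
  return { keep: keep, drop: drop };
}

// Features contributing under 10% of total importance are flagged for removal.
var result = triage({ B11: 40, B8A: 35, NDVI: 20, B2: 5 }, 0.1);
console.log(result.keep); // ['B11', 'B8A', 'NDVI']
console.log(result.drop); // ['B2']
```

The cutoff is a judgment call; correlated features split their importance, so verify that a "drop" candidate is not simply sharing credit with a near-duplicate band before discarding it.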
FAQ
Does a high importance score mean high accuracy?
Not necessarily. Importance scores only tell you which variables the model used most effectively. If your training data is biased, the model might highly value a feature that is actually a proxy for that bias rather than a real-world signal.
Can I use this for the CART classifier?
Yes, ee.Classifier.smileCart also supports the explain() method, but the results are based on a single tree rather than an ensemble, making the scores less robust than Random Forest.
Why do my importance scores change every time I run the script?
Random Forest is stochastic—it uses random subsets of data. Unless you set a seed in the classifier arguments, the scores will fluctuate slightly with each run.
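In Earth Engine, the fix is to pass the seed parameter, e.g. ee.Classifier.smileRandomForest({numberOfTrees: 100, seed: 42}). The reproducibility idea itself can be shown with a toy seeded generator in plain JavaScript; this linear congruential generator is purely illustrative and is not the RNG Earth Engine uses internally:

```javascript
// A tiny seeded PRNG (linear congruential generator) for illustration only.
function makeRng(seed) {
  var state = seed;
  return function () {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Draw `count` pseudo-random row indices out of n, driven by the seed.
function sampleIndices(n, count, seed) {
  var rng = makeRng(seed);
  var picks = [];
  for (var i = 0; i < count; i++) {
    picks.push(Math.floor(rng() * n));
  }
  return picks;
}

// Same seed, same "random" subset, so importance scores stop fluctuating.
var runA = sampleIndices(1000, 5, 42);
var runB = sampleIndices(1000, 5, 42);
console.log(JSON.stringify(runA) === JSON.stringify(runB)); // true
```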
Disclaimer
Gini Importance can be biased toward continuous variables or features with many unique values (high cardinality). In cases where features have vastly different scales or types, consider validating importance through "Permutation Importance" outside of GEE for a more rigorous statistical check.
March 2026.
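Such a permutation check can be prototyped client-side. The plain-JavaScript sketch below shuffles one feature column and reports the resulting accuracy drop; the threshold "model" and the data are hypothetical stand-ins for a trained classifier:

```javascript
// Fraction of rows where the model's prediction matches the label.
function accuracy(predict, rows, labels) {
  var correct = 0;
  for (var i = 0; i < rows.length; i++) {
    if (predict(rows[i]) === labels[i]) correct++;
  }
  return correct / rows.length;
}

// Permutation importance: accuracy before minus accuracy after
// reassigning one feature's values in a shuffled order.
function permutationImportance(predict, rows, labels, feature, shuffledOrder) {
  var base = accuracy(predict, rows, labels);
  var permuted = rows.map(function (row, i) {
    var copy = Object.assign({}, row);
    copy[feature] = rows[shuffledOrder[i]][feature];
    return copy;
  });
  return base - accuracy(predict, permuted, labels);
}

// Toy stand-in model: classify as 'water' when NDVI is below 0.
var predict = function (row) { return row.NDVI < 0 ? 'water' : 'land'; };
var rows = [{NDVI: -0.4}, {NDVI: -0.2}, {NDVI: 0.3}, {NDVI: 0.6}];
var labels = ['water', 'water', 'land', 'land'];

// A fixed reversal stands in for a random shuffle to keep this deterministic.
console.log(permutationImportance(predict, rows, labels, 'NDVI', [3, 2, 1, 0])); // 1
```

In practice the shuffle is repeated many times with random orders and the drops averaged; a large average drop means the model genuinely relies on that feature.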
Tags: GoogleEarthEngine, Random_Forest, Feature_Importance, Machine_Learning_GIS