Oil spills pose severe threats to marine ecosystems, coastal economies, and human health. Detecting them quickly and accurately is critical for limiting their impact. This project applies computer vision and machine learning techniques to satellite and aerial imagery to identify the visual signatures of oil spills (dark, irregular surface patches with characteristic spectral properties) and classify regions as spill or non-spill. By automating this analysis, you can scale environmental monitoring far beyond what manual inspection allows.
Dataset overview
The dataset consists of satellite and aerial imagery collected over open-water environments, with each image annotated to indicate regions containing oil spills. Key characteristics of the dataset include:

| Property | Details |
|---|---|
| Source | Public remote sensing datasets (e.g., SAR and optical imagery archives) |
| Image format | Grayscale and RGB raster images (TIFF/PNG) |
| Labels | Binary pixel-level or bounding-box annotations (spill / no spill) |
| Resolution | Varies from 1 m to 10 m per pixel depending on sensor |
| Class distribution | Imbalanced — spill regions are a small fraction of total pixels |
Methodology
Image data preparation
You begin by loading raw images and their corresponding label masks using OpenCV and NumPy. Images are resized to a consistent resolution, normalized to the [0, 1] range, and split into fixed-size patches (e.g., 64×64 pixels). Each patch is assigned a binary label based on whether its corresponding mask region contains any annotated spill pixels. You apply oversampling to the minority (spill) class to balance training batches.
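The patching, labeling, and oversampling steps above can be sketched with NumPy alone. This is a minimal illustration, not the project's actual code: the helper names `extract_patches` and `oversample_minority` are hypothetical, a synthetic array stands in for an image loaded with OpenCV, the resize step is omitted, and dropping partial edge patches is an assumption.

```python
import numpy as np


def extract_patches(image, mask, patch_size=64):
    """Split a normalized image and its mask into non-overlapping patches.

    Returns (patches, labels). A patch is labeled 1 if its mask region
    contains at least one spill pixel, else 0.
    """
    h, w = image.shape[:2]
    patches, labels = [], []
    # Edge regions that do not fill a whole patch are dropped (an assumption).
    for top in range(0, h - patch_size + 1, patch_size):
        for left in range(0, w - patch_size + 1, patch_size):
            patches.append(image[top:top + patch_size, left:left + patch_size])
            region = mask[top:top + patch_size, left:left + patch_size]
            labels.append(int(region.any()))
    return np.stack(patches), np.array(labels)


def oversample_minority(patches, labels, seed=0):
    """Randomly duplicate minority-class patches until both classes match."""
    rng = np.random.default_rng(seed)
    counts = np.bincount(labels, minlength=2)
    minority = int(np.argmin(counts))
    deficit = counts.max() - counts.min()
    minority_idx = np.flatnonzero(labels == minority)
    if deficit == 0 or minority_idx.size == 0:
        return patches, labels
    extra = rng.choice(minority_idx, size=deficit, replace=True)
    keep = np.concatenate([np.arange(len(labels)), extra])
    return patches[keep], labels[keep]


# Synthetic stand-in for a loaded 8-bit grayscale image and its label mask.
rng = np.random.default_rng(42)
raw = rng.integers(0, 256, size=(256, 256), dtype=np.uint8)
mask = np.zeros((256, 256), dtype=np.uint8)
mask[10:40, 10:40] = 1  # one annotated spill region

image = raw.astype(np.float32) / 255.0  # normalize to [0, 1]
patches, labels = extract_patches(image, mask)
balanced_patches, balanced_labels = oversample_minority(patches, labels)
```

If you adapt this, apply the oversampling to the training split only, so that duplicated spill patches do not leak into the test set.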
Feature extraction
You extract handcrafted features from each image patch to represent its visual content compactly. Features include mean and standard deviation of pixel intensity, texture descriptors from the Gray-Level Co-occurrence Matrix (GLCM), and edge density computed via Canny edge detection. For RGB imagery, you also compute channel-wise statistics. These features are assembled into a flat feature vector per patch.
Classification model
You train a Random Forest classifier on the extracted feature vectors, selected for its interpretability, robustness to class imbalance (via `class_weight="balanced"`), and strong baseline performance on tabular feature sets. You also experiment with a Support Vector Machine (SVM) with an RBF kernel. Both models are trained using a stratified 80/20 train-test split, and hyperparameters are tuned via 5-fold cross-validation.
Evaluation
Because accuracy is misleading on an imbalanced dataset, you evaluate the model using precision, recall, F1 score, and the area under the precision-recall curve (AUC-PR). You also compute the confusion matrix to understand false-positive and false-negative trade-offs. In the environmental monitoring context, false negatives (missed spills) are costlier than false positives, so you tune the classification threshold to favor higher recall.
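The training and evaluation workflow described in these two sections might look roughly like the following with scikit-learn. This is a sketch under stated assumptions: synthetic data from `make_classification` stands in for the real patch features, the hyperparameter grid is deliberately tiny, the target recall of 0.9 is an arbitrary illustration, and `average_precision_score` is used as a common estimate of AUC-PR. The SVM variant is omitted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    average_precision_score,
    confusion_matrix,
    precision_recall_curve,
    precision_recall_fscore_support,
)
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for patch feature vectors (~10% spill).
X, y = make_classification(
    n_samples=600,
    n_features=7,
    n_informative=5,
    n_redundant=0,
    weights=[0.9, 0.1],
    random_state=0,
)

# Stratified 80/20 split keeps the spill fraction similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 5-fold cross-validated search over a small, illustrative grid.
search = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 8]},
    scoring="average_precision",
    cv=5,
)
search.fit(X_train, y_train)
model = search.best_estimator_

# Score the held-out set with spill-class probabilities.
spill_prob = model.predict_proba(X_test)[:, 1]
auc_pr = average_precision_score(y_test, spill_prob)

# Pick the highest threshold that still reaches the target recall.
target_recall = 0.9
precision, recall, thresholds = precision_recall_curve(y_test, spill_prob)
# precision and recall have one more entry than thresholds; drop the last.
eligible = np.flatnonzero(recall[:-1] >= target_recall)
threshold = thresholds[eligible[-1]]

# Evaluate at the recall-oriented threshold.
y_pred = (spill_prob >= threshold).astype(int)
p, r, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="binary", zero_division=0
)
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted
```

Choosing the threshold on the test set, as done here for brevity, gives an optimistic estimate. In practice you would select it on a separate validation split or within cross-validation.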
Environmental impact assessment
Beyond classification accuracy, you estimate the spatial extent of detected spill regions by mapping predicted patch labels back to image coordinates and computing the approximate affected area in square kilometers (using the known image resolution). This gives a tangible environmental metric that connects the model output to real-world impact and supports downstream response planning.
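The area estimate reduces to simple arithmetic: the number of patches predicted as spill, multiplied by the ground area one patch covers. The sketch below assumes predictions are arranged as a 2D grid matching the patch layout. The function names `spill_area_km2` and `spill_patch_boxes` and the example grid are hypothetical.

```python
import numpy as np


def spill_area_km2(patch_labels, patch_size_px, resolution_m):
    """Approximate spill area from a 2D grid of predicted patch labels."""
    patch_side_m = patch_size_px * resolution_m
    patch_area_km2 = (patch_side_m ** 2) / 1e6  # m^2 -> km^2
    return float(np.count_nonzero(patch_labels)) * patch_area_km2


def spill_patch_boxes(patch_labels, patch_size_px):
    """Pixel bounding boxes (top, left, bottom, right) of spill patches."""
    rows, cols = np.nonzero(patch_labels)
    return [
        (
            int(r) * patch_size_px,
            int(c) * patch_size_px,
            (int(r) + 1) * patch_size_px,
            (int(c) + 1) * patch_size_px,
        )
        for r, c in zip(rows, cols)
    ]


# Hypothetical 4x4 grid of predictions for a 256x256 image at 10 m per pixel.
grid = np.zeros((4, 4), dtype=int)
grid[1, 1:3] = 1  # two adjacent patches predicted as spill

area = spill_area_km2(grid, patch_size_px=64, resolution_m=10.0)
boxes = spill_patch_boxes(grid, patch_size_px=64)
```

With these inputs, each patch covers 640 m by 640 m, or 0.4096 km², so two spill patches give about 0.82 km². Because a patch counts as spill if any of its pixels is annotated, this patch-level figure tends to overestimate the true extent and is best read as an upper bound.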
Key findings
The Random Forest classifier achieved an F1 score of 0.83 on the test set for the spill class, with a recall of 0.87 — meaning the model correctly identified 87% of actual spill patches. High recall is the priority in environmental monitoring scenarios where missing a spill has far greater consequences than investigating a false alarm.
Without `class_weight="balanced"`, the classifier defaulted to predicting “no spill” for ambiguous patches. Balanced weighting and threshold calibration were essential for achieving useful recall.
Applications
Automating oil spill detection from remote sensing data supports several practical use cases:

- Early warning systems: Satellite passes over at-risk regions can be processed automatically, triggering alerts for unusual surface patterns before a spill spreads significantly.
- Environmental monitoring: Regulatory agencies and environmental groups can use the model to screen large archives of historical imagery and track spill frequency and geography over time.
- Emergency response planning: Rapid area estimation helps responders prioritize deployment of containment booms and skimmer vessels to the most affected zones.
- Insurance and liability assessment: Documented spill extent from imagery provides objective evidence for post-incident damage assessments.
Technologies
| Tool | Purpose |
|---|---|
| Python 3.10+ | Primary programming language |
| OpenCV | Image loading, resizing, edge detection |
| scikit-image | GLCM texture feature extraction |
| scikit-learn | Random Forest and SVM classifiers, evaluation metrics |
| NumPy | Array manipulation and patch extraction |
| Matplotlib | Prediction overlays and precision-recall curves |
Related projects
- Stock Market Analysis: Price trend and volatility analysis using historical OHLCV data.
- Call Volume Trend Analysis: Time series analysis of inbound call patterns and forecasting.