AI-Based Methods for Surface Logging Data Quality

🚀 Why Data Quality in Surface Logging Is Critical
Surface logging sensors generate a high-frequency, multivariate stream of signals: rotary speed, weight on bit (WOB), rate of penetration (ROP), torque, mud flow, and more. These signals drive real-time drilling decisions, but noisy or corrupted data can mislead operations and even compromise safety. Traditional threshold checks or manual monitoring often fail to capture subtle multivariate inconsistencies, making automated anomaly detection a necessity.
🧩 A Hybrid Pipeline That Works
The solution we implemented at id3 consists of three well-orchestrated stages:
1️⃣ Syntactic Filtering: Rule‑Based Pre‑Check
| Method | Purpose |
|---|---|
| First-order differencing | Detect spikes or jumps in signal sequences |
| Domain rules | E.g. if ROP > 0 then WOB must be > 0 |
| Interpolation + density-based cleaning | Impute missing values and discard noise clusters |
This preprocessing ensures the ML stage isn’t fed garbage—only semantically meaningful anomalies pass forward.
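For illustration, here is a minimal pandas sketch of such a pre-check; the spike threshold and the column names `ROP` and `WOB` are assumptions for the example, not values from the actual pipeline:

```python
import pandas as pd

SPIKE_THRESHOLD = 50.0  # hypothetical maximum plausible jump between consecutive samples

def syntactic_filter(df: pd.DataFrame) -> pd.DataFrame:
    """Rule-based pre-check: spike flags, one domain rule, and interpolation."""
    flags = pd.DataFrame(index=df.index)

    # 1) First-order differencing: flag sudden spikes or jumps in every signal.
    for col in df.columns:
        flags[f"{col}_spike"] = df[col].diff().abs() > SPIKE_THRESHOLD

    # 2) Domain rule: if ROP > 0 then WOB must also be > 0.
    flags["rop_wob_violation"] = (df["ROP"] > 0) & (df["WOB"] <= 0)

    # 3) Impute missing values with linear interpolation (a simple stand-in
    #    for the interpolation + density-based cleaning described above).
    cleaned = df.interpolate(method="linear", limit_direction="both")
    cleaned["syntactic_flag"] = flags.any(axis=1)
    return cleaned
```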
2️⃣ Autoencoder with Temporal Convolutions (TCN)
We train a sequence-to-sequence TCN autoencoder on “clean” drilling intervals. Once operational:
- It reconstructs incoming data.
- It computes the reconstruction error for each timestep.
A high reconstruction error indicates data that deviates from the learned normal behavior, so the corresponding timestep is marked as a candidate anomaly.
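For illustration only, here is a minimal PyTorch sketch of a dilated-convolution (TCN-style) autoencoder and the per-timestep error computation; the channel counts, dilations, and window length are assumptions, not the configuration used in the actual model:

```python
import torch
import torch.nn as nn

class TCNAutoencoder(nn.Module):
    def __init__(self, n_channels: int, hidden: int = 32):
        super().__init__()
        # Encoder: stacked 1-D convolutions with increasing dilation to widen
        # the temporal receptive field while keeping the parameter count small.
        self.encoder = nn.Sequential(
            nn.Conv1d(n_channels, hidden, kernel_size=3, padding=1, dilation=1),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=2, dilation=2),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=4, dilation=4),
            nn.ReLU(),
        )
        # Decoder: mirror the encoder and map back to the sensor channels.
        self.decoder = nn.Sequential(
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=4, dilation=4),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=2, dilation=2),
            nn.ReLU(),
            nn.Conv1d(hidden, n_channels, kernel_size=3, padding=1, dilation=1),
        )

    def forward(self, x):  # x: (batch, channels, time)
        return self.decoder(self.encoder(x))

def per_timestep_error(model, x):
    """Mean squared reconstruction error per timestep, shape (batch, time)."""
    with torch.no_grad():
        recon = model(x)
    return ((x - recon) ** 2).mean(dim=1)

# Usage with a dummy window of 7 sensor channels and 256 timesteps:
model = TCNAutoencoder(n_channels=7)
window = torch.randn(1, 7, 256)
errors = per_timestep_error(model, window)  # one error value per timestep
```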
Why TCN?
- More compact than LSTM/GRU
- Captures temporal dependencies effectively
- Less prone to overfitting in this context
3️⃣ Data‑Driven Thresholding
To set the anomaly threshold smartly:
- Inject synthetic anomalies informed by sensor physics.
- Fit two Gaussians to the reconstruction error distributions (normal vs. anomalous).
- Set the threshold at the intersection of the two Gaussians, which automatically balances false-positive and false-negative rates (a sketch of this step follows the list).
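A minimal NumPy sketch of this thresholding step, assuming per-timestep reconstruction errors for clean data and for synthetically corrupted data are already available:

```python
import numpy as np

def gaussian_intersection(mu1, sigma1, mu2, sigma2):
    """Return the real intersection point(s) of two 1-D Gaussian pdfs."""
    a = 1.0 / (2 * sigma1**2) - 1.0 / (2 * sigma2**2)
    b = mu2 / sigma2**2 - mu1 / sigma1**2
    c = mu1**2 / (2 * sigma1**2) - mu2**2 / (2 * sigma2**2) - np.log(sigma2 / sigma1)
    roots = np.roots([a, b, c])          # solutions of the quadratic equation
    return roots[np.isreal(roots)].real  # keep only real intersections

def pick_threshold(err_normal, err_anomalous):
    """Fit one Gaussian per error population and take their intersection."""
    mu_n, sd_n = err_normal.mean(), err_normal.std()
    mu_a, sd_a = err_anomalous.mean(), err_anomalous.std()
    candidates = gaussian_intersection(mu_n, sd_n, mu_a, sd_a)
    # Prefer the intersection lying between the two means; fall back to their midpoint.
    lo, hi = sorted([mu_n, mu_a])
    between = [x for x in candidates if lo <= x <= hi]
    return between[0] if between else (mu_n + mu_a) / 2.0
```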
🎯 Anomaly Scoring & Visualization
Result: each timestep receives a normalized anomaly score in [0, 1] (see the sketch after this list). This enables:
- Ranking anomalies by severity
- Tuning detection sensitivity
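One simple way to obtain such a score (an illustrative assumption, not necessarily the exact formula used) is a logistic mapping of the reconstruction error around the chosen threshold:

```python
import numpy as np

def anomaly_score(errors: np.ndarray, threshold: float, scale: float = 0.01) -> np.ndarray:
    """Map per-timestep reconstruction errors to a [0, 1] severity score.

    A score of 0.5 falls exactly at the decision threshold; 'scale' controls
    how sharply the score saturates (an illustrative choice, not tuned).
    """
    return 1.0 / (1.0 + np.exp(-(errors - threshold) / scale))
```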
📈 Performance & Takeaways
- Reconstruction accuracy: Achieved a low reconstruction error (MSE ≈ 0.0058) on normalized sensor signals, indicating the model effectively captures normal patterns in the data. This enables high-contrast detection of anomalies using reconstruction error.
- Model complexity: the TCN remains more compact and efficient than an LSTM-based alternative.
- Thresholding: the Gaussian-intersection approach proved more robust than F1-based or Youden's index thresholds, especially under class imbalance.
Outcome: a multi-step pipeline for surface logging data quality that combines classical methods with AI models to detect even the most complex anomalies in surface logging data.
References: Nadirkhanlou, A., & Concina, F. (2025). Detecting Low-Quality Intervals in Surface Logging Data Using AI-Based Anomaly Detection. SPE Europe Energy Conference and Exhibition, Vienna, Austria, June 2025. Available via OnePetro.