
AI-based methods for Surface Logging Data Quality

Fabio Concina (@fabioconcina)
#ai #data

🚀 Why Data Quality in Surface Logging Is Critical

Surface logging sensors generate a high-frequency, multivariate stream of signals: rotary speed, weight on bit (WOB), rate of penetration (ROP), torque, mud flow, and more. These signals drive real-time drilling decisions, so noisy or corrupted data can mislead operations and even compromise safety. Traditional threshold checks and manual monitoring often miss subtle multivariate inconsistencies, which makes automated anomaly detection a necessity.

🧩 A Hybrid Pipeline That Works

The solution we implemented at id3 consists of three well-orchestrated stages:


1️⃣ Syntactic Filtering: Rule‑Based Pre‑Check

| Method | Purpose |
| --- | --- |
| First-order differencing | Detect spikes or jumps in signal sequences |
| Domain rules | E.g. if ROP > 0 then WOB must be > 0 |
| Interpolation + density-based cleaning | Impute missing values and discard noise clusters |

This preprocessing ensures the ML stage isn't fed garbage: only semantically meaningful anomalies pass forward.
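As a rough illustration of the first two rows of the table, here is a minimal, dependency-free sketch. The function names, the `max_step` limit, and the sample values are hypothetical and chosen only for illustration; they are not taken from the production pipeline.

```python
def spike_flags(values, max_step):
    """Flag samples whose first-order difference exceeds max_step in magnitude."""
    flags = [False]  # the first sample has no predecessor to difference against
    for prev, curr in zip(values, values[1:]):
        flags.append(abs(curr - prev) > max_step)
    return flags


def rop_without_wob(rop, wob):
    """Domain rule from the table: if ROP > 0 then WOB must be > 0."""
    return [r > 0 and w <= 0 for r, w in zip(rop, wob)]


# Hypothetical samples: one ROP spike at index 2, one missing WOB at index 4.
rop = [10.0, 10.5, 55.0, 10.8, 11.0, 11.2]
wob = [12.0, 12.1, 12.0, 12.2, 0.0, 12.1]

spikes = spike_flags(rop, max_step=20.0)
rule_violations = rop_without_wob(rop, wob)
```

Note that plain differencing flags both the jump into a spike and the jump back out of it, so the sample right after an isolated spike is flagged too. A real pre-check would need to account for that.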


2️⃣ Autoencoder with Temporal Convolutions (TCN)

We train a sequence-to-sequence TCN autoencoder on “clean” drilling intervals. Once operational:

  • It reconstructs incoming data.
  • It computes the reconstruction error per timestep.

A high error indicates data unlike the normal patterns the model learned, so that timestep is marked as a candidate anomaly.
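The per-timestep error can be sketched as follows. This assumes the error is the mean squared difference across channels at each timestep, which is one common choice and not necessarily the exact metric used here. The `reconstruction` list is a hand-written stand-in for the autoencoder output, and all values are hypothetical.

```python
def per_timestep_error(window, reconstruction):
    """Mean squared error across channels, computed separately for each timestep."""
    errors = []
    for observed, rebuilt in zip(window, reconstruction):
        squared = [(o - r) ** 2 for o, r in zip(observed, rebuilt)]
        errors.append(sum(squared) / len(squared))
    return errors


# Rows are timesteps, columns are normalized sensor channels (hypothetical values).
window = [[0.50, 0.20], [0.52, 0.21], [0.90, 0.20]]
# Stand-in for the model output; a trained autoencoder would produce this.
reconstruction = [[0.50, 0.20], [0.50, 0.21], [0.50, 0.20]]

errors = per_timestep_error(window, reconstruction)
```

In this toy window the last timestep deviates strongly from its reconstruction, so it gets by far the largest error and would be the candidate anomaly.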

Why TCN?

  • More compact than LSTM/GRU
  • Captures temporal dependencies effectively
  • Less prone to overfitting in this context
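To make the "temporal dependencies" point concrete, here is a small sketch of the building block usually found in a TCN: a dilated causal 1-D convolution. It assumes the standard causal formulation with implicit left zero padding and one convolution per dilation level; the kernel weights, dilations, and input are hypothetical, and the actual architecture may differ.

```python
def causal_dilated_conv(x, weights, dilation):
    """Causal 1-D convolution: output[t] uses only x[t], x[t - d], x[t - 2d], ..."""
    out = []
    for t in range(len(x)):
        total = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # samples before the start count as zero (left padding)
                total += w * x[idx]
        out.append(total)
    return out


def receptive_field(kernel_size, dilations):
    """Timesteps visible to one output when stacking one conv per dilation."""
    return 1 + (kernel_size - 1) * sum(dilations)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = causal_dilated_conv(x, weights=[0.5, 0.5], dilation=2)
rf = receptive_field(kernel_size=3, dilations=[1, 2, 4, 8])
```

With doubling dilations, four layers of kernel size 3 already see 31 timesteps, which is how a stack of small kernels can cover long context without recurrence.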

3️⃣ Data‑Driven Thresholding

To set the anomaly threshold smartly:

  1. Inject synthetic anomalies informed by sensor physics.
  2. Fit two Gaussians to the reconstruction error distributions (normal vs. anomalous).
  3. Set the threshold at their intersection, which balances false-positive and false-negative rates automatically.
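Steps 2 and 3 can be sketched with the standard library alone. This assumes each error population is fitted with a single Gaussian by sample mean and standard deviation, and that the two densities are compared unweighted (equal priors). The function name and the error values are hypothetical.

```python
import math
from statistics import NormalDist


def gaussian_intersection(normal_errors, anomalous_errors):
    """Return the point between the two means where the fitted Gaussian pdfs are equal."""
    g1 = NormalDist.from_samples(normal_errors)
    g2 = NormalDist.from_samples(anomalous_errors)
    m1, s1, m2, s2 = g1.mean, g1.stdev, g2.mean, g2.stdev

    if math.isclose(s1, s2):
        return (m1 + m2) / 2.0  # equal spreads: the pdfs cross at the midpoint

    # Equating the log-densities gives a quadratic a*x^2 + b*x + c = 0.
    a = 1.0 / (2.0 * s1**2) - 1.0 / (2.0 * s2**2)
    b = m2 / s2**2 - m1 / s1**2
    c = m1**2 / (2.0 * s1**2) - m2**2 / (2.0 * s2**2) - math.log(s2 / s1)
    disc = math.sqrt(b * b - 4.0 * a * c)
    roots = [(-b + disc) / (2.0 * a), (-b - disc) / (2.0 * a)]

    low, high = sorted((m1, m2))
    between = [r for r in roots if low <= r <= high]
    if not between:
        raise ValueError("fitted Gaussians do not cross between their means")
    return between[0]


# Hypothetical reconstruction errors for clean data and for injected anomalies.
normal_errors = [0.004, 0.005, 0.006, 0.005, 0.007, 0.006]
anomalous_errors = [0.030, 0.045, 0.060, 0.050, 0.040, 0.055]

threshold = gaussian_intersection(normal_errors, anomalous_errors)
```

Two Gaussians with different spreads cross at two points, so the sketch keeps only the crossing that lies between the two means. If the populations overlap so heavily that no such crossing exists, it raises an error instead of returning a misleading threshold.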

🎯 Anomaly Scoring & Visualization

Result: each timestep gets a normalized anomaly score in [0, 1]. This enables:

  • Ranking anomalies by severity
  • Tuning sensitivity
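The article does not specify how the score is normalized, so the following is only one possible scheme, labeled as an assumption: divide the error by twice the threshold and clip to [0, 1], so that an error exactly at the threshold maps to 0.5. The function name, the `cutoff` parameter, and the values are hypothetical.

```python
def anomaly_scores(errors, threshold):
    """Map errors to [0, 1] so the threshold lands at 0.5 (an assumed scaling)."""
    return [min(max(e / (2.0 * threshold), 0.0), 1.0) for e in errors]


errors = [0.002, 0.008, 0.010, 0.030, 0.004]  # hypothetical per-timestep errors
scores = anomaly_scores(errors, threshold=0.010)

# Severity ranking: timestep indices ordered from most to least anomalous.
ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Sensitivity tuning: lowering the cutoff flags more timesteps, raising it flags fewer.
cutoff = 0.5
flagged = [i for i, s in enumerate(scores) if s >= cutoff]
```

A bounded score like this makes the cutoff a single knob that can be adjusted per well or per sensor without refitting the model.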

📈 Performance & Takeaways

  • Reconstruction accuracy: Achieved a low reconstruction error (MSE ≈ 0.0058) on normalized sensor signals, indicating the model effectively captures normal patterns in the data. This enables high-contrast detection of anomalies using reconstruction error.
  • Model complexity: TCN remains efficient compared to LSTM
  • Thresholding: The Gaussian-intersection threshold proved more robust than thresholds chosen by maximizing F1 or Youden's index, especially under class imbalance.

Outcome: a multi-step pipeline for surface logging data quality that combines classical methods with AI models to detect even the most complex anomalies.

References: Nadirkhanlou, A., & Concina, F. (2025). Detecting Low-Quality Intervals in Surface Logging Data Using AI-Based Anomaly Detection. SPE Europe Energy Conference and Exhibition, Vienna, Austria, June 2025. OnePetro