Domain-Informed ML

The best optimization is the computation you don't have to do

By incorporating physical constraints and domain knowledge into ML systems, I build solutions that are more efficient, interpretable, and trustworthy than purely data-driven approaches.

This isn’t about choosing between ML and physics—it’s about using each where it excels.


The Core Philosophy

Physics-informed ML: Use domain knowledge to eliminate unnecessary computation.

Instead of asking an ML model to learn everything from scratch, encode what we already know about how the world works. Let the model focus on what we don’t know.

The Advantage

Purely data-driven approaches:

  • Learn relationships from scratch
  • Require massive datasets
  • Struggle with edge cases
  • Produce black-box predictions
  • Computationally expensive

Domain-informed approaches:

  • Encode known physics/constraints
  • Require less training data
  • Respect physical limitations
  • Produce interpretable results
  • Computationally efficient

Real-World Examples

Image Processing: Leveraging Optical Physics

The problem: Microscope images are blurry due to optical diffraction.

Purely data-driven: Train a neural network on millions of blurry/sharp image pairs. Hope it learns the physics of light propagation.

Domain-informed: Use the known Point Spread Function (PSF) from optical physics. Deconvolution with a known PSF is faster, more accurate, and needs less data.

Result: 10-100x faster processing, physically accurate results, works on new microscope types without retraining.
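
As a sketch of the PSF-based route: with the PSF known from optics, a classical Wiener filter inverts the blur directly in the frequency domain, with no training at all. (The function name and the `snr` regularization value are illustrative assumptions, not a specific library API.)

```python
import numpy as np

def wiener_deconvolve(blurred, psf, snr=100.0):
    """Deconvolve an image with a known PSF via the Wiener filter.

    Because the PSF comes from optical physics, the inverse filter is
    computed directly -- nothing is learned. `snr` is a rough
    signal-to-noise estimate that regularizes the inversion.
    """
    H = np.fft.fft2(psf, s=blurred.shape)          # PSF transfer function
    G = np.fft.fft2(blurred)                       # blurred image spectrum
    W = np.conj(H) / (np.abs(H) ** 2 + 1.0 / snr)  # Wiener filter
    return np.real(np.fft.ifft2(W * G))
```

Swapping microscopes means measuring a new PSF and re-running the same code, which is the "no retraining" claim above in concrete form.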

Biological Imaging: Incorporating Spatial Structure

The problem: Track objects (cells, organelles) across time in microscopy videos.

Purely data-driven: Learn motion patterns from scratch, no assumptions about physics.

Domain-informed: Encode physics constraints:

  • Objects don’t teleport (continuity)
  • Movement follows inertia (smooth motion)
  • Structures have characteristic sizes (spatial constraints)

Result: More robust tracking with less training data, fewer false positives, interpretable failure modes.
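
The continuity constraint can be encoded in a few lines: a greedy nearest-neighbour linker that simply refuses any link longer than the maximum physically plausible displacement per frame. (This is a minimal sketch; the function name and greedy strategy are illustrative assumptions, not a production tracker.)

```python
import numpy as np

def link_frames(prev_pts, next_pts, max_disp):
    """Link detections between two frames under a continuity constraint.

    `max_disp` encodes the physics: a cell cannot move farther than
    (max speed x frame interval), so any candidate link beyond that
    distance is rejected outright rather than learned from data.
    """
    links = {}
    taken = set()
    for i, p in enumerate(prev_pts):
        d = np.linalg.norm(next_pts - p, axis=1)   # distances to all candidates
        for j in np.argsort(d):                    # try nearest first
            if d[j] > max_disp:
                break                              # continuity violated: leave unlinked
            if int(j) not in taken:
                links[i] = int(j)
                taken.add(int(j))
                break
    return links
```

A track that cannot be linked is flagged rather than force-matched, which is where the "interpretable failure modes" come from.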

Signal Processing: Known Frequency Constraints

The problem: Separate signal from noise in measurements.

Purely data-driven: Learn noise patterns from labeled examples.

Domain-informed: Encode known signal characteristics:

  • Expected frequency range (Nyquist limits)
  • Physical bandwidth limitations
  • Sensor noise profiles

Result: Better signal recovery, works in real-time, no training data required.
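
A minimal sketch of encoding a known band: zero out spectral content outside the physically possible frequency range. (The function name and band edges are illustrative assumptions; real pipelines would typically use a proper filter design rather than a hard spectral mask.)

```python
import numpy as np

def bandlimit(signal, fs, f_lo, f_hi):
    """Keep only spectral content inside a known physical band.

    `f_lo`/`f_hi` come from the sensor's bandwidth specification, not
    from training data: anything outside the band is noise by construction.
    """
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    mask = (freqs >= f_lo) & (freqs <= f_hi)       # physically possible band
    return np.fft.irfft(spec * mask, n=len(signal))
```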


Why It Matters

Efficiency

Compute is expensive. Cloud GPU time costs money. On-device inference drains batteries. Edge deployment has power constraints.

Domain-informed approaches reduce computational requirements by eliminating unnecessary learning.

Interpretability

Black boxes are risky. In medical diagnostics, aerospace, finance—you need to understand why the model made a prediction.

Physics-informed models produce interpretable results because the physics is explicit, not hidden in learned weights.

Trustworthiness

Models must respect reality. A purely data-driven model might predict physically impossible outcomes because it never learned the constraints.

Domain-informed models can’t violate known physics—they’re built on it.

Data Efficiency

Labeled data is expensive. Medical imaging annotations require expert radiologists. Scientific datasets require expensive experiments.

Domain-informed approaches learn from less data because they start with knowledge, not ignorance.


Where Domain Knowledge Comes From

Optics and Signal Processing

My background in microscopy and imaging physics provides deep understanding of:

  • Light propagation and diffraction
  • Sensor noise characteristics
  • Fourier analysis and frequency constraints
  • Spatial and temporal sampling limits

Biological Systems

Years of working with biological datasets teach:

  • Characteristic spatial scales (cell sizes, organelle dimensions)
  • Temporal dynamics (movement speeds, reaction timescales)
  • Physical constraints (membrane mechanics, diffusion limits)
  • Statistical properties (distributions, correlations)

Real-World Deployment

Production systems reveal constraints that matter:

  • Computational budgets (time, power, cost)
  • Hardware limitations (memory, bandwidth)
  • Latency requirements (real-time processing)
  • Interpretability needs (regulatory compliance)

The Philosophy in Practice

Start With Physics

Before training a model, ask:

  • What do we know about this problem from first principles?
  • What physical constraints must the solution respect?
  • What can we solve analytically instead of learning?

Use ML Where It’s Needed

ML excels at:

  • Complex non-linear relationships
  • Patterns too subtle for hand-crafted rules
  • Situations where physics is unknown or intractable

ML struggles with:

  • Relationships we already understand
  • Enforcing hard constraints
  • Extrapolation beyond training distribution

Combine them: Use physics for what we know, ML for what we don’t.
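
One common hybrid pattern is residual learning: let the physics model predict what it can, and fit only the leftover error from data. A minimal sketch, where the linear physics term and the residual model are placeholder assumptions:

```python
import numpy as np

def physics_model(x):
    # Known part of the system (placeholder: assume physics predicts y ~ 2x).
    return 2.0 * x

def fit_residual(x, y):
    """Fit a small correction on top of the physics prediction.

    The residual carries only what the physics misses, so a simple
    least-squares linear model suffices here.
    """
    r = y - physics_model(x)            # what the physics can't explain
    A = np.vander(x, 2)                 # columns: [x, 1]
    coeffs, *_ = np.linalg.lstsq(A, r, rcond=None)

    def predict(xq):
        xq = np.atleast_1d(np.asarray(xq, dtype=float))
        return physics_model(xq) + np.vander(xq, 2) @ coeffs

    return predict
```

The learned part stays small and interpretable because the bulk of the signal is already explained analytically.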

Validate Against Reality

Physics-informed models should:

  • Respect known constraints (conservation laws, causality)
  • Produce physically plausible outputs
  • Fail gracefully at boundaries
  • Provide interpretable failure modes

If a model predicts the impossible, it’s wrong—even if the metrics look good.
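
Such checks can run as guards after inference. A minimal sketch with illustrative constraints (non-negativity and conservation of a total quantity; the names and tolerance are assumptions):

```python
import numpy as np

def validate_prediction(concentrations, total_mass, tol=1e-6):
    """Reject predictions that violate known physics, regardless of metrics.

    Returns False if any concentration is negative or if the total
    mass is not conserved within tolerance.
    """
    c = np.asarray(concentrations, dtype=float)
    if np.any(c < -tol):
        return False          # negative concentration: physically impossible
    if abs(c.sum() - total_mass) > tol:
        return False          # conservation law violated
    return True
```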


The Computational Win

Example: Image deconvolution

Purely data-driven (neural network):

  • Train on millions of image pairs
  • Days of GPU training time
  • Gigabytes of model weights
  • Inference: 100ms per image
  • Retraining needed for each new microscope type

Domain-informed (PSF-based deconvolution):

  • No training required
  • Knowledge from optical physics
  • Kilobytes of code
  • Inference: 10ms per image
  • Works on any microscope (just measure PSF)

Result: 10x faster, no training cost, better generalization.

This pattern repeats across domains: The computation you don’t do is the fastest computation.


When to Use Domain-Informed Approaches

Strong fit:

  • Physical systems with known governing equations
  • Problems where interpretability is critical
  • Deployments with strict computational budgets
  • Domains with limited training data
  • Applications requiring provable guarantees

Weak fit:

  • Problems with unknown underlying physics
  • Abundant labeled data, few computational constraints
  • Purely perceptual tasks (image aesthetics, language understanding)
  • Situations where black-box predictions are acceptable

Best: Most real problems benefit from hybrid approaches—physics where we have it, learning where we need it.


Connect

If your ML systems are computationally expensive, require massive training data, produce black-box predictions, or violate physical constraints—domain-informed approaches can help.

See it applied in projects →
