Domain-Informed ML
The best optimization is the computation you don't have to do
By incorporating physical constraints and domain knowledge into ML systems, I build solutions that are more efficient, interpretable, and trustworthy than purely data-driven approaches.
This isn’t about choosing between ML and physics—it’s about using each where it excels.
The Core Philosophy
Physics-informed ML: Use domain knowledge to eliminate unnecessary computation.
Instead of asking an ML model to learn everything from scratch, encode what we already know about how the world works. Let the model focus on what we don’t know.
The Advantage
Purely data-driven approaches:
- Learn relationships from scratch
- Require massive datasets
- Struggle with edge cases
- Produce black-box predictions
- Computationally expensive
Domain-informed approaches:
- Encode known physics/constraints
- Require less training data
- Respect physical limitations
- Produce interpretable results
- Computationally efficient
Real-World Examples
Image Processing: Leveraging Optical Physics
The problem: Microscope images are blurry due to optical diffraction.
Purely data-driven: Train a neural network on millions of blurry/sharp image pairs. Hope it learns the physics of light propagation.
Domain-informed: Use the known Point Spread Function (PSF) from optical physics. Deconvolution with a known PSF is faster, more accurate, and needs little or no training data.
Result: 10-100x faster processing, physically accurate results, works on new microscope types without retraining.
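A minimal sketch of the idea, using a Wiener filter with a known PSF (the specific kernel, image size, and `noise_power` value here are illustrative, not tied to any particular microscope):

```python
import numpy as np

def wiener_deconvolve(image, psf, noise_power=1e-3):
    """Deconvolve an image with a known PSF via the Wiener filter.

    The PSF comes from optical physics (measured or modeled), so no
    training is needed; `noise_power` is an assumed noise-to-signal
    ratio that would be tuned per sensor.
    """
    # Pad the PSF to the image size and shift its peak to the origin
    psf_padded = np.zeros_like(image, dtype=float)
    psf_padded[:psf.shape[0], :psf.shape[1]] = psf
    psf_padded = np.roll(psf_padded,
                         (-(psf.shape[0] // 2), -(psf.shape[1] // 2)),
                         axis=(0, 1))

    H = np.fft.fft2(psf_padded)
    G = np.fft.fft2(image)
    # Wiener filter: F = H* G / (|H|^2 + noise_power)
    F = np.conj(H) * G / (np.abs(H) ** 2 + noise_power)
    return np.real(np.fft.ifft2(F))
```

Swapping microscopes means measuring a new PSF, not retraining a model; the rest of the pipeline is unchanged.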
Biological Imaging: Incorporating Spatial Structure
The problem: Track objects (cells, organelles) across time in microscopy videos.
Purely data-driven: Learn motion patterns from scratch, no assumptions about physics.
Domain-informed: Encode physics constraints:
- Objects don’t teleport (continuity)
- Movement follows inertia (smooth motion)
- Structures have characteristic sizes (spatial constraints)
Result: More robust tracking with less training data, fewer false positives, interpretable failure modes.
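The continuity constraint above can be made concrete with a short sketch: greedy frame-to-frame linking where physically impossible jumps are rejected outright rather than left for a model to learn (`max_disp` is an assumed per-frame displacement bound, chosen from known movement speeds):

```python
import numpy as np

def link_frames(prev_pts, curr_pts, max_disp):
    """Greedily link detections between two consecutive frames.

    Physics constraint: an object cannot move farther than `max_disp`
    between frames (continuity), so distant candidates are excluded
    before any matching happens.
    Returns a list of (prev_index, curr_index) matches.
    """
    prev_pts = np.asarray(prev_pts, dtype=float)
    curr_pts = np.asarray(curr_pts, dtype=float)
    # Pairwise distances between previous and current detections
    d = np.linalg.norm(prev_pts[:, None, :] - curr_pts[None, :, :], axis=2)
    d[d > max_disp] = np.inf  # forbid physically impossible jumps
    matches = []
    while np.isfinite(d).any():
        i, j = np.unravel_index(np.argmin(d), d.shape)
        matches.append((i, j))
        d[i, :] = np.inf  # each detection links at most once
        d[:, j] = np.inf
    return matches
```

Detections left unmatched are candidates for appearance/disappearance events, which makes failures easy to inspect.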
Signal Processing: Known Frequency Constraints
The problem: Separate signal from noise in measurements.
Purely data-driven: Learn noise patterns from labeled examples.
Domain-informed: Encode known signal characteristics:
- Expected frequency range (Nyquist limits)
- Physical bandwidth limitations
- Sensor noise profiles
Result: Better signal recovery, works in real-time, no training data required.
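A minimal sketch of this, assuming the only known constraint is an upper frequency bound: anything the optics or sensor cannot physically produce above `max_freq` is noise by definition and can be removed with no training data at all.

```python
import numpy as np

def bandlimit_denoise(signal, sample_rate, max_freq):
    """Suppress noise outside the physically possible band.

    If the system cannot produce content above `max_freq` (a known
    bandwidth limit), anything found there is noise and can be zeroed.
    """
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[freqs > max_freq] = 0.0  # out-of-band content is noise
    return np.fft.irfft(spectrum, n=len(signal))
```

Real systems would combine this with known sensor noise profiles (e.g. a noise floor estimate) rather than a hard cutoff alone.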
Why It Matters
Efficiency
Compute is expensive. Cloud GPU time costs money. On-device inference drains batteries. Edge deployment has power constraints.
Domain-informed approaches reduce computational requirements by eliminating unnecessary learning.
Interpretability
Black boxes are risky. In medical diagnostics, aerospace, and finance, you need to understand why the model made a prediction.
Physics-informed models produce interpretable results because the physics is explicit, not hidden in learned weights.
Trustworthiness
Models must respect reality. A purely data-driven model might predict physically impossible outcomes because it never learned the constraints.
Domain-informed models can’t violate known physics—they’re built on it.
Data Efficiency
Labeled data is expensive. Medical imaging annotations require expert radiologists. Scientific datasets require expensive experiments.
Domain-informed approaches learn from less data because they start with knowledge, not ignorance.
Where Domain Knowledge Comes From
Optics and Signal Processing
My background in microscopy and imaging physics provides deep understanding of:
- Light propagation and diffraction
- Sensor noise characteristics
- Fourier analysis and frequency constraints
- Spatial and temporal sampling limits
Biological Systems
Years working with biological datasets teach:
- Characteristic spatial scales (cell sizes, organelle dimensions)
- Temporal dynamics (movement speeds, reaction timescales)
- Physical constraints (membrane mechanics, diffusion limits)
- Statistical properties (distributions, correlations)
Real-World Deployment
Production systems reveal constraints that matter:
- Computational budgets (time, power, cost)
- Hardware limitations (memory, bandwidth)
- Latency requirements (real-time processing)
- Interpretability needs (regulatory compliance)
The Philosophy in Practice
Start With Physics
Before training a model, ask:
- What do we know about this problem from first principles?
- What physical constraints must the solution respect?
- What can we solve analytically instead of learning?
Use ML Where It’s Needed
ML excels at:
- Complex non-linear relationships
- Patterns too subtle for hand-crafted rules
- Situations where physics is unknown or intractable
ML struggles with:
- Relationships we already understand
- Enforcing hard constraints
- Extrapolation beyond training distribution
Combine them: Use physics for what we know, ML for what we don’t.
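One common way to combine them is residual modeling: the physics model predicts what it can, and a learned component fits only what is left over. The sketch below uses a polynomial fit as a stand-in for any regressor; the function names and the `degree` parameter are illustrative.

```python
import numpy as np

def fit_residual_model(x, y, physics_model, degree=2):
    """Hybrid model: physics baseline + learned residual.

    `physics_model` encodes what we know analytically; the fit
    (a stand-in for any ML regressor) learns only the part the
    physics misses, so the learned component stays small.
    """
    residual = y - physics_model(x)
    coeffs = np.polyfit(x, residual, degree)  # learn what's left

    def hybrid(x_new):
        return physics_model(x_new) + np.polyval(coeffs, x_new)

    return hybrid
```

Because most of the signal is explained analytically, the learned part needs far less data, and the baseline behaves sensibly even where the fit extrapolates poorly.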
Validate Against Reality
Physics-informed models should:
- Respect known constraints (conservation laws, causality)
- Produce physically plausible outputs
- Fail gracefully at boundaries
- Provide interpretable failure modes
If a model predicts the impossible, it’s wrong—even if the metrics look good.
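Such checks can run as cheap assertions on every model output. A minimal sketch for an imaging pipeline (the specific checks here are illustrative; real systems would also verify conservation laws relevant to their domain):

```python
import numpy as np

def check_physical_plausibility(intensities):
    """Sanity-check model outputs against known physics.

    For an imaging model: intensities represent photon counts, which
    cannot be negative, and the output must be numerically finite.
    Returns a list of violations; an empty list means the output passed.
    """
    intensities = np.asarray(intensities, dtype=float)
    issues = []
    if not np.all(np.isfinite(intensities)):
        issues.append("non-finite values")
    if np.any(intensities < 0):
        issues.append("negative intensities (impossible photon counts)")
    return issues
```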
The Computational Win
Example: Image deconvolution
Purely data-driven (neural network):
- Train on millions of image pairs
- Days of GPU training time
- Gigabytes of model weights
- Inference: 100ms per image
- Retraining needed for each new microscope type
Domain-informed (PSF-based deconvolution):
- No training required
- Knowledge from optical physics
- Kilobytes of code
- Inference: 10ms per image
- Works on any microscope (just measure PSF)
Result: 10x faster, no training cost, better generalization.
This pattern repeats across domains: The computation you don’t do is the fastest computation.
When to Use Domain-Informed Approaches
Strong fit:
- Physical systems with known governing equations
- Problems where interpretability is critical
- Deployments with strict computational budgets
- Domains with limited training data
- Applications requiring provable guarantees
Weak fit:
- Problems with unknown underlying physics
- Abundant labeled data, few computational constraints
- Purely perceptual tasks (image aesthetics, language understanding)
- Situations where black-box predictions are acceptable
Best: Most real problems benefit from hybrid approaches—physics where we have it, learning where we need it.
Connect
If your ML systems are computationally expensive, require massive training data, produce black-box predictions, or violate physical constraints—domain-informed approaches can help.