Cancer Diagnostics Platform
Leading cross-functional teams to build GPU-accelerated imaging systems for oncology research
Quick Takeaways
Challenge: Transform cutting-edge microscopy research into production-ready cancer diagnostics deployable across multiple sites
Role: Principal Data Scientist at a Madison-based biosciences startup, leading cross-functional teams (biology, engineering, software, operations)
Impact: 90%+ reduction in analysis time (days → hours), multi-site reproducibility, FDA-aligned validation frameworks
Key Innovation: ScopeFusion platform integrating multi-modal imaging (FLIM, multiphoton, dOCM) with unified control software
Philosophy: Product vision to production reality—translating “what if?” questions into deployed clinical tools through disciplined execution
Recognition: Promoted to Principal Data Scientist (2023), publication in Biomedical Optics Express (2024)
The Challenge
The startup needed to transform cutting-edge microscopy research into production-ready cancer diagnostics tools that could be deployed across multiple research sites. The vision: advanced imaging of cancer tissue could provide fast and accurate prediction of treatment effectiveness.
The challenge was three-fold:
- Speed: Manual analysis was too slow for clinical relevance (days per sample)
- Reproducibility: Results varied between operators and sites
- Scale: System needed to handle multiple imaging modalities (FLIM, multiphoton, dynamic optical coherence microscopy)
Technical Complexity:
- FLIM (Fluorescence Lifetime Imaging): Sensitive to cellular metabolism, but operationally demanding, with so many adjustable settings that user error was common even for expert operators
- dOCM (dynamic Optical Coherence Microscopy): Fast, depth-resolved structural information and sensitivity to dynamic changes, but required novel processing algorithms
- Integration: Combining multiple modalities on a single platform with unified control software
Previous attempts at automation struggled with the complexity of multi-modal data correlation and the computational demands of real-time processing on high-dimensional imaging data.
The Solution
As Principal Data Scientist, I led the development of a comprehensive GPU-accelerated data acquisition and processing platform. This wasn’t just a technical problem - it required coordinating across biology, engineering, and software teams to translate research needs into robust, deployable solutions.
Technical Architecture:
- Real-time GPU processing pipelines using custom CUDA and DirectX kernels for fluorescence and bright-field microscopy (a simplified sketch follows this list)
- Multi-modal data fusion correlating FLIM (fluorescence lifetime imaging), multiphoton, and dOCM (dynamic optical coherence microscopy) streams
- ML-powered biomarker detection with classification pipelines trained on multi-site oncology datasets
- Hardware control integration for microscope automation and synchronized data acquisition
- Validation frameworks ensuring reproducibility across lab sites
- Custom file format (EH5) for fast storage, collation, and retrieval of multi-modal imaging data
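The production pipelines ran as custom CUDA and DirectX kernels. As a rough, hypothetical illustration of the per-frame work, the sketch below uses CuPy as a stand-in; the preprocess_frame function, the dark-frame correction, and the filter parameters are assumptions, not the actual kernels:

```python
# Simplified stand-in for one GPU preprocessing step (illustrative only; the
# real pipeline used custom CUDA/DirectX kernels, not CuPy).
import cupy as cp
from cupyx.scipy import ndimage as cnd

def preprocess_frame(frame_host, dark_host, sigma=1.5):
    """Dark-frame subtraction and Gaussian denoising on the GPU."""
    frame = cp.asarray(frame_host, dtype=cp.float32)   # host -> device copy
    dark = cp.asarray(dark_host, dtype=cp.float32)
    corrected = cp.maximum(frame - dark, 0.0)          # remove detector offset
    smoothed = cnd.gaussian_filter(corrected, sigma)   # suppress shot noise
    return cp.asnumpy(smoothed)                        # device -> host copy
```

Whatever the real kernels compute, the host-to-device, process, device-to-host pattern is the same.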
Figure: Complete tissue analysis pipeline from sample preparation through multi-modal imaging (OCM, FLIM, histology), automated processing, and data management. This workflow demonstrates the integration of hardware control, real-time GPU processing, and multi-site data coordination.
System Integration (ScopeFusion Project):
I architected and led development of ScopeFusion, a unified software platform that consolidated disparate research codebases into a coherent production solution with three areas of focus:
- Microscope Control tailored to specific assays with simplified operation
- Data Management through custom EH5 file format and IT infrastructure for multi-site coordination
- Visualization providing consistent imaging interface across control systems, standalone viewers, and web interfaces
The key insight: one “visual language” across all contexts - operators, researchers, and collaborators could communicate effectively because images looked the same everywhere. This unified interface significantly reduced training time and operator error.
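To make that shared visual language concrete, one way to express it (purely illustrative; none of these names or values come from the actual ScopeFusion code) is a per-modality display specification defined once and consumed by the control software, standalone viewers, and web interface alike:

```python
# Hypothetical shared display specification: a single source of truth for how a
# modality is rendered, so images look identical in every front end.
import json
from dataclasses import dataclass, asdict

@dataclass
class DisplaySpec:
    modality: str        # e.g. "FLIM", "MPM", "dOCM"
    colormap: str        # named lookup table used by every viewer
    window_min: float    # lower display limit in the modality's native units
    window_max: float    # upper display limit
    gamma: float = 1.0   # optional tone adjustment

# Illustrative defaults; real window limits would come from assay validation.
FLIM_DEFAULT = DisplaySpec(modality="FLIM", colormap="viridis",
                           window_min=0.3, window_max=3.5)

with open("display_specs.json", "w") as fh:
    json.dump({"FLIM": asdict(FLIM_DEFAULT)}, fh, indent=2)
```

Any front end that loads the same specification renders the same image - the "looks the same everywhere" property described above.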
Leadership Approach:
- Cross-functional team direction: Coordinated biologists (experimental design), engineers (hardware integration), and software developers (analysis pipelines)
- Strategic alignment: Collaborated with marketing and finance teams to ensure technical roadmap matched business objectives
- Team building: Recruited and managed a team of one full-time software developer and two contract developers, plus an extended collaboration with IT on data infrastructure
- Training programs: Developed and delivered training for consistent data collection practices across sites
- Disciplined development: Instituted Git workflows, CI/CD pipelines, and automated testing for regulatory alignment
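One example of what "automated testing for regulatory alignment" can look like in CI: a golden-reference regression test that fails the build if processing changes shift the numbers downstream analyses depend on. The run_pipeline entry point, file paths, and tolerances below are illustrative assumptions, not the actual test suite:

```python
# Sketch of a golden-reference regression test run in CI (names are hypothetical).
import numpy as np

def run_pipeline(raw):
    # Placeholder for the real processing entry point under test.
    return raw.astype(np.float32) * 0.5

def test_pipeline_matches_golden_reference():
    raw = np.load("tests/data/raw_sample.npy")        # fixed input fixture
    golden = np.load("tests/data/golden_result.npy")  # frozen expected output
    result = run_pipeline(raw)
    # Any numerical drift beyond tolerance fails the pipeline before release.
    np.testing.assert_allclose(result, golden, rtol=1e-5, atol=1e-6)
```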
The Impact
90%+ Reduction in Analysis Time: From days to hours per sample, enabling near-real-time clinical decision support
Multi-Site Reproducibility: Achieved consistent results across multiple laboratory locations, critical for clinical validation
Expanded Research Scope: Enabled larger studies previously infeasible due to analysis bottlenecks
Production Deployment: Successfully transitioned from research prototype to production system used in active oncology research
Team Enablement: Built a development culture emphasizing reproducibility, documentation, and stakeholder consensus
Publication Impact: Contributed to research published in Biomedical Optics Express (2024) on dynamic optical coherence microscopy for cell viability assessment
Technical Deep Dive: Custom HDF5 File Format
A critical innovation was EH5, a custom HDF5-based file format designed to balance four competing requirements:
Design Goals:
- Consistency: Uniform structure enabling automated workflows and multi-site reproducibility
- Flexibility: Accommodating diverse modalities (FLIM, MPM, dOCM, histology) without hindering innovation
- Compactness: Smart chunking and lossless compression for terabyte-scale datasets (see the sketch after this list)
- Speed: Optimized caching for fast retrieval during real-time analysis
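A minimal sketch, not the actual EH5 implementation, of how the compactness and speed goals map onto HDF5 features through h5py: tiled chunking for fast partial reads, lossless gzip compression, and an enlarged chunk cache. Chunk shapes, cache sizes, and dataset names are assumptions:

```python
# Illustrative h5py usage for an EH5-like dataset: chunking, lossless
# compression, and a larger chunk cache (all sizes and names are assumptions).
import h5py
import numpy as np

volume = np.zeros((8, 1024, 1024), dtype=np.uint16)  # placeholder z-stack

with h5py.File("sample.eh5", "w",
               rdcc_nbytes=256 * 1024**2,    # 256 MB chunk cache for fast re-reads
               rdcc_nslots=1_000_003) as f:  # large slot count reduces collisions
    f.create_dataset(
        "OCM/amplitude",
        data=volume,
        chunks=(1, 512, 512),   # one tile per chunk -> cheap partial retrieval
        compression="gzip",     # lossless, keeps terabyte-scale datasets compact
        compression_opts=4,
        shuffle=True,           # byte-shuffle filter improves compression ratio
    )
```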
Technical Implementation:
Built on HDF5 (Hierarchical Data Format), the format extends the Imaris schema to support raw data alongside processed results. Key features include:
- Multi-modal storage: FLIM lifetime data, OCM structural/amplitude images, and histology in a single file (see the layout sketch after this list)
- Metadata preservation: Software versions, processing parameters, calibration data tracked with results
- Imaris compatibility: Files openable in ImageJ, Fiji, and other standard tools
- Cross-platform APIs: C++, Python, MATLAB, and C# bindings for broad accessibility
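And a layout sketch, under assumed group and attribute names, of how multiple modalities plus the provenance metadata described above (software version, calibration reference, site) can live in one file:

```python
# Hypothetical EH5-style layout: per-modality groups with provenance attributes.
import h5py
import numpy as np

with h5py.File("sample.eh5", "a") as f:
    # File-level provenance (values are illustrative).
    f.attrs["software_version"] = "scopefusion-1.4.2"
    f.attrs["acquisition_site"] = "site-A"

    flim = f.require_group("FLIM")
    flim.create_dataset("lifetime_ns",
                        data=np.zeros((1024, 1024), dtype=np.float32))
    flim.attrs["calibration_reference"] = "irf_2024-01-15.json"  # assumed name

    hist = f.require_group("Histology")
    hist.create_dataset("rgb",
                        data=np.zeros((1024, 1024, 3), dtype=np.uint8))
```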
The format proved essential for clinical validation - enabling researchers to track complete provenance from raw acquisition through final analysis, a requirement for regulatory compliance.
Technical Stack
- GPU Computing: CUDA, DirectX for real-time image processing
- Machine Learning: Custom classifiers for biomarker detection, feature extraction pipelines
- Data Management: Custom HDF5-based EH5 file format with C++/Python/MATLAB/C# APIs
- Imaging Modalities: FLIM, multiphoton microscopy, dynamic optical coherence microscopy, bright-field
- DevOps: Git workflows, Azure Pipelines, CI/CD automation, Docker
- Languages: C++, Python, MATLAB, C#
- Hardware: Multi-GPU workstations, automated microscopy systems
Recognition
- Promoted to Principal Data Scientist (2023) in recognition of technical and leadership impact
- Biomedical Optics Express Publication (2024): Co-author on the dynamic optical coherence microscopy methodology paper
- Multi-site adoption: System deployed across partner research institutions
Links
- Publication: Assessing cell viability with dynamic optical coherence microscopy - Biomedical Optics Express, 2024