Cancer Diagnostics Platform

Leading cross-functional teams to build GPU-accelerated imaging systems for oncology research

Quick Takeaways

Challenge: Transform cutting-edge microscopy research into production-ready cancer diagnostics deployable across multiple sites

Role: Principal Data Scientist at a Madison-based biosciences startup, leading cross-functional teams (biology, engineering, software, operations)

Impact: 90%+ reduction in analysis time (days → hours), multi-site reproducibility, FDA-aligned validation frameworks

Key Innovation: ScopeFusion platform integrating multi-modal imaging (FLIM, multiphoton, dOCM) with unified control software

Philosophy: Product vision to production reality—translating “what if?” questions into deployed clinical tools through disciplined execution

Recognition: Promoted to Principal Data Scientist (2023), publication in Biomedical Optics Express (2024)

The Challenge

The startup needed to turn cutting-edge microscopy research into production-ready cancer diagnostic tools deployable across multiple research sites. The vision was that advanced imaging of cancer tissue could predict treatment effectiveness quickly and accurately.

The challenge was three-fold:

  1. Speed: Manual analysis was too slow for clinical relevance (days per sample)
  2. Reproducibility: Results varied between operators and sites
  3. Scale: System needed to handle multiple imaging modalities (FLIM, multiphoton, dynamic optical coherence microscopy)

Technical Complexity:

  • FLIM (Fluorescence Lifetime Imaging Microscopy): Sensitive to cellular metabolism, but difficult to operate, with so many adjustable settings that user error was inevitable even for experts
  • dOCM (dynamic Optical Coherence Microscopy): Provides fast depth-resolved information and captures structural changes, but required novel processing algorithms
  • Integration: Combining multiple modalities on a single platform with unified control software

Previous attempts at automation struggled with the complexity of multi-modal data correlation and the computational demands of real-time processing on high-dimensional imaging data.

The Solution

As Principal Data Scientist, I led the development of a comprehensive GPU-accelerated data acquisition and processing platform. This wasn’t just a technical problem: it required coordinating across biology, engineering, and software teams to translate research needs into robust, deployable solutions.

Technical Architecture:

  • Real-time GPU processing pipelines using custom CUDA and DirectX kernels for fluorescence and bright-field microscopy
  • Multi-modal data fusion correlating FLIM (fluorescence lifetime imaging microscopy), multiphoton, and dOCM (dynamic optical coherence microscopy) streams
  • ML-powered biomarker detection with classification pipelines trained on multi-site oncology datasets
  • Hardware control integration for microscope automation and synchronized data acquisition
  • Validation frameworks ensuring reproducibility across lab sites
  • Custom file format (EH5) for fast storage, collation, and retrieval of multi-modal imaging data
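
As an illustration of what a cross-site validation check might look like, the sketch below compares readings of the same reference sample taken at several sites and flags the set as reproducible when the coefficient of variation of the per-site means stays under a threshold. The function names, the site labels, and the 5% threshold are assumptions made for this example, not the platform's actual acceptance criteria.

```python
from statistics import mean, stdev


def site_cv(site_means):
    """Coefficient of variation (sample std / mean) of the per-site mean values."""
    values = list(site_means.values())
    if len(values) < 2:
        raise ValueError("need at least two sites")
    return stdev(values) / mean(values)


def is_reproducible(measurements, max_cv=0.05):
    """Check agreement between sites.

    measurements maps a site name to a list of readings of the same
    reference sample. max_cv is a hypothetical acceptance threshold.
    Returns (passed, per_site_means).
    """
    site_means = {site: mean(vals) for site, vals in measurements.items()}
    return site_cv(site_means) <= max_cv, site_means


# Hypothetical lifetime readings (ns) of one reference sample at three sites.
readings = {
    "site_a": [2.0, 2.1],
    "site_b": [2.05, 2.05],
    "site_c": [2.0, 2.0],
}
passed, per_site = is_reproducible(readings)
```

A check of this kind can run in an automated test suite, so a change that degrades agreement between sites fails before it is deployed.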

Elephas Workflow Figure: Complete tissue analysis pipeline from sample preparation through multi-modal imaging (OCM, FLIM, histology), automated processing, and data management. This workflow demonstrates the integration of hardware control, real-time GPU processing, and multi-site data coordination.

System Integration (ScopeFusion Project):

I architected and led development of ScopeFusion, a unified software platform that consolidated disparate research codebases into a coherent production solution with three main focuses:

  1. Microscope Control tailored to specific assays with simplified operation
  2. Data Management through custom EH5 file format and IT infrastructure for multi-site coordination
  3. Visualization providing consistent imaging interface across control systems, standalone viewers, and web interfaces

The key insight was one “visual language” across all contexts. Operators, researchers, and collaborators could communicate effectively because images looked the same everywhere. This unified interface significantly reduced training time and operator error.
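
One way to get identical rendering in every viewer is to keep the display mapping in a single shared definition that each front end imports. The sketch below is illustrative only: `DisplaySettings`, its fields, the `FLIM_LIFETIME_NS` preset, and the 8-bit output range are assumptions for this example, not ScopeFusion's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisplaySettings:
    """Hypothetical shared display mapping; field names are illustrative."""
    low: float          # raw value mapped to black
    high: float         # raw value mapped to white
    gamma: float = 1.0  # 1.0 gives a linear ramp

    def to_gray(self, value: float) -> int:
        """Map one raw pixel value to an 8-bit gray level."""
        if self.high <= self.low:
            raise ValueError("high must be greater than low")
        # Clamp into the window, normalize to [0, 1], then apply gamma.
        clamped = min(max(value, self.low), self.high)
        normalized = (clamped - self.low) / (self.high - self.low)
        return round(255 * normalized ** self.gamma)


# Hypothetical preset for a lifetime image displayed in nanoseconds.
FLIM_LIFETIME_NS = DisplaySettings(low=0.5, high=3.5)


def render_row(raw_row, settings):
    """Convert one row of raw values; every viewer would call this same function."""
    return [settings.to_gray(v) for v in raw_row]


row = render_row([0.0, 2.0, 9.0], FLIM_LIFETIME_NS)
```

Because the preset is immutable and defined once, the control software, a standalone viewer, and a web interface cannot drift apart in contrast or scaling.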

Leadership Approach:

  • Cross-functional team direction: Coordinated biologists (experimental design), engineers (hardware integration), and software developers (analysis pipelines)
  • Strategic alignment: Collaborated with marketing and finance teams to ensure technical roadmap matched business objectives
  • Team building: Recruited and managed a team of one full-time software developer and two contract developers, and collaborated closely with IT on data infrastructure
  • Training programs: Developed and delivered training for consistent data collection practices across sites
  • Disciplined development: Instituted Git workflows, CI/CD pipelines, and automated testing for regulatory alignment

The Impact

90%+ Reduction in Analysis Time: From days to hours per sample, enabling near-real-time clinical decision support

Multi-Site Reproducibility: Achieved consistent results across multiple laboratory locations, critical for clinical validation

Expanded Research Scope: Enabled larger studies previously infeasible due to analysis bottlenecks

Production Deployment: Successfully transitioned from research prototype to production system used in active oncology research

Team Enablement: Built a development culture emphasizing reproducibility, documentation, and stakeholder consensus

Publication Impact: Contributed to research published in Biomedical Optics Express (2024) on dynamic optical coherence microscopy for cell viability assessment

Technical Deep Dive: Custom HDF5 File Format

A critical innovation was the custom HDF5-based file format (EH5), a data structure designed to balance four competing requirements:

Design Goals:

  1. Consistency: Uniform structure enabling automated workflows and multi-site reproducibility
  2. Flexibility: Accommodating diverse modalities (FLIM, multiphoton, dOCM, histology) without hindering innovation
  3. Compactness: Smart chunking and lossless compression for terabyte-scale datasets
  4. Speed: Optimized caching for fast retrieval during real-time analysis
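
To show why chunking helps both compactness and speed, the sketch below splits a frame into tiles and compresses each tile separately, so reading one pixel only decompresses the tile that contains it. HDF5 works on the same principle, applying its compression filter to each chunk independently. The code uses `zlib` from the Python standard library rather than HDF5 itself, and the tile size, function names, and synthetic frame are assumptions for this example, not the EH5 layout.

```python
import zlib
from array import array

TILE = 64  # hypothetical tile edge in pixels


def write_tiles(frame, width, height, tile=TILE):
    """Split a row-major uint16 frame into tiles and compress each one separately."""
    tiles = {}
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            block = array("H")
            for y in range(ty, min(ty + tile, height)):
                start = y * width + tx
                block.extend(frame[start:start + min(tile, width - tx)])
            # Lossless compression, applied per tile.
            tiles[(ty // tile, tx // tile)] = zlib.compress(block.tobytes(), 6)
    return tiles


def read_pixel(tiles, x, y, width, tile=TILE):
    """Decompress only the tile that holds (x, y) and return that pixel."""
    key = (y // tile, x // tile)
    block = array("H")
    block.frombytes(zlib.decompress(tiles[key]))
    tile_width = min(tile, width - key[1] * tile)  # edge tiles may be narrower
    return block[(y % tile) * tile_width + (x % tile)]


# Synthetic 256 x 128 frame: 8 tiles, one of which is decompressed per lookup.
width, height = 256, 128
frame = array("H", ((x + y) % 4096 for y in range(height) for x in range(width)))
tiles = write_tiles(frame, width, height)
pixel = read_pixel(tiles, 200, 100, width)
```

The trade-off is the usual one for chunked storage: smaller tiles make random access cheaper but compress less well, while larger tiles do the opposite.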

Technical Implementation:

Built on HDF5 (Hierarchical Data Format), the format extends the Imaris schema to support raw data alongside processed results. Key features include:

  • Multi-modal storage: FLIM lifetime data, OCM structural/amplitude images, and histology in a single file
  • Metadata preservation: Software versions, processing parameters, calibration data tracked with results
  • Imaris compatibility: Files openable in ImageJ, Fiji, and other standard tools that read the Imaris format
  • Cross-platform APIs: C++, Python, MATLAB, and C# bindings for broad accessibility

The format proved essential for clinical validation, enabling researchers to track complete provenance from raw acquisition through final analysis, a requirement for regulatory compliance.
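
A minimal sketch of what a provenance record could contain is shown below: a checksum of the raw acquisition, the software version, the processing parameters, and a calibration identifier, serialized in a stable form. All names here (`ProvenanceRecord`, its fields, `make_record`, `matches_raw`) are hypothetical and are not taken from the EH5 format. In an HDF5 file, a record like this would typically be stored as attributes or a small dataset next to the results.

```python
import hashlib
import json
from dataclasses import dataclass, asdict, field


@dataclass
class ProvenanceRecord:
    """Hypothetical provenance entry; every field name is illustrative."""
    raw_sha256: str        # checksum of the raw acquisition bytes
    software_version: str  # version of the processing software
    parameters: dict = field(default_factory=dict)  # processing parameters
    calibration_id: str = ""  # identifier of the calibration data used

    def to_json(self) -> str:
        # sort_keys keeps the serialized record stable across runs
        return json.dumps(asdict(self), sort_keys=True)


def make_record(raw_bytes, software_version, parameters, calibration_id):
    """Build a record tied to the exact raw bytes that were processed."""
    digest = hashlib.sha256(raw_bytes).hexdigest()
    return ProvenanceRecord(digest, software_version, dict(parameters), calibration_id)


def matches_raw(record, raw_bytes):
    """True if raw_bytes is the data this result was computed from."""
    return record.raw_sha256 == hashlib.sha256(raw_bytes).hexdigest()


# Hypothetical values throughout.
record = make_record(b"raw-acquisition", "1.2.3", {"binning": 2}, "cal-001")
```

Storing the checksum with the result lets a reviewer confirm later that a processed image traces back to an unmodified raw file.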

Technical Stack

  • GPU Computing: CUDA, DirectX for real-time image processing
  • Machine Learning: Custom classifiers for biomarker detection, feature extraction pipelines
  • Data Management: Custom HDF5-based file format with C++/Python/MATLAB/C# APIs
  • Imaging Modalities: FLIM, multiphoton microscopy, dynamic OCM, bright-field
  • DevOps: Git workflows, Azure Pipelines, CI/CD automation, Docker
  • Languages: C++, Python, MATLAB, C#
  • Hardware: Multi-GPU workstations, automated microscopy systems

Recognition

  • Promoted to Principal Data Scientist (2023) in recognition of technical and leadership impact
  • Biomedical Optics Express Publication (2024): Co-author on the dynamic optical coherence microscopy (dOCM) methodology paper
  • Multi-site adoption: System deployed across partner research institutions