Now Hiring: Data Engineer

We’re growing our team at Sphaeric.ai. We’re looking for a data engineer with proven programming experience in Python, Java, and SQL, along with tools such as Apache Airflow.

Interested? Check out the job description below and apply.

About Sphaeric.ai:
Sphaeric.ai is an up-and-coming leader in developing and implementing AI solutions for corporate clients and funded startups. We currently work with clients in the insurance, healthcare, and marketing industries and plan to move into additional verticals as we continue to grow.

Current solutions we offer include:

  • ML/AI model building and deployment in flexible and scalable cloud environments
  • Web application development for process automation and AI interaction
  • Cloud-based data engineering
  • Technical consulting

This is a great opportunity to join a small but growing company and quickly become a part of the leadership team.

Position Summary:
As a data engineer at Sphaeric.ai, you will be entrusted with significant responsibility. You will serve as lead data engineer for a sophisticated cloud-based data collection platform: working with Apache Airflow to ensure that data pipelines are functioning properly, designing and implementing improvements, and taking the lead on recommendations. In addition, you will support our data science services, which may include model deployment and some front-end web app development.

We are looking for a candidate who can grow into a leadership role with our company.

Responsibilities:

  • Managing workflows in Apache Airflow
  • Building data pipelines to collect and store data in the cloud
  • Reworking existing backend infrastructure to optimize performance
  • Deploying ML/AI models in the cloud
  • Building and deploying UIs for ML/AI models
  • Interacting with clients regarding work products

Requirements:

  • Ability to adapt to new programming languages or software products quickly
  • Proven programming experience (Python, Java, SQL, and Airflow preferred but not required)
  • Knowledge of and experience with the software development lifecycle
  • Familiarity with cloud environments
  • Familiarity with data warehousing
  • Familiarity with big data tools (Hadoop, Apache Spark, MongoDB, etc.)

Preferred but not required:

  • Experience with dashboarding and data visualization
  • Strong quantitative and problem-solving skills (exposure to math, statistics, engineering, or physics)

Benefits of working with Sphaeric.ai:

Team members at Sphaeric.ai will be entrusted with significant responsibility and given room to grow. We value innovative thinking and working to the highest standard while maintaining a relaxed environment in which we actively help each other learn and share the best practices we discover.

Specific benefits include:

  • Work on industry-leading AI projects
  • Participate in the entire project lifecycle from ideation to deployment
  • Involvement in AI conferences
  • Opportunity to work with a growing startup that is set to expand significantly over the next five years

Interested applicants should send resumes to:
Paul Kostoff
Managing Partner
pkostoff@sphaeric.ai


Model Interpretation: The missing link between machine learning, healthcare and the FDA?

Recent advances enable practitioners to break open machine learning’s “black box”.

August 20, 2018

Andrew Langsner, Co-founder / Managing Partner, Sphaeric.ai

Patrick Hall, Senior Director of Product, H2O.ai

From analytical tests in drug manufacture run by machine learning algorithms, to predictive models recommending courses of treatment, to sophisticated software that can read test images better than doctors, machine learning has promised a new world of healthcare where algorithms can assist, or even outperform, professionals in consistency and accuracy, saving money and avoiding potentially life-threatening mistakes. But what if your doctor told you that you were sick but could not tell you what led her to that conclusion? Imagine a hospital that admitted and discharged patients but was unable to provide specific justification for each decision it made. For decades, this was a key roadblock for machine learning algorithms in healthcare: they could make data-driven decisions that helped practitioners, payers, and patients, but they couldn’t tell users why those decisions were made.

Today, recent advances in machine learning research and implementation may have cracked open the black box of algorithmic decision making. A flurry of research into model interpretation, or “the ability to explain or to present in understandable terms to a human,”[1] has resulted in a growing body of credible literature and tools for accurate models with interpretable inner workings,[2] for accountability and fairness in algorithmic decision-making,[3] and for post-hoc explanation of complex model predictions.[4][5] Can this research really be applied to healthcare, and if so, where would it be most immediately impactful? Three suggestions and an example use case are put forward below.

THREE HURDLES TO BLACK BOX ALGORITHMS

FDA and drug development

The FDA has notoriously stringent requirements for the approval of new drugs. This could pose a challenge to drug companies experimenting with machine learning to enforce quality control and even to analyze test results to better detect the presence and proper concentrations of drug compounds.[6] The FDA requires full transparency and replicability for all analytical tests involved in the manufacture of new drugs.[7] In the past this has involved providing lists of formulas and methods for analyzing test results (e.g., chromatography tests). But questions remain about how the FDA would treat a new drug application (NDA) that relied on a complex black box machine learning model to maintain quality in the manufacturing process. Interpretable machine learning techniques could help address some of these questions.

Medical Devices

This year, the FDA approved an artificial intelligence device for the first time.[8] This marks a major milestone for medical devices that use proprietary black box algorithms to diagnose diseases from images. The device was approved through the FDA’s De Novo premarket review pathway,[9] which provides a review process for novel devices that represent a low to moderate risk. The low to moderate risk classification is key to a successful De Novo review, but the FDA has yet to approve a device determined to have a high potential risk to patient outcomes, for example a diagnostic algorithm where a false positive could lead to an invasive and risky procedure. Extra controls would likely be needed on such an algorithm, and with the latest model interpretability techniques it would be possible to place additional checks on the model output itself.

Another possibility for bringing machine learning into medical devices is to avoid FDA oversight entirely. In December 2016, Congress passed the 21st Century Cures Act,[10] which excludes what is commonly referred to as clinical decision support (CDS) software from FDA purview under certain conditions; namely, that the health care provider using the software can independently review the basis for the software’s recommendation. In December 2017, the FDA published guidance[11] stating that “the sources supporting the recommendation or underlying the rationale for the recommendation should be identified and easily accessible to the intended user, understandable by the intended user (e.g., data points whose meaning is well understood by the intended user)….” Traditional machine learning software would not meet this criterion because most machine learning models are black boxes. However, with recent advances in model interpretability, it is possible to display explanations for every decision made by a machine learning model, potentially enabling a user to verify the soundness of the rationale behind an automated recommendation.

Risk-based guidance

Much attention has been paid to hospital readmissions since passage of the Affordable Care Act and the start of the Hospital Readmissions Reduction Program. According to a congressional advisory panel, 30-day readmission costs among Medicare patients are nearing $15 billion.[12] Predictive models developed with machine learning have been shown to be successful at predicting avoidable hospital readmissions,[13] and some health systems have already adopted machine learning based models successfully.[14] At the same time, interest has been growing among government entities and private insurance companies[15] in using machine learning models for automated fraud and waste detection on incoming medical claims. Now it should be possible for these models to explain their decisions to practitioners, payers, and patients, allowing users to investigate the actual reasons behind automated medical decision making and determine whether an individual decision was reasonable or could be improved.

TOWARD THE APPLICATION OF INTERPRETABLE MACHINE LEARNING IN HEALTHCARE

Because further deliberation about the ethical, medical, and economic implications of interpretable machine learning in healthcare is certainly necessary, an example risk-based guidance use case is provided here to further that discussion. The example is similar to the methods organizations already use for predicting 30-day readmissions, but instead of an older linear modeling approach, it uses a nonlinear, “white box” machine learning approach to achieve roughly a 1% increase in readmission prediction accuracy. Explanatory techniques are then used to describe both the internal mechanisms of the model and every prediction the model makes.
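To make the idea concrete, here is a toy sketch of what pairing every prediction with a human-readable rationale can look like. The feature names, thresholds, and risk increments below are entirely hypothetical; they are not clinically validated and are not taken from the linked use case, which uses a trained nonlinear model rather than hand-written rules.

```python
# Illustrative only: a transparent scorer that returns its rationale
# alongside every prediction. All rules and weights here are made up
# for this sketch.

def predict_readmission(patient):
    """Return (risk, reasons) for a patient record.

    risk    -- an additive 30-day readmission risk score in [0, 1]
    reasons -- human-readable rules that contributed to the score
    """
    risk = 0.05  # baseline risk when no rule fires
    reasons = []
    if patient["prior_admissions"] >= 2:
        risk += 0.20
        reasons.append("2+ hospital admissions in the past year (+0.20)")
    if patient["num_medications"] > 10:
        risk += 0.10
        reasons.append("more than 10 active medications (+0.10)")
    if patient["a1c"] > 9.0:
        risk += 0.15
        reasons.append("HbA1c above 9.0 (+0.15)")
    return min(risk, 1.0), reasons

patient = {"prior_admissions": 3, "num_medications": 12, "a1c": 8.1}
risk, reasons = predict_readmission(patient)
print(round(risk, 2))   # aggregate score
for reason in reasons:  # the rationale shown to the practitioner
    print(reason)
```

In a real system the contributions would come from a trained model and its explanatory technique (for instance, per-prediction feature attributions), but the contract is the same: every prediction ships with its rationale.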

It is left to practitioners and domain experts to determine whether the example techniques truly surpass more established methods on any number of criteria, e.g., ability to handle heterogeneous data, accuracy, or interpretability. The only explicit argument made here is that when people’s lives are affected by mathematical models, it seems prudent to investigate and evaluate new modeling and analysis techniques.

The open-source example use case is freely available here:

https://github.com/jphall663/diabetes_use_case

The authors wrote this post in anticipation of the 2018 Xavier Healthcare AI summit.

This post also appears on H2O.ai’s blog: https://www.h2o.ai/blog/interpretability-the-missing-link-between-machine-learning-healthcare-and-the-fda/

ABOUT THE AUTHORS

Andrew Langsner is a Co-founder and a Managing Partner at Sphaeric.ai. He is an experienced problem solver with a passion for data-driven decision making. Andrew is always exploring ways to make advanced analytics valuable to businesses and organizations. He holds an MPP from Georgetown University. Continue the conversation online with Andrew on LinkedIn.

Patrick Hall is a senior director for data science products at H2O.ai, where he focuses mainly on model interpretability and model management. Patrick is also currently an adjunct professor in the Department of Decision Sciences at George Washington University, where he teaches graduate classes in data mining and machine learning. Continue the conversation online with Patrick on LinkedIn, Twitter, or Quora.

[8] FDA permits marketing of artificial intelligence-based device to detect certain diabetes-related eye problems. https://www.fda.gov/NewsEvents/Newsroom/PressAnnouncements/ucm604357.htm