December 10, 2020

PercayAI team, Dr. Dan Kuster

Welcome to another question and answer with Dr. Dan Kuster, the CEO and founder of Cambrio, a digital R&D lab focused on life sciences. Dan holds a PhD in biomedical engineering from Washington University in St. Louis, and is now based in the Boston, MA area.



In case you missed it, be sure to read part 1 of our Ask the Expert series.

Today, Dan will be discussing how applying AI and machine learning in the life sciences differs from applying it to other fields. We’ll compare two models: one that differentiates between dogs and cats, and another that differentiates between cancer and not cancer. As always, if you have thoughts or questions for us or Dr. Kuster, please send a message below. Let’s hop in.


Q: Dan, can you start off by explaining how different it is to create a model that classifies cats versus dogs, than one that classifies concepts relevant to biology or drug discovery?


A: That’s a great question. Actually, the answer is they’re exactly the same difficulty in terms of the plumbing. If the task is to read some input, for example an image of an animal, and predict the class membership (is this a cat or a dog?), then from the perspective of how inputs flow to outputs, there is really no difference between cats and dogs and any other prediction. It is a classification task, and there are a ton of available examples to draw upon. You can find a model from a public domain site, read a tutorial, or watch a YouTube video. However you arrive at it, you’ve got a model that reads a two-dimensional array of numbers (i.e., an image), computes a probability across all the possible classes, and guesses the most probable class (e.g., “dog”). Pretty straightforward, and now you can build “cats vs. dogs” or “hot dog vs. not hot dog” or whatever.
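To make the plumbing concrete, here is a minimal sketch in Python. It is an illustration only, not a model anyone would ship: the 2x2 "image", the hand-picked weights, and the names `softmax` and `classify` are all assumptions made up for this example. It shows the three steps described above: read a two-dimensional array of numbers, compute a probability for each class, and guess the most probable class.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(image, weights, biases, labels):
    """Score each class with a linear layer over the flattened image,
    then return the most probable label and all class probabilities."""
    pixels = [p for row in image for p in row]
    scores = [
        sum(w * p for w, p in zip(class_weights, pixels)) + b
        for class_weights, b in zip(weights, biases)
    ]
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs

# Hypothetical 2x2 "image" and hand-picked (not trained) weights.
image = [[0.9, 0.1], [0.8, 0.2]]
labels = ["cat", "dog"]
weights = [[1.0, -1.0, 1.0, -1.0], [-1.0, 1.0, -1.0, 1.0]]
biases = [0.0, 0.0]

label, probs = classify(image, weights, biases, labels)
print(label, probs)
```

Nothing in `classify` knows what a cat or a dog is. Swap in different labels and weights and the same plumbing serves any other classification task.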


Then you put it in front of a real user and they want to know about different classes than your model was trained to predict. Same inputs, new outputs. We’re still reading images of the same shape, so that’s easy, right? Well, it depends. If your data and model span the set of representations and contexts you want to add as a new class, then you probably just need to add the new class and fine-tune some things. But if you want to change domains, say, from cats and dogs to “normal chest X-ray” versus “chest X-ray with ground glass opacities,” then you’re going to have a rough time. There is no reason to expect that the distributions of pixels that indicate cats and dogs in a photo will have any relationship to the pixels in a chest radiograph of lungs with pneumonia. This is an important and underappreciated reality of scientific computation: just because you can feed inputs of the right shape does not mean you (or your model) are making good, informed, interpretable guesses in the new domain. It can be really easy to fool ourselves, and the rapid progress in the open domain, where data are cheap and anyone can “see,” can give a false sense of confidence about the effort required to achieve strong results in a more difficult domain like medical imaging, where rare and expensive expertise is required.
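The "right shape, wrong domain" trap is easy to reproduce in a few lines of Python. This is a toy sketch under loud assumptions: `predict` is a hypothetical stand-in for a trained cat/dog model, its scoring rule is invented, and the two 2x2 arrays are made-up inputs. The point is only that the model accepts anything of the right shape and still answers with confidence.

```python
import math

LABELS = ["cat", "dog"]  # the only classes this hypothetical model knows

def predict(image):
    """Toy stand-in for a trained cat/dog model: a fixed linear score
    passed through a sigmoid. The rule itself is invented."""
    pixels = [p for row in image for p in row]
    # Invented "learned" rule for a 2x2 input: left-column pixels vote
    # for cat, right-column pixels vote for dog.
    score = sum(p if i % 2 == 0 else -p for i, p in enumerate(pixels))
    p_cat = 1.0 / (1.0 + math.exp(-score))
    return ("cat", p_cat) if p_cat >= 0.5 else ("dog", 1.0 - p_cat)

pet_photo = [[0.9, 0.1], [0.8, 0.2]]   # in-domain input
chest_xray = [[0.2, 3.0], [0.1, 2.5]]  # out-of-domain input, same shape

print(predict(pet_photo))
print(predict(chest_xray))
```

The second call returns "dog" with a probability above 0.99. The code runs without complaint, and nothing in the output signals that the question itself was meaningless for this input.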


When the previous question asked about domain expertise, I framed it as “how much would you trust your physician if they hadn’t gone to medical school?” Similarly with predictive models, one cannot expect a model that has had no relevant training to perform well in a different domain-specific prediction task. Just like your physician needs to learn to observe, contextualize, and weigh the balance of evidence across competing hypotheses, so do predictive models (and the scientists and engineers who build them).


To do that, we need to look at the data and predictions in the proper context to understand if a model is producing answers that are good enough to inform the best possible next action, decision, report, or downstream effect. The human expert who is responsible for making the decision needs proper contextualized input and an understanding of the tradeoffs downstream too, and they probably do this transparently as part of their own thinking patterns. So returning to the example of a radiologist: typically, when making a diagnosis, a radiologist won’t care as much about the absolute accuracy of a pixel-based prediction as about identifying diagnostic features to inform the best possible clinical options for the patient. And they’re not going to just look at an image in isolation. Does this individual have relevant health/disease history? How did their last chest X-ray look, and what has changed? Where things have changed, what are the most reasonable explanations for each change? What could I be missing? And so on. They’re trying to present the most informative output they can to inform the clinical team and, ultimately, the best care options for the patient.


Medical decisions can be high-stakes, and context is really important; a diagnosis of lung cancer has a different treatment path than pneumonia, and the wrong decision can be deadly. Classifying photos of cats and dogs? Not so much.


Given a decent photo, I have everything I need to know if there’s a cat in the photo. I don’t need to know what this cat looked like a year ago to know it is still a cat today. Context doesn’t really change “catness.” Anyone can see how a cat can be photographed in a variety of different views and it is still a cat.


So now the differences are: how difficult is it to get an authoritative prediction? How do we know we have a good answer? For cats/dogs it is cheap and easy, so one strategy might be to just collect a lot of data and labels, and when possible, that has worked well. But it is much more difficult for cancer, where there's a lot of uncertainty, data are not easily available, and it is expensive to generate and label imagery. From a statistical learning perspective, one might say we are looking for separable distributions. The more the distributions of pixels overlap, the more samples we need to make a confident call about differences among them. Medical images tend to be harder to separate, and also much more difficult and expensive to obtain in large numbers.
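A back-of-the-envelope calculation shows how quickly overlap drives up the data requirement. The sketch below is a deliberate simplification, not a statement about real images: it assumes each class is summarized by a single numeric feature that is normally distributed with the same standard deviation `sigma` in both classes, and it uses the standard error of a difference in means. Under those assumptions, telling the two class means apart at `z` standard errors takes roughly n ≥ 2(z·sigma/gap)² samples per class. The function name `samples_needed` is hypothetical.

```python
import math

def samples_needed(mean_gap, sigma, z=1.96):
    """Samples per class so that the gap between the two class means
    is at least z standard errors wide (z=1.96 is about 95% two-sided).

    The standard error of the difference in means is sigma * sqrt(2 / n),
    so we need n >= 2 * (z * sigma / mean_gap) ** 2.
    """
    return math.ceil(2 * (z * sigma / mean_gap) ** 2)

print(samples_needed(mean_gap=2.0, sigma=1.0))  # well separated -> 2
print(samples_needed(mean_gap=0.1, sigma=1.0))  # heavy overlap  -> 769
```

Shrinking the gap by a factor of 20 raises the requirement by a factor of roughly 400, because the sample size scales with the inverse square of the gap. That is the statistical side of the problem; the cost of each additional medical image is a separate burden on top of it.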


Which raises the question: why don’t we look for examples on the margin, and build a dataset around those? And that’s a great idea, but the physics of the data generating processes come into play. For example, it is easy to take a digital photo of a cat or a dog. Pets are abundant and so are digital cameras, so we can easily set up a situation to generate a photo of a dog running in the grass, and a similar view of the grass without the dog. But how hard would it be to get an image of a lung with pneumonia, and a similar lung before or after disease? It is a relatively common clinical occurrence, but it still requires relatively expensive machinery, in a clinical setting, where you need to control for radiation dosage, safety, imaging platform, and cost. You need a consenting patient who understands how their data will be used, someone who can operate the machine, someone to take care of the patient, someone to pay, and someone to interpret it. Clinical imaging machines are sensitive, complex, expensive machines that tend to emit proprietary file formats; you can’t just “share to Instagram” from any available smartphone, and even if you could, it wouldn’t be readable. And ionizing radiation presents a health risk, so one needs to balance exposure versus benefit. So it’s much more difficult to get a medical image than a photo of a cat or a dog.


And there is still a big difference in ability to interpret the different imagery; it’s much more expensive to get the radiologist to make a judgement and encode it in a way that non-experts can understand, in terms of the diagnosis and context. You have to handle that image with respect to HIPAA or other regulatory requirements. You need to consider the patient and doing the best for them, and the fact that it’s their personal data. You need to be aware of the impact it might have. In the case of a chest x-ray there’s a reason that the test was ordered, it cost someone money and is significant to them. There is an impact to both a positive result and a negative result.


So to summarize, from a software plumbing perspective, there’s very little difference. You get an image, feed it into a model, and get a classification result. From this perspective it is exactly as difficult to classify “cats versus dogs” as medically relevant tasks.



But from the perspective of actually solving the problem for the patient, and enabling the best possible decisions downstream, there is a huge difference between a task with abundant, cheap, easily interpreted input data and low-risk predictions like “cats vs. dogs” and one with unique, expensive, personal, highly contextualized input data where predictions are uncertain and potentially high impact for everyone involved.


Q: Specifically with the work that PercayAI is doing, what are the difficulties in classifying that type of biological data?


A: There are many, but let’s focus on one in particular, which is interpretation and the importance of domain expertise. There’s a spectrum of difficulty on these tasks, and an important “difficulty factor” seems to be how contextually sensitive your prediction can be, based upon the input. For cats and dogs, I mentioned how if you’re going to take a photo and then visually classify if that image is a cat or a dog, all of the information you need to make a strong prediction is contained in that one photograph: objects (e.g., dog) in context (e.g., grassy field on a sunny day). But this is usually not the case for medical imaging. For example, when we talk about a chest x-ray and whether there are indications of cancer in this chest x-ray, a very important input for a radiologist to consider is: is there a history of previous chest x-rays, and what has changed? Because we know cancer grows; it doesn’t just teleport into existence. And there are a lot of things that can look like cancer but actually are not. Maybe you had pneumonia in a past x-ray, which might look like cancer temporarily, but it clears up after treatment with antibiotics. Or if you had a sequence that showed that you had a suspicious nodule and then in a later scan it cleared, that gives you two really valuable data points. You know what pneumonia looks like in this person, and what it looks like when it clears. And maybe a new suspicious nodule grew and did not clear with antibiotic therapy. None of this context is available from a single image. The radiologist needs to use their expertise to see things occurring in a context, form a hypothesis in their mind, and test that hypothesis against the other available observations. Human vision is very powerful, and experts have a more powerful ability to contextualize and hypothesize in their domain than untrained eyes and minds.

This difference we find between easy open-domain tasks and challenging domain-specific ones is actually quite interesting.



What exactly makes someone confident they see a cat in a photo? If you ask someone to explain exactly which pixels or relationships between pixels give them confidence it is a cat, they will struggle to be specific. One might say “well it has whiskers that look like cat whiskers” ...but some dogs have whiskers too, what’s the difference you see? Or “it has a cat-like nose” ...but maybe you can’t even see the whole nose in the photo. “Well it has stripes, and dogs don’t really have stripes” ...true, but what about this photo of a dog with striped, painted fur, does that make it more cat-like? Probably not. So when you really dig in and start exploring the very specific observations in context, it becomes apparent that a strong model needs to weigh many different contextual factors, probably across many different levels of abstraction. The model has to learn how to balance all these probabilities against one another, and it is really cool that it works at all. Fundamentally, it does not require a human to look at the image and contextualize it; these relationships can be parallelized and learned automatically.


That’s why those of us who work in AI and machine learning get very excited about this trend for deep neural networks, learning these features jointly and directly from the data. It’s really neat because it sidesteps this notion of having to contextualize feature hypotheses about some inputs, which is very powerful because it solves a big blocker in complex data... there actually are no experts who understand all of biology and can dig in and make these very detailed probabilistic calibrations. What you need is data, and you need some way to understand what the predictions are, have intuition about what your model is doing, and know how to make it better and how to interpret it. Those are all really important, but it starts to remove the requirement that one has to know where and how to look at the data to find informative features and contextual relationships.


To a deep learning model looking at cats and dogs, you don’t have to explain what hair looks like; it learns this naturally as groupings of pixels (e.g., aligned edges). You don’t have to teach it what a single strand of hair looks like versus a patch of fur, because it can see lots of hair together, and that’s another pattern, a pattern of patterns. And these things emerge directly from the data, when we have a way to generalize and capture the pattern or the patterns of patterns and use it to make predictions going forward. We can improve predictions a little bit every time we see another example of data. That’s powerful. And that works for chest x-rays, and that works for cats versus dogs, and that works for ‘omics data.
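The "pattern of patterns" idea can be sketched by hand in Python. To be clear about what this is and is not: a deep network learns its filters from data, whereas the two stages below are written by hand purely to illustrate the hierarchy. The 4x4 patch and the function names `vertical_edges` and `aligned_edge_strength` are assumptions invented for this example.

```python
def vertical_edges(image):
    """Stage 1 (pattern): absolute difference between horizontally
    adjacent pixels, which responds to vertical edges."""
    return [
        [abs(row[c + 1] - row[c]) for c in range(len(row) - 1)]
        for row in image
    ]

def aligned_edge_strength(edges):
    """Stage 2 (pattern of patterns): sum the edge responses down each
    column, so edges that line up vertically reinforce each other."""
    return [sum(row[c] for row in edges) for c in range(len(edges[0]))]

# Hypothetical 4x4 patch with one bright vertical "strand" in column 2.
patch = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
]

edges = vertical_edges(patch)
strength = aligned_edge_strength(edges)
print(strength)  # [0, 4, 4]: both sides of the strand light up
```

In a trained network, the equivalent of `vertical_edges` is a learned convolution filter, and later layers combine many such responses. Nobody has to specify in advance that aligned edges matter.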

What gets difficult about ‘omics data of the sort that we’re talking about in preclinical discovery efforts across pharma and biotech, the kind that PercayAI deals with regularly, is that there’s no way for a person to look at this visually and form some kind of intuition about it. There is no physical analog one can “see” for an interaction between genes, and biology is really complex.

In a natural image, your visual system is seeing patterns and you’re learning that these are objects in the real world. They move in a 3-dimensional space, they’re coherent, you see a cat running around, its legs move, but it’s not like its legs detach and jump around the yard, then suddenly the cat teleports and appears on the other side of the yard. That would be super weird, and you would immediately recognize something is wrong! That cat is a thing, and that thing moves coherently, and we understand that it does this by using muscles and bones and these other parts of a cat. And a cat can move through space, so if the cat is far away it looks smaller to us, but that’s because it’s far away, not because the cat got smaller. If a cat spins around and you can’t see its nose anymore, it doesn’t mean that the nose stopped existing, it just means that it’s obscured. If the cat moves under a tree and becomes shadowed, it doesn’t mean the cat suddenly became striped because there’s a pattern of light and dark being cast by the leaves or something like this, it just means there’s a shading difference, or a lighting difference.

All these different variations, we as humans understand those variations are due to the environment. And as the cat moves around in that environment, things get projected upon it. As visual learners, we have been seeing and understanding these invariant patterns since our eyes started working as infants. The same thing actually happens with biology as well, but we don’t have any understanding whatsoever about the environment that these things are happening in because it is invisible to our senses. We have hypotheses, based on scientific observations, but we rely on abstractions and models to make sense of molecular observations.


We don’t see the space in which a gene exists. We see evidence of it through various sensors and readouts, but we actually can’t observe a gene as an object, moving around as it gets transcribed and translated, joining a pool of RNAs that are ultimately sequenced in an RNAseq or other expression experiment.



What we do see is patterns of dots that we can encode as an image, activation patterns in the experiment. The difficulty with this is if you just look at this 20,000 x 20,000 pattern of dots, the relationships are noisy, and they’re mixed throughout that 20,000 x 20,000 image, so it’s very difficult for our brains to see patterns in the data, compared to how easily we see objects moving coherently through space in the world around us.
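To give a feel for what a gene-by-gene "pattern of dots" can look like in code, here is a tiny Python sketch. It rests on an assumption the interview does not state: that each dot is a pairwise Pearson correlation between the expression profiles of two genes. The gene names and expression values are made up for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical expression values for three genes across four samples.
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],  # rises with GENE_A
    "GENE_C": [4.0, 3.0, 2.0, 1.0],  # falls as GENE_A rises
}

genes = list(expression)
# Square gene-by-gene matrix: one "dot" per pair of genes.
matrix = [
    [pearson(expression[g], expression[h]) for h in genes]
    for g in genes
]

for gene, row in zip(genes, matrix):
    print(gene, [round(value, 2) for value in row])
```

With three genes the matrix has nine entries and the relationships are obvious at a glance. With 20,000 genes it has 400 million entries, the values are noisy, and nothing in the layout corresponds to an object you can watch move through space.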

It’s not noise when a cat walks under a tree and gets a striped pattern cast upon it; it’s a tree shadow, and we understand that...a cat plus the context of a shadow pattern. In biology, we don’t know what the shadows are, metaphorically speaking. We see all these different things and we’re trying to figure out what those contextual factors are and how to understand them in the context of biology, when we don’t necessarily even know we’re tracking the right things, which are not obviously coherent.


In the previous answer, I talked about how predictive models can appear to have similar plumbing from one domain to another, but it can be difficult to generalize predictive performance into the new domain when the hierarchy of relationships is different. Analyzing multi-omics data is in many ways more difficult than analyzing radiology images because we don’t have an obvious physical hierarchy to assemble hypotheses. Radiology requires a lot of expertise, but radiologists understand the data generating process and the context of the patient who is being imaged: heart, lungs, tissue inside the lungs, what pneumonia looks like versus what cancer looks like, versus what mesothelioma looks like, etc. This is all conditioned on medical experts’ ability to link macro-level observations back to observations about features in the images. In contrast, for multiomics data, although there is a lot of resolution to the data, understanding the biological context and generative processes at work can be very challenging.

That’s why I am so enthusiastic about the combination of contextual models and big high-quality multiomics data. A vision model won’t work on gene expression data, but the underlying mechanisms we use to discover, test, and operationalize computation are going to be a massive improvement to how researchers get traction against the complexity of biology.


Thanks again to Dr. Dan Kuster of Cambrio for sharing your insights. In our final Q&A session, we’ll discuss what future developments in this space look like, and what scientists can expect from AI and ML in the future.


Have any thoughts? Want to learn more about our tools or partner with us? Want to schedule a demo? Send us a message below, we’d love to hear from you.


4220 Duncan Ave

Suite 201

St. Louis, Missouri




© 2019 PercayAI  
