Episode 1 Con AI Podcast Dr

Transcription

Dr. Jerry Mascioli, Chief Medical Officer of HHS Technology Group, introduces Dr. Jim Golden, a renowned quant researcher, for a discussion on the use of artificial intelligence (AI) in healthcare. They discuss the importance of data in research and how AI can enable the analysis of diverse healthcare data to generate medical hypotheses. Dr. Golden shares his background in using data analysis for various industries, including genomics and pandemic surveillance. He emphasizes the need to understand the effects of novel pathogens like COVID-19 and the surprises encountered when analyzing COVID-19 data. He also highlights the importance of improving public health data collection and analysis.

Hey there, data detectives and pandemic puzzlers, this is Dr. Jerry Mascioli, Chief Medical Officer of HHS Technology Group, and I'm your host for this brain-bending journey into the fascinating world of numbers and science. Today, we're peeling back layers of COVID-19 with a true data superhero, Dr. Jim Golden, a world-renowned quant researcher whose mind unravels mysteries faster than a PCR test. Get ready to crack the code on this ever-evolving virus, one insightful number at a time. Before we dive deep, let me tell you about a game-changer in the healthcare landscape, Constellation AI. Imagine an all-inclusive, comprehensive medical plus database powered by the transformative capabilities of artificial intelligence and the cutting-edge technology of HTG. Yes, folks, we're talking about a powerhouse of medical knowledge, accessible like a friendly neighborhood search engine, yet backed by rigorous vetting and the expertise of HTG. Think of Constellation AI as your telescope into the data cosmos. Lightning-fast diagnosis, streamlined care, and empowered patients, that's the Constellation AI revolution. Keep it in mind as we chat with Dr. Jim Golden today, because the future of healthcare is all about harnessing the power of data, and Constellation AI will be leading the charge.
Jim, let's jump right in. You're known for using cutting-edge data analysis to understand complex problems. How would using artificial intelligence like Constellation AI change your approach to research?

That's a great question. And by the way, I love whoever is writing your copy. I wish I could get them to work for me, because, you know, I've never felt so good about myself as after that introduction.

I wrote my copy. That's pretty good. You're going to like it, too.

So let's talk about AI, right? You and I have had this conversation a few times. We've been in the AI business for a very, very long time, and it's not like this suddenly sprung from the forehead of Zeus, you know, in 2010 when we got deep learning networks. Really, I think there's sort of two threads, right? So first of all, the reason why we suddenly have an opportunity to really leverage fundamental developments in AI is, one, because we finally have enough data to make, you know, training sets that are worthwhile for building large models. And second, we finally have enough computing power that we can actually do those large models in some reasonable amount of time. You know, you and I began our careers probably around the same time, you in medicine and myself in math and engineering. But when we were doing, you know, sort of large neural networks back in the 90s, we didn't have access to GPUs and NVIDIA chips. We were cranking those things on VAX-11/780 machines, you know, trying to run all night hoping the software didn't crash. But so now that we have access to all of this very interesting data, it's how do we think about, one, what are the questions that we're trying to answer? Two, are those questions better enabled by access to diverse kinds of data? And third, what are those models we want to put in place to actually answer those questions?
So like everything, I think it's being able to elucidate that question in a very rigorous sort of quantifiable way. What are we trying to get to? And we'll talk a little more about the project you and I are doing in long COVID, which I think is very interesting. But then, you know, this ability now to actually mine copious amounts of healthcare data, whether that's clinical data, claims data, genomic data, lab data, you know, socioeconomic behavior data. I think now we finally have an opportunity to put everything in place where we can continuously run very diverse models, look for outliers in the data, try to understand health outcomes, try to better understand what are the causal KPI factors involved in an outcome. So I think really what AI gives to this space is scale. It gives us new insights. But it really allows us to ask and answer previously untenable questions using real world data in a way that can get us to generate medical hypotheses that physicians and scientists can follow up with.

The future is just looking brighter and brighter sometimes to me in our constellation of data. Jim, you've got a really diverse background. How did you start crunching numbers to combat a pandemic? I mean, you've been in aerospace, you've been in finance, you've done a lot of things. What drew you to crunching pandemic numbers?

Well, I like to think it's because I have an eclectic set of interests, but it's probably really more that I can't keep a job. I started out classically as a physicist, mathematician, engineer. I was military. So, you know, I love aerospace. Started doing my grad work really actually in vibrations and fluid mechanics. And then, you know, that was just post-Challenger disaster. So the space program kind of dried up, funding dried up. Fortunately, when I was in my PhD program, originally I was funded under a NASA contract and that kind of went away post-Challenger.
But I was dating a girl from med school and she had just bought one of the very first DNA sequencing devices. And it was a kludgy analog device. It was even before ABI came along. This was really a sort of a four-color, fluorophore, early agarose gel device. And it really didn't work particularly well. So again, you know, those of you listening in who are younger than we are may not remember you actually had to pour these agarose gels, tap them down, get the bubbles out, load DNA in, put a charge through it. DNA would kind of move. But it was a mess. And so what I sort of figured at that time, we were looking at, could we find algorithms to better actually, you know, one, track the DNA that was moving through an agarose gel. Two, then actually read the bases that were moving through that agarose gel, you know, in the sequence. So this was just at the start of the Human Genome Program, or actually a little before the Human Genome Program. So most people at the time were using classical sort of number theory like Fourier transforms or trying to understand how to actually, you know, read an image the same way we were looking at satellite data or understanding, you know, other sort of photo images or x-rays. So we started thinking about could we actually use very early machine learning to track the DNA as it moved through the gel and read those bases. So that got me very much into early neural nets and using machine learning to try to come up with ways of, you know, actually reading the DNA that everybody was sequencing for the Human Genome Program. So that started my whole career down the path of biotech and genomics and, you know, later oncology and orphan disease and on and on and on. Long career working in a number of different biotechs, some successful, some not. A stint in consulting at both Accenture and PwC where I ran both the data and AI practice for healthcare.
That got me really interested in moving beyond just the R&D and looking at genomic data and, you know, trying to understand why one group of patients did well on a drug, one group of patients didn't do well, how to design clinical trials. So, again, it was all about where can data take us to improve the potential outcome of therapeutic medicines for patients in need. A little sidestep on Wall Street where I was leading an AI company that was attached to a quantitative hedge fund. That was great because, again, they had access to some very, very, very smart people, a lot of compute and a lot of data. From there, the chairman of our advisory board at the hedge fund, who was also the chairman of the board of the Rockefeller Foundation, asked if I wanted to come over and start looking at pandemic surveillance. You know, we were right in the middle of COVID. COVID was a bad thing. I got more worried about what comes after COVID. So how do we actually, you know, do genomic surveillance around the planet, understand what the next set of viruses are, arboviruses or togaviruses or whatever, that were going to be the next pandemic. That got me, again, becoming sort of a virus hunter. Again, more AI for viral hunting. I can bore your audience to tears talking about how we build AI systems for understanding protein motifs and viral replicases and polymerases. That's where things get really interesting biologically. But again, back to the interesting problem we're working on together. How do we understand what long COVID is? What are the effects of these novel pathogens in a human system where our immune systems are not used to recognizing these sort of viruses? What does that do from inflammatory response, auto coagulation, et cetera, et cetera? Like I said, I could go on for a very long and boring afternoon on this.

No, that's fascinating.
It's funny, when you mentioned fast Fourier transforms, I recall in the early to mid 90s in neuroanesthesia when we were doing intracranial cases, we tried fast Fourier transformation of EEG to make sure that we were doing enough neuroprotective interventions to allow the patient to not just have a successful surgical outcome, but to also return with the same personality at the same time. It's pretty interesting. So, Jim, in this virus hunting and looking at COVID, what are some of the biggest surprises you've encountered while analyzing COVID data? And did any of these patterns defy expectations or make you rethink your assumptions?

Yeah, a lot. And so what I want... So, yes, some in good ways and some in bad ways. One, when we think about viral hunting or pathogen hunting or COVID understanding, there are really different broad categories of data, right? First and most classically is just general public health data. One of the most interesting things to me, and this will be a little controversial, is how bad we are at public health. And when I say bad, I mean the CDC and the WHO. They had one job and they screwed it up royally, right? Because public health data collection isn't really a thing, you know? And I heard some stories when I was at Rockefeller. I have some old hands in this space who, you know, one guy was telling me a story about when they were looking at one of the Ebola epidemics and impact in the 90s. They literally just had a kid on a bicycle and he would just go from village to village and they would take the sticky notes with the number of Ebola counts on it, right? And that was basically all they had. And I hate to tell you, not a lot has changed, right? So, yeah, I'm still seeing those kinds of things. And so as I got more and more into the global public health world, I became extremely unimpressed with how it's done. And again, I'm not a fan of the WHO.
I think they really have screwed the pooch a lot and I haven't seen a lot of progress. So we're talking about genomic surveillance, right? So genomic surveillance is great. Genomics has come an awful long way. I've been really fortunate to work with folks like Chris Mason at Weill Cornell, one of the first, you know, and I had a call with him this morning. He's currently sequencing all the microbes on the International Space Station. And so he's got about 3,000 samples that he's started sequencing and we're trying to get access to that data because what I want to understand is how pathogens evolve in the face of cosmic radiation because, again, what I'm interested in is how do viruses evolve? Especially in light of things like climate change. So, you know, first kind of lesson was that public health isn't great. And I think it's not great because the people who do public health, who do heroic work, don't have access to the right tools. So one of the first things I did at Rockefeller is I went to a really interesting summit at the White House and I said, OK, what we need to do is build a modern data stack for public health. And the members of the White House pandemic team said, no one will know what you actually mean by that. I'm like, what do you mean no one will actually know what I mean by that? They said, you know, when you talk about a modern data stack where we, you know, use some of the best in class software like Snowflake and Databricks and some of the things you guys are building at Constellation, you know, nobody in public health uses that stuff. They basically, if they're lucky, they have a laptop. Right? And if you read Michael Lewis's book about the whole thing, that was really what put me over the edge.
So how do we actually build tools that public health doctors, nurses, you know, MPHs can actually use to, one, interact with each other, two, share data with each other, three, actually mine that data to look for interesting novel insights. So that was one thing that was very surprising to me. A good surprise is over the past six months. I set up a new synthetic biology company a couple months ago and we've been mining a lot of the public data that has been coming out of, you know, places like Sub-Saharan Africa, Brazil, where we're actually seeing climate change, human and animal migration actually lead to zoonotic spillover. And again, I'm one of those people that still believes that COVID was probably a pangolin or a bat virus that jumped to humans, probably through a wet market, but really understanding what's there. And as we've been mining that data, we've been finding some super interesting things, right? We've been finding that many of the viruses that we thought had evolutionarily conserved regions in them, the regions were not so conserved. We're actually looking through the data assets, the genomic data assets, like SRA and stuff with NCBI, and we're not finding the numbers of pathogens we thought should be there. And so one of the main things that I've been working on for the past few weeks is we think some of the search algorithms are incorrect that we've been using for the past 34 years, especially things like BLAST. When you look for these, you know, viral sequences, the sequences that we're finding are incorrect because they really sort of, you know, the way the model replicates works. And I'm not a virologist, but I'm very fortunate to work with some great ones. The genomic sequence is, you know, showing up one way, but the actual protein transformation and the structure and folds are entirely different.
So when we do a sample, do the sequence, load the sequence in a database, and then go search for the virus, you don't find it. So those sneaky little bastards are actually hiding themselves much better through polymerases and replicases. And so we've started working on a new AI model where we use protein structure as a token for training a gen AI. And when we started doing that, we started finding a lot more viruses where we didn't think any existed. So that's been one of the funnest things. Long-winded monologue. Let me kind of sum it up. Public health data is poorer than I thought. Molecular biology is richer than I thought. How do we figure out how to put this together using data and analytics? Then I think we can actually make whole products.

I would agree with you that Michael Lewis's book, The Premonition, is a fascinating but deeply disturbing read on many levels. And we'll have to save the WHO and CDC conversation for a steakhouse and a good bottle of Brunello.

I've got some really good friends at the CDC. Again, Dylan George, who now runs the Health Outcomes, is a great guy. Really good guy. Comes from Ginkgo and other places. I mean, there's some really good people there. I'm not exactly sure what happened. I'm not a policy guy. But oh my God, did we screw something up.

Yeah. So, Jim, let's talk about the thing that we're starting to work on now, what I think is the most pressing question out there. Long COVID. Why don't you describe to our listeners what we're looking for and how we're proceeding. Because in my mind, this is not just a medical issue, but this is a societal issue. The economic impact of long COVID on the United States could, in my mind, not to be hyperbolic, create a near depression and a lack of productivity by a large number of people and a lack of productivity by people who have to care for those who have truly symptomatic long COVID. Love to hear your thoughts around that.

Yeah.
And you and I have had a lot of very animated conversations about this, which is, I mean, it's still unresolved, which is both good and something we need to make progress on. So, what is long COVID? So, you know, I'm an engineer, so I'm pretty reductionist about things. So, you get human beings in all their various shapes, forms, and sizes around the planet, different genetic makeups, different immunological exposure, different antigens and antibodies, and we get exposed to this fairly novel virus. I mean, coronaviruses have been around for a while, but so we get exposed to this new virus and things happen to us, right? And again, this virus is mutating from, you know, Delta through Alpha through Omicron through JN.1, BA.2, you know, so the virus has continued to move and adapt. And we're seeing various different responses. Some people, you know, got COVID and were a little sick for a while and recovered perfectly fine. Others got COVID and had all sorts of, you know, after effects. So, I think the formal definition, I think, that you and I, when we go look at ICD-9 and 10 codes, is really anyone who's had COVID for more than, what is it, like 12 weeks or something like that? I don't know. I've seen 12 to 14 weeks. So, what does that mean? Well, you know, does it mean they test positive for the virus? We know that people now, in fact, what did I just see? There was a new CDC issue today that said if you don't have a fever, you can go to work even if you're COVID positive, right? So, what does that mean? So, do you have subviral reservoirs? Are you shedding dead virus that's making the antigen test pick up? Are you still containing and producing antibodies? So, long COVID for me sort of comes into two big buckets. And I think we have to look at at least two buckets. One is people that simply can't lose the virus, right? Very much like HIV patients, hepatitis C patients, et cetera. People who have the virus and do not clear, the body does not clear the virus.
Second, people who have had an extreme immune response. And if you remember in the early days, as a physician you definitely remember this, the cytokine storm, organ failure, right? All kinds of secondary, you know. And then, I think the Delta variant was more sort of deep chest where Omicron is a little more larynx and JN.1 is more perinasal. So, you know, we got people that have the virus and don't shed it. And then we got other people that just had extreme immune responses. So, you know, anybody who's dealt with autoimmune disease, you know, when your immune system kicks in, that can go on for quite a while. You know, I've got a type 1 diabetic daughter. And so I've spent a lot of time over the last 10 years learning about, you know, what that means to have an immune response. And again, there's a genetic component. There was an infection component. So, long COVID is those who are having, to me, severe damage from having been exposed to COVID. And as you know, and as our friends at HTG know, that can lead to renal damage. It can lead to ocular damage. It can lead to early onset dementia. So, for me, when I started maybe a year ago thinking about where did we want to generate novel hypotheses in long COVID that we could go after, I just started looking through the literature. And it was basically tied to organ systems that had high, very dense capillaries that had been damaged primarily by, you know, a swarm of white blood cells in response to an infection. So, you know, that could lead to forebrain damage. You know, there's a lot of theories about people who lost taste and smell. Is that actually something about, you know, the nose and mouth? Or is it actually brain damage in that section, you know, that may come from small capillary bursting? That's a very important question. Kidney is, you know, a big focus for the work HTG is working on, you know, renal damage based on having COVID more than once. Maybe vaccinated, maybe not.
Maybe boosted, maybe not. What does that mean for the future of dialysis? So, when I think about long COVID, I think about a bunch of different things that are important for us to work on. One is, what is the burden of disease and related disease, right? How many more dialysis centers do I need? Are we going to need more memory care centers when people begin having early onset dementia in their 50s because they had, you know, two encounters with the COVID virus? And then those who can't clear the virus, what does it mean for antiviral therapy? So, you know, whether it's Paxlovid, which is, I think, a protease inhibitor. If I look at, you know, a nucleoside analog, what are the drugs that are needed to actually help us clear that virus? But what I'm really interested in, what are the long-term immunological effects? What is the likelihood of setting off other immunological diseases like diabetes, MS, arthritis? You know, those are things I'm really interested in. And that's the work I think working with Constellation AI is going to be really cool because we have access to all of that data, both the lab, you know, and the payer data. So I think we can make a lot of progress there.

I couldn't agree more. Jim, some of these problems seem incredibly daunting. And we've got better tools now to get through the mountains of data. You know, for our listeners out there who are younger data scientists, how do you stay motivated in the face of all these challenges? And what advice would you give to aspiring data scientists?

So, it's a great question. I've got a really good friend whose son is a sophomore at Swarthmore, and he is a brilliant computer science student. We talk a lot about, you know, what does he want to do with a CS math degree? You know, statistics, or is it finance? Is it actually data science? Probably he's going to end up doing video games because he's so good at it. But, you know, I've thought about this question a lot.
One, I think, one motivation is just now that we have data and compute for really hard problems, what could be more fun to possibly work on? Right? What could be more important than thinking about how we use computation, which is evolving almost as fast as a virus, to tackle these large societal problems? For me, you know, having been doing AI since the 80s, really the essence of artificial intelligence is a system that generates a hypothesis that you can go and test. Right? So, especially in healthcare, AI is not a magic bullet. It's not a black box. It's an ability to take a large amount of data, generate several defensible hypotheses, hand those to a physician like yourself, and say, which one of these makes the most sense based on your lived experience and observational data? So, I don't think of them as large intractable problems. I think of them as large, really fun problems with high impact, and we finally have enough tools that we can experiment and do some really crazy stuff. I just, you know, on Friday, I went and picked up my first NVIDIA GPU laptop. So, I had to get a custom build. It'll be here next week. First time I've ever had, you know, a quad core GPU on a laptop. So, either I'll spend the weekend doing Bitcoin mining, or I'll actually be able to run some of these protein folding AIs. But, I mean, what could be better? Right? You know? If I had the power of my generator to get, you know, not have the house fired up. But, I think it's just the best possible time. And I think it's also because of the economics, you know, the decreasing cost of compute and access to data, anybody can make an amazing contribution. Right? Anybody can make an amazing contribution. There are great teams being formed. Individuals are doing amazing stuff. There's data everywhere. You can build your own LLM, you know, using open source stuff. Python is a great, you know, language, pretty easy to code in. Everybody's kind of working together.
I think it's the best possible time to be doing this stuff.

That's a very embracing approach for young data scientists. A couple of quick rapid fire questions, Jim, before we wrap up. What was your favorite data set you've worked with, and why?

I bought a large collection of EMR data from a Catholic charity hospital group. And we were using it to predict the outcome of clinical trials, to trade financial instruments, so we could know the outcome of a trial before a pharma did. That was my favorite thing. I predicted a failed drug and shorted Pfizer before Pfizer knew what was happening. So, that was something I had done in the past. That was fun.

So, you're doing in silico research as well then.

It was, you know, it's probably not the most, you know, probably I'd take some flack for that as a use case. But, you know, I did a project a few years ago, literally called, What Does Wall Street Know About Pfizer That Pfizer Doesn't Know About Pfizer? And the answer was, a lot. That was when I realized that my, you know, quant hedge fund brothers were probably a lot more sophisticated than anybody who was currently, you know, on staff in a large pharma company. So, that was interesting.

Too funny. One last rapid fire one before we close it out. In the perfect world, unlimited resources, unlimited everything, what's your dream research project?

Eric Schmidt talks a lot. He's Schmidt Futures, which is a phenomenally interesting organization and also populated with some brilliant, brilliant people. He talked about building a global DNA observatory. And so, what he meant is, you know, can we locate genomic sort of sequencing centers or genomic collection sequencing centers around the planet to actually build a true map of life.
And when I think about, you know, extremophiles, thermophiles, interesting, evolving, pathogens, non-pathogens, replicases, polymerases, you know, basically just sequencing the planet and really getting a good handle on the tree of life because evolution is the most amazing, mischievous designer that we've ever encountered.

That sounds like a great project. That's all we have time for today, everyone. Jim, thank you so much for giving us your time today and your insights. Remember, gang, the fight against COVID-19 isn't over. We need more data quants like Dr. Golden. Brilliant minds to translate numbers into actionable solutions. So, if you're a numbers whiz with a passion for making a difference, get out there and crunch some data. And if you enjoyed this episode, don't forget to rate us five stars and share it with your fellow data enthusiasts. Until next time, stay safe, stay curious, and keep on crunching.

Thanks, Jerry.
