Welcome to The Operative Word, a podcast brought to you by the Journal of the American College of Surgeons. I'm Dr. Jamie Coleman, and throughout this series, Dr. Dante Yeh and I will speak with recently published authors about the motivation behind their latest research and the clinical implications it has for the practicing surgeon. The opinions expressed in this podcast are those of the participants and not necessarily that of the American College of Surgeons.
Welcome to The Operative Word, a podcast from the Journal of the American College of Surgeons. I'm Dr. Dante Yeh, one of your co-hosts for this series. In this episode we will be taking an in-depth look at the current article, "Quantifying the Prognostic Value of Preoperative Surgeon Intuition: Comparing Surgeon Intuition and Clinical Risk Prediction as Derived from the American College of Surgeons NSQIP Risk Calculator." I'm honored to be joined by the first author, Jayson Marwaha, MD, MBI, and senior author Gabriel Brat, MD, MPH, from Beth Israel Deaconess Medical Center. Doctors Marwaha and Brat, thank you for joining me today. Before we begin, do you have any potential conflicts of interest to disclose?
I have one brief disclosure: during the course of the study, I was funded by an NIH National Library of Medicine T15 trainee grant.
I'd just like to disclose that I have ongoing funding from the National Institutes of Health.
Great. Thank you very much. Dr. Marwaha, can you give us a brief summary of your study design and describe your main findings?
Yes, absolutely. And Dr. Yeh, thanks so much for having us on the podcast. We're both really honored to be here.
So, again, as you mentioned, the title of our study was "Quantifying the Prognostic Value of Preoperative Surgeon Intuition." It was published a few months ago in the Journal of the American College of Surgeons. And the objective of our study was to build out a quantitative understanding of surgeon intuition, particularly as it relates to the surgeon's ability to predict what will happen to their patients after surgery.
And the motivation or inspiration for the study actually came largely from the NSQIP risk calculator developed by the American College of Surgeons, which we consult very routinely when making clinical decisions about patients.
And I'm sure many of our listeners are familiar with the prediction screen on the NSQIP risk calculator, where in the bottom right of the screen there is a little widget called the Surgeon Adjustment of Risks widget. That is essentially a dropdown menu that allows a surgeon's intuition to modify the output of the risk calculator.
So if, for example, you're sitting in front of a patient and you want to estimate their risks of postoperative morbidity and mortality, and your intuition tells you that those risks are somewhat higher or significantly higher than what the risk calculator predicts based on the quantitative data that you give it, then you can modify its output accordingly.
And the reason this function in the risk calculator served as a source of inspiration is that we wondered: is there a scientific, precise, or quantitative way to actually adjust for surgeon intuition when predicting what will happen to patients after surgery?
And what we found, when we did a deep dive on how this surgeon adjustment of risk function works, is that it adjusts for surgeon intuition somewhat arbitrarily. When you look under the hood of the risk calculator, it actually just bumps up the patient's risk of postoperative complications by one standard deviation.
Now, does one standard deviation of risk truly, properly adjust for how a surgeon thinks about a patient? Unclear. And so that's the question we set out to ask: is there a way to precisely adjust for surgeon intuition when predicting what will happen to patients after surgery? In order to do that, we conducted a retrospective cohort study and we collected two sources of data.
One was NSQIP data from our institutional registry at the Beth Israel Deaconess Medical Center in Boston, Massachusetts. Over the course of a number of years, we've collected retrospective data on NSQIP variables for patients undergoing surgery at our institution. That was one of our data sets. The other source of data was prospectively collected, and that was actually surgeon intuition data.
And the way we measured that is, for those same patients that we were collecting NSQIP data on, we also sent a one-question questionnaire by text message to surgeons at our institution right before they were about to operate on that patient. And we essentially asked them the same question that the NSQIP risk calculator's Surgeon Adjustment of Risks widget asks, which is, "How would you estimate this patient's postoperative risk of morbidity and mortality?
Would you say it's about average for patients undergoing this procedure? Would you say it's significant? Would you say it's higher than average or would you say it's lower than average?" And based on their response to that one question, we got a rough understanding of what their intuition was about this particular patient.
And then once we had collected those two sources of data, the retrospective NSQIP data and the prospective intuition data, we combined them in various combinations to train various prediction models on postoperative outcomes for these patients.
And our main findings were that, unsurprisingly, when we trained a logistic regression model on NSQIP variables alone, we developed a model that had predictive performance roughly similar to that of the original NSQIP risk calculator as reported when it was originally developed by the American College of Surgeons. The AUC, or area under the curve, which is a measure of predictive performance, was 0.83, which is roughly similar to the 0.82 that the NSQIP risk calculator reports. Now, when we trained a model to predict patient postoperative outcomes based on surgeon intuition alone, without any NSQIP data, we found that it was an independent risk factor for patient postoperative morbidity and mortality. We found that the area under the curve was about 0.7.
And what's interesting about those two results is that the model trained on NSQIP data alone predicted postoperative outcomes statistically significantly better than the model trained on surgeon intuition alone. And then the third model we built was one that combined the two sources of data. So we brought together the NSQIP risk calculator variables and combined them with our surgeon intuition variables.
And interestingly, we found that the predictive performance of this combined model, which included both quantitative data about the patient and the surgeon's intuition, actually did no better than the model that was trained on the NSQIP data alone. And so those were our main findings. And we'll go into implications and our takeaways from the study from there.
Great. Thank you.
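For listeners who want to see what the AUC figures quoted above measure, here is a minimal sketch in Python. It is not the authors' code and uses no study data: the eight patients, the model probabilities, and the three-level intuition coding (0 = lower than average, 1 = average, 2 = higher than average) are all made up for illustration. The function computes AUC as the share of (complication, no complication) patient pairs in which the patient with the complication received the higher score, with ties counting as half.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve, computed by comparing every
    positive case against every negative case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes: 1 = any postoperative complication, 0 = none.
labels = [1, 1, 1, 0, 0, 0, 0, 0]

# Hypothetical predicted probabilities from a data-driven model.
model_scores = [0.80, 0.60, 0.30, 0.40, 0.20, 0.10, 0.15, 0.05]

# Hypothetical one-question intuition responses, coded 0 / 1 / 2.
intuition = [2, 1, 1, 1, 0, 1, 0, 2]

print(f"model AUC:     {auc(model_scores, labels):.2f}")
print(f"intuition AUC: {auc(intuition, labels):.2f}")
```

With these made-up numbers the model scores come out at about 0.93 and the intuition responses at about 0.70. An AUC of 0.5 corresponds to chance and 1.0 to perfect ranking, which is why a coarse three-level answer can still carry real signal while ranking patients less sharply than a continuous risk score.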
All right. So I'm going to try and summarize this as best I can understand. You tried to develop a model to predict postoperative complications, and then you compared it to surgeon intuition, as predicted before the operation began. And what you found was that surgeons were okay, right? An AUC of like 0.7, which is not bad, not great, but not bad.
The model that you created did better than the surgeons, and it didn't do any better when adding in the surgeons' intuition. So, okay, I'd like to ask you about the primary outcome, the endpoint of complications after the operation. This was a composite, right? Do you have a list? Can you tell us what sort of complications you were looking for?
So overall, we basically used the NSQIP outcome list. We used both the mortality and morbidity outcome complication list that is used for the NSQIP data. So functionally, what we did is we collected the NSQIP outcome information for all the patients that were enrolled in the study.
All right. Got it. And so I have it in front of me, it looks like.
We are looking at, for example, superficial surgical site infection, deep incisional surgical site infection, organ space surgical site infection, wound disruption, pneumonia, unplanned intubation, pulmonary embolism, mechanical ventilation requirement greater than 48 hours, and a whole slew of others. I was thinking about something while you were describing the study, Dr. Marwaha.
So I'm asked as the surgeon, is this patient average risk, greater than average risk, or lower than average risk? And I'm trying to think in my mind, when would I ever say, oh, this is a lower than average risk patient? How common was that in your study for a surgeon to say, oh, yeah, this is going to be super easy and this patient has lower than average risk?
Yeah. So in our data, we found that the most common response across all respondents was average risk.
About 45% of respondents, when responding to our questionnaire preoperatively, said that the patient was average risk when compared to other patients undergoing this procedure. The second most common was higher than average risk, and that represented 40% of responses. And then, as your intuition correctly predicted, the least common response was lower than average risk.
About 15% of respondents said that the patient they were about to operate on was lower than average risk for morbidity and mortality. Now, if I had to posit an explanation, some element of why this pattern emerges is probably that the majority of patients in our dataset were collected from the Emergency General Surgery Service at Beth Israel Deaconess Medical Center, where patients generally are quite critically ill.
I'd imagine that, and we're working on this now, as we expand to other specialties and other pathologies, the distribution of responses may change.
And also the surgeon intuition may be different if your sample was predominantly a certain subset of surgeons with a certain shared mindset. You know, you may find that elective endocrine surgeons may be better or worse than acute care surgeons.
I think that's actually a really great point. One of the clear characteristics of this study is that it was heavy on EGS/ACS surgeons, who see a certain type of patient who is at high risk for having a post-surgical complication, and other specialties may have different ways of evaluating risk. That was not captured as part of the study.
I completely agree with that. You also surveyed surgeons at various stages of experience.
Can you tell us a little bit more about that?
Yeah, absolutely. So that's another very interesting aspect of our study. In one of the subgroup analyses that we performed, we examined how the intuition of attending surgeons compared to the intuition of resident surgeons when they were responding to this questionnaire.
And so, to get a better understanding of that, we built two separate models, one that used attending surgeon intuition only to predict postoperative outcomes and a similar one that used resident surgeon intuition only to predict postoperative outcomes. And we actually did find that the attending-only model significantly outperformed the resident model at predicting whether the patient would experience any postoperative morbidity and mortality.
Now, while that was an interesting finding, the next step in our analysis was to see if the combined model that included intuition and NSQIP data significantly improved when attending-only intuition was incorporated, and unfortunately it didn't. So a model that was trained on both NSQIP data and attending surgeon intuition data still did not outperform a model that was trained on NSQIP data alone.
Great. Thank you. So in reading your manuscript, I was intrigued by a comment that you made in the introduction. You mentioned how several common cognitive biases and counterproductive heuristics affect clinical decision making. Can you give us some examples?
In terms of the overall push for this study, there is an increasing amount of data, especially in the last few years, that the way in which we make decisions in some cases can be affected by information that is external to the patient. So instead of taking every patient individually, what we do is we fall back on experiences we had previously and we use those to make decisions. And sometimes that can actually lead us in a poor direction. So one great example of this is something called recency bias.
So there is a really nice paper that was published in Science a couple of years ago that showed that the likelihood of having a C-section was related to whether an OB-GYN had a complication with the previous patient. And we all know this to be true: our recent experiences affect our ability to be objective in the way that we evaluate any given patient. So there are multitudes of examples, one being recency bias, where we're not fully using the information that's in front of us. We're affected by our experiences in the past.
Yeah. Thank you. Thank you for that example. I definitely recognize when I am affected by recency bias, and I'm sure that affects all of us.
Jayson, do you want to mention another great example, if you're okay with it? Because I think this is such an important aspect of understanding: we sometimes overlook these things that we all know inherently.
And yet when they're made explicit, they seem really clear to us. And so, Jayson, do you have another example you want to bring up?
Yeah, absolutely. I think there are lots of really interesting examples. And, Dr. Brat, your example of recency bias is one that is, I think, very relatable to surgeons and surgical decision making.
To take a step back, when we were first starting out this study and reading up on other people's work on intuition more broadly, we found, as Dr. Brat just briefly mentioned, that for human intuition more broadly, not specific to surgeons, one big element of how we make sense of the world is these cognitive biases and heuristics.
And essentially they are mental shortcuts that help us grapple with large amounts of data when we're confronted with it. Sometimes those shortcuts are very helpful. Sometimes they can be counterproductive. And they're not always counterproductive, as the recency bias example seems to suggest. One very useful example of where heuristics are incredibly valuable is among trauma surgeons.
So when you're in a data-poor environment such as a trauma activation, and you need to fall back on common patterns and on quick decision making, then these types of mental shortcuts can be incredibly valuable and in some cases lifesaving. However, there are lots of documented examples that other people have studied in other fields where these kinds of mental shortcuts can potentially be counterproductive.
And one additional example, in addition to the recency bias one that Dr. Brat just mentioned with obstetricians, is this idea of left-digit bias. There was a really beautiful example of this that was published in the New England Journal of Medicine a couple of years ago. What they wanted to expose was cognitive biases in surgical decision making for patient selection for patients undergoing a CABG.
And what they found is that the numerical age of the patient has a disproportionately large influence on patient selection for recommendation to undergo a CABG versus not undergo a CABG. To add a little bit more detail to that, what they found was that patients who were 79, but just about two weeks away from turning 80 years old, were significantly more likely to be recommended to undergo a CABG than patients who had just turned 80 two weeks ago.
So this idea of being in your seventies or being in your eighties, even though these two cohorts of patients were only four weeks apart in age and therefore probably not very physiologically different, played an undue role in a surgeon's ultimate recommendation to undergo the surgery or not.
And so that's another example of how these mental shortcuts that we fall back on when confronted with lots of data can sometimes influence how we make decisions, even if we're not making decisions based on the most relevant pieces of information about the patient sitting in front of us.
Yeah, I think the marketing people have that all figured out, which is why I go buy the avocado for $4.99, but not when it's $5.
Right. Exactly.
Great. Well, thanks for that wonderful example. I can see now that we're plagued by biases all around, and something like this risk calculator is very useful to help us overcome that. I have a question about the methods. I know a little bit about statistics, but there was an unfamiliar term that I encountered in your methods. So can you explain to me what a multivariate lasso regression is and how it differs from a logistic or a linear regression?
So the idea here is that in certain situations we have a lot of variables and not a huge number of samples. In this case, we had to basically collect this data prospectively from colleagues across the medical center. And so it was very difficult to get a large dataset, and yet we have all the variables that exist in NSQIP that we can apply to this analysis.
And in those scenarios where you have not a huge number of samples and a large number of variables, you have to have a way to basically filter down to the relevant variables. Lasso is a form of penalized logistic regression that essentially allows us to better isolate the subset of most important and relevant variables that are related to the outcome without dramatically overfitting the model.
So yeah, actually, I encounter this a lot when I read papers.
And with what little statistics knowledge I have, I seem to remember that if you don't have enough events per variable, then you are at risk of what we call overfitting the model, meaning that your model is only specific to your particular sample and it's less generalizable outside.
So whenever I read a paper where they say, oh, well, we looked at these 30 variables and plugged them into our regression analysis, and yet there's only a total of, for example, 50 or 60 events, then that makes me really, really worried that the results are not generalizable. So you're saying that lasso is a way that we can prune through all of these potential candidate variables and focus on the ones that are the highest yield?
That's right. So it really is a method that has more applicability when you have a very large number of dimensions, in other words a very large number of variables, and you don't have a huge number of outcomes, so that you get to the smallest number of variables that still represent all the variance and all the effect that exists in the model.
Great. Thank you. Well, I'm going to start using lasso on all of my regressions from now on.
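To illustrate the pruning behavior just described, here is a minimal Python sketch of L1-penalized (lasso) logistic regression. It is a toy under stated assumptions, not the study's method or data: the 40 hypothetical patients, the two centered predictors (`x1` strong, `x2` weak), the penalty value `lam = 0.1`, and the choice of a proximal-gradient fitting loop are all illustrative, and the intercept is omitted because the made-up data are symmetric.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(v: float, t: float) -> float:
    """Shrink v toward zero by t; values within [-t, t] become exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logistic(X, y, lam, lr=1.0, iters=500):
    """Minimize mean log-loss + lam * sum(|w_j|) by proximal gradient:
    a plain gradient step followed by soft-thresholding each weight."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            resid = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad[j] += resid * xi[j] / n
        w = [soft_threshold(w[j] - lr * grad[j], lr * lam) for j in range(d)]
    return w


def make_rows():
    """Hypothetical data: 10 patients per (x1, x2) cell, with the
    number of complications per cell chosen by hand."""
    cells = [((1, 1), 8), ((1, -1), 7), ((-1, 1), 3), ((-1, -1), 2)]
    X, y = [], []
    for features, n_events in cells:
        for k in range(10):
            X.append(list(features))
            y.append(1 if k < n_events else 0)
    return X, y


X, y = make_rows()
w_plain = fit_l1_logistic(X, y, lam=0.0)  # no penalty: ordinary logistic fit
w_lasso = fit_l1_logistic(X, y, lam=0.1)  # lasso penalty

print("unpenalized weights:", [round(v, 3) for v in w_plain])
print("lasso weights:      ", [round(v, 3) for v in w_lasso])
```

In this toy setup the unpenalized fit keeps a small nonzero weight on the weak predictor, while the penalized fit shrinks the strong predictor's weight and sets the weak one to exactly zero. That exact zero is the variable selection being described; how many variables survive depends on the penalty strength, which in practice is usually tuned by cross-validation.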
So obviously, there are downsides to using any statistical method. There are alternatives that don't have some of the downsides of lasso. We don't necessarily need to go deep into detail, but one downside of these methods is that they might, for example, discard one variable versus another if the two of them have equivalent value.
And so if your goal is to identify all the variables that are relevant, lasso may not be the most valuable, but if your goal is just to understand what the performance of the model will be without overfitting the model, then lasso is an effective way of doing it.
Great. Thank you. So I want to ask you, how do you envision the results of your study being applied clinically by future surgeons?
And if it's not intended to be used by surgeons in the clinic or at the bedside, how do you intend it to be used?
Thanks. Yeah, I think that's a really important question. When we took a step back and thought about the results of our study, I think one of the most important takeaways comes from the finding that when you combine the intuition data and the NSQIP data, it's no better than the NSQIP data alone.
I think one of the biggest clinically actionable takeaways from this finding concerns the national NSQIP risk calculator and its surgeon adjustment of risks. Our study seems to suggest that there is actually no statistical value to that surgeon adjustment of risks tool.
Now, maybe there is some psychological value, such as better surgeon buy-in when they're able to exert some influence over what the model's output is. But I think our study seems to suggest that there is minimal to no statistical value to that actual adjustment. And so what does that mean, then, clinically, for surgeons when they're making decisions about their patients?
And I think one of the most important clinical implications of this is that, quite simply, the NSQIP risk calculator is a very, very powerful tool that we should be consulting often to augment our decision making as it relates to patients that we're taking to the operating room. And this shouldn't be viewed as a tool that supplants our decision making in any form.
Instead, it should be viewed as a tool that can be used to augment our decision making and essentially free us to use our judgment in scenarios where our intuition is more powerful.
Dr. Brat, any additional thoughts?
Yeah, I think Jayson addressed a really important element here, which is the fact that we need to think of these tools as what they are, which is tools.
And there are scenarios where we've been comfortable adopting tools because we know they make our lives better. We don't divide our tissue using our fingers anymore; we use a pair of scissors because we recognize that that tool is better at that specific task.
But we have difficulty applying that same kind of methodology to digital health solutions because we believe that somehow it's undermining our ability to do the thing that we're really good at, which is to think about our patients and make difficult decisions about them. And the reality is that these tools don't do that at all.
They help us make better decisions and apply our skills, our judgment and our understanding in the scenarios that really matter and then allow these algorithms to do what they do really well, which is take a huge amount of data and organize it into patterns.
So I think the implication of this work generally is that, at least in the preoperative period, we should continue to use surgical risk calculators, and really they have significant value in that respect because they're more likely to be accurate, certainly for the populations that we've evaluated in this study.
And then the one other thing that I would say is I think this work kind of naturally leads to the next question, which is where does surgeon intuition matter the most? And that's work that we're currently looking at, which is not only are we looking at differences in specialties, but are there different patient populations that are better served by surgeon intuition over a risk calculator?
And then what we've also been looking at, which I think has significant impact, is the analysis of what the risk of a patient is after the surgeon has been inside of their belly. So does a surgeon have an increased understanding of the risk profile of a patient once they've actually operated? And certainly our preliminary results suggest that that's the case.
So I think that overall, to answer your question, the importance of this type of work is to both understand the value of these digital health tools and understand their use cases, and then be confident to say that the use of these tools doesn't undermine our ability as surgeons. It actually augments it.
Great. Thank you very much. I really appreciate the time spent speaking with Doctors Marwaha and Brat today.
I encourage everyone to read this excellent paper, which was first published online several months ago but is now in print in the June 2023 issue of the Journal of the American College of Surgeons. Thank you for listening to The Operative Word. Please send us any feedback at postmaster@FACS.org.
Thank you for listening to the Journal of the American College of Surgeons Operative Word Podcast.
If you've enjoyed today's episode, spread the word on social media by using the hashtag #JACSOperativeWord. Subscribe to The Operative Word wherever podcasts are available or listen on the American College of Surgeons website at FACS.org/Podcast.
