¶ Intro / Opening
Chat with Traders, Collaboration with Quantopian, Episode 1.
¶ Introduction to Quantopian Series
Okay, what's up guys? Uh if this is your first time listening to the Chat With Traders podcast, my name is Aaron Fifield. So welcome and big thanks for tuning in. What you're listening to right now is not a regular episode. This is a special six-part mini-series in collaboration with Quantopian. Uh for those unfamiliar with Quantopian, Quantopian is a crowdsourced investment firm who provide pretty much anyone with an internet connection access to data, a research environment and a development platform to write their own algorithmic strategies.
Select authors then may also license their strategies to Quantopian and get paid based on performance. Now the overarching theme for this mini-series is a look into the workflow of a professional quant trading firm. I will say I would consider some of the topics discussed to be quite advanced, but if you're into this sort of thing, I think you'll really enjoy it.
If not, then that's also totally fine. But I feel like this first episode at least has relevance to traders and investors of all styles, because this episode to kick things off is titled "You Don't Know How Wrong You Are," and we specifically discuss numerous biases and the many ways that you can actually be wrong without knowing it.
Anyway, I want to keep this intro short and sweet because you'll hear more about what to expect from this podcast series very soon. And I still want to mention a couple of things that are important to address. So first thing, to find any of the links and resources mentioned throughout this series, you can go to Quantopian.com forward slash chat with traders to find everything there all in one place. Uh I'll also have the same on the Chat With Traders website, so whatever is easiest for you.
Uh second, like I said, after this first episode, some of the topics get a little complex at times. So if you want to ask something, you can go to Quantopian.com forward slash questions and submit your questions there. That way we can collect them all in one place and we will do our very best to answer as many as possible on the sixth episode, which is actually dedicated to Q&A. And thirdly, this series is sponsored by DataCamp.
So if you're keen to learn a programming language or further develop your programming skills, I really encourage you to check out datacamp.com. I've done a bunch of their Python online courses and they're brilliant. They also have many courses teaching R, all of which are centered around data science, stats and finance. So visit datacamp.com and create a free account to get started. Just repeat, that's datacamp.com.
Check them out. Now, without any further ado, please welcome Delaney from the academic arm of Quantopian. Let's get into it.
Alright, so Delaney, what's up buddy? How you doing? Uh not too much. I mean, I say that semi-dishonestly 'cause things are as usual incredibly busy here. But uh nothing too much this evening. I've just got this and then I'm heading home.
Good stuff. Yeah, yeah. I can imagine. You've got QuantCon coming up too, so I imagine um that's just amplified things. Yes, well thank you for giving me the opportunity to, right off the bat, start pitching this. Uh but yeah, we have a QuantCon Singapore coming up. It's the first time we've done it not in New York City.
Um so we're excited about that. It's gonna be in November, I think on the tenth or eleventh. There's a few days of events, but the URL is quantcon.sg in case anybody wants to check it out. Good stuff. Yeah, I'm really keen to make it along. I don't know if I'll make it out there to the Singapore event, but if not, uh definitely the next New York one or something. I'm really keen to make it along. But anyway, man, I'm really excited to be doing this. So just to set the scene here.
Give us a rundown on who you are, your background, and tell us a little bit about Quantopian too. Sure. So uh I'm Delaney. My last name is long and complicated. So we're you know, it's it's out there on the internet. You can find it if you want to. Mostly I just go by Delaney. Um my background personally is in uh computer science, uh statistics and and math, kind of you know, at the intersection of all those three. And
I joined Quantopian uh right out of school actually, and I currently run uh the academic arm of the business. Uh namely uh uh kind of what Quantopian does is
¶ Quantopian's Mission to Democratize Finance
We try to democratize quantitative finance. Quantitative finance is one of the fields in which the barriers to entry are just amazingly high. I think that some of them are so high that you don't even know how high they are. A lot of people, um, you know, kind of will sit there and they'll say, "Oh, well, I have this great idea for how I could trade stocks." But there's just no sense of the path that you would go through to actually get that from idea into real implementation.
So what Quantopian does is we try to provide as many of the tools and data sets that banks and hedge funds would have access to as possible and provide them to everybody in the world, uh regardless of background or geography, and uh make it so that anybody has a shot to actually try to implement some of these ideas. Personally, I actually didn't know um much finance before starting at Quantopian. My background again is in kind of uh computer science and stats and um
I was so fascinated by what you could do with this stuff when I joined Quantopian, um, that that helped me realize that a lot of people do come to our platform to learn. And uh almost accidentally, um uh I uh ran into a professor at Harvard, um actually got introduced to him by someone else who was taking his course, but I ran into him and uh basically
discovered that he was using one of the same technologies that Quantopian uses internally to teach his course, namely these Jupyter notebooks. And they're such a powerful teaching tool. Um and what we said is, hey, well why don't we use the same technology and try to teach quantitative finance to our users? Because, you know, it's one thing if you have a great idea already and you're educated, but
Another barrier to entry is that nearly nowhere teaches quantitative finance. Um and since we want people to be uh, you know, writing great strategies on our platform, we want people to be educated in doing that. So we've actually created, uh kind of through the process of working with a lot of professors at schools, including uh Harvard, Princeton, MIT,
uh MIT Sloan specifically, Cornell, um, and many others that I don't have you know time to name, uh, we developed the Quantopian lecture series, and that's where honestly I think a lot of the content that we discuss uh over the next podcasts is going to be drawn from. The lecture series is kind of designed to bring you from knowing nothing about quantitative finance, but knowing a little math and a little programming, to really actually being able to compete
uh with a quant uh in industry. And again, like the goal of this is, uh, Quantopian's primary business model is to allocate money to the best ideas that are produced on the platform. And uh so we actually recently um had some big news in which uh we entered into an agreement with uh Point72 in which they would give us, uh through a separately managed account, access to up to two hundred and fifty million dollars,
uh subject to some constraints, um to allocate out to our users' algorithms. And so now the idea is that we're trying to grow the size of allocations that we can make to algorithms on our website and uh pay the users a percentage of the profits that they make on the algorithm. So that's our main business model, is doing a profit sharing
um on algorithms that are sourced from our community. And kind of our dream user, our dream use case, is someone who comes from a background or region where they traditionally would just not have access to quantitative finance. They just wouldn't be able to do this, but through Quantopian um are able to kind of, you know, work hard, be smart, design a really good strategy, and actually get an allocation and make kind of on the order of the same amount of money that a professional quant would uh for doing the same work. So our goal is really to make it way more meritocratic than it currently is and try to flatten out the playing field um within quant finance.
¶ Open-Source vs. Secrecy in Finance
Yeah, right. Now just to pick up on a few things you said there. Right back at the very beginning you mentioned that the barriers to entry, uh, to becoming a quant
are very high in most cases. Why are these barriers to entry so high? Like is it the knowledge that's necessary? Is it the expensive data that's required? Is it the tools? I mean, is it a mixture of everything? Why are the barriers so high and unattainable for many?
I think it's an all of the above circumstance. There's a lot of industry forces in finance which push towards secrecy and push towards not sharing what you have in any way. And if you look at the way that traditional finance firms have operated, uh, they've operated in this way that like all intellectual property that is generated at that firm belongs to the firm.
And it is incredibly rare for a firm to open up that intellectual property in any way. Um that is kind of a firm's lifeblood in many cases, and in the traditional business model,
uh your, you know, activities are highly dependent on your ideas being new and your ideas being unique and that nobody else can figure them out. What that's led to is this circumstance in which, um, not only in many cases is the academic field kind of far behind industry practitioners, or just kind of going off in another direction that's not really related, uh but really you just don't even know what has to get done. Um, you know, things that would be kind of
absolutely essential to a quant workflow. And as an example, I'd say like uh strategy capacity analysis, in that every trading strategy, you know, has um a minimum and maximum capacity that you can put through it. And quants do all sorts of analyses on that capacity to kind of figure out how much they should be allocating to each strategy in their portfolio and how they should adjust that over time and things like that. If, you know, you're not already in that community,
you just might not even have any idea what has to be done. And you know, there's so many sticking points that you can get stuck on when you're trying to do this on your own. Yeah. So you brought up the fact that um, you know, there's a lot of secrecy that surrounds this. You know, a firm does need to be somewhat secretive about their operations though, don't they, to you know, be viable and stay uh in the game essentially. Is that the case? Like, that level of secrecy is necessary?
So I mean, if you're assigning secrecy some kind of linear score between zero and one hundred, and you make the argument that some level of secrecy is necessary, I would say yes. But where it falls in there is highly dependent on the business model of the firm and, you know, lots of other factors. I mean
As an example, one of the first things that Quantopian did is that we actually open sourced um pretty much most or all of our code base. Uh not all, because you know we still had some proprietary code, but kind of most of our code base. I'm not sure, but there may have been some time in the history of the business at which all of our code base was open sourced. Um but we uh have a GitHub
page. For those of you who are technical, you'll know what that means. For those of you who aren't, GitHub is just basically a service that um companies use to store all their code that they've written, and you can make it open and freely available or closed and private. And we've made uh our backtester code, many of our libraries, just open and freely available. And that was something we did right off the bat. Um and the hypothesis was that with
open source code, you know, you can have thousands of eyes on that piece of code and thousands of people looking at that code and finding bugs and making it better. Um and that's what we observe. We have, you know, I don't even know how many views we've had on Zipline, which is our backtester, our Zipline uh code base. And uh as a result, you know, like four years times
you know, thousands of eyes looking at that code have produced so many bug reports, from, you know, having that thing be freely available and open. And I just can't imagine that a system that's written internally at a hedge fund and maintained by five engineers who look at it maybe, you know, like eight hours a day each, um, I just can't imagine that that system would be, you know, in any way bug free, because
uh we have found so many bugs and we're still finding bugs in our system, you know, after four years. And uh a bank that, you know, has two years times five engineers looking at this thing, there's just no comparison. Yeah, and that actually reminds me, uh you know, on the Quantopian platform, uh users use Python as the programming language.
Uh, Pandas, which is a Python library, uh, if I understand correctly, that was originally developed by a person, uh a man who worked within a hedge fund, is that right? Yeah, that's exactly right. I believe that's Wes McKinney you're talking about, if I'm not getting my people confused. And uh he was working at a hedge fund and he was saying, well, Python's great for most of what we have to do.
But, you know, there's some key things that are lacking. And so he said, well, why don't we just build those key things that are lacking? And these are kind of general basic tools that could benefit everybody. And I'm talking like people in finance, people in science, people in data science, all across the field.
And uh so he built those tools, he made it open source, and as a result, Pandas has become this amazingly useful library, amazingly high quality library of code that's in use at so many funds all over the place. Um but, you know, it's not a zero sum game. Many people think of it as a zero sum game, and some very specific parts of finance, if you zoom in close enough, are a zero sum game. But in many ways it's not a zero sum game. And I think that that's kind of the
failure um of logic when people try to keep everything super secret all the time with no exception. Okay. And just so there's no confusion here, would you mind explaining what a zero sum game is?
¶ Understanding Zero-Sum Games
That's a good point. So in economics, uh specifically kind of game theory, which is a branch of economics which deals with uh what decisions one should make if you are trying to optimize some, you know, utility, some objective function, you're saying I want the most money or I want the most happiness,
um and you're trying to say what decisions should you make in the presence of other people also playing the game, often, you know, it's not just you who's making these decisions. Uh a zero-sum game refers to the case in which, um, you know, the amount that you get and the amount that the other person or other players get have to add to a single quantity. So if you get more, it means someone else is getting less, and vice versa.
And that's the notion of a zero-sum game, is that you have to compete. Um and your success hurts others and your failure benefits others.
And again, it's true in some cases, but I really don't think it's nearly as common uh as people think it is. And in fact, one interesting thing that I could talk about for hours, and we don't have time to delve into, but many people learn basic game theory, what are known as equilibriums, which are kind of optimal strategies to take, uh, in school, especially if you take like kind of intro to economics courses. And uh
in the same way that many physics theorems are proved in vacuums with no friction, those equilibriums are often proved mathematically, assuming many things. And these things include assumptions about like infinite computing power and perfect information
um and never seeing this person again ever, you know, like these are called single play games. And what computer scientists have done recently is they said that that really does not sound like a reasonable set of assumptions to model the real world.
And so they went back in and they've simulated these games in the presence of things like inefficient information transfer and uh inconsistent geographical distribution of people and all sorts of, you know, real world type conditions. And what they find is that whereas the, you know, very uh sanitary, um, I would argue unrealistic assumptions that are made in the original equilibrium proofs say that you're always better off kind of backstabbing people,
and you're rarely better off collaborating because the risk of someone else backstabbing you is too high, uh what's found is that when you re-simulate these games in the presence of real world conditions, oftentimes the reverse holds true, and cooperation and generally being nice tends to be kind of the optimal strategy in those cases.
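A minimal sketch of the contrast described here between a single play game and a repeated one. The episode does not name a specific game, so this assumes the textbook prisoner's dilemma with illustrative payoffs (5, 3, 1, 0) and two hypothetical strategies, `always_defect` and `tit_for_tat`. It only adds repetition; it does not model the information or geography conditions mentioned above. It also checks that this game is not zero-sum in the sense defined earlier.

```python
# Illustrative payoffs (an assumption, not from the episode): the classic
# prisoner's dilemma values. Each entry maps (move_a, move_b) to
# (payoff_a, payoff_b); "C" = cooperate, "D" = defect (backstab).
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def is_zero_sum(payoffs):
    """True if both players' payoffs add to the same total in every outcome."""
    totals = {a + b for a, b in payoffs.values()}
    return len(totals) == 1


def always_defect(opponent_history):
    """Backstab every round, whatever the other player did."""
    return "D"


def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"


def play(strategy_a, strategy_b, rounds):
    """Play a repeated game and return the two cumulative scores."""
    seen_by_a, seen_by_b = [], []  # each player's record of the other's moves
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(seen_by_a)
        move_b = strategy_b(seen_by_b)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        seen_by_a.append(move_b)
        seen_by_b.append(move_a)
    return score_a, score_b


if __name__ == "__main__":
    print("zero-sum?", is_zero_sum(PAYOFFS))
    print("1 round, defect vs tit-for-tat:", play(always_defect, tit_for_tat, 1))
    print("10 rounds, defect vs tit-for-tat:", play(always_defect, tit_for_tat, 10))
    print("10 rounds, tit-for-tat vs tit-for-tat:", play(tit_for_tat, tit_for_tat, 10))
```

Under these assumed payoffs, the backstabber comes out ahead in a single round (5 against 0), but over ten rounds two cooperating players earn 30 each while the backstabber facing a retaliating opponent earns 14.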
Right, right. Okay. Again, you can tell I could rant about this for a while 'cause it is a sticking point for me that so many people go through these intro-level courses and graduate into life thinking that, you know,
uh screwing over other people is the best thing to do. Okay. No, that's excellent. Now, just to pick up on something you brought up earlier, uh the two hundred and fifty million dollar investment from Steve Cohen at uh Point72.
¶ Point72 Investment and User Impact
Um, what does this mean for users? Like how is this actually gonna change what's been happening sort of to date and moving forward once this um investment sort of comes into play? Sure. So I mean when we talk about um anything to do with allocation, I just have to be, you know, a little careful to uh make sure I don't say anything inaccurate because, you know, we now have to
start uh preparing to make sure that we're uh SEC compliant. Um and uh you know, they monitor any statements you make about, you know, investments, and in general also I like to be accurate about these things. But specifically what this means is that um Point72 has agreed to make available to strategies on Quantopian up to two hundred and fifty million dollars. And that's gonna be released
uh over the course of a year, uh and there are certain, you know, performance constraints that are gonna have to be satisfied um by the strategies that are picked. Uh, you know, so it's not just like uh $250 million cash is getting delivered to anybody here. It's, you know, a well reasoned deal. And uh what that means though is that uh previously um any allocations we were able to make to the strategies that we liked in the community tended to be very small.
Um and they were basically kind of test allocations that we made off of our balance sheet. And uh now what we can finally do is start ramping up the size of allocations. And uh our goal is just to ramp the size of the allocations up higher and higher until the portfolios that are being managed um by algorithms on Quantopian
are kind of, you know, as big as the portfolios that you would see at large financial institutions. And therefore the people who wrote those algorithms can get paid on the same scale as people at the large financial institutions. And again, all of this is contingent on the strategy itself performing and Quantopian being able to secure um, you know, uh capital to allocate out to algorithms, but that's our goal.
¶ Series Theme: Quant Workflow Insights
Yeah. No, that's really excellent. I mean, um, it's a huge win for you guys and obviously uh users of the platform too. So no, that's very cool. Um now let's get into this. So, you know, this is gonna be the first episode of six. We're gonna be doing another four episodes after this and then a Q&A uh at the end.
Um, what's the overarching theme of this series? Obviously I've given a quick summary in the pre recorded intro for this episode, but let's hear your take on what someone can expect to learn from listening to this series. Sure. So like I said, I think what we're trying to do here is we're trying to give more windows into the workflow of a professional quant or a professional quantitative finance firm.
And some of this is really complicated stuff. You know, I'm not gonna sugarcoat it. Some of this is really hard stuff. And you know, you shouldn't expect to listen to a podcast and then, you know, come out being able to uh do some kind of multivariate portfolio optimization analysis. But um the goal is to kind of give you some holding points, give you some skeleton, such that
uh if you're interested in learning more, um you kind of see some of the paths that you can take, and point you to resources that you can use to start fleshing out your knowledge of the system. So again, like, the goal is gonna be to give you like a broad understanding of the lay of the land uh and how quants work and how they're different from other traders and uh you know what they do and what they don't do and what they worry about at night,
um but you know, not necessarily to be able to execute on any of the techniques that quants use, because honestly that just requires more time to learn and more hands-on experience. Um one thing I will say uh just right off the bat is, uh, Quantopian again maintains this lecture series uh as a resource in addition to tutorials. And if you go to Quantopian.com slash learn,
uh slash learn will get you to a page on which you can see tutorials if you just want to learn like how to get started on the platform. And also it will have a link to the lecture series where you can kind of get at more of this stuff in a much more hands-on way and actually play with the code and see what happens. In addition, we're also trying to set up uh a post on our blog
uh so that we can kind of uh collect any of the resources and links that we talk about in these podcasts and make there be kind of a single source for people who want to just go there and look at it all at once. But that's something we'll be setting up
um at some point, and depending on the magic of pre-recording, it may or may not be available at the time you're listening to this podcast.
¶ Quantifying Uncertainty: Multiple Comparisons Bias
Yeah, we'll definitely sort something out in that regard. But anyway, the title of this episode, being the first episode, is "You Don't Know How Wrong You Are." Do you want to explain why we've titled it this?
Well, mostly 'cause you don't know how wrong you are as the listener. Um and uh what I mean by that is, one of the main differences, I would say, between a non-statistical and a statistical framework for thinking about the world is that a lot of statistics is an effort to quantify certainty and quantify uncertainty, and say, more so than make this choice or don't make this choice,
really look at like the distribution of outcomes that you could have after making a choice or not making a choice, and try to estimate, hey, how certain am I that this choice is the right one? And how uncertain am I about, you know, these choices that I'm making in general?
And one of the things that, you know, I find pops up quite consistently, uh, especially when I talk to people who aren't from a quantitative background or kind of came from a non-quantitative background and transitioned into trying to do more quant stuff, is that something that you learn when you go through kind of a classic statistics education, either undergrad or graduate program,
is you learn a lot of these biases and a lot of these things that can come back and bite you and just be completely invisible if you don't know about them. And um one of the examples of this that I'll just go over briefly to start is something called multiple comparisons bias. And multiple comparisons bias, uh you know, it's a fancy name, it sounds complicated, but it refers to something that's pretty simple and kind of
I think everybody is vulnerable to this. Like even you, making your life choices, you know, in general, regardless of what you're doing on a computer with finance, are vulnerable to this. And what multiple comparisons bias means is the following. So when you are looking at some event or some piece of data
and you're trying to decide, you know, whether some other thing is true, whether this data is significant, whether it reflects some overall trend, you know, you do some kind of test, and that test might be a rigorous statistical test that you do on your computer. Uh that test might be something just inside your head that you just say,
yeah, I think this is significant. You know, it could be either. Uh it could be, you know, you get into a room, a bunch of people talk about it, and then you all end up voting. You know, these are all processes that transform that data into a yes-no decision. And whenever you have a test like this, uh you know, each test is not perfect, so there's a chance that you get the wrong answer.
Right. And so let's look at the case in which um, you know, you're looking for examples of when to trade, and your data's flowing in and you're looking for examples of when you should pull the trigger and actually place a position. And uh every time you look at that data, you're making a test. You're saying, you know, should I trade or should I not trade? And there's a chance that there's a false positive, right? There's a chance that your test
misfires and says, yes, you should trade when really you shouldn't trade. And there was something, maybe the data just happened to look like the pattern you were looking for. You know, you don't know. Something happened that caused it to be a false positive. And
because your test is imperfect, the rate of false positives is not going to be zero. You know, you're going to get some percentage of false positives. And as you make more and more tests, the overall number of false positives that you should expect to get gets higher. And what I mean by this is, let's say that the chance in every test that you get it wrong is five percent. Let's say the chance that you get a false positive is five percent. Well, if you run 1,000 tests,
you know, and that's not too unreasonable. You make thousands of decisions every day in your regular life. If you run 1,000 tests, you should expect to be wrong fifty of those times, because five percent of one thousand is fifty. And you should expect fifty false positives to pop up from a body of one thousand tests.
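The arithmetic above can be sketched in a few lines. This is an illustrative example only: the function names are made up for this note, and it assumes the tests are independent and that none of the 1,000 choices is actually a real signal, so every "yes" is a false positive.

```python
import random


def expected_false_positives(n_tests, false_positive_rate):
    """Average number of false positives when every test is really a 'no'."""
    return n_tests * false_positive_rate


def prob_at_least_one(n_tests, false_positive_rate):
    """Chance of one or more false positives across independent tests."""
    return 1.0 - (1.0 - false_positive_rate) ** n_tests


def simulate_false_positives(n_tests, false_positive_rate, seed=0):
    """Count how many of n_tests misfire in one simulated run."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return sum(rng.random() < false_positive_rate for _ in range(n_tests))


if __name__ == "__main__":
    print("expected:", expected_false_positives(1000, 0.05))
    print("one simulated run:", simulate_false_positives(1000, 0.05))
    print("P(at least one) with 14 tests:", prob_at_least_one(14, 0.05))
```

The expected count is the fifty from the example; a single simulated run will land near fifty but not exactly on it. The last line shows how quickly the problem appears: with only 14 tests at a five percent rate, the chance of at least one false positive is already a little over one half.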
Um and the danger is, let's say that you know you look at a thousand different choices and you run your test on each, you will get 50 choices that look good just based on random chance. You know, and some of them may be good. And some of them may not be good, but you know, you don't know. There's no way to distinguish between them just from the data you already have.
So that's an example of a bias that a lot of people just don't even think about when they're going through their lives, when they're trading, whatever they're doing, but is really core to how a statistician thinks about the world. And it's an example, I think, of um, you know, people just not knowing how wrong they might be when they make decisions.
¶ False Positives Explained
Right. So first thing I wanna ask you based off of that is what is a false positive? Maybe if you could just explain that for anyone listening who might be unsure.
Yeah, so I think a lot of the language in modern statistics came from how it was developed, and uh you know, there's a few major uses when it originally started, you know, getting off the ground, and one of those was kind of medical. Uh that's a natural source, you want to gather data about how sick people are and when they get sick. And uh let's say you're testing for a disease.
Um well you say someone is positive or negative for a disease, right? Let's say you're testing someone for the common cold and you have some test which is based on the symptoms you see, and you write the symptoms down and the computer you know says, here's a score, and if the score is above something, you say yes, they have a cold. If it's below something, you say no, they don't. So you know, very common workflow.
So in that case, a false positive would be when your test says, yes, they have a cold, but they don't actually have a cold. And the test saying yes was based on the data randomly lining up in such a way that it looked like they had a cold when actually they didn't. Uh and this can happen equivalently if you're saying, um, I'm gonna test to see if two stocks um move in the same way, if they're correlated, you know, and if you look at many, many pairs of stocks
and try to say, you know, give me some pairs that move in the same way so I can try to trade off of them, um, you're likely to come up with some pairs that just happened to move similarly in the past. But that's not really indicative of any relationship. And it's because the data just happened to kind of line up in that way. Does that make sense? It does, yeah. So I mean, just going back to that um example you gave there of the stocks, I think that's a really good one. So
you know, you said it just so happened that the data happened to line up and these two stocks appeared that they were correlated. I mean, why were they not correlated? Why was it a matter of fact that the data just happened to be that way, so that it looked correlated? Why were they not actually correlated? Does that make sense?
Yeah, absolutely. And I will say that as we talk about this, if anybody is interested in kind of seeing this in a more rigorous setting, we have a lecture called pairs trading uh in the lecture series, and that pairs trading lecture actually deals with a situation that's pretty similar to this. So if you want to actually muck around with real examples of this and see this all laid out in front of you, um, check out that lecture. But
we'll start with a super simple example, right? Let's say that you represent stocks as up or down, right? You don't actually care about the magnitude of the returns, but you just have up or down at each minute or day, whatever your sample frequency is, right? Um and so let's say that you're looking at stocks
and you have uh one minute's worth of data. So again, super simple example, but you know, we'll blow out from here and hopefully it should become like apparent um how this works. Let's say you have one minute's worth of data and stocks can go up or down and you're looking at two stocks, right? What's the chance that they both end up moving in the same direction?
And it's not a trick question. I mean the answer is, if each one can go up or go down with equal chance, uh then it's a fifty percent chance that they line up. And you can do out the probabilities on a piece of paper if you want to convince yourself of this. But it's basically the same as flipping two coins and saying, what's the chance that they both come up the same?
You know, not that they both come up heads, but what's the chance that they both come up the same? And so once you have that fixed nonzero probability, well let's say now you have two time points. You know, and then you say what's the chance that both stocks ended up um, you know, going exactly the same over those two time points?
And it's some probability, you know, you can work it out, and then three time points and four time points, and you extend it all out. And so now it's like to a hundred time points, right? So if you have a hundred time points and you're representing the stocks as just up or down, the chance that, you know, they're both gonna have done the exact same thing at all of those time points is minuscule, right?
¶ Spurious Correlations in Data
And so if you looked at two stocks and you said, wow, they have been exactly the same over the last hundred samples, what a statistical test is really getting at is: what is the likelihood that I would have observed this behavior if there's no underlying relationship? And the likelihood is super low. If they're just moving randomly, the chance that they would have lined up over 100 time points is super low.
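A minimal sketch of the coin-flip arithmetic described here, in Python. It assumes each stock moves up or down with equal probability, independently of the other and of its own past; the function name `prob_all_match` is just an illustrative choice.

```python
def prob_all_match(n_points: int) -> float:
    """Chance that two independent, fair up/down series agree at every one of n_points.

    At each time point the two series agree with probability 0.5
    (up/up or down/down out of four equally likely outcomes),
    so agreeing at all n_points has probability 0.5 ** n_points.
    """
    return 0.5 ** n_points


for n in (1, 2, 3, 4, 100):
    print(f"{n:>3} time points: {prob_all_match(n):.3g}")
```

Under these assumptions the probability halves with every extra time point, which is why a perfect match over a hundred points would be so surprising for unrelated stocks.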
In practice, though, they're not just moving up or down, and there's going to be some noise, right? So what you do is you look at the two movements and you ask, how similar are they? It's not just exact or not exact. It's, how much of the time do they both move up? How much of the time do they both move down? And so the probability that two stocks line up in a way that's enough to pass your test starts increasing.
It's not something minuscule anymore. Now, let's say, there's a one percent chance that over a hundred time points they move similarly enough to pass your test. So now we're at this point where there's a 1% chance of a false positive, in which you have two random stocks, but they happened to move similarly enough over the last hundred time points that they pass your test of being related, even though they're not related.
And so if you do 100 different tests on different pairs of stocks, you should expect to get one pair that pops out, because you should expect 1% of those tests to be false positives. So does that make sense? Basically, the false positive rate is the rate at which two random things can line up enough to pass your test, but incorrectly.
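One way to make the "similar enough" test concrete, as a sketch under assumptions that are not from the episode: count how many of the 100 time points the two stocks agree on, and call them "related" if that count reaches some threshold. For two independent fair up/down series the agreement count is binomial, so the false positive rate can be computed exactly. The threshold of 62 below is a hypothetical choice that happens to land near the 1% figure.

```python
from math import comb


def false_positive_rate(n_points: int, min_matches: int) -> float:
    """P(two unrelated fair up/down series agree on at least min_matches points)."""
    favourable = sum(comb(n_points, k) for k in range(min_matches, n_points + 1))
    return favourable / 2 ** n_points


def expected_false_positives(n_tests: int, rate: float) -> float:
    """Average number of unrelated pairs that pass when n_tests pairs are tested."""
    return n_tests * rate


rate = false_positive_rate(n_points=100, min_matches=62)  # hypothetical threshold
print(f"false positive rate per test: {rate:.4f}")
print(f"expected false positives in 100 tests: {expected_false_positives(100, rate):.2f}")
```

Raising the threshold lowers the false positive rate, which is the sense in which a statistical test can be calibrated.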
And different tests have different false positive rates. In statistics, one of the really nice things is that you can calibrate the false positive rates, you can test what they are, and you can know about them. And one of the problems is that if you're just making these decisions in your head or by some other method, it's pretty close to impossible to really effectively calibrate those decisions and know what your false positive rate is.
Okay. So it really comes back to the underlying relationship, and there actually being some meaning as to why those two stocks should be correlated or shouldn't be correlated, right?
Yeah. So statistics is really about testing for relationships in your data and deciding, what is the certainty I have that this relationship is real, or that it's just nonsense? And again, two things can line up so that they look like they're related when they're actually not.
And that happens all the time. A famous example of this: I forget which website does this, but if you Google weird correlations, you will find, if I'm remembering correctly, websites that show you correlations on historical data. I just pulled one up, and this website is called Spurious Correlations. Spurious is the statistics word for not really meaningful.
And they give you a random correlation. The one they give you is: if you look at the last ten years in the US, and you look at the number of people who drowned falling into a pool, and you look at the number of films that Nicolas Cage appeared in, they line up pretty closely.
Like, pretty closely, right? And you'd look at this and you'd be like, oh my goodness, there's likely a relationship there, I should look into this. But of course what's happening is that there are just so many things you can keep track of that, over that incredible universe of things, the chance that two of them don't happen to line up over some time period is super low.
So what they're doing is saying, because there are so many things I track, there are so many possible relationships, I just have to look at a thousand relationships and one will pop up that looks pretty close. And that's this notion of multiple comparisons bias.
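The effect described above can be reproduced with pure noise. The sketch below generates many short, completely unrelated random series and then searches every pair for the highest correlation. The series count, series length, and seed are arbitrary illustrative choices, and `pearson` and `best_spurious_pair` are hypothetical helper names.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_spurious_pair(n_series=200, n_points=10, seed=0):
    """Return (correlation, i, j) for the most correlated pair of unrelated series."""
    rng = random.Random(seed)
    # Every series is independent Gaussian noise: no real relationship exists.
    series = [[rng.gauss(0, 1) for _ in range(n_points)] for _ in range(n_series)]
    return max(
        (pearson(series[i], series[j]), i, j)
        for i, j in combinations(range(n_series), 2)
    )


corr, i, j = best_spurious_pair()
print(f"checked {200 * 199 // 2} pairs; best correlation {corr:.2f} (series {i} and {j})")
```

With ten "years" of data and about twenty thousand pairs to choose from, the best pair looks strongly correlated even though every series is noise.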
¶ Overfitting: A Major Finance Problem
Right. No, that's awesome. I think you've done a really good job of explaining this, Delaney. I appreciate you really fleshing this out. So sticking with the theme of this episode, You Don't Know How Wrong You Are, let's go into some of the other biases. Let's talk about overfitting. What is overfitting? And maybe let's talk about some best practices for how to avoid overfitting.
Sure. So again, if people want to muck around with this more, there's a full lecture in the Quantopian lecture series on overfitting. You can get to the lecture series at quantopian.com slash learn, or directly at quantopian.com slash lectures. And overfitting at its core, I would say, is actually one of the biggest problems that people deal with in finance.
It's similar to multiple comparisons bias, so try to keep the same kind of ideas floating around in your head if you can. The idea is that if you look at some set of data, and again, you're making decisions based on that data, some of it, if it's the right data set, is going to be useful data. You can make decisions off of it. It's related to the thing that you're trying to predict. And then some of that data is just going to be random noise. As an example of this, let's say that you're trying to model temperature in some neighborhood.
And you have a data set, and the data set is the temperatures of all the surrounding neighborhoods, as measured by the thermometers in those neighborhoods. You just don't have a thermometer in this neighborhood. And then the data set also has the number of pop songs that were played on a local radio station that day.
You know? One of those is unrelated, and it's just noise. It's pure noise. It's not related to the outcome in any way. And overfitting is when you make a model that fits really, really well to the data set you have, in fact so well that it's also fitting to that noise in addition to the real signal that lies underneath.
And it's a little tricky to get an exact sense of this without seeing an example. So again, for people who are unfamiliar, I recommend just looking up overfitting, and there's a bajillion examples of it. Like I said, the lecture series has examples of it. But the idea is that you construct a model which has 100% predictive accuracy over that historical data set you're looking at.
And if it has 100% accuracy, it has to take into account the noise that's coming from your data not being perfect. If it's able to take an imperfect data set and make a perfect prediction, then if you take that going forward, it's going to be too finely tuned to all the little noise and little things that popped up, right?
So as an example, say you're making a backtest. What you could do is say, I have a trading strategy, and I want to not be vulnerable to another financial collapse like the one that happened in 2008. And so what I'll do, and this is actually where multiple comparisons bias can come in, is I'll look for signals that seemed to match with the collapse happening in 2008. So you say, okay, well, in 2008 I noticed that housing growth followed this pattern. I noticed that the LA Dodgers had won and lost in this pattern. And I noticed that the temperature in Sacramento followed this pattern. So my model for avoiding the next financial crisis is going to take those three variables and say, if at any point these three variables line up with what happened in 2008, I'm going to move to cash.
Right? And that's a particularly ludicrous example, but consider when you start saying, okay, well, I'm going to look at the movements of Apple in 2008, I'm going to look at JP Morgan's stock price, which probably is related to 2008, Apple maybe not as sure, and I'm going to look at the movements of Pepsi-Cola in 2008. Is that related? I mean, maybe, I don't know.
What you can start doing is just looking for these incredibly specific patterns that have no bearing on what's going to happen in the future, and you've become overfit to this historical data. It's amazing how easy it is to do that and how subtle it is to detect. And it's actually one of the biggest problems that we have to deal with at Quantopian every day, because we have to differentiate between algorithms that look really good historically because they're overfit and algorithms that look really good historically because they're going to keep performing well into the future.
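To see the "100% accuracy on history" trap in miniature, here is a toy sketch that has nothing to do with real market data. The data follows a simple line plus noise (an assumed setup), and two models are compared: a straight-line fit, and a "memorizer" that answers every query with the outcome of the closest historical point. All names and parameters are illustrative.

```python
import random


def make_data(n, rng):
    """Toy data: real signal y = 2x, plus Gaussian noise with standard deviation 1."""
    xs = [rng.uniform(0, 1) for _ in range(n)]
    ys = [2 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least-squares straight line; captures the signal, ignores the noise."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_memorizer(xs, ys):
    """Overfit model: returns the y of the nearest training x, noise and all."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared prediction error of a model on a data set."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(42)
train_x, train_y = make_data(200, rng)
test_x, test_y = make_data(200, rng)   # fresh data the models never saw

line = fit_line(train_x, train_y)
memorizer = fit_memorizer(train_x, train_y)

for name, model in (("line", line), ("memorizer", memorizer)):
    print(f"{name:>9}: train MSE {mse(model, train_x, train_y):.2f}, "
          f"test MSE {mse(model, test_x, test_y):.2f}")
```

The memorizer is perfect on the data it was built from, because it has fit the noise exactly, and that same noise is what hurts it on the fresh data.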
¶ Avoiding Overfitting: Out-of-Sample Testing
So how do you differentiate between the two? What are some of the things you do to get around this issue?
So really there's two main things you can do. The first and foremost, and it's really the gold standard for avoiding overfitting, is something known as out-of-sample testing.
And again, we can go as deep into the weeds with this as we want. I don't want to necessarily bore or scare people who haven't done much with statistics in the past, but the idea of out-of-sample testing is that you do all of your model development and your strategy development on one data set, but you've saved some of the data.
You haven't run out of data. You've taken some of the data and you just haven't looked at it at all. And I mean that. You haven't even explored it. You haven't poked around in it. You've just cut it off. You've taken your Excel file, you've cut it off under a certain number of rows, you've taken that, and you've put it into a safe.
And then you do your strategy training, as we call it. We call it training and then testing. You train your strategy on the in-sample data.
And then once you've done that, you say, okay, now I think it works really well on this data set, but I want to test whether it's just fit to noise in the data set, or whether these are real patterns I found that will continue forward into the future. And so what you do then is you run your model on the testing data set, which you haven't looked at at all, and which hasn't been used in the construction of your model.
And because it hasn't been used in the construction of your model, your model cannot be overfit to the random noise that occurs in the testing data set. And so its performance over the testing data set is usually far more indicative of the quality of your model than the performance over the training data set. But the problem is that you only have a finite number of tests, and rarely does your model just work on the first pass. So generally you'll have to save several out-of-sample testing segments from your data, or wait for new data to be collected as time goes by, or something like that.
Okay. So what's a good ratio to have of in-sample and out-of-sample data? If you have, let's say, ten years' worth of minutely data for the SPY, how would you segment that data into in and out of sample?
Sure. So again, we're getting into things which are very well studied in statistics.
And I may miss crucial papers. If you're a statistician and I say something right now that's just completely wrong or completely misses a key paper, please angrily write to me, because I want to know, but not too angrily. How I would approach this is basically the following. This is where it gets into more of an art than a science to an extent, but generally some people like doing ninety-ten as a split.
One thing you have to be careful about is that with market data, things change over time. That's known as non-stationarity. And so if you're developing a model on 2000 to 2009, and then you're saving 2009 to 2010 as your out-of-sample test, well, 2009 and 2010 may just have completely different behavioral characteristics than the nine years prior, and so it may not be a great test of your model. That's just something to consider.
And so rather than splitting across time, sometimes it can be better to split across, say, tradable assets, and just take a bunch of your tradable assets and move them into your out-of-sample test set.
So that's one way to handle it. Again, in practice there's lots of resources for this online if you just look up out-of-sample testing. There's lots of people who do things like cross-validation, some people do this thing called jackknifing, some people do things like bootstrapping on limited data sets, lots of different crazy things you can do.
But at its core, the concept that I really want to get across is just this idea that you need to take some of this data and not touch it until you're done developing your model.
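The two ways of holding data back mentioned above, by time and by tradable asset, could look roughly like this in Python. This is only a sketch: the function names, the default ninety-ten fraction, and the ticker-like labels in the usage lines are illustrative assumptions, and real data would need more care (for example, keeping rows in time order before splitting).

```python
import random


def split_by_time(rows, test_fraction=0.1):
    """Hold out the most recent rows. `rows` must already be in time order."""
    n_test = max(1, round(len(rows) * test_fraction))
    cut = len(rows) - n_test
    return rows[:cut], rows[cut:]          # (in-sample, out-of-sample)


def split_by_asset(symbols, test_fraction=0.1, seed=0):
    """Hold out a random subset of assets instead of a stretch of time."""
    shuffled = sorted(symbols)             # sort first so the split is reproducible
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]   # (in-sample, out-of-sample)


years = list(range(2000, 2010))            # ten years of data, one entry per year
train_years, test_years = split_by_time(years)
print("develop on:", train_years, "| lock away:", test_years)

symbols = [f"ASSET{i:02d}" for i in range(20)]    # made-up asset labels
train_assets, test_assets = split_by_asset(symbols)
print("develop on", len(train_assets), "assets | lock away:", test_assets)
```

Either way, the out-of-sample part is set aside before any model development starts and is only opened once the model is finished.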
And the other thing that's dangerous is this. Let's say that you are developing some model and you follow this procedure really well. You develop the model, you save ten percent of your data, and you go ahead and test your model on that other ten percent of your data, and it doesn't work, so you give up. Okay. You come back two months later, you have a new idea. So you develop your model on the first nine years, and then you test it on the next year.
And it doesn't work. Okay, you do this ten times, right? Well, now we're starting to get into multiple comparisons bias territory, in which if you try enough ideas, one of them will get past the out-of-sample test purely based on random chance. There's some chance it gets through.
So it's just important to realize that these biases pervade in many different directions, and it's hard to account for all the different directions that they pervade. And you can pretty much never completely avoid these biases. You can just minimize the chance that you run into them.
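A back-of-the-envelope version of this point, as a sketch. Suppose each worthless idea has some fixed chance `p_pass` of slipping through the out-of-sample test by luck, and the attempts are independent. The 5% used below is an assumed number, not one given in the episode; the ten tries come from the example above.

```python
def prob_some_idea_passes(p_pass: float, n_ideas: int) -> float:
    """Chance that at least one of n_ideas worthless ideas passes by luck alone."""
    return 1 - (1 - p_pass) ** n_ideas


for n in (1, 10, 50):
    print(f"{n:>2} ideas tried: {prob_some_idea_passes(0.05, n):.0%} chance one passes by luck")
```

Reusing the same held-out year for idea after idea quietly turns it into part of the search, which is why the number of attempts matters as much as the test itself.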
¶ Survivability Bias and Lucky Traders
Okay. I like how you brought up the point about, you know, different sets of data. Non-stationarity, is that what you mean? Yeah. Yeah, how there's different market conditions. So if you have a strategy that you've tested from 2000 to 2009, and then you run it on 2010, there might be very different market conditions in 2010. So it's not a fair test. I really liked that point that you brought up.
Just moving on now to another type of bias, and that's survivability bias. Take us through that. What does this refer to, and what measures can a trader take to eliminate this?
So again, I'll just pick on that language point of eliminate, because, just to drive this point home, you can't eliminate these biases. They are things that are always there in some quantity, and your best bet is just to try to reduce that quantity as much as possible. With survivability bias, let's start with a simple example. Let's say that you have 1,000 coins, okay? And you flip all of those coins once. You go down in a row and you flip the first one, the second one, the third one, the fourth one, et cetera. And then when you've gotten through flipping them all, you step back and you say, okay, how many heads did I get?
Well, the answer is that if they're fair coins, and your flip is a fair flip, you should get about five hundred heads. So you say, okay, I'm going to get rid of all the tails. Just knock them out of my set. Okay, now I have five hundred coins left, and you go down and do everything again. You flip them all, and you get left with two hundred and fifty heads.
And you knock out all the tails and you just repeat this process. Well, it turns out that there is a kind of decay in how many heads you should expect after a certain number of flipping rounds. And so survivability bias refers to the fact that if you look at one of these coins and you say, oh wow, this coin has gotten ten heads in a row, well, you might not be looking at all the coins that didn't get ten heads in a row and are just never coming across your radar because of that.
So let's take this analogy and put it into traders, right? Let's say that you start with a population of 1,000 traders and they're all making trades, and let's say that they're all making random trades. There's literally no skill in this population of traders.
Well, after one year, there's some chance that you were up or you were down. And let's say it's 50-50, because these traders either place largely up bets or largely down bets with a lot of leverage. And so the traders that got the market wrong, they're all going to have lost all their money and they're going to be out of the business. They're not going to be posting on the trading forums. There's not going to be any news articles written about them.
And then you go for two years, three years, four years. It's not going to be a ton, but there's still going to be a few left who, just based on random chance, have gotten it right every year and made tons of money. And generally these are the people you hear about.
And again, I'm not saying that everybody you hear about is unskilled and just lucky. I'm saying that it's hard to distinguish. It's really hard to distinguish, because you hear about someone and they say, this person has gotten seventy percent per year for the last three years.
Well, what about all the people who didn't get seventy percent per year for the last three years, and dropped out of the running, and aren't on TV right now? So survivability bias is this very pervasive notion in finance that whenever you're looking at performance results, it can be really hard to tell if they're based on luck or they're based on something real.
And actually out-of-sample testing is one way to try to get at that. You say, okay, well, I don't know if this is based on luck, so let's test it going forward for another year and see if this guy can make mostly right trades. That's one of the ways you can try to get at it.
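The coin-flipping process above is easy to simulate. In this sketch each "trader" is a fair coin with no skill, and anyone who flips tails is removed for good; the function names and the seed are illustrative choices.

```python
import random


def expected_survivors(n_start: int, rounds: int) -> float:
    """Average number still standing after `rounds` of 50-50 elimination."""
    return n_start * 0.5 ** rounds


def simulate_survivors(n_start: int, rounds: int, seed: int = 0) -> list:
    """Simulate the elimination; returns the survivor count after each round."""
    rng = random.Random(seed)
    alive = n_start
    history = []
    for _round in range(rounds):
        # Each remaining trader independently "gets the year right" half the time.
        alive = sum(1 for _ in range(alive) if rng.random() < 0.5)
        history.append(alive)
    return history


history = simulate_survivors(1000, 10)
for year, count in enumerate(history, start=1):
    print(f"year {year:>2}: {count:>4} left (expected {expected_survivors(1000, year):.1f})")
```

After four rounds you would expect around sixty of the thousand to have a perfect record, every one of them by luck, and those are the only ones still visible.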
¶ Data Cleaning Bias and Bad Data
Okay, excellent answer. And just moving on to the fourth bias which I would like to cover, and that's data cleaning bias. What do we need to know about this one?
So this is a fun one in finance, and other people may call it something other than data cleaning bias. That's just what I call it.
So when you are using data to set up some kind of trading system, and I'm sure many of your users get data from many different places, right? Some of your users may just get daily data that they scrape off of Yahoo. Some of your users may purchase data from some data vendor.
Whatever, it comes from all over the place. And I think I might have just said user rather than listener, which just shows you how deeply dug into the startup culture I am here. But the idea is, when you buy that data, oftentimes, and I don't have statistics on this, so I'll just say oftentimes, that data will have corrections baked into it.
And what I mean by this is, let's say that your data provider comes out and they say the price of Apple today is one trillion dollars per share. You as a human say, hmm, I don't know about that one. But an algorithm that's making decisions may say, oh my god, this is the greatest short opportunity of all time.
Right? So this data says Apple is one trillion dollars per share. And then they come out the next day and they say, the price of Apple today is a hundred and one dollars per share, and we have issued a correction for yesterday. The price yesterday was actually $100 per share, and someone put their finger on the zero key too many times or something.
So that's a correction. And the data that you get from data vendors has, most times, had this pre-cleaning done to it. They've gone back and they've wiped out the cases in which they were wrong live. And so when you are testing a strategy on this data that you get from your data vendor, you will not have any of these bad things happen in the backtest of your strategy.
You won't be able to say, oh, how does my strategy do if my data vendor just gives me a completely wrong piece of information? Is it robust to that? And so you may have these inflated performance numbers based on the data that you're backtesting on being higher quality than the data that you're actually going to get in a live feed from the data vendor.
This is one of those catastrophic scenarios, often, where an algorithm could be vulnerable in some weird way to some weird extreme piece of data coming down the wire. And you could make some really horrific trade and end up losing a lot of money.
But this is just something that you will not necessarily notice unless the data that you're using has a representative amount of dirt in it compared to what you'd get normally.
¶ Handling Unclean Data in Live Trading
Okay. So why do these bad prices or these bad prints occur? What triggers that?
It can really be anything. I don't know if you've ever used a piece of software and had something weird happen, but just imagine that piece of software is producing data. They can really come from anywhere, because the stock exchanges are running a piece of software, and then oftentimes there will be one data vendor that consumes it from the stock exchange, that will then sell it to a second data vendor which packages it up, and then maybe that second data vendor licenses it to a third data vendor which sells it on a retail basis.
So this data oftentimes will change hands and go through many different systems, and it's really difficult to track where in those systems you might have weird bugs like this pop up. But they can be things that are as inane as, like I said, someone literally putting their finger on a button too many times, which I think has happened in the past, to as subtle as a rounding error in some computer software that was hard to catch, that caused the price to wrap around from one cent all the way up to infinity dollars or something. Little things like that can happen.
Right, okay. And so how does a quant deal with misprints or bad data, unclean data, in a live environment?
So there's a few different main ways to deal with it, I think. One way is basically ignoring outliers. This is a common tactic in statistics, which is just to pre-define, before you've looked at your data, a threshold beyond which you think data is probably just noise or something wrong with the system. You set those points, and maybe have those points change as the data that comes in changes. So a common thing would be to make it so that if the data is more than three standard deviations away from the current mean over the last month, then just don't look at it, pretend it doesn't exist.
That's known as getting rid of outliers. And then there's another way, which is called winsorizing, which is similar, except that instead of getting rid of outliers, you just set them to be the maximum threshold that you could feasibly observe. So if you get a price for Apple and it's one trillion dollars per share, and the standard deviation of the price of Apple over the last month has been ten dollars a share and the mean has been a hundred dollars a share, then you say, okay, well, mean of a hundred plus three times ten, which is the standard deviation, that's $130.
So I don't know what's going on here, but I'm just going to say this price is $130. And so you limit the damage to your system in that way. And there's a third and more structural way that quants will deal with this, and I think it's really the most powerful: quants are really good at admitting when they don't know stuff.
That's actually one of the things that can set quants apart: they're really good at admitting when they are wrong and uncertain and don't know stuff. And again, that gets back to this core of statistics, which is quantifying uncertainty. And if you're really good at that, you can say, hey, you know what? The chance that one stock is going to have a misprint is too high for me. So what I'm going to do is I'm just going to invest in so many assets that the chance that enough of them have misprints at the same time to really damage my portfolio is very low.
So this idea of accepting that bad things can happen, and then adjusting the structure of your portfolio or strategy based on that acceptance, is, I think, very common in quantitative workflows, and gets at, again, like I said, one of the ways in which quants might think differently from more traditional or discretionary traders.
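The first two tactics, dropping outliers and winsorizing, can be sketched with the numbers from the Apple example: a one-month mean of $100, a standard deviation of $10, and a three-standard-deviation threshold. The function names are illustrative, and a live system would presumably recompute the mean and standard deviation over a rolling window, which this sketch leaves out.

```python
def bounds(mean, std, k=3.0):
    """Lower and upper thresholds, k standard deviations either side of the mean."""
    return mean - k * std, mean + k * std


def drop_outlier(price, mean, std, k=3.0):
    """Outlier removal: return the price, or None if it falls outside the thresholds."""
    low, high = bounds(mean, std, k)
    return price if low <= price <= high else None


def winsorize(price, mean, std, k=3.0):
    """Winsorizing: clamp an extreme price to the nearest threshold instead of dropping it."""
    low, high = bounds(mean, std, k)
    return min(max(price, low), high)


bad_print = 1_000_000_000_000  # the "one trillion dollars per share" misprint
print(drop_outlier(bad_print, mean=100, std=10))   # None: pretend it doesn't exist
print(winsorize(bad_print, mean=100, std=10))      # 130.0: capped at mean + 3 * std
print(winsorize(101, mean=100, std=10))            # 101: ordinary prices pass through
```

Dropping leaves a gap the strategy has to tolerate, while winsorizing always hands it some number to work with, so which one fits depends on how the rest of the system treats missing data.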
¶ The History and Evolution of Quants
Okay, another really great answer. I like this point. Alright Delaney, so let's shift gears here a little bit. Talk to us about quants. When did they come to be? What's their history? When did quantitative trading actually become a thing?
So I'm not an expert on the timelines of how things started happening, but my general understanding is that in the 1950 to 2000 window, and really when computers started being available, is when quants started popping up. And the idea was, well, you're sitting there and you have access to all this data. Why would you make by-hand decisions on this data when you could gain a ton of information based on looking at summary statistics, trends, all of the stuff that statisticians look at in their day-to-day work?
Just as computers proliferated and became more and more available, and as more and more of this data became quickly available, more and more quants started popping up on Wall Street, and assets under management have been slowly trickling from more traditional traders towards quants. And whereas quantitative investment, like purely algorithmic trading, still makes up a smaller percentage of assets under management, what's interesting is that so many firms incorporate some level of quantitative analysis.
So it might not be that everything is controlled by an algorithm, but it might be that they have traders whose trades go to a risk system which is quantitative. It might be that they have some analyst who uses some quantitative measures of some investment's quality before they go and pitch it to the board. There's many different ways that you can incorporate quantitative analysis into your strategies. It's not just everything being completely algorithmically driven.
And the other thing is that I believe the majority of trading volume right now is driven by quant strategies, algorithmic strategies. So one thing that's interesting is that, whereas a smaller percentage of AUM is quantitatively managed, quant strategies tend to trade more. And because they're trading more, they're actually a higher percentage of market volume, I believe. I could be wrong on that; I don't actually have the statistics in front of me.
And so what's interesting is that nowadays, if you as an individual go to place a trade, you're no longer really competing against people like you, because that trade is going to be matched against, I think most likely, a trade made by some kind of algorithm, whether it's an execution algorithm or a market maker algorithm or whatever it is.
And so to an extent, when you're trying to enter into a position, you don't just have to worry about other traders anymore. You have to worry about, is there an algorithm that knows something that I don't, and is that why it's selling me this stock?
¶ Quant Culture and Academic Roots
So this is probably a good point to ask: how have quants evolved since first coming onto the scene? Are there new trends emerging in how quants are trading and looking to gain an edge?
So a lot of this stuff is really hard to know because of all the secrecy in finance. It's really hard to know exactly what people are doing inside the firms, and you can kind of try to reverse engineer it based on what departments they're hiring from and all that. But in general I think that quants haven't changed a whole ton since they got started. And part of that is because a lot of places actually just didn't think quants were as valuable as they were. And so I think a lot of quant shops got started as quant shops. It wasn't like quants went to work for a larger institution all the time.
So one of the things that has always separated quant shops is that a lot of these people are much more, not Silicon Valley, but just laid back in their culture. Quants much more often tend to be closer to scientists than they are to the more suit-wearing types you'd see on a traditional trading floor or bank, and it's just kind of an interesting, different little community that exists on Wall Street. And part of that is because a lot of them are hired from academia.
A lot of quant shops, now and always, have hired from physics and math departments and from computer science departments, because these are the people who are used to thinking about the world as a thing that you can model and a stream of data that you can work with.
So I don't necessarily think that quants have changed too much. I think that what they work on has changed. But I don't know if the roles have necessarily changed a tremendous amount over time.
Okay, sure.
¶ Exotic Data Sets: Satellite Imagery
And I know one of the interesting points we were discussing before we hit record was some of the exotic data sets that are being used by quants today. Do you want to tell us about some of these data sets, some of the more obscure ones, and how they're actually being used?
Yeah, I actually have a lot of fun with these 'cause it's just so sci-fi when you get into some of these things and start actually being like, oh my goodness, they're actually doing that. So let me tell you one of my favorite examples of something that quants came up with to try to get an edge on the market. So computer vision is actually pretty good now. And what I mean by computer vision is the ability of a computer to look at an image and tell you interesting stuff about the image. And one of the things that's actually a very well solved problem, because many different parties have been working on it, is identifying cars. Car companies have worked on it. DARPA, which is the US Defense Department's research agency, has worked on it. Now, of course, Tesla and anybody else who's developing self-driving cars has worked on identifying cars. So there's all sorts of freely available scientific research on how to get your computer to identify whether or not something in an image is a car. And so what some quants did is they went in and they said, oh, also we have Google Maps now.
Right? We have satellites. I don't think they use Google Maps, but they have satellites that get satellite imagery, and there's many companies which will sell you imagery from their satellites, kind of live imagery. Maybe they'll take one or two pictures a day or even more frequently, depending on how the satellite orbits and everything. And so what these firms did is they said, okay, we're gonna go in, we're gonna look at the satellite imagery, and we're gonna count the number of cars in parking lots outside of retail locations.
And by counting the number of cars in parking lots outside of retail locations, you can actually look for when there's gonna be changes in earnings and more accurately predict sales figures for these retail chains. You can look for downward or upward trends. You can look for shocks. Basically, for anything that doesn't match what you'd expect, you can say: in what direction does it not match what I expect, and therefore do I expect a positive or negative surprise when they actually release their figures next quarter?
So what they did is they just did that. They said, we're gonna count the cars, we're gonna say if we expect a positive or negative surprise, and we're going to enter into positions based on that positive or negative expectation. And I just think that's a pretty futuristic and crazy way of getting at this information.
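To make the car-counting idea concrete, here is a minimal Python sketch of turning parking-lot counts into a positive or negative surprise expectation. It is an illustration only, not how any firm actually does it: the function name `surprise_signal`, the one-standard-deviation `threshold`, and all the counts are made up for the example, and it ignores things a real analysis would need, such as seasonality, weather, and store openings.

```python
from statistics import mean, stdev


def surprise_signal(past_quarters, current_quarter, threshold=1.0):
    """Return +1 (expect positive surprise), -1 (negative), or 0 (no view).

    past_quarters: list of lists of daily car counts, one list per past quarter.
    current_quarter: daily car counts observed so far this quarter.
    threshold: hypothetical cutoff, in standard deviations, for taking a view.
    """
    quarterly_averages = [mean(days) for days in past_quarters]
    expected = mean(quarterly_averages)
    spread = stdev(quarterly_averages)  # needs at least two past quarters
    if spread == 0:
        return 0  # no historical variation, so no basis for a z-score
    z_score = (mean(current_quarter) - expected) / spread
    if z_score > threshold:
        return 1
    if z_score < -threshold:
        return -1
    return 0


# Made-up counts for one hypothetical retailer.
history = [[100, 110], [105, 95], [98, 102]]
print(surprise_signal(history, [120, 130]))  # lots much fuller than usual
print(surprise_signal(history, [80, 90]))    # lots much emptier than usual
```

The sign of the result is the "direction in which it does not match what I expect"; a position would then be entered ahead of the earnings release based on that sign.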
Yeah, satellite imagery is super crazy. I recently interviewed Michael Halls-Moore from QuantStart, who I'm sure you're familiar with, and he also spoke about that. It just totally blew my mind. It's just so out there. I mean, these sorts of data sets, like the example here, satellite imagery, sound very sexy, very edgy. But have these types of exotic data sets actually shown any sign of producing significantly greater returns than more traditional data? I mean, I know it sort of comes down to how you actually use it, but broadly speaking, are you able to shine any light on that?
So one thing that I will say is that my opinions here are, again, hard to back up. I don't have this data in front of me and I haven't done an analysis on this particular data set. So for any particular data set, I don't know when it was able to produce better returns, if it was able to produce better returns, or how it was able to produce better returns. But my understanding is that these types of techniques, and another type of technique which I'll talk about in a minute, these ways of getting at exotic data, were able to be incredibly valuable, to the point that my understanding is that the cars-in-parking-lots thing is actually washed out now because everybody started doing it. So it's no longer a novel piece of information. It's just kind of fully incorporated. But again, I'm not sure. I believe that's the case.
¶ Efficacy of Exotic Data and Oil Tank Shadows
And I was actually recently at a conference in Hong Kong put on by Macquarie Bank, which I'm sure you're familiar with, coming from Australia. At that conference, one of their presenters was Terra Bella, which is a satellite imagery startup which was recently acquired by Google. And Terra Bella was showing some of the new imagery techniques that they had.
And not only can they create videos, satellite videos of planes flying over the land, which is more for fun, I think, than for quantitative techniques. They were also able to do things like this: for every GPS location and time of year, you know exactly what angle the sunlight is going to be hitting the ground at, right? You can just compute that and say, okay, well, I know what angle the sunlight's coming from. So what you can do is look at the tops of oil tanks. This is just one example of what they can do. You can look at the tops of oil tanks and compute the size of the shadow on the top of the oil tank. And something that not everybody knows about these big oil tanks you drive past is that the top actually raises or lowers depending on the amount of oil that's in the tank.
And that's just so that there's not this huge amount of air in there. Probably one of the reasons is to prevent a fire risk. But the top raises or lowers depending on how much oil is in the tank. And so what you can do is actually compute, for oil refineries and oil storage locations, how much oil is in the tanks.
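The geometry behind the shadow trick can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Terra Bella's method: it assumes a perfectly cylindrical tank with a floating roof, a shadow cast by the tank wall onto the roof and measured along the sun's direction, and a sun elevation angle that has already been computed from location and time. The function names and the example dimensions are made up.

```python
import math


def roof_drop_m(shadow_length_m, sun_elevation_deg):
    """Distance of the floating roof below the tank rim, from the rim's shadow.

    A wall edge of height h above the roof casts a shadow of length
    h / tan(elevation), so h = shadow_length * tan(elevation).
    """
    return shadow_length_m * math.tan(math.radians(sun_elevation_deg))


def fill_fraction(shadow_length_m, sun_elevation_deg, tank_height_m):
    """Estimated fraction of the tank that is full (0.0 to 1.0)."""
    drop = roof_drop_m(shadow_length_m, sun_elevation_deg)
    drop = min(max(drop, 0.0), tank_height_m)  # clamp to the physical range
    return 1.0 - drop / tank_height_m


def stored_volume_m3(shadow_length_m, sun_elevation_deg,
                     tank_height_m, tank_diameter_m):
    """Estimated stored volume, treating the tank as a plain cylinder."""
    radius = tank_diameter_m / 2.0
    fraction = fill_fraction(shadow_length_m, sun_elevation_deg, tank_height_m)
    return math.pi * radius ** 2 * tank_height_m * fraction


# Made-up example: 5 m shadow, sun 45 degrees above the horizon,
# tank 20 m tall and 40 m across.
print(round(fill_fraction(5.0, 45.0, 20.0), 3))
print(round(stored_volume_m3(5.0, 45.0, 20.0, 40.0)))
```

A longer shadow on the roof means the roof has sunk further, so the tank holds less oil. Repeating the estimate across every tank at a site and over successive images gives the kind of storage picture described here.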
And what they were showing is they can do things like look at Iran, look at the refineries in Iran, and actually show where they're moving their oil around to, and look at the refineries in the Middle East and show how the oil is being moved around, and basically get the jump on big announcements of cuts to production or increases to production. Or if a company says they have more or less than they actually do, they know that, because they can actually just track where the oil is being stored.
And as a result, I think that people have actually started putting up tents over their oil drums so that these satellites can't see them. The Iranian government doesn't want the US knowing where they're moving their oil to. So that struck me as another example of something that you can do with this data, and just a new piece of information you can learn which would have been completely impossible to get hold of five years ago.
Yeah, yeah. I mean, again, it's just so crazy to think that people are actually doing this. It's just wild to think about.
¶ Upcoming Series Topics and Q&A
But anyway, Delaney, let's leave it at that for now. So like I mentioned earlier, we're going to be doing another four episodes following this one, plus a Q&A. Would you like to give a brief overview of some of the upcoming topics we'll be discussing?
Yeah, so I think the next four episodes are really designed to try to get into some of the major components of how a quant actually works in their day to day. The purpose of today, and I felt that we actually were able to hit a lot of these points, is to give people a sense of what some of the differences are between a more traditional trader and a quant: what quants do differently, how quants think, and what quants worry about. And now for the next four episodes we'll try to dive into specific parts of their workflow a little more.
So something we'll talk about is what's known as alpha factors. Alpha factors are the idea of: by valuing everything in the tradable universe, can I produce a ranking of things based on their future values that produces a trading strategy which has alpha? And if you don't understand what that means, unfortunately there's not really a great way to explain it without really getting into it, so you'll have to wait until that episode.
And then we'll have an episode on how, in reality, quants really don't trust any individual alpha factor to work well on its own. So they will actually end up combining alpha factors together, combining different methods of predicting stock movements, to try to get a better picture of how the market's gonna move as a whole and therefore be able to make better trades.
In episode four we'll talk about, okay, let's say now you have this combined alpha factor that you actually think works pretty well. Well, the next step is actually taking its output and converting it to trades on the market. And we get into this new game there of risk optimization and portfolio optimization, and the idea of how you decide whether or not to enter into a new portfolio. It turns out that quants have all sorts of definitions of portfolio risk, and at quantitative hedge funds, basically every trade they make, before they make it, will go through this extensive optimization and risk calculation, which just says: is there a better way to make the trade? And if we make this trade, will we be violating any of our risk tolerances? And if the latter is true, the trade is not made. So every single trade will often pass through one of these checks at a quantitative firm, and these things are done in the course of less than a second.
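As a rough illustration of the second question, whether a trade would violate a risk tolerance, here is a minimal Python sketch of a pre-trade check. It is a toy under stated assumptions, not how Quantopian or any fund actually does it: the `RiskLimits` class, the two limits (a per-position cap and a gross leverage cap), the default values, and the tickers are all hypothetical, and the optimization step ("is there a better way to make the trade?") is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskLimits:
    """Hypothetical risk tolerances, as fractions of total capital."""
    max_position_fraction: float = 0.10  # cap on any single position
    max_gross_leverage: float = 1.0      # cap on sum of |positions| / capital


def passes_risk_checks(positions, symbol, trade_dollars, capital,
                       limits=RiskLimits()):
    """Return True if the portfolio stays within limits after the trade.

    positions: dict of symbol -> signed dollar exposure (negative = short).
    trade_dollars: signed dollar size of the proposed trade in `symbol`.
    """
    proposed = dict(positions)  # copy so the caller's book is untouched
    proposed[symbol] = proposed.get(symbol, 0.0) + trade_dollars

    gross_exposure = sum(abs(value) for value in proposed.values())
    if gross_exposure > limits.max_gross_leverage * capital:
        return False
    if abs(proposed[symbol]) > limits.max_position_fraction * capital:
        return False
    return True


# Made-up book: long 50k of AAA, short 40k of BBB, on 1M of capital.
book = {"AAA": 50_000.0, "BBB": -40_000.0}
print(passes_risk_checks(book, "CCC", 80_000.0, 1_000_000.0))  # within limits
print(passes_risk_checks(book, "AAA", 60_000.0, 1_000_000.0))  # AAA too large
```

If the check fails, the trade is simply not made, which matches the rule described here. Real systems use many more risk definitions, such as sector, factor, and volatility exposure.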
These checks are oftentimes way faster than a second, obviously, if you're doing something higher frequency. And then episode five: machine learning is a really popular concept. It's a really popular topic all over the place, not just in finance. A lot of people constantly ask us, hey, are you gonna have lectures on machine learning? Are you gonna do this with machine learning? Are you gonna support machine learning on the platform? And machine learning is another one of those things where it's a really, really powerful technique, but it's also really easy to shoot yourself in the foot with it. So we're gonna talk about that on that episode.
And then episode six, as you mentioned, I think we're just gonna have a chance to iron out some of the kinks that maybe came up, things I didn't explain very well or others didn't explain very well. And here it depends on people's schedules, but we'll try to bring on not just myself. We'll try to bring on Jonathan Larkin, who's the chief investment officer at Quantopian. He's worked at some really large financial institutions. We'll try to bring on Jess Stauth, who's our VP of Quant Strategy, and she's actually in charge of selecting which strategies are or are not included in the portfolios that we develop.
And Thomas Wiecki, who's our in-house director of data science. He has a PhD in neuroscience from Brown University and works a lot on the strategy selection side. And also Max Margenot, and I completely mispronounced his name there, so I apologize to him. He is someone we just hired, actually, to work on the lectures. Max's background is in statistics and he can explain some of this stuff really well. So we'll try to bring those people on as well to answer some of these questions more specifically.
¶ Accessing Quantopian Resources
Yeah, it's gonna be good. It's gonna be good. So where can listeners go in the meantime to get these resources? Maybe if you just wanna give that URL you mentioned earlier again.
Yeah, so for now let's go ahead and say the best way to get these resources is gonna be quantopian.com slash learn, L-E-A-R-N, and that's gonna bring you to a page where you can get started with tutorials or try to dive into these lecture concepts. And I think I mentioned some specific lectures during today's podcast that were related to the concepts we discussed. You can find those on the lectures page. You can go there directly: quantopian.com slash lectures.
All right, Delaney, well, this has been a lot of fun, man. I really look forward to picking up from where we leave off right now on the next episode. Thanks very much.
Yeah, absolutely. It's been a lot of fun. I appreciate the chance to talk to you and, indirectly, to many more people about this stuff.
Cool. And guys, thanks very much for listening. We'll catch you on the next episode.
