Get in touch with technology with TechStuff from HowStuffWorks.com. Hey there, and welcome to TechStuff. I'm your host, Jonathan Strickland. I'm an executive producer with HowStuffWorks and I love all things tech. In our last episode, I talked about how web analytics work in general and why they are important for people visiting a website, for owners of websites, for the advertisers who support websites, and for the companies that advertise through
these advertisers. They also help website designers get a better understanding of how their users navigate and consume stuff on their sites, and they allow web administrators to tweak things to make the experience better. So it's not just
about advertising. It's also about how can I make this website easier to navigate, more intuitive, more interesting, more exciting to use, or more useful, whatever the purpose of the website is. That benefits the visitor, makes the experience a more satisfying one, and it also helps the website administrator monetize through web advertising. But now let's get to the other side of the coin. Tracking information obviously brings with it some very nasty potential problems like threats to privacy
and security. Information is incredibly valuable. It is the currency of the Internet. You might have thought it was Bitcoin. It's not. Data is your currency. And generally speaking, the more data a company can get about people who are using the web, the better it is for that company, not necessarily better
for the people, but better for that company. Knowing information about a person means being able to sell to that person more effectively, or it might mean being able to exploit that person in less legal or ethical ways, and so the data gathered about users can become a tool or a weapon, depending upon the type of information gathered and the will of the person who has access
to that information. So ideally you don't have any bad actors out there, and even if people are gathering a lot of information about users, they're not trying to put it to any malicious purpose. Before I dive into a detailed account of web analytics and privacy, I should say that not everyone is out to scrape every bit of data off of users or to figure out the identity
of a specific user. Many analyses are more focused on identifying emerging trends rather than singling out one specific user. So the goal is not to look at that data like a browser's history, like looking at the cookie information and saying, oh, this person went from X website to Y website to Z website, and then coming to the
conclusion that that must be Jonathan Strickland. Instead, more often than not, these analytics companies are looking at aggregated data that is, at least on the surface level, anonymous, and the purpose is to see more valuable information, such as rose gold is so totally in right now, so put all your rose gold products on your main page because
people are gonna go nuts. Right? This really dates this podcast, because I'm about two years out of touch, so it tells you this one should have come out two years ago. Anyway, this concept makes sense when you're thinking of big sweeping strategies, like which products you want to feature on an online store's homepage, or which news stories are likely to be thought of as the most important and relevant on any given day. So you might look at something like Google Trends and say, oh, well,
a lot of people are searching this particular term. Let's create an article about this thing. We can inform people, we can make sure it's a really good article, but we can also take advantage of the fact that people are interested in this idea right now, so it's kind of a mutually beneficial experience in the ideal. But it would be silly to say that no one's interested in your individual preferences, because that's not true. There are people
who are very interested in your individual preferences. For one thing, it can help identify what different groups of people like, so a company could present those different groups with distinct experiences that were meant to appeal to that group. Right. That's targeted marketing or targeted advertising. So let me give an example. Let's say I run an online store, and I've coded my home page in such a way that it can dynamically display different products based off the information
I glean from analyzing a user's behaviors. My site uses cookies and JavaScript, and those analyze that behavior, and the site presents the most appropriate products for return visitors. So when you pop into my store, I happen to know that you recently searched for Star Wars toys because the cookie information that I've installed on your browser from your previous visit has told me this. And so I have some Star Wars related products that I want to prominently show to you on my homepage. Now when I say
I want to, all of this is done automatically. You've got all this meta information, these tags that computers can use to sort through and select what appears to be the most appropriate products that will appeal to the visitor. Now, let's say your buddy shows up and your buddy is not as into Star Wars as you are. Your buddy's, like, a big Cloverfield fan, and your buddy visits my online store and sees a totally different selection of products than you do when they pop on.
Maybe your buddy is visiting my store for the first time, in which case I don't have any information about him or her. I don't know anything about this person because they've just come to my website for the first time. Now, they come there and I decide to pop a cookie on their web browser, so I'll know the next time they come through. But this first time, it's a blank slate. That means that my store is probably gonna show them a pretty neutral selection of products.
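To make the store example a bit more concrete, here is a minimal Python sketch of the idea. Everything in it is an assumption made for illustration: the catalog, the interest tags, the cookie format (`tag:count` pairs separated by commas), and the function names are all hypothetical, and a real store would be far more involved.

```python
# Hypothetical catalog: product name -> interest tag (illustrative only).
CATALOG = {
    "Star Wars action figure": "star-wars",
    "Star Wars lunchbox": "star-wars",
    "Cloverfield poster": "cloverfield",
    "Bestselling phone charger": "general",
    "Popular coffee mug": "general",
}


def parse_interest_cookie(cookie_value):
    """Turn a cookie string like 'star-wars:3,cloverfield:1' into a dict of counts."""
    interests = {}
    if not cookie_value:
        return interests
    for part in cookie_value.split(","):
        tag, _, count = part.partition(":")
        if tag and count.isdigit():
            interests[tag] = int(count)
    return interests


def pick_homepage_products(cookie_value, limit=2):
    """Choose which products to feature for this visitor."""
    interests = parse_interest_cookie(cookie_value)
    if not interests:
        # First-time visitor: no cookie yet, so show a neutral selection.
        return [name for name, tag in CATALOG.items() if tag == "general"][:limit]
    # Returning visitor: feature products matching their strongest interest.
    top_tag = max(interests, key=interests.get)
    return [name for name, tag in CATALOG.items() if tag == top_tag][:limit]


def record_view(cookie_value, tag):
    """Return the updated cookie string after the visitor views a tagged product."""
    interests = parse_interest_cookie(cookie_value)
    interests[tag] = interests.get(tag, 0) + 1
    return ",".join(f"{t}:{c}" for t, c in sorted(interests.items()))


print(pick_homepage_products(None))
print(pick_homepage_products("star-wars:3,cloverfield:1"))
```

The first call stands in for the buddy's blank-slate visit, and the second for the return visitor whose cookie already records an interest in Star Wars.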
Maybe there will be some of the most popular products that happen to appeal to a broad spectrum of people, but they aren't targeted toward that specific person yet, because I don't know what that person's preferences are. But as your friend navigates through my site, I'm collecting more and more information about what they like based upon their behaviors, and then I can make sure the next time they come to my website that it serves up a more
appropriate landing page for them based upon their preferences. Again, when I say I decide, this is all automatic. Let's go a step further. Let's say that you are running a blog that has online advertising on it. So you've got spaces on your blog that are reserved for advertising, and the ads themselves are tracking users with cookies and JavaScript. Most ads come from brokers who have numerous clients, right? So let's say that you go to a blog and you see an ad for a popular soft drink company.
That ad did not come directly from the soft drink company. More likely than not, it came through an advertising company that has that soft drink company as one of its clients.
So the brokers, these companies that have thousands of clients representing all these different industries, can use this tracking information in cookies and JavaScript to determine what stuff you're most likely to respond to based upon your browsing history. That means the broker could potentially serve up ads based on that information to help improve the chances that you'll find any given ad more useful and click on it.
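Here is a rough Python sketch of how a broker might do that across many sites. It is only an illustration under stated assumptions: the `AdBroker` class, the page categories, the cookie IDs, and the ad names are all hypothetical, and real ad brokers use far more elaborate systems.

```python
from collections import Counter, defaultdict


class AdBroker:
    """Hypothetical ad broker that profiles browsers by a cookie ID it sets."""

    def __init__(self, ads_by_category, default_ad):
        self.ads_by_category = ads_by_category  # category -> ad to show
        self.default_ad = default_ad            # shown when nothing is known
        self.profiles = defaultdict(Counter)    # cookie ID -> category counts

    def record_page_view(self, cookie_id, page_category):
        # Called whenever one of the broker's ads loads on any client site.
        self.profiles[cookie_id][page_category] += 1

    def choose_ad(self, cookie_id):
        profile = self.profiles.get(cookie_id)
        if not profile:
            return self.default_ad
        # Walk the visitor's interests from strongest to weakest and return
        # the first one the broker has an ad for.
        for category, _count in profile.most_common():
            if category in self.ads_by_category:
                return self.ads_by_category[category]
        return self.default_ad


broker = AdBroker(
    {"sci-fi": "Space movie box set", "cooking": "Chef's knife"},
    default_ad="Soft drink ad",
)
broker.record_page_view("abc", "cooking")
broker.record_page_view("abc", "sci-fi")
broker.record_page_view("abc", "sci-fi")
print(broker.choose_ad("abc"))
print(broker.choose_ad("never-seen-before"))
```

Notice that the profile is keyed by a cookie ID rather than a name, which matches the point that this kind of personalization does not strictly need to know who you are.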
In these cases, the experiences are personalized, but that personalization still is not dependent upon your identity per se. I mean, it's based upon what you like and what your behaviors have indicated you find valuable or interesting. But it's not like that specific data is identifiable stuff like your name
or your address or anything like that. Although they can at least get an approximation of your address based upon your IP address, so they could at least know generally where you were, maybe more specifically if you happen to use a mobile device and you have location tracking on, or as it turns out, you don't
necessarily have to have location tracking turned on. There was a recent story from the AP that looked into this and said that Google Android devices would check in with Google an average of fourteen times an hour, giving information about location even with location services turned off. So that's a kind of tracking information that definitely rubs people the wrong way. It's very valuable information if Google wants to serve
up ads to you that are based on your locale, but not very comforting if you're thinking, I'm just carrying my phone around. I don't need my phone telling Google everywhere I'm going throughout the day. Now, there are instances where a company, an agency, or a government might want to identify someone based upon their browsing behavior.
For example, let's say that there's a crime that's been committed and law enforcement has come into possession of a computer that they believe belonged to the perpetrator of that crime, but they still don't know who that perpetrator is. They've got the computer, his or her computer, but they don't know who that person is yet, and there's no overtly identifiable information on the computer's hard drive, no fingerprints,
that kind of thing. Would it be possible for an investigator or an analyst to figure out the identity of the computer's owner just through that person's browsing history? If you looked at the information of what websites they went to, would you be able to figure out who it was that owned that computer? Well, setting aside the possibility that the perpetrator had remained signed into any services that would link back to his or her identity,
the task would require the analysts to look at the patterns of behaviors and the browser history to figure out what the person at that computer's keyboard had been doing. It's kind of scary to think about this, but this is totally possible to do. It's built upon the same principles that were used to support e-commerce. Back in 2006 there were some Russian analysts who proposed a method of user profiling that would create profiles of users
based on their browser history. So you would get shoveled into progressively smaller groups based on your behavior. So, you know, initial analysis might put you in one of several broad categories, but the more specific behaviors you exhibit, the more specific the groups could be that you would be sorted into. And they would represent profiles as word vectors. That's a method to assign context to words that ties into natural language processing. I did a couple episodes on those a
little while back. The researchers used those word vectors to create clusters of topics in a hierarchy determined by user behavior, and the stuff that users valued more, as demonstrated in their behavior by following links or staying on certain pages for a longer time or making searches, would occupy a higher place in that hierarchy, and that was one way of identifying users, at least by interest. Now again, that didn't assign a name yet, but that
was a building block towards this. There's a 2007 paper I read about that described a different approach that could predict a user's gender and age based on his
or her web browsing behavior. The researchers created a model that relied on users reporting their age and their gender, so it's a self-reporting kind of thing, and they would also give up access to their browsing history to this model, and the model would learn to associate certain behaviors with age and gender and
draw general conclusions based on that. And once it learned through this training process, it could then analyze an unknown user's browser history and then predict that person's gender and age. I don't know how accurate it was. I came across this information while reading a totally different but related paper, and I didn't have time to track down the 2007 document. But this does lead to the way law enforcement might use user profiling to identify someone based on their browser behavior.
I'll explain more in just a second, but first let's take a quick break to thank our sponsor. Before the break, I mentioned a related paper. That paper was specifically about identifying a suspect based on their web behavior, and it has the title "Web User Profiling Based on Browsing Behavior Analysis." In that paper, the researchers describe a method in which a computer believed to belong to a suspect is
compared to other computers that have known users. So law enforcement gets hold of a computer, they know that this computer was used by the perpetrator of a crime. They don't have an identity yet, they do have some suspects. They don't know if any of the suspects actually were
the perpetrator. So the goal is to compare this target computer, the one that was involved with the actual perpetrator, with candidate computers, the ones that suspects are using. Factors such as the specific sites that were visited, the time spent on every site, and the order in which the user would browse the sites are all taken into consideration, and at the heart of the matter is the idea that we humans tend to be creatures of habit. So
here's how it would work. Investigators take that target computer and they perform a data extraction on the computer. They pull all the information they can off of it to get a lead on the identity, and that includes the browser history and browser behaviors, and they analyze this. They have identified some suspects, and those suspects may be using other computers to access online services, and those are the candidate computers.
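The target-versus-candidate comparison this segment walks through can be sketched in a few lines of Python. This is not the paper's actual method: the weighting formula (one point per visit plus a small bonus per second spent) and the use of cosine similarity are assumptions chosen for illustration, and the domain names and suspect labels are made up.

```python
import math


def profile_vector(history, visit_weight=1.0, time_weight=0.01):
    """Build a weighted vector from a list of (domain, seconds_spent) visits.

    Repeat visits and longer stays both push a domain's weight up. The two
    weight parameters are illustrative assumptions, not values from the paper.
    """
    vector = {}
    for domain, seconds in history:
        vector[domain] = vector.get(domain, 0.0) + visit_weight + time_weight * seconds
    return vector


def cosine_similarity(a, b):
    """Return a score from 0.0 (nothing in common) to 1.0 (same browsing pattern)."""
    dot = sum(weight * b.get(domain, 0.0) for domain, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(target_history, candidate_histories):
    """Rank candidate computers by how closely they resemble the target computer."""
    target = profile_vector(target_history)
    scores = {
        name: cosine_similarity(target, profile_vector(history))
        for name, history in candidate_histories.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


target = [("news.example", 120), ("forum.example", 300), ("news.example", 60)]
candidates = {
    "suspect_a": [("news.example", 100), ("forum.example", 280)],
    "suspect_b": [("shop.example", 500)],
}
for name, score in rank_candidates(target, candidates):
    print(name, round(score, 3))
```

A high score only says two browsing patterns look alike, so it works as a lead for further investigation rather than as proof of identity.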
So law enforcement gets possession of those candidate computers, presumably through a warrant, and they do the same sort of thing. They do a data extraction on each of those computers. Then they process all that information and they analyze it, and investigators determine which factors are domains of interest, like what are the things in the target computer that could potentially be identifiers for somebody,
and they break this down into a vector representation. They weight each of the factors to assign each one a
relative importance. So, for example, a weighting might represent that the activity on the target computer showed the perpetrator repeatedly visited the same five websites, so those websites would be weighted heavier than others because the perpetrator had gone to them multiple times, and within those five websites, each of those websites might have its own weighting that is based upon the amount of time spent on those sites and the number of times that the perpetrator had logged
into them that are recorded in that browser history. These indicate trends and behaviors. Then you would compare that with the information you found from the candidate computers, and if you found one that demonstrated browsing behavior similar to the one that was on the target computer, you can make an argument that the respective suspect may well be
your criminal. Then you can consider them a lead. It's not exactly a smoking gun, but it certainly says this person browses on the Internet exactly the same way as the person who owned this computer, and we know the person who owned this computer committed the crime, and it can lead you into a more specific investigation. In 2017, Gizmodo ran a piece titled "Here's All the Data Collected From You as You Browse the Web," and it was written by David Nield and I really
recommend checking out this article. Again, it's called "Here's All the Data Collected From You as You Browse the Web." It's a great piece. I'm gonna kind of go over it here a little bit. Nield points out the type of data your computer can share with sites on the Internet, and as he mentions, it can include all of the following. Your IP address. Now that makes sense. The IP address corresponds to your computer or your router. It's necessary so that a site knows where to send
the data that you've requested. So if you visit a website, you're technically sending a request to a web server. The server has to know where to send that site, otherwise you'll never get anything back. But an IP address can provide information that gives the site owners a general idea of your location, not specifically where you are, but generally where you are. Then there's the type of system you're using, such as whether or not you're on a phone or a tablet, or a computer or a
gaming console. This will also typically include information like the operating system that you're using, the display resolution on the device you have, what processors your machine might have, like CPU and GPU, and the specific types, like how many cores, how much processing power, that kind of stuff. Which browser you might be using, what plugins you have installed in that browser, your device's battery
charge could be part of the information. All of that is part of the information that your machine is handing over in this exchange. Nield also mentions a web page that will let you know all the data your browser sends to pages by default. That site is called webkay.robinlinus.com, or Linus if you prefer. It's W-E-B-K-A-Y dot R-O-B-I-N-L-I-N-U-S dot com. So I went ahead and checked it out just to see what it would say about my connection here at work.
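As a rough illustration of the simplest part of this exchange, here is a Python sketch of what a server could infer from a single request using only the connecting IP address and the User-Agent header. The function name `summarize_request`, the sample address, and the substring checks are assumptions made for the example, and the matching is deliberately crude. Details like the GPU, screen resolution, or battery level are not in the request headers. Pages typically read those with JavaScript running in the browser.

```python
def summarize_request(remote_addr, headers):
    """Summarize what one HTTP request reveals (crude and illustrative only)."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    user_agent = normalized.get("user-agent", "")

    def first_match(markers):
        # Return the label of the first marker found in the user agent string.
        for marker, label in markers:
            if marker in user_agent:
                return label
        return "unknown"

    # Order matters: Chrome's user agent string also contains "Safari".
    browser = first_match([
        ("Firefox", "Firefox"),
        ("Chrome", "Chrome"),
        ("Safari", "Safari"),
    ])
    # Order matters here too: Android user agents also mention "Linux".
    operating_system = first_match([
        ("Windows NT 6.1", "Windows 7"),
        ("Windows", "Windows"),
        ("Android", "Android"),
        ("iPhone", "iOS"),
        ("Mac OS X", "macOS"),
        ("Linux", "Linux"),
    ])
    if not user_agent:
        device_type = "unknown"
    elif "Mobile" in user_agent:
        device_type = "phone"
    else:
        device_type = "computer"

    return {
        "ip": remote_addr,
        "browser": browser,
        "operating_system": operating_system,
        "device_type": device_type,
    }


sample_headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36"
    )
}
print(summarize_request("203.0.113.7", sample_headers))
```

The sample user agent is the kind of string a Chrome browser on Windows 7 would send, which lines up with the work computer described in this segment.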
So it knew my work computer is running Win 7. Yeah, I know. It also knew that I was using Chrome as my browser. It identified the GPU and the CPU for my computer. It knew what resolution I had set my screen to. It knew my laptop's battery was at full charge because it was plugged into a docking station at the time. It identified the ISP my office uses. It identified the download speed I had available to me. It estimated my location. It was off by a couple
of blocks, but it was in the general area. It identified which social media accounts I was logged into at that time. If it had been a mobile device, it would have also told me about my device's orientation, like whether it was in portrait or landscape mode, and more information like that. And then Nield linked to another site called Click. That one can monitor mouse movements and mouse clicks and how active you are with a site. I visited this one too, and it was kind of creepy.
It's designed in a way to actually reveal to you how much information is being sent to a website. So there's actually a voice that talks to you, prerecorded stuff that's meant to be a little unsettling, and it sends you information telling you, oh, you just moved the mouse to the right, you just moved it to the left, you've sat still for thirty seconds, you've been viewing this
page for a minute. So this is all information that could be sent to a site. They could actually monitor where your mouse is moving across a web page, which again gets a little creepy, right? Now, there are legitimate uses for that kind of information. From a website design perspective, it could tell you a lot about the
sort of things users find attractive or interesting about your website. But there are also potential misuses. Legit analytics firms won't use information to compromise users' privacy, but not everyone's legit. Here's another example. Let's say that you are a nefarious person. Actually, I'm not gonna say that. You're a
nice person, you're not nefarious. Let's say there is a nefarious person out there, and this nefarious person has installed some rogue JavaScript on a website, then has tricked people into going to it, and is able to get certain bits of information that appear to include compromising information about the user, and they're able to contact the user, to send a message out to that user, perhaps to their email address or something along those lines, and through this method
of contact, they are trying to blackmail the users, saying I have dirt on you because I know that you've visited such and such website. Maybe it's an adult content website,
maybe it's a website that's about a sensitive subject. And they're able to tell this from the cookies or the JavaScript, and so they're sending a message that's essentially saying, if you don't cooperate with me, I'm going to reveal the information I have about you. Now, it may be that they don't have any real information about you, anything that's
of any real damaging worth. But they're trading on people's natural fears, and they know that even if not all of their attacks are going to be successful, at least enough of them will be for it to be worthwhile. So that's one way someone might make nefarious use of this kind of data. I'll talk a little bit about some ways that governments and companies and individuals have tried to protect themselves and others from this kind of abuse in just a second, but first let's take another quick
break to thank our sponsor. Now, there are some laws in place that help protect people from predatory use of their data. In the United States it gets a little loosey-goosey. There are some state-level laws in some places, but obviously those apply within a state, not across the entire country. There are a few federal protections that are in place. In Europe, the protections are way more extensive. The GDPR is an example of that, but it's
just one example of that. So in Europe people generally enjoy a better level of protection as far as their data security is concerned, and there are a lot of analytics companies out there that have tried to address these issues because they want people to know, hey, what we do is valuable. What we do actually is part of what makes the Internet work. As long as we do it with accountability and we do it with respect to your privacy, everything should be
fine and everyone should benefit. So one of the big pushes in the industry is to be more transparent about which data points these sites are collecting and to what purpose, like why are they collecting all this information? And it can't just be transparent. It needs to be worded in a way that makes sense. It's not buried in jargon and legalese, because then people just skip over it and they don't get angry until
something goes wrong. So that means being able to explain in plain language, hey, we are collecting these data points about people. This is how we're using that data. Here's how you will benefit from that use, and here's how we benefit from that use. If it's completely transparent, everyone is much less likely to get upset because they're less likely to misinterpret what is happening or to assume the worst, right? So
tracking in itself might not be malicious. It's meant to make things better for everybody, but it's also very easy to misuse the information, and data is valuable, right? It has actual real value to it. That means bad actors will go after it too. So what can you do on a personal level to protect yourself? One thing is that browsers have a Do Not Track setting that you can enable. In theory, that protocol would mean that sites would agree
not to track you. Now, I say in theory because there's nothing legally requiring sites to obey that protocol, so they might track you anyway. The more reputable ones probably won't, but other sites might not really give it any mind,
so it's not really the safest approach. You can try to browse in private or incognito mode in a browser. Lots of browsers allow for this, and usually what that means is it will only load cookies for that current session, so you're not gonna have cookies saved to the browser in this way, so that reduces a site's ability to track your information. Although the longer you stay on a site and the more you click around, the more information
you are giving that site. Incognito mode really only kind of erases traces of your activities on that local device. So the computer you're using, the mobile device you're using, whatever that may be, incognito mode really just keeps your activities from being left on that device. Your Internet service provider will still see where you're going, because it has to in order to be able to send you the information that you're requesting
through the web browser. You still have an IP address that can still narrow down where you live or where you're accessing the information from. If you log into a service like Facebook or Twitter or something like that, that's a dead giveaway. So this is a limited help. Another thing you might do is install browser extensions that limit active scripts from running on websites without your authorization. So there are extensions like NoScript Security Suite, that's
for Firefox, and there's ScriptSafe, that's for Chrome. These are extensions that put the control in your hands. So when you access a site that has one of these sort of invisible trackers on it or whatever, it'll pop up and alert you and you can choose to either allow it or to prevent it from being able to
track you, at least in the JavaScript approach. If people are looking at their access logs, that's still gonna show that you've visited the site, but it won't give the kind of tiny amounts of data that JavaScript would. Tiny as in focused; there's actually quite a lot of data. The Electronic Frontier Foundation offers up an extension for Firefox, Opera, and Android called Privacy Badger. This add
on blocks trackers and spyware. Specifically, it, quote, stops advertisers and other third party trackers from secretly tracking where you go and what pages you look at on the web. If an advertiser seems to be tracking you across multiple websites without your permission, Privacy Badger automatically blocks that advertiser from loading any more content in your browser. To the advertiser,
it looks like you suddenly disappeared. End quote. So it does this by identifying which content sources are registering your presence on a web page, including the ads that are loaded on that web page, and as you go from one page to another, if it keeps picking up the same sources, that's an indication that you're being tracked, and those are the ones it will stop loading into your web browser, and since it stops loading them,
the source can no longer get information about your activities, and it's like you just disappeared into thin air. But what about virtual private networks? I'm gonna have to do a full episode about VPNs and why they exist and why they're important and when you should use one. I'll do one of those in the future, but generally,
in this context, they're mostly good for hiding your physical location. The location will appear to correspond to that of the virtual private network, not to your real world location, because the website will be acting like the VPN is the source of the traffic, not your computer, and the VPN handles it from that point to get it to you. So you would still get cookies from sites. They'd still be able to track your activities, but they would do it through the
context of the VPN. And since your behaviors are filtering through the VPN instead of your normal ISP, what you're really doing is trading one entity for another. Instead of having the ISP be the one monitoring all the stuff you're doing, the VPN could technically monitor all the stuff you're doing, so I guess then it just comes down to who do you trust more, the VPN or the ISP? The answer is going to be very dependent upon which of those entities you're making use of at any
given time. So one last little bit about the pros and cons of tracking. Tracking is what makes online advertising work. So it's somewhat infuriating, because online tracking gives us a really granular view of which ads work on which sites, and which ones don't. We learn about how different form factors can be more or less effective. You might find out that ad A tests really well on site one,
but it fails miserably on site two. But ad B, which is for the exact same product as ad A but is a different design, that one works great on site two. Or maybe you find out just by changing where an ad displays on a page it drives more engagement. The reason this is important is because running a website is not free. If it were, the world would be
a very different place. So companies like HowStuffWorks.com have costs associated with them, right, and those are significant costs, not just like web hosting, but other stuff like office space, salaries, healthcare, lots and lots of costs. So if there's no money coming in to cover those costs, you won't stay in business. You go into debt. Eventually you go into bankruptcy. So you want to make money to pay off the costs, and you really want to make enough to make a profit.
I mean, that's what a business is all about, is making profits. So without profit, businesses don't really exist. And then the content goes away. So unless we move to a totally different model of the web, which would probably be one where we have to pay for everything we want to access, everything would be behind a paywall, it would be really hard to continue to have web content. We have to have some financial means to support the content or else the content goes away. Same thing is true
for podcasts. I mean, the reason we have sponsors is to pay off the costs of producing these shows and posting the shows and to continue to develop shows and make new shows. The ads support that, and hopefully the ads that we are choosing to place with shows are meaningful to our listeners, because if they're not, then it's not really doing anyone any good. And ultimately, you want the best possible relationship between content, advertising, and users.
You want something where everybody is happy with it, because otherwise, what's the point? The same thing is true with a website, so the tracking is very important to get that kind of information. It's kind of funny to me because with classic media, your traditional media, things like television, magazines, newspapers, that kind of stuff, everything that has advertising in it, it's a lot harder to tell how well that advertising works,
how much impact that advertising has. With the exception of stuff like the Super Bowl in the United States, where people famously will tune in just to watch commercials, you really don't know how much attention is being directed toward commercials. You might be able to get some general ratings about how well a certain television show has done, but that
doesn't really tell you anything about the ads themselves. So it's funny to me that the advertising world is very comfortable in that traditional media space, and then there's the online space, where we can actually see how well an ad does because we can see how many people click on it, how many people actually went through and said this is interesting, I want to know more, I want to be able
to buy this. We can actually see how effective that is, and somehow that makes it less valuable in some cases. The CPMs that are demanded in direct mail, like sending stuff out in magazines and things, are way higher than what you typically see for most online advertising. It's one of those things where a little knowledge can be dangerous,
I guess. It's a very fascinating topic. And while you can go through and use those extensions and use VPNs and things and turn off a lot of the elements that will allow sites to track you, if you do that, you also lose a lot of the benefits that tracking gives to users. That might be a worthy trade-off for you if you really value your privacy and you don't want sites to get access to that kind of information.
But, you know, it's just kind of the way our online world works, and without some sort of transformative change, I don't see that being any different anytime soon. But it is an interesting subject. If you guys have any ideas for future episodes, any sort of topic you want me to cover, whether it's a technology, a company, a person in tech, or maybe there's someone I should interview or have on as a guest host,
send me a message. The email address is techstuff@howstuffworks.com, or drop me a line on Facebook or Twitter. The handle at both of those is TechStuffHSW. Don't forget, head on over to TeePublic.com/techstuff. That's T-E-E public dot com slash techstuff, to get all your TechStuff merchandise needs. You know, maybe you're sitting there thinking, I have a cup of hot coffee sitting here, but I have no mug to put
it in. Get yourself a TechStuff mug. They're pretty awesome. I've got two of them myself. And don't forget to follow us on Instagram. I'll talk to you again really soon. For more on this and thousands of other topics, visit HowStuffWorks.com.
