Zoom and Enhance: The Sound Edition

Sep 12, 2015 · 33 min

Episode description

How could a video camera pick up sound without a microphone? We look at the science of sound and how any object could become a mic.

Learn more about your ad-choices at https://www.iheartpodcastnetwork.com

See omnystudio.com/listener for privacy information.

Transcript

Speaker 1

Brought to you by Toyota. Let's go places. Welcome to Forward Thinking. Hey there, and welcome to Forward Thinking, the podcast that looks at the future and says, "But my words, like silent raindrops, fell and echoed in the wells of silence." I'm Jonathan Strickland. And I'm Joe McCormick. And today? Yes, Joe, we're going to be zooming in and enhancing a topic

we've talked about before. Yeah, we talked about zoom and enhance a while ago, right? We did one way back in... uh, you know, I haven't gone back and listened to any of our episodes in a while, but I don't know, I feel like I'd be embarrassed to hear myself then. We have learned a lot about podcasting. I've forgotten more than I've ever learned. Wait, what? No. Anyway, so, yeah, we recorded one about zoom and enhance with

imagery back in August two thousand thirteen. We're recording this at the beginning of September two thousand fifteen, so it's been more than two years. So clearly, first of all, we've finally solved zoom and enhance for video and pictures, right? That's done. Now you can take, like, a highly pixelated JPEG and turn it into a full-motion 3D video, and it doesn't matter if that image was taken seventy years ago. You can still do it,

you know. I came across... just recently, uh, my wife Rachel and I have been going back through The X-Files because it's up on Netflix, and, uh, Christian, our colleague Christian who's also on Stuff to Blow Your Mind with me, and Lauren there have been like, "You've got to watch The X-Files," so we have, and we came across an episode that has probably the most egregious and ridiculous case of zoom and enhance I

have ever seen in any show. It's an episode where there is a character who does psychic photography, so he, like, gets near a camera and whatever is going on in his head ends up on the film, and it's this Polaroid of somebody's vision of some hellish nightmarescape with these ghosts called howlers screaming around a woman's face, and then in the background there's this little blur, and Mulder goes to the lab where they zoom and enhance the little blur and then they get this perfectly

resolved picture of a guy's face from a Polaroid. Yeah, seems like that might be a little far-fetched. Um, yeah, as it turns out, we do not have this magical capability. Things have improved since two thousand thirteen. You know, the basic premise of zoom and enhance is all about: you take an existing video or image, you concentrate your view on one particular sector of that image, and then you zoom in on that and you are enhancing the picture so that it is more, uh, that you

can see what it is. And for one thing, we've got super-high-resolution cameras now, so there's some that are so high resolution that in a normal view you'd be looking at, you know, like a YouTube video, something along that size. And then it turns out that the resolution is so high you could digitally punch into parts of that video and not lose a lot of resolution. So it gives you the effect of zoom and enhance. But really it's not enhancing. It's just that information is

already there. Yeah, and that's the key. The information is already there. We have never gotten and we will never get to a point where there can be information retrieved from an image that was not recorded in the image. Now we can have it simulated. Right. That's exactly what we're going to talk about today, is finding ways to use what information is recorded in the image and do

very smart things with it with computer programs. Right. And the really cool thing is that we're not talking about actual, like, images as the end result in this case, right? We are talking about using images to reconstruct sound. Yeah.

And this all comes to us because we watched a TED talk in which a computer scientist named Abe Davis talked about a really interesting project that involved cameras and, uh, not necessarily... well, I guess inanimate objects, really, and being able to reconstruct sound that took place near that inanimate object as if that inanimate object itself were a microphone. Say that all again? All right, so, um, there's this guy named Abe Davis. No, no, no, no, no. The

relevant part. It does what with sound? So yeah, it acts as if any object itself is a microphone, not in the sense of amplifying what was said, but in the sense of being a record of what was said. Like, the vibrations of the material itself are able to inform us enough that we can replicate the

sound that was created. Yeah, he's part of this team that includes other researchers at MIT (he's a grad student or possibly a graduate of MIT) and also scientists over at Microsoft and Adobe. And this is so cool. Yes. So this is a synesthesia machine. It sees sound. It's like it took a bunch of LSD and began to

see the music. Except that's literally what they can do now. Yeah, and when you start to break down what's going on, it starts to be less magical, but no less amazing. All right, so let's take some of the mysticism and magic out first. And to do that, we have to talk about what sound is, which we've done on this show before, but I'm going to give a quick rundown. So sound is really the energy of vibration, right? So when something vibrates, that's when it's creating a sound.

And as long as there is some sort of medium for that sound to travel through, such as air. Exactly. Water, or the wood of a table that you've put your head down on. Right, any of these things. Metal mask around your head. Hatchet. Now you're just giving Aaron Cooper fodder for the next image. All right. But yes, as long as there's a medium through which sound can travel, it will travel as far as it possibly can before

the energy has essentially dissipated. And this is why sound does not travel in space: because space is effectively a vacuum. So there are no particles, there's no medium through which the sound can travel. But here on Earth we've got air, thank goodness, because this is where I keep all my stuff, and air can act as a medium through which sound can travel. So what happens is when something vibrates, uh, it begins to pull

and push, uh, the air around it. So if you imagine a vibration, some of those vibrations are going to move inward based on your perspective; that's going to pull air toward it. Sometimes it's going to be moving outward, pushing air away from it. So think of, like, a vibrating string on a guitar or a vibrating drumhead. Uh, that's going to be pushing and pulling air. Now that air, in turn, is going to be pushing and pulling the air molecules around it, and so on and so forth.

It's this great, big chain reaction because our atmosphere is a giant fluid, right? It's a gas, but it acts as a fluid. So these various molecules will continue to push and pull, and then eventually that motion will make it into the air inside of your ears. So it's not that the air molecules that were next to the strumming string on the guitar have magically made their way to your ear. It's rather that that motion has continued to move up at the speed

of push to the air inside your ears. Sure. Now at that point it ends up vibrating your eardrum, which then goes through this whole complicated series of maneuvers where you're talking about tiny bones and the cochlea and fluid, and we're not going to get into how hearing works. You can actually read an amazing article at HowStuffWorks.com on how hearing works that explains it. But our brains ultimately interpret this motion

as sound. Now, of course, the key fact here is that sound is vibration, and that vibration is something that you could, in theory, see. Yeah, if you could see fast enough, and if you could tell what you were looking at. Yeah, if you could see with the ability to really notice minute changes. I said fast enough; I should have said, I guess, enough frames per

second and with enough resolution. Sure. Yeah, right. So, uh, you know, there's certain videos where if you do, you know, high-speed photography, high-speed film, you can see how something like a tuning fork, when you strike it and it's vibrating, you can actually see how it's moving in and out of its normal alignment, and it looks really freaky because when you just look at it with our normal eyes, our normal ability to perceive, it doesn't really have that, you know, you don't see

it distorting like it does in that high-speed video. But, uh, if we could see it, and if we could then interpret those vibrations, if we knew, all right, it's vibrating at this speed and this amplitude, that would tell us the pitch and volume of the sound that was affecting it, if we knew enough of

the properties of the material itself. So that's the basis of the experiment that these folks from MIT were following, and it was all about kind of pointing a camera at an object, a camera that was capable of detecting these minute changes, these movements of that object, and then feeding that through a computer that had an algorithm that could interpret those changes as sound and then reconstruct the sound that must have happened to produce those changes.

And the results are pretty amazing. Yeah, I have to say I was really impressed. I am astonished. Yeah. Yeah, it was one of those things where, well, first of all, uh, they decided to use "Mary Had a Little Lamb" as a lot of their, you know, what they would try to record, right? Which is a throwback to experiments that Edison was doing way back when. Yeah. The first, the earliest recording that we know of that Thomas Edison made dates to eighteen seventy-eight. It was on

a device that recorded messages onto tinfoil. Interestingly, the scholarship suggests that it wasn't Edison himself that provided the voice. It was probably someone else, but the voice says "Mary had a little lamb," and it's very loud and very deliberate because the technology was brand new and it was not high fidelity by any stretch of the imagination. So this is kind of a sort of genuflecting to history, saying, well, this was

a significant moment in history. We're going to use that same, uh, that same idea when we're trying this new experiment. And it worked. Like, they did both tones of the song "Mary Had a Little Lamb," and they also did spoken variations of "Mary Had a Little Lamb," which in the TED talk... I highly recommend you watch it. It's very entertaining. Abe Davis is actually a very entertaining presenter. He talks about how he shot, you know, one of the videos; he shows the video of him shouting

at an empty bag of potato chips. Yeah. Yeah, it is a very technical experiment that definitely involves MIT grad students yelling the lyrics to "Mary Had a Little Lamb" at empty bags of potato chips. Right. He even talks about how, you know, originally they wanted to have the best possible, um, chance to be able to pick up these vibrations. They knew that these, uh, these vibrations were going to be tiny. Like a micrometer? Like a tenth of a micrometer. Yeah, that's super tiny.

So they wanted to be able to get that with a pretty high-resolution high-speed camera, and they had to use a lot of light because with these high-speed cameras, you know, the shutter speed is so fast that you need a lot of light to light your scene in order to get an image of what you're pointing the camera at. And he even talked about how the lights were so hot that on a previous experiment they melted the bag, the empty bag of potato chips, as a result of this. So it was a

lot of trial and error early on, but it worked. Yeah. So they were using objects like bags of potato chips, empty bags of potato chips, and potted plants. And the camera that they were using for these first experiments was a high-speed camera that could capture at two thousand to six thousand frames per second, which is a higher frequency than the audio signal. But it certainly isn't, like, the highest possible end high-speed camera on the market,

Phantom or anything. Yeah. Yeah, the highest-speed cameras run something like a thousand... a hundred thousand frames per second, sorry. And the software could pick up these tiny, tiny movements. Um, a tenth of a micrometer is something like five thousandths of a pixel, and it could do that thanks to very subtle changes in each pixel's color values at the edges of

the objects that were being studied. So yeah, and he also pointed out in the TED talk that it's not like the camera was pointed at one particular, tiny little edge of one of these objects. It could actually take into account all of the different vibrations happening across the object, and collectively that provided the data necessary for them to be able to reconstruct the audio. Right.
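As a rough illustration of the numbers quoted here (not the researchers' code), the short Python sketch below works out two things: the highest audio frequency a given frame rate could capture under the standard Nyquist sampling limit, and the pixel size implied by the quoted figures of a tenth of a micrometer and five thousandths of a pixel. The function names are hypothetical, and the Nyquist framing is an assumption added for illustration; the episode only says the frame rate was higher than the frequency of the audio signal.

```python
def nyquist_limit_hz(frames_per_second: float) -> float:
    """Highest frequency that uniform sampling at this rate can represent.

    Assumption for illustration: each video frame is one evenly spaced
    sample of the object's motion, so the usual Nyquist limit applies.
    """
    return frames_per_second / 2.0


def implied_pixel_size_um(motion_um: float, motion_px: float) -> float:
    """Size of one pixel on the object, in micrometers, given the same
    motion expressed both in micrometers and in pixels."""
    return motion_um / motion_px


if __name__ == "__main__":
    # Frame rates mentioned in the episode: a standard camera and the
    # high-speed range used in the first experiments.
    for fps in (60, 2000, 6000):
        print(f"{fps} fps -> up to {nyquist_limit_hz(fps):.0f} Hz")

    # "A tenth of a micrometer is something like five thousandths of a pixel."
    size = implied_pixel_size_um(motion_um=0.1, motion_px=0.005)
    print(f"one pixel covers roughly {size:.0f} micrometers of the object")
```

Under these assumptions, a pixel spans far more of the object than the motion being measured, which is why the software has to rely on subtle color changes across many pixels rather than on whole-pixel shifts.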

It's rooted in research from MIT's Computer Science and Artificial Intelligence Laboratory, and the software that that team was developing was originally intended to amplify color changes in video, but then they realized that it could thereby amplify motion, and so they bent it to, uh, tasks like monitoring blood flow unobtrusively. And they show that in the video, right? They show the

pulse of someone's arm. Because of these minute changes, they're able to amplify that to the point where you can actually see the pulse, which is, by the way, a little freaky. Sure, but also pretty cool. And so this new team, this acoustic team, built on top of that software, adding the algorithms that would identify the whole object and monitor its overall movements

in order to create the sound goodness. And it was interesting because once they determined that they could capture the motion under those quote-unquote ideal circumstances with the bright lighting and the high-speed camera, they started to test how far outside of those ideal circumstances they could still capture meaningful information and be able to replicate the

sound that occurred next to that physical object. And you know, the idea being that you would be able to replicate sound even if there were no microphones, no official microphones working. So they ended up testing it with normal daylight providing

the lighting and shooting through a soundproofed window. So the camera was on one side of the window, the object was on the other side of the window, another empty bag of chips, yep, and that's where the sound was generated, on the other side of the soundproof window. So, uh, in theory, there shouldn't have been any sound bleeding over to the camera, and they could still pick up sound that way, clearly. Yeah. Um, and with normal

indoor light, they filmed a pair of earbuds, like normal plastic, cheap-o earbuds, and then reconstructed the music that they were playing well enough that they successfully Shazamed the music, and it was "Under Pressure," which I realize now that I should have used a lyric from at the beginning of the show, but never mind that. Uh, they also furthermore found that they could use a standard camera,

not a highfalutin camera. And we're talking standard, like, sixty-frames-per-second smartphone camera or, you know, something you could run out and buy at Target. And this is thanks to a quirk in how standard digital cameras

handle fast-moving objects. It would be more accurate for them to read measurements off of the whole array of photo detectors at the same time, but that is kind of expensive, so, uh, cameras that are less expensive than super-high-speed cameras instead read off of their photo detectors one row at a time, sort

of like scan-line televisions. And it does this very quickly but not instantaneously. Sure. And it can lead to that weird lag that you might have noticed in some videos of high-speed objects, like sort of jagged edges or extra pixelation when the object is moving faster than the software can handle. Yeah. I think in the MIT article we read, they use the example

of a rotor blade of a helicopter. Right. Sure, sure, it might be spinning so fast that it's not going to capture it the same way your eye would see it, but it's going to scan the blade in a different position each line, right, and uh on, on a much smaller level, invisible to the naked eye. This flaw in in normal cameras creates visual artifacts that the researchers found out that they could use in order to measure subtle vibrations.
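The rolling-shutter idea can be sketched in a few lines of Python. This is a simplified illustration, not the researchers' method: it assumes rows are read at evenly spaced times and that the readout spans a chosen fraction of each frame period (`readout_fraction` is a hypothetical parameter, as are the function names). Under those assumptions each row acts as its own time sample, so the rate of row samples is far higher than the frame rate.

```python
def row_sample_times(fps, rows, frames, readout_fraction=1.0):
    """Return the time (in seconds) at which each row of each frame is read.

    Assumes evenly spaced row reads that together take
    `readout_fraction` of one frame period (an illustrative simplification).
    """
    frame_period = 1.0 / fps
    row_dt = frame_period * readout_fraction / rows
    return [f * frame_period + r * row_dt
            for f in range(frames)
            for r in range(rows)]


def row_rate_hz(fps, rows, readout_fraction=1.0):
    """Rate at which rows are read while the sensor is reading out."""
    return fps * rows / readout_fraction


if __name__ == "__main__":
    # A toy sensor with 4 rows makes the staggered timing easy to see.
    for t in row_sample_times(fps=60, rows=4, frames=2):
        print(f"{t * 1000:.3f} ms")

    # A 60 fps camera with 1080 rows, under the same assumptions.
    print(row_rate_hz(fps=60, rows=1080), "row reads per second")
```

Real sensors usually pause between frames, so the row samples arrive in bursts with gaps; any reconstruction built on them would have to account for that.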

So the audio reconstructions that they got out of this experiment weren't as close to the original audio, but the researchers did report that they could probably still identify like the number of speakers in a room, or the identity of a speaker, given that you have an audio profile of the person's voice to begin with. Yeah, so, uh definitely a little more like um muted and a little more distorted, to the point where if you had not already heard what was being said, you might not be

able to necessarily reconstruct it. Um, our brains are interesting, right? Like, if we hear what we're supposed to hear, and then we hear the sound played, we're more likely to pick it out. Uh, this is something that you find with people who make claims for, like, you know, hidden

messages and backmasking and that kind of stuff. Yeah, if you listen to the raw sound file without any prompting, you may end up saying, "I didn't make anything out," and then someone says, "Oh, you need to listen for the phrase, um, 'Help me, Jonathan has me trapped in the basement.'" You might end up hearing it. I try very hard to make sure you don't hear it, but you might hear it. Okay. So anyway, that's when you play "Under Pressure" backwards. Oh vo vovo oh, Freddie Mercury.

There are other methods that you can use to pick up sound

from a distance. Uh, some of them use a similar method. Like, there's one that uses lasers, and in fact, in the TED talk, Davis actually says that with this particular approach, a lot of people might immediately spring to the conclusion that you would use it to spy on someone: you would aim a camera at something that would be kind of unobtrusive, like a potted plant that happens to be near a person's desk, and then you could end up replicating conversations that went on inside that room by just

measuring the vibrations of that plant and then running it through this algorithm to recreate the sound that happened. Most evil plots take place in a room with a potted plant. Yeah, you know. Uh, yeah, that's why I only use cacti now, but even they have been deceptive. So the point that Davis makes is that this is pretty low fidelity if you want to do something like that, and if you really want to do that, there are alternatives that

provide much higher quality recordings, uh, mainly using laser microphones. Now, had you guys heard of laser microphones before we started looking into this? No, I don't think I had. It was one of those things that I had heard about only because I was looking through the Spy Museum in Washington, D.C., and read about them. So they work on a very similar principle, except instead of detecting the vibrations optically... what they're doing is they're using, it's still optical,

but it's not, you know, like a visual approach. Um, you're shooting a laser out at an object, and as that object vibrates when it is exposed to sound, the returning laser light (because, you know, it's all based on shooting a laser out and then detecting when it comes back), the returning laser light will have slightly different arrival times than when it was sent out, based on those vibrations. If it's vibrating out, it's going to be a little

shorter than if it's vibrating in. And while that sounds incredibly minute, and it is, it's enough to be able to take that data, feed it back through and create a sound file based on it, so you could replicate things that are being said or other sounds that are

going on. And in fact, there are a lot of places that, due to their classified nature, due to the secrecy of stuff that goes on inside, they take great pains to try and obfuscate any view into the place, whether that's creating sort of a double glazing on the window so it disperses the laser beam and thus the laser beam can't get a good read, um, or other elements as well. So that's super spy stuff

that most of us don't have to worry about. But as Davis would say, that is a more relevant fear than someone using a camera to look in. Um, that's not as likely. I mean, why would you? It's expensive. But you do have to get access to the algorithm, though. That's true, that's true. No, I've got another technology in mind. That's where you just grab somebody from the room and throw them in a van and demand to know what was said. That's not so much a

technology as it is a valid strategy. Hey, vans are a technology. You could just as easily throw them in the back of a horse-drawn carriage. I mean, it's just... Which are also a technology. Options are limitless. Uh, there are of course also long-range microphones. You probably have seen these advertised in the back of a comic book.

Parabolic mic. Yeah, that's the really popular one. Like, if you ever see the spy kits that are made for kids who are interested in this kind of stuff, there's usually some sort of parabolic mic involved in that. Parabolic mics are meant to be sensitive and directional. They're not terribly good at being directional, but

the really high-powered ones are fairly sensitive. Uh, so the idea is that you're concentrating on a specific area to try and pick up sound from that area while trying to block sound from all the other directions as much as possible. Right, because as you keep turning up sensitivity in any kind of recorder, you're going to increase your noise. Yeah, it's like

turning up the gain on a microphone. If you crank that way up, you're gonna get a lot of noise, and it's gonna be harder for you to concentrate on the signal. Uh, there are a lot of different models of long-range microphones, and they all have varying degrees of directionality and sensitivity, but they all work on basically the same principle, the idea of boosting an acoustic signal to the point where you can hear it where normally you would not be able to.

There are not very many out there that are, uh, powerful enough and effective enough to be able to listen to a conversation beyond, like, fifty meters away, unless people are shouting, in which case you could probably go a little further, but, um, you know, they have their limitations. Whereas the laser, I mean, as long as you have a line of sight, you could in theory be really far away from the actual conversation taking place and be

able to replicate it. Also, depending on who the person spying on you is, they could just listen to you through your cell phone. Yeah, there are a lot of other options, right? Um, but at any rate, so if this is not really good as a spy technology, what else could we use it for? Well, it could be used to reconstruct conversations for court cases and forensics within the justice system. Yep. Of course it requires that you have had a camera in place of that area

in the first place. But yes, many places do. Be paranoid, everyone. It could help out in hospitals. That kind of software could monitor vital signs like breath and blood flow from a distance, like we were talking about earlier, which would be especially useful in cases where you want to disturb the patient as little as possible, which I would posit is everyone, but especially, like, infants. Yeah, people who might not be able to alert you to a change, like especially

a sudden change in their health. This would be very useful to monitor that type of stuff and get a very early look at it. Or hey, uh, laparoscopy is pretty cool, putting cameras where cameras don't usually go, into human bodies, and you could use this to take more precise measurements of what's going on inside of a human body. Joe, have you ever wondered what your spleen sounds like? I don't need to wonder. Its voice whispers sweet nothings in my ear. That's better than my answer,

which was "I can show you." All right. So, going back to the TED talk, Davis actually talked about how this could even tell us more about materials themselves and how they behave, which was really cool. The idea is that it's kind of like reversing the approach that they had been taking, where they were looking at the vibrations

and determining what sound was being made to cause those vibrations. Instead, they end up measuring vibrations and then figuring out how that material would behave, uh, depending upon the different types of stresses being put on that material. So the first example he shows was a wire, uh, like a wire sculpture of a human figure, and they had it on a surface, like it's on a little pedestal and it was resting on

like a shelf table or something. And you see a hand come into frame and bang on the surface quite a few times to create little vibrations in the wire frame.

It was to the tune of "Shave and a Haircut, two bits," and they ended up measuring this with their camera equipment, but instead of trying to recreate the "Shave and a Haircut" knock, they did it to find out how this material behaves when stresses are put on it, and their algorithm could then simulate that material's behavior and they could virtually drag any part of that material and then let go to see how it would snap back into place, uh, creating kind of this virtual model but

using real imagery. It was really interesting. Yeah. Yeah, and this little wire figure was not the only thing that he showed a test of, right? He showed a bush. They ended up shooting video of a bush as it was moving because of, I believe it was a breeze that was just blowing. Yeah, it was just a shrub outside of his apartment, I think, and then, with getting enough of that information, they

were able to then recreate that as well. And so you have this image of the bush and he could pick any point of it and virtually pull on it, and it would show how that bush would move, and it looks like it's just video that's been shot of this bush as someone has you know, kicked it or something.

And he also showed, like, a curtain or a tapestry hanging and how that would move based upon the same sort of thing, like based upon the sorts of stresses you could put on it, whether you're tugging at one corner or the center or whatever, how it would ripple back into place. And this was based on extremely small, like, motions invisible to the eye. In the case of the curtain, it was just tiny currents of air that were moving through the room. And so it

was really, really striking getting to watch it, you know. Artistically, this kind of thing could let filmmakers adjust the motion of objects after a video has already been shot, like if hair was blowing the wrong way in a shot, because you've kind of plastered a couple different frames together or something like that. Then you could

move the hair in another direction. Do you remember that phase in the history of video games when they were... no, when everyone was obsessed with hair blowing in the wind? I don't think we're over that. "Remember"? It's happening today. I mean, that's like the thing that all graphics boasting refers to. Look at this character's hair. It's the hair and it's water. It's, like, the water

effects and the hair effects. But no, I was gonna say, like, this could create a new era of full-motion video video games. Finally, finally, we've been waiting for it. Exactly. The reason we don't have that yet is that we don't know what to make these characters look like when they're supposed to jiggle. I am looking forward to the resurgence of the full-motion video video game because I need to make sure that I have a transition plan for

when I am finished doing my futurism. I need to have another career to move into, and I think FMV actor in low-budget video games is my niche. You know, if Tim Curry can do it, you can do it too. Yeah, nowhere near the panache that Mr. Curry had in things like that Frankenstein game. Well, work on it. Also, your listeners over at TechStuff would be so happy. That's true. That's true. So we will see if that day arrives. But more practically, it could let engineers test structures for wear,

like predict damage from earthquakes, important things like that. Yeah, it's really kind of an interesting way of advancing our knowledge of material science in an indirect fashion. It's pretty amazing. Uh, it also could tell us things about, like, resonant frequencies, that sort of stuff, which again is very similar to making sure you're engineering things properly in the case of earthquakes or, you know, we've heard the horror

stories about bridges and resonant frequencies. That kind of stuff is really fascinating. And, uh, we're sitting in an audio studio right now. I wonder if it could have applications in trying to figure out how to best insulate, how to best baffle a sound studio, which we often find baffling. All right. Well, on that note, members of the audience, have you ever heard drilling in a Forward Thinking podcast? We tried to edit most of the drilling out, but sometimes some drilling

might get through. Yeah, I blame Ben Bowlin and his search for lost gold, but, uh, you know. Anyway, guys, if you have suggestions for future topics we can tackle here on Forward Thinking, whatever they may be, you should write us and let us know. The email address is FWThinking at HowStuffWorks dot com, or you can drop us a line on Twitter or Google Plus. We are FWThinking at both of those places. Or

search FWThinking on Facebook and we'll pop right up. You can leave us a message, and we'll talk to you again really soon. For more on this topic and the future of technology, visit Forward Thinking dot com. Brought to you by Toyota. Let's go places.

Transcript source: Provided by creator in RSS feed