How Shazam and Midomi Work

Oct 28, 2009, 27 min

Episode description

Shazam and Midomi are both types of music recognition software. Tune in as the TechStuff guys compare and contrast Shazam and Midomi and explain how they both work.

Transcript

Speaker 1

Brought to you by the reinvented 2012 Camry. It's ready. Are you? Get in touch with technology with TechStuff from HowStuffWorks.com. You've heard the rumors before, perhaps, and the whispers written between the lines of the textbooks: conspiracies, paranormal events, all those things that disappear from the official explanations. Tune in and learn more of the Stuff They Don't Want You to Know in this video podcast from HowStuffWorks.com. Hello, everybody,

welcome to TechStuff. My name is Chris Pollette, and I'm an editor here at HowStuffWorks.com. Sitting across from me as usual is senior writer Jonathan Strickland. Hey there. Do you think people can identify our voices pretty easily? Well, I would imagine so. We don't sound exactly alike, or anything alike at all, really. I know, but I mean, do you think people who have listened to the podcast before could listen to us and say,

I know that's Jonathan and that's Chris? Possibly, although I have heard at least one person claim, after we did a phone interview, that I was the only one who sounded the way I do on the podcast. But, you know, it makes you look taller. You were sitting farther away from the phone, and it was on speakerphone at the time, so that may have played a part. But what we're getting to here is that we're working our way slowly around to the

topic we're going to discuss today, which actually comes courtesy of a little listener mail. This listener mail comes from Mason in Iowa, and he says, Hey there, love the podcast. How about an episode on how Shazam and Midomi work? How does the program compare a segment against a database and return results so darn quickly, especially when I'm the

one doing the singing, in Midomi's case? Well, Mason, first of all, I would be remiss if I did not point out that we have a sister podcast called Stuff from the B-Side, and they actually did an episode about this kind of software. However, we're going to tackle it ourselves, because we tend to cover the same topics now and then. We've both done the electric guitar, so I see no reason

for us not to tackle this one. Sure, and, you know, we can add a little stuff that they didn't. Sure, yeah, like puns, and some updates about what's going on. Oh yeah, we can do that too. So which one do you want to start with, Shazam or Midomi? Well, Shazam has been around longer, that's true. Shazam has been around since the early two thousands. Like, 2002 was when the company first started.

And 2003 was really when the service started to get some attention. Actually, I had it down as Shazam starting in 2000. Really? Wow, okay, so I stand corrected. Yeah, the earliest version of Shazam was an interesting one. You would hold your cell phone up to the source of the music, send that to the Shazam service, and get a text message back identifying the song. And I can see why this could be considered some sort

of weird magic. I mean, how could a service figure that out so quickly? It was usually just a matter of a few seconds between when you held the phone up to the music source and when you got the reply. Oh, it's magic, you know, right? Uh, "You have to believe we are magic." We can do this. You appreciate that, the Xanadu reference? Well, yeah, you're welcome. I was going to go with the Lovin' Spoonful, but even I think, with Xanadu,

we already lost everybody. The Lovin' Spoonful? I don't know. There are probably three of you out there who even know who that is. But yeah, I think you're right. Stuff like this, and voice recognition, is one of those things that just sort of surprises people, because you don't think the computer has enough, I don't know, intelligence to figure it out. Wait a minute, that's a machine, and the machine figured out what I was doing there? Let's talk about how

Shazam does this. Now, what Shazam does is break down any recording into some very simple data. They

call it fingerprinting: they fingerprint songs. If you were to try to chart out a song from start to finish, with all the different elements that go into it, you'd essentially be assigning data points to every single frequency in that song, and you would end up with a fairly substantial amount of data. Yeah, and Shazam has something like 1.7 or 1.8 million songs in the database, probably even more than

that now, because that was the most recent figure I saw too, but I think that data was... Yeah. So what Shazam does is take the peaks and the troughs, the highest and lowest points in the frequency, to map out patterns and fingerprint songs that way. So they're cutting out a lot of the information in the middle, and that saves lots of space. It also saves time when you're trying to match one song, or one

clip, to a database full of songs. So when you hold your phone up to a set of speakers, or whatever is making the music, it sends that clip to the Shazam service, which immediately analyzes it, looks at those peaks and troughs, and tries to find a match in the database. Now, the reason it's so fast is that Shazam keeps all of this in the computer's memory. It's not stored

on a hard drive, so the processor doesn't have to search the hard drive to find a match. Everything's in memory, which means they have to have lots and lots of machines to be able to hold all those songs in memory, or at least the fingerprints of those songs. Once it finds a match, it sends that data back to you, and usually it's fairly accurate. It's especially accurate for more modern music, the stuff that's

been out for the last couple of years. If you start going back further, you may encounter songs that just aren't in the database, and so you won't get a legitimate match. And in cases where a song is very similar to another song, it may come back with an incorrect match, but that doesn't happen that often. I suppose if you were to try to identify a Creed or Nickelback song, it might come back wrong, because all

those damn songs sound the same to me. I mean, seriously, kids, find some better music, all right? So get off his lawn. Yes, get off my lawn, while I listen to music they made in the good old days, like the late 1970s in London. Like the Sex Pistols. Not that modern stuff; that's just junk. Anyway, there are actually

papers online that go into great detail about the algorithms Shazam uses to match up songs and send the data back to you. Yeah, they go into quite a bit of depth, and as our sister podcast pointed out, as John and Mark talked about, they use a

three-dimensional graph to do this. Yeah, it's kind of cool. You can call it a couple of different things: a time-frequency graph, or a spectrogram. Basically, it's three-dimensional in that it shows you, over time, how the peaks and valleys of the song's frequencies change. And it's kind of interesting that

it's able to do that in the first place. But that's how it's looking at the song, and having the element of time in there is crucial, because otherwise it wouldn't be able to look at it that way. Yeah. The cool thing about this service is that you can hold your phone up to any point of the song, and as long as you're able to get about fifteen seconds' worth of content, that's enough for Shazam to work with

to find a match. So it doesn't have to be the beginning of the song, and it doesn't have to be the end. It can be any point, and as long as it's able to map out those peaks and troughs accurately, you should get a pretty good result. There are some things that can cause problems, besides the fact that some songs do

sound alike. I was kind of making a joke there, but it could happen: if you're holding up a phone to a part of a song that samples another song, and the sample is long enough, you could conceivably get the wrong result. Also, if there's ambient sound interfering in the area, you may not get a result at all, because it can't identify the song when it's getting

all these other sound inputs that are mixing it up. Yeah, I mean, one of the things they sort of advertise with Shazam is that you could take your phone around the mall, for example, or use it with the radio in your car, hold it up to the speaker, and it'll tell you what song that is. Well, if you're at the mall, you're dealing with all those people at the mall. You're dealing with perhaps conflicting songs coming from different places, poor acoustics, the distance

from you to the speaker. There are all kinds of things that can interfere with that, not to mention your cell phone signal cutting out suddenly. Yes. So I've... yeah. Thankfully there was a visual cue there, which you guys don't get to see because this is an audio podcast. The brows did go up about a foot. That was nice. I was like, I know, I'm not supposed to talk now. Um, no, I've used this application several times.
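The published papers mentioned above include Avery Wang's 2003 paper "An Industrial-Strength Audio Search Algorithm," which describes building a spectrogram, keeping only its peaks (not troughs), hashing pairs of nearby peaks, and declaring a match when many hashes agree on the same time offset between clip and song. What follows is a toy Python sketch of that idea under stated assumptions, not Shazam's code: the window size, hop, "loudest bin per window" peak rule, fan-out of 3, and the synthetic pure-tone "songs" are all hypothetical choices made to keep it small, and the plain dict stands in for the in-memory database the hosts describe.

```python
import cmath
import math
from collections import Counter, defaultdict

FRAME = 64  # samples per analysis window (hypothetical toy value)
HOP = 32    # samples between successive windows (hypothetical)

def spectrogram(samples):
    """Time-frequency grid: one row of DFT magnitudes per window.
    A naive DFT keeps this dependency-free; a real system would use an FFT."""
    grid = []
    for start in range(0, len(samples) - FRAME + 1, HOP):
        chunk = samples[start:start + FRAME]
        row = []
        for k in range(FRAME // 2):  # non-negative frequency bins only
            total = sum(x * cmath.exp(-2j * math.pi * k * n / FRAME)
                        for n, x in enumerate(chunk))
            row.append(abs(total))
        grid.append(row)
    return grid

def peaks(grid, min_mag=1.0):
    """Constellation map as (window, bin) pairs. Simplification: the loudest
    bin per window, where the paper uses local maxima in time and frequency."""
    points = []
    for t, mags in enumerate(grid):
        k = max(range(len(mags)), key=mags.__getitem__)
        if mags[k] >= min_mag:
            points.append((t, k))
    return points

def hashes(points, fan_out=3):
    """Pair each anchor peak with the next few peaks: key (f1, f2, dt), value anchor time."""
    for i, (t1, f1) in enumerate(points):
        for t2, f2 in points[i + 1:i + 1 + fan_out]:
            yield (f1, f2, t2 - t1), t1

def build_index(songs):
    """In-memory inverted index (a plain dict): hash key -> [(song_id, anchor_time)]."""
    index = defaultdict(list)
    for song_id, samples in songs.items():
        for key, t in hashes(peaks(spectrogram(samples))):
            index[key].append((song_id, t))
    return index

def identify(index, clip):
    """Vote on (song, time offset); a true match piles its votes on one offset,
    which is why the clip can come from any point in the song."""
    votes = Counter()
    for key, t_clip in hashes(peaks(spectrogram(clip))):
        for song_id, t_song in index.get(key, ()):
            votes[(song_id, t_song - t_clip)] += 1
    if not votes:
        return None
    (song_id, offset), count = votes.most_common(1)[0]
    return song_id, offset, count

def tone_song(bins, tone_len=256):
    """Hypothetical stand-in for a recording: one pure tone per segment."""
    return [math.sin(2 * math.pi * k * n / FRAME)
            for k in bins for n in range(tone_len)]

songs = {
    "song_a": tone_song([3, 7, 12, 5, 9, 14, 4, 11]),
    "song_b": tone_song([6, 10, 2, 13, 8, 15, 1, 12]),
}
index = build_index(songs)
clip = songs["song_a"][3 * 256:6 * 256]  # starts mid-song, on a window boundary
print(identify(index, clip))
```

Run as-is, this should report song_a at a window offset of 24, the window where the clip begins. The clip is cut on a window boundary so its windows line up exactly with the stored song's; real recordings never line up that neatly and carry room noise, which this toy makes no attempt to handle.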

I've had varying degrees of success. There's a theater I go to on a fairly regular basis, a stage-play theater, not a cinema theater. I go to the theatah fairly regularly, actually, and they have a pretty cool mix of preshow and postshow music that they play on the sound system. There have been a couple of times when they've played songs I just didn't recognize,

and Shazam is pretty good at identifying those. The hard part is getting a long enough section where people aren't talking so loudly that it interferes with the sample, right? So there have been times when I've stood next to the speaker and held my phone up to it. But there are problems with that too, because if the volume is too loud, you get some distortion, and then it doesn't really work in that

way either. There's also this incredibly obscure Japanese song they play that Shazam just doesn't seem to recognize, so I'm going to have to break down and ask someone at the theater what the heck it is. Yeah, and then there's the other factor, the part where you're standing there holding your cell phone up to a speaker and people are starting to wonder what the

heck you're doing. But if I started worrying about what people think of me now... I mean, come on, I've made it this far, why should I worry now? So let's shift over a little bit and talk about Midomi, which is a little different from Shazam. There are actually a few interesting, and fairly major, differences, one of which is that the database Midomi uses is not necessarily collected by the company,

it's user-generated. Users of Midomi have sung, hummed, or whistled songs into their phones and tagged each file as being a certain song, and the user community tends to comment on these and vote them up or down. And submit their own versions. Yeah, yeah: Are you kidding? It doesn't sound like that. Here's my version, because good lord, the one that's in

the database is terrible. And you could sing your own version and upload it to Midomi, and then people can vote on which one is the more representative version of the song. The wiki model, right. And the idea behind Midomi is that with Shazam, you really need the original source: you need to be able to hold the phone up to something that's playing the actual song. You can't necessarily sing it yourself,

and even if there's a band playing it live, that doesn't necessarily mean it's going to be able to identify the song, because it's looking for a specific pattern. Depending on the way you sing it, or the way the band plays it, it may not match up to anything in the database, even though the song itself may be in the database, because of this fingerprint technology. Midomi is different. It may be able to track the way you sing the song and find

a match within its user-generated database. Now, the way Midomi explains how this works is a little vague, I guess you could say. Okay, so in general, you submit the song in whatever format you've chosen, whether it's whistling, singing, humming, whatever. The software converts this into a special computer language they call Crystal language. Actually, computer language is probably the wrong term, but it's their own

proprietary format. Format, exactly. Then it looks in the database to see if there are any files with a similar style to what you just sang, hummed, or whistled, and it gives you a selection of songs that most likely fit what you sang. I say most likely because... yes, I've actually played with Midomi. Now, as far as I know, Midomi doesn't

have an Android application. They may have one now, but when I first heard of it, they did not, so I don't have it on my phone. But you can use the service on the web with a computer, so as long as your computer has a microphone, you can give it a try. I decided to do this, and for the first couple of songs I tried, it was surprisingly accurate. You know, I am not a good singer; as I'm sure many of you can imagine, my singing

ability is very poor. I can do character voices for certain musicals, and that's it. But I decided to give it a shot, and the first couple of songs I tried came out pretty accurate. I think the first one I did was I Wanna Be Sedated by the Ramones, and it came back right away. But the more I tried, the more I encountered anomalies, or maybe the correct

one was the anomaly for me. I was having issues where I would sing something and get a result that was totally not what I was singing. The most egregious of these was Blue Öyster Cult's classic hit Don't Fear the Reaper. My version, apparently, sounds to Midomi an awful lot like The Girl from Ipanema. There wasn't nearly enough cowbell. No, I was clearly lacking the cowbell, and therefore Midomi was unable to understand

what it was. But think of it this way: if I had recorded Don't Fear the Reaper, submitted it to Midomi, and then tried it again, it would have identified my

version of Don't Fear the Reaper correctly. There you go, so that's not difficult at all: just record all your own songs, so that you can go, Oh, what's the name of that song that I always... Yeah, exactly. Well, that would mostly be to impress, or distress, your friends, because you could sing a song that sounds nothing like the original, and it would come back knowing what you were singing, because you were

the one who provided the template for that song. You see, that seems problematic to me, because it seems like you could intentionally go in, sing a whole bunch of stuff, tag it, and just drive people nuts. Well, that's why it's user-policed. It's one of those services that depends heavily upon the community of users.

If the community is honest and forthright, then they are going to police the different submissions and make sure that only the ones that accurately represent the songs make it into the database. Otherwise, they essentially say, well, this is someone who's just goofing around and trying to

cause problems, and they'll nix it, right. But yeah, anytime there's a service that depends upon a community of users to keep it going, it makes me a little nervous, because you never know when a group of people is going to get a little capricious and decide that every song needs to be sung to the tune of The Yellow Rose of Texas. And why not? Yeah. So, as for the new information, well,

I mean, I didn't know if we were... That's pretty much the way it works, and again, it's referring to a database and sending you the results pretty quickly. But Midomi doesn't share as much of its back-end operations as Shazam does, so we can't say for sure that they use the same sort of setup, with everything stored in memory as opposed to hard drive space. We just don't know, because we aren't privy to that information.
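Since Midomi doesn't publish how its Crystal format works, the following is only a generic query-by-humming sketch in Python, not Midomi's method. It assumes a pitch tracker has already turned each recording into a list of pitches in Hz (that step is skipped), makes the melody key-independent by measuring every pitch in semitones from the track's own median, and compares melodies with dynamic time warping so that singing faster or slower matters less. To echo the user-generated model described above, the hypothetical database maps each title to a list of user-submitted versions, and a title scores by its closest version. All titles and melodies are made up.

```python
import math

def midi_to_hz(note):
    """Equal-tempered pitch of a MIDI note number (A4 = 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)

def contour(freqs_hz):
    """Hz -> semitones relative to the track's own median pitch, so the same
    tune sung in a different key produces (nearly) the same contour."""
    semis = [12 * math.log2(f / 440.0) for f in freqs_hz]
    median = sorted(semis)[len(semis) // 2]
    return [s - median for s in semis]

def dtw(a, b):
    """Length-normalized dynamic time warping cost between two contours."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest path: stretch a, stretch b, or advance both.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)] / (len(a) + len(b))

def rank(query_hz, database, top_n=3):
    """database: title -> list of user-submitted pitch tracks (hypothetical).
    Each title scores by its closest version; lowest distance first."""
    q = contour(query_hz)
    scored = sorted(
        (min(dtw(q, contour(v)) for v in versions), title)
        for title, versions in database.items()
    )
    return [title for _, title in scored[:top_n]]

# Hypothetical titles with made-up melodies (MIDI note numbers), not real songs.
database = {
    "tune_a": [[midi_to_hz(n) for n in (60, 62, 64, 65, 67, 65, 64, 62)]],
    "tune_b": [[midi_to_hz(n) for n in (67, 67, 64, 60, 62, 64, 62, 60)]],
    "tune_c": [[midi_to_hz(n) for n in (57, 60, 64, 69, 64, 60, 57, 52)]],
}
# Someone hums tune_b three semitones higher and at half speed.
hummed = [midi_to_hz(n + 3) for n in (67, 67, 64, 60, 62, 64, 62, 60) for _ in range(2)]
print(rank(hummed, database))
```

Here tune_b should come out on top even though the key and tempo differ from the stored version. The design also reflects the point about recording your own version: a stored rendition, however far it is from the original, becomes the template a matching performance is compared against.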

But yeah, that's all I have on the basic operations. So what's this new info you've got? Well, in general, Shazam basically offers the application for free for iPhones and Nokia phones. But actually, as of the, I don't know, late-middle part of October 2009, when we're recording this, Shazam just got a new round

of funding from Kleiner Perkins Caufield & Byers. They actually have their own iPhone application fund called the iFund. It's $100 million, and it's specifically geared toward iPhone developers. Well, Shazam got some of that money, and they're going to continue to offer their service for free for a while. Well, they actually charge for the BlackBerry version, but the iPhone

and Nokia versions are free. I don't think they charge for Android, or if they do, I am totally unaware of it, because I have Shazam on my Android phone, and if they're charging me for it, I need to pay better attention to my bills. Well, you need to pay attention to your bills by the end of 2009, because you will start to get five free song identifications a month, and then $4.99 a month after that if you want unlimited

usage and all the extra goodies. And they're talking about selling items as part of the application, like band gear, and possibly selling video. I'm pretty sure... you know what, I can't be certain now, but I seem to recall that the last time Shazam correctly identified a song for me, it gave me a link

to where I could purchase the album on Amazon. But maybe I've got that mistaken, because it has been a while since I've used it and had it actually work. It turns out that for most of the music I really like, I already know what the song is. Yeah, it's just the really obscure stuff where I'm like, wow, that's so cool, I've never heard that before, and apparently neither has Shazam. So I haven't

had much call to use it recently. I don't have a Nokia phone, and I don't have an iPhone, an Android, or a BlackBerry, but I do have an iPod, and I could download Shazam. But I have the first-generation iPod Touch, so I could stand there all day holding up my microphoneless iPod Touch, and Shazam's just going to go, Anytime? Just... I'll

be waiting. Actually, I could use a microphone that plugs in, but that costs money, right? And then you have to carry around an extra piece of gear, and you have to be someplace where there's Wi-Fi to be able to actually use it. Indeed, that could be problematic. That's a lot of qualifiers for a single application. So maybe I'll invest in a smartphone eventually. Yeah, that would be my recommendation.

You know, there's this awesome new phone called the Motorola Droid. Have you heard about it? I do think I have. It's kind of hard not to. So, at any rate, yeah, that's interesting stuff. I mean, I think these services are really cool ideas, especially for people who happen to be out and about a lot and hear a lot of, you know,

encountering a lot of new music. It's a really interesting way to educate yourself about stuff you like that you encounter but aren't really familiar with. I mean, there are a lot of other options too. There are a lot of HD Radio stations that will identify the song you're listening to, and there are some HD Radio applications where you can even buy the song from something like Amazon

or iTunes. So, yeah. Of course, satellite radio also identifies songs. And this really isn't all that new. You know, Gracenote, a.k.a. CDDB, has been identifying music mathematically, based on identification numbers, although that's a little bit different from the way Shazam and Midomi do it. And then there's my buddy John, who seems to know every song

ever recorded. So I'll be like, Hey, John, what's that song that goes...? And he'll be like, Oh, that's such-and-such by the So-and-Sos. Yeah. If you don't have access to Shazam or Midomi, you should totally call my buddy John, because he probably knows the song. Yeah, you know, I bet that somewhere in our brains there's a mathematical algorithm going on that converts those bits of information. You have to be

able to recognize something. Well, yeah, when you think about it, the brain is pretty remarkable, because you can recognize a song even if someone is mangling it. You know, it may not sound anything like the original, but we're able to recognize it and say, Hey, that song is such-and-such. Or you can always do my favorite thing, which is, Hey, who sings that song? Oh, that would be the Beatles. Yeah, let them do it. Nice. Um, no, I'm a

kind guy. But yeah, you know, they say that kids who play a musical instrument are a little better at math. There's sort of a connection between the two. So I don't know, maybe there's a thing. I'm a pretty mean hand with the slide whistle. And not, no, not with the slide rule... Oh, there you go. See, those are the kinds of lame jokes that we bring to the table that you just aren't going to get from

Stuff from the B-Side. Yeah, you will get some very valuable information about music and instruments from Stuff from the B-Side, but you won't get really horrible jokes. So thank them for that. Yeah, definitely. So, if you want to learn more about it, I do recommend listening to that podcast. You'll have to dig back in the archives a little bit to find it; it's from February. Yeah, it's been quite a while. But they're shorter podcasts. And yeah, that

was back when we would do ten-minute podcasts. Yeah. But if you want to learn more about it, I would recommend listening to that podcast, and if you haven't already, you should subscribe, because it's a good show. Yeah, definitely. They've also covered some great classical music topics that I had never really explored before, so I learned quite a bit from that show. Like classical, all the way back to the sixties. Yeah,

the 1760s. Snap! All right, let's wind this down a little bit. Let's take this down a notch. Let's end with a little listener mail. This listener mail comes from my good friend Ry Adney. She wrote to me and said, I'm hearing your touch screens podcast now, and the HTC Hero for Sprint supports multi-touch. I've been crushing on this phone for a while, but can't bring myself to

ditch my Palm Centro just yet. So yeah, we were talking about multi-touch and how Apple had patented its multi-touch technology. Well, as long as Google and HTC are creating a multi-touch system that doesn't infringe on that patent, they're fine. If they found a different way of doing it, fine; otherwise, they may see a letter from Apple's lawyers. Not that Apple is known to be litigious or anything. Not at all. And on that happy note, let's wind up. Guys,

thank you for listening. Remember, we have TechStuff Live. It's a show that's live every Tuesday at 1 p.m. Eastern. You can find it on the HowStuffWorks.com blogs. Just go to the HowStuffWorks.com home page and look on the right side. You'll find links to the blogs there; just look for the one that says Watch TechStuff Live, or some variation thereof, because that's where you can see our show. And we're starting to get some really good feedback

on it, but we need more viewers. I know that at 1 p.m. you might be working or in class, but just find a quiet corner you can hide in and watch, and you'll love it. It's the best use of twenty-two minutes of your time right there. Yeah. Also, remember that we both contribute to the blogs at HowStuffWorks.com, and I write articles occasionally for the website, thus the Senior

Writer title. There you go. So visit HowStuffWorks.com, and Chris and I will talk to you again really soon. For more on this and thousands of other topics, visit HowStuffWorks.com, and be sure to check out the new TechStuff blog, now on the HowStuffWorks homepage. Brought to you by the reinvented 2012 Camry. It's ready. Are you?

Transcript source: Provided by creator in RSS feed.