You know when you look at an old school wind up toy, like one of those little tin cars, there's this illusion of intelligence. You wind the key, you set it down on the floor, and it just zips across the room. It looks, I don't know, purposeful, right.
It has a trajectory. It's interacting with the physical world in a very literal sense, just moving its mass from point A to point B.
But it's completely blind. Oh, it's just releasing potential energy from a spring on a fixed mechanical path. Yeah, I mean, if somebody drops a shoe in front of it, it just crashes into the leather and spins its little wheels until the spring dies.
Yeah, it has absolutely no idea the shoe is there. It's not making choices exactly. Well, it's more than just blind, really, it's completely agnostic to reality. It's a machine, sure, but it lacks any mechanism to perceive its environment or alter its behavior based on what it finds.
And crossing that massive evolutionary gap from a blind wind up toy to a machine that can actually perceive that shoe, stop, think and navigate around it. That takes an incredible amount of engineering.
An unbelievable amount yeah.
Yeah, today we are exploring exactly how to make that leap. Welcome to the deep dive, where we are going under the hood of Lentin Joseph's manual, Learning Robotics Using Python.
It's such a great source to dive into.
We're going to see what it actually takes to build a differential drive service robot, specifically a robotic waiter.
And it's a brilliant journey for you to follow along with because it forces you to synthesize disciplines. You can't just be a programmer, and you can't just be a mechanic. You have to bridge the physical world and the digital world.
Okay, let's unpack this, because before we can even think about building our robotic waiter, we really need to define what separates a true robot from just, you know, a complex machine. You really do, and honestly, we have to fight through a century of science fiction myths to get to a working definition.
Oh absolutely. Even the origin of the word itself sets up this expectation that engineers have been wrestling with ever since. I mean, the term robot didn't originate in an MIT laboratory. It was coined in a theater. In nineteen twenty, a Czech writer named Karel Čapek wrote a play called R.U.R.
Which stands for Rossum's Universal Robots, right? Exactly.
He was trying to come up with a term for artificial human workers, and his brother Josef actually suggested roboti, which carries a...
A very specific cultural weight, doesn't it?
Very much so. It stems from the word robota in Czech and Slovak, which translates to work or, more accurately, serf labor or hard drudgery.
Wow.
So right from its birth, the concept implies a machine designed to take on the heavy, undesirable lifting for humanity.
Though I mean spoiler alert for a play from the nineteen twenties, those robotic serfs eventually revolt and wipe out humanity.
A narrative trope that has completely dominated our perception of robotics ever since. But you know, if we strip away the Hollywood dramatization and look for a functional, modern definition, roboticists like Maja J. Matarić offer a much clearer lens.
Okay, what's her take?
She defines a robot as an autonomous system existing in the physical world, able to sense its environment and act on it to achieve goals.
Which fundamentally disqualifies my washing machine.
Yes it does.
My washing machine doesn't sense anything. It just runs a pre programmed timer, much like the tin wind up toy. Yeah, but I was thinking about this definition, and it also seems to disqualify something incredibly smart, like my email spam filter.
Ah, that's an interesting point.
Right, because a spam filter learns, it adapts, it acts autonomously to keep junk out of my inbox. Yeah, but it doesn't live in the physical world. The data is just perfectly spoon-fed to it in a stable digital environment. Right. Is the sheer chaos of a messy physical bedroom what makes building a true robot so uniquely difficult?
That is the absolute core of the challenge. A spam filter never has to worry about a sudden gust of wind, or the floor being slippery, or a dog running by. Exactly. Or the lighting in the room changing and blinding its camera. The physical world is wildly unpredictable. The friction on a carpet is different from the friction on hardwood.
Sensing that unpredictable chaos, making sense of it, and then physically exerting force upon it, that continuous loop is what separates true robotics from pure software engineering.
So if a true robot must sense and act in this chaotic physical space, how do we build a brain that can process all that messy sensory information and decide what to do?
Well, this brings us to the fundamental anatomy of a robot. You have the physical body. You have sensors, which measure the environment to create what we call a digital state. You have effectors, the motors and arms that do the actual moving. And sitting between the sensors and the effectors, you have the controllers.
The brain, precisely the brain.
And giving a machine a brain that could handle physical chaos really started around nineteen sixty six with a robot developed at the Stanford Research Institute, famously named Shakey. Shakey? Yeah, Shakey. It was a milestone because it didn't just execute a list of hard-coded commands. It could actually reason. How so? If you gave Shakey a high-level command, it could break that complex goal down into smaller, solvable intermediate steps.
I mean, trying to get Shakey to navigate around literal blocks in a room actually led researchers to invent the A* search algorithm.
Wait, really? The same pathfinding tool used in computer science today? The very same.
It's foundational. But Shakey got its name for a reason. It was incredibly slow and jerky.
Oh, because it had to think so much.
Yes, it would move a tiny bit, stop, think for a very long time, and then move a tiny bit more. Trying to plan every single movement perfectly just isn't practical.
No, especially in a dynamic environment with people walking around.
Exactly. And that tension led to the development of different control paradigms. You have the hierarchical, or deliberative, paradigm, which is what Shakey used.
Okay, what does that look like.
The sequence is sense the world, plan a route, then act. It's very intelligent, but computing that plan takes critical time. On the opposite end, you have the reactive paradigm. This is simply sense then act. There is no planning, just immediate response.
Let me see if I have this right, because I was trying to map this onto human biology while reading. Sure. If I accidentally brush my hand against a hot stove, I don't stop, sense the temperature, formulate a spatial plan to move my arm, and then execute the movement.
No, you'd burn your hand.
Right. My spinal cord just bypasses my brain entirely and yanks my hand back in a fraction of a second. Is reactive control basically a digital reflex?
A spinal reflex is a phenomenal way to picture it, actually, but with one major caveat: a robot's reflex is completely programmable. And if we connect this to the bigger picture of the robotic waiter we are trying to build from the source material, you quickly realize you can't rely purely on one paradigm.
You need a combination.
You need a hybrid model.
Yes, because it has to be smart enough to find the table, but fast enough not to run over a customer.
Exactly that. The robot needs a hierarchical, deliberative plan to know that table four is across the dining room, to map an efficient path around the stationary booths, and to calculate an arrival time. But if a patron suddenly steps backward out of their chair holding a tray of drinks, the robot does not have three seconds to sit there and recalculate a new global map.
It would crash.
It needs that reactive reflex to slam on the brakes the millisecond its lidar detects a sudden obstacle. The hybrid model layers these systems: a slow, smart planner running in the background and a fast, reactive safety system running in the foreground.
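To make the layering concrete, here is a minimal Python sketch of one hybrid control step. It is an illustration only, not code from the book: the `Command` type, the `STOP_DISTANCE_M` threshold, and the function names are all hypothetical, and a real planner would produce `planned` from a map rather than receive it as an argument.

```python
from dataclasses import dataclass

# Hypothetical safety threshold; the source does not give a number.
STOP_DISTANCE_M = 0.5


@dataclass
class Command:
    linear: float   # forward speed in m/s
    angular: float  # turn rate in rad/s


def reactive_layer(min_obstacle_distance_m: float):
    """Sense -> act: return a stop command if something is too close, else None."""
    if min_obstacle_distance_m < STOP_DISTANCE_M:
        return Command(0.0, 0.0)
    return None


def hybrid_step(planned: Command, min_obstacle_distance_m: float) -> Command:
    """Prefer the reflex when it fires; otherwise follow the slow planner's command."""
    reflex = reactive_layer(min_obstacle_distance_m)
    return reflex if reflex is not None else planned


cruise = Command(linear=0.35, angular=0.0)  # what the deliberative planner asked for
clear_path = hybrid_step(cruise, min_obstacle_distance_m=2.0)
patron_steps_back = hybrid_step(cruise, min_obstacle_distance_m=0.2)
```

With a clear path the planner's command passes through untouched; when the obstacle distance drops below the threshold, the reflex overrides it with a stop, without waiting for a new plan.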
All right, so we understand this hybrid brain. But a brain needs a physical vessel to carry the soup. If we are actually engineering this robotic waiter, how do we design the physical machine from scratch?
Well, you start by defining your hard constraints. Engineering is all about solving specific problems. For this project, the robot needs to carry food, so the text specifies a payload capacity of up to five kilograms.
Okay.
It needs to travel at roughly a human walking pace to not be disruptive, which means a target speed between zero point two five and one meter per second.
That makes sense.
It also needs more than three centimeters of ground clearance to get over doorway thresholds without getting stuck. And crucially, it needs to be relatively low cost.
And the mechanical solution the book walks us through is a differential drive system. Yes, it's modeled after the TurtleBot architecture. If you picture it, it looks kind of like a three-tiered wedding cake made of metal plates: a base plate, a middle plate, and a top plate. That's a good visual.
But the drive system is what caught my eye. It only has two powered wheels on the left and right sides, and then these unpowered caster wheels, like you'd find on the bottom of an office chair, on the front and back, just to keep it from tipping over.
Differential drive is brilliant in its simplicity. By independently controlling just the speed and direction of those two side wheels, you can steer the robot.
Effortlessly, without needing a heavy, complex steering rack like a car uses.
Exactly, well, you can't just slap any motors onto those wheels and call it a day. You have to calculate the physics.
Right, because if you guess wrong, the robot either moves at a snail's pace or it completely stalls out when you put a bowl of soup on it.
Exactly, you are managing the tension between speed and torque. The text walks through the calculations. Say you want this robot to travel at a modest zero point three five meters per second, and you are using wheels with a nine centimeter diameter to maintain that ground clearance.
Okay, right.
Then the geometry dictates the wheels must spin at roughly seventy four revolutions per minute. The author standardizes this by selecting an off-the-shelf eighty rpm motor.
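The arithmetic behind that figure is short enough to check by hand: one wheel revolution covers one circumference, so rpm is speed divided by circumference, times sixty. A small sketch, with a function name of our own choosing:

```python
import math


def wheel_rpm(speed_m_s: float, wheel_diameter_m: float) -> float:
    """Revolutions per minute needed for a wheel to roll at the given ground speed."""
    circumference_m = math.pi * wheel_diameter_m  # distance covered per revolution
    return speed_m_s / circumference_m * 60.0


rpm = wheel_rpm(speed_m_s=0.35, wheel_diameter_m=0.09)
print(f"{rpm:.1f} rpm")  # prints 74.3 rpm
```

That is just under the eighty rpm of the off-the-shelf motor, which leaves a little headroom.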
But speed is only half the battle. Torque is the muscle, right right.
Torque is the twisting force required to overcome inertia and friction. Think about pushing a heavy box across a room. The hardest part is always that initial shove to get it to budge from a dead stop.
Yeah, once it's sliding, it's easier.
For our robotic waiter, the chassis plus that five kilogram payload weighs about fifteen kilograms total. That weight is pressing down on the floor.
And the floor has friction, a lot of it.
If you factor in a friction coefficient of zero point six for a standard indoor surface, the math reveals that the motors need to generate ten point three two kilogram-centimeters of torque just to break that static friction and get the robot moving.
Wow.
If you buy a motor with only five kilogram-centimeters of torque, it won't matter how fast it can spin. The wheels will just hum, heat up, and stall.
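The conversation does not spell out the intermediate steps, so the sketch below fills them in under stated assumptions: the weight is shared equally across four contact points (two drive wheels plus two casters), gravity is rounded to 10 m/s², and the wheel radius is half the nine centimeter diameter. With those assumptions, torque = friction coefficient × normal force per wheel × wheel radius lands on the quoted figure.

```python
# Assumption: gravity rounded to 10 m/s^2 (this rounding reproduces the quoted figure).
G_APPROX = 10.0
# Unit conversion: 1 N*m expressed in kgf*cm (about 10.197).
KGF_CM_PER_NM = 100.0 / 9.80665


def breakaway_torque_kgf_cm(mass_kg, friction_coeff, wheel_radius_m,
                            contact_points=4, g=G_APPROX):
    """Torque per wheel needed to overcome static friction, in kgf*cm.

    Assumes the weight is split equally across `contact_points` wheels.
    """
    weight_n = mass_kg * g
    normal_per_wheel_n = weight_n / contact_points
    torque_nm = friction_coeff * normal_per_wheel_n * wheel_radius_m
    return torque_nm * KGF_CM_PER_NM


torque = breakaway_torque_kgf_cm(mass_kg=15.0, friction_coeff=0.6, wheel_radius_m=0.045)
print(f"{torque:.2f} kgf*cm")  # prints 10.32 kgf*cm
```

Using 9.81 m/s² instead of the rounded value gives a slightly lower number, so the result should be treated as a sizing estimate rather than an exact requirement.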
That transition from abstract code to high school physics is wild. You have to literally calculate the friction of the floor, and once that math is locked in, you have to design the parts.
Yes, the actual model.
The author uses LibreCAD for the two D blueprints and Blender for the three D models. But here's where I got really confused reading the source. Oh? When it's time to build the three D model in Blender, the author doesn't use the mouse. They don't drag and drop shapes. They open Blender's Python API, the bpy module, and write lines of Python code just to generate a three D cylinder.
Ah.
Yes. Why would someone write a script to draw a shape when they could literally just click and drag the mouse and be done in three seconds?
I know it feels entirely counterintuitive until you consider the reality of iterative engineering. When you drag a mouse to draw a cylinder, you are introducing human imperfection. You might be off by a fraction of a millimeter, right, But more importantly, a hand drawn shape is static. Programmatic modeling allows you to completely parameterize your dimensions.
Meaning you link the shape to the math.
Exactly. Imagine you build the robot and you realize your waiter actually needs to carry ten kilograms of food instead of five. The physics change.
You need bigger motors.
Which require bigger mounts, which means the base plate needs to be wider. If you drew it by hand, you have to manually resize the chassis, reposition the wheel axles, and check every clearance by eye.
That sounds like a nightmare.
It is. But if you used Python, you simply open your script, change the variable payload weight to ten, and run the code.
And it does it all for you.
The math automatically recalculates, resizes the cylinder, shifts the wheel mounts, and spits out a flawless, updated three D model. Plus the script can automatically export the STL files needed for simulation without any graphical artifacts. You aren't drawing a picture, you are engineering a responsive system.
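A rough sketch of what parameterized modeling can look like follows. It is not the book's script: the `PlateParams` fields and their default values are hypothetical, and it parameterizes the plate radius directly, because the conversation does not give the sizing rule that would link payload to plate size. The one Blender call used, `bpy.ops.mesh.primitive_cylinder_add`, only runs inside Blender's bundled Python, so the sketch guards the import and still computes the geometry outside Blender.

```python
from dataclasses import dataclass

try:
    import bpy  # available only inside Blender's bundled Python
except ImportError:
    bpy = None


@dataclass
class PlateParams:
    # Hypothetical dimensions, in meters; not taken from the book.
    radius_m: float = 0.15
    thickness_m: float = 0.005
    height_m: float = 0.05


def plate_kwargs(p: PlateParams) -> dict:
    """Arguments for a thin cylinder that represents one chassis plate."""
    return {
        "radius": p.radius_m,
        "depth": p.thickness_m,
        "location": (0.0, 0.0, p.height_m),
    }


def wheel_mounts(p: PlateParams) -> list:
    """Left and right mount positions, derived from the plate rim along the y axis."""
    return [(0.0, p.radius_m, p.height_m), (0.0, -p.radius_m, p.height_m)]


def build_plate(p: PlateParams) -> dict:
    """Create the plate in Blender when available; always return the geometry used."""
    kwargs = plate_kwargs(p)
    if bpy is not None:
        bpy.ops.mesh.primitive_cylinder_add(**kwargs)
    return kwargs


# Change one number and every derived position follows.
wider = PlateParams(radius_m=0.20)
plate = build_plate(wider)
mounts = wheel_mounts(wider)
```

The point of the design is that the wheel mounts are never typed in by hand; they are computed from the same parameter that sizes the plate, so they cannot drift out of sync.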
Okay. I love that the code itself becomes the ultimate precision tool. So now we have our three D model, perfectly generated by Python, sitting on its differential drive wheels, right? But to make it actually navigate a virtual restaurant, we run into the physics of driving, and this introduces kinematics.
Yes. Kinematics is the study of motion, the mathematics of how things move without worrying about the underlying forces like mass or friction that we just discussed.
So just the geometry of it.
Basically, for our robot, we have to look at its degrees of freedom or its pose on a flat restaurant floor. The robot operates in a standard X and y coordinate system like a graph, but it also has a heading the direction it's facing, which is represented by the Greek letter theta, So its pose is always defined by x, y and theta.
But because it uses that two-wheel differential drive, it has a major movement limitation. The source calls it a non-holonomic constraint. A very intimidating term. It sounds like a terrifying math term, but it's basically just what I deal with when I try to parallel park my car.
Actually, yes.
Because my car can't just slide sideways into a spot. I have to roll forward or backward based on where the wheels are pointing. I have to do this whole geometric dance of turning, reversing, and straightening out just to move a few feet to the left. So does our robotic waiter have the same problem? Like, if table four is directly to its left, can it just strafe sideways?
It absolutely cannot, and your parallel parking frustration is the exact manifestation of a non-holonomic constraint. The robot cannot move sideways, perpendicular to the direction its wheels are pointing, without first changing its theta, its heading. Wow. To reach that table to its left, it must execute a specific foundational algorithm for inverse kinematics: v_rot, v_ahead, v_rot.
Rotate, drive ahead, rotate again correct.
It rotates in place to face the target coordinates. It drives ahead in a straight line to reach them, and then it rotates a second time to face the final desired orientation, say, facing the customer to deliver the plate.
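The rotate, drive ahead, rotate sequence reduces to two angle differences and one distance. Here is a minimal sketch under the assumption that poses are `(x, y, theta)` tuples in meters and radians; the function names are our own.

```python
import math


def wrap_angle(angle_rad: float) -> float:
    """Wrap an angle into the range [-pi, pi) so the robot takes the shorter turn."""
    return (angle_rad + math.pi) % (2.0 * math.pi) - math.pi


def rotate_drive_rotate(start, goal):
    """Return (first_turn, distance, second_turn) to get from start pose to goal pose."""
    x0, y0, theta0 = start
    x1, y1, theta1 = goal
    heading_to_goal = math.atan2(y1 - y0, x1 - x0)
    first_turn = wrap_angle(heading_to_goal - theta0)   # rotate in place to face the target
    distance = math.hypot(x1 - x0, y1 - y0)             # drive straight ahead
    second_turn = wrap_angle(theta1 - heading_to_goal)  # rotate to the final orientation
    return first_turn, distance, second_turn


# The table is two meters directly to the robot's left; it should end up facing that way.
first_turn, distance, second_turn = rotate_drive_rotate(
    start=(0.0, 0.0, 0.0), goal=(0.0, 2.0, math.pi / 2)
)
```

For this case the robot turns a quarter turn to the left, drives two meters, and needs no final rotation, which is exactly the strafe it could not do directly.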
Got it.
What's fascinating here is the geometry of how it turns. Because the two wheels are fixed on a single horizontal axis, anytime the left and right wheels spin at different speeds, the robot naturally sweeps through an arc.
Okay, I can picture that.
It rotates around a specific invisible pivot point located somewhere along that extended axis line. This is called the instantaneous center of curvature, or ICC.
And the math is constantly crunching the wheel speeds to figure out where that invisible pivot point is.
Continually. Forward kinematics is the math asking: given the current speed of my left wheel and right wheel, where will my x, y, and theta be in five seconds? Inverse kinematics asks the much harder question: I need to be at this specific x, y, and theta. Exactly how fast should I spin my left wheel versus my right wheel to make my ICC carve the perfect arc to get there?
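The forward question can be answered with the standard differential drive equations: find the ICC from the two wheel speeds, then rotate the robot's position around it. The sketch below assumes constant wheel speeds over the time step; `wheel_base` (the distance between the two drive wheels) and the example values are hypothetical.

```python
import math


def forward_kinematics(x, y, theta, v_left, v_right, wheel_base, dt):
    """Predict the pose after dt seconds at constant wheel ground speeds (m/s)."""
    if math.isclose(v_left, v_right):
        # Equal speeds: the ICC is infinitely far away, so the path is a straight line.
        d = v_left * dt
        return x + d * math.cos(theta), y + d * math.sin(theta), theta

    omega = (v_right - v_left) / wheel_base  # turn rate around the ICC
    radius = (wheel_base / 2.0) * (v_left + v_right) / (v_right - v_left)
    icc_x = x - radius * math.sin(theta)
    icc_y = y + radius * math.cos(theta)

    # Rotate the robot's position about the ICC by the angle swept in dt.
    d_theta = omega * dt
    new_x = math.cos(d_theta) * (x - icc_x) - math.sin(d_theta) * (y - icc_y) + icc_x
    new_y = math.sin(d_theta) * (x - icc_x) + math.cos(d_theta) * (y - icc_y) + icc_y
    return new_x, new_y, theta + d_theta


# Right wheel faster than left: the robot sweeps a left-hand arc of radius 0.2 m.
arc_pose = forward_kinematics(0.0, 0.0, 0.0, v_left=0.1, v_right=0.3,
                              wheel_base=0.2, dt=math.pi / 2)
# Wheels at equal and opposite speeds: the ICC sits under the robot and it spins in place.
spin_pose = forward_kinematics(0.0, 0.0, 0.0, v_left=-0.1, v_right=0.1,
                               wheel_base=0.2, dt=1.0)
```

In the arc example the robot traces a quarter circle and ends near (0.2, 0.2) facing along the y axis; in the spin example only theta changes.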
But wait, to calculate any of that, the robot needs to know exactly how far its wheels have actually turned in the real world. It doesn't have GPS indoors. How does it know it drove two meters and not one?
Ah, it relies on wheel odometry. Attached to the motors are sensors called wheel encoders. Encoders, okay. The text explains that these encoders output binary signals, microscopic digital clicks or steps, as the physical wheel rotates. If you know exactly how far the wheel travels in one tiny step, you just count the steps.
And multiply, so it's basically counting its own footsteps in the dark.
Yes, if you think of the cameras and lidars as the robot's eyes, the wheel encoders are its inner ear. They provide proprioception. They give the robot a constant mathematical awareness of its own body moving through space, feeding real numbers back into those kinematics equations.
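Counting footsteps in the dark comes down to one multiplication. In this sketch the encoder resolution of 1000 ticks per wheel revolution is a made-up figure for illustration, since the conversation does not state one, and the function names are ours.

```python
import math

# Hypothetical encoder resolution; check the datasheet of the real encoder.
TICKS_PER_REV = 1000


def distance_per_tick(wheel_diameter_m: float, ticks_per_rev: int = TICKS_PER_REV) -> float:
    """Ground distance covered by the wheel for a single encoder tick."""
    return math.pi * wheel_diameter_m / ticks_per_rev


def distance_travelled(tick_count: int, wheel_diameter_m: float,
                       ticks_per_rev: int = TICKS_PER_REV) -> float:
    """Count the ticks and multiply by the distance each one represents."""
    return tick_count * distance_per_tick(wheel_diameter_m, ticks_per_rev)


one_revolution_m = distance_travelled(1000, wheel_diameter_m=0.09)
ticks_for_two_meters = round(2.0 / distance_per_tick(0.09))
```

One caveat worth keeping in mind: encoders measure wheel rotation, not ground travel, so wheel slip on a slick floor makes the count drift from reality over time.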
Oh yeah, so we've got the Python three D model. We have the torque calculations. We have the complex kinematics and encoders all figured out.
Everything we need.
But before we spend a single dollar ordering real motors or cutting metal plates, the book has us take all this logic and plug it into the matrix the simulation.
Which is the most crucial step in modern robotics. Simulation gives you a sandbox to test your real, production-ready code on virtual hardware. It's low cost and, more importantly, zero physical risk.
No broken parts.
Exactly. If your inverse kinematics math is inverted and your virtual robot accelerates to maximum speed and drives straight into a virtual wall, you don't have a pile of broken metal and a fired engineer. You just hit reset on the simulation.
Here's where it gets really interesting for me. The tools used for this are Gazebo and ROS. Gazebo makes sense. It's the three D physics simulator. It renders the restaurant, the tables, and calculates the virtual gravity. But ROS, the Robot Operating System, is what actually runs the brain, although the text makes a point to say it isn't actually an operating system like Windows or Linux. It's a meta operating system and a distributed framework.
That's a vital distinction. ROS sits on top of a traditional operating system. At its core, ROS is a communication framework. A modern robot runs dozens of separate executable programs simultaneously, one for vision, one for driving, one for mapping. ROS calls these individual programs nodes. Nodes? Yes. You have a central directory called the ROS master that keeps track of what nodes are running. But the actual data sharing happens through a brilliant publish and subscribe model using data streams called topics.
So this publish and subscribe thing, I'm trying to wrap my head around it. If a wheel encoder has new odometry data, does it just shout it out? Is it sort of like a social media feed, where a node tweets its speed to a topic and hopes someone else is reading it?
That is an incredibly useful way to conceptualize it. The wheel encoder node publishes its data to a topic named, for example, /odometry. It has absolutely no idea who is reading it, or if anyone is reading it at all.
Oh wow.
Conversely, your motor control node needs to know how fast the robot is moving, so it subscribes to the /odometry topic. It essentially follows that feed.
Just waiting for updates. Yeah.
Whenever new data appears on the topic, the motor node reads it and adjusts the power. The two nodes never speak directly to each other. They just interact with the feed.
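The pattern itself fits in a few lines of plain Python. To be clear, this is a toy, single-process model of publish and subscribe and not the ROS API: the `TopicBus` class and the message dictionary are invented for illustration, whereas real ROS nodes are separate programs that find each other through the ROS master.

```python
from collections import defaultdict


class TopicBus:
    """Toy stand-in for topics: maps a topic name to the callbacks following it."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # The publisher never learns who, if anyone, receives the message.
        for callback in self._subscribers[topic]:
            callback(message)


bus = TopicBus()
speeds_seen_by_motor_node = []


def motor_node(message):
    """Subscriber: reacts to each odometry update as it arrives."""
    speeds_seen_by_motor_node.append(message["speed_m_s"])


bus.subscribe("/odometry", motor_node)

# The encoder node publishes without knowing the motor node exists.
bus.publish("/odometry", {"speed_m_s": 0.35})
# Publishing to a topic nobody follows is harmless; the message is simply dropped.
bus.publish("/unwatched", {"speed_m_s": 1.0})
```

Neither function holds a reference to the other; the topic name is the only thing they share, which is what makes swapping one node for another so easy.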
They don't even need to know the other one exists exactly.
And this decoupled architecture is what makes ROS so powerful. Because the nodes are completely independent, they don't even need to be written in the same programming language. Wait, really? Yeah, your wheel encoder node could be a simple Python script running on a tiny Raspberry Pi strapped to the robot's base. Meanwhile, your complex path planning node could be a heavy, performance-intensive C++ program running on a massive server in the restaurant's back office, communicating over Wi-Fi. It's a decentralized peer-to-peer network.
Which means you can easily borrow code from other people.
Right, that's the real magic: code reuse. A university team in Tokyo might write a brilliant, highly optimized node for avoiding dynamic obstacles. Because their node just subscribes to standard laser scan topics and publishes to standard velocity topics, you can download their open source code and drop it straight into your waiter robot.
And it just works.
As long as the topic names match, it works instantly.
It's an entire collaborative ecosystem operating under the hood. So what does this all mean for you listening? Let's recap the journey.
Let's do it.
We started with a nineteen twenties play about artificial serf labor. We defined what a robot actually is, a machine that must sense and act in the physical world. We gave it a hybrid control system that can deliberatively plan a route while reflexively hitting the brakes.
A digital spinal cord.
Yes, we calculated the physical torque needed to move a tray of food, parameterized a three D model using Python scripts, and solved the parallel parking geometry of non-holonomic kinematics. And finally, we plugged all those independent nodes into the ROS publish and subscribe matrix to simulate a working waiter.
It is a remarkable synthesis. You are stacking mechanical physics, computer science, and geometry to create something that ultimately feels almost alive.
And the practical takeaway here is that the next time you see a food delivery robot rolling down the sidewalk, or even just watch your Roomba vacuuming your hallway, you won't just see a plastic shell roaming around.
Now you'll see the matrix.
You will be able to visualize the hidden architecture, the kinematics constantly calculating instantaneous pivot points, the invisible nodes tweeting data back and forth, and the hybrid brain fighting to survive the chaos of the physical world.
It profoundly changes how you view the machines around you. But to leave you with a final thought to mull over. Earlier we touched on the history of robotics, which the source text outlines, and you can't discuss that history without mentioning Isaac Asimov's famous three laws of robotics. Asimov's first law states that a robot may not injure a human being. It's an ethical philosophical mandate designed to keep us safe.
Right. But from everything we've just discussed, there is no philosophy module in Python.
Precisely. And this raises an important question for the future of engineering. In the reality of ROS, everything is just nodes, topics, wheel encoders, and coordinate geometry. How do you map a human value onto a publish and subscribe network? When our simulated waiter is calculating its instantaneous center of curvature to avoid a table, at what point does a mathematical threshold cross over into a moral decision? We know how to program a machine to understand rpm and torque, but how do we eventually program it to understand us?
Math crossing into morality. A massive question to leave off on. From blind wind-up toys crashing into shoes to physical machines calculating the geometry of human safety, we'll leave you to ponder that one. Thanks for joining us on this deep dive.
