Welcome back to the Elon Musk Podcast. I'm thrilled to share some exciting news with you over the next two weeks. We're evolving. We'll be broadening our focus to cover all the tech titans shaping our world, and with that, our show will become Stage Zero. You'll still get the latest insights on Elon Musk, plus so much more, so stay tuned for our official relaunch as Stage Zero, coming soon.
So for the past four or five years, I've been bringing you in-depth, no-nonsense insights from the world of Elon Musk. But I need your help to keep the show alive and growing. If you love what you hear, consider supporting Stage Zero on Patreon at patreon.com/StageZeroNews. By joining our Patreon community, you'll get exclusive content, early access to some episodes, and a chance to shape future topics. Everyone has a voice, and your support goes directly into making this show better. It helps me keep bringing you the content that you enjoy every single day. If you're getting value from Stage Zero News, becoming a patron is the best way to make sure this journey keeps going. So let's make the next five years even bigger together. There's a link in the show notes just for you.
Why would Google train an AI to talk to dolphins? That's not something you hear every day, and it's exactly the kind of question that sums up what might be the most unpredictable, chaotic, and fascinating week in AI so far this year. Not only are researchers now attempting to decode animal communication using lightweight neural networks, but we also saw humanoid robots run a literal half marathon, AI tools that can animate pets into dancers or bring comic panels to life with one click, and OpenAI launching its most intelligent models yet. Every one of these stories is a question in itself.
Why now? How does it work? And what does this mean for the future of interaction, creativity, and intelligence itself? Now, to start, Google made headlines this week with something called DolphinGemma, a compact AI model trained to understand and even generate dolphin vocalizations. What makes this unusual isn't just the implication, it's that the model runs directly on your phone. Researchers used Google Pixel devices to process real-time dolphin chatter.
Using a framework based on audio tokenization, they recorded the sounds dolphins make, clicks, squawks, and whistles, and converted them into discrete audio tokens using Google's SoundStream codec. This tokenized dataset was then used to train a smaller variant of Google's Gemma model, which comes in at around 400 million parameters. That's small enough to run efficiently on mobile hardware without external compute. And beyond understanding, the model can also synthesize new dolphin-like sounds, which is a potential breakthrough for researchers aiming to translate interspecies communication into something humans can eventually understand. Now, the significance of this extends well beyond dolphins. The architecture is general enough that it could be retrained to understand and emulate the vocalizations of other animals, and in theory it could eventually support real-time, two-way communication with certain animal species.
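To make that pipeline a little more concrete, here's a minimal sketch of the audio-tokenization idea in Python. Everything in it is a toy stand-in: the crude energy-based quantizer is not SoundStream, and the frame size and codebook size are made-up numbers. The point is only the shape of the data flow, raw waveform in, a sequence of discrete tokens out, which a small language model like Gemma can then be trained on.

```python
# Toy illustration of audio tokenization (NOT SoundStream or DolphinGemma's
# real pipeline): chop a waveform into frames and map each frame to a
# discrete token ID, the kind of sequence a small language model trains on.

import numpy as np

def encode_to_tokens(waveform, frame_size=320, codebook_size=1024):
    """Quantize each audio frame into one of `codebook_size` token IDs.
    A real neural codec learns this mapping; here we just bucket frame energy."""
    n_frames = len(waveform) // frame_size
    frames = waveform[: n_frames * frame_size].reshape(n_frames, frame_size)
    energy = (frames ** 2).mean(axis=1)                    # crude per-frame feature
    edges = np.linspace(energy.min(), energy.max(), codebook_size)
    return np.digitize(energy, edges)                      # one token per frame

# One second of fake "dolphin audio" at 16 kHz becomes a short token sequence.
waveform = np.random.randn(16_000).astype(np.float32)
tokens = encode_to_tokens(waveform)
print(tokens[:20])
```

Once audio looks like a token sequence, training and generation work the same way they do for text: predict the next token, then decode the generated tokens back through the codec to get synthetic dolphin-like sound.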
Could you imagine talking to your own pets? That question takes us into strange territory: if an AI can mimic language-like structure in dolphin chatter, are we on the verge of developing machine-driven cross-species translators? Now, while AI was speaking to marine life, animation tools were speaking to the internet's favorite content creators.
favorite content creators. Uni Animate DIT, which is a new plugin built for the open source model Animate DIF 1.2 allows users to animate any character image with a motion reference video up like a static image or any photo of a person, a cartoon or even a pet, and combine it with a short clip of someone dancing or moving around. The tool extracts the pose data from the video, then applies it to the static image, producing a fully animated clip with smooth transitions.
And what stands out is that the model can guess unseen angles like the back of the character and animate flowing fabric or hand movements convincingly. All this runs locally with a minimum of 14 gigs of VRAM, meaning creators can use a tool without relying on cloud services. And I've used it. The results are surprisingly good. Even characters with complex appearances or unusual anatomy, like fictional anime designs or animals, can be animated with minimal artifacting.
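If you're wondering what that pipeline looks like in code, here's a rough sketch of the pose-driven animation flow. The function and class names are hypothetical placeholders, not the plugin's actual API; they just show the two-stage structure: extract a pose sequence from the reference video, then condition the generator on the static image plus one pose per frame.

```python
# Conceptual sketch of pose-driven image animation. `extract_pose_sequence`
# and `render_frames` are hypothetical placeholders, not UniAnimate-DiT's
# real API; only the overall data flow is illustrated.

from dataclasses import dataclass

@dataclass
class Pose:
    keypoints: list  # (x, y) joint positions for one frame of the reference video

def extract_pose_sequence(motion_video_path: str) -> list:
    """Run a pose estimator over every frame of the reference clip (placeholder)."""
    ...

def render_frames(character_image_path: str, poses: list) -> list:
    """Condition the diffusion model on the static character image plus one
    pose per frame, so the character follows the reference motion (placeholder)."""
    ...

poses = extract_pose_sequence("dance_reference.mp4")   # motion source
frames = render_frames("my_cat.png", poses)            # animated output frames
```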
And since everything is released open source, artists and animators now have access to a new kind of puppetry, one that's as accessible as downloading a GitHub repo. A companion tool emerged this week from Tencent as well. It's called InstantCharacter, and its focus is accuracy in reference-based generation. Say you have an image of a fictional character: InstantCharacter can place that same character, down to the facial structure, outfit details, and accessories, into a new scene. You can render them in a studio, playing piano, or walking through a snowstorm in full anime style. The model is based on Flux, one of the highest-fidelity open-source diffusion models available, and it uses LoRA adapters to style outputs in everything from Studio Ghibli to Makoto Shinkai's signature look. Unlike most existing character transfer models, this one keeps attributes consistent across varied scenes, and does so across 2D, 3D, and photorealistic styles.
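For a sense of what that Flux-plus-LoRA setup looks like in practice, here's a hedged sketch using Hugging Face diffusers. FluxPipeline and load_lora_weights are standard diffusers calls, but the LoRA repo name below is a placeholder, and InstantCharacter's own reference-image conditioning isn't shown here since its exact interface may differ.

```python
# Sketch of styling Flux output with a LoRA adapter via Hugging Face diffusers.
# The pipeline calls are standard diffusers usage; the LoRA repo id is a
# placeholder, and InstantCharacter's reference-image conditioning is omitted.

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Load a style adapter on top of the base model (placeholder repo id).
pipe.load_lora_weights("your-username/ghibli-style-flux-lora")

image = pipe(
    prompt="the same character playing piano in a snowstorm, anime style",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("character_scene.png")
```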
Why this matters isn't just a question for cosplay creators or fan artists. In a world increasingly dominated by visualizations and virtual characters, the ability to preserve identity across generated media becomes critical, especially as avatars, VTubers, and AI-generated influencers grow more complex and integrated into media ecosystems. Now there's a new project from NVIDIA called PartField, and it's focused on a very different kind of segmentation, this time in three dimensions.
It's a part segmentation model for 3D objects, capable of breaking complex meshes into individually labeled components. Think of a 3D model of a robot or a car: with PartField, each part, an arm, a leg, a wheel, a mirror, is isolated into its own labeled region, enabling texture swaps, physical simulations, or animations to be applied to each section independently.
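To picture what that kind of output enables, here's a tiny generic example of working with per-face part labels. It illustrates the idea only; it is not PartField's actual output format or API.

```python
# Generic illustration of per-part mesh labels (not PartField's real format):
# each mesh face gets a part label, so downstream code can target one part
# for a texture swap, a physics constraint, or an animation rig.

import numpy as np

face_labels = np.array([0, 0, 1, 1, 1, 2, 2, 0, 2, 1])   # one label per mesh face
part_names = {0: "body", 1: "wheel", 2: "mirror"}          # hypothetical parts

# Group face indices by part so each region can be processed independently.
for label, name in part_names.items():
    faces = np.where(face_labels == label)[0]
    print(f"{name}: faces {faces.tolist()}")
```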
This has obvious implications for anyone building physical simulations, robotics, or gaming assets. Compared to previous segmentation models, it not only performs better but is also much faster, completing tasks in a fraction of the time thanks to a more efficient tokenization and inference architecture. Now, one more thing, and it's a slightly surreal real-world scene: a half marathon for humanoid robots just took place in Beijing.
More than 20 companies from across China participated, entering bipedal robots that walked, jogged, and ran around a race track in full view of cheering spectators. Some entries were clunky, barely able to maintain balance, while others, like Unitree's G1 and the Beijing Humanoid Robot Innovation Center's Tiangong Ultra, managed smoother gaits and even completed longer runs. Footage of the event shows some robots falling or freezing mid-run, but others powering through the full distance. Tiangong Ultra in particular drew attention for its speed and stability, suggesting real progress in bipedal locomotion design. The event might sound like a novelty, but it reflects a changing reality: robotic mobility is evolving fast enough that humanoids may soon be deployed in public-facing or physical labor roles. Holding a marathon might look like a publicity stunt, but it's also a benchmark for these
robots: it tests endurance, adaptability, and real-world stability, traits that are notoriously difficult for machines to master. Hey, thank you so much for listening today. I really do appreciate your support. If you could take a second and hit the subscribe or follow button on whatever podcast platform you're listening on right now, I'd greatly appreciate it. It helps out the show tremendously, and you'll never miss an episode. Each episode is about 10 minutes or less to get you caught up quickly. And if you want to support the show even more, go to patreon.com/StageZero. Please take care of yourselves and each other, and I'll see you tomorrow.