A fully finished 23-second video intro, generated from a single /goal prompt. Wow. There was no step-by-step babysitting involved at all. It took one hour and 15 minutes to complete. Yeah, that is wild. And it didn't even max out the 1 million token context window. Right. It used 357,000 tokens to do it. This completely shifts how we have to look at open source capabilities today. Welcome to the deep dive. We are very glad you are here
with us. Absolutely. Today, our mission is to unpack a really rigorous hands-on guide. We are testing the open source GLM 5.2 model, and we're doing this directly inside Claude Code. We are going to walk through some actual real-world tests here. We pit GLM 5.2 against the heavyweight Opus 4.8. Right. We will outline a fascinating 80-20 workflow split for your daily tasks. We will also break down the surprisingly
simple, non-technical setup. Finally, we will reveal why this specific shift makes open source AI impossible to ignore. But to really understand why this matters, we have to look at the data. Yeah, what actually happened when these two models went head-to-head in the wild? That is the most important question. There were several highly practical tests run here. Right. We wanted to see if GLM 5.2 could genuinely compete on real work. Or did it just feel good
because it was the cheaper option? Well, the first test was a one-shot web design task. The exact same prompt was used for both models. They both had to build a fully functional landing page. GLM finished the job in 3 minutes and 59 seconds. Wow. And Opus took 14 minutes and 59 seconds to finish. That is an absolutely massive difference in time. And GLM was around five times cheaper to run. Yeah. But the really fascinating part to me is the actual quality. Exactly. These
were not clunky early 2000s wireframes. Right. Both results were highly polished and visually impressive. The pages had movement built into the design. They had properly structured sections and very clear calls to action. This was not some flimsy budget result you would have to rewrite anyway. No. It was a completely viable starting point for a real project. Looking at them side by side, you would have a hard time justifying
that massive price tag for Opus. When the final results are that close, the price starts to matter heavily. It really does. But we didn't stop at visual design. Next up was a hard coding test. So GLM crushes basic web layout, which is great. But visual spacing is one thing. Logical reasoning is another entirely. Absolutely. That naturally makes you wonder what happened when the work got less forgiving. A complex coding assignment was evaluated using Codex. Using Codex kept the evaluation
entirely neutral and fair. Exactly. And GLM 5.2 actually did very well here. It handled the bulk of the task perfectly. But Opus caught one very subtle edge case that GLM missed entirely. With tricky database values, it caught the subtle difference between true and 1, or 1 and 1.0. Oh, wow. If you have ever stared at a broken database at two in the morning, you know exactly why that matters. Yeah, bugs always live in those tiny details. It feels like a very
natural division of labor. GLM is kind of like the fast junior developer. It writes the bulk of the code very quickly. And Opus is the senior architect, catching the tricky database bugs before launch. That is a perfect analogy for how this works. GLM is fantastic for fast implementation and initial scaffolding. Right. And Opus is necessary for careful reasoning and tricky edge case handling. Exactly. I have to be honest here. I still wrestle with prompt drift myself. You know, you
give an AI a long list of instructions. And by step three, it completely forgets step one. So watching an open source model hold its focus over complex instructions is deeply impressive to me. It really is an incredible leap forward. Right. But there's a specific speed quirk we definitely should mention. Right. GLM is not always the faster model. Yeah. On a highly creative HTML task, GLM took 35 minutes. Opus finished
that exact same task in only 11 minutes. The pattern here seems to come down to reasoning. Exactly. The more reasoning the prompt requires, the slower GLM feels in practice. Execution-heavy tasks are incredibly fast. Planning and creative taste just take much longer. They both succeeded on the creative task, though. GLM built an interactive page called The Anatomy of Attention. Right. It featured moving background elements and token visuals. And Opus built The Life of a Death Star.
It was a beautifully structured timeline-style page. Both were excellent one-shot results. The final test here was deep research. This used a STORM-style workflow. For anyone unfamiliar, that just means multiple subagents work together using different personas. Right. They compiled a very rich, highly detailed HTML report. It combined different expert lenses and actively challenged its own assumptions. And GLM managed
this complex agent workflow beautifully. The significantly lower cost changes the math entirely for us here. Running multiple autonomous agents suddenly becomes justifiable for daily work. Yeah, you can test 10 different angles without burning through your cash. But wait, if GLM struggles with heavy reasoning, doesn't that make it too risky for professional work? Well, not necessarily. It is really about matching the model to the risk profile. You have to match the specific
tool to the specific task. Right. Scaffolding a basic website layout carries very low risk. Exactly. But migrating a production database carries extremely high risk. You always use the heavy reasoning model for the high-risk work. So you don't just compare final answers; you compare how much risk the task can tolerate. Precisely. Which leads us to a much broader mindset shift we need to discuss. Yeah. The real skill today is knowing exactly when to use GLM 5.2. AI work
is not a simple contest anymore. It is not about a "best model wins" mentality. No, it is a multi-step, highly iterative process. Most work involves researching, drafting, and continuous testing. Then you edit, make decisions, and finally ship the product. Each of those steps demands a completely different level of intelligence. Exactly. Let's break down the 80% work first. This is the natural domain of GLM 5.2. It easily handles first drafts and initial research. It does basic
web design and cleans up your messy notes. Right. It generates your initial options for a project. Cost matters immensely here because you are doing constant iterative testing. You are exploring different ideas. You don't want to pay top-tier pricing when you are just sketching a rough draft. Exactly. Then we have the remaining 10 to 20% of the work. This is the domain of Opus 4.8. This is the heavy thinking: final reasoning, edge case review, and high-risk coding tasks. The system context
is crucial here, too. The harness really matters. Right. For clarity, a harness is just a digital workspace where the AI can use tools. Claude Code provides that powerful harness for us. It lets the model read local files and actively run terminal commands. So GLM is not just a cheap Opus replacement. It is a cheaper worker operating inside the exact same system. Mm-hmm. But aren't we just complicating things by juggling multiple workers for a single project? It might seem that way
at first glance. But using one expensive model for basic sorting tasks burns cash needlessly. Right. Splitting the work optimizes both your budget and the applied brain power. You get significantly better efficiency across the board. Let the cheap model gather the lumber. Let the expensive model build the house. Exactly. It is about working much smarter within the environment you already use every day.
Hearing about those massive price differences makes me want to try this immediately. Yeah. But whenever we talk about open source routing, it usually involves spinning up servers or Docker containers. Right. Is this actually feasible for a normal user to set up? It absolutely is. You are not learning a brand new tool here. You are just routing the model call to a different location. You simply route it to z.ai instead of Anthropic's servers. Exactly. Step one is going
to the z.ai API console. You can actually test the model out right there first. They have 3D generation tools and mini games available. It gives you a great feel for how the model responds. Yeah. Then you choose your preferred billing method. You can pay per token or choose a set monthly plan. Those monthly plans run roughly $16, $64, or $144. There's some really great practical advice on this point. Keep your Claude plan active and just add z.ai on the side. Yeah,
you don't have to choose just one platform. You use both of them together. Next, you generate a secure API key inside the console. This brings us to editing a specific local file. It is called settings.local.json. You find this file inside the project's .claude folder on your machine. What is brilliant here is that you aren't installing heavy new software. Claude Code is already looking for a brain at a specific web address. Right. All you are doing in that settings file is changing
the address book. You point it away from Anthropic's servers. You route the base URL directly to z.ai instead. You leave the Anthropic API key completely blank, and you put your new z.ai key in as the auth token. Finally, you set the specific model name you want to use. It is surprisingly simple. But there is a genius trick we need to highlight here. You create two entirely separate folders on your local machine. Yeah, this is incredibly clever. One folder is named glm.
The other folder is named opus. You put the custom routing configuration file only inside the glm folder. Wait, so just by changing directories in your terminal, you instantly switch the brain powering your workspace, with zero friction. That is brilliant. You open Claude Code in the glm folder for your rough drafts. You open it in the opus folder for your final code reviews. Exactly. But I have to ask a security
question here. If I put my API key in a local JSON file, isn't there a risk I accidentally share it? Oh, there is absolutely a major risk there. That is why you must be extremely careful with this file. Right. Treat that API key exactly like a banking password. Keep it completely out of public repos and shared team screenshots. Never push your local settings file anywhere public. Exactly. Keep it secure and the entire workflow remains brilliant and
safe. Now, we really need to look at the bigger picture here. Why does GLM 5.2 make open source AI models impossible to ignore right now? Well, open source is finally practical for the daily professional workday. It is no longer just a theoretical weekend project for developers. The massive scale of this model is truly staggering to think about. GLM 5.2 has roughly 753 to 756 billion parameters. Yeah. Let's pause
on that number for a second. That is far too massive to run locally on a normal computer. You would need serious, incredibly expensive server hardware in your house. That is exactly why renting access through API providers is necessary. It is the pragmatic middle ground for users right now. You get all the massive power without the massive infrastructure headache. This brings us to the underlying cost of tokens. Tokens are just tiny puzzle pieces of text. It is kind of
like stacking Lego blocks of data. Exactly. The AI processes these tiny pieces to understand and generate language. When you look closely at the token pricing, the math is undeniable. Right. Opus costs $5 in and $25 out per 1 million tokens. GLM sits at roughly $1.40 in and $4.40 out. Whoa. I mean, imagine scaling to a billion queries without bankrupting yourself. That price gap, about 3.6 times on input and nearly 6 times on output, changes
everything about how we build software. It is especially critical for complex agent workflows. These autonomous models constantly read folders, revise their own work, and call subagents. They burn tokens invisibly in the background while you drink your coffee. Exactly. Significantly lower prices encourage you to experiment freely without watching the meter. There are also performance benchmarks to consider. GLM beats GPT-5.5 and Opus 4.8 on some specific software
benchmarks. Right, but there is a very crucial point about those numbers. Benchmarks are just a signal. They simply tell you a model is worth testing yourself. Daily usefulness on your own specific files is the only real test. Yeah, how does it handle your unique code and your messy personal notes? There is also a major strategic advantage we should point out here. It acts as a necessary hedge. Closed platforms change their rules and their pricing structures all the time.
A vital feature can suddenly move behind an expensive paywall tomorrow. Learning how to route open source models protects your daily workflow from those sudden shifts. But think about this dependency for a moment. If open source models are this huge, won't we always be dependent on cloud providers to run them anyway? For now, yes, that is the reality. API rental is the necessary middle ground today. Right. But it still fundamentally
breaks the monopoly. You aren't relying entirely on a single closed ecosystem's rules anymore. It's about having a backup plan when the closed platforms change course. Exactly. You maintain your professional options and your creative freedom. So let's bring all of these different pieces together for you. We are looking at a fundamental shift in how we approach knowledge work. The era of treating open source models as clunky
budget alternatives is officially over. The true modern skill isn't finding the one perfect AI to do everything. You are essentially becoming an orchestrator of multiple minds. Right. You have to know exactly when to deploy a fast, cheap worker. You use GLM 5.2 for the heavy, repetitive lifting and the rough drafting. And you must know exactly when to bring in the expensive perfectionist. You call on Opus 4.8 to close the deal and ensure total precision. It is a beautiful synergy when
you set it up correctly. We want you to look at your own daily tasks today. Identify the 80% work you are currently doing manually. Where are you overpaying for AI compute right now? Where could a fast, highly capable open source model handle the drafting and sorting for you? Finding that exact balance will completely change your daily productivity. Thank you for joining us on this deep dive today. We always appreciate you spending your valuable time with us. It has
been a truly fantastic exploration today. We want to leave you with one final provocative thought to mull over. If an open source model is already handling 80% of the workflow today at a fraction of the cost, what happens to the value of closed models when open source naturally creeps up to cover 90 or 95%? Does intelligence become essentially free?
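For listeners who want to script the settings.local.json routing described in the setup segment, here is a minimal Python sketch under stated assumptions. It writes the file into a project's .claude folder and uses Claude Code's `env` settings block to set a base URL, put the z.ai key in as the auth token, leave the Anthropic API key blank, and pick a model name, mirroring the four steps from the episode. The endpoint URL and the model id `glm-5.2` are assumptions, not values given in the episode, so check z.ai's documentation for the real ones; the helper names `settings_path`, `build_settings`, and `write_settings` are hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path

# ASSUMPTION: both values below are placeholders, not taken from the episode.
# Confirm the real endpoint and model id in z.ai's documentation.
ZAI_BASE_URL = "https://api.z.ai/api/anthropic"  # assumed Anthropic-compatible endpoint
ZAI_MODEL = "glm-5.2"  # hypothetical model id


def settings_path(project_dir: str | Path) -> Path:
    """Per-project local settings file inside the project's .claude folder."""
    return Path(project_dir) / ".claude" / "settings.local.json"


def build_settings(api_key: str, base_url: str = ZAI_BASE_URL, model: str = ZAI_MODEL) -> dict:
    """Route Claude Code to z.ai by setting environment variables in the `env` block."""
    return {
        "env": {
            "ANTHROPIC_BASE_URL": base_url,   # change the "address book" to z.ai
            "ANTHROPIC_AUTH_TOKEN": api_key,  # the z.ai key goes in as the auth token
            "ANTHROPIC_API_KEY": "",          # Anthropic API key left blank
            "ANTHROPIC_MODEL": model,         # which model to call
        }
    }


def write_settings(project_dir: str | Path, api_key: str) -> Path:
    """Create (or overwrite) the routing file for one project folder, e.g. glm."""
    path = settings_path(project_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(build_settings(api_key), indent=2) + "\n")
    return path
```

Calling `write_settings("glm", key)` only for the glm folder, and leaving the opus folder untouched, reproduces the two-folder trick: Claude Code launched from glm picks up the z.ai routing, while the opus folder keeps the default Anthropic setup. Read the key from an environment variable rather than typing it into the script, and keep the generated file out of any public repository.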
