#11 Neil: How to Fully Control Your AI - A Guide to 8 Key Settings | AI Fire Daily podcast

00:00

OK, so have you ever found yourself building an AI agent, maybe for your business, maybe for content, or perhaps like a customer chatbot, and you just realize its responses? They aren't quite hitting the mark. Maybe it repeats itself or gets a bit stuck. Doesn't always follow instructions. It's super common, honestly. Almost like you've got this incredibly powerful engine, but you just can't seem to get it out of second gear.

00:24

It's frustrating. And what a lot of us are discovering, sometimes much to our surprise, is the real fix isn't always about writing some super complex prompt or even shelling out for a bigger, pricier model. Often it's actually tucked away in these powerful but pretty simple settings that, well, many people just don't know about or maybe overlook. So today we're doing a deep dive. And this is really custom tailored to help you unlock that

00:47

true potential hiding in your AI agents. We're going to demystify eight crucial, let's call them knobs and dials that can seriously boost your AI's performance, its reliability, and frankly, how smart it seems. Okay, let's get into it. It's so true. It really is Yeah, what often gets missed is that real control, you know genuine control over an AI agent It goes way beyond just what you type in the prompt box. It's about understanding those Fundamental mechanics that shape the output.

01:15

Yeah, essentially gives you the power to sculpt its behavior with well surprising precision, moving from guesswork to really granular control. Exactly. When we first start with AI, yeah, all our focus goes straight to the prompt. Or maybe we spend ages debating, you know, should I use GPT -4 or Claude 3? And look, those things are absolutely important. Don't get me wrong. But there's this deeper layer, this really powerful control layer working behind the scenes. These

01:39

are what we call model parameters. And these are the settings that genuinely shape how an AI model generates its entire response. It really does remind me of like stepping beyond the auto mode on on a fancy camera. For quick snaps, auto is fine. But a real photographer, they know that to get that perfect shot, that nuanced image, they have to manually adjust things like aperture, ISO, shutter speed. It's the same idea here. These AI settings let you fine tune everything.

02:06

Creativity, randomness, even the length of the response or what topics it focuses on. And this is where it gets really interesting. You gain that sort of artistic control. Right. And the great thing is how standardized these parameters mostly are. So whether you're hitting an API directly or maybe using an aggregator like OpenRadar or even building workflows and tools like, say,

02:26

N8n, these skills are totally transferable. What you learn here, you can pretty much apply it across all the major AI models and platforms out there. All right, let's jump right into these essential settings, the ones that are really going to transform your AI agents. First one up, frequency penalty. Have you ever noticed your AI agent just repeating the same words or phrases? making the output sound a bit, well, like a broken record, or just unnatural. Yep,

02:50

seen that many times. That's exactly what frequency penalty tackles. It basically discourages the AI from reusing words and phrases it's already put out in that specific response. So you get much more diverse, less robotic -sounding text. The scale usually goes from about negative 2 .0 to 2 .0. Positive values, say somewhere between plus 0 .5 and plus 1 .5, they penalize that repetition, pushing the AI to find different ways of saying things. But here's a kind of interesting twist.

03:16

Negative values, like negative 0 .5 down to negative 1 .5, they actually encourage repetition. Now that seems counterintuitive, right? Encouraging repetition. It does at first, Claire. Can you give us a scenario where you'd actually want the AI to be repetitive? Why would that be crucial? That's an excellent question because it really highlights how different outputs have different

03:32

needs, you know. So, while you absolutely want that linguistic diversity for say, marketing copy or a blog post, keep things fresh and engaging. Imagine you're generating something like medical disclaimers, or maybe legal terms and conditions. In those cases, exact, consistent wording is absolutely critical. You want the AI to repeat key phrases precisely to assure accuracy, compliance, clarity. So yeah, a slightly negative frequency penalty could actually be incredibly useful there.

04:02

Keeps it consistent. Ah, okay, that makes perfect sense. Consistency versus creativity. So if frequency penalty handles the quality or diversity of the words, what of the quantity? And, maybe more importantly for many people, the cost. That brings us neatly to max tokens. Now, a token is basically a piece of a word. Roughly speaking, about 100 tokens is like 75 words. This setting puts a hard stop, a limit on how long the AI's response

04:26

can actually be. And believe me, this is a critical knob for controlling both your output and your API costs. Absolutely crucial for costs. Right. By default, or if you set it to a mass one, the AI will just keep generating text up to its model's maximum limit, which can be huge, thousands of tokens sometimes. But if you give it a positive number, like say 150, the AI just stops dead

04:46

once it hits that 150 token mark. This is invaluable for things where you need short, concise outputs, like headlines, tweets, SMS messages, maybe little bits of text for user interfaces. I definitely remember an early project where I forgot to set this. The AI tried to write a five -page essay when all I needed was a tweet. Oh dear. Yeah, my API bill that month. Definitely a memorable lesson learned. So beyond just fitting content, what's the bigger picture, the so what, of managing

05:11

max tokens from, say, an operational view? Exactly. And the real sting often isn't just the per token charge itself, but the cumulative waste over time. I've literally seen teams unknowingly generate gigabytes, gigabytes of unnecessary text and background jobs or automated workflows. It turns what seems like a cheap API call into a really shocking monthly bill when you add it all up. So Max Tokens isn't just about fitting content neatly into boxes. It's a direct way to stop

05:40

hidden costs from spiraling. It ensures your AI budget is spent efficiently, not just on Well, extra words nobody needed. That's such a crucial point. Those hidden costs can really, really sneak up on you. OK, but what if you don't just want the text itself? What if you need the AI's output to actually do something in your workflow? That's where response format becomes an absolute game changer. This setting basically forces the AI to output its response in a specific structured

06:05

format. Most often, that's JSON instead of just plain old text. Huge difference. Yeah. By default, the AI usually just gives you raw text. But if you flip the setting to JSON, the AI is then constrained. It has to generate a valid JSON object. Now the key thing here is you still need to clearly describe the JSON structure you want within your prompt. You have to tell it what the JSON should look like. This is so vital when you're passing that AI output along to another

06:30

application or another API, right? Or when you need to avoid messy text parsing. Or maybe you're extracting specific bits of info like names, dates, sentiment scores. Imagine analyzing customer feedback, for instance. You could prompt the AI. Give me the sentiment. the key issues, and a suggested action item, all neatly formatted in JSON, ready to go straight into your CRM or project tool. You've absolutely nailed the core

06:52

value there. The importance of structured data, especially for building robust AI workflows, it just can't be overstated. When you're integrating AI into a bigger system, predictability in that output format is everything. JSON ensures your downstream apps get clean, easily parsable data. It prevents errors cascading down the line, and it significantly boosts the reliability and, frankly, the scalability of your whole automation.

07:17

It's really the difference between getting a messy text file you have to somehow decode and getting a perfectly organized database entry ready to use. Fascinating how just a simple format change can ripple through the whole system like that. Okay, let's move on to presence penalty. Now, while frequency penalty discourages repeating words, presence penalty discourages repeating

07:36

poppics. It's all about encouraging the AI to bring in genuinely new concepts and ideas, stopping it from getting stuck on just one track, you know, going around in circles conversationally. Keeping the conversation moving forward. Exactly. It uses the same Neginesh 2 .0 to 2 .0 scale. positive values, maybe plus 0 .5 to plus 1 .5, strongly encourage the AI to introduce new topics. This is fantastic for things like brainstorming, like maybe you want diverse suggestions for a

08:01

travel itinerary. You want it to cover historical sites, restaurants, and outdoor activities, making sure it hits different types of suggestions. Conversely, negative values, say negative 0 .5 down to 1 .5, make the AI stay really focused on the topics already on the table, drilling

08:15

down deeper into those. So imagine brainstorming marketing angles for a new app, a high presence Penalty ensures it suggests ideas across various channels, you know, social media, content marketing, maybe influencer outreach, instead of just giving you 10 slight variations of a Facebook ad concept. How does this setting really drive that broader ideation compared to just, say, varying the vocabulary

08:35

like frequency penalty does? Well, it's a fundamental shift in how the AI generates the content, not just what words it uses. While frequency penalty ensures, let's say, linguistic variety within a given topic, presence penalty literally pushes the AI to explore new conceptual ground. It stops the model from just endlessly rephrasing or slightly expanding on themes it's already mentioned. It forces it to branch out, to think differently.

09:01

So for creative tasks or maybe exploratory queries where you want novel ideas, this means the AI doesn't just give you more of the same slightly rewarded, it genuinely offers up new avenues of thought. It's really the difference between getting, say, 10 variations on one theme versus getting 10 genuinely different themes to consider. Right, different themes altogether. And now for one that often feels a bit like magic, the one that truly seems to change the AI's personality,

09:25

temperature. This is arguably the most important setting for shaping the creative output, the feel of your AI. It directly adjusts the randomness and creativity in the response. It's like tuning the AI's personality dial. The creativity dial, I like that. Yeah. The scale usually goes from zero up to one, though sometimes you see it go up to two, depending on the model provider. A low temperature, maybe down between 0 .1 and 0 .3, makes the AI extremely predictable. Very

09:51

deterministic, highly focused. Think of it as your super accurate technical writer. This is best for tasks needing precision, consistency, maybe extracting data or generating code where you don't want surprises. Right, factual stuff. Exactly. Then you've got a medium temperature,

10:06

maybe 0 .4 to 0 .7. a nice balance. It's great for general purpose tasks, drafting emails, writing reports, a standard chatbot response, but if you want something really creative, surprising, maybe even a little bit risky, you crank that temperature up. High setting, like 0 .8 to 1 .0. This is your brainstorming artist mode. Fantastic for marketing slogans, creative stories, maybe even artistic content generation. Simple rule of thumb, if your responses are too boring, turn

10:32

the temperature up. If they're too random or nonsensical, turn it down. A good starting point for general used is often around 0 .7. What's so fascinating about how this setting, temperature, directly impacts the feel the character of the AI's output. It's just captivating because temperature doesn't merely change the words on the page. It fundamentally changes the perceived character of the AI interacting with you. A well -tuned temperature setting can genuinely transform an

10:59

AI. It can go from feeling like this rigid factual database spitting out information to feeling like a dynamic, even insightful creative partner. It directly influences that perceived intelligence and usefulness, especially for tasks that demand originality or a fresh perspective. It dictates the overall flavor, the surprise factor of the output. which makes it one of the most powerful and actually quite intuitive ways to direct the AI's creative potential. Character, that's a

11:26

perfect word for it. Okay, now let's switch gears a bit and talk about something more operational. Timeout. This setting simply determines how long your system will wait for the AI to give a response before it just gives up and throws an error message. It's usually measured in milliseconds. Important for user experience. Hugely. The default can be quite generous, actually, often something like 360 ,000 milliseconds, which is a full six

11:49

minutes. That's designed to handle even really complex requests that take a lot of compute time. But you can set a custom value, like maybe 30 ,000 milliseconds for 30 seconds. You definitely want a lower timeout, maybe somewhere in the 20 ,000 to 60 ,000 millisecond range, that's 20 to 60 seconds, for things like real -time chatbots. Or any user -facing app or making someone wait ages would just be a terrible experience.

12:11

Absolutely. For background processes, maybe generating long reports or doing very complex data analysis, you might set a much higher timeout, maybe 600 ,000 milliseconds, 10 minutes, or even more. Give it plenty of time to finish its work. So practical guidelines often look like... Live chat bots, maybe 15, 30 seconds max. Content generation, perhaps three, five minutes. Complex data analysis, you might allow 10 minutes or

12:34

more. How do we effectively balance those user expectations for speed with the reality of the computational work happening behind the scenes? Well, that's the core challenge this setting helps you manage, isn't it? For an application where an end user is waiting, immediate feedback, or at least fast feedback. is critical. Users generally prefer getting a quick error message saying, something went wrong, try again, rather than just staring at a spinning yelp for minutes

12:57

on end. So, in that case, you prioritize a lower timeout. You protect the user experience above all else. But for a backend process, something that can run asynchronously, meaning it doesn't need to reply instantly, allowing more time ensures those complex computations can actually complete successfully. There, you're prioritizing reliability and getting the full, comprehensive result, even if it takes longer. It really is all about the context of where and how the AI is being used.

13:23

Context is king, definitely. Okay, but what happens when things inevitably do go wrong? you know, temporary glitches. That's where max retries comes into play. This setting controls how many times your workflow will automatically try again if a request fails. This is super useful for handling transient issues like a brief network hiccup or maybe the AI model itself is just momentarily overloaded and can't respond right away. Handling

13:47

flakes, basically. Exactly. By default, it might be set to something reasonable, like two retries. You could set it to zero. if you want your system to just give up immediately on the very first failure. That can actually be useful during development or testing when you want to see errors quickly. Or you can crank it up, maybe set it to five, to significantly increase its persistence, its determination to get through. Generally, you'd set it lower when you prefer fast failure for

14:13

debugging. But for critical tasks like, say, processing a customer's order or if you know you're dealing with an API that's occasionally unreliable or flaky, setting it higher, maybe three to five retries can be a real lifesaver. It ensures that crucial task eventually gets completed. But this brings up a really important point, a caution perhaps, about balancing that reliability against the cost implications. While yes, more retries absolutely increase the chances

14:39

of a critical task succeeding eventually. You have to be mindful. If the API call you're making costs money each time you make it, setting max retries to five means you could potentially be charged up to six times for a single problematic request. That's the initial attempt plus those five retries. It can multiply your costs very quickly if the underlying issue persists and you're not careful. Yeah, that's a huge hidden cost multiplier many people probably don't think

15:04

about initially. Good warning. Okay, last one on our list, top P. This is basically an alternative method to temperature for controlling the randomness, the unpredictability of your AI's output. It's sometimes called nuclear sampling. Instead of broadly adjusting the creativity feel, like temperature does, TopP focuses specifically on selecting from the most probable set of next possible words. Yeah, a different way to slice the probability pie. Exactly. It runs on a scale, typically 0

15:29

.1 up to 1 .0. A setting of 1 .0 means the AI considers all possible words in its vocabulary when deciding what comes next. A setting like 0 .5, though, means it only considers the smallest group of the most likely words whose combined probability adds up to 50%. And a really low top P, like 0 .1, means it's only looking at the top 10 % most probable words. This leads to very safe, very predictable, often quite conservative

15:55

text. Now, a key pro tip you hear a lot. Most experts strongly recommend using other temperature or top P, but generally not both at the same time. They can kind of interfere with each other and produce weird results if you try to tune both simultaneously. And get one lane. Right. So when would someone actually choose to use top P instead of the more common temperature setting for their AI agent? That's a great question because they both touch on randomness, but in

16:18

slightly different ways. Generally speaking, temperature is more intuitive for most people. It controls that overall feel. of creativity, the vibe, and it's usually the first thing people reach for. However, top P gives you more direct, maybe more fine grained control over the uniqueness or the range of the vocabulary the AI chooses

16:37

from. So if you're generating text where you want a certain degree of creativity, but you absolutely need to prevent the AI from suddenly throwing in very strange or completely out of left field words. Think maybe technical prose that needs to be precise, but still varied and not robotic. Or maybe certain types of formal writing. In those cases, TopP can be incredibly

16:56

useful. It essentially ensures the AI stays within a statistically safer, more coherent set of word choices, even when you're pushing for a bit more variety than zero temperature. OK, fascinating. So we've unpacked these eight really powerful settings. Now the big question is... How do we actually use them effectively in practice? We've put together a simple, but we think pretty powerful four -step optimization workflow for you to follow.

17:19

Step one, define the problem. You need to be super specific about what's actually going wrong with your AI's output. Is it too repetitive? Okay, you know to like a frequency penalty. Is it too random and chaotic or maybe just too boring and predictable? That points towards temperature. Is it too slow? Check the timeout setting. Are the responses too long or maybe too short? That's max tokens. Do you need structured data? Use response format, probably JSON. Pinpoint the

17:43

issue first. Then step two, change one setting at a time. This is absolutely critical and it's honestly where a lot of people stumble. Resist that temptation to just go in and tweak five different knobs at once. You want to approach this scientifically. Make a small, deliberate adjustment to one single setting, and then test the result thoroughly. This is the only reliable way to figure out what specific change actually

18:06

worked and why it worked. If you change five things, you'll have no clue which one really made the difference, positive or negative. Isolate the variable. Exactly. Which leads perfectly into step three. test with realistic scenarios. Don't just ask your AI to, you know, tell me a joke or write a poem about cats. Use the actual kind of prompts and the real world data you expect it to handle in its final application. Feed it real customer emails if that's what it's for.

18:32

Give it the specific tasks it needs to do. Use the data your business actually works with. This makes sure your adjustments are genuinely relevant and actually impactful for your specific use case, not just some abstract test. And finally, step four. Document and create profiles. This sounds simple, but it's so important. Keep a record, maybe just a simple spreadsheet or notes, of what settings work well for different tasks

18:56

and different desired outcomes. Over time, you'll find yourself building up these specific profiles. Maybe you have a creative content profile with high temperature and high presence penalty settings saved, or perhaps a professional summary profile with low temperature, maybe a slight negative frequency penalty, and a specific max tokens limit for conciseness. So what does this mean for your long -term efficiency? It means you build this reusable library of configurations

19:19

that you know work well. It saves immense time, effort, and probably even costs on future AI projects because you're not starting from scratch every single time. And there you have it. We've kind of journeyed from that initial frustration maybe of unpredictable AI agents all the way to understanding how you can wield this precise, fine -tuned control over their output. Just remember, optimization. It's an iterative process, right?

19:42

It takes a bit of trial and error. Our advice, start with the most impactful settings first, usually temperature and max tokens. They often solve the biggest, most common issues right out of the gate. And then from there, you can layer in adjustments to the other parameters like frequency penalty or response format to really refine that

19:58

output for your specific needs. Yeah. And if we connect this back to the bigger picture for a second, the future of working with AI, it really isn't just about having access to ever more powerful models. It's increasingly about skillfully directing

20:12

them. With these settings now hopefully clearer and more accessible in your toolkit, you're really equipped to build that next generation of intelligent, effective AI agents, whether you're building, you know, a sophisticated business automation or maybe a groundbreaking content engine or even just a more responsive personal AI assistant, which kind of raises an important question for you, the listener. How will you apply these insights to your own projects? What are you going to build

20:38

or improve first? Yeah, and maybe think about this as we wrap up. Mastering these seemingly small knobs and dials, it doesn't just improve the individual outputs of your AI. It can fundamentally change how you approach problem solving and innovation using intelligent systems really in any field. What new possibilities really open up for you when you realize you have this level of precise control over your AI?

Transcript source: Provided by creator in RSS feed: download file

#11 Neil: How to Fully Control Your AI - A Guide to 8 Key Settings

Episode description

Transcript