Building Agentic AI Workflows with Matthew Henage - JSJ 678 | JavaScript Jabber podcast

Speaker 1

00:05

Hey everybody, welcome back to another episode of JavaScript Jabber. This week on our panel, we have Steve Edwards.

Speaker 2

00:12

Yo yo yo, to imitate AJ, coming at you from a cloudy but warming up Portland area.

Speaker 1

00:20

I'm Charles Max Wood from Top End Devs, and this week we're talking to Matthew Henage. Now, Matthew, you live near me, as far as I remember. We were introduced by my neighbor and we had a good talk about AI and I thought, hey, let's have you on the show and dive into this stuff. So you want to let people know what else they ought to know about you and then we can roll from there. Sure.

Speaker 3

00:46

Yeah, it's great to be on the show. So my name is Matthew Henage. I've been a full stack developer professionally for the last twenty years, but doing it since

Speaker 1

00:57

I was a kid so about thirty years ago.

Speaker 3

01:01

Always love jumping into new technologies, and so I've been working on a project that I started working on called WAOS.ai about three years ago, and then ChatGPT 3.5 came out about two years ago and it just blew my mind. Saw a lot of huge utility from it and got super involved with starting programming using AI. So, yeah, I'm also from Lehi, Utah. So yeah, it's great to be on the show. Thanks for having me.

Speaker 1

01:39

Yeah, thanks for coming. Yeah. So when he says Lehi, Utah, that's also where I'm at, I'm in the same town. So anyway, yeah, you were showing me when we talked before about your system. What is it, WAOS? I don't even know what that stands for.

Speaker 3

01:56

Yeah, it stands for Web App Operating System. It's a way of using low-code tools to build web apps, and then you can create AI workflows that basically control the web app in real time.

Speaker 1

02:12

Yeah. It kind of reminded me a little bit of, what's it called, Zapier, except you put prompts in instead of connecting to APIs for different products. That's a good way to go. Yeah. So I guess just to dive in, because at all the coding meetups that I go to anymore, everybody's talking about AI. They're excited about it, they're excited about what you can do with it. A lot of people are, you know, diving into different aspects, you know, whether it's generating text or images or videos

02:45

or anything like that. I'm a little curious to just I guess kind of get the state of the art as far as you see it of AI and how people might or might not be using it.

Speaker 3

02:59

Sure, yeah. So it's kind of interesting because AI as a concept has been around for a very long time, and what a lot of people are now referring to with AI is more like generative AI, so using different types of architectures like autoregression, which is kind of what LLMs use, like ChatGPT or Claude, and then you have other kinds of generative AI like diffusion models, which you might see something like Midjourney kind of use there, and other kinds like GANs and.

Speaker 1

03:39

Type so.

Speaker 3

03:42

Those are kind of some of the more popular ones. And so it's kind of interesting, because when I first started getting into AI with LLMs, with like ChatGPT 3.5, one of the things I kind of thought about using it for is kind of like a universal API where you can ask it to do anything and then have it basically come up with the answer. Kind of found out pretty quickly that that's not really a general use case. It can't

04:14

do everything. It has advantages and disadvantages. It's kind of interesting. There are a lot of businesses that are kind of just a wrapper over top of something like OpenAI, and a lot of times you can just use a chat-like app and really get a lot of the use out of

04:36

it that way. Some of the advantages of some of those kind of applications are they put a lot of thought and effort into the prompt engineering that's basically a way to explain to an AI like how it should behave and what rules it should follow, and so that there's a lot of utility out of that. But I think one of the big things that we're seeing a lot of move towards is AI agentic workflows. And there's

05:09

different kind of names with that. You might hear like AI swarms, like agent swarms or agent teams, and where the huge benefit from that is instead of using like one agent, like if you have an AI agent. There's a lot of different definitions for that, but one way I like to look at it is is an agent is something that you can give a prompt to and you

05:34

can get some kind of expected response from. And so if you're using something like ChatGPT, you can give it like a task or ask it a question and it will give you a response. That's like one agent, and it could be more general. An agent swarm or team or like a workflow is a way to be able to have multiple different agents work together to solve

06:02

a problem or perform some kind of task. And so there's a lot of advantage to that, just like having maybe one person in a company that has to wear all the different hats, versus having a team of people working in a company to perform some kind of task. So you have different people that are specialized at different kinds of roles, and each one of them does really well in what they do, and it comes together as

06:30

a collective to provide the most value. You can do the same thing with AI agent swarms or teams or workflows, whatever you want to call that. An example of that would be a project I created completely within WAOS, something called SpeakMagic, and you can check out the examples of that at speakmagic.ai. And the idea is you can give one prompt and then have an agent swarm of like forty two different agents that basically take a story input, like you just

07:08

give an idea. Maybe you have ideas for a character or something like that, and it'll create up to a two minute video of these different agents working together to create like a scene. And so it'll take the story prompt, it turns that story prompt into scenes. It turns those

07:28

scenes into scripts. It turns those scripts into shots, and then you have like different characters that are basically acting out the scene, and it animates the characters like speaking to each other, and they could add like sound effects and music and different things like that to create it. So there's a lot of advantages of that. And so it's kind of like where I see things are going, kind of the state of.

Speaker 1

07:53

Yeah, yeah, it's it's interesting because I talk to different people and it seems like everybody doing something different with AI. So you've got kind of like you were saying, where you've got people who are you know, they're trying to make a story and then they're maybe they're making it

08:10

into a story with the video. Right, most of the people that I've been talking to and working with, they're using it kind of you were talking about agentic AI and the you know, kind of building out your team to do different things, and that's kind of where my interest has been, except it's more of a chat agent and less of a voice or a video agent. And so yeah, you give it a prompt and then you also give it a set of functions that allow it to do things, and then it can go off and

08:38

it can do the things. But what I'm finding most people do with that is, like you said, they have

08:44

a team of agents. And so you may have an agent that is kind of the coordinator or support agent, and then you know, it can go and talk to the scheduler agent to get stuff on the calendar, and go talk to the technical agent to you know, get more specialized technical feedback, or it can go talk to you know, another agent that has access to different other APIs to do other things, and and so that that's

09:09

kind of the deal. And so then you get into using something like ModelFusion for JavaScript, or I've been doing a lot with Raix (R-A-I-X) in Ruby to do a lot of this stuff. And you just write a tool, and a tool is essentially that set of functions, and you make stuff run. And so yeah, one of the tools might be, here's the video generation API, and you know, it also uses an AI to do its work. And so anyway, it's

09:35

really fascinating kind of see where it's all going. At the last code meet up, I went to one of the guys there had actually been using it to write fiction, and so you know, he'd use it to flesh out parts of the story or actually write parts of the story, and you know, he's like, yeah, sometimes it's really good and sometimes it's really not. It's been, it's been. It's

09:59

interesting to see how far it goes. What I tend to find is that a lot of people want like the one-off prompt, where you just write the prompt and you immediately get back the feedback that you want. What I found is that in a lot of cases you have to refine it with the AI. See, what you wind up doing is you wind up saying, oh, I forgot to tell you this, or oh, you know, I need

10:26

this scheduled, you know, every Thursday, and then it's, okay, well, when I said every Thursday, I actually meant, you know, except holidays and this and that and the other, you know. Or if I'm trying to get it to write code. So one example of this I was using lately. I've been playing with Grok, but this one was on ChatGPT, and, you know, I said, hey, I

10:48

need an audio player for my website. By the way, if you go to topendevs.com, if you go to javascriptjabber.com, the player at the bottom of the page is the player that AI mostly wrote. And I said, hey, I need an audio

11:02

player for the website. You know, I need this, I need that, I need this, right? And so it's like, I want a volume, you know, I want to be able to change the volume, and I want the progress bar to go, and there's a state of the art thing for podcasts where you tell it to not load the audio until it's actually clicked. So until you hit play

11:23

or download, right, it doesn't eager load it. And the reason is because the metric du jour for a long time with podcasts is downloads, and so you can actually pad your numbers by forgetting to tell it not to download every time it loads the page, right? But anyway, so I told it, and then I actually asked it what other features should I put in? Right? And so anyway, we're seeing this kind of thing with a lot of

11:55

different people. So some people are willing to go in and use the web interface on something like this, or there's, what is it, Open WebUI. There's a web interface that you can run on Ollama on your own machine to do a lot of these things. And so anyway, there are a lot of options for this stuff, and some people have refined it so that it will automatically use tools to go and make web searches and stuff

12:19

like that. Or if you use Grok, it'll tell you that it's thinking, and you can see that it's loading in different pages and you can ask it for its sources. But yeah, it's been fascinating to just see where all of this goes. And then of course there are the specialized uses like Cursor AI and things like that for programmers, or, you know, other AI systems for other folks, and so yeah, the sky's kind of the limit for it, I think.
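The "prompt plus a set of functions" setup described in this turn can be sketched in a few lines. This is only an illustration under stated assumptions: the tool names `add_to_calendar` and `lookup_docs`, the JSON shape of the tool call, and the hand-written example call are all hypothetical, and no real model or library (ModelFusion, Raix, or otherwise) is involved.

```python
import json
from typing import Any, Callable, Dict


def add_to_calendar(title: str, day: str) -> str:
    """Hypothetical scheduler tool."""
    return f"Added '{title}' on {day}"


def lookup_docs(topic: str) -> str:
    """Hypothetical technical-support tool."""
    return f"Docs summary for {topic}"


# The "set of functions" the agent is allowed to use, keyed by name.
TOOLS: Dict[str, Callable[..., str]] = {
    "add_to_calendar": add_to_calendar,
    "lookup_docs": lookup_docs,
}


def run_tool_call(raw_call: str) -> str:
    """Execute one tool call that a model emitted as a JSON string."""
    call: Dict[str, Any] = json.loads(raw_call)
    name = call["name"]
    if name not in TOOLS:
        return f"Unknown tool: {name}"
    return TOOLS[name](**call.get("arguments", {}))


# Hand-written stand-in for what a model might return; not real model output.
example_call = (
    '{"name": "add_to_calendar", '
    '"arguments": {"title": "Standup", "day": "Thursday"}}'
)
print(run_tool_call(example_call))
```

In a real system the JSON string would come from the model's tool-calling response, and the coordinator agent would feed the returned text back into the conversation.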

Speaker 3

12:45

Yeah, it's such a broad range of different things that AI can do. We're talking about using things like Cursor. There's also other tools that are more like coding agents, so like Claude Code. I don't know if you've seen Manus, it kind of hit the scene a week or two ago. That one's been pretty amazing, what it can accomplish.

13:11

But typically, for most of my coding, I just open up a chat and I kind of explain a specific feature that I want and then kind of treat it like a junior dev in a way. I think that's kind of one of the best ways to treat it. Kind of give it all the different use cases, what I expect going into it, and what should be coming out of the code, and why. And then of course you need to check the code to make sure that it's going

13:49

to do what you want. It's kind of interesting because I think we're going to see more and more where AI can do more and more coding. Some of these things like Cursor are getting better, but typically the larger your code base and the more complicated things get, it kind of runs out of context window and starts kind of not performing quite as well. So usually smaller projects are really great. And that's changing though, so, which is, well, been all fun.

Speaker 2

14:26

You know, it's interesting. I was listening to a podcast this morning.

14:28

They were talking about the topic of vibe coding, which is the idea of using stuff like, yeah, and so it was Syntax.fm. The general definition is someone who wants some simple app for some business purpose, or the examples that were given were like, I'm trying to build a little game with my kid and doing it the regular way takes forever and a day, or a way to do a quick demo of something to see how something would look, or things like that,

14:55

and then you're not it's not something you're going to be using, you know, long term professionally, you know, deploy it and reuse as sort of a one off type of thing, and so, you know, usually with this kind of stuff, the code quality, from what I've understood, is

15:13

how do we say, less than desirable? But the idea is that you can spin up something really quick, whether it's just, you know, to see how something would look, or to give you an idea of, you know, how something could work, and then you could, you know, tweak it from there or do something from there. But yeah, like I said, vibe coding is apparently one of the new terms all the cool kids are using.

Speaker 1

15:36

Yeah. I'm not sure where the line is though, between vibe coding and actually just having the AI help you, right? Because, yeah, at the end of the day, like in my example with the player, right, I wound up having to edit it. Probably seventy five percent of the code that the AI wrote I used, but the rest of it, I mean, I had to use my own expertise to make it do what I wanted it to do and look how I wanted it to, right? It didn't give me exactly what I wanted. So yeah, I don't know.

Speaker 2

16:02

So Matthew, let me ask you this question. You know, when it comes to this auto generation coding, my question that I've always had, that I've never had answered because I've never looked into it, is what languages and tools are being used? I mean, how do you determine that? Like, you know, my focus of development, I tend to focus on Laravel and Vue and Tailwind and Inertia.

Speaker 1

16:24

And some other tools like that.

Speaker 2

16:26

And so if I'm going to tell AI to build me, you know, some little to do app, you know, to beat that one to death or even something more complex, do I say, Okay, I want you to do it with this framework and these tools in this language, or do you just say build me an app, and it builds one for you out of what it determines to be best. How does it determine that? I mean, what's what is the structure that is used to generate uh, these apps that AI is building for you.

Speaker 3

16:53

Sure. So I think there's a lot of different ways to look at this. One is, I mean, if you're coding it yourself, you're gonna want to stick with the things that you know to some degree, so that you can edit and kind of understand why things are going wrong when they do go wrong, so that you can fix things. And so typically, when it comes to programming using AI, like models and tools and things, there's a lot of things that are geared a little

17:25

bit more towards Python. It's kind of the whole, the whole kind of thing with that, and so you have a lot of tools and stuff that are kind of more geared towards Python. But if you're having it generate code, typically, I mean, how LLMs work is you're pulling information from a whole bunch of data from like the Internet, and so the more data there is for a particular technology, the more likely it can perform a

17:57

bit better. And so things like JavaScript. One of the nice things a lot of times when I code and have AI help out with the code is using something like ChatGPT's artifact or canvas and being able to have it generate code, but it can actually render that code for you, which I don't think it can really render anything else besides really JavaScript right now. And so that's really nice to build something like that in

18:26

that kind of case. But when it comes to the different technologies, I mean, it really comes down to, I mean, I like sticking with kind of the languages that I use, React, Node in the back end, and so using kind of the technologies where you can help steer it in the right way or connect it to your existing app, I think is kind of more

18:53

the way to go. I can see right now, if you're going to go with more obscure kinds of languages or technologies, you're more likely to not get as much support from AI helping out. One way around that is getting the context for those technologies. Like, say an LLM just doesn't really have much of a knowledge base on that, going into the documentation for that and giving the relevant pieces of the documentation to the AI to help you with that.

19:30

So it's kind of a way to do RAG as well.
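A minimal sketch of "giving the relevant pieces of the documentation to the AI", assuming the documentation has already been split into short sections. The word-overlap scoring here is a deliberately crude stand-in for whatever retrieval a real setup would use (embeddings, search, or hand-picking), and the function names and prompt wording are made up for illustration.

```python
from typing import List


def score(section: str, question: str) -> int:
    """Count how many distinct longer question words appear in the section."""
    words = {w.strip(".,?!").lower() for w in question.split()}
    text = section.lower()
    return sum(1 for w in words if len(w) > 3 and w in text)


def build_prompt(question: str, doc_sections: List[str], keep: int = 2) -> str:
    """Put only the best-matching documentation sections into the prompt."""
    ranked = sorted(doc_sections, key=lambda s: score(s, question), reverse=True)
    context = "\n\n".join(ranked[:keep])
    return f"Use this documentation:\n{context}\n\nQuestion: {question}"


sections = [
    "Routing: define routes with the router module.",
    "Forms: validate input with the forms module.",
    "Deploy: build and upload the bundle.",
]
print(build_prompt("How do I validate forms input?", sections, keep=1))
```

Only the sections that score highest end up in the prompt, which keeps the context small while still covering the library the model knows little about.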

Speaker 1

19:35

So one thing with that is, because I haven't really run into that. You know, most languages I use are fairly mainstream. I think Steve's in the same boat with PHP and JavaScript. But if you were using an obscure language, so maybe Elixir, or I'm just trying to think, like, how far obscure can you get? I don't know, Elm. Elm, there might or might not. I guess it probably depends. That's one other thing that I found, is that some models do much better with

20:07

certain languages than others. But do you run the risk of so just to give people a little bit of context. So there are different sets of data that your LLM works from. So there's the latent space and that's everything that it got trained on, right, And you see them building these big data centers with all the GPUs, and they're trying to get as much data in as possible, right, So they come out with a new model, and generally what it means is that, hey, we've jammed more data

20:34

into it. It's more training, and so you're going to get better answers from it. And then you've got the context, which is, hey, for this particular query or set of queries that I make to the LLM, the prompts. You know, a lot of times it's part of the prompt, but it doesn't always have to be. You know, here's everything you need to know in order to give me a good answer. Right, so you're counting on its ability to process, plus whatever it's got in the latent space that

21:00

it already knows. Right. So it may already know stuff about programming in general, programming practices in the latent space because it's already been trained there. But then here are the specifics of the language that I'm using, right, Or here are the functions that are available to me, and here's how they work. Right. So maybe it's not the language. Maybe it's an obscure library that it just doesn't know

21:19

as much about. Do you run the risk though, of going beyond what the context window will hold? Because the context window is essentially how much you can tell the LLM, right, when you ask the question. It's all the data it can hold. And so for the layperson. I'm not explaining it to Matthew. Matthew knows

21:39

this stuff. But do you run the risk then of expanding beyond what the what the context will hold, because you know different models will allow you to give them different amounts of data in your context.

Speaker 3

21:53

Yeah, so with a lot of models, when you go beyond the context window, they just fail, and so it'll just give you an error, and that could be very annoying using some different things. There's two other main issues with larger context as well. One is the cost. So as your input context goes up, like what you're supplying to it, which could be like your whole code base, then the cost for each token

22:27

generation, it all goes up. And then the other problem too is, as the context window starts filling up, the less likely it's going to be able to recall and to be able to perform as well. And so that's where, when you're making AI workflows, kind of one of the parts is, like from the whole SOLID concept, you have the single responsibility, and I kind of apply

22:55

that within each agent. And the idea is each agent that you create needs to be specialized, and I like to keep the context window around four thousand tokens, which is basically like four thousand words, but not quite. And it seems to perform best in recall, as well as being able to follow instructions,

23:21

with having a smaller context. And so that's one of the reasons why I like using just like a traditional just chatbot and having it generate code, is because I can limit the context exactly what I want, and I kind of pull and say, okay, I need this new file that does this thing. These are the inputs coming in. These are the outputs going out. Maybe feed it like a documentation for something you'll be using, and trying to keep that as small as possible because it'll just perform.

Speaker 1

23:48

Better that way.

Speaker 3

23:50

So that's kind of how I see things.
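One way to picture keeping each agent near a roughly four thousand token context is a simple budget check before the prompt is sent. This is a sketch, not how WAOS does it: the four-characters-per-token estimate is a common rough heuristic for English text (a real tokenizer would give exact counts), and `fit_to_budget` is a hypothetical helper.

```python
from typing import List

TOKEN_BUDGET = 4000  # the rough per-agent target mentioned above


def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def fit_to_budget(
    instructions: str, snippets: List[str], budget: int = TOKEN_BUDGET
) -> List[str]:
    """Keep snippets, in order, until the estimated budget would be exceeded."""
    used = estimate_tokens(instructions)
    kept: List[str] = []
    for snippet in snippets:
        cost = estimate_tokens(snippet)
        if used + cost > budget:
            break
        kept.append(snippet)
        used += cost
    return kept


instructions = "x" * 400                         # about 100 tokens
snippets = ["a" * 4000, "b" * 8000, "c" * 8000]  # about 1000, 2000, 2000 tokens
print(len(fit_to_budget(instructions, snippets)))
```

With these sizes the third snippet would push the estimate past the budget, so only the first two are kept; anything dropped would go to a different, more specialized agent or be summarized first.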

Speaker 1

23:54

Yeah, it's kind of an interesting dilemma, but yeah, I like the approach that you're advocating where you break it up and specialize it, because then nothing has to hold too much context, right? All your different things have to know about the things the other ones do is how to tell the other one to do it or how to get the data back once it's done, and so

24:19

that keeps your context smaller. Yeah, okay, go ahead. So one other thing that I'm curious about, because you're building a tool that allows people to build these workflows out. So what's that like, as far as, you know, not necessarily doing the prompts or putting the prompts together, and maybe you are doing that as part of your tool, but what's it like building an AI tool for other people to use that allows them to build these workflows?

Speaker 3

24:47

Yeah, with using WAOS, it's kind of interesting, in the sense that it's a new paradigm of programming. So kind of bringing in the value of working with AI and AI workflows, I think we kind

25:03

of go back a little bit to kind of human evolution. So, I mean, we started with language, which was a way to be able to pass on like knowledge to other people, right? And then the concept of having a written language, where you're actually able to write down some knowledge that can be passed on, really changed the way we learned and grew. Then you had like the printing press that allowed it to be distributed really quickly, and that

25:33

really changed humanity in that kind of sense too. Then kind of one of the next iterational steps I see is just traditional programming, where, like written language was a way to pass on and to distribute knowledge, programming allowed us to capture and to pass on intelligence. And so I could take that knowledge, I could solve a problem, create an algorithm, and then distribute that out through like a disc type of thing to other people. And now we have the

26:09

Internet to be able to pass that on. And that was a way to be able to create and to distribute intelligence in that kind of sense that someone didn't. Someone could actually just reuse that algorithm to get an

26:23

answer or solve a problem pretty quickly. I kind of look at working with AI workflows is kind of like another iteration where you're actually building and being able to distribute wisdom in a sense, so you can create these agents that can take a problem and have that context and understand how to apply different kind of intelligence to

26:49

solve issues. And so one of the great things in that kind of sense is, one of the things that AI is really good at is being able to understand natural language and understand the intention of what someone is trying to accomplish, and be able to then create many different kinds of responses. So one could be natural language coming back and specifying exactly what that solution could be, or it could be calling a tool to perform some kind of task as well.

27:30

So in working with AI workflows, the way I kind of look at it is almost like you're building like a team of people that would accomplish a task, and so you divide out the responsibilities between different agents that will accomplish the task. So a lot of times, like the first agent that you might make might be something called like an orchestration agent or a conductor agent, which basically will take it will take the prompt from the end user and say, okay, what team of agents should

28:05

solve this problem. So if it's just a question about the software that they're using, then you might just send it to this knowledge base agent that basically can give an answer really quickly. Or it might be

28:18

like a task. So in the sense of SpeakMagic, you give like a story prompt and then it says, okay, all right, this is the very first prompt, so we need to create a story and we need to divide that up, so it sends it to this list of agents over here that then sequentially go through and work together to build a video basically. And that's one of the kind of different paradigm shifts as well that I

28:45

think we'll see a lot more of. Even though a UI is very helpful, and I think we'll always have a UI for like websites and things like that, there's a new kind of wave of input of natural language that we can start working with in AI. So instead of using just a software like maybe a SaaS product, it's almost like having an assistant that you can work with that helps onboard you with that, or can even perform tasks for

29:17

you within the software. So you say, instead of like trying to figure out, okay, let's say if we have a CRM, instead of being able to figure out, okay, how do I add a new customer type of thing and what's available here? With traditional kind of like a SaaS product, you have a real estate that is limited. So I mean, you don't want to stick a thousand different buttons on one page and expect someone to figure out how to use all these different buttons to do

29:48

different things or forms and things like that. That's too much information for the end user. But the cool thing about working with something like natural language is you could have thousands of different tools and tasks that it can perform for you. It can understand the whole website, it can navigate you to where you need to be.

30:10

It can explain how to use it, or it can accomplish those tasks for you and fill in those forms for you, and once you fill those forms or have those kind of filled out, it can look at those and say, hey, I see that the way you filled this out is this way, but here's some more context on tips and tricks to make this better, or maybe just improve it for them and then ask them to confirm, hey, this is exactly what I'm

30:36

looking for. So working with it's definitely it's very different in the sense that with traditional programming, a lot of times we talk about like pure functions, where this one input is always going to give you this output. That's kind of one of the differences working with AI too. It's a little bit more of a black box where this input could be, uh, the answer coming out could

31:03

be multiple different kind of outputs. So one of the things I like to do when making AI workflows too, is anytime anytime there's something that could be done with traditional programming, like say, if if it's parsing out information, it's doing math type of thing, whatever, it's typically you want to kind of steer away from AI, and so you want to parse the information out from the answers and then and then have kind of traditional programming actually

31:38

figure out what the output should be, and then use AI usually where it performs best, basically. And because one of the things that you're dealing with with AI is hallucinations, and so that's one of the things when working with workflows as well, is dealing with

32:02

hallucinations and understanding when it's important and when it's not. Like you said, you had a friend that's using AI to write stories, helping out that process. Like, that's kind of an interesting use case, and something that I kind of deal with with SpeakMagic, because there's not necessarily a right answer or wrong answer, but there could be a preferred answer or a preferred response versus a non-preferred response, and so it could be a little bit more difficult

32:33

on judging what is good and what's not, except for having a human go in and basically say, hey, I like this and I didn't like this type of thing, and refining the prompts that you use with the agent. Versus something where you might have a specific right or wrong answer, then it's a lot easier to improve the prompts and then just do a whole bunch of iterations and tests and say, okay, you know exactly if this is a correct response or incorrect response.
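The orchestration (or conductor) agent described earlier in this answer, which decides which team of agents should handle a prompt and then lets that team work through it in sequence, can be sketched roughly like this. Everything here is a stand-in: the agents are plain functions that tag their input, and `classify` uses a trivial question-mark rule where a real orchestrator would ask an LLM.

```python
from typing import Callable, Dict, List

Agent = Callable[[str], str]


def knowledge_agent(text: str) -> str:
    """Hypothetical knowledge base agent: answers questions about the app."""
    return f"[answer] {text}"


def story_agent(text: str) -> str:
    """Hypothetical agent that would expand an idea into a story."""
    return f"[story] {text}"


def scene_agent(text: str) -> str:
    """Hypothetical agent that would turn a story into scenes."""
    return f"[scenes] {text}"


# Each team is an ordered list of specialized agents.
TEAMS: Dict[str, List[Agent]] = {
    "question": [knowledge_agent],
    "story": [story_agent, scene_agent],
}


def classify(prompt: str) -> str:
    """Stand-in for the orchestration agent's decision."""
    return "question" if prompt.strip().endswith("?") else "story"


def orchestrate(prompt: str) -> str:
    """Route the prompt to a team, then run that team's agents in order."""
    result = prompt
    for agent in TEAMS[classify(prompt)]:
        result = agent(result)
    return result


print(orchestrate("How do I add a customer?"))
print(orchestrate("A robot learns to paint"))
```

Each agent only sees the output of the one before it, which is what keeps every individual context small.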

Speaker 1

33:11

It's kind of one of the ways of working with it too. That's a little different too. So when you're writing these tools for other people, then, are you doing some of that, where you're actually trying to, like, beyond what they give you, you know, you add other things to the prompt to get them better answers, or, yeah, do you just let it ride?

Speaker 3

33:36

Yeah, and that's one of the things, is trying to steer it down certain paths that will give you

33:46

reliable outputs for what they're looking for. And so one of the ways, I mean, there's different kinds of things to look for in that. One is when you're getting like a prompt. One example with SpeakMagic is, okay, we have it create like a story first, like a summary, and then we actually have it parse out, like, so within the story we might add some H1 tags that basically break out the story and say, okay, we're on act one or something

34:29

like that. It actually chooses a story model first, and then it has these different steps within the story, and so we're kind of guiding it down a certain kind of framework, and then we split out that story by having it add H1s for

34:46

the different sections, different steps of the story. And then we'll have it turn that portion of the story into different scenes. And so one thing I found is, say, if we're turning it into a scene, I was having

35:07

it do too many different things at once. So I was saying, hey, I want you to write a scene for this portion of the story, and I want it to be in this very specific format that then I can then parse out to make shots from that scene.

35:21

And so one of the issues I was running into is it was having a hard time doing both of those steps, and so what I ended up doing is say, okay, well, an LLM is actually fairly good at making a scene like a screenplay, like a script basically, because it has a lot of that information on how to do that.

35:42

And so I broke that out and said, okay, do that first, and then I have it send its response, that script, to the next AI agent that says, okay, now let's add the formatting on here, and I gave it all the rules of how to add that formatting to that script for that scene, and then that allows it to then parse and say, okay, this is how we define all the different shots that make up that scene.

36:08

And so then I have JavaScript basically go through and split all that into different pieces and into different shots, and then I can have the next AI agent basically parse the shot out and figure out, okay, what do we need to do to create this shot?
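A minimal sketch of the splitting steps described above, assuming the story comes back as Markdown with a `# ` heading per act and that the formatting agent marks each shot with a line starting `SHOT:`. The `SHOT:` marker and the function names `splitByH1` and `splitShots` are hypothetical; the episode does not say what format Speak Magic actually uses.

```javascript
// Split a Markdown story into sections at each H1 ("# Title") heading.
// Lines that appear before the first heading are ignored.
function splitByH1(markdown) {
  const sections = [];
  let current = null;
  for (const line of markdown.split("\n")) {
    const match = /^#\s+(.*)$/.exec(line);
    if (match) {
      current = { title: match[1].trim(), body: [] };
      sections.push(current);
    } else if (current) {
      current.body.push(line);
    }
  }
  return sections.map((s) => ({ title: s.title, body: s.body.join("\n").trim() }));
}

// Split a formatted scene into shots. "SHOT:" is an assumed marker that the
// formatting agent would be instructed to emit, one per line.
function splitShots(scene) {
  return scene
    .split("\n")
    .filter((line) => line.startsWith("SHOT:"))
    .map((line) => line.slice("SHOT:".length).trim());
}

const story = "# Act One\nA girl finds a map.\n# Act Two\nShe follows it.";
const sections = splitByH1(story);
const shots = splitShots("INT. ATTIC\nSHOT: Wide on dusty attic\nSHOT: Close on the map");
console.log(sections.map((s) => s.title), shots);
```

Keeping the split in plain JavaScript means each later agent only ever sees one section or one shot at a time, which matches the one-job-per-call approach described here.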

Speaker 1

36:28

Awesome. So are you using JSON mode? Because I know some of these allow you to send the data over as JSON as opposed to straight text or other formats. Most LLMs, I will say, are pretty good about pulling text apart and putting it back together and figuring out what you want. The JSON just gives you more accuracy, is what I found. No, I totally agree.

Speaker 3

36:51

So with Speak Magic, I'd say about eighty to ninety percent of all the agents I use are actually using tool use, the JSON mode, and there's a lot of advantages to that. If it's writing a story, then typically I'll just have it write the story using plain text. It's usually Markdown-formatted coming out.

Speaker 1

37:20

And so.

Speaker 3

37:22

But with JSON, one of the cool things, one of the strategies I use with JSON mode, is you can actually say, okay, I want it to basically have a JSON object that has different properties that it needs to fill in. And one of the cool things about it is that there actually is an order to those things, because LLMs are sequential, so they go token by token

37:49

and so it'll start at the top. Even though maybe an object in JavaScript doesn't necessarily have an order, when it's filling this out, it actually has an order. So one of the things I've done with it is have it actually think through step by step how to approach a problem, and maybe several of these properties that it's filling out it never really ends up using. But it's a way to force it to think linearly in a certain way that prepares it to give

38:20

accurate answers, if that makes sense. So it's kind of a form of the thinking that you see in things like the o1 model, but you're able to specifically guide it through a whole series of steps, like, okay, let's build our own context and have it think through different steps by filling in these different properties, and then being able to use that context and its thinking to force it to then give the answer or

38:51

the properties that you're actually going to use for the next steps or for the end user.
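A rough sketch of that reasoning-first JSON idea, assuming a prompt that lists the properties in the order the model should fill them. The property names (`problem_restatement`, `step_by_step_plan`, `final_answer`) and both helper functions are made up for illustration, and the model reply is a hand-written stand-in so the example runs offline.

```javascript
// Field order matters: the model fills these top to bottom, token by token,
// so the "thinking" fields come before the field we actually use.
const responseShape = {
  problem_restatement: "Restate the task in one sentence.",
  step_by_step_plan: "List the steps needed to solve it.",
  final_answer: "The answer to pass to the next step.",
};

// Turn the shape into numbered instructions so the order is explicit.
function buildPrompt(task, shape) {
  const fields = Object.entries(shape)
    .map(([name, hint], i) => `${i + 1}. "${name}": ${hint}`)
    .join("\n");
  return `${task}\n\nReply with a JSON object with these properties, in this order:\n${fields}`;
}

// Keep only the property later steps need; the scratch fields are dropped.
function extractAnswer(rawJson) {
  const parsed = JSON.parse(rawJson);
  if (typeof parsed.final_answer !== "string") {
    throw new Error("Model response is missing final_answer");
  }
  return parsed.final_answer;
}

const prompt = buildPrompt("Pick a story model for a heist film.", responseShape);
// Hand-written stand-in for a model reply.
const fakeReply =
  '{"problem_restatement":"Choose a structure.","step_by_step_plan":"Compare options.","final_answer":"Three-act structure"}';
const answer = extractAnswer(fakeReply);
console.log(prompt);
console.log(answer);
```

Modern JavaScript engines keep string keys in insertion order, so `Object.entries` here lists the fields in the order they were written, which is what makes the numbered prompt line up with the intended thinking order.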

Speaker 1

38:56

Basically, yeah, makes sense. So one other thing that I'm looking at, so just give a little bit of context for people here. I have two things going that I'm wanting to put together. One of them is I would like to have some kind of AI help agent kind

39:12

of thing on Top End Devs. Right, so if people are looking to go through some of our courses or things like that, you know that there's essentially an AI agent that can help you figure out what to learn next or, you know, maybe next steps for your career or things like that, and so I could see that as a kind of a coach if nothing else. And I'm trying to figure out how to manage the context, right, because people may ask a lot of questions or

39:42

give it a lot of information. The other thing I'm running into is if somebody leaves and comes back, right, then do I summarize the previous context and hand it to a new query or can I just pick up where I left off? So, do you have any recommendations on something like that where maybe it's not one continuous session?

Speaker 3

40:06

So, maybe can you re-explain that question again?

Speaker 1

40:11

Yeah, so let's say that it's an ongoing tool, right. So the other idea I have that I want to build out is essentially, I've hired virtual assistants in the past to help me do a lot of kind of routine things with the podcasts. And so with that one, I kind of see more of the team model and I can just come in and, you know, I can just use a tool for it to go look up the information about my podcast, and then, you know, I can tell it to do tasks one off at

40:39

a time. And so for that one, I'm more thinking, okay, how do I just make sure that the workflow works. But for the other one, you know, where I want it to remember things about the people who are coming and asking for help, and I want, if they show up and say, okay, I just got a job interview, how do I prepare for this? You know, it's smart enough to say, okay, well, what can you tell me about

41:04

the company? You know, maybe get a little more information, and then turn around and actually remember enough about them, right, to help them. Do I have to store that myself to get the continuity, or can I, you know, is there some form of caching the context window that the different LLMs do? Because most of the time you're hitting these over APIs. And so anyway, that's kind of what I'm wondering about, is allowing people to pick up where they left off when they're using an agent.

Speaker 3

41:35

Yeah, so when using all these different LLM models, every time you send a request to the API, you have to actually give it the full context of the history. And so that was one of the things that kind of

41:52

hit me. For some reason I thought, when I first got involved a couple of years ago, that it would just remember your past calls, like maybe you'd get a conversation ID or something like that, and it would handle all the state and the memory of past conversations with that conversation ID or something

42:11

like that. But it turns out it's actually more stateless, and so you have to send it all the information, which has advantages and disadvantages. One disadvantage would be it'd be a lot simpler just to maybe have it manage it. But then you have a whole bunch of control over what kind of context you're giving to it, and so you need to keep track of that information. I mean, typically that would be in a database or something like that, and then you can

42:43

pull it. It's also important, though, too: we talked about context window, and different models have different sizes of context window. I mean, some could be like only eight thousand output tokens. Some of them have like one hundred and twenty-eight thousand, like OpenAI's. A lot of them have one hundred and twenty

43:06

eight or two hundred thousand context window. And so sometimes what you need to do, if you want kind of the fuller context of things once you start using more and more of it up, is to summarize the information or to pull out key pieces of information that are really important. And that's one of the things we do. Speak Magic, for instance, we have, in a sense, a JSON object that keeps all the important information, like what kind of

43:42

style does the video need to be? Is it more animated? Is it live action, that type of thing, because you want that consistency from shot to shot, from scene to scene for the video. Like who are the characters, and character profiles of each one, so we're very descriptive about a lot of different things of that character. And that way, when we're generating an image of that character, it will have all that context.

44:09

It's, okay, this character has curly black hair or something like that, and different kinds of things, so that when it generates the image before it turns into a video,

Speaker 1

44:19

It'll be correct.
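A minimal sketch of handling that statelessness yourself, assuming a chat-style API that accepts an array of `{ role, content }` messages on every call. The in-memory `Map` stands in for a database, and the function names and the naive text-joining "summary" are illustrative assumptions; in practice you would probably ask a model to write the summary.

```javascript
// In-memory stand-in for a database table keyed by user ID.
const store = new Map();

function loadSession(userId) {
  if (!store.has(userId)) store.set(userId, { summary: "", messages: [] });
  return store.get(userId);
}

function recordMessage(userId, role, content) {
  loadSession(userId).messages.push({ role, content });
}

// Fold everything except the most recent turns into the running summary.
// Here the summary is just joined text; a model call would normally write it.
function compact(userId, maxRecent) {
  const session = loadSession(userId);
  const cut = Math.max(0, session.messages.length - maxRecent);
  const old = session.messages.splice(0, cut);
  if (old.length > 0) {
    session.summary = [session.summary, ...old.map((m) => `${m.role}: ${m.content}`)]
      .filter(Boolean)
      .join(" | ");
  }
}

// Build the full message list for one call. The API is stateless, so
// everything the model should know has to be sent every time.
function buildRequestMessages(userId, systemPrompt, maxRecent) {
  const session = loadSession(userId);
  const messages = [{ role: "system", content: systemPrompt }];
  if (session.summary) {
    messages.push({ role: "system", content: `Summary of earlier conversation: ${session.summary}` });
  }
  return messages.concat(session.messages.slice(-maxRecent));
}

for (let i = 1; i <= 6; i++) {
  recordMessage("u1", i % 2 ? "user" : "assistant", `message ${i}`);
}
compact("u1", 2);
const request = buildRequestMessages("u1", "You are a career coach.", 2);
console.log(request);
```

When a user leaves and comes back, the same `buildRequestMessages` call rebuilds the context from storage, so resuming a session and continuing one look identical to the model.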

Speaker 3

44:20

But we also don't want to overload the context as well. So basically, when you're working with a call, you want to do kind of a need-to-know context, and so you want to be dynamically able to decide on what context is important for that specific call. This is really important also. And you kind of have different

44:45

types of memory as well. So you're going to have memory for what is the state important for this specific call at the moment, there's going to be kind of memory where it has to do with the whole run of the workflow, especially when you have cycles of humans going through so like you get a question or you give a prompt, you get a response, and then you're continuing that conversation, and so you kind of need more of a more global kind of memory that matters as well.
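The need-to-know idea can be sketched as a small selector over a project-wide memory object. Everything below is hypothetical: the shape of `projectMemory`, the call types `"image"` and `"script"`, and the mapping from call type to fields are assumptions for illustration, not Speak Magic's actual structure.

```javascript
// Hypothetical project-wide memory, kept as one JSON-style object.
const projectMemory = {
  style: "animated",
  characters: {
    mira: { description: "curly black hair, green jacket" },
    vex: { description: "villain creature with shadow powers" },
  },
  storySummary: "Mira chases a villain across the city.",
};

// Return only the pieces of memory a given kind of call needs.
function pickContext(memory, callType, characterIds = []) {
  const characters = Object.fromEntries(
    characterIds
      .filter((id) => id in memory.characters)
      .map((id) => [id, memory.characters[id]])
  );
  switch (callType) {
    case "image":
      // Image generation needs visual consistency, not the plot.
      return { style: memory.style, characters };
    case "script":
      // Script writing needs the plot and who is in the scene.
      return { storySummary: memory.storySummary, characters };
    default:
      throw new Error(`Unknown call type: ${callType}`);
  }
}

const imageContext = pickContext(projectMemory, "image", ["mira"]);
console.log(JSON.stringify(imageContext));
```

The global object plays the role of the workflow-wide memory, while the value returned by `pickContext` is the per-call memory that actually gets sent.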

Speaker 1

45:18

Gotcha, Okay, Yeah, I guess my concern was if I'm gathering information, you know, if I have to store information when people are asking more personal things like should I quit my job or should I you know, you know, they tell it I'm not happy where I'm at and so I'm looking for another place or things like that.

45:39

I mean, I think the expectation is that it's confidential, right? I mean, if I'm giving it a creative prompt and I'm having it create part of my story, maybe I'm, you know, I don't feel as compromised by somebody seeing, oh, you know, he generated a video with a, you know, with a girl with black curly hair and a, you know, a villain creature thing that, you know, has certain powers or whatever, you know,

46:04

whatever my story's about, right, it's different than if it's, oh, well, we didn't know that Chuck was not happy here and was going to start looking for another job, or you know, asked how to answer questions about things that he's not proud of in his career past, and so, you know, so keeping that confidential seems pretty important. But that's a

46:26

common problem for a lot of other things. So you're telling me that that's something that I have to figure out, because the LLM isn't gonna keep track of

46:34

it for me. One last question, because we're kind of getting toward the end of our scheduled time, is because I know a lot of people are interested in getting involved in AI and learning AI, and to be perfectly honest, I feel like if you're not if you're not getting a feel for how AI works and what it can do and how it can at least help you where you're at, let alone, where it can help the users of your applications where they're at, then in a year

47:04

or so, you're going to be way behind. And so if people are sitting here and maybe you have a different opinion on that, and you can say that in a minute, But at the end of the day, if somebody's going I need to understand AI, what do I need to do in order to get there? Where do you recommend people start and what are the kinds of things that they ought to be picking up in order to be successful with this?

Speaker 3

47:27

Yeah, I think in some ways I would say kind

47:30

of approach it the same as learning programming. I mean, there's so many different directions you can go with it, but it's usually best, if you want to learn more about something like a programming language, to pick out a simple small project that you're interested in, or maybe something of value that you want to provide, and then learn the parts needed to accomplish that kind of programming task.

Speaker 1

48:05

And so.

Speaker 3

48:07

I mean one of the great resources for that is is using like a AI chatbot to do the research for you and answer the questions for you. It's actually, I mean, it's a great way I've been using it for I've been learning Java recently, and it's a great way to You could have it like quiz you on what you know, what you don't know, or answer the questions. It's like a fantastic tool to be able to learn

48:39

new technologies and things. But there are, I mean, there's different ways of approaching it. Using JavaScript, you can go just plain, vanilla Node type of thing, where you're accessing these different APIs. Uh,

49:00

work with maybe one API at first. A lot of them are fairly similar to each other when it comes to LLMs. And there's different tools you can use if you want to go beyond figuring out that kind of thing yourself, uh, and you can have AI help you write different parts of creating these workflows and stuff. You can use tools

49:24

though that get you a little further ahead. If you're using JavaScript, something like LangChain.js will give you a bit of a framework for how to work with things. There are other tools out there, like if you're more interested in just AI workflows, you can use tools like, uh, n8n, which allows you to, like, a

49:50

visual kind of way of putting together these workflows. WAOS.ai is a way to do kind of the same thing, where you're just using low-code tools, so you're building things out visually, and then you can use JavaScript expressions to basically parse information, to decide what context to use in different places, and to kind of help

50:16

control how the workflow is run. But there's, I mean, you can use things like Cursor, which would be more of an assistant that helps you within your IDE to build, to help you generate code quickly, that could help you in the process of doing things.

Speaker 1

50:40

Or you could use like more of.

Speaker 3

50:41

a coding agent like Claude Code or like Manus, which I think is not necessarily public for everyone right now as far as I know, and that will help generate larger kinds of projects from single prompts. You do more of the vibe coding, which was brought up earlier. So those are kind

51:08

of some different tools that you could use. Another tool to maybe look at, if you want to use a lot of different models, would be something like open MCP, which is a JavaScript kind of implementation that allows you to help standardize using different models and different kinds of

51:32

tools that you could use. I mean, we have something kind of similar that we added a long time ago into WAOS, where we have, I think it's thirty-three different models that you can access, and they have a very similar architecture, because you're just looking at bringing a prompt in and then a response out basically, and then we handle all the API kind of stuff for you. So those are kind of different things

Speaker 1

52:01

I kind of I.

Speaker 3

52:02

would recommend for someone that's learning the different tools and things that they can get involved with and use.
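The "prompt in, response out" architecture mentioned above can be sketched as a small adapter registry. This is an illustrative assumption, not how WAOS or any named library is implemented: the two adapters are offline stand-ins that just transform the prompt, where a real adapter would call one provider's HTTP API and unwrap its response.

```javascript
// Each adapter hides one provider's request and response shape behind the
// same async (prompt) => text signature. These two are offline stand-ins.
const adapters = new Map([
  ["echo-upper", async (prompt) => prompt.toUpperCase()],
  ["echo-reverse", async (prompt) => [...prompt].reverse().join("")],
]);

// Single entry point: callers only choose a model name and pass a prompt.
async function complete(modelName, prompt) {
  const adapter = adapters.get(modelName);
  if (!adapter) throw new Error(`No adapter registered for ${modelName}`);
  return adapter(prompt);
}

// Run the same prompt against several models to compare their answers.
async function compareModels(prompt, modelNames) {
  const results = {};
  for (const name of modelNames) {
    results[name] = await complete(name, prompt);
  }
  return results;
}

compareModels("abc", ["echo-upper", "echo-reverse"]).then((results) => {
  console.log(results);
});
```

Because every adapter shares one signature, switching or comparing models is a change to a string rather than a rewrite of the calling code.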

Speaker 1

52:09

Yeah, one thing that I'll add to that is, and you've kind of alluded to it in the way that you've told people to approach stuff, is, you know, before you're even writing code, you can just go in and go to like ChatGPT dot com or Grok dot

52:23

com or Anthropic, I can't remember the URL to use Claude, you know, but just get in and just start asking it questions and kind of get used to how it works, because ultimately what you're gonna be sending over is prompts that look a whole lot like your questions anyway, and so you can figure out, you know, the different tricks that work. And then from there, I also recommend that people go pick up a course or a book or something that does some explanation on

52:55

prompt engineering. The reason is because the one part is knowing how to access tools and access the AI and how to ask good questions. But the rest of it is just going to be down to your prompts and how you format them so that you're getting the best answers possible from your AI.

Speaker 3

53:12

And there's a lot of iteration. So, yeah, if something doesn't work with your prompt engineering, try tweaking things. Pull out pronouns, like don't use "this" or "that"; be very explicit about what you mean. Keep the context low.

Speaker 1

53:28

Yeah, all right, well let's go ahead and do some picks. Now, I don't think you've been on the show before, so let me just explain what they are real quick. It's just us shouting out about stuff we like. So a lot of times people do TV shows or movies or technology tools or anything in between. We'll let Steve go first, and then I'll go, and then you can go last, and that way you can kind of get a feel for how we do it.

Speaker 2

53:56

Yeah, Matthew, Just one thing he didn't mention is that the high point of every episode of ours is my dad jokes, dad jokes of the week, and so anyway, make sure hopefully my sound effects are working properly. Okay. So as an example, having a conversation with my friend and I said I actually have a half brother and he said different mothers. I said, nope, shark attack, thank you,

54:27

thank you. So last week was Saint Patrick's Day, I think it was a week ago today, and I bought a diamond ring for my wife, but it turned out to be a fake. They gave me a sham rock. And then my dentist, who's actually a good friend of mine, I've known for

Speaker 1

54:48

A long time.

Speaker 2

54:49

He got this local award where he was voted the dentist of the year. He didn't get a trophy though, he just got a little plaque. And then finally, question: in King Arthur's time, which of the Knights of the Round Table collected taxes? Because you know they collected taxes. Sir Charge. Those are the dad jokes

Speaker 1

55:16

of the week. All right, well, how do you follow that? Very humbly, very humbly. Right. So, yeah, I don't know if I've played any new board games lately, so I'm just gonna throw out something that I've picked in the past. This is something that we've played before, me and the guys. I did find out, though, that it has a different mode to it you can play, so I'm gonna pick it again even though I haven't played the mode, the campaign mode. The board game is

55:50

called Heat: Pedal to the Metal. It's a racing game, so everyone's in race cars. You play cards in order to move forward. If you go around the turns too fast, then you take heat from your engine and put it into your deck. The heat cards don't do anything, and you have to do specific things to get rid of them. That will often make you less efficient moving forward, and so you start figuring out how to get through as many turns as possible as quickly as possible so you

56:23

can get to the straightaways and take off. The campaign mode is you play multiple races and you collect money and then you can upgrade your car. BoardGameGeek weights it at two point one nine, which is pretty casual gamer-ish as far as that goes. It says ages ten plus. I think somebody a little younger than that could play. The strategy is not terrible, and

56:51

if you kind of help them with the mechanics, I think you could get like a six or seven year old to play and that would be fine. I've played it with four players; it plays up to six, and it takes about an hour to play. So anyway, a lot of fun. This is one of the favorites lately. In fact, on BoardGameGeek, it actually is number forty one overall on the games. So yeah,

57:20

really enjoying that. So I'm going to pick that. And then lately I've been watching a couple of shows and I think I've picked them over the last few weeks, but I don't remember, so I'm going to pick them again. The first one is nineteen twenty three and it's a prequel to Yellowstone, and I'm really enjoying that.

Speaker 2

57:44

Is that.

Speaker 1

57:44

The one with Harrison Ford? Yes, yep, yeah, I've heard that.

Speaker 2

57:49

Him trying to swagger down the street like a cowboy's not the best picture, but.

Speaker 1

57:55

Yeah, yeah, he's definitely old, but it's funny.

Speaker 2

58:00

He does a new, uh, he does a new commercial for, I think it's Jeep or Land Rover, and the very last thing he says is, yeah, yes, I'm doing an ad for them, even though my last name is Ford.

Speaker 1

58:15

A sort of funny guy.

Speaker 2

58:17

I thought, that's just me.

Speaker 1

58:20

So anyway, watching that, I'm also watching Reacher. I'm enjoying that. And then I'm about done with the book Rhythm of War, which is a Brandon Sanderson book. It's the fourth book in the Stormlight Archive. So every time he releases a new book, which takes like forever to listen to on Audible because it's, you know, I mean, I've been listening to Rhythm of War, I think, for almost a

58:44

month now, just because I listened. You know, I have a ton of time to listen, so I listen when I'm like trying to go to sleep or when I'm, you know, out in the car or something. But yeah, so he released Wind and Truth and I just haven't gotten to it yet. So anyway, Rhythm of War I'm really enjoying as well. And then I think that's pretty much all I've got for picks this time. But Matthew, what are your picks? Okay?

Speaker 3

59:13

So I love doing research on AI, so I'm always trying to keep up on the newest models. It seems like lately a lot of new audio, like text-to-speech audio models, have been coming out, which I've been waiting for. I had some ideas of how to kind of produce some myself, but now with these coming out, I'm pretty excited to add that to Speak Magic. So one of them that kind of hit, that was amazing,

59:42

is called Sesame, and it has a lot of emotion and is kind of context-aware of how to provide real-time conversations. It just sounds a lot like a human. There's been a few others that have come out recently that have been great too. One of them is from OpenAI. If you go to OpenAI dot fm, it has a way to be able to control how the speech should be generated, so you can put emotion in it, you can give it different things like

01:00:28

accents or different things too. It's pretty incredible as well. So those are some of the new AI models that have been pretty exciting coming out recently. And then I brought up before kind of looking more into open MCP, which seems like a way, using JavaScript, to be able to connect to different models and different other kinds of

01:00:59

services as well, which is pretty exciting. Let's see, I've watched some Reacher too. I guess they have one more episode left, and that's next week.

Speaker 1

01:01:11

That's been Yeah, it's definitely getting there where it's yeah, it's gonna wrap up.

Speaker 3

01:01:15

So it sounds like the series is a lot closer to the books, basically, than the movie. I guess it's pretty cool. Yeah, that's kind of it for me.

Speaker 1

01:01:35

Yeah. I meant to throw out one AI pick and I forgot about it, and that is if you want to play with some of the large language models, especially around text. I've been using OpenRouter. I don't know if you've used them. You can also get some of them to run on your own machine if you get them off of Hugging Face. So those are the two resources that I'm going to recommend. One is OpenRouter.

01:01:58

You can use their libraries to connect to them, and then they connect to all the other models, so you can try out the Llama 3 model, the OpenAI GPT models, you can try out Claude, and you can switch between them so you can see which ones work best without having to do a whole lot of extra work to program against each one. And then Hugging Face, they have a whole bunch of other models and you can run those all locally, and then I think

01:02:31

that's huggingface dot co, is where you get those. Nice. All right, Matt, if people want to find your stuff, where do they find you online?

Speaker 3

01:02:41

Yeah, check out Speak Magic AI. We need to add some newer examples on there. Our quality has gone up, and with some of these new models that come out, they'll be even better. You can also check out WAOS at WAOS dot ai, and, uh, it's kind of an interesting transition there. I'm trying to decide on how to take things further with that, possibly going an open source kind of route or making it more community driven.

01:03:23

So if you're interested in that, you can send me a message at Matthew at WAOS dot ai. Yeah, those are kind of the two places for me.

Speaker 1

01:03:35

Awesome. All right, well, let's go ahead and wrap it up here. Until next time, folks, Max out.

Transcript source: Provided by creator in RSS feed