¶ Intro / Opening
Welcome to Complex Systems, where we discuss the technical, organizational, and human factors underpinning why the world works the way it does. Hideho everybody, my name is Patrick McKenzie, better known as patio11 on the Internet, and I'm here with my buddy Zvi Mowshowitz, also known as Zvi on the Internet. We're here at the Lighthaven campus taping during the LessOnline conference, although who knows when this audio and/or video will hit the internet. But...
This has essentially evolved into the people-talk-about-writing, people-talk-about-AI, people-talk-about-AI-and-writing event this year. And Zvi has a newsletter, Don't Worry About the Vase, which in recent years has been very AI focused, and is also... something of a power user. I don't consider myself a power user, but I think everyone who cares about this topic is a power user relative to many of the people who don't care about this topic. And so just explaining some of the, to use a phrase that you use all the time in the newsletter, mundane utility that you can get out of LLMs, and how you can get better at them, sounds like an interesting idea for a conversation.
¶ Understanding system prompts
With that prompt out of the way, let's start with the system prompt, which is a feature that many of the cutting-edge LLMs have these days. You get transformatively better outputs if you're good about writing your system prompt. Can you just tell people who might not be familiar with this: A, why is that true? B, how would one start writing a system prompt for themselves?

So the system prompt is a message you essentially give to the language model before every output, before every conversation. It tells the system: here's who I want you to be, here's how I want you to respond, here's rules for what types of outputs I want, how I want you to act, what I want you to do.

It's not perfect. It won't always do the things that you request, and there's various tricks you can do to emphasize this more or less. This will complement the prompt that it's been given from the company; you'll add your prompt basically to their prompt. And the prompt you use should be directed at customizing the model to act in the ways that you want it to act,
¶ Customizing LLM behavior
ways that are different from the default way they set it up to act. And based on what you're trying to do, that might be various different things, depending on which model you're trying to use and what its strengths and weaknesses are.
There are various ways to use both ChatGPT and Claude in particular with multiple system prompts via various projects so that you can swap between them easily depending on what purpose you want to use them for because the prompt you use when you're coding...
is probably very different than the prompt you want to use when you're writing.

So when you consider these products as software products, they are like many software products with a growth curve which is almost unbelievable. But these are very broad products, which are optimized by very intelligent teams, and so they want the out-of-the-box behavior to be good for a very common denominator of early adopter. You are not a common denominator. You have stuff which is unique, or maybe not unique about you, but descriptive of you, that you could rattle off in a couple of sentences to someone. And just rattling off those couple of sentences will make them better, almost out of the box, for you.
So a thing that I tell LLMs early in the system prompt is: my name is Patrick, I have an engineering degree, and I'm a sophisticated user of LLMs. I understand you're an LLM; you never need to explain that to me. And I am sophisticated with regards to your limitations, and you don't have to explain those to me either. And then there are some things that, for whatever reasons, the alchemy of the math or the training runs that the company has run tend to bias LLMs toward, and I say: you know, often you are quite unctuous to a user. I don't need you to be contrarian just for the sake of being contrarian, but I treat you like a middle-seniority trusted colleague who has earned some level of being able to push back against me, and so if you feel strongly, please do push back.

I virtually never use you for emotional support, and when I do, I will tell you I would like some emotional support right now. So please don't be overly deferential and attempt to perform emotional support for me. Recently, they seem to know what I mean. You have this thing that is the engagement close: would you like me to do blah blah blah? I typically find that annoying. I will tell you if I want you to go in a particular direction, so unless you really think I want it, don't do that.
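For listeners who drive these models through an API rather than the chat apps, here is a minimal sketch of what "add your prompt to their prompt" can look like in practice. It assumes the OpenAI Python SDK; the model name and the prompt text are illustrative placeholders, not a literal transcription of Patrick's prompt.

```python
# Minimal sketch: attach a personal system prompt to every request.
# Assumes the OpenAI Python SDK; model name and prompt text are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = """My name is Patrick. I have an engineering degree and I am a
sophisticated user of LLMs. Never explain that you are an LLM or recite your
limitations. Push back like a trusted mid-seniority colleague when you disagree.
I will explicitly say so if I want emotional support. Skip the "would you like
me to..." engagement close unless you are confident I actually want it."""

def ask(question: str) -> str:
    """Send one user question with the personal system prompt attached."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; use whichever frontier model you prefer
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(ask("Give me a one-paragraph critique of this plan: ..."))
```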
And so, similar things to that. And you can do this at a conversation-by-conversation level, et cetera, et cetera. Like, we just had a talk at the conference where someone said the prompt "explain this like I'm five" produces the ELI5 output, which underpredicts my level of understanding. I really liked your answer with respect to that. What would you tell someone who is repeatedly saying "explain it like I'm five" and being unhappy that they get explained to like they're five?

Explain it like I'm ten. Explain it like I'm 15. Or, in my case, explain it like I'm Zvi Mowshowitz, and I actually spell out my name, because I have so much data at this point in the training corpus that it knows who I am and my style of approach and thinking. So I can just do that, and that actually brings pretty good results. And if there is a person who routinely writes at your level of understanding, or you enjoy their work, et cetera, et cetera: you don't literally have to be Matt Levine to say, hey, explain this like you would explain it to Matt Levine.

And that, by the way, being able to go through different personas of user: you can tell an LLM explicitly, model me as if blank, and then answer this question. So, "model the person asking this question as a young person who works as a staffer for a member of Congress and is quite sophisticated with regards to parts of that job but doesn't understand the domain," for example, will get you a very different answer than "model me as if I have an engineering degree but not specifically in this subfield," which will get you a different answer than, you know, "model me as if I'm a bright high school student." And you can pick on a case-by-case basis which one works best for you.

The LLMs have memory features too, which are somewhat obviating the need to be
¶ Memory features in LLMs
quite prescriptive in your system prompt. But when I find something that I really, really want to stick in an LLM's memory, rather than saying "please remember that blah blah blah," I usually explicitly promote it to the system prompt, because then I know they are much more likely to remember it.

The system prompt is a lot more impactful than a memory. So ChatGPT currently has memory, both in terms of memories you create and its ability to reference your previous chats. The other LLMs have varying degrees of it. Gemini has some degree of it, I believe, as of filming. Anthropic's Claude, which is in fact my current model of choice (it is June 1, 2025, by the way), does not currently have that, so you do have to remind it of any information you want it to know in some other form. I expect that to change soon, probably in a matter of weeks, maybe a few months. But you do need to be cognizant of how all this works.
But actually, memory can go both ways. A lot of users, readers of my column, have reported that memory caused a bunch of sycophancy and other issues with ChatGPT's models in their observations, and they had to turn memory off, at least during one point when GPT-4o was being rather obnoxious. And this is a general problem, where if the company is training on everything that you tell it, then you both have to worry about what everything you do is telling it, and it might respond in ways you don't particularly like, and it could get stuck in some sort of mode that you don't want it to be in, and it can be very difficult to get out of that. And in many ways, I kind of prefer the ability to curate exactly what it does and does not have in the context, and what it does and does not know about me.

And for ChatGPT, I don't turn memory off, but I also consider deleting chats if the chats seem to be causing issues in some form, or I just don't particularly want that to be in context. I occasionally have this problem with memory myself, but not enough that I default to memory off; I default to memory on. And the sort of thing that can happen, so the sycophancy problem, where...

Also called glazing. I don't even know what that word means. I'm a millennial, not a Gen whatever-it-is. But think of a donut, putting all this sugary glaze on the donut. Okay. I'm not even sure if I'm going to have to edit out this portion. But do I have to look this up on Urban Dictionary? Okay, let's resume.
Occasionally, they will also ridiculously over-index on facts that they have memorized. So, for example, ChatGPT knows I use Vallejo paints, which is extremely relevant when giving me painting advice, which is why I told it, please remember that I use Vallejo paints. So when it needs to reference that, like telling me the exact name of the Vallejo paint, that's useful. When I'm doing generic research with regards to, say, payment-method-related topics, Vallejo paints being present in the output is usually a mistake, and yet that will occasionally happen, because yeah, you told me you like Vallejo paints, I will paint the world in Vallejo for you.

The thing I do when I would like it to temporarily forget something is: I believe ChatGPT has a feature where you can have the equivalent of an incognito window within the instance of ChatGPT, where it will neither use anything that it knows about you, nor remember anything from that conversation in other chats. I find that useful both when I would like a clean slate to work from, and also when there are topics where I don't really want to know what you as an LLM would tell me specifically; there is a persona that I really want you to target right now. Like, I care what that person is hearing from the LLM, so please adopt that persona exclusively and don't even know who is pulling the strings behind that. Or you want to know what the generic answer is, or you want to have an answer you can defend as generic, because, you know, when I use o3 in ChatGPT at this point, there's enough between my system prompt and it remembering my various other inputs that you can never be sure how much of that output is being biased by who it's talking to. So if you wanted to get a truly objective output from o3 and it really mattered, you would want to use an incognito window, I believe.

A lot of this reminds me of the SEO game from back in the day, where there were various sets of tactics for using Google where you could more closely approximate what the generic user of Google was getting as output, rather than, you know, search history customized for you based on your previous interactions with Google. And so
¶ Generative Engine Optimization (GEO)
I don't think anyone has invented the field of LLM optimization yet, either what you need to put on the internet to get future LLMs trained on your stuff to say more things that you like, or even...

Oh, you mean GEO? GEO: generative engine optimization. It's a thing. It's a thing? It's a thing.

What works for GEO right now?

So, I don't have too many of the details. I wrote about this a bit in my last update, but definitely what people are focusing on is the search capacity. So it's an offshoot of SEO, because LLMs will search in particular ways, and so you're trying to anticipate what searches will be triggered by LLMs. And you're trying to get to the top of those particular searches and then serve up things that will cause the LLM to latch onto your things as relevant. So you're trying to match the types of things the LLM will be on the lookout for.

So one way to get better at that is: for chain-of-reasoning models, such as the ones that come out of the box with ChatGPT and Claude these days, they will give you a partial explanation of what the chain of reasoning is, and that will often show you the exact search query they're running. And I think they're running it on, and it's presumably different for different companies, but what was it, Brave Search or something, in the background?

It really varies by company. I believe Claude will tell you what it's searching for, and you can expand to see extra details, but I don't know which search engine it is off the top of my head.

Yep, you can see the search query. They're often painfully generic, as someone who is an old hand at crafting search queries; they are not as skilled at using the tools as they are skilled at producing other outputs. Perhaps that's intentional, but, you know, given that you could influence a query, which is the very generic way to phrase something... They're often six or eight or ten word queries that they type in, and influencing a ten-word query is not that hard relative to SEO-ing for, you know, low-cost mortgages. So simply getting better at SEO can get you better at GEO, I guess.

The other obvious thing, but it deserves to be said: the LLMs are crawling, or using people who have crawled, large portions of the open internet. You will show up more in the training set if your stuff is on the open internet. And so there are decisions you can make at the margin to have more stuff be on the open internet.
Particularly at a conference for writers: the paywall is a useful thing for generating money out of subscriptions, but I would tell aspiring writers to think very seriously about the mechanics of that paywall versus the binary on-or-off-ness of it. For example, if one is writing a periodical, I would strongly consider, okay, paywall it for the first, you know, 60 days, 90 days, 365 days, whatever, of the life of an article, at which point its value to you as news is low but its value to you as fodder for SEO and GEO is much higher, and then move it out of the paywall at that point. And assume that you're selling something about, you know, your presence on the beat, and your particular voice, and a parasocial relationship, maybe, versus selling strictly access to your oldest of old words.

Yeah. As a writer, you look at the view counts for older posts, and with notably rare exceptions, the numbers do not move after the first few days. Almost nobody goes into the archives. So there's no big reason to hide the archives; it's not going to motivate subscribers very often. There is, in fact, one thing I have subscribed to because I wanted to mine the archive for a while, but that is, in general, very rare.

It's also, generally speaking, a poor way to make money as a business practice, because the person who is attempting to get something out of the archives from 2002 will subscribe for a minute to get that thing from the archives and then immediately unsubscribe, and sometimes go the extra mile and either ask you for a refund for the subscription or do a chargeback. All of that is complication that you as a business owner don't really need. Assume the value of the archives is very low from the perspective of directly generating revenue, and more useful as a strategic tool, and then take the obvious steps to make it more valuable to you as a strategic tool.

The obvious conflict is, you know, you hear a lot about, like: they stole our data, it's not fair, we don't want the AI companies to use our data unless they're paying us or unless they work out some sort of deal. And then there's people like us who are thinking,
We don't want to accidentally be left out of the training set because that would be a disaster. We want to sculpt the cognitive features of the world. We want to be imbued into the collective intelligence. We think this is a good thing. Obviously, being paid for it would be even better. But, you know, on the margin, I think most of us should welcome that.

Yeah, I think there should be an awareness of one's likelihood to get a bespoke deal with the large AI labs.
Very plausibly, the Wall Street Journal gets a bespoke deal. Very plausibly, Simon & Schuster gets a bespoke deal. Very plausibly, the New York Times gets a bespoke deal. I think it is very unlikely that I get a bespoke deal unless I put a stupid amount of my relationship points into getting that and then get something which is worth a very small amount of money to me rather than things that are worth a stupid amount of relationship points. And therefore, I will not ask for that bespoke deal.
I think that other people who don't have the relationship points to spend are just vanishingly unlikely to get a bespoke deal. And so if the alternative is: get paid nothing, but feel a little bit of moral righteousness at being left out of the training set, or get paid nothing, be in the training set, and feel moral righteousness because you're a good person and have produced something the world wanted, I would pick the second thing a hundred times out of a hundred.

Yeah, be in the training set, obviously. If you're the New York Times, try to get paid for it.

I think the acknowledgement of an ad read sounds cooler in Japanese. Cool, right?
¶ Sponsor: Vanta
Trust isn't just earned, it's demanded. Whether you're a startup founder navigating your first audit, or a seasoned security professional scaling your GRC program, proving your commitment to security has never been more critical or more complex. That's where Vanta comes in. Businesses use Vanta to establish trust by automating compliance needs across over 35 frameworks like SOC 2 and ISO 27001.
to centralize security workflows, complete questionnaires up to five times faster, and proactively manage vendor risk. Vanta can help you start or scale your security program by connecting you with auditors and experts to conduct your audit and set up your security program quickly. Plus, with automation and AI throughout the platform, Vanta gives you time back so you can focus on building your company.

Join over 9,000 global companies like Atlassian, Quora, and Factory who use Vanta to manage risk and prove security in real time. For a limited time, listeners get $1,000 off Vanta at vanta.com/complex. That's V-A-N-T-A dot com slash complex for $1,000 off.
¶ Art and AI: Enhancing creativity
I have a small amateurish interest in art, and we'll pivot this from the craft of writing to the craft of art for a moment. But as we are talking about people who feel some sense of moral revulsion, in various parts of the artistic community, there's this...
Oh, they'll never actually be able to produce art. Okay, they can produce something which kind of looks like terrible art. Okay, they can kind of produce something which looks very much like art, but it's because they stole all of their stuff as training data. Bracket all that for a second.
And I don't want to try to convince anyone out of their aesthetic or moral judgments. I'll just say, as someone who is an amateur artist, who was very unskilled at the particular thing I do, which happens to be painting small miniature models, three years ago, and is now slightly more skilled: these things are pretty wonderful for going up skill curves, due to what the comms teams, for whatever reasons, call multimodality, and what I call: you can just take out your iPhone, take a picture of anything, and then ask questions about it. And so the thing that works disgustingly well is to take a picture of something and then ask motivated questions. Like, for me it's: here's a work in progress, you know, it's a dragon or whatever, and
I know enough about the art to know that my goal is to have more contrast on this model than it has right now. It does not have much contrast right now. And I don't know what to do to fix that. That is not obvious. Can you tell me? Sometimes... I do know what to do to fix it, because I've tried it, and because my skill at execution is limited, it just hasn't worked very well. Like, here's a piece I have. I've done X, Y, and Z. That's the procedural history.
I don't love what I'm seeing right now; here's the reason why I don't love what I'm seeing. There was a troll, and I'm like, okay, it's too dark, and I don't want to push it to cartoony levels, I don't want this to be a Blizzard troll, but I do want it to be green. Tell me what you'd do. And it gives great answers to questions like that about pictures.

And so simply choosing to use the kind of tough-to-discover buttons, like attach a picture, attach a screenshot, et cetera, is much more useful than people would discover by themselves.

Yeah. You can also just point the camera at it for the live feed with a few of these, like Project Astra for Google in particular, and get the same result. And I have, in fact, found it to be useful in practice for just navigating stupid questions in various forms.
Their ability to OCR text out of images is also extraordinarily good, better than their ability to produce text in images was until very recently. The thing that I do frequently is, rather than figuring out, okay, what's the step to export this data from the website I'm looking at into, like, a CSV to be able to upload it successfully, just take a grab of the screen, a grab of the graph I'm looking at, et cetera, et cetera, and paste it in as an image. All right, tell me what you see here, and then operate on it. I think you broadly get better results if you say, tell me what you see here, and then ask questions, versus simply asking it to operate on what it sees. Because it seems to me that the process is creating some internal representation of what they see and then operating on it, and
the internal representation has less fidelity than if you explicitly ask them to verbalize the representation. But I don't know if you've seen the same thing in your usage. I haven't experienced that problem, but it also isn't a thing that comes up for me very often. Cool.
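As a rough illustration of the two-step pattern discussed above (have the model verbalize what it sees first, then ask the motivated question), here is a sketch assuming the OpenAI Python SDK and a vision-capable model; the file path, model name, and questions are placeholders.

```python
# Sketch: ask the model to describe the image, then ask the motivated
# follow-up against its own verbalized description. Assumes the OpenAI
# Python SDK and a vision-capable model; file path and questions are placeholders.
import base64
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # placeholder

with open("work_in_progress_dragon.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Tell me what you see in this photo of a miniature I'm painting."},
        {"type": "image_url",
         "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
    ],
}]

described = client.chat.completions.create(model=MODEL, messages=messages)
messages.append({"role": "assistant", "content": described.choices[0].message.content})

# Now the motivated question, operating on the verbalized representation.
messages.append({
    "role": "user",
    "content": "I want more contrast on this model than it has now. What specific steps would you take?",
})
advice = client.chat.completions.create(model=MODEL, messages=messages)
print(advice.choices[0].message.content)
```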
Right, other things: recursive use of AIs, so using AI to write output that you put through another AI, I feel is a somewhat powerful technique. Have you used this before?

Yeah, like, Opus wrote the system prompt that I've been using.
¶ Recursive use of AIs
Because I had a very specific goal, which was just: stop glazing me. No more sycophancy. This is the only thing I have a serious problem with with this model. I'm going to devote my entire system prompt, with a few minor edits, to solving this problem, and I'm going to ask you, how do I hammer this? And that seems to have helped. It definitely could use more, so I'm going to try punching that up a bit again every so often.

But yeah, often you'll... a classic thing you'll do is you'll generate a deep research report, and then you'll feed it into either the same or a different LLM and tell it to summarize the research report.

I typically find I get much better output from LLMs when I, rather than asking a one-shot question, like, for example, here's a draft of an essay, give me comments, or find all the spelling mistakes, or similar, structure it more like the conversations that they have been trained to do well. And so for the task of reviewing an essay, I often paste it in one paragraph at a time, which is less time efficient. But the thing you can do at the end of any sort of conversation is: okay, recap this conversation for me, and tell me what prompt would have gotten most of the value out of this conversation without going through this winding thing. And you can then
introspect yourself, should any of that prompt be in my system prompt? Or, you know, if you're not willing to do a few seconds of introspection, well, good news, cognition is free now. Just ask the LLM, what of that would you promote to my system prompt if you could?
And then make a decision on its recommendation. Granted, they just love telling you what they think you want to hear, and so they'll probably be biased in some direction toward promoting something.

That's why that's the first thing you have to fix in the system prompt, and then you can fix the rest of it. Any behavior you don't like, you can just tell it: stop doing that. And then, no, I really mean it, stop doing that. No, I really, really mean it. And repetition sometimes works with them in ways you wouldn't expect, and so do things like formatting. It starts to make sense if you consider, if you understand what they're doing, and then you ask: how would this likely continue from here? At this point, would it be likely to double down? Would it be likely to understand that, no, you really mean it, and then actually stop doing it? You know, maybe you want to start with a clean slate and then start with a better way of expressing yourself. It's often true.

But yeah, basically the way that you improve your system prompt, I think, is essentially: every time the AI does something you wouldn't have wanted it to do, you have a mental habit of asking, is that because I prompted it wrong? Is that because the system prompt could improve? Is there something
I could tweak about this model to make it better.

So the thing that's been reported to me by a few people who have actually shown me their interactions with AIs is: this is not addressing me at quite the right level. And I think they have fallen into a pattern, because the interfaces look quite similar, where they talk to the AI like they would talk to a buddy of theirs, and the buddy has, you know, years of context on their personality, et cetera, their own system prompt, if you will, where the AI has just the input that you've given it and any explicit system prompt. And so the words you use kind of matter. Like, if you speak in an erudite kind of fashion where it essentially identifies you as a grad student, even without you explicitly saying that you're a grad student, you're going to get a grad-student-level answer back. And if you talk to it like, well, yo dude, what's this? Then you're going to get something which, like, naturally completes the sentence: oh bro, it's crazy, blah blah blah. I don't even know if that's right, maybe I'm hallucinating, yada yada.

You know, if you personally are just not the kind of person who writes text messages at the level that you want to be answered at... again, recursive use of these is stupidly powerful. Rewrite the following one-to-two-sentence quick ask as if it had been asked formally by a professor on a test. Copy-paste the output into the box in a new window, and then you will get a professorial-level answer.

Yeah. Transforming text into similar text, especially transforming more text into less text, is one of the best things LLMs do in terms of their skill level.
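A minimal sketch of that recursive rewrite-then-ask pattern, assuming the OpenAI Python SDK; the model name and the casually phrased question are placeholders.

```python
# Sketch: rewrite a casual question into formal prose with one call, then ask
# the rewritten question in a fresh, history-free conversation ("new window").
# Assumes the OpenAI Python SDK; model name and example question are placeholders.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # placeholder

def complete(prompt: str) -> str:
    """One single-turn call with no shared history."""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

casual_question = "yo why do banks hold treasuries instead of just cash"

formal_question = complete(
    "Rewrite the following one-to-two-sentence question as if it were being "
    f"asked formally by a professor on an exam:\n\n{casual_question}"
)

# Paste the rewrite into a new conversation to get the professorial-register answer.
print(complete(formal_question))
```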
They are scary good at style transfer. And there are still gaps: imitate the style of an author that I really like, and the more you like the author, and the better a writer you are, the less happy you will be with the output; you'll see the seams more. But they're scary good. Like, this recipe for blueberry pie in the style of a Shakespearean sonnet is a thing that they've been able to pretty much nail for a while now. And will it be the best Shakespearean sonnet ever?
Shakespeare has, you know, a few claims to fame there. Will it be the best Shakespearean sonnet about blueberry pie you've ever read? Yes. And these days it will very plausibly not invent too many of the steps in making a blueberry pie. What else would you tell someone who is like, okay,
¶ Addressing LLM frustrations
I've played with these tools. I agree that they're kind of powerful, but yet I'm frustrated a lot of the time, on the basis of experience.

I mean, I'd ask them, I guess, exactly what the frustration is. A lot of people have very different frustrations with them.

A lot of people express frustration that it's not sufficiently reliable for them. And that's probably the number one thing I've heard this weekend.

So my response to that is: you don't want to put it in a situation in which it will try to hallucinate, or it will have the impetus to do so. So what that means is, why does a modern, current top LLM hallucinate? Well, mostly it hallucinates when the natural continuation of what it's saying would involve knowing a piece of information, reciting a source, or otherwise filling something in in a gap, but it doesn't know what that's supposed to be. And when it doesn't know, when it's failed to find it, then it has this impetus of: oh, I'm supposed to put something here. Nothing I know of fits, so it makes something up that would fit. And now you have a hallucination. And that, in my experience, is by far the most common way for that to happen. So there's an obvious way to avoid this, which is to not put it in that situation, and ask it for things in ways that you know it has an answer to them, or it has a natural out, where it can express that it doesn't know in a sort of free-flowing, natural way.

Another thing you can do is check its work. So you can check its work yourself, but I don't really see why you have to be the one to do it.

I want to do this for a reason: you can do this iterative thing with the output in that way, right? Like, in our recent talk, we were talking about the possibility of, you know, just saying at the top of an o3 output in particular, I checked this and there were no hallucinations, and that being a big deal. Well, you know, who else can check for hallucinations?
¶ Checking for hallucinations in AI outputs
Opus? Yeah. "Here's an output I got from a less capable LLM; critique it" works pretty well. And it even works across modalities. Like, a thing I did recently, just as a lark, and I will either put an image up on screen or drop a link in the show notes: DALL-E felt kind of magical three years ago when it came out, and the outputs were filled full of artifacts, but, you know, I saw the best painting of a bunch of rabbits attending a seminar on human anatomy that I'd ever seen in my life, you know, seconds after giving that prompt over to the model. Recently I said, okay, here's the prompt, here's the image it gave back, critique it artistically, and got several thousand words back about what it would definitely not do these days. And I'm like, great, on the basis of that critique, generate something for me that is responsive to that prompt. And I don't necessarily know that asking for a critique and then asking it to now try again and do it better works very well, but it's very cheap to try. That's something that I would really love to emphasize for folks.
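One way to automate the cross-model check described above is to generate a draft with one model and have a second model critique it for hallucinations. This sketch assumes the OpenAI and Anthropic Python SDKs; the model names and the example question are placeholders, not outputs from the episode.

```python
# Sketch: generate with one model, critique for hallucinations with another.
# Assumes the OpenAI and Anthropic Python SDKs; model names and the example
# question are placeholders.
import anthropic
from openai import OpenAI

openai_client = OpenAI()
claude_client = anthropic.Anthropic()

question = "Summarize what Federal Reserve research in the early 2000s said about unbanked households."

draft = openai_client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder for the "less capable LLM"
    messages=[{"role": "user", "content": question}],
).choices[0].message.content

critique = claude_client.messages.create(
    model="claude-opus-4-20250514",  # placeholder; any capable model works
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": (
            "Here's an output I got from a less capable LLM. Critique it, and flag "
            "any claims, citations, or numbers that look hallucinated or unverifiable.\n\n"
            f"Question: {question}\n\nAnswer: {draft}"
        ),
    }],
).content[0].text

print(critique)
```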
¶ Experimenting with AI models
The cost, in the typical case where you are typing into one of these, the marginal cost is just your time, basically, and maybe possibly some usage quota. But yeah, people talk about how these things are so expensive sometimes. But compared to any other form of cognition or art generation or anything similar, they are absurdly, many orders of magnitude, cheap. If you are reading the outputs, the price is effectively zero on the margin, and you should treat it that way, and you should run a lot of experiments. And, like, almost nobody is properly experimenting to see what they can generate, considering the cost-benefit. I definitely am not doing enough.

Yeah. The obvious thing to do is, if it doesn't work in, you know, the frontier model of your choice: A, immediately run it on the competitor's thing and see if you like that output more. Like, you can literally just copy-paste the same prompt. But, you know, there are various things to try, like, generally speaking, okay, it works great 30% of the time that I use it? Then ask the question three times. You know, possibly it's falling into one of those failure modes just because of the nature of the question. Possibly you just rolled the dice wrong, you got the bad result on the dice. But the law of large numbers works. Just, you know, ask it three times in slightly different ways and you'll get slightly different outputs, and maybe one is, like, transformationally different on the margin for you. And if not, okay, it cost you like 60 seconds to rule that out, versus, like, the alternative might be a phone call to your lawyer that you're billed for at, you know, a hundred dollars every six-minute interval. You also have, like,
¶ Optimizing AI prompts and outputs
five different windows open. You can literally just paste the prompt into all the different windows and come back. And for research in particular, I'm trying to just build the mental habit of like, no, if you ask a research style question, you ask all of them. You always ask all of them, why wouldn't you? There is no downside to doing this. Maybe you don't even check the second and third ones if the first one's good enough, but why not generate them so they're there if you need them?
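As a sketch of the "ask all of them" habit, here is one way to fan the same prompt out to several models (or several samples of one model) and compare the answers. It assumes the OpenAI Python SDK; the model list and the prompt are placeholders for whichever windows you would otherwise have open.

```python
# Sketch: fire the same prompt at several models and collect every answer.
# Assumes the OpenAI Python SDK; the model list and prompt are placeholders.
from openai import OpenAI

client = OpenAI()
MODELS = ["gpt-4o", "gpt-4o-mini"]  # placeholders; add whichever others you have access to

def ask(model: str, prompt: str) -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

prompt = "What are the main failure modes of paywalling a news archive?"

# Several rolls of the dice; skim them all, keep whichever answer is best.
answers = {model: ask(model, prompt) for model in MODELS}
for model, answer in answers.items():
    print(f"--- {model} ---\n{answer}\n")
```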
When you say ask all of them, is that ask all the models that have deep research?

Yeah, the big three.

So when you're dissatisfied with an output, and you can articulate a reason, or even gesture at a reason, why you're dissatisfied with the output, immediately tell it that you're dissatisfied with the output, and they will often index quite highly on that in an attempt to give you what you want. Now, if you tell it that you're dissatisfied with physical reality and you want it to pretend that the sky is pink, a lot of them will go quite in a pinkish direction for the sky for you. But there's a classic joke on Twitter about trying to take all the shrimp out of the novel All Quiet on the Western Front, which contains no shrimp, and the LLM being driven to madness in this one instance. And it both works as humor, and also you feel really badly for the character that is the LLM in the moment. But if you're not doing this in that sort of crazy and unproductive fashion, if you just say, okay, I think you're not considering it enough from this angle, can you think much more about this angle? Then you will often get a new output which will consider that angle a lot more. And sometimes it overweights on that, and then you could say, I think you overweighted on that, correct back. And, you know, these are
¶ Using AI for writing and editing
very replicable strategies for getting value out of these wonderful magical new machines.

Yeah, that sounds right. So, you know, "there's a fact out there in the world that I want, and I want you to find it for me," where the fact does not actually exist, is a great method to get a result that will dissatisfy you. Are there other pits of failure at the moment, where we can look back in two years and say, oh yeah, remember that thing that they used to do all the time that they no longer do?

Leading questions, where it's obvious what answer you want, so it's going to be inclined to say, great idea, boss. It's going to be inclined to agree with you, as it is for most of the major models, not necessarily all of them. You definitely don't want to encourage it. So one thing I've learned is, you know, if I want it to critique a piece of my writing, I don't tell it that it's critiquing me; I just tell it it's critiquing a piece of writing, because that's how you get a realistic response. But having to guard against that is probably going to become, I hope, less of an issue.

Yeah, I think it is probably increasingly likely that they're doing steganography, or the opposite of steganography, what do you call that? Stylometric analysis, or something much weirder than stylometric analysis but which has the same payload. And so...
¶ AI as a research and writing partner
Not telling it that I wrote something is probably not effective at disguising the fact that I wrote something. But only if it has memory. If it knows who you are in detail, it's going to be able to figure it out in some sense. But also it might not be salient to it.
It just might not notice that it knows.

Right. And one would assume that, you know, the things you told it most recently are most salient to it, and mentioning that this is the boss's work raises its saliency, versus, oh yeah, this is kind of the boss's work, but only in conversations where people said this is the boss's work.

Right. You can also look at it as: if I tell you it's the boss's work, I'm emphasizing you should act as if it is the boss's work. If I don't tell you, in fact, I explicitly seem to be avoiding telling you, I'm giving you the subtle clue that I don't want you to do that. And you might realize what's going on, and maybe you'll do it anyway, but it's a much smaller force.
¶ Prompting AI and humans effectively
It is a terrifying and true observation about the world that LLMs are plainly superhuman versus everyone in an increasing number of domains, and then they're just better than a lot of people at a lot of things. And picking up on subtle signals is something that many of us are pretty bad at; I would put myself in that set, or at least would have for a lot of my life, before putting in a lot of practice. All of them are really good at reading some subtle social cues in text. So encourage them to read the right social cues.

By the way, you can use them as sort of a prosthesis. Like, an absurdly useful thing, if you are worried about "have I implied something in this piece of text that I do not mean to imply," is to copy-paste the piece of text and say, have I implied anything here?
You know, what is the subtle cue you would get out of it? There were various, let's say, professionally significant essays that I wrote recently where I wanted to be extremely careful about what exactly I was saying. And I would step through on a paragraph-for-paragraph basis and say, okay, what exactly did I say here? And then ask questions like, adopt the perspective of a sophisticated journalist at the New York Times. Would that journalist believe that I am implying X?
And, you know, I frequently get the "no" answer that I hope for. And I would throw in some other ones to say, are you just telling me what I want to hear? Like, am I really implying this with my tone here? And, you know, it seems to have a decent calibration on that.

Right. You always want to have tests where you make sure the behavior actually happens when it's supposed to happen.

Yeah. Reading the tone from things you have written: I don't always find that they nail it, obviously, but using them as a sparring partner, or just riffing with them, is more powerful than I would have expected. I once gave one a few paragraphs of an essay that was going to be significant and said, what tone is this author going for with regards to the subject of these paragraphs? And I only remember two words of what came back now: withering contempt. And I'm like, oh wow, withering contempt. Hmm. Do I really feel withering contempt with regards to this institution? I do feel it. I don't want that in the output; let me tone it down a little bit. But it hadn't been obvious to me that I was showing that much of it in the prose sample I'd given it.

Yeah, that's one of the ways in which I most often edit my writing: I write a draft, and I go, okay, that came off really hostile. I am making a lot of very specific accusations against this person. They're true.
They're very obviously accurate, but that's not the implication I would like to give. I would like to extend an olive branch. I would like to, you know, pretend, to feign ignorance that this situation is what it obviously is, while making it very clear to anyone who's paying attention that that is in fact the correct answer, in my opinion, but leaving room for a line of retreat, right? Like, giving these people the opportunity to do otherwise, or to respond positively. And therefore you go back and you go, okay, yeah, maybe I can't actually say it that way, like, you know, including, well, there's two ways to interpret this.

In one case, I was writing an article about another author, a thing authors spend a lot of their time doing, and asked the LLM, you know, I'm attempting to hit a narrow bullseye here with respect to this person's work; please tell me if I'm hitting this bullseye, where I want to be critical but not hypercritical, and I definitely don't want to cross over the line into personal attack. And the LLM made a decent point, which is: okay, you've mentioned X and Y and Z, which are all true, but at that point it seems like piling on. I was like, oh, that's interesting. If I were someone who felt a real sense of editorial restraint at the New York Times, how would I have the same factual payload without piling on? And it said the New York Times would show editorial restraint by doing the following edits, or the following way to communicate the same stuff, which was much less specific, like less numerically oriented, et cetera, et cetera. And when I read that, I thought, okay, it's a good thing I don't actually work for the New York Times. If my writing had that style all the time, I would hate it.

And it doesn't even matter if the LLM is correctly predicting the New York Times' editorial line. You're just writing partners at the moment.

I thought, okay, true, one of these things feels like a bit much, I could just take it out, the piece reads well without it, and took it out.

And I often find that when you're in that rewriting process of things, and in a dialogue with the LLM, if it has seen the first version and then sees the version that came later as a result of you doing internal rewrites (internal meaning, you know, you thought in your own head about the thing and then decided you liked some words more), asking it,
So I've made some edits to this paragraph, obviously. What do you think of them? And often it will tell you what it thinks you want to hear. Yeah, it's stronger because X, Y, and Z. But you can say like... I'm balancing a few goals with these edits. What do you think my goals are and how do you think it's doing on them?
It is much better than you would expect at telling the discursive difference between two versions of something like a paragraph in length. And then it will often tell you, yeah, you're achieving all your goals. But if it completely mispredicts what your goals are with the edits, that's sometimes something of a signal.
And again, if it's a signal that is obviously wrong, you've only wasted 30 seconds of your life during the editing process. It's not very different from how you would deal with a human in this situation, right? You don't want to tip your hand as to what you want to hear.
And you want to make it so that if the person is in fact trying to tell you what you want to hear, or the AI is doing this, then they will get the answer wrong. And you'll be able to know that's what happened. But yeah, I would anticipate that it'd be very good at these things.
detecting very discrete little changes and anticipating what they mean, that kind of thing, should be very easy for it.

For part of my professional life, for many years, I was writing in a more corporate voice, taking drafts past people, asking them for comments, et cetera, et cetera, and
¶ Balancing AI assistance with personal voice
I won't say it is the best writing partner that I've ever had, because that is incredibly untrue, but it is certainly, relative to skill level, the cheapest that I've ever had, because you can bug these things at any hour, day or night, for free, where you certainly wouldn't want to wake up the head of your department at 3 a.m., or whenever you do your best writing, I don't know, to go through something that you were puzzling on in a relatively typical email.

Another thing I found kind of useful: give it part of an essay and then ask what comes next. They are scary good prediction engines, and you're just using this to get your own ideas flowing. But when it successfully tracks your argument, it will often successfully track the next part of the argument, and, you know, if it says "I think the next thing to come is a methodological deconstruction of the facts of the journalistic investigation you are critiquing," then, even if that isn't exactly the next paragraph, I'm doing something right. When it goes off into deep left field, then you can have a moment of introspection: okay, is it going off into deep left field because my synthetic narrator here is not always reliable, or is it going off into deep left field because, no, really, my organization of this essay could be better, like there was an obvious thing to talk about next?

Yeah, I mostly don't do this, because I write in a very unique style where I'm often jumping from point to point in ways that aren't necessarily that related, and there is no possible way that it can know what is coming next. I noticed there's kind of a
danger in my head, when I hear these strategies, of: is this going to cause me to lose my nerve or my voice? Is this going to cause things to become more generic? And I notice that I often don't ask these questions, and I certainly could run everything I post through this. I wrote a post right before coming here, Letting Kids Be Kids, and certainly there were a lot of interesting statements in that article. I certainly could have asked the LLM a lot of these types of questions. And something in me is like, don't do that. Just post it. Keep your own voice. Keep your opinion pure. Don't kind of smooth off the edges here.

Yeah, that would be a good idea. I definitely think they do try to round things off, smooth off the edges, make it more like the least-common-denominator writing, et cetera, et cetera; that's been all but a hallmark of the writing style. And to the extent that people are coming to you for your own voice, for your own opinions, for your own writing style, don't lose that. I would also generally say, I do not run anything like everything that I write, or every essay, or every word in every essay, past the LLM. There's no reason to. It seems that there are many users who are not writing anything except through a dialogue with an LLM.
I feel no sense of moral judgment on them. If that works for you, great. But you're probably not a professional writer. If you are a professional writer, there are things you write where the first thing you come up with is going to be fine, where you wouldn't get an editor involved if you had your druthers. And there are things where...
you would tell someone, I only need line edits on this; I'm pretty sure of the facts, I'm sure of the general argument. And then there are other things where it's like, no, really, you know, writing is a form of thinking, and you do thinking over time, and I'm writing this to get a reaction from you. And for those things, get the first reaction from an LLM before you get the first reaction from a human, almost every time, given that there's, you know, a cost, either a dollar cost or a relationship cost, to getting a human in the loop, to going to a particular human.

Yeah. In software engineering, we often call it rubber ducking, where, prior to disturbing a senior engineer, we have a little rubber duck in the room
and ask your question to the rubber duck, because often asking the question to the rubber duck will cause you to immediately realize the answer. Like, asking the question to the rubber duck that is the LLM will often cause you to realize the answer to it also.
this rubber duck can quack back. And sometimes the quacking is absolutely noise, but sometimes it's the answer you wanted. And so you can, you know, optimize out the interaction with the senior engineer or the executive or your lawyer or similar, the person who is valuable. Now, granted, there are reasons to chat with people about your work that are not, strictly speaking, attempting to improve your work. Still have those chats, but, you know, it's better for yourself and better for the other person if you don't ask them for the answer that they will obviously give you in the first 60 seconds after hearing your question. Aside from rapport building and social stuff, et cetera, et cetera,
I don't think either of your lives is really improved by the first 60 seconds. So just skip it and move on.

Or you ask questions to which you already know the answer.

Yeah. Similarly, particularly when you have interactions that have a really high cost, social or monetary, prior to firing those off, I think it's worth some tiny portion of that cost asking the LLM, what should I ask here? So I think people who have paid for lawyers over the years understand that if you ask broad, open-ended questions of your lawyers, you will get a very long discourse which consumes a lot of billable minutes. You probably don't want to prompt a lawyer with, like, here's an essay I've written,
do an issue spotter, because lawyers are trained to be very good issue spotters. Sorry, this is a bit of jargon. A thing you do when learning the practice of law is to get told a narrative of a case or a commercial history or something, and be told: all right, use your best recollection of all the relevant case law and statutes and similar, and point out all the things in this procedural history which you could possibly comment on. Lawyers are trained how to do this; they are very good at it. And so if you give lawyers a prompt like, just, here's unstructured output, do an issue spotter, they will give you exactly what you asked for, which is often not what you need. And so, you know, pre-processing that with, okay, here's the document I want a lawyer to comment on, what do you think are the high-salience questions, and then telling your, you know, expert human buddy: here are three questions that I'd particularly like your input on, bang, bang, bang; also, if there's anything I'm missing, let me know. Then they are often much better oriented, and it's a better experience for them too, because, you know, sometimes skeptical businesspeople think, like, oh, lawyers are only giving me this non-actionable, very verbose advice because they want to run up the billable hours. No, they don't have telepathy, and they don't know what you want. And so if you ask them, okay, anticipate all of my possible needs, they'll do their best job to attempt to do that, and that will burn a lot of... a little from column A, a little from column B.

Yeah. But, you know, prompting humans is just as important, actually vastly more important right now,
as prompting AIs. And, you know, walking into this conference, you definitely get the impression that one of the most important things to do is to have good prompts for: what do I want to say to people? What do I want to cause them to talk about? What am I trying to accomplish here? And you can almost see, when you don't have a good prompt, what happens. You're like, oh yeah, I see how that happened. That's entirely predictable. I should not have done that. Or you're not steering things in the directions you wanted, or so on. But
If you think about your conversations that way, right? Like, what prompts are you being given? What prompts are you giving back? You know, what does this prompt indicate they want back from me? Not just like, what is the natural continuation of it? Then...
I think I've improved my interactions with people by thinking more about things this way as well.

I think I would broadly agree with that, in a way which would have been surprising to me a couple of years ago. A few years ago, they felt like really subpar communicators, and how can you get better at working with good communicators, when you yourself are a good communicator, by getting more reps in with a subpar communicator? But, you know, how can you get better as a middling-skilled painter by painting a lot of low-skill-required things? Well, if you haven't totally mastered the mechanics, more reps that you can get for free very quickly is a very useful thing to have. And that generalizes across all sorts of domains. There's... I don't even want to pretend that I can quote exercise lore here, because I'd be obviously hallucinating it, but, you know, there's some amount of time where you are just attempting to push the boundaries of your own capabilities, and sometimes where it's just like, no, you probably haven't put in enough time at mastering a basic skill, just put in more time.

I don't think people, particularly people who haven't used these extensively, appreciate how much of an unlock it is simply that they've removed the rate limit on cognition, and the rate limit on conversations that you could have, because there's an always-on tap of them that costs basically nothing.

It does cost your attention.

Yeah. And you have to train yourself to think of it as free.
And then also train yourself to realize it's not free. So like when deep research came out, a lot of people were like, this is amazing. I have all these new 10 page reports, these 30 page reports, these 40 page reports. And I found it useful in some particular circumstances.
But I found it mostly useless. And the reason for that was I already had all the input data that I needed. I had more than I could process coming in from my systems that I'd set up. What I didn't have was time to process it all. So generating a giant report that was kind of slop, but had some good stuff in it, is like, okay, this isn't very good signal-to-noise ratio. This isn't something I want. I only want to do the deep research thing when I have a very specific, like...
I need you to figure out this particular thing. I know then how to scan this thing very quickly, or I can potentially, although I don't generally do it, feed it into another LLM to extract the information I actually want from the report.
I think this involves unlearning some habits that a lot of us have had coming up on the internet. Writers tend to be very broadly read, and many of us are a bit pack rats with regards to things that we've read in the past. And so, you know, maybe I'll drop a link in the show notes if people want to read it, but there are some really great things out of the Federal Reserve, it's either St. Louis or Kansas City, back in the early 2000s, about the phenomenon of being un- or underbanked. And so many of us have, like, you know, carefully gardened histories, or areas in Dropbox, et cetera, where we kept information forever. Definitely don't do that with respect to LLM outputs.

You're kind of doing that, right? Like, you're in ChatGPT,
You're asking questions because it has the memory thing. And then you have this series of chats that were like kind of throwaway things that don't really matter. And it's going to remember them forever. And it's going to use that as potential context and like sculpt.
your future interactions forever, the same way that YouTube is remembering all the videos you watch. And in both of these cases, you might want to go into the history and clean it up a bit if there are things that you, on reflection, think are sculpting things in a bad way.

Yep. But you probably don't need to go into the pack rat mentality of: there's only been one useful thing that I've ever read about the phenomenon of being unbanked, and so I want to make sure I have a copy of that PDF for the rest of my life. Like, just regenerate at need, versus optimizing for that.

Yeah, I think that I could do better at taking notes about these things and putting them in good places, and I do want to have these things available. But mostly I just feel like I can find anything again if I need to find it again, and so I haven't worried about pack-ratting. But there have definitely been times where I'm like, where was that thing? And I can't find it, and I'm sad. Okay. Yeah.
Sounds great. So, Zvi, thanks very much for taking time to chat about some of these strategies today; I hope folks find them valuable. And where can people find you on the internet?

You can find me at t-h-e-z-v-i.substack.com.
¶ Wrap
Awesome. Thanks very much for being on today. And for the rest of you, see you next week on Complex Systems.

Thanks for tuning in to this week's episode of Complex Systems. If you have comments, drop me an email or hit me up at patio11 on Twitter. Ratings and reviews are the lifeblood of new podcasts for SEO reasons, and also because they let me know what you like. Complex Systems is produced by Turpentine, the podcast network behind Econ 102, The Riff with Byrne Hobart, Turpentine VC, and more shows for experts, by experts, in tech.