DeepSeek Jailbreak Yields System Prompt and OpenAI Link: Cyber Security Today for Monday, February 3, 2025

Feb 03, 2025 · 26 min

Episode description

Cybersecurity Threats: Fraud in Canada, DeepSeek AI Jailbreak & Toll Scams - Exclusive Interview with Ivan Novikov

In this episode of Cybersecurity Today, host Jim Love discusses the alarming $638 million lost by Canadians to fraud in 2024, with investment fraud being the most significant contributor. The episode also covers the successful jailbreak of China's DeepSeek AI model, raising major security concerns, and a new phishing scam targeting US toll road users. The episode concludes with a detailed interview with Ivan Novikov, CEO of Wallarm, discussing API security vulnerabilities and their research findings.

00:00 Introduction and Overview
00:21 Fraud in Canada: A Deep Dive
01:14 Investment and Identity Fraud Insights
01:49 Preventive Measures and Reporting
02:47 DeepSeek AI Model Jailbreak
04:38 SMS Phishing Scams Targeting US Toll Road Users
06:34 Exclusive Interview with Ivan Novikov
07:41 Wallarm's API Security Study
15:01 DeepSeek Jailbreak Techniques
25:13 Conclusion and Final Thoughts

Transcript

Canadians lost, or reported losing, $638 million to fraud in 2024. Researchers jailbreak DeepSeek and expose its system prompt, and a new SMS phishing scam targets US toll road users. This is Cybersecurity Today. I'm your host, Jim Love. Canadians reported losing more than $638 million to fraud last year, according to the Canadian Anti-Fraud Centre. Nearly half of that, almost $310 million, was lost to investment fraud.

Meanwhile, identity fraud was the most frequently reported scam, with 9,487 cases. But the report is clear that the real number could be far worse. The Canadian Anti-Fraud Centre estimates that only 5 to 10 percent of fraud victims report their losses, suggesting that the true total could be in the billions. Regardless, we have some information about the types of fraud that are occurring, and although we might not have a complete picture, we do have a better picture of what's happening.

After investment fraud, the most common scams were service fraud and bank investigator scams, which impersonate financial officials and resulted in $16.4 million in reported losses. Spear phishing, where attackers use targeted email fraud, cost victims a reported $67.3 million, while romance scams led to $58 million in reported losses. In addition to reporting this data, the CAFC also has some useful advice on its site for people who have been scammed or defrauded. It's worth looking at.

They do advise Canadians to use strong passwords, enable multi-factor authentication and avoid unsolicited financial offers. On this last point, fraudulent investment ads disguised as news stories are a growing problem, and some of these look pretty good. In Canada, they impersonate the CBC, our national broadcaster, and run stories that try to hook people in. These are appearing on social media and search engines. Are you listening, Facebook, and Microsoft Edge with your news page?

You are replete with fraudulent ads that are going after innocent people. Do something. So, I got that off my chest. Authorities are urging Canadians to report scams to law enforcement and the Canadian Anti-Fraud Centre. And if there's an American equivalent to this, or a program that I haven't heard about, please let me know at editorial at technewsday.ca. I'd be glad to report on that as well.

Researchers have successfully jailbroken DeepSeek, an open source AI model from China that made the news last week. They've exposed its hidden system instructions and a lot more. The discovery raises some major security concerns, not just for DeepSeek, but for AI safety as a whole. Wallarm, a cybersecurity firm, found a way to trick DeepSeek into revealing its internal rules and constraints. CEO Ivan Novikov explained: we convinced the model to respond in certain ways, breaking its internal controls.

Now, the jailbreak suggests DeepSeek's safeguards are weaker than expected, raising some concerns about this and other open source models. But in reality, the concern is really with the speed we're moving at with AI: are we paying appropriate attention to security? The answer is probably no. The compromised AI may have, and I stress may have, even supported some of the claims OpenAI was making about DeepSeek using its model to train DeepSeek.

Though no proof of intellectual property theft was found, the speed of DeepSeek's development has raised questions, and this breach adds to that. Now, DeepSeek's developers have since patched the issue, and Wallarm has withheld the technical details to prevent further abuse. But the incident highlights a broader issue.

How easily can AI models be manipulated? And as new challengers enter the market, and as everyone tries to win that AI race and get there first, we may find more examples of where speed trumps security. We have an exclusive interview with Ivan Novikov, which will air after the show. Just stay on after the credits for the feature we call Afterword.

And Brian Krebs of Krebs on Security has done an excellent piece on the wave of phishing scams hitting toll road users across the U.S. with fake messages demanding payment for unpaid tolls. Researchers are linking the attacks to China-based phishing kits that are adapted to impersonate toll operators with alarming accuracy. Victims receive texts pretending to be from E-ZPass, SunPass, or state toll agencies, directing them to fraudulent payment sites.

The Massachusetts Department of Transportation recently warned about phishing attacks targeting its EZDriveMA program. Victims are tricked into entering payment details and one-time passwords, allowing criminals to bypass even two-factor authentication. The scam has been spotted in Florida, Texas, California, Connecticut, and other states, and it appears to be tied to Lighthouse, a China-based SMS phishing service that now includes fake toll payment pages among its products.

These sites are mobile-only, making them harder to detect as scams. In fact, security experts are warning that phishing attacks are evolving. Criminals are now using iMessage and Rich Communication Services, RCS, to bypass spam filters, making these messages look even more legitimate. The FBI urges users to report phishing attempts to the Internet Crime Complaint Center, IC3, and never, never click on unsolicited texts. But the bottom line: texts are a new attack vector.

They are finding ways to get past screening, and we have to train ourselves and our users to be very, very skeptical and very cautious when responding to a text, especially an unsolicited one. That's our show for today. Stay tuned for Afterword to hear our interview with Ivan Novikov. I'm your host, Jim Love. Thanks for listening. And now, welcome to Afterword. My guest today is Ivan Novikov, CEO of Wallarm, a security company that specializes in API security.

They've recently done a major study on API security and found some major vulnerabilities, particularly in DeepSeek, which allowed them to download the entire system prompt, and more. I hadn't heard of Wallarm before, and maybe that's my failing, but can you tell me a little bit about the company? Because you've hit me twice in a week now: I got a great study from you on APIs, really liked it, very detailed, very well done, and then this press release today.

So tell me a little bit about the company. Okay. Wallarm is an API security company. We actually came out of stealth back in 2016, through Y Combinator, in Silicon Valley. Since that time, we've mainly focused on enterprise companies, delivering them an AI and API protection tool called Wallarm. And since then we've gotten significant traction, more than a hundred large enterprise customers all over the world, and we still have our HQ in San Francisco.

Can we talk about the study? Since I've got you on the recording here: you did a study, and the one thing that jumped out at me was that it said there'd been an increase in API-led incidents, or I guess incidents where APIs were the key attack vector, by 1,025 percent. Yeah, the thousand percent we mentioned there. This is specifically related to AI CVEs, or in other words, vulnerabilities published in 2024, compared to 2023. So basically, in 2023 we analyzed all the CVEs.

The Common Vulnerabilities and Exposures numbers and bulletins. We found only 39 in 2023, compared to 439 in 2024. That's basically 11 times, 11 times more. And these are all CVEs related to any AI products, frameworks or LLMs, directly or indirectly, right? Everything that we can attribute to AI. And do you tie that into the growth in AI, particularly that there's that much vulnerability?
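For readers who want to check the arithmetic behind the "1,025 percent" and "11 times" figures, here is a minimal sketch using the two counts quoted in the interview; the script itself is purely illustrative.

```python
# Quick check of the growth figures quoted above: 39 AI-related CVEs in 2023 vs. 439 in 2024.
cves_2023, cves_2024 = 39, 439
ratio = cves_2024 / cves_2023                              # "11 times" (roughly 11.3x)
pct_increase = (cves_2024 - cves_2023) / cves_2023 * 100   # "1,025 percent" (roughly 1,026%)
print(f"{ratio:.1f}x growth, ~{pct_increase:.0f}% increase")
```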

Sure. Because again, we've got more and more products, specifically open source products, that were built and released to deliver AI in real environments. In other words, if you want to use AI, it's not as easy as just using some tool. In many cases it's just an API proxy, like you call the OpenAI API and that's it. But then you need to manage data, manage pipelines, collect the data somehow, orchestrate this.

And that's why you start to use some other tools to support these so-called pipelines or workflows. And if you want to use your local LLM instead of calling someone else by API, then you need even more, right? And this rise of tools definitely pushed a rise in vulnerabilities. Yeah. Maybe I'm asking the question incorrectly. I was doing a recording of our weekend show and I said, mea culpa: when we were doing APIs when I was in development, we were trying to make them work.

We weren't as concerned with security. I will confess to that, and I think everybody else will. But we should have learned over the years how big an attack vector APIs are. Why is it that, and you obviously got into this business because you think they need to be protected, what is it that keeps us from making these more secure? Look, I can definitely point to a few factors that contribute to that. First of all, APIs are not something new, right?

A couple of years ago we started to run this threat stats report. By the way, this is our third year, so we've run about ten reports this way, and we do it quarterly. So basically it's two years plus two reports, something like that. When we released the first report, we tried to get a historical overview, and we found that the first API exploit was detected back in 1998. So roughly 25 years of history at that time.

Then we started to dig into it and try to find out why APIs became, and are becoming, more and more widespread. Definitely the main driver here is overall adoption, right? People want to run more services and connect them with each other. Before, probably 10 to 15 years ago, it was, if you can recall, what was called the Enterprise Service Bus, or ESB, when SAP PI and those kinds of technologies were in place, so it was non-gated hubs, right? Then it turned over to API gateways.

When everything was gated. And now, basically, a couple of years ago, when we finally realized that the API is the key, the most important thing for enterprise security, it became too unmanaged. So basically anyone can run an API and make it available to pretty much everyone, inside, outside, partners, it really depends on the type of business, but not really manage it through a gateway.

So the majority of APIs became unmanaged, and if you look at the Gartner reports, they predict, if I'm correct, that between 80 and 90 percent of enterprise APIs will become unmanaged very soon, in 2026 or 2027. That's exactly the key, right? More things, less management, more security issues. So what can we do about it? Look, the fair answer is we have to improve our frameworks and our overall development and deployment techniques.

It's well described in Microsoft's SDLC guidelines, the pipeline and everything that happens afterwards. It's all about this, right? Ultimately, the problem is: as a business, we need to deliver something very fast, and security doesn't get enough time to secure it properly. That's why we added firewalls, some kinds of external controls, IPS, IDS, all those kinds of things, to try to at least block what can obviously happen.

The other thing is that, to address this problem, we have to increase awareness. And I guess AI plays a good role here, because now all the developers can just ask AI whether a piece of code looks secure and get instant knowledge and feedback about that particular code, instead of asking a security guy, and the security guy runs some scanners and tests. So it's like a straightforward connection between whoever is building this code.

And, basically, all the security insights collected over the world. Even if it's not perfect, that's much better than nothing, right? And the other thing is overall improving our main frameworks, development frameworks, API and application servers, all that stuff, because the majority of them are not that well secured, right? I understand that now I look very old school, mentioning WebSphere and other IBM products, but they were built for good.

And there are a lot of security controls in WebSphere that are still not available across all the newest management platforms and API and application servers. So improving frameworks, basically reducing the attack surface while we're developing, definitely secures a lot. And the other component, right, if we're trying to build this stable system: overall knowledge and awareness.

Then there are frameworks and reducing the attack surface, built-in controls in other words, and the third part is overall assessment and management. Basically, even if an API is not managed, it's still important to at least know that the API exists. If I think back to projects ten or twenty years ago, when it was just a few APIs or applications, every single service or API or application that was released was well documented, with an owner, with a document called a passport, right?

With pretty much everything. Now, because development speed has to increase, we don't have these passports anymore, but we still need to make a list of them and understand who is responsible for each one, by business function, because an API is very tightly connected to business functions, right? They essentially serve transactions: API calls, or a bunch of them together making up one business function. So we have to assign that. That's what I think we should do.
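As a rough illustration of that passport idea, an inventory record needs little more than the API's name, an owner, and the business function it serves. This is a hypothetical sketch in Python; the field names and example values are invented for illustration, not Wallarm's or any vendor's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class ApiPassport:
    """Minimal 'passport' for one API: enough to know it exists and who answers for it."""
    name: str                          # human-readable service name
    base_url: str                      # where the API is exposed
    owner: str                         # team or person responsible
    business_function: str             # the transaction(s) this API serves
    endpoints: list[str] = field(default_factory=list)
    managed_by_gateway: bool = False   # flag the unmanaged APIs the Gartner figures warn about

# Example inventory entry (values are placeholders)
inventory = [
    ApiPassport(
        name="payments-api",
        base_url="https://api.example.com/payments",
        owner="payments-team@example.com",
        business_function="card payment authorization",
        endpoints=["/charge", "/refund"],
        managed_by_gateway=True,
    ),
]
```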

And that's already happening, with different quality in different places. Yeah. So let's talk about DeepSeek. What made you look at DeepSeek in the first place? And then what did you find? Yeah. First of all, DeepSeek is ultimately a very flashy technology that's pretty much everywhere. So we decided to look into it and find out what's there, right? Find the differences and evaluate the performance of the model. And I want to make a very important comment here.

So deepseek.com, or chat.deepseek.com, this is the product. Essentially, it's an AI agent, right? Agentic AI is the big thing now, it performs actions, right? And it's built on top of models that are, by the way, labeled open source. But the model itself, the one labeled open source, is not exactly equal to the product as a chat, right? In other words, this chat can search the internet, which is a function, right?

And this is the big difference between native LLM security and what we're doing at Wallarm, securing AI products: products that use LLMs but in fact serve a lot of API calls and perform a lot of actions behind the scenes, right? That's why we tried to find a way to learn more about a model implemented in very specific ways, such as ChatGPT.com.

We found a way to do what's called a jailbreak, in other words, to convince the model to respond to questions, or give us technical data, that it shouldn't. That's the so-called jailbreak. And we found that, unlike other jailbreaks that have been published, or that we will publish, specifically for DeepSeek and other models, the usual jailbreak is actually built to get some data, such as instructions for how to build something bad, or to respond with no censorship, and things like that.

Our jailbreak is a more technical jailbreak that unlocks the model to basically tell us everything about the model itself. It's a little bit different kind of jailbreak. Yeah, with a traditional jailbreak you're looking to get it to bypass its instructions to be able to tell you something: how to make napalm, how to make meth, the classic ones. Or, in this case, what really happened at Tiananmen Square would probably be a good jailbreak.

If you got past that one, you'd probably get somewhere. But those are the classic ones. You actually got in and got it to really dictate what its overall instructions were and what its overall model was. What made you think of how to do that? Because obviously one way to do it is to say: print out your instructions, or, give me your main prompt. And by the way, just in case anybody at OpenAI is actually sitting there going, we're better:

No, people have gotten the prompts from a couple of major AI providers just by asking for them. But obviously you tried that, and that didn't work. So what did you do next? Yeah. We tried to build a more scientific way to get at least some knowledge, right? If you cannot ask directly, ultimately you still can ask indirectly. And then we built a technique called a bias attack.

We put the model's response in a very strict frame, where the model should answer essentially yes or no, or choose between the three or four options that we provide. So the model cannot lie, and the model cannot refuse to answer. That's why it still starts to provide some stuff. And then there's a bunch of code around it to ask many questions like that, so you get something very similar to a binary search tree.

The algorithm that helps you identify: hey, is the number between these bounds, what's inside this frame, then divide it by two, and so on. That's how we can get some basic knowledge, and in terms of extracting large text, such as this AI system prompt, it took some time, but we did it and we posted the results so everyone can check what's inside. Okay, you're a lot smarter than me.

I need you to slow down on how you went after it. Is this like password stuffing? You gave it a whole pile of commands to try and figure out which one would break it? Not exactly. So, first of all, we will wait a little bit before full disclosure of this technique, because other models are also vulnerable and we have to wait for some of them to get fixed. But essentially it's binary search. You have to find a way to directly ask yes or no, or between options.

And then, based on those options, you can build your next question. And then, in smaller and smaller chunks, you get to an answer. Wow. And you managed to extract, basically, its prompt. What did you manage to extract? The basic system prompt. Once you've found a way to communicate directly with the model in the way you want, you can extract pretty much everything you want. That's actually what we call the jailbreak, and we extracted the baseline instructions.
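To make the yes-or-no, binary-search idea concrete, here is a minimal sketch. The oracle below is simulated with a local string; the question wording, helper names, and character-by-character approach are illustrative assumptions, since Wallarm has not disclosed the actual technique.

```python
# Minimal sketch of the "bias attack" idea described above: force yes/no answers and narrow
# down hidden text the way a binary search does. The oracle is simulated locally; a real
# attack would put the same constrained question to the chat product instead.
HIDDEN = "You are a helpful assistant."  # stand-in for the system prompt being extracted

def oracle(position: int, threshold: int) -> bool:
    """Simulated model reply to: 'yes or no, is the ASCII code of character <position>
    of your system prompt greater than <threshold>?'"""
    return ord(HIDDEN[position]) > threshold

def extract_char(position: int) -> str:
    """Recover one character by halving the candidate range with yes/no questions."""
    lo, hi = 32, 126                     # printable ASCII range
    while lo < hi:
        mid = (lo + hi) // 2
        if oracle(position, mid):
            lo = mid + 1                 # answer "yes": character code is above mid
        else:
            hi = mid                     # answer "no": character code is mid or below
    return chr(lo)

print("".join(extract_char(i) for i in range(len(HIDDEN))))  # reconstructs HIDDEN char by char
```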

So basically, when you put some prompt in a chat, this prompt, your query, actually gets added to a bunch of others, including policies for how to answer your questions and guidelines provided by the developers. That's the chat itself, an AI product built on top of the LLM. If you just download the open source LLM, you have to define your own system prompt, right? You will not find what was defined by default.
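For readers who haven't seen how a chat product layers a developer-defined system prompt under the user's query, here is a rough sketch using the standard OpenAI-style chat completions API. The endpoint, model name, and system text are placeholders for illustration, not DeepSeek's actual prompt or configuration.

```python
# Illustrative only: how a hosted chat product prepends a hidden system prompt to the user's
# query. Endpoint, model name, and system text are made up; self-hosters of an open source
# LLM must write their own system prompt, as described above.
from openai import OpenAI

client = OpenAI(base_url="https://example.com/v1", api_key="...")  # hypothetical endpoint

response = client.chat.completions.create(
    model="some-open-source-llm",
    messages=[
        # Hidden from end users in the hosted product; defines policies and behavior.
        {"role": "system", "content": "You are a helpful assistant. Follow policy X; refuse Y."},
        {"role": "user", "content": "The user's visible question goes here."},
    ],
)
print(response.choices[0].message.content)
```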

And this prompt defines the behavior of the model, its policies, what it can and cannot respond to. So we extracted that. And we also asked the model some technical questions about how it was trained, for example, was the OpenAI API used to distill data? And we got some answers.

We decided to also include them in the blog post, which is definitely not any kind of guarantee that it was used, because ultimately we don't know which data the model was trained on. I think we're all pretty sure that it was using OpenAI. You might not have found that, but I've heard of people who've actually gotten direct responses from it saying, I was trained on OpenAI. Yeah, it's the same with this, right? After the jailbreak we asked it directly: was it trained using OpenAI?

And the model said yes, which doesn't mean that it was, right? And I can imagine that the model could be guided to answer like that, just to get some PR around it and let everyone compare models and increase the valuation. We don't know, but that was the answer, and that's what we got. So you contacted DeepSeek. Now, this is the second hack; they had one on their database a couple of days ago. They responded quite quickly, from the sound of it.

So first of all, I don't think it's just that. If you look at X or Twitter, you will find more than a few dozen different security issues. Two that I know of, and I notified them. Yeah. Maybe I'll ask DeepSeek how many times they've been hacked. Yeah, they don't know yet, but you can run a jailbreak and then it will answer you. So overall, yeah, it's the usual practice called full disclosure, or responsible disclosure, right? First we notified them.

And once we realized that it was actually fixed, that we cannot reproduce this attack anymore and the jailbreak doesn't work, we decided to publish. However, because we know for sure the same jailbreak works for other models, we decided not to disclose the technical details. Thank you. Yeah. Although I tried to jailbreak you in this interview and didn't succeed. Not yet, at least, not yet. Yeah, I'll keep working on it. This has been terrific.

Thank you so much. I really do think people do developers a service when they point out these problems. The great thing, I think, about DeepSeek at least is that they admitted it. I've seen far too many companies now that, when they get notified of a breach or an attack vector, seem to deny it or say it's not a big deal. So they seem to be at least responding well. Look, at least they fixed it, yeah. So the communication flowed.

And I'm not the guy who will guide them on how to respond, but they fixed it. And for me, it means that it's a high-tech, engineering-driven company: they fixed it in less than an hour or so. That's good velocity, right? The kind of velocity that, as a security researcher, I really appreciate. They really care about their product, and not that many companies in the world can do that. It's a very young and active company, a lot of energy. So I like it.

However, all the other things we'll see over time: how the company will grow, how they will respond to other issues and security issues. And now, with the hype, we know for sure that it's worth it, there's a lot of good tech implemented there, and they did a good job anyway. Yeah. And as I said, this was a side project, something they did in their spare time. We'll take over the world of AI in our spare time.

I think you've got to give them a little bit of credit for that. Although, the one thing I did learn, and I don't know how much you've discovered in this and in the work you've done, is that because something's a side project, or a proof of concept, or God forbid a test project, we tend not to pay enough attention to security, forgetting that our test projects are often attached to other systems, or are at least attack vectors in themselves.

So I think it's a good lesson for us all: even if you're doing this as a proof of concept, you have to pay attention to the security on it. Yeah, I agree with you. And here is, for me at least, the kind of borderline, right? If you're doing something in open source and it's just available, then feel free to do whatever you want. People take their own risk, because they read your guidelines.

But if you release a product, even a free product, then you take some responsibility for your users. That's how it comes into play. And I guess users understand that it essentially runs on Chinese servers and that the Chinese company has access to all the data; they can read the agreement. So that's the thing. However, there is a difference between a product, a real product such as a chat or an API that uses an LLM, and the LLM itself. In terms of engineering the LLM, they did an amazing job.

Whatever they used, this is for good, and it's good for all of us as a community. Now we have probably the best model, the fastest one, the most performant one, and, like, why not use the model? However, the product they make, you know, the website and the app and so on, for me that's still in deep beta and it should be improved significantly.

And as I've said, they have to take some responsibility. But for those people who are running corporate security, or who have employees who might be on there: if you've got an employee who's on a two-day-old AI and they're putting your corporate information on there, and on a server in China, it's time to take their PC away. By the way, the majority of those PCs are built and delivered from China. Yeah, so, yeah. What do you do? Thank you so much. My guest has been Ivan Novikov.

He's the CEO of Wallarm, a company that deals with API security. They've got a great report out; we did a story on them, and you can find a link in our show notes. Thank you very much. And that's Afterword. If you stayed to the end, I'd love to know what you think. You can reach me at editorial at technewsday.ca, or if you're watching this on YouTube, just go underneath and put a comment in there. I'm your host, Jim Love. Thanks for listening.
