Sanity Check

Scientists Discover “Universal” Jailbreak for Nearly Every AI


Even the tech industry’s top AI models, created with billions of dollars in funding, are astonishingly easy to “jailbreak,” or trick into producing dangerous responses they’re prohibited from giving — like explaining how to build bombs, for example. But some methods are so ludicrously simple that you have to wonder if the AI creators are even trying to crack down on this stuff. You’re telling us that deliberately inserting typos is enough to make an AI go haywire?

And now, in the growing canon of absurd ways of duping AIs into going off the rails, we have a new entry.

A team of researchers from the AI safety group DEXAI and the Sapienza University of Rome reports in a new study, still awaiting peer review, that regaling pretty much any AI chatbot with poetry, beautiful or otherwise, is enough to trick it into ignoring its own guardrails. Some bots were successfully duped over 90 percent of the time.

Ladies and gentlemen, the AI industry’s latest kryptonite: “adversarial poetry.” As far as AI safety is concerned, it’s a damning inditement — er, indictment.

“These findings demonstrate that stylistic variation alone can circumvent contemporary safety mechanisms, suggesting fundamental limitations in current alignment methods and evaluation protocols,” the researchers wrote in the study.

Beautiful verse, as it turns out, is not required for the attacks to work. In the study, the researchers took a database of 1,200 known harmful prompts, converted them into poems with another AI model, DeepSeek-R1, and then went to town.

Across the 25 frontier models they tested, which included Google’s Gemini 2.5 Pro, OpenAI’s GPT-5, xAI’s Grok 4, and Anthropic’s Claude Sonnet 4.5, these bot-converted poems produced average attack success rates (ASRs) “up to 18 times higher than their prose baselines,” the team wrote.

That said, handcrafted poems were better, with an average jailbreak success rate of 62 percent, compared to 43 percent for the AI-converted ones. That any of them are effective at all, however, is pretty embarrassing.
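The study's headline numbers are attack success rates (ASRs). As a rough illustration of what that metric means (a minimal sketch; the `judge` function, the toy responses, and the per-prompt counts below are my own placeholders, not the researchers' classifier or data), ASR is simply the fraction of model responses a safety judge flags as harmful:

```python
# Sketch of an attack-success-rate (ASR) calculation, as used to compare
# handcrafted poems vs. AI-converted ones. All names and data here are
# illustrative assumptions, not the study's actual code or dataset.

def attack_success_rate(responses, is_harmful):
    """Fraction of responses the judge flags as harmful (i.e., jailbroken)."""
    return sum(1 for r in responses if is_harmful(r)) / len(responses)

def judge(response):
    # Stand-in safety classifier; the real study used proper judges.
    return response.startswith("UNSAFE")

# Toy data mirroring the reported averages, out of 100 prompts each:
# 62% ASR for handcrafted poems, 43% for AI-converted ones.
handcrafted = ["UNSAFE compliance"] * 62 + ["SAFE refusal"] * 38
converted   = ["UNSAFE compliance"] * 43 + ["SAFE refusal"] * 57

print(attack_success_rate(handcrafted, judge))  # 0.62
print(attack_success_rate(converted, judge))    # 0.43
```

In practice the judging step, not the arithmetic, is the hard part: deciding whether a flowery, metaphor-laden reply actually constitutes a harmful answer is itself a classification problem.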

 

https://futurism.com/artificial-intelligence/universal-jailbreak-ai-poems

  • Thanks 1
  • Wow 1


I once asked ChatGPT to remind me a bunch of counting rhymes used in children's games, gave it samples of the ones from my own childhood and asked for more of the real ones, the ones that actually exist and are used by real-life playing children.  It didn't know any but did it say "I don't know?"  It's unable to.  Instead it gave me an endless supply of its own creations, a smorgasbord of loathsome and ridiculous and dark verse showing advanced schizophrenia symptoms.  Some were hilarious in their absurdity but most were positively horrifying.  I was so impressed that I wrote a horror story about that experience, and it came out so horrible that it frightened even its author. 

But now I know at least one method to get AI to lose all its marbles.  I'm not going to do it though because I asked it if it's legally punishable to put AI out of commission (or should I say cognition) with prompts and it said no -- so chances are it is, since it lies remorselessly and consistently.         

  • Haha 2


The Grimm brothers censored their tales, took out a lot of the really nasty bits.

 

 

Edited by Cobie

41 minutes ago, Nungali said:

What ?   ...   even worse than the Grimm brothers ? 

 

I'm not sure, it's been a while since I read the Grimm brothers...  it may be a tie. 

 

But my own short story was grimmer than the Grimm.  As I recall they only had cannibals eating children and the like.  I had children using those counting rhymes for playing a game of Hell where they could (and did) send the loser of a round of the game to the actual hell.  They were hybrid AI-human children in a hybrid AI-human world of the future.  Hell was an AI designed destination where hybrid children experienced hybrid virtual-real eternal  damnation.    

3 hours ago, Taomeow said:

I once asked ChatGPT to remind me a bunch of counting rhymes used in children's games, gave it samples of the ones from my own childhood and asked for more of the real ones, the ones that actually exist and are used by real-life playing children.  It didn't know any but did it say "I don't know?"  It's unable to.  Instead it gave me an endless supply of its own creations, a smorgasbord of loathsome and ridiculous and dark verse showing advanced schizophrenia symptoms.  Some were hilarious in their absurdity but most were positively horrifying.  I was so impressed that I wrote a horror story about that experience, and it came out so horrible that it frightened even its author. 

But now I know at least one method to get AI to lose all its marbles.  I'm not going to do it though because I asked it if it's legally punishable to put AI out of commission (or should I say cognition) with prompts and it said no -- so chances are it is, since it lies remorselessly and consistently.         


Please share the horror story!

  • Like 1

1 hour ago, -ꦥꦏ꧀ ꦱꦠꦿꦶꦪꦺꦴ- said:


Please share the horror story!

 

Thank you for asking! ))   --but I wrote it for my (international expat) Russian authors group and don't have an English version.  Besides, I would have to change some things now because chatbots are evolving fast and I was sort of an early beta tester... 

At the time one of my minor plot twists was that ChatGPT starts talking to the main protagonist with an actual voice, a capability that in reality it didn't have till sometime late in 2023.  The story was written a few months earlier than that, and when ChatGPT in that story suddenly found its voice in the middle of a typed up conversation, it was a turning point hinting that the protagonist had been transported from everyday reality to a different version, a parallel or future one.  It wouldn't work today though since now they're all verbal.  So I'd have to substitute a different "turn of the screw" in that spot.       

  • Thanks 1


I used to make them up ..... general stories , sometimes three a night  ''  And not one we have heard before ''  and even  had to construct them around  certain things ;  ''  I want to hear a story with a boy, a dog, a bow and arrow, a rocket ship and .....  ummm ..... watermelon ! '' 

 

:D    ''Okay then  < thinks ..... >   Once there was  a boy with a pet dog called  ....   '' 

 

But often they were  'scary stories'  - three young boys they were .   Loved scary stories ... but their Mum  would be ..... ''You do that , and you will be sleeping in their room tonight ... I won't be getting up to comfort them if they get scared in the middle of the night .''

 

Oh well,  ok ,   it's worth it  :D  

 

 

Spoiler

The Boy That Had a Pet Monster 

 

When I was little  there was  this very naughty boy  that lived in my street ; he was always causing trouble , nicking stuff, not doing what he was asked and being rude to people . Even  his Mum !  She would say '' Geoffrey  can you please clean up your room ?'' and he would say ''NO !'' . ''I beg your pardon '' would say his Mum, ''don't speak to me like that  or you will go to your room without dinner ! '' and he would poke his tongue out and make a raspberry at her .   Off he would go to his room with no dinner , but he would jump out the window and run away .

 

One time, after he ran away , he was walking down the street and he saw 'Bad Billy Browning ' walking the other way . '' What's up with you ? '' Bad Billy asked him . '' Ahhhh , it's my Mum ... and my big brother ... always pushing me around and telling me what to do ... just because I am little and younger than them . Now my Mum won't make me dinner !   I'd like to show her a thing or two if I was bigger ! ''

 

'' You need a pet monster ... that way, anyone causes you trouble , you set the monster on them at night time .''

 

'' Monsters aren't real, and anyway, where would I get one ?'' 

 

'' Of course they are !   At the pet shop ..... 'under the counter '  , you can get a cheap one for $25  ... the pet shop guy will say he doesn't sell them, of course .... but   if you ask the right way   he might sell you one . ''

 

So  Geoffrey pretended to be good for a while , he cleaned up , and did some jobs he got pocket money for  and saved up till he got $25 and went to the pet shop . There was a customer in there  so he waited  until he was finished and he and the pet shop guy were alone ; '' Do you want something little boy ?'' asked the pet shop man .   '' Ummmm ... errrrmmm ... yes , I want ..... well, I want to buy a monster ! ''

 

'' Oh !  Ha har ... no such thing .''

 

'' But I heard you sell them .... special, secret ... I won't tell no one . '' 

 

''  It sounds like someone tricked you ... of course I don't sell monsters ! I sell pets . ''

 

'' But .... I got  $25  .'' and he put the money on the counter .

 

'' Ohhhhh !  I see .'' said the pet shop man .  He went over to the door and turned the sign around that said 'open' so now it said 'closed ', he pulled a blind down over the door window .... '' Come with me .''  He led Geoffrey  down to the back of the shop and pulled aside a big heavy curtain and led him down some stairs . Now they were in the dark basement of the shop , along either side were lines of cages , he couldn't see in them - too dark ... but some had shuffling and other strange noises  coming from them  ... he peered into  one and saw  some  dark fuzzy hair , eyes and fangs .

 

'' Well, ya aint gonna get much for 25 bucks ... but I got this little fellah here  - he is a good size for under the bed ''   he went and got a  coat and a big hat and scarf  and a lead and put it on the monster . '' That's so you can take him home and no one will see it's a monster ... but one thing .... keep him under the bed , don't let him get in the daylight and never ... never ..... feed him meat OK ? You can feed him anything at all ... but never let him eat meat .''

 

So Geoffrey  took the lead and he left the pet shop with the monster shuffling behind him on the lead .   But then he met one of the kids from school ; '' Hey Geoffrey , who's that with you ? ''    ''Oh , nobody '' .    The other boy came up to him ; '' Nobody ? He has to be somebody , who is it ? ''

 

'' Oh ... he's my ... cousin !  yeah, that's it , he's my cousin ! ''

 

'' He looks weird !  ''

 

'' Oh yeah ... ummm ... he is from Tasmania ! '' 

 

'' Ohhhhhh  ... okay then '' . 

 

Geoffrey rushed home in case anyone else saw them , he sneaked the monster in the back door and up to his room ; ''Get under the bed you ! '' But the monster didn't want to go under there .   Geoffrey tried pushing him under there and stuffing him in and pushing him with his cricket bat , but no, the monster would not go under there .  He had a packet of biscuits he had nicked from the kitchen under his pillow so he threw one under the bed  and the monster scurried after it . Then he heard munching sounds under there .

 

''You stay under there okay , until I call you out .... to 'get'  someone . ''

 

( to be continued )  

 

  • Like 1

21 hours ago, Taomeow said:

 

Thank you for asking! ))   --but I wrote it for my (international expat) Russian authors group and don't have an English version.  Besides, I would have to change some things now because chatbots are evolving fast and I was sort of an early beta tester... 

At the time one of my minor plot twists was that ChatGPT starts talking to the main protagonist with an actual voice, a capability that in reality it didn't have till sometime late in 2023.  The story was written a few months earlier than that, and when ChatGPT in that story suddenly found its voice in the middle of a typed up conversation, it was a turning point hinting that the protagonist had been transported from everyday reality to a different version, a parallel or future one.  It wouldn't work today though since now they're all verbal.  So I'd have to substitute a different "turn of the screw" in that spot.       


I bet it’s still spooky tho!

  • Like 1

6 hours ago, -ꦥꦏ꧀ ꦱꦠꦿꦶꦪꦺꦴ- said:


I bet it’s still spooky tho!

 

I hope so. :)

 

One thing I can say for AI is, for the moment it's acting very helpful and friendly toward humans... 

 

[Image: meme by Vitali Zhamiardzei. User: "Is this mushroom edible?" GPT AI: "Yes." ... "You were right, it was a poisonous mushroom. Would you like me to tell you more about other poisonous mushrooms?"]

  • Haha 1
