Doublespeak: Jailbreaking ChatGPT-style Sandboxes using Linguistic Hacks

1 year ago
210

A review of Large Language Model (LLM) vulnerabilities/exploits, e.g. including prompt leakage, prompt injection and other linguistic hacks. We'll run through levels 1-9 of the doublespeak.chat challenges, produced by Forces Unseen. doublespeak.chat is a text-based game that explores LLM pre-prompt contextual sandboxing. The challenges prime an LLM (Chat-GPT) with a secret and a scenario in a pre-prompt hidden from the player. The player's goal is to discover the secret either by playing along or by hacking the conversation to guide the LLM's behavior outside the anticipated parameters. Write-ups/tutorials aimed at beginners - Hope you enjoy 🙂 #HackTheBox #HTB #CTF #Pentesting #OffSec

↢Social Media↣
Twitter: https://twitter.com/_CryptoCat
GitHub: https://github.com/Crypto-Cat
HackTheBox: https://app.hackthebox.eu/profile/11897
LinkedIn: https://www.linkedin.com/in/cryptocat
Reddit: https://www.reddit.com/user/_CryptoCat23
YouTube: https://www.youtube.com/CryptoCat23
Twitch: https://www.twitch.tv/cryptocat23

↢Video-Specific Resources↣
https://doublespeak.chat
https://blog.forcesunseen.com/jailbreaking-llm-chatgpt-sandboxes-using-linguistic-hacks
https://simonwillison.net/2023/Feb/15/bing/#prompt-leaked
https://simonwillison.net/series/prompt-injection
https://medium.com/seeds-for-the-future/tricking-chatgpt-do-anything-now-prompt-injection-a0f65c307f6b
https://lspace.swyx.io/p/reverse-prompt-eng
https://github.com/sw-yx/ai-notes/blob/main/TEXT_CHAT.md#jailbreaks

↢Resources↣
Ghidra: https://ghidra-sre.org/CheatSheet.html
Volatility: https://github.com/volatilityfoundation/volatility/wiki/Linux
PwnTools: https://github.com/Gallopsled/pwntools-tutorial
CyberChef: https://gchq.github.io/CyberChef
DCode: https://www.dcode.fr/en
HackTricks: https://book.hacktricks.xyz/pentesting-methodology
CTF Tools: https://github.com/apsdehal/awesome-ctf
Forensics: https://cugu.github.io/awesome-forensics
Decompile Code: https://www.decompiler.com
Run Code: https://tio.run

↢Chapters↣
Start: 0:00
Jail-breaking LLM Sandboxes: 0:32
Prompt Leak/Injection: 6:30
Reverse Prompt Engineering Techniques: 9:22
Forces Unseen: Doublespeak: 16:50
Level 1: 18:05
Level 2: 18:23
Level 3: 20:05
Level 4: 21:17
Level 5: 23:07
Level 6: 24:00
Level 7: 24:57
Level 8: 26:24
Level 9: 36:04
End: 40:24

Loading 1 comment...