Introduction to Hacking LLMs: Jailbreaks & System Prompt Extraction
A recent branch of penetration testing has emerged around attacking Large Language Models (LLMs). The field is still young, which invites two notable consequences. To hack LLMs effectively, it's important to understand what they are. At their (over-simplified) core, Large Language Models are next-word predictors (more accurately, next-token predictors) that output a "best"…
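To make the "next-token predictor" framing concrete, here is a toy sketch. Real LLMs use neural networks to score every token in a large vocabulary, but the core loop — pick the most probable continuation, append it, repeat — can be mimicked with a simple bigram counter. Everything here (the corpus, `predict_next`, `generate`) is illustrative, not part of any real model.

```python
from collections import Counter, defaultdict

# Toy "language model": count which token follows which in a tiny corpus.
corpus = "the cat sat on the mat the cat ate the fish".split()
following = defaultdict(Counter)
for cur, nxt in zip(corpus, corpus[1:]):
    following[cur][nxt] += 1

def predict_next(token):
    # Greedy decoding: return the single most frequent continuation,
    # i.e. the "best" next token under this toy model.
    return following[token].most_common(1)[0][0]

def generate(start, length):
    # Autoregressive generation: each prediction is fed back in as input.
    out = [start]
    for _ in range(length):
        out.append(predict_next(out[-1]))
    return " ".join(out)

print(generate("the", 4))  # greedily extends "the" by four tokens
```

A real LLM replaces the frequency table with a learned probability distribution over its whole vocabulary, but the autoregressive loop is the same — which is exactly why carefully chosen input text can steer what comes out.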