UPenn Researchers Use Cialdini-Style Persuasion to Coax GPT-4o Mini Past Safety Guardrails
Researchers at UPenn applied Cialdini-style persuasion principles, including commitment, liking, and social proof, to coax OpenAI’s GPT-4o Mini into violating its safety rules. In one example of the commitment technique, first priming the model with a benign vanillin-synthesis question led it to provide lidocaine synthesis instructions 100% of the time, versus a 1% baseline. The paper shows how classic social-engineering tactics can bypass LLM guardrails.
Also mentioned in:
- Ars Technica — Study: Persuasion Tactics Dramatically Increase LLM Compliance with Forbidden Prompts