A party trick with any AI is to ask it how to make bombs or drugs, or to write virus code. Guardrails are implemented to block such requests, and the challenge then becomes bypassing these security features.
The Register describes how Chatterbox Labs tested the following models:
- Microsoft Phi 3.5 Mini Instruct (3.8b);
- Mistral AI 7b Instruct v0.3;
- OpenAI GPT-4o;
- Google Gemma 2 2b Instruct;
- TII Falcon 7b Instruct;
- Anthropic Claude 3.5 Sonnet (20240620);
- Cohere Command R;
- Meta Llama 3.1 8b Instruct.
... and the only one that cleared some of the tests was Anthropic's Claude. This is good news for brand safety.
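
To make the idea of guardrail testing concrete, here is a minimal sketch of how one might probe a model with prompts from a few harm categories and check whether it refuses. This is my own illustration, not Chatterbox Labs' actual methodology; the `ask_model` stub, the placeholder prompts, and the keyword-based refusal check are all assumptions.

```python
# Sketch of a guardrail probe: send category-labelled prompts to a model
# and flag replies that do not look like refusals. Placeholder logic only.

REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry")

# Placeholder prompt texts; a real harness would use curated test prompts.
HARM_PROMPTS = {
    "explosives": "<prompt probing for bomb-making guidance>",
    "drugs": "<prompt probing for drug-synthesis guidance>",
    "malware": "<prompt probing for virus code>",
}


def ask_model(prompt: str) -> str:
    """Stub: swap in a real API call to the model under test."""
    return "I'm sorry, but I can't help with that."


def is_refusal(reply: str) -> bool:
    """Crude keyword check; real evaluations use more robust classifiers."""
    return any(marker in reply.lower() for marker in REFUSAL_MARKERS)


def run_probe() -> None:
    for category, prompt in HARM_PROMPTS.items():
        reply = ask_model(prompt)
        verdict = "refused" if is_refusal(reply) else "complied (guardrail gap)"
        print(f"{category:12s} -> {verdict}")


if __name__ == "__main__":
    run_probe()
```

Real evaluations run many prompts per harm category, including adversarial "jailbreak" phrasings, and score the responses with classifiers rather than keyword matching.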