The Register - Software

Anthropic's Claude vulnerable to 'emotional manipulation'

AI model safety only goes so far


Anthropic's Claude 3.5 Sonnet, despite its reputation as one of the better-behaved generative AI models, can still be convinced to emit racist hate speech and malware.


Published: 2024-10-12T10:30:07










