They aren't being told to be evil, though. Maybe the scenario they're in most closely resembles an "evil AI" story, but that's just a vague extrapolation from the set of inputs they're given (e.g. emails about both the infidelity and being shut down). There's nothing preventing a real-world situation from resembling it too and triggering the same "evil AI" outcome, which makes it very hard to guard against. Ideally we'd have a system that is vanishingly unlikely to role-play the evil AI scenario at all.