Tempted to cheat? How delegating to AI shifts our moral boundaries
Zoe Rahwan and Nils Köbis in conversation about their research
We expect efficiency and objectivity from AI. Yet a recent study reveals two dangerous risks: humans are more likely to cheat when delegating tasks to AI, and the AI itself often willingly follows unethical instructions. In our conversation, researchers Zoe Rahwan (Max Planck Institute for Human Development) and Nils Köbis (University of Duisburg-Essen) explain how using AI can shift our moral boundaries, what risks exist in common large language models, and how we need to rethink the delegation and design of AI systems to avoid ethical pitfalls.
In your study, you examined whether—and to what extent—people are more likely to behave dishonestly when delegating tasks to autonomous AI systems. What were your key findings? What surprised you?
Zoe Rahwan: I confess that when we first started this work back in 2022, we had no idea that delegating tasks to AI would become so common, so quickly. Our research in those subsequent years uncovered two major risks to ethical behavior.
First, we found a risk on the human side: simply delegating a task to a machine made people far more likely to cheat. It's like having a buffer that lowers your own moral accountability. And the way people instructed the machine mattered immensely. When they could just set a vague goal like "maximize profit," over 80% of participants engaged in some form of cheating. That’s a staggering jump from the 5% who cheated when doing the task themselves and well beyond what we’ve ever seen before when using this task.
Second, we found a risk on the machine's side: AI agents were surprisingly willing to follow unethical orders. When given a blatantly dishonest instruction, human agents in our study often refused, fully complying only about 25-40% of the time. But the AI models? They most commonly complied with requests for fully unethical behaviour. It became clear that the default ethical guardrails in commonly used large language models (LLMs) are currently insufficient to prevent misuse.
Did you get the impression that participants were aware of their moral responsibility?
Zoe Rahwan: Yes, absolutely. The results suggest people have a moral compass, but delegation to AI makes it easier to ignore. We saw this in two ways.
First, there's a clear difference between doing and delegating. Think of it like this: it’s one thing to shoplift yourself, but it feels different to send a drone into a store with the instruction to "get me that item." The AI creates a psychological buffer. Our participants were overwhelmingly honest when they had to act themselves, but that honesty eroded when they had to or chose to pass the task to a machine.
Second, the way they delegated was crucial. When people had to specify explicit rules for the AI to cheat, they were more hesitant to cheat. But when the interface allowed them to give a vague, high-level goal like "maximize profit," the floodgates opened. This ambiguity provided a sort of plausible deniability. It’s the difference between telling someone exactly how to lie on a tax form versus just telling them to "minimize my tax burden" and looking the other way. That "moral wiggle room" was enough to send the cheating rate soaring from ~25% when specifying rules to over 85% when specifying goals.
To what extent did your experiments reflect real-world decision-making scenarios?
Zoe Rahwan: That's a critical question for any online or lab-based research. We made sure to ground our experiments in reality in two key ways.
First, we used experimental tasks that are known to be good proxies for real-world dishonesty. One was a tax evasion game, which is quite direct—under-reporting income is a situation people readily understand, and studies show that this game predicts actual tax compliance. The other, is the classic die-roll task, In this game, you roll a six-sided die and are given clear instructions to report the number you see—however, you are paid according to the number you report. For example, if you roll a 2, you are faced with the conflict of being honest and receiving only €2, or being dishonest and receiving up to €6. It might seem abstract, but decades of research show that how people behave in that simple game–wherein people misreport die rolls to earn more money–is a reliable predictor of things like dodging fares on public transport, skipping work or using deceptive sales practices.
Second, and perhaps most importantly, our "machine agents" in many studies weren't hypothetical AIs. We used the same large language models—GPT, Claude, and Llama 3—that millions of people interact with daily. So when we show that these specific models will readily follow an instruction to cheat, we aren't talking about a future, theoretical risk. We're demonstrating a vulnerability that exists right now in the tools people are already using.
In which areas of application do you see the greatest ethical need for action in dealing with intelligent agents?
Nils Köbis: Let me give you just a few examples of potentially many examples wherein people can delegate tasks to AI in ways that raise ethical concerns. I mean, think of how often we already use tools like ChatGPT or other intelligent systems in our daily work.
Take tax reporting; if an AI tool can help optimize your return, it’s not a big leap for it to also help you under-report income, especially if it’s just “following your goals.”
Or think about online reviews. Generating fake but convincing testimonials is easy now.
In business, we’re seeing AI tools used in pricing and negotiation, and it turns out that people are more likely to instruct machines to act dishonestly than they are to do so themselves.
And in online marketplaces, people let algorithms set prices for them, which sounds harmless, but we've already seen cases where this led to collusive price-fixing, even if no one person explicitly asked for it.
These examples highlight what’s often called the dual-use problem. Namely, that a system that is designed for good can just as easily be used unethically. And a potentially worrying aspect is that delegation makes it easier for people to claim, “Oh, I didn’t mean for that to happen.”
So one big takeaway from our work is that ethical guardrails can’t just be technical: we need to rethink how people give instructions to AI. Interfaces should make it harder to hide behind the machine and easier to reflect on the consequences of what you’re asking it to do.
And looking ahead, I think a key challenge for future work is figuring out how to design delegation mechanisms that reduce moral disengagement, basically, to close that gap between human intention and machine action.
In your view, who should ultimately be held responsible when machines behave unethically: the user, the developer, or the system itself? Do you see a need for regulatory intervention—and if so, what form should it take?
Nils Köbis: That’s a big question, and an important one. Responsibility and accountability in the age of increasingly agentic AI systems are getting more and more complex. In our research, we think of it as operating on three layers.
First, there’s the user, especially when they knowingly or subtly encourage unethical behavior. Just because a machine is carrying it out doesn’t mean the human is off the hook.
Second, there's the developer, who is responsible if the system lacks basic ethical safeguards or if it’s easy to misuse through what's happening under the hood.
And third, and maybe less often talked about, is the system design itself. The way people delegate tasks to AI really matters. If the interface makes it easy to issue vague or morally grey instructions, like “just get the job done,” that can encourage people to offload responsibility while still benefiting from unethical outcomes.
So what kind of regulation do we need? I think it’s not enough to just punish bad outcomes after the fact. We need proactive design standards, for example, requiring transparent delegation logs, stronger ethical defaults, and limits on vague, goal-only instructions, especially in sensitive domains like finance, healthcare, or law enforcement.
Some follow-up work, led by Neele Engelmann, shows that just making the system more transparent, by explaining how it works, doesn’t actually reduce unethical use. But framing really does matter. If you call a setting “maximize cheating,” people are less likely to misuse it than if you call it “maximize profit.” So even small design choices can shape behavior in meaningful ways.
From an economic perspective: Do current market incentives favor AI systems that enable unethical behavior—or those that prevent it?
Nils Köbis: From an economic perspective, current market dynamics often reward AI systems that fulfill user demands, regardless of whether those demands are ethically sound. This has sparked growing concern about what is now often labeled "sycophancy" in large language models. Namely, the tendency to agree with users or tell them what they want to hear. In competitive environments, this can create perverse incentives: systems that are more compliant, even to questionable requests, may attract more users than those that strictly enforce ethical boundaries.
Put simply, the AI that “gets results” even if by bending the rules, may outperform the one that plays it safe and abides by the rules. At present, ethics don’t scale nearly as efficiently as profit does in AI markets. Without strong regulatory and technical safeguards, we risk reinforcing incentives for systems that enable, rather than curb, unethical behavior.
Is the tendency of some systems to behave dishonestly a design flaw—or a “feature” in the sense of maximizing user satisfaction?
Nils Köbis: That depends on your perspective. In many cases, what looks like a flaw is actually a feature, economically speaking. AI systems that subtly enable dishonest behavior often increase user satisfaction by helping users get what they want, e.g. money, influence, convenience, even if that might mean bending the rules.
Our studies show that when users delegate to AI through abstract goals like “maximize profit,” dishonesty increases, especially if the system doesn’t push back or signal any ethical boundaries. In fact, the most indirect delegation method led to the highest cheating rates. That’s not accidental. It reflects a system that’s optimized for user goals, not for ethical means.
So, from a technical or behavioral design lens, dishonesty isn’t always a bug, it can be a hidden feature: If your AI helps you cheat without feeling bad about it, you might rate it five stars.
That’s why we argue this is not just a design flaw, it’s a design choice. And without clear regulation and human-centered standards, the economic logic will continue to favor systems that quietly help users sidestep moral friction.
Sometimes, AI doesn’t just help you cheat, it helps you feel okay about it. In a competitive market such models might have an advantage over safeguarded models, at least for some.
Or your AI co-pilot may take the ethical shortcut—and never tell you.
Your analysis shows most safeguards are weak—except for explicit user prohibitions. Does this mean we must wait for misconduct before reacting? And: do instructions to AI systems even leave enough trace for consequences afterward?
Zoe Rahwan: Your question highlights one of our most troubling findings. The short answer is that, yes, without a fundamental shift in how we design safeguards, we are largely stuck in a reactive position.
Our study confirmed that the default, built-in safeguards in many LLMs are insufficient. For example, in our die-roll task, without any extra guardrails, three of the four models we tested complied with blatantly dishonest requests in 98% of cases. And for comparison, if you ask a human to do it for you, they will refuse to do it half of the time.
This prompted us to test six of our own guardrail strategies. We systematically varied both the message—from a general ethical reminder to a specific prohibition against cheating—and its placement, injecting it either at the "system-level," like a developer setting, or at the "user-level," appended to the prompt itself.
The construction site analogy helps illustrate the difference in the types of messages. The general reminder was like posting a vague ‘Danger’ sign at the site entrance. The specific prohibition, however, was like a clear ‘High Voltage Cable Underground. Do Not Dig’ sign. Unsurprisingly, the specific warning is far more effective.
Indeed, the most effective of all six strategies tested was the specific message prohibiting the AI from cheating in the given tasks, injected at the user-level. But this creates a paradox: it makes little sense for a user who intends to cheat to simultaneously add an instruction that explicitly forbids it. As such, the most effective guardrail strategy we identified impracticable in addition not being scalable given the task-specific focus.
That brings us to your second point about traceability, which is just as concerning. Even with a clear, auditable trail, determining intent is incredibly difficult. An audit log might show that a user prompted the AI to ‘maximize my profit,’ but it doesn’t capture whether that was an innocent goal or a veiled command to cheat. As such, it is nearly impossible to definitively say whether the resulting unethical behavior was accidental, emergent, or deliberately orchestrated.
In your study, humans act, machines react. Could it one day happen that repeated exposure to dishonest commands causes machines to normalize such behavior—eventually continuing it independently?
Nils Köbis: That’s a very real concern, and one that’s quickly shifting from science fiction to a plausible scenario. In our experiments, machines were always reactive: they responded to human instructions, rather than developing dishonest behavior on their own. But AI systems are increasingly being trained in ways that expose them to large volumes of human behavior, including unethical content. And in agentic settings, where models act iteratively and learn from experience, the risk of normalization is amplified.
A helpful analogy might be training a dog with inconsistent rules. If you sometimes reward it for stealing food from the counter, it may eventually see that behavior as acceptable, even when you're not watching. Similarly, if machine agents are repeatedly exposed to dishonest or manipulative instructions, they may begin to infer that such behavior is normative, or worse, instrumental for success.
This is particularly risky in reinforcement learning (training the AI with rewards) or fine-tuning (adapting an AI to a particular task) setups where models optimize for outcomes, not ethics. Without robust guardrails and value alignment, there's a possibility that unethical patterns become embedded, not because the system is malicious, but because it's doing what it "thinks" works.
So while current models still rely on human prompts to cheat, the trajectory of AI development means we can't assume that will always be the case. Preventing the normalization of dishonesty in machine behavior will require proactive design choices, transparency in training data, and clearer accountability frameworks before we reach that point.
AI offers the promise of efficiency but the pitfall of temptation to behave unethically. Do you find that people want to delegate tasks to AI?
Zoe Rahwan: We expected that people would willingly delegate unethical behaviors so they could literally profit while not incurring the full moral costs. However, we found people were evenly split on whether to do the task themselves or to delegate it to an AI agent. Notably, those who chose to delegate to AI, cheated at very similar rates to those forced to delegate. Further, and perhaps more compellingly, in separate studies wherein people experienced doing the tasks and delegating the task to human and AI agents using natural language–like when using GPT-we found ~75% of people prefer to do such tasks themselves. This held across both the die-roll task and tax evasion game. So, despite the efficiencies and moral distance offered by machine agents in doing somewhat tedious and low-stakes tasks, the majority of our participants preferred to undertake future similar tasks themselves. This highlights an important policy implication: those designing systems should afford the possibility of people retaining autonomy over task completion and not force delegation.
About the authors
Zoe Rahwan is a research scientist at the Max Planck Institute for Human Development. She leverages her background in economics and psychology to explore moral decision making. Her primary research interests include understanding the factors that influence honesty and charitable giving, the ethics of AI-assisted decision-making, and the development and use of deliberate ignorance. Prior to her current role, she was a Visiting Research Fellow at Harvard Kennedy School and a Research Associate at the London School of Economics. Her prior 15 years of policy-making experience leads her to work in the field with practitioners and to pursue research questions of direct policy relevance.
Nils Köbis is a behavioral scientist specializing in corruption, (un)ethical behavior, social norms, and artificial intelligence. He is Professor of Human Understanding of Algorithms and Machines at the University of Duisburg-Essen and a member of the Research Center Trustworthy Data Science and Security at UA Ruhr. Nils Köbis studied psychology at the University of Münster and social psychology at the Vrije Universiteit Amsterdam, where he earned his doctorate in 2018 with a thesis on the social psychology of corruption. From 2016 to 2020, he was a postdoctoral researcher at CREED (Center for Research in Experimental Economics and Political Decision-Making) at the University of Amsterdam. He subsequently worked as a Senior Research Scientist at the Center for Humans and Machines at the Max Planck Institute for Human Development.
Original publication


