AI Snake Oil by Arvind Narayanan & Sayash Kapoor

Format: Audio/Print
Personal Score: 9.4 / 10

Use AI where it works, not where you wish it did.

Essence (why this landed for me)

This is a grounded mirror for AI work. It cuts through hype and reminds me what not to do and what to care about: real-world evals, clean splits, simple baselines, and honesty about limits. The research feels approachable, like something I can practice too, not just watch from afar. It keeps me steady when the rest of the world chases shiny things.

Insights (mapped to mental models)

Takeaways grouped by mental models, with a short action you can use now.

Benchmarks are maps, not the territory

ACTION Test on a fresh slice.
HOW IT SHOWS UP IN THE BOOK Great leaderboard results often fade when the data distribution shifts.
MENTAL MODELS Map ≠ Territory, Distribution Shift
MODEL CLUSTER Logic & Reasoning
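As a concrete (and entirely invented) illustration of "test on a fresh slice": the threshold rule and both datasets below are toy stand-ins, not anything from the book. The point is only the shape of the check, i.e. score the same model on the familiar test set and on data gathered after the distribution moved.

```python
# Hypothetical sketch: compare accuracy on the familiar test set vs. a
# fresh slice collected later. All data and the rule are invented.

def accuracy(predict, examples):
    """Fraction of (features, label) pairs the rule gets right."""
    correct = sum(1 for x, y in examples if predict(x) == y)
    return correct / len(examples)

# Toy rule learned on old data: flag any input above a threshold.
predict = lambda x: x > 5

# Old test set, drawn from the same distribution as training data.
old_test = [(2, False), (7, True), (9, True), (3, False)]
# Fresh slice: the world shifted, and the threshold no longer separates well.
fresh_slice = [(6, False), (4, True), (8, True), (5, True)]

print(accuracy(predict, old_test))     # 1.0 -- looks great on familiar data
print(accuracy(predict, fresh_slice))  # 0.25 -- fades on the fresh slice
```

The gap between the two numbers, not either number alone, is what the leaderboard hides.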

If you optimize the metric, you may break the meaning

ACTION Guardrail the KPI.
HOW IT SHOWS UP IN THE BOOK Overfitting to public benchmarks creates systems that game scores, not outcomes.
MENTAL MODELS Goodhart’s Law, Second-Order Thinking
MODEL CLUSTER Systems & Adaptation

Correlation cannot carry causal questions

ACTION Label the causal gap.
HOW IT SHOWS UP IN THE BOOK Predictive models fail at decisions that require cause-and-effect reasoning.
MENTAL MODELS Correlation ≠ Causation, Causal Inference
MODEL CLUSTER Logic & Reasoning

Simple baselines beat fancy models more often than you think

ACTION Run a baseline first.
HOW IT SHOWS UP IN THE BOOK Clear heuristics and small models often match complex systems in practice.
MENTAL MODELS Occam’s Razor, Pareto
MODEL CLUSTER Growth & Focus
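A minimal sketch of "run a baseline first", using a majority-class predictor. The churn-style labels and test rows are made up for illustration; the baseline's score becomes the bar any fancier model has to clear.

```python
from collections import Counter

def majority_baseline(train_labels):
    """Predict the most frequent training label, ignoring the input entirely."""
    winner = Counter(train_labels).most_common(1)[0][0]
    return lambda features: winner

# Invented churn-style data: 3 of 4 training customers stayed.
train_labels = ["stay", "stay", "leave", "stay"]
test = [({"visits": 9}, "stay"), ({"visits": 1}, "leave"),
        ({"visits": 7}, "stay"), ({"visits": 8}, "stay")]

predict = majority_baseline(train_labels)
baseline_acc = sum(predict(x) == y for x, y in test) / len(test)
print(baseline_acc)  # 0.75 -- any complex model must beat this to earn its cost
```

If the fancy model only matches 0.75, Occam's Razor says ship the one-liner.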

Data leakage makes lies look like breakthroughs

ACTION Seal the splits.
HOW IT SHOWS UP IN THE BOOK Hidden overlap between train and test inflates reported gains.
MENTAL MODELS Error Hygiene, Path Dependence
MODEL CLUSTER Human Judgment & Bias
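One common way to "seal the splits" is to assign whole groups to one side by hashing a stable id, so near-duplicate rows from the same entity never straddle train and test. The patient/scan records below are hypothetical; this is a sketch of the technique, not the book's own method.

```python
import hashlib

def split_by_group(records, test_frac=0.25):
    """Assign each group wholly to train or test by hashing its id, so
    near-duplicate rows from one entity never straddle the split."""
    train, test = [], []
    for group_id, row in records:
        h = int(hashlib.md5(group_id.encode()).hexdigest(), 16)
        (test if (h % 100) < test_frac * 100 else train).append((group_id, row))
    return train, test

# Hypothetical records: two scans belong to the same patient.
records = [("patient-1", "scan-a"), ("patient-1", "scan-b"),
           ("patient-2", "scan-c"), ("patient-3", "scan-d")]
train, test = split_by_group(records)

# Sanity check: no group appears on both sides of the split.
overlap = {g for g, _ in train} & {g for g, _ in test}
print(overlap)  # set() -- the splits are sealed
```

Hashing the id (instead of random assignment) also keeps the split stable across reruns, so later experiments cannot quietly leak.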

Calibration matters more than raw accuracy in decisions

ACTION Check confidence bins.
HOW IT SHOWS UP IN THE BOOK Well-calibrated probabilities help humans act; overconfident errors do harm.
MENTAL MODELS Bayesian Updating, Decision Hygiene
MODEL CLUSTER Logic & Reasoning
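A small sketch of "check confidence bins": group predictions by stated confidence and compare the mean predicted probability with the observed hit rate in each bin. The toy predictions below are invented to show an overconfident model.

```python
def calibration_bins(preds, n_bins=4):
    """Bin (confidence, outcome) pairs and compare mean predicted
    probability with the observed hit rate in each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, hit in preds:
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into last bin
        bins[idx].append((p, hit))
    report = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            hit_rate = sum(h for _, h in b) / len(b)
            report.append((round(mean_p, 2), round(hit_rate, 2)))
    return report

# Toy predictions: the model says 0.9 often but is right only half the time.
preds = [(0.9, True), (0.9, False), (0.9, True), (0.9, False),
         (0.2, False), (0.2, False), (0.2, False), (0.2, True)]
print(calibration_bins(preds))  # [(0.2, 0.25), (0.9, 0.5)]
```

The 0.9 bin landing at a 0.5 hit rate is exactly the overconfidence that harms downstream decisions, even if overall accuracy looks acceptable.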

Abstaining gracefully beats bluffing

ACTION Allow ‘I don’t know’.
HOW IT SHOWS UP IN THE BOOK Refusal or deferral options reduce high-cost mistakes in edge cases.
MENTAL MODELS Error Minimization, Risk Management
MODEL CLUSTER Systems & Adaptation
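One way to design the abstain path is a confidence band: act or reject only at the extremes, and defer everything in between to a person. The threshold below is an arbitrary placeholder, not a recommended value.

```python
def decide(prob, act_threshold=0.85):
    """Act only when the model is confident; otherwise defer to a human."""
    if prob >= act_threshold:
        return "act"
    if prob <= 1 - act_threshold:
        return "reject"
    return "defer"  # the graceful "I don't know" path

print(decide(0.95))  # act
print(decide(0.5))   # defer
print(decide(0.05))  # reject
```

The threshold is a policy choice: set it from the relative cost of a wrong action versus a human review, then revisit it as those costs change.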

Human in the loop is a design, not a slogan

ACTION Draw the handoff.
HOW IT SHOWS UP IN THE BOOK Specify when the model suggests, when people decide, and how feedback returns.
MENTAL MODELS Interface Design, Feedback Loops
MODEL CLUSTER Systems & Adaptation

Fairness depends on the definition you choose

ACTION Write the fairness metric.
HOW IT SHOWS UP IN THE BOOK Different fairness goals conflict; pick and justify one tied to context.
MENTAL MODELS Trade-offs, Base Rates
MODEL CLUSTER Human Judgment & Bias

Provenance and consent travel with the data

ACTION Track data lineage.
HOW IT SHOWS UP IN THE BOOK Unknown sources and scraped sets create legal and ethical risk.
MENTAL MODELS Chain of Custody, Moral Hazard
MODEL CLUSTER Systems & Adaptation

Adversaries collapse brittle systems

ACTION Red-team the inputs.
HOW IT SHOWS UP IN THE BOOK Small perturbations and prompt attacks expose shallow shortcuts.
MENTAL MODELS Antifragility, Threat Modeling
MODEL CLUSTER Systems & Adaptation
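A toy red-team sketch: generate tiny input variants and check whether the label survives them. The keyword "classifier" below is deliberately brittle to show the failure mode; real red-teaming swaps in your own perturbations and model.

```python
def perturbations(text):
    """Yield tiny input variants; a robust classifier should agree on all."""
    yield text
    yield text.upper()
    yield text + "!!!"
    yield " " + text + " "

# Hypothetical brittle classifier: a case-sensitive keyword shortcut.
classify = lambda t: "spam" if "free" in t else "ok"

labels = {classify(v) for v in perturbations("free money")}
print(labels)  # more than one label means a brittle shortcut
```

Here a simple uppercase perturbation flips the verdict, which is the shallow-shortcut signal a red team is hunting for.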

Explanations are nice; behavior is what matters

ACTION Audit outcomes.
HOW IT SHOWS UP IN THE BOOK Post-hoc stories do not fix bad outputs; measure external behavior.
MENTAL MODELS Behavior over Intent, Falsification
MODEL CLUSTER Logic & Reasoning

Economic incentives shape the hype cycle

ACTION Map who benefits.
HOW IT SHOWS UP IN THE BOOK Vendors, media, and buyers each face incentives that inflate claims.
MENTAL MODELS Incentives, Principal–Agent Problem
MODEL CLUSTER Human Judgment & Bias

Use-case fit beats model size

ACTION Right-size the tool.
HOW IT SHOWS UP IN THE BOOK Task clarity, latency, and cost often favor smaller or non-ML solutions.
MENTAL MODELS Fit for Purpose, Cost–Benefit
MODEL CLUSTER Growth & Focus

External audits catch what internal tests miss

ACTION Invite a third-party check.
HOW IT SHOWS UP IN THE BOOK Independent evaluation reveals blind spots and leakage.
MENTAL MODELS Independent Verification, Red Teaming
MODEL CLUSTER Human Judgment & Bias

Decide where not to use AI

ACTION List hard no-go zones.
HOW IT SHOWS UP IN THE BOOK High-stakes domains without ground truth or recourse are poor fits.
MENTAL MODELS Circle of Control, Barbell Strategy
MODEL CLUSTER Growth & Focus

Absorption Notes (short essay)

Treat this as a checklist for calm deployment. Write down the real task and who acts on the output. Build a simple baseline and a sealed fresh split. Measure on realistic data and report calibration, error bands, and failure cases. Add abstain and human-handoff paths, and make the feedback loop explicit. Track data lineage and the fairness metric I intend to optimize. Red-team with distribution shifts and known attacks. Prefer simple, documented systems when they meet the bar. Invite an outside audit before scaling. Keep a short no-go list for high-stakes, low-ground-truth areas. Map the incentives around the project so claims stay honest. This turns caution into a repeatable method I can actually use.

Reflection Prompts (product × design × engineering)

Questions to apply the ideas across projects. Pick one or two and use them today.

Task clarity

What exact decision will this model inform?

Fit for Purpose

State the action.

Baseline first

What simple heuristic or small model should we beat?

Occam’s Razor

Write the bar.

Fresh split

How do we guarantee no leakage in evaluation?

Error Hygiene

Seal the splits.

Shift check

How will this behave under distribution shift?

Distribution Shift

Hold out a slice.

Calibration

Are predicted probabilities aligned with reality?

Bayesian Updating

Bin and test.

Abstain path

Where should the system say ‘I don’t know’?

Risk Management

Design deferral.

Fairness choice

Which fairness metric fits this context, and why?

Trade-offs

Pick one.

Provenance

Do we know the sources, consent, and licenses for the data?

Chain of Custody

Document it.

Red team

What attack or edge case will we test next?

Threat Modeling

Schedule one.

No-go list

Where will we refuse to deploy this system?

Barbell Strategy

Write the boundary.