AI Snake Oil
Use AI where it works, not where you wish it did.
Essence (why this landed for me)
This is a grounded mirror for AI work. It cuts through hype and reminds me what not to do and what to care about: real-world evals, clean splits, simple baselines, and honesty about limits. The research feels approachable, like something I can practice too, not just watch from afar. It keeps me steady when the rest of the world chases shiny.
Insights (mapped to mental models)
Takeaways grouped by mental models, with a short action you can use now.
Benchmarks are maps, not the territory
If you optimize the metric, you may break the meaning
Correlation cannot carry causal questions
Simple baselines beat fancy models more than you think
Data leakage makes lies look like breakthroughs
Calibration matters more than raw accuracy in decisions
Abstaining gracefully beats bluffing
Human in the loop is a design, not a slogan
Fairness depends on the definition you choose
Provenance and consent travel with the data
Adversaries collapse brittle systems
Explanations are nice; behavior is what matters
Economic incentives shape the hype cycle
Use case fit beats model size
External audits catch what internal tests miss
Decide where not to use AI
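A toy illustration of "optimize the metric, break the meaning" (my own sketch, not from the book): a classifier that flags everything earns perfect recall while being useless in practice.

```python
def flag_everything(_message):
    """'Optimizes' recall the dumb way: predict positive for every input."""
    return 1

labels = [1] * 5 + [0] * 95                      # 5 real positives in 100
preds = [flag_everything(x) for x in labels]

recall = sum(p and y for p, y in zip(preds, labels)) / sum(labels)
precision = sum(p and y for p, y in zip(preds, labels)) / sum(preds)
print(recall, precision)  # 1.0 0.05
```

A leaderboard scored only on recall would crown this model; reporting precision alongside it exposes the gaming.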
Absorption Notes (short essay)
Treat this as a checklist for calm deployment. Write the real task and who acts on the output. Build a simple baseline and a sealed fresh split. Measure on realistic data and report calibration, error bands, and failure cases. Add abstain and human handoff paths and make the feedback loop explicit. Track data lineage and the fairness metric I intend to optimize. Red-team with distribution shift and known attacks. Prefer simple, documented systems when they meet the bar. Invite an outside audit before scale. Keep a short no-go list for high-stakes, low-ground-truth areas. Map incentives around the project so claims stay honest. This turns caution into a repeatable method I can actually use.
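The "sealed fresh split" step can be made concrete with a deterministic hash-based split, so an example's bucket never changes across runs or dataset versions. A minimal sketch; the ID scheme and the 80/20 ratio are my own illustrative assumptions, not the book's method:

```python
import hashlib

def split_bucket(example_id: str, test_frac: float = 0.2) -> str:
    """Assign an example to 'train' or 'test' deterministically by hashing
    its stable ID, so the assignment never drifts as the dataset grows."""
    h = int(hashlib.sha256(example_id.encode()).hexdigest(), 16)
    return "test" if (h % 100) < test_frac * 100 else "train"

# The same ID lands in the same bucket, run after run, machine after machine.
ids = [f"user-{i}" for i in range(1000)]
buckets = [split_bucket(i) for i in ids]
```

Because membership depends only on the ID, retraining on new data can never quietly move a test example into training.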
Reflection Prompts (product × design × engineering)
Questions to apply the ideas across projects. Pick one or two and use them today.
Task clarity
What exact decision will this model inform?
Fit for Purpose: State the action.
Baseline first
What simple heuristic or small model should we beat?
Occam’s Razor: Write the bar.
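A minimal sketch of "write the bar" (the class split and numbers are invented): on 90/10 data, the majority-class baseline already scores 0.9, so a model reporting 88% accuracy is below the bar.

```python
from collections import Counter

def majority_baseline(train_labels, test_labels):
    """Accuracy of always predicting the most common training label.
    Any model worth deploying should clearly beat this number."""
    guess = Counter(train_labels).most_common(1)[0][0]
    correct = sum(1 for y in test_labels if y == guess)
    return correct / len(test_labels)

# 90% of labels are 0, so the "dumb" baseline already scores 0.9.
train = [0] * 90 + [1] * 10
test = [0] * 45 + [1] * 5
print(majority_baseline(train, test))  # 0.9
```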
Fresh split
How do we guarantee no leakage in evaluation?
Error Hygiene: Seal the splits.
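One way to see why sealed splits matter (a toy sketch of mine, not from the book): with coin-flip labels there is nothing to learn, yet a memorizing model looks perfect the moment test rows leak in from training.

```python
import random
random.seed(42)

def nearest_label(x, memory):
    """A 1-nearest-neighbour 'model': it just memorizes training pairs."""
    return min(memory, key=lambda pair: abs(pair[0] - x))[1]

# Labels are pure coin flips: there is NOTHING to learn here.
train = [(random.random(), random.randint(0, 1)) for _ in range(300)]
fresh = [(random.random(), random.randint(0, 1)) for _ in range(100)]

# Leaky evaluation: the "test set" is a copy of training rows.
leaky = sum(nearest_label(x, train) == y for x, y in train[:100]) / 100
# Honest evaluation: rows the model has never seen.
clean = sum(nearest_label(x, train) == y for x, y in fresh) / 100
print(leaky, clean)  # leaky is 1.0; clean hovers near chance
```

The leaky score is a lie the pipeline tells itself; the fresh rows reveal there was never any signal.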
Shift check
How will this behave under distribution shift?
Distribution Shift: Hold out a slice.
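"Hold out a slice" in miniature (the threshold task and the drift of +40 are invented for illustration): a rule that is perfect in-distribution degrades badly when the features drift.

```python
def fit_threshold(points):
    """Pick the cutoff that best separates labels on the data we have."""
    best, best_acc = 0.0, 0.0
    for t in [p[0] for p in points]:
        acc = sum((x > t) == bool(y) for x, y in points) / len(points)
        if acc > best_acc:
            best, best_acc = t, acc
    return best

# In-distribution: label is 1 exactly when x > 50, features span 0..99.
train = [(x, int(x > 50)) for x in range(100)]
t = fit_threshold(train)

# Shifted slice: same labeling rule, but the features drifted up by 40.
shifted = [(x + 40, int(x > 50)) for x in range(100)]

in_acc = sum((x > t) == bool(y) for x, y in train) / len(train)
shift_acc = sum((x > t) == bool(y) for x, y in shifted) / len(shifted)
print(in_acc, shift_acc)  # 1.0 0.6
```

A held-out shifted slice turns "how will this behave under shift" from a hope into a number.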
Calibration
Are predicted probabilities aligned with reality?
Bayesian Updating: Bin and test.
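"Bin and test" as a tiny reliability table: group predictions by probability bin and compare the mean predicted probability with the observed frequency of positives. The overconfident-model numbers are invented:

```python
from collections import defaultdict

def calibration_table(preds):
    """Bin (probability, outcome) pairs and compare the average predicted
    probability in each bin with the observed rate of positives."""
    bins = defaultdict(list)
    for p, y in preds:
        bins[min(int(p * 10), 9)].append((p, y))
    return {
        b: (sum(p for p, _ in rows) / len(rows),   # mean predicted prob
            sum(y for _, y in rows) / len(rows))   # observed frequency
        for b, rows in sorted(bins.items())
    }

# An overconfident model: it says 0.9 but is right only 60% of the time.
preds = [(0.9, 1)] * 6 + [(0.9, 0)] * 4
table = calibration_table(preds)
```

A well-calibrated model would show the two columns roughly matching in every bin; here the 0.9 bin sits at an observed 0.6.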
Abstain path
Where should the system say “I don’t know”?
Risk Management: Design deferral.
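"Design deferral" in its smallest form (the 0.8 threshold is an arbitrary assumption): return a label only when confident, otherwise hand off to a human.

```python
def predict_or_defer(prob, threshold=0.8):
    """Return a label only when confident; otherwise defer to a human."""
    if prob >= threshold:
        return 1
    if prob <= 1 - threshold:
        return 0
    return "defer"

cases = [0.95, 0.55, 0.10, 0.70]
print([predict_or_defer(p) for p in cases])  # [1, 'defer', 0, 'defer']
```

The threshold becomes an explicit, tunable dial for how much risk the automated path absorbs versus how much work the human path receives.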
Fairness choice
Which fairness metric fits this context and why?
Trade-offs: Pick one.
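Why "pick one" matters (toy numbers of mine): two common definitions can disagree on the same predictions, so choosing a metric is a real trade-off, not a formality.

```python
def selection_rate(rows, group):
    """P(predicted positive | group): the demographic-parity quantity."""
    g = [pred for grp, pred, actual in rows if grp == group]
    return sum(g) / len(g)

def true_positive_rate(rows, group):
    """P(predicted positive | group, actually positive): equal opportunity."""
    g = [pred for grp, pred, actual in rows if grp == group and actual == 1]
    return sum(g) / len(g)

# Rows are (group, predicted, actual).
# Group A: 5 of 10 truly qualified, all 5 selected.
rows = [("A", 1, 1)] * 5 + [("A", 0, 0)] * 5
# Group B: 8 of 10 truly qualified, only 5 selected.
rows += [("B", 1, 1)] * 5 + [("B", 0, 1)] * 3 + [("B", 0, 0)] * 2

# Demographic parity holds: both groups are selected at rate 0.5...
print(selection_rate(rows, "A"), selection_rate(rows, "B"))
# ...while equal opportunity is violated: TPR 1.0 vs 0.625.
print(true_positive_rate(rows, "A"), true_positive_rate(rows, "B"))
```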
Provenance
Do we know sources, consent, and licenses for the data?
Chain of Custody: Document it.
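"Document it" can start as a small record that travels with every dataset version; all field names and values here are hypothetical placeholders, not a standard.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class DatasetRecord:
    """Lineage metadata that should accompany each dataset version."""
    name: str
    source: str
    license: str
    consent_basis: str
    collected_on: date

record = DatasetRecord(
    name="support-tickets-v3",
    source="internal helpdesk export",
    license="company-internal use only",
    consent_basis="employee data-processing agreement",
    collected_on=date(2024, 1, 15),
)
```

Freezing the record makes provenance append-only: a new dataset version gets a new record rather than a silent edit.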
Red team
What attack or edge case will we test next?
Threat Modeling: Schedule one.
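"Schedule one" can be as small as a perturbation list run against the current system; the naive keyword filter and the attack strings are invented for illustration.

```python
def naive_spam_filter(text):
    """Flags a message when it contains a blocked keyword verbatim."""
    return any(word in text.lower() for word in ("free money", "winner"))

attacks = [
    "FREE MONEY now",        # caught: lowercasing handles case tricks
    "fr ee money now",       # missed: a single space defeats substring match
    "w1nner takes all",      # missed: digit substitution
]
print([naive_spam_filter(t) for t in attacks])  # [True, False, False]
```

Each missed attack becomes the next item on the red-team backlog, which is exactly the cadence the prompt asks for.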
No-go list
Where will we refuse to deploy this system?
Barbell Strategy: Write the boundary.