Filter Heads: How LLMs Naturally Select From Lists

Large Language Models don’t just “guess the next token.” Inside, they develop reusable micro-skills—one of which is filtering items in a list based on a condition. Recent research shows that a small set of attention heads (dubbed filter heads) encode the predicate (e.g., “is this a fruit?”) and then attend to the matching items. Think of it like a built-in filter(predicate, list) from functional programming: the model first reads the collection, then applies the predicate, and finally reduces (count, pick, etc.).
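The filter/reduce analogy above can be made concrete with a toy sketch in Python. The item list and the fruit set here are illustrative stand-ins for the predicate, not anything the model actually stores:

```python
# A toy analogy of the model's internal routine: read list, apply predicate, reduce.
items = ["apple", "cat", "banana", "car"]
fruits = {"apple", "banana", "cherry"}  # hypothetical predicate data

# "filter": keep the items matching the predicate
kept = [x for x in items if x in fruits]

# "reduce": collapse the kept items into a single answer
answer = f"{len(kept)} fruits: {', '.join(kept)}"
print(answer)  # prints "2 fruits: apple, banana"
```

The claim is that filter heads play the role of the list comprehension here: they mark which items satisfy the predicate, and a later step reduces the marked set to an answer.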
For developers, the punchline is practical: present your data as a clean list first, then state the filtering request. This sequence reliably activates the model’s internal filtering routine and reduces messy cross-talk in long prompts. The same heads tend to live in mid layers and generalize across presentation formats and even languages, which explains why consistent formatting improves stability across tasks.
Why It Matters in Everyday Work
- RAG: format retrieved chunks as bullet lists, then ask to keep only items matching a criterion; you’ll see fewer irrelevant inclusions.
- Reasoning chains: treat hypothesis screening as an explicit list-filter step before concluding.
- Eval prompts: separate "collection" and "instruction" blocks to reduce subtle hallucinations when context is long.
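For the RAG case, the collection/instruction separation is easy to enforce in code. A minimal sketch, where `build_filter_prompt` is a hypothetical helper (not from any library):

```python
def build_filter_prompt(chunks, criterion):
    """Render retrieved chunks as a numbered list, then state the filter task.

    Hypothetical helper for illustration: the collection comes first,
    the instruction comes second, in separate blocks.
    """
    numbered = "\n".join(f"{i}) {c}" for i, c in enumerate(chunks, start=1))
    return (
        f"Items:\n{numbered}\n\n"
        f"Task: Keep only the items that {criterion}. "
        "Return the kept items as a comma-separated list."
    )

prompt = build_filter_prompt(
    ["Q3 revenue grew 12%", "The office moved to Berlin", "Margins fell 2 points"],
    "mention a financial metric",
)
print(prompt)
```

Keeping the retrieved chunks in one block and the criterion in another mirrors the "collection first, predicate second" sequence the filter-head research suggests the model prefers.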
 
BAD Prompt
Which of these are fruits: apple, cat, banana, car? Also explain briefly.
The model parses question, list, and extra request at once, which can diffuse attention.
GOOD Prompt
Items:
1) apple
2) cat
3) banana
4) car
Task: Keep only the fruits. Return the kept items as a comma-separated list.
You’re guiding the transformer to run its internal filter, then a small “reduce.”
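A practical side benefit of the GOOD prompt: because it pins down the output format, the reply is trivially machine-parseable. A sketch, assuming the model answers with exactly the kept items:

```python
def parse_kept_items(reply):
    """Split a comma-separated model reply back into a Python list.

    Assumes the reply contains only the kept items, as the GOOD prompt requests.
    """
    return [part.strip() for part in reply.split(",") if part.strip()]

parse_kept_items("apple, banana")  # returns ["apple", "banana"]
```

This is another reason to state the output shape explicitly: the filter step and the parsing step then agree on a contract.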
Extra Tips
- Use uniform markers (1), 2), or -) and a consistent item shape.
- Put the question after the options; this empirically aligns with the lazy "filter-then-answer" route many LLMs take.
- When you need a count or a first/last item, say so explicitly after the filter step.
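The tips above compose into a single prompt builder. The helpers below are hypothetical names, sketched to show the ordering: uniform markers, options before the question, and an explicit reduce step at the end:

```python
def format_items(items):
    """Render one item per line with a uniform 'n)' marker."""
    return "\n".join(f"{i}) {item}" for i, item in enumerate(items, start=1))

def filter_then_count_prompt(items, predicate_text):
    """List first, filter instruction second, explicit reduce ('count') last."""
    return (
        f"Items:\n{format_items(items)}\n"
        f"Task: Keep only the items that {predicate_text}.\n"
        "Then report how many items you kept."
    )

print(filter_then_count_prompt(["apple", "cat", "banana"], "are fruits"))
```

Splitting the filter instruction and the count instruction into separate sentences keeps each step unambiguous, matching the filter-then-reduce sequence described above.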
 
The broader insight: LLMs accumulate reusable, composable skills. By formatting prompts to look like mini-programs (list → filter → reduce), you’re calling the model’s internal API rather than hoping it guesses your intent.
