When Your Team Always Votes the Same Number

May 16, 2026 · 7 min read

A scrum master noticed something odd. Over the previous six sprints, the team had estimated 47 stories. Forty-three of them came in at 3 or 5 points. None higher than 8. None lower than 2.

They told themselves their team was well-calibrated.

Until they tried this: before the next planning meeting, they sent each developer a private Google Form. “Read these 8 stories. Vote 1–13 on each. Don’t discuss with anyone.”

The private votes were wildly different. One developer thought story #3 was a 13 — “the auth flow needs rewriting”. Another thought it was a 2 — “just add a field”. The variance across the team was huge.

In the actual planning meeting that afternoon? Story #3 was estimated as a 5. Five-five-five-five-five. The developer who’d privately voted 13 had stayed quiet. The one who’d voted 2 had bumped to 5 because “if everyone else thinks 5…”

That’s the failure mode this post is about. It’s invisible by design — the team thinks they’re estimating; they’re actually polling.

Symptoms — how to know

You probably have a convergence problem if:

  • The last 20 stories your team estimated are mostly 3s, 5s, and 8s
  • You haven’t given out a 13 (or whatever “this is huge, break it down” looks like in your scale) in months
  • Sprint planning takes 20 minutes when the Scrum Guide allows up to 8 hours for a one-month sprint
  • Mid-sprint surprises happen on ~half your stories, but retros never connect them back to the original estimate
  • The same developer always says “I’ll go first” and votes are roughly clustered around their card
  • The room has a “this is a 5, right?” person and nobody has ever said no

Each one of these on its own can be benign. Three or more and you’re doing consensus theater.
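The first symptom is easy to check against a backlog export. Here is a minimal sketch, assuming you can pull recent estimates as a plain list of numbers; the function name `top_cards_share` and the sample history are illustrative, not taken from any particular tool.

```python
from collections import Counter


def top_cards_share(estimates, top_n=2):
    """Fraction of estimates that landed on the top_n most common cards."""
    if not estimates:
        raise ValueError("need at least one estimate")
    counts = Counter(estimates)
    top = sum(count for _, count in counts.most_common(top_n))
    return top / len(estimates)


# Hypothetical history: the last 20 estimates for one team.
recent = [3, 5, 5, 3, 5, 8, 3, 5, 5, 3, 5, 3, 8, 5, 3, 5, 5, 3, 5, 2]
share = top_cards_share(recent)
print(f"{share:.0%} of estimates sit on the two most common cards")
```

A high share is not proof of anchoring by itself. Treat it as one symptom to weigh alongside the others in the list.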

Why teams converge — three biases, one tool problem

This isn’t your team being lazy. It’s how human group estimation breaks unless you actively guard against it.

  • Anchoring: the first number named becomes the reference point everyone else adjusts from. Even if every other voter “knows” the first number was wrong, their internal scale has already shifted toward it. Tversky and Kahneman demonstrated this in the 1970s and it has never gotten easier.
  • Social proof: humans default to assuming the group is right. “If five other people are voting 5, my 8 is probably wrong” is the cognitive default, not the exception.
  • Authority bias: when the senior developer or the tech lead votes, juniors quietly converge to it. The hierarchy doesn’t disappear just because the voting cards say everyone is equal.
  • The tool problem: many planning poker tools reveal votes one at a time, in the order they were submitted. The first card visible to the room becomes the anchor. Some tools even let you see who voted what. That’s anchoring + authority bias in one UI.

The combination is what makes this so durable. Each bias is small on its own; together they reliably crush variance in the room.

What convergence costs

Anchored estimates aren’t just wrong on individual stories. They corrode the whole system.

  • Trust erodes invisibly. Your velocity says you do 35 points a sprint. You ship 35 points a sprint. Everyone thinks the team is well-calibrated. Then a 5-point story takes two weeks and the room is shocked. The story was actually a 13 and one developer knew it on day one; they were just outvoted by silence.
  • Quiet dissent accumulates. The developer who knew that story was a 13 isn’t surprised. They’re resentful. They told you, with a vote nobody saw, and got steamrolled. Do this for six months and you lose them.
  • The learning loop dies. When estimates always cluster, there’s no signal in the retro about why something blew up. You can’t pattern-match on “stories that surprised us were all about authentication” if nothing ever surprises you on paper.
  • Sprint goals get distorted. Three “8-point” stories in a sprint might actually be a 5, a 5, and a 13. The team commits to all three, ships two, and is mystified.

How to diagnose

Try the private-vote test from the opener. It’s the cheapest, fastest diagnostic for this problem.

Send a Google Form, a Slack DM round-robin, anything that gets each developer to commit a number without seeing anyone else’s. Compare the private distribution to the meeting distribution.

  • Private votes spread across 4+ cards, meeting votes cluster on 1–2 cards → you have a clear anchoring problem.
  • Private and meeting votes look the same → either your team really is calibrated, or your private-test format leaked. Run it again on a different sample.
  • Private votes also cluster on 1–2 cards → the team has converged so deeply they no longer have private disagreements either. This is worse than anchoring; it’s loss of independent thinking. Time for fresh perspectives.
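The three outcomes above can be sketched as a small classifier for one story. This is an illustration under stated assumptions: "spread" is counted as the number of distinct cards, the `wide` and `narrow` thresholds mirror the "4+ cards" and "1–2 cards" rules of thumb, and "look the same" is approximated as card counts that differ by at most one.

```python
def distinct_cards(votes):
    """Number of different cards that appear in a list of votes."""
    return len(set(votes))


def diagnose(private_votes, meeting_votes, wide=4, narrow=2):
    """Classify one story using the three outcomes listed above."""
    private_spread = distinct_cards(private_votes)
    meeting_spread = distinct_cards(meeting_votes)

    if private_spread >= wide and meeting_spread <= narrow:
        return "anchoring"
    if private_spread <= narrow and meeting_spread <= narrow:
        return "converged in private too"
    if abs(private_spread - meeting_spread) <= 1:
        # Assumption: "look the same" means the card counts differ by at most one.
        return "same shape: calibrated, or the private test leaked"
    return "inconclusive"


# Hypothetical votes for one story from five developers.
print(diagnose([13, 2, 5, 8, 3], [5, 5, 5, 5, 5]))  # prints "anchoring"
```

Run it per story rather than on pooled votes, since pooling hides exactly the per-story disagreement the test is meant to expose.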

Structural fixes — what your tool should do

Most of the fight against anchoring is won or lost at the tool level. Look for:

  1. Simultaneous reveal, not arrival-order reveal. Every vote stays hidden until the host explicitly reveals all of them at once. No “first card flipped” moment.
  2. Server-side hiding. Some tools hide votes on the client, which means a curious developer with browser dev-tools can peek. Real hiding means the votes don’t leave the server until reveal.
  3. Anonymous-vote distribution display. After reveal, the tool shows the distribution (one 2, three 5s, one 8) without showing who voted what. That removes authority bias from the discussion. Discuss the spread, not the people. AgileDeck does this.
  4. Voting locks per round. Once you’ve voted, you can’t change your vote until the host re-opens. Stops the “wait, I want to change mine” leak where someone sees a hint and edits down.
  5. Force-discussion-before-re-vote on large spreads. If the spread is 2+ cards, the tool should make the high and low voter speak before allowing a re-vote. Some tools just instantly re-open, which encourages people to silently retreat to the cluster.
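To make the five properties concrete, here is a rough sketch of a single round as server-side state. It is not how any specific tool is implemented; the `Round` class, the `DECK` values, and the choice to measure spread in deck positions are all assumptions made for illustration.

```python
from collections import Counter

DECK = [1, 2, 3, 5, 8, 13]  # assumed card deck


class Round:
    """One estimation round; votes live only in this server-side object."""

    def __init__(self):
        self._votes = {}  # voter name -> card
        self.revealed = False

    def vote(self, voter, card):
        if card not in DECK:
            raise ValueError(f"{card} is not a card in the deck")
        if self.revealed or voter in self._votes:
            # Voting lock: no edits until the host starts a new round.
            raise RuntimeError("vote is locked for this round")
        self._votes[voter] = card

    def public_state(self):
        """What clients may see before reveal: how many voted, never what."""
        return {"votes_in": len(self._votes), "revealed": self.revealed}

    def reveal(self):
        """Reveal every vote at once, as an anonymous distribution."""
        if not self._votes:
            raise RuntimeError("nobody has voted yet")
        self.revealed = True
        distribution = Counter(self._votes.values())
        cards = sorted(distribution)
        # Assumption: spread is measured in deck positions between the
        # lowest and highest card played.
        spread = DECK.index(cards[-1]) - DECK.index(cards[0])
        return {
            "distribution": {card: distribution[card] for card in cards},
            "needs_discussion": spread >= 2,
        }


planning_round = Round()
for voter, card in [("ana", 2), ("ben", 5), ("cy", 5), ("dee", 5), ("eli", 8)]:
    planning_round.vote(voter, card)
print(planning_round.public_state())
print(planning_round.reveal())
```

The point of the design is that `public_state` is the only thing a client can fetch before reveal, so there is nothing to peek at in browser dev-tools. In this sketch, "re-opening" is modelled as creating a fresh `Round` once the high and low voters have spoken.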

Behavioural fixes — what your process should do

Even with a great tool, the team’s habits decide whether disagreement actually surfaces.

  • Senior people vote last (or anonymously). Flip the authority bias. If your tool supports it, hide the names entirely.
  • Don’t aim for consensus, aim for understanding. The goal isn’t five people voting 5. It’s five people who can each independently explain why this story is a 5, and who would notice if it became an 8.
  • The “anyone changing their vote?” pause. Before locking in, the host explicitly asks “based on the discussion, anyone changing their estimate?” with a 30-second silence. Saying yes has to be cheap.
  • Track estimate-vs-actual on a small sample. Every sprint, pick 3 stories and write down predicted vs actual effort. Don’t measure the team on accuracy — measure the team on variance. Wide variance with honest estimates is healthier than narrow variance with anchored ones.
  • Periodically rotate who votes first. Or use a tool that randomises card-reveal order to remove any sequence-based cues.
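The estimate-vs-actual habit above needs very little tooling. A minimal sketch, assuming you record the cards at first reveal and re-score the story in points after delivery; the `TrackedStory` fields and the sample stories are hypothetical, and population standard deviation is just one reasonable way to express vote spread.

```python
from dataclasses import dataclass
from statistics import pstdev


@dataclass
class TrackedStory:
    name: str
    votes: list[int]  # cards at first reveal, before any discussion
    estimate: int     # number the team settled on
    actual: int       # points re-scored after delivery (assumption)

    @property
    def vote_spread(self) -> float:
        """Population standard deviation of the first-reveal votes."""
        return pstdev(self.votes)

    @property
    def surprise(self) -> float:
        """How many times larger the work turned out than estimated."""
        return self.actual / self.estimate


# Hypothetical sample of three stories from one sprint.
sample = [
    TrackedStory("add profile field", [2, 5, 5, 5, 8], estimate=5, actual=5),
    TrackedStory("auth flow change", [5, 5, 5, 5, 5], estimate=5, actual=13),
    TrackedStory("copy update", [1, 1, 2, 1, 1], estimate=1, actual=1),
]
for story in sample:
    print(f"{story.name}: spread={story.vote_spread:.2f}, surprise={story.surprise:.1f}x")
```

The row to bring to the retro is the one with zero spread and a large surprise: nobody disagreed on paper, yet the story blew up.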

When convergence is genuinely good

Not every clustered vote is a failure. Three legitimate cases:

  • The team has done many similar stories. CRUD against a familiar codebase, dependency bumps, copy changes — the variance space is genuinely small, and shared past experience produces real convergence.
  • The story is genuinely tiny. A typo fix, a single-line bug, a config change. There’s not much room for opinions on size.
  • You’ve passed the calibration phase. Mature teams who’ve estimated together for years on stable codebases do converge, and they earn it by being right when it matters.

The test is whether the convergence is explained or defaulted. If you ask any team member “why isn’t this an 8?” and they can articulate the answer, you’re fine. If the answer is “because no one else voted 8”, you’re not.

Bottom line

Planning poker is not a polling tool. It’s a calibration tool. The whole point is to surface disagreement, talk about why people see the story differently, and converge — if at all — on shared understanding. If your team converges before the discussion happens, the tool has done nothing.

Pick a tool with server-side hidden votes, simultaneous reveal, and anonymous distribution. Train the team to argue from numbers, not from defaults. Run the private-vote diagnostic next sprint. Then watch what your real spread looks like — and use that as the starting point for the estimate that actually matters.


Last reviewed: 2026-05-16. Want to try this in practice? Use our free planning poker tool — real-time story point estimation for scrum teams, no signup needed.

Frequently asked questions

Is it OK if my team always votes the same number?

Only if the agreement was earned, not forced. If everyone understands the story the same way and has independent reasons for the same number, convergence is a healthy signal. If they're voting the same because the first vote anchored them or because they don't want to disagree publicly, you're not estimating — you're polling. Test it by having each person privately write their estimate before any discussion; if the private votes vary widely but the public vote converges, you have an anchoring problem.

How do I get my team to actually disagree in planning poker?

Three things, in order. (1) Use a tool that hides every vote until the host reveals (server-side, not client-side) so no one anchors on the first card flipped. (2) When the spread is wide, have the high and low voters talk before any re-vote; if the spread is zero, ask one person to argue the opposite side for 30 seconds. (3) Have the most senior person vote last or anonymously. The goal is to surface disagreement that's already in the room, not to manufacture artificial conflict.

What story-point scale prevents anchoring?

No scale prevents anchoring on its own — it's a process problem more than a scale problem. That said, narrower scales (T-shirt sizes XS-XL, or Fibonacci capped at 8) tend to surface disagreement faster because the difference between 'M' and 'L' is meaningful in a way the difference between '5' and '8' often isn't. Pick a scale your team can defend the differences in. If no one can articulate why a 5 is different from an 8, the granularity is fake.
