BonVivant Blog

LLM-as-Judge with Feedback Loops: How We Got an 85% Rejection Rate Down to 38%

Building a self-correcting editorial pipeline for venue content at BonVivant.

June 15, 2026 · 7 min read

## The Problem

We run an 8-stage AI enrichment pipeline that processes venue data into editorial content. Stage 03 (Writer) uses Claude Sonnet to generate long-form venue descriptions. Stage 07 (Judg...

| Axis | Threshold | Weight | What it measures |
| --- | --- | --- | --- |
| Factual | 0.85 | 35% | Does the editorial contradict Google data? |
| Voice | 0.80 | 25% | Does it match the neighborhood voice profile? |
| Differentiation | 0.80 | 25% | Could this body be swapped with a competitor's? |
| SEO | 0.60 | 15% | Uniqueness from Google blurb, keyword richness |
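To make the table concrete, here is a minimal sketch of how per-axis thresholds and weights like these could be combined into a verdict. The thresholds and weights come from the table; everything else is an assumption for illustration, not the pipeline's actual code: the `Axis` and `judge` names are hypothetical, and so is the rule that a piece passes only when every axis meets its own threshold (with the weighted sum reported alongside).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Axis:
    threshold: float  # minimum acceptable score on this axis (from the table)
    weight: float     # share of the weighted overall score (from the table)


# Values copied from the table above; the dictionary keys are illustrative.
AXES = {
    "factual": Axis(threshold=0.85, weight=0.35),
    "voice": Axis(threshold=0.80, weight=0.25),
    "differentiation": Axis(threshold=0.80, weight=0.25),
    "seo": Axis(threshold=0.60, weight=0.15),
}


def judge(scores: dict[str, float]) -> dict:
    """Assumed rule: pass only if every axis meets its threshold.

    The weighted overall score is computed for reporting; how the real
    Judge stage combines the two is not stated in this post.
    """
    failed = [name for name, axis in AXES.items() if scores[name] < axis.threshold]
    overall = sum(scores[name] * axis.weight for name, axis in AXES.items())
    return {"passed": not failed, "failed_axes": failed, "overall": round(overall, 4)}


if __name__ == "__main__":
    # Hypothetical scores: strong everywhere except voice.
    print(judge({"factual": 0.90, "voice": 0.59, "differentiation": 0.85, "seo": 0.70}))
```

Under this assumed rule, a single weak axis is enough to reject a piece even when the weighted score looks reasonable, which is one way a voice problem alone could account for most rejections.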

| Metric | Before | After |
| --- | --- | --- |
| Pass rate (Little Italy, n=20) | 10% (2/20) | 37.5% (6/16) |
| Voice avg score (rejected) | 0.59 | 0.64 |
| Primary failure axis | Voice (100%) | Voice (still, but improving) |
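The pass-rate row can be checked directly from the counts in the table. This is a small arithmetic sketch; the `pass_rate` helper is a hypothetical name, and the only inputs are the 2/20 and 6/16 counts reported above.

```python
def pass_rate(passed: int, total: int) -> float:
    """Fraction of judged venues that passed."""
    return passed / total


# Counts taken from the table: 2 of 20 before, 6 of 16 after.
before = pass_rate(2, 20)
after = pass_rate(6, 16)

print(f"before: pass {before:.1%}, reject {1 - before:.1%}")
print(f"after:  pass {after:.1%}, reject {1 - after:.1%}")
print(f"change in pass rate: {(after - before) * 100:.1f} percentage points")
```

Note that the two runs have different denominators (20 versus 16 venues), so the comparison is between rates rather than raw counts.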

The Problem

The Architecture

Stage 02: Classifier

Stage 03: Writer

Stage 07: Judge

Diagnosing the 85% Reject Rate

Issue 1: Voice Profile Mismatch

Issue 2: Differentiation Without Context

The Fix: Three Changes

1. Voice as Primary Instruction

2. Non-Dominant Cuisine Guidance

3. Longer Competitor Context

The Feedback Loop

Results

What We Learned

Architecture Diagram

Stack
