Evolving a Flappy Bird AI: 100 Solutions Explored, 7.6x Improvement, $12

We gave Weco a simple Flappy Bird AI that scored 2.76 points on average. 100 iterations and $12 later, the AI scored 20.9, a 7.6x improvement, with no human guidance beyond the initial setup. Here's how it works.
The Setup
Flappy Bird is a perfect testbed for code optimization. The game is simple (tap to flap, dodge pipes, don't die) but writing a controller that consistently scores high is surprisingly hard. Pipe gaps vary in height, the bird's physics create non-trivial dynamics (gravity, terminal velocity, flap impulse), and a single misjudgment means game over.
We wrote a baseline controller in Python: a should_flap(observation) function that takes the bird's position, velocity, and pipe locations as input and returns True or False. The baseline uses a simple strategy: compute a safe zone within the next pipe gap, and flap if the bird is below it. It works, barely. Average score: 2.76 across 30 evaluation seeds.
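The post doesn't reproduce the baseline source, but from that description it looks roughly like this (a minimal sketch; the observation fields are our assumptions, and the fixed 19px margin is the one mentioned later in the post):

```python
def should_flap(observation):
    """Reactive baseline: flap whenever the bird drops below a safe
    zone inside the next pipe gap. A sketch based on the description
    above; the observation field names are assumed."""
    bird_y = observation["bird_y"]                    # screen y grows downward
    gap_bottom = observation["next_pipe_gap_bottom"]

    # Safe zone: the gap shrunk by a fixed margin; its lower edge
    # is the flap trigger
    margin = 19.0
    safe_bottom = gap_bottom - margin

    # Purely reactive: flap if the bird is below the safe zone
    return bird_y > safe_bottom
```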
Then we pointed Weco at it. After running pip install weco && weco setup, we opened Claude Code and typed:
```
/weco optimize the flappy bird controller
```
The Weco agent skill takes it from there: it figures out how to evaluate the code (average score over 30 game seeds), then kicks off the optimization run. No reward shaping, no architecture search, no hyperparameter tuning. Just "here's the code, make it better."
How Weco Works
Weco uses evaluation-driven code optimization, a tree-search algorithm that treats code as the search space and evaluation metrics as the objective. At each step, Weco:
- Selects a promising parent solution from the search tree
- Generates a mutated variant using an LLM
- Evaluates the variant against the metric (here, average game score over 30 seeds)
- Updates the tree with the result (see the sketch below)
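A minimal sketch of that loop (illustrative only; Weco's actual selection policy and interfaces aren't shown in this post):

```python
import random

def optimize(baseline, evaluate, llm_mutate, iterations=100):
    """Evaluation-driven tree search over code variants (a sketch;
    evaluate() returns the metric, here average game score over 30
    seeds, and llm_mutate() asks an LLM for a modified version)."""
    tree = [{"code": baseline, "score": evaluate(baseline)}]
    for _ in range(iterations):
        # Select a promising parent (tournament selection here;
        # Weco's real policy is an assumption on our part)
        parent = max(random.sample(tree, min(3, len(tree))),
                     key=lambda n: n["score"])
        # Generate a mutated variant using an LLM
        variant = llm_mutate(parent["code"])
        # Evaluate it against the metric; variants that crash score
        # -inf and survive only as dead branches
        try:
            score = evaluate(variant)
        except Exception:
            score = float("-inf")
        # Update the tree with the result
        tree.append({"code": variant, "score": score})
    return max(tree, key=lambda n: n["score"])
```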
This is different from single-shot code generation. Instead of asking an LLM to write the best controller it can in one try, Weco evolves the code over many iterations, keeping what works and discarding what doesn't. Each iteration builds on the best solution found so far.
The Search Tree
Over 100 iterations, Weco explored a tree of 101 solutions. Not all branches led somewhere useful: some mutations broke the code entirely; others made the AI worse. But the successful mutations compounded.
Explore the full search tree interactively →
The best path through the tree passed through 11 nodes, with each breakthrough introducing a qualitatively different strategy:
- Step 0 → 6 (score 3.2 → 4.1): Cleaned up the baseline, tightened margins
- Step 6 → 18 (4.1 → 9.9): Added trajectory prediction, simulating the bird's future position before deciding to flap
- Step 18 → 27 (9.9 → 16.9): Introduced look-ahead to the next pipe, not just the current one, blending targets based on distance
- Step 27 → 84 (16.9 → 20.9): Refined adaptive margins, descent handling for downward pipe transitions, and velocity dampening
Optimization progress over 100 iterations. Green dots mark breakthroughs on the best path.
What Changed in the Code
The baseline controller was 71 lines (mostly comments explaining game physics). The optimized controller is 82 lines of pure logic. Here are the key differences:
1. Trajectory Prediction
The baseline just checks "am I below the target?", a reactive strategy. The optimized version simulates the bird's physics forward by 2–4 frames (depending on distance to the pipe) and acts on where the bird will be, not where it is now.
```python
# Predict the bird's position a few frames ahead; bird_y,
# bird_velocity, GRAVITY, and prediction_frames come from the
# surrounding controller
predicted_y = bird_y
predicted_v = bird_velocity
for _ in range(prediction_frames):
    predicted_v = min(predicted_v + GRAVITY, 10.0)  # cap at terminal velocity
    predicted_y += predicted_v
```
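The prediction then replaces the bird's current position in the flap decision. Paraphrasing (target_gap_cy and margin are defined in the next two sections):

```python
# Flap when the predicted position drifts below the target zone
# (our paraphrase of the decision step, not the exact code)
return predicted_y > target_gap_cy + margin
```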
2. Look-ahead Blending
Instead of only targeting the current pipe gap, the optimized AI blends its target between the current and next pipe. When far away, it weights the next pipe more heavily (60%), preparing for the transition. When close, it focuses entirely on the current pipe.
```python
# Dynamic target blending based on distance to the current pipe
if next_pipe_dx > 150:
    # Far away: weight the upcoming pipe's gap more heavily
    target_gap_cy = 0.4 * next_pipe_gap_cy + 0.6 * after_pipe_gap_cy
elif next_pipe_dx > 80:
    # Mid range: blend linearly as the pipe approaches
    blend = (next_pipe_dx - 80) / 70
    target_gap_cy = (1 - blend) * next_pipe_gap_cy + blend * after_pipe_gap_cy
else:
    # Close: focus entirely on the current pipe
    target_gap_cy = next_pipe_gap_cy
```
3. Adaptive Margins
The baseline uses a fixed margin of 19 pixels. The optimized version tightens the margin when close to a pipe (12px) and widens it when far away (22px), giving the bird more room to maneuver when it has time, and precision when it needs it.
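In code this might look like the following sketch; the 12px and 22px values come from the run, but the distance cutoffs and the mid-range value are our assumptions:

```python
# Margin adapts to distance from the pipe; 12px/22px match the
# description above, the cutoffs and mid value are assumed
if next_pipe_dx < 60:
    margin = 12.0   # close: precision through the gap
elif next_pipe_dx > 140:
    margin = 22.0   # far: room to maneuver
else:
    margin = 19.0   # mid range: the baseline's old fixed margin
```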
4. Descent Handling
When the next pipe gap is significantly lower than the current one, the optimized AI recognizes the downward transition and adjusts its velocity dampening thresholds. This prevents the common failure mode of overshooting upward when approaching a lower gap.
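The post doesn't show this logic, but it might look roughly like the following; every name and threshold here is illustrative:

```python
# Illustrative only: names and thresholds are not from the actual code.
# Detect a downward transition: the upcoming gap sits well below the
# current one (screen y grows downward, so a larger cy means lower)
descending = after_pipe_gap_cy - next_pipe_gap_cy > 40

# Loosen the velocity-dampening threshold during the transition so
# the bird is allowed to fall faster instead of overshooting upward
max_fall_speed = 7.0 if descending else 5.0
```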
None of these strategies were specified in advance. Weco discovered them through iterative mutation and evaluation, the same way biological evolution discovers adaptations, but in minutes instead of millennia.
Results
| Metric | Baseline | Optimized |
|---|---|---|
| Average score (30 seeds) | 2.76 | 20.9 |
| Max score (best seed) | 11 | 88 |
| Controller lines | 71 (mostly comments) | 82 (pure logic) |
| Strategy | Reactive safe-zone targeting | Predictive + look-ahead + adaptive |
Total cost: ~$12. Total time: ~30 minutes. The optimization ran 100 iterations, each calling an LLM to generate a code variant and then evaluating it against 30 game seeds.
Try It Yourself
Weco isn't just for games. The same evaluation-driven optimization works on any code with a measurable metric: model training, data pipelines, API performance, prompt engineering.
```bash
pip install weco && weco setup
```
This installs a Weco agent skill into your coding agent (Claude Code, Cursor, or any MCP-compatible agent). Then just type /weco optimize my code and the agent walks you through everything, including building an eval if you don't have one yet.
You get $20 in free credits to start, or bring your own API key. Check out the documentation to get started.


