Blog

What we're learning while building Weco

An AI Agent Became the #1 Contributor in OpenAI's Hiring Challenge
Research·10 min read

June 3, 2026

Aiden spent 22 days inside OpenAI's Parameter Golf and became the competition's most influential contributor by records, citations, and public signal quality.

Weco Team

SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents
Research·10 min read

May 21, 2026

We built SpecBench to measure the gap between code that passes visible tests and code that actually satisfies the specification. Across 30 systems-level tasks, that gap is real, measurable, and grows with task complexity.

Bingchen Zhao, Dhruv Srikanth, Yuxiang Wu, Zhengyao Jiang

AutoResearch vs Classical Hyperparameter Tuning
Research·6 min read

April 2, 2026

We ran a head-to-head comparison on NanoChat: Optuna vs AutoResearch. The agent wins on sample efficiency, cost efficiency, and generalization.

Zhengyao Jiang, Bingchen Zhao

Weco Now Integrates with LangSmith
Product·3 min read

March 9, 2026

Point Weco at your LangSmith datasets and evaluators. It handles the search.

Weco Team

How to Build a Good Eval
Research·6 min read

February 19, 2026

Eval is not a tool you install; it's a way of thinking borrowed from experimental science. Learn what makes a good benchmark and how to avoid common pitfalls.

Zhengyao Jiang

Evolving a Flappy Bird AI: 100 Solutions Explored, 6.6x Improvement, $12
Case Study·5 min read

February 7, 2026

We gave Weco a simple Flappy Bird AI that scored 2.76 points on average. After 100 iterations and $12, it averaged 20.9 points, a 6.6x improvement.

Weco Team

Weco Is Now in Public Beta
Product·4 min read

November 5, 2025

Hard work scales linearly. Automation scales exponentially.

Weco Team

Weco Raises $8M to Build Self-Evolving Software
Announcement·4 min read

July 30, 2025

Most AI tools can draft code that runs. None can deliver expert‑grade solutions that push the frontier.

Zhengyao Jiang, Yuxiang Wu

AIDE: Human-Level Performance on Data Science Competitions
Research·5 min read

April 4, 2024

In the world of data science, Kaggle competitions have become a widely accepted standard...

Dominik Schmidt, Zhengyao Jiang, Yuxiang Wu

The Future of Machine Learning Research
Research·3 min read

January 22, 2024

Research is a cornerstone in the quest to understand the world and tap into its economic value...

Zhengyao Jiang, Yuxiang Wu