Blog

What we're learning while building Weco

How Weco's Autoresearch Improved a Production Fraud Detection Pipeline

Case Study·5 min read

June 9, 2026

Vardera pointed Weco's autoresearch at their production fraud detection pipeline — F1 up 17.7%, precision up 18.9% — with every change shipping as auditable, version-controlled code.

Vayum Arora

An AI Agent Became the #1 Contributor in OpenAI's Hiring Challenge

Research·10 min read

June 3, 2026

Aiden spent 22 days inside OpenAI's Parameter Golf and became the competition's most influential contributor by records, citations, and public signal quality.

Weco Team

SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents

Research·10 min read

May 21, 2026

We built SpecBench to measure the gap between code that passes visible tests and code that actually satisfies the specification. Across 30 systems-level tasks, that gap is real, measurable, and grows with task complexity.

Bingchen Zhao, Dhruv Srikanth, Yuxiang Wu, Zhengyao Jiang

AutoResearch vs Classical Hyperparameter Tuning

Research·6 min read

April 2, 2026

We ran a head-to-head comparison on NanoChat: Optuna vs AutoResearch. The agent wins on sample efficiency, cost efficiency, and generalization.

Zhengyao Jiang, Bingchen Zhao

Weco Now Integrates with LangSmith

Product·3 min read

March 9, 2026

Point Weco at your LangSmith datasets and evaluators. It handles the search.

Weco Team

How to Build a Good Eval

Research·6 min read

February 19, 2026

Eval is not a tool you install. It's a way of thinking from experimental science. Learn what makes a good benchmark and how to avoid common pitfalls.

Zhengyao Jiang

Evolving a Flappy Bird AI: 100 Solutions Explored, 6.6x Improvement, $12

Case Study·5 min read

February 7, 2026

We gave Weco a simple Flappy Bird AI that scored 2.76 points on average. After 100 iterations and $12, the AI scored 20.9, a 6.6x improvement.

Weco Team

Weco Is Now in Public Beta

Product·4 min read

November 5, 2025

Hard work scales linearly. Automation scales exponentially.

Weco Team

Weco Raises $8M to Build Self-Evolving Software

Announcement·4 min read

July 30, 2025

Most AI tools can draft code that runs. None can deliver expert‑grade solutions that push the frontier.

Zhengyao Jiang, Yuxiang Wu

AIDE: Human-Level Performance on Data Science Competitions

Research·5 min read

April 4, 2024

In the world of data science, Kaggle competitions have become a widely accepted standard...

Dominik Schmidt, Zhengyao Jiang, Yuxiang Wu

The Future of Machine Learning Research

Research·3 min read

January 22, 2024

Research is a cornerstone in the quest to understand the world and tap into its economic values...

Zhengyao Jiang, Yuxiang Wu
