What we're learning while building Weco
June 9, 2026
Vardera pointed Weco's autoresearch at their production fraud detection pipeline — F1 up 17.7%, precision up 18.9% — with every change shipping as auditable, version-controlled code.
June 3, 2026
Aiden spent 22 days inside OpenAI's Parameter Golf and became the competition's most influential contributor by records, citations, and public signal quality.
May 21, 2026
We built SpecBench to measure the gap between code that passes visible tests and code that actually satisfies the specification. Across 30 systems-level tasks, that gap is real, measurable, and grows with task complexity.
April 2, 2026
We ran a head-to-head comparison on NanoChat: Optuna vs AutoResearch. The agent wins on sample efficiency, cost efficiency, and generalization.
March 9, 2026
Point Weco at your LangSmith datasets and evaluators. It handles the search.
February 19, 2026
Eval is not a tool you install. It's a way of thinking from experimental science. Learn what makes a good benchmark and how to avoid common pitfalls.
February 7, 2026
We gave Weco a simple Flappy Bird AI that scored 2.76 points on average. 100 iterations and $12 later, the AI scored 20.9 - a 6.6x improvement.
November 5, 2025
Hard work scales linearly. Automation scales exponentially.
July 30, 2025
Most AI tools can draft code that runs. None can deliver expert‑grade solutions that push the frontier.
April 4, 2024
In the world of data science, Kaggle competitions have become a widely accepted standard...
January 22, 2024
Research is a cornerstone in the quest to understand the world and tap into its economic values...
Join our community for real-time updates, insights, and discussions: