Weco Now Integrates with LangSmith

LangSmith is great for measuring agent quality. But once you have the evals in place, the actual improvement loop is still mostly manual.
Most teams using LangSmith already have the foundation for optimization: a dataset, a target
function, and evaluators that score outputs. What still takes time is the part between
evaluations. Someone reads failures, forms a hypothesis, edits a prompt or some code, runs
evaluate() again, and repeats.
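That manual loop has a simple shape, which the following sketch makes concrete. The `evaluate_variant` and `propose_edit` helpers are hypothetical stand-ins for a real LangSmith `evaluate()` run and a human prompt edit; only the structure of the loop is the point.

```python
# Schematic of the manual improvement loop. evaluate_variant and
# propose_edit are stand-ins for a real LangSmith evaluate() call
# and a human editing a prompt by hand.

def evaluate_variant(prompt: str) -> float:
    """Stand-in scorer: rewards prompts that mention key behaviors."""
    score = 0.5
    if "cite sources" in prompt:
        score += 0.2
    if "concise" in prompt:
        score += 0.1
    return score

def propose_edit(prompt: str, attempt: int) -> str:
    """Stand-in for a human hypothesis: append one tweak per attempt."""
    edits = [" Always cite sources.", " Keep answers concise.", " Use bullet points."]
    return prompt + edits[attempt % len(edits)]

prompt = "Answer the user's HR question."
best_prompt, best_score = prompt, evaluate_variant(prompt)
for attempt in range(3):
    candidate = propose_edit(best_prompt, attempt)
    score = evaluate_variant(candidate)
    if score > best_score:  # keep only improvements
        best_prompt, best_score = candidate, score
```

Every pass through this loop is a person reading results, guessing, and re-running; that is the part Weco automates.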
Weco now integrates directly with LangSmith to automate that loop. Point Weco at your existing dataset and evaluators, and it iteratively searches for better code and prompt changes between evaluations while tracking the best-performing version it finds.
How It Works
Instead of writing a separate shell eval script or rebuilding your scoring logic somewhere else, you tell Weco which LangSmith dataset to evaluate against, which function to call, and which evaluators to use. Weco then modifies your source code, runs the LangSmith evaluation, compares scores, and keeps iterating.
The practical effect is that the LangSmith setup you already trust for measurement becomes the thing Weco optimizes against. You do not have to create a second evaluation pipeline just to get an optimization loop.
Why It Matters
Teams spend real time building evals: curating datasets, writing evaluators, and tuning judge prompts until the scores actually reflect quality. Once that work is done, the frustrating part is that improvement is still driven by manual trial and error.
This integration turns that existing investment into leverage. The same LangSmith setup you use to measure regressions and compare variants can now drive automated improvement runs too. That means less time translating eval feedback into edits by hand, and more time actually exploring the search space.
Key Capabilities
- Use the eval stack you already have. Run Weco directly against a LangSmith dataset, target function, and evaluators instead of maintaining a separate eval command.
- Combine code evaluators and LLM judges. Mix local Python evaluators with dashboard evaluators configured in LangSmith, then combine them with a custom metric function.
- Optimize on one split, validate on another. Use dataset splits to optimize on an optimization split and then run a separate holdout pass to check whether gains generalize.
- Run from the CLI or a browser-based wizard. Power users can configure everything with flags, while new users can launch the setup wizard and wire up a run visually.
- Handle async dashboard evaluators automatically. When LangSmith dashboard evaluators return scores asynchronously, Weco polls for results so you do not have to manage that part yourself.
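The metric-combination idea above can be sketched in a few lines: given per-evaluator scores, a custom metric function reduces them to the single number the optimization drives toward. The evaluator names and weights here are illustrative assumptions, not part of Weco's or LangSmith's API.

```python
# Sketch of combining several evaluator scores into one optimization
# metric. Evaluator names and weights are illustrative only.

def combined_metric(scores: dict[str, float]) -> float:
    """Weighted average of per-evaluator scores, each in [0, 1]."""
    weights = {
        "correctness": 0.6,   # e.g. a local Python evaluator
        "tone": 0.2,          # e.g. an LLM judge from the dashboard
        "conciseness": 0.2,   # e.g. another local evaluator
    }
    total = sum(weights[name] * scores[name] for name in weights)
    return total / sum(weights.values())

print(combined_metric({"correctness": 1.0, "tone": 0.5, "conciseness": 0.8}))
```

Weighting lets you encode which qualities matter most, so the search does not trade away correctness for marginal gains on style.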
Get Started
If you already have a LangSmith dataset and evaluators, you can connect Weco to that workflow today.
The full tutorial walks through a complete example, from dataset setup to optimization to holdout validation. If you want something concrete to start from, use the ZephHR LangSmith example.
If your target function calls an LLM provider directly, install the dependencies your agent already needs alongside Weco, then follow the tutorial to connect it to your LangSmith setup.
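For orientation, a LangSmith-style target function is just a callable from an example's inputs dict to an outputs dict. The sketch below stubs out the model call so it stays self-contained; in a real setup `fake_llm` would be replaced by your provider's client.

```python
# Minimal shape of a LangSmith-style target function: it maps an
# example's inputs dict to an outputs dict. The model call is stubbed
# here; a real target would call your LLM provider instead.

def fake_llm(prompt: str) -> str:
    """Stand-in for a provider call, so the sketch runs offline."""
    return f"Answer to: {prompt}"

def target(inputs: dict) -> dict:
    question = inputs["question"]
    answer = fake_llm(f"You are an HR assistant. {question}")
    return {"answer": answer}

print(target({"question": "How many vacation days do I have?"}))
```

Anything with this inputs-to-outputs shape can be pointed at a dataset and scored by your evaluators, which is all the integration needs to start iterating.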
Already using LangSmith for evals? Read the tutorial and put that eval stack to work driving optimization too. For questions, or help adapting it to your workflow, reach out at contact@weco.ai.


