Agentic research assistant with guardrails

Agents
LangGraph
LangChain
LLMOps
A project template for an LLM agent workflow with task routing, tool use, evaluation, and human oversight.
Published

April 26, 2026

System Goal

Design an AI research assistant that can decompose a request, retrieve relevant information, call tools, draft an answer, evaluate its own output, and hand off uncertain cases for human review.

Architecture

Possible implementation paths:

  • LangGraph for stateful agent workflows.
  • LangChain for tool calling and retrieval components.
  • CrewAI or Microsoft AutoGen for multi-agent coordination experiments.
  • A judge or evaluator node for groundedness, completeness, and risk checks.
  • Trace logging with LangSmith or Langfuse.

Reliability Questions

  • Which tasks can the agent complete autonomously?
  • Which tasks require human confirmation?
  • What tool calls are allowed?
  • How is the system evaluated before release?
  • How are hallucinations, stale context, and prompt regressions detected?

Evaluation Plan

Create an evaluation set with:

  • Simple requests.
  • Ambiguous requests.
  • Retrieval-heavy requests.
  • Tool-use requests.
  • Adversarial or misleading requests.
  • Expected refusal or escalation cases.

Notebook Plan

  • notebooks/agentic-research-assistant/01-workflow-design.ipynb
  • notebooks/agentic-research-assistant/02-tool-calling.ipynb
  • notebooks/agentic-research-assistant/03-evaluation-set.ipynb
  • notebooks/agentic-research-assistant/04-tracing-and-regression-tests.ipynb