Decomposition
When a user asks a question, there is no guarantee that a single query will return all the relevant results. Sometimes, to answer a question, we need to split it into distinct sub-questions, retrieve results for each sub-question, and then answer using the combined context.
For example, if a user asks “How is Web Voyager different from reflection agents?” and we have one document that explains Web Voyager and one that explains reflection agents, but no document that compares the two, then we’d likely get better results by retrieving for both “What is Web Voyager?” and “What are reflection agents?” and combining the retrieved documents than by retrieving on the user question directly.
This process of splitting an input into multiple distinct sub-queries is what we refer to as query decomposition. It is also sometimes referred to as sub-query generation. In this guide, we’ll walk through an example of how to do decomposition, using our example of a Q&A bot over the LangChain YouTube videos from the Quickstart.
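The retrieve-per-sub-question-and-combine step described above can be sketched without any library. This is only an illustration under stated assumptions, not LangChain's API: the `Doc` type, the `Retriever` signature, `retrieveForSubQueries`, and the toy keyword retriever are all hypothetical names introduced here, and the retriever is kept synchronous for brevity even though real retrievers are usually asynchronous.

```typescript
// Hypothetical document shape; a real index would return richer objects.
type Doc = { id: string; text: string };

// Hypothetical retriever signature: one query in, matching documents out.
type Retriever = (query: string) => Doc[];

// Run the retriever once per sub-query and merge the results,
// keeping only the first occurrence of each document id.
function retrieveForSubQueries(
  subQueries: string[],
  retrieve: Retriever
): Doc[] {
  const seen = new Set<string>();
  const merged: Doc[] = [];
  for (const subQuery of subQueries) {
    for (const doc of retrieve(subQuery)) {
      if (!seen.has(doc.id)) {
        seen.add(doc.id);
        merged.push(doc);
      }
    }
  }
  return merged;
}

// Toy corpus and keyword retriever standing in for a real index (assumption).
const corpus: Doc[] = [
  { id: "a", text: "Document explaining Web Voyager" },
  { id: "b", text: "Document explaining reflection agents" },
];

// Returns every document that shares a word longer than three characters
// with the query.
const keywordRetriever: Retriever = (query) => {
  const q = query.toLowerCase();
  return corpus.filter((doc) =>
    doc.text
      .toLowerCase()
      .split(/\W+/)
      .some((word) => word.length > 3 && q.includes(word))
  );
};

const combinedContext = retrieveForSubQueries(
  ["What is Web Voyager", "What are reflection agents"],
  keywordRetriever
);
```

Deduplicating by id matters because overlapping sub-queries often retrieve the same document, and repeating it in the context wastes tokens without adding information.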
Setup
Install dependencies
npm i @langchain/core zod uuid
yarn add @langchain/core zod uuid
pnpm add @langchain/core zod uuid
Set environment variables
# Optional, use LangSmith for best-in-class observability
LANGSMITH_API_KEY=your-api-key
LANGCHAIN_TRACING_V2=true
Query generation
To convert user questions into a list of sub-questions, we’ll use an LLM function-calling API, which can return multiple function calls in a single turn:
Pick your chat model (only one is needed; each option below defines the same llm variable):
- OpenAI
- Anthropic
- FireworksAI
- MistralAI
OpenAI
Install dependencies
npm i @langchain/openai 
yarn add @langchain/openai 
pnpm add @langchain/openai 
Add environment variables
OPENAI_API_KEY=your-api-key
Instantiate the model
import { ChatOpenAI } from "@langchain/openai";
const llm = new ChatOpenAI({
  model: "gpt-3.5-turbo-0125",
  temperature: 0
});
Anthropic
Install dependencies
npm i @langchain/anthropic 
yarn add @langchain/anthropic 
pnpm add @langchain/anthropic 
Add environment variables
ANTHROPIC_API_KEY=your-api-key
Instantiate the model
import { ChatAnthropic } from "@langchain/anthropic";
const llm = new ChatAnthropic({
  model: "claude-3-sonnet-20240229",
  temperature: 0
});
FireworksAI
Install dependencies
npm i @langchain/community 
yarn add @langchain/community 
pnpm add @langchain/community 
Add environment variables
FIREWORKS_API_KEY=your-api-key
Instantiate the model
import { ChatFireworks } from "@langchain/community/chat_models/fireworks";
const llm = new ChatFireworks({
  model: "accounts/fireworks/models/firefunction-v1",
  temperature: 0
});
MistralAI
Install dependencies
npm i @langchain/mistralai 
yarn add @langchain/mistralai 
pnpm add @langchain/mistralai 
Add environment variables
MISTRAL_API_KEY=your-api-key
Instantiate the model
import { ChatMistralAI } from "@langchain/mistralai";
const llm = new ChatMistralAI({
  model: "mistral-large-latest",
  temperature: 0
});
import { z } from "zod";
const subQuerySchema = z
  .object({
    subQuery: z.array(
      z.string().describe("A very specific query against the database")
    ),
  })
  .describe(
    "Search over a database of tutorial videos about a software library"
  );
import {
  ChatPromptTemplate,
  MessagesPlaceholder,
} from "@langchain/core/prompts";
const system = `You are an expert at converting user questions into database queries.
You have access to a database of tutorial videos about a software library for building LLM-powered applications.
Perform query decomposition. Given a user question, break it down into distinct sub questions that
you need to answer in order to answer the original question.
If there are acronyms or words you are not familiar with, do not try to rephrase them.
If the query is already well formed, do not try to decompose it further.`;
const prompt = ChatPromptTemplate.fromMessages([
  ["system", system],
  new MessagesPlaceholder({
    variableName: "examples",
    optional: true,
  }),
  ["human", "{question}"],
]);
const llmWithTools = llm.withStructuredOutput(subQuerySchema, {
  name: "SubQuery",
});
const queryAnalyzer = prompt.pipe(llmWithTools);
Let’s try it out with a simple question:
await queryAnalyzer.invoke({ question: "how to do rag" });
{ subQuery: [ "How to do rag" ] }
Now with two slightly more involved questions:
await queryAnalyzer.invoke({
  question:
    "how to use multi-modal models in a chain and turn chain into a rest api",
});
{
  subQuery: [
    "How to use multi-modal models in a chain",
    "How to turn a chain into a REST API"
  ]
}
await queryAnalyzer.invoke({
  question:
    "what's the difference between web voyager and reflection agents? do they use langgraph?",
});
{
  subQuery: [
    "Difference between Web Voyager and Reflection Agents",
    "Do Web Voyager and Reflection Agents use LangGraph?"
  ]
}
Adding examples and tuning the prompt
This works pretty well, but we probably want it to decompose the last question even further to separate the queries about Web Voyager and Reflection Agents. If we aren’t sure up front what types of queries will do best with our index, we can also intentionally include some redundancy in our queries, so that we return both sub queries and higher level queries.
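The redundancy idea above (keeping the higher-level question alongside its sub-queries) can also be applied after the fact, in plain code. The following is a minimal sketch under that assumption; `withHigherLevelQuery` is a hypothetical helper introduced here, not part of any library.

```typescript
// Combine the original question with its sub-queries, dropping entries that
// differ only in case, surrounding whitespace, or trailing punctuation.
function withHigherLevelQuery(
  question: string,
  subQueries: string[]
): string[] {
  const normalize = (q: string): string =>
    q.trim().toLowerCase().replace(/[?.!]+$/, "");
  const seen = new Set<string>();
  const result: string[] = [];
  for (const q of [question, ...subQueries]) {
    const key = normalize(q);
    if (key.length > 0 && !seen.has(key)) {
      seen.add(key);
      result.push(q.trim());
    }
  }
  return result;
}

// The question itself comes first, followed by the sub-queries that add
// something new.
const redundantQueries = withHigherLevelQuery(
  "how to use multi-modal models in a chain and turn chain into a rest api",
  [
    "How to use multi-modal models in a chain",
    "How to turn a chain into a REST API",
  ]
);
```

For a question that was not decomposed, such as "how to do rag" with the single sub-query "How to do rag", the helper returns just one entry, so no duplicate retrieval is issued.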
To tune our query generation results, we can add some examples of input questions and gold-standard output queries to our prompt. We can also try to improve our system message.
// Each example pairs an input question with the tool call we want the model to make.
const examples: Array<Record<string, any>> = [
  {
    input: "What's chat langchain, is it a langchain template?",
    toolCalls: [
      {
        query: "What's chat langchain, is it a langchain template?",
        subQueries: [
          "What is chat langchain",
          "Is chat langchain a langchain template",
        ],
      },
    ],
  },
  {
    // Already well formed: a single sub-query is enough.
    input: "How would I use LangGraph to build an automaton",
    toolCalls: [
      {
        query: "How would I use LangGraph to build an automaton",
        subQueries: ["How to build automaton with LangGraph"],
      },
    ],
  },
  {
    input:
      "How to build multi-agent system and stream intermediate steps from it",
    toolCalls: [
      {
        query:
          "How to build multi-agent system and stream intermediate steps from it",
        subQueries: [
          "How to build multi-agent system",
          "How to stream intermediate steps",
          "How to stream intermediate steps from multi-agent system",
        ],
      },
    ],
  },
  {
    // Deliberately redundant: keeps the higher-level question as a sub-query.
    input: "What's the difference between LangChain agents and LangGraph?",
    toolCalls: [
      {
        query: "What's the difference between LangChain agents and LangGraph?",
        subQueries: [
          "What's the difference between LangChain agents and LangGraph?",
          "What are LangChain agents",
          "What is LangGraph",
        ],
      },
    ],
  },
];
Now we need to update our prompt template and chain so that the examples are included in each prompt. Since we’re working with LLM function calling, we’ll need to do a bit of extra structuring to send example inputs and outputs to the model. We’ll create a toolExampleToMessages helper function to handle this for us:
import { v4 as uuidV4 } from "uuid";
import {
  AIMessage,
  BaseMessage,
  HumanMessage,
  ToolMessage,
} from "@langchain/core/messages";
// Turn one example into the message sequence the model expects:
// human question, AI message carrying the tool calls, then one tool message per call.
const toolExampleToMessages = (
  example: Record<string, any>
): Array<BaseMessage> => {
  const messages: Array<BaseMessage> = [
    new HumanMessage({ content: example.input }),
  ];
  const openaiToolCalls = example.toolCalls.map((toolCall: Record<string, any>) => {
    return {
      id: uuidV4(),
      type: "function" as const,
      function: {
        name: "SubQuery",
        arguments: JSON.stringify(toolCall),
      },
    };
  });
  messages.push(
    new AIMessage({
      content: "",
      additional_kwargs: { tool_calls: openaiToolCalls },
    })
  );
  const toolOutputs =
    "toolOutputs" in example
      ? example.toolOutputs
      : Array(openaiToolCalls.length).fill(
          "This is an example of a correct usage of this tool. Make sure to continue using the tool this way."
        );
  toolOutputs.forEach((output: string, index: number) => {
    messages.push(
      new ToolMessage({
        content: output,
        tool_call_id: openaiToolCalls[index].id,
      })
    );
  });
  return messages;
};
const exampleMessages = examples.map((ex) => toolExampleToMessages(ex)).flat();
// ChatPromptTemplate and MessagesPlaceholder were already imported above.
import {
  RunnablePassthrough,
  RunnableSequence,
} from "@langchain/core/runnables";
const system = `You are an expert at converting user questions into database queries.
You have access to a database of tutorial videos about a software library for building LLM-powered applications.
Perform query decomposition. Given a user question, break it down into the most specific sub questions you can
which will help you answer the original question. Each sub question should be about a single concept/fact/idea.
If there are acronyms or words you are not familiar with, do not try to rephrase them.`;
const prompt = ChatPromptTemplate.fromMessages([
  ["system", system],
  new MessagesPlaceholder({ variableName: "examples", optional: true }),
  ["human", "{question}"],
]);
const queryAnalyzerWithExamples = RunnableSequence.from([
  {
    question: new RunnablePassthrough(),
    examples: () => exampleMessages,
  },
  prompt,
  llmWithTools,
]);
await queryAnalyzerWithExamples.invoke(
  "what's the difference between web voyager and reflection agents? do they use langgraph?"
);
{
  query: "what's the difference between web voyager and reflection agents? do they use langgraph?",
  subQueries: [
    "What's the difference between web voyager and reflection agents",
    "Do web voyager and reflection agents use LangGraph"
  ]
}
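The sample outputs in this guide use two different key names for the list of sub-queries: `subQuery` (as declared in the schema) and `subQueries` (as written in the few-shot examples, which the last output appears to mirror). If downstream code has to tolerate both shapes, a defensive helper along these lines may help. This is a sketch only; `extractSubQueries` is a hypothetical name introduced here, and aligning the example keys with the schema is likely the cleaner long-term fix.

```typescript
// Pull the list of sub-queries out of an analyzer result, accepting either
// the `subQuery` key (schema) or the `subQueries` key (few-shot examples).
// Anything that is not an array of strings yields an empty list.
function extractSubQueries(result: unknown): string[] {
  if (typeof result !== "object" || result === null) {
    return [];
  }
  const record = result as Record<string, unknown>;
  const candidate = record["subQuery"] ?? record["subQueries"];
  if (!Array.isArray(candidate)) {
    return [];
  }
  return candidate.filter((item): item is string => typeof item === "string");
}

// Both output shapes shown in this guide yield a plain string array.
const fromSchemaShape = extractSubQueries({ subQuery: ["How to do rag"] });
const fromExampleShape = extractSubQueries({
  query: "what's the difference between web voyager and reflection agents? do they use langgraph?",
  subQueries: [
    "What's the difference between web voyager and reflection agents",
    "Do web voyager and reflection agents use LangGraph",
  ],
});
```

The resulting array can then be passed to whatever retrieval step follows, one query at a time.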