Dec 12th, 2023: [EN] Retrieval Augmented Generation (RAG) for Improving Support

In previous articles on the Elastic blog and the December 7th Advent calendar, you have seen how to set up a simple semantic search to find topics without using the exact keyword. Elastic runs its own Support Hub on this approach, and we were delighted by the immediate increase in relevant results.

But what if the answer to a user's question requires multiple articles or steps? In the past, this meant running numerous search queries and reading through the returned articles to piece together what might be a straightforward piece of information. With semantic search, users can ask questions in natural language instead of writing queries, and a generative AI layer can then extract the needed answer from multiple documents and reply in natural language.

Known as Retrieval Augmented Generation (RAG), this methodology can be an efficiency game-changer for Support use cases. Customers or support agents no longer need to investigate multiple sources of knowledge to find their answers. Additionally, the answer for each user is tailored much more specifically to them. The benefits of RAG using Elastic compared to traditional keyword search are many, but they fall into several areas.

  • Improved relevance. Retrieval augmented generation (RAG) uses a combination of keyword search and natural language processing (NLP) to understand the user's intent and provide more relevant results. This is in contrast to traditional keyword search, which matches keywords in the user's query to keywords in the documents.
  • Increased efficiency. RAG can help users find the information they need more quickly and efficiently. This is because RAG can generate results tailored to the user's specific needs rather than requiring the user to sift through many irrelevant results.
  • Enhanced user experience. RAG can provide a more natural and intuitive user experience. This is because RAG can generate results written in a natural language rather than the technical language often used in traditional keyword search results.
  • Scalability. RAG can be scaled to handle large amounts of data. This is important for businesses that have a large amount of content that they need to make searchable.
  • Flexibility. RAG can be customized to meet the specific needs of a business. This is because RAG can be grounded in a particular dataset, so the results can be tailored to the specific needs of the users.

Elastic supports the RAG methodology out of the box, building on the semantic search implementation. You can find a step-by-step guide here as part of the Search Labs portion of our website. Below is an example of a simplified payload format used for sending the chat to Google Vertex, asking it to respond in the persona of a Support Agent.

The instances[].messages[] array contains the back-and-forth conversation between the user and the bot. The examples[] array provides references for how you want the bot to respond (while empty in our example, this will be specific to your products), and the context is a single value that can change with each request. For instance, you can attach the RAG results to the context and run a new search for each new message sent to Vertex.

{
  "instances": [
    {
      "context": "You are an expert Elastic Support Technician, and should respond to customers with detailed answers.",
      "examples": [
        {
          "input": {
            "author": "user",
            "content": "hello"
          },
          "output": {
            "author": "bot",
            "content": "hello real person, welcome to the world"
          }
        },
        {
          "input": {
            "author": "user",
            "content": "foo"
          },
          "output": {
            "author": "bot",
            "content": "foo is commonly used in programming as a placeholder value, eg. foo: bar"
          }
        }
      ],
      "messages": [
        {
          "author": "user",
          "content": "I need some help writing an Elasticsearch query that matches all documents in an index"
        },
        {
          "author": "bot",
          "content": "{ \"query\": { \"match_all\": {} } }",
          "citationMetadata": {
            "citations": []
          }
        },
        {
          "author": "user",
          "content": "Great, now I need to filter to a specific term, could you show me an example using a bool filter"
        }
      ]
    }
  ],
  "parameters": {
    "candidateCount": 1,
    "maxOutputTokens": 1024,
    "temperature": 0.2,
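
The per-request context update described above can be sketched in Python. This is a minimal sketch under stated assumptions, not Elastic's implementation: next_turn_payload, build_context, and the search callable are hypothetical names, search stands in for whatever semantic search returns your article passages, and the parameter values simply mirror the ones visible in the payload above.

```python
from typing import Callable, Dict, List, Optional

# Persona text copied from the "context" field of the payload above.
PERSONA = (
    "You are an expert Elastic Support Technician, and should respond "
    "to customers with detailed answers."
)


def build_context(passages: List[str]) -> str:
    """Append numbered retrieved passages to the persona (layout is an assumption)."""
    if not passages:
        return PERSONA
    numbered = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return f"{PERSONA}\n\nUse these articles when answering:\n{numbered}"


def next_turn_payload(
    history: List[Dict[str, str]],
    user_message: str,
    search: Callable[[str], List[str]],
    examples: Optional[List[dict]] = None,
) -> dict:
    """Run a fresh search for the new message and rebuild the request body."""
    # Hypothetical: `search` is your own semantic search function.
    passages = search(user_message)
    messages = history + [{"author": "user", "content": user_message}]
    return {
        "instances": [
            {
                "context": build_context(passages),
                "examples": examples or [],
                "messages": messages,
            }
        ],
        # Only the parameters shown in the payload above are included.
        "parameters": {
            "candidateCount": 1,
            "maxOutputTokens": 1024,
            "temperature": 0.2,
        },
    }
```

Because the context is rebuilt on every call, each new user message gets its own search results, while the messages list carries the full conversation history forward unchanged.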