Context extraction from image files in Amazon Q Business using LLMs


To convey complex information effectively, organizations increasingly rely on visual documentation such as diagrams, charts, and technical illustrations. Although text documents are well integrated into modern knowledge management systems, the rich information contained in diagrams, charts, and technical schematics often remains inaccessible to search and AI assistants. This creates significant gaps in organizational knowledge bases, forcing teams to interpret visual data manually and preventing automated systems from using critical visual information for insights and decision-making. While Amazon Q Business already handles images embedded within documents, the custom document enrichment (CDE) feature extends these capabilities by processing standalone image files (for example, JPGs and PNGs).

In this post, we walk through a step-by-step implementation of the CDE feature within an Amazon Q Business application. We show an AWS Lambda function configured within CDE to process various image file types, and we present an example scenario of how this integration enhances the ability of Amazon Q Business to provide comprehensive insights. By following this practical guide, you can significantly expand your organization’s searchable knowledge base, enabling more complete answers and insights that incorporate both textual and visual information sources.

Example scenario: Analyzing regional educational demographics

Consider a scenario where you’re working for a national educational consultancy that has charts, graphs, and demographic data for different regions stored in an Amazon Simple Storage Service (Amazon S3) bucket. The following image shows student distribution by age range across various cities using a bar chart. The insights in visualizations like this one are valuable for decision-making but are traditionally locked within image formats in your S3 buckets and other storage.

With Amazon Q Business and CDE, we show you how to enable natural language queries against such visualizations. For example, your team could ask questions such as “Which city has the highest number of students in the 13–15 age range?” or “Compare the student demographics between City 1 and City 4” directly through the Amazon Q Business application interface.

Distribution Chart

You can bridge this gap using the Amazon Q Business CDE feature to:

  1. Detect and process image files during the document ingestion process
  2. Use Amazon Bedrock with AWS Lambda to interpret the visual information
  3. Extract structured data and insights from charts and graphs
  4. Make this information searchable using natural language queries

Solution overview

In this section, we walk you through how to implement a CDE-based approach for your educational demographic data visualizations. The solution empowers organizations to extract meaningful information from image files using the CDE capability of Amazon Q Business. When Amazon Q Business ingests files from the S3 data source, CDE rules automatically trigger a Lambda function. The Lambda function identifies the image files and calls the Amazon Bedrock API, which uses multimodal large language models (LLMs) to analyze and extract contextual information from each image. The extracted text is then integrated into the Amazon Q Business knowledge base. End users can then search for valuable data and insights from images based on their actual context. By bridging the gap between visual content and searchable text, this solution helps organizations unlock valuable insights previously hidden within their image repositories.

The following figure shows the high-level architecture diagram used for this solution.

Arch Diagram

For this use case, we use Amazon S3 as our data source. However, this same solution is adaptable to other data source types supported by Amazon Q Business, or it can be implemented with custom data sources as needed. To complete the solution, follow these high-level implementation steps:

  1. Create an Amazon Q Business application and sync with an S3 bucket.
  2. Configure the Amazon Q Business application CDE for the Amazon S3 data source.
  3. Extract context from the images.

Prerequisites

The following prerequisites are needed for implementation:

  1. An AWS account.
  2. At least one Amazon Q Business Pro user who has admin permissions to set up and configure Amazon Q Business. For pricing information, refer to Amazon Q Business pricing.
  3. AWS Identity and Access Management (IAM) permissions to create and manage IAM roles and policies.
  4. A supported data source to connect, such as an S3 bucket containing your public documents.
  5. Access to an Amazon Bedrock LLM in the required AWS Region. A quick way to verify model access follows this list.
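
To verify model access, you can list the Anthropic foundation models available in your target Region. The following is a minimal sketch using the Amazon Bedrock control-plane client; the Region is an assumption to adjust, and the cross-Region inference profile used later in this post also requires model access to be granted in the Amazon Bedrock console.

import boto3

# List the Anthropic foundation models visible in the chosen Region
bedrock_control = boto3.client("bedrock", region_name="us-east-1")  # adjust the Region as needed
response = bedrock_control.list_foundation_models(byProvider="Anthropic")
for summary in response["modelSummaries"]:
    print(summary["modelId"])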

Create an Amazon Q Business application and sync with an S3 bucket

To create an Amazon Q Business application and connect it to your S3 bucket, complete the following steps. These steps provide a general overview of how to create an Amazon Q Business application and synchronize it with an S3 bucket; a minimal API-level sketch follows the list. For more comprehensive, step-by-step guidance, follow the detailed instructions in the blog post Discover insights from Amazon S3 with Amazon Q S3 connector.

  1. Initiate your application setup through either the AWS Management Console or AWS Command Line Interface (AWS CLI).
  2. Create an index for your Amazon Q Business application.
  3. Use the built-in Amazon S3 connector to link your application with documents stored in your organization’s S3 buckets.
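
As referenced above, the following boto3 sketch outlines the same three steps at the API level. The display names and ARNs are placeholders, and the data source configuration is abbreviated; refer to the Amazon S3 connector documentation for the full configuration schema.

import boto3

qbusiness = boto3.client("qbusiness")

# Step 1: create the application (the IAM Identity Center instance ARN is a placeholder)
app = qbusiness.create_application(
    displayName="image-insights-app",
    identityCenterInstanceArn="arn:aws:sso:::instance/ssoins-EXAMPLE",
)

# Step 2: create an index for the application
index = qbusiness.create_index(
    applicationId=app["applicationId"],
    displayName="image-insights-index",
)

# Step 3: attach the built-in Amazon S3 connector as a data source
data_source = qbusiness.create_data_source(
    applicationId=app["applicationId"],
    indexId=index["indexId"],
    displayName="s3-image-source",
    roleArn="arn:aws:iam::111122223333:role/QBusinessS3DataSourceRole",
    configuration={
        # Abbreviated S3 connector configuration document; see the connector
        # documentation for the required fields (bucket name, sync mode, and so on)
    },
)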

Configure the Amazon Q Business application CDE for the Amazon S3 data source

With the CDE feature of Amazon Q Business, you can get more value from your Amazon S3 data sources by modifying, enhancing, and filtering documents during ingestion, making enterprise content more discoverable and valuable. When connecting Amazon Q Business to S3 repositories, you can use CDE to transform raw data in ways that significantly improve search quality and information accessibility. This functionality extends to extracting context from binary files such as images through integration with Amazon Bedrock, so organizations can unlock insights from previously inaccessible content formats and build a more comprehensive, intelligent knowledge base that responds effectively to user queries.

To configure CDE for the Amazon S3 data source in your Amazon Q Business application, complete the following steps:

  1. Select your application and navigate to Data sources.
  2. Choose your existing Amazon S3 data source or create a new one. Verify that Audio/Video under Multi-media content configuration is not enabled.
  3. In the data source configuration, locate the Custom Document Enrichment section.
  4. Configure the pre-extraction rules to trigger a Lambda function when specific S3 bucket conditions are satisfied. Check the following screenshot for an example configuration; an API-level sketch of the same rule follows the note below.

Reference Settings
Pre-extraction rules are executed before Amazon Q Business processes files from your S3 bucket.
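
As noted in step 4, the same pre-extraction rule can also be defined programmatically when you create or update the data source. The following boto3 sketch shows the general shape of the documentEnrichmentConfiguration parameter; the IDs, ARNs, bucket name, and the invocation condition key and operator are assumptions to adapt to your environment.

import boto3

qbusiness = boto3.client("qbusiness")

qbusiness.update_data_source(
    applicationId="your-application-id",
    indexId="your-index-id",
    dataSourceId="your-s3-data-source-id",
    documentEnrichmentConfiguration={
        "preExtractionHookConfiguration": {
            # Invoke the image-processing Lambda function only for matching objects
            "invocationCondition": {
                "key": "_source_uri",
                "operator": "CONTAINS",
                "value": {"stringValue": ".png"},
            },
            "lambdaArn": "arn:aws:lambda:us-east-1:111122223333:function:cde-image-extractor",
            "roleArn": "arn:aws:iam::111122223333:role/QBusinessCDERole",
            "s3BucketName": "your-image-bucket",
        }
    },
)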

Extract context from the images

To extract insights from an image file, the Lambda function makes an Amazon Bedrock API call using Anthropic’s Claude 3.7 Sonnet model. You can modify the code to use other Amazon Bedrock models based on your use case.

Constructing the prompt is a critical piece of the code. We recommend trying various prompts to get the desired output for your use case. Amazon Bedrock also offers prompt optimization, which you can use to refine prompts for your specific use case.

Examine the following Lambda function code snippets, written in Python, to understand the Amazon Bedrock model setup along with a sample prompt to extract insights from an image.

In the following code snippet, we start by importing relevant Python libraries, defining constants, and initializing AWS SDK for Python (Boto3) clients for Amazon S3 and the Amazon Bedrock runtime. For more information, refer to the Boto3 documentation.

import boto3
import logging
import json
from typing import List, Dict, Any
from botocore.config import Config

MODEL_ID = "us.anthropic.claude-3-7-sonnet-20250219-v1:0"
MAX_TOKENS = 2000
MAX_RETRIES = 2
FILE_FORMATS = ("jpg", "jpeg", "png")

logger = logging.getLogger()
logger.setLevel(logging.INFO)
s3 = boto3.client('s3')
bedrock = boto3.client('bedrock-runtime', config=Config(read_timeout=3600, region_name="us-east-1"))  # use the Region where your Amazon Bedrock model access is enabled

The prompt passed to the Amazon Bedrock model, Anthropic’s Claude 3.7 Sonnet in this case, is broken into two parts: prompt_prefix and prompt_suffix. The prompt breakdown makes it more readable and manageable. Additionally, the Amazon Bedrock prompt caching feature can be used to reduce response latency as well as input token cost. You can modify the prompt to extract information based on your specific use case as needed.

prompt_prefix = """You are an expert image reader tasked with generating detailed descriptions for various """
"""types of images. These images may include technical diagrams,"""
""" graphs and charts, categorization diagrams, data flow and process flow diagrams,"""
""" hierarchical and timeline diagrams, infographics, """
"""screenshots and product diagrams/images from user manuals. """
""" The description of these images needs to be very detailed so that user can ask """
""" questions based on the image, which can be answered by only looking at the descriptions """
""" that you generate.
Here is the image you need to analyze:


"""

prompt_suffix = """


Please follow these steps to analyze the image and generate a comprehensive description:

1. Image type: Classify the image as one of the following: technical diagrams, graphs and charts, categorization diagrams, data flow and process flow diagrams, hierarchical and timeline diagrams, infographics, screenshots and product diagrams/images from user manuals, or other.

2. Items:
   Carefully examine the image and extract all entities, texts, and numbers present. List these elements in <items> tags.

3. Detailed Description:
   Using the information from the previous steps, provide a detailed description of the image. This should include the type of diagram or chart, its main purpose, and how the various elements interact or relate to each other. Capture all the crucial details that can be used to answer any follow-up questions. Write this description in <detailed_description> tags.

4. Data Estimation (for charts and graphs only):
   If the image is a chart or graph, capture the data in the image in CSV format to be able to recreate the image from the data. Ensure your response captures all relevant details from the chart that might be necessary to answer any follow up questions from the chart.
   If exact values cannot be inferred, provide an estimated range for each value in <estimated_data> tags.
   If no data is present, respond with "No data found".

Present your analysis in the following format:

<image_type>
[Classify the image type here]
</image_type>

<items>
[List all extracted entities, texts, and numbers here]
</items>

<detailed_description>
[Provide a detailed description of the image here]
</detailed_description>

<estimated_data>
[If applicable, provide estimated number ranges for chart elements here]
</estimated_data>

Remember to be thorough and precise in your analysis. If you're unsure about any aspect of the image, state your uncertainty clearly in the relevant section.
"""

The lambda_handler function is the main entry point for the Lambda function. When CDE invokes the function, it passes the data source’s information in the event object. In this case, the S3 bucket name and object key are retrieved from the event, and the file format is derived from the object key’s extension. Further processing happens only if the file format matches the expected image types; otherwise, the original object key is returned unchanged. For production-ready code, implement proper error handling for unexpected errors.

def lambda_handler(event, context):
    logger.info("Received event: %s" % json.dumps(event))
    s3Bucket = event.get("s3Bucket")
    s3ObjectKey = event.get("s3ObjectKey")
    metadata = event.get("metadata")
    file_format = s3ObjectKey.lower().split('.')[-1]
    new_key = 'cde_output/' + s3ObjectKey + '.txt'
    if file_format in FILE_FORMATS:
        # Describe the image with Amazon Bedrock and store the text alongside the original object
        afterCDE = generate_image_description(s3Bucket, s3ObjectKey, file_format)
        s3.put_object(Bucket=s3Bucket, Key=new_key, Body=afterCDE)
        # Point Amazon Q Business at the extracted text file instead of the original image
        return {
            "version": "v0",
            "s3ObjectKey": new_key,
            "metadataUpdates": []
        }
    # For files that are not in the supported image formats, return the original object key unchanged
    return {
        "version": "v0",
        "s3ObjectKey": s3ObjectKey,
        "metadataUpdates": []
    }
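
To exercise the handler outside of Amazon Q Business, you can invoke it with an event containing the same fields the function reads. The following test event is hypothetical; the bucket and key are placeholders.

# Hypothetical test event mirroring the fields read by lambda_handler
test_event = {
    "s3Bucket": "your-image-bucket",
    "s3ObjectKey": "charts/student_age_distribution.png",
    "metadata": {},
}

# Local invocation requires AWS credentials with S3 and Amazon Bedrock access
result = lambda_handler(test_event, None)
print(result["s3ObjectKey"])  # cde_output/charts/student_age_distribution.png.txt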

The generate_image_description function calls two other functions: first to construct the message that is passed to the Amazon Bedrock model and second to invoke the model. It returns the final text output extracted from the image file by the model invocation.

def generate_image_description(s3Bucket: str, s3ObjectKey: str, file_format: str) -> str:
    """
    Generate a description for an image.
    Inputs:
        s3Bucket: str - Name of the S3 bucket containing the image
        s3ObjectKey: str - Key of the image object in the bucket
        file_format: str - Image format (jpg, jpeg, or png)
    Output:
        str - Generated image description
    """
    messages = _llm_input(s3Bucket, s3ObjectKey, file_format)
    response = _invoke_model(messages)
    return response['output']['message']['content'][0]['text']

The _llm_input function takes the S3 object’s details and the file type (png or jpg) as input and builds the message in the format expected by the model invoked through Amazon Bedrock.

def _llm_input(s3Bucket: str, s3ObjectKey: str, file_format: str) -> List[Dict[str, Any]]:
    s3_response = s3.get_object(Bucket = s3Bucket, Key = s3ObjectKey)
    image_content = s3_response['Body'].read()
    message = {
        "role": "user",
        "content": [
            {"text": prompt_prefix},
            {
                "image": {
                    "format": file_format,
                    "source": {
                        "bytes": image_content
                    }
                }
            },
            {"text": prompt_suffix}
        ]
    }
    return [message]
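
If you use the prompt caching capability mentioned earlier, the Converse API accepts a cachePoint content block that marks everything before it as cacheable across invocations. The following variant of the message built by _llm_input is a sketch under the assumption that your chosen model supports prompt caching and that the cached prefix meets the model’s minimum token threshold; check the Amazon Bedrock prompt caching documentation before relying on it.

# Sketch: same message as in _llm_input, with a cache point after the static prefix
message_with_cache = {
    "role": "user",
    "content": [
        {"text": prompt_prefix},
        {"cachePoint": {"type": "default"}},  # cache the static instructions above this marker
        {
            "image": {
                "format": file_format,
                "source": {"bytes": image_content},
            }
        },
        {"text": prompt_suffix},
    ],
}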

The _invoke_model function calls the Converse API using the Amazon Bedrock runtime client, which returns the response generated by the model. In inferenceConfig, maxTokens limits the length of the response, and temperature set to 0 makes the responses more deterministic (less random).

def _invoke_model(messages: List[Dict[str, Any]]) -> Dict[str, Any]:
    """
    Call the Bedrock model with retry logic.
    Input:
        messages: List[Dict[str, Any]] - Prepared messages for the model
    Output:
        Dict[str, Any] - Model response
    """
    for attempt in range(MAX_RETRIES):
        try:
            response = bedrock.converse(
                modelId=MODEL_ID,
                messages=messages,
                inferenceConfig={
                    "maxTokens": MAX_TOKENS,
                    "temperature": 0,
                }
            )
            return response
        except Exception as e:
            # Log the failure and retry; consider adding exponential backoff for production use
            logger.warning("Model invocation attempt %d failed: %s", attempt + 1, e)
    
    raise Exception(f"Failed to call model after {MAX_RETRIES} attempts")

Putting all the preceding code pieces together, the full Lambda function code is shown in the following block:

# Example Lambda function for image processing
import boto3
import logging
import json
from typing import List, Dict, Any
from botocore.config import Config

MODEL_ID = "us.anthropic.claude-3-7-sonnet-20250219-v1:0"
MAX_TOKENS = 2000
MAX_RETRIES = 2
FILE_FORMATS = ("jpg", "jpeg", "png")

logger = logging.getLogger()
logger.setLevel(logging.INFO)
s3 = boto3.client('s3')
bedrock = boto3.client('bedrock-runtime', config=Config(read_timeout=3600, region_name="us-east-1"))  # use the Region where your Amazon Bedrock model access is enabled

prompt_prefix = """You are an expert image reader tasked with generating detailed descriptions for various """
"""types of images. These images may include technical diagrams,"""
""" graphs and charts, categorization diagrams, data flow and process flow diagrams,"""
""" hierarchical and timeline diagrams, infographics, """
"""screenshots and product diagrams/images from user manuals. """
""" The description of these images needs to be very detailed so that user can ask """
""" questions based on the image, which can be answered by only looking at the descriptions """
""" that you generate.
Here is the image you need to analyze:


"""

prompt_suffix = """


Please follow these steps to analyze the image and generate a comprehensive description:

1. Image type: Classify the image as one of the following: technical diagrams, graphs and charts, categorization diagrams, data flow and process flow diagrams, hierarchical and timeline diagrams, infographics, screenshots and product diagrams/images from user manuals, or other.

2. Items:
   Carefully examine the image and extract all entities, texts, and numbers present. List these elements in <items> tags.

3. Detailed Description:
   Using the information from the previous steps, provide a detailed description of the image. This should include the type of diagram or chart, its main purpose, and how the various elements interact or relate to each other. Capture all the crucial details that can be used to answer any follow-up questions. Write this description in <detailed_description> tags.

4. Data Estimation (for charts and graphs only):
   If the image is a chart or graph, capture the data in the image in CSV format to be able to recreate the image from the data. Ensure your response captures all relevant details from the chart that might be necessary to answer any follow up questions from the chart.
   If exact values cannot be inferred, provide an estimated range for each value in <estimated_data> tags.
   If no data is present, respond with "No data found".

Present your analysis in the following format:

<image_type>
[Classify the image type here]
</image_type>

<items>
[List all extracted entities, texts, and numbers here]
</items>

<detailed_description>
[Provide a detailed description of the image here]
</detailed_description>

<estimated_data>
[If applicable, provide estimated number ranges for chart elements here]
</estimated_data>

Remember to be thorough and precise in your analysis. If you're unsure about any aspect of the image, state your uncertainty clearly in the relevant section.
"""

def _llm_input(s3Bucket: str, s3ObjectKey: str, file_format: str) -> List[Dict[str, Any]]:
    s3_response = s3.get_object(Bucket = s3Bucket, Key = s3ObjectKey)
    image_content = s3_response['Body'].read()
    message = {
        "role": "user",
        "content": [
            {"text": prompt_prefix},
            {
                "image": {
                    "format": file_format,
                    "source": {
                        "bytes": image_content
                    }
                }
            },
            {"text": prompt_suffix}
        ]
    }
    return [message]

def _invoke_model(messages: List[Dict[str, Any]]) -> Dict[str, Any]:
    """
    Call the Bedrock model with retry logic.
    Input:
        messages: List[Dict[str, Any]] - Prepared messages for the model
    Output:
        Dict[str, Any] - Model response
    """
    for attempt in range(MAX_RETRIES):
        try:
            response = bedrock.converse(
                modelId=MODEL_ID,
                messages=messages,
                inferenceConfig={
                    "maxTokens": MAX_TOKENS,
                    "temperature": 0,
                }
            )
            return response
        except Exception as e:
            # Log the failure and retry; consider adding exponential backoff for production use
            logger.warning("Model invocation attempt %d failed: %s", attempt + 1, e)
    
    raise Exception(f"Failed to call model after {MAX_RETRIES} attempts")

def generate_image_description(s3Bucket: str, s3ObjectKey: str, file_format: str) -> str:
    """
    Generate a description for an image.
    Inputs:
        s3Bucket: str - Name of the S3 bucket containing the image
        s3ObjectKey: str - Key of the image object in the bucket
        file_format: str - Image format (jpg, jpeg, or png)
    Output:
        str - Generated image description
    """
    messages = _llm_input(s3Bucket, s3ObjectKey, file_format)
    response = _invoke_model(messages)
    return response['output']['message']['content'][0]['text']

def lambda_handler(event, context):
    logger.info("Received event: %s" % json.dumps(event))
    s3Bucket = event.get("s3Bucket")
    s3ObjectKey = event.get("s3ObjectKey")
    metadata = event.get("metadata")
    file_format = s3ObjectKey.lower().split('.')[-1]
    new_key = 'cde_output/' + s3ObjectKey + '.txt'
    if file_format in FILE_FORMATS:
        # Describe the image with Amazon Bedrock and store the text alongside the original object
        afterCDE = generate_image_description(s3Bucket, s3ObjectKey, file_format)
        s3.put_object(Bucket=s3Bucket, Key=new_key, Body=afterCDE)
        # Point Amazon Q Business at the extracted text file instead of the original image
        return {
            "version": "v0",
            "s3ObjectKey": new_key,
            "metadataUpdates": []
        }
    # For files that are not in the supported image formats, return the original object key unchanged
    return {
        "version": "v0",
        "s3ObjectKey": s3ObjectKey,
        "metadataUpdates": []
    }

We strongly recommend testing and validating code in a nonproduction environment before deploying it to production. In addition to Amazon Q pricing, this solution will incur charges for AWS Lambda and Amazon Bedrock. For more information, refer to AWS Lambda pricing and Amazon Bedrock pricing.

After the Amazon S3 data is synced with the Amazon Q index, you can prompt the Amazon Q Business application to get the extracted insights as shown in the following section.

Example prompts and results

The following question and answer pairs refer to the Student Age Distribution graph at the beginning of this post.

Q: Which City has the highest number of students in the 13-15 age range?

Natural Language Query Response

Q: Compare the student demographics between City 1 and City 4?

Natural Language Query Response

In the original graph, the bars representing student counts lacked explicit numerical labels, which could make precise data interpretation challenging. With Amazon Q Business and its integration capabilities, this limitation can be overcome. By using Amazon Q Business to process these visualizations with Amazon Bedrock LLMs through the CDE feature, we’ve enabled a more interactive and insightful analysis experience. The service extracts the contextual information embedded in the graph, even when explicit labels are absent. End users can ask questions about the visualization and receive responses based on the underlying data. Rather than being limited by what’s explicitly labeled in the graph, users can explore deeper insights through natural language queries. This capability demonstrates how Amazon Q Business transforms static visualizations into queryable knowledge assets, enhancing the value of your existing data visualizations without requiring additional formatting or preparation work.

Best practices for Amazon S3 CDE configuration

When setting up CDE for your Amazon S3 data source, consider these best practices:

  • Use conditional rules to only process specific file types that need transformation.
  • Monitor Lambda execution with Amazon CloudWatch to track processing errors and performance.
  • Set appropriate timeout values for your Lambda functions, especially when processing large files.
  • Consider incremental syncing to process only new or modified documents in your S3 bucket.
  • Use document attributes to track which documents have been processed by CDE.

Cleanup

Complete the following steps to clean up your resources:

  1. Go to the Amazon Q Business application and select Remove and unsubscribe for users and groups.
  2. Delete the Amazon Q Business application.
  3. Delete the Lambda function.
  4. Empty and delete the S3 bucket. For instructions, refer to Deleting a general purpose bucket.

Conclusion

This solution demonstrates how combining Amazon Q Business, custom document enrichment, and Amazon Bedrock can transform static visualizations into queryable knowledge assets, significantly enhancing the value of existing data visualizations without additional formatting work. By using these powerful AWS services together, organizations can bridge the gap between visual information and actionable insights, enabling users to interact with different file types in more intuitive ways.

Explore What is Amazon Q Business? and Getting started with Amazon Bedrock in the documentation to implement this solution for your specific use cases and unlock the potential of your visual data.

About the Authors

Amit Chaudhary is a Senior Solutions Architect at Amazon Web Services. His focus area is AI/ML, and he helps customers with generative AI, large language models, and prompt engineering. Outside of work, Amit enjoys spending time with his family.

Nikhil Jha is a Senior Technical Account Manager at Amazon Web Services. His focus areas include AI/ML, building generative AI resources, and analytics. In his spare time, he enjoys exploring the outdoors with his family.


