Playing with Claude 3.5 Sonnet for code generation and image alt-text descriptions

I had a little time to relax over the weekend, so I decided to try a small coding project with Anthropic's Claude 3.5 Sonnet. I wanted to test the new 3.5 Sonnet model and also explore the artifacts feature. Claude 3.5 Sonnet is very good at describing images that contain technical diagrams and screenshots, so I decided to try writing some code to auto-generate alt-text for images in any HTML file. Here's the thing -- I've never written Python code. I'm more of a Java and Node kind of guy, so by choosing Python, I was really testing my ability to generate code from scratch using some simple prompting.

Below is my chat with Claude. Although I'm not an expert at this type of prompting yet, I was able to hack my way through it. After most prompts, Claude generated new code that was then replaced by the output of the next prompt, so I omitted the early Claude responses to make this easier to read. I started by testing whether I could get it to describe local images in various formats (JPG, HEIC, PNG). Once this was successful, I expanded the scope to include parsing the images from HTML, injecting new alt attributes, and more. The entire project took me only 20 minutes, which I found super exciting. Even in a programming language I'm intimately familiar with, I could never have completed this as fast, because I'd have had to figure out how to handle PNGs, JPGs, HEIC, the Anthropic APIs, and more.


Greg: Are you familiar with the Claude API?

Claude 3.5 Sonnet: Yes, I am familiar with the Claude API. It allows developers to integrate Anthropic's language models, including versions of me, into their applications and services. The API provides access to capabilities like natural language understanding, text generation, and task completion. However, I don't have detailed knowledge about its exact specifications, pricing, or implementation details. For the most up-to-date and accurate information about the Claude API, I'd recommend checking Anthropic's official documentation and resources.

Greg: I have a list of image URLs. I need a 125 char description of each. Write python code to do this with Claude 3.5 Sonnet.
(Claude output omitted for brevity)
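
A side note on the 125-character target: a prompt can ask for a length, but nothing in the API guarantees it, and the max_tokens setting that shows up in the final program counts tokens rather than characters. A minimal sketch of enforcing the limit after the fact is below. The helper name clamp_description and the word-boundary policy are assumptions for illustration, not part of the generated program.

```python
def clamp_description(text: str, limit: int = 125) -> str:
    """Trim a description to at most `limit` characters (hypothetical helper)."""
    # Collapse newlines and repeated spaces so the count reflects visible text.
    text = " ".join(text.split())
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # If the cut landed in the middle of a word, back up to the previous space.
    if text[limit] != " " and " " in cut:
        cut = cut[:cut.rfind(" ")]
    return cut.rstrip()


if __name__ == "__main__":
    sample = "A screenshot of a terminal window showing a Python traceback " * 4
    print(len(clamp_description(sample)))
```

Trimming at a word boundary keeps the alt text readable; whether to trim, retry the request, or accept a slightly longer description is a design choice this sketch does not settle.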

Greg: I will provide the list of image URLs in a file named images.csv
(Claude output omitted for brevity)

Greg: print each url as you read the file
(Claude output omitted for brevity)

Greg: Getting an error - 'url' found using 'type' does not match any of the expected tags: 'base64'

Claude 3.5 Sonnet: I apologize for the confusion. It seems the error is related to how we're passing the image URL to the Claude API. The API is expecting a base64-encoded image rather than a URL. Let's modify the script to handle this correctly. We'll need to download the image and convert it to base64 before sending it to the API. Here's an updated version of the script that should resolve this issue:
(Claude code omitted for brevity)
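
For readers who hit the same error, the fix Claude describes can be sketched as follows. This is a minimal illustration, not the script Claude produced at this step. The function name build_image_block is an assumption; the dictionary shape mirrors the image content block used in the final program further down.

```python
import base64


def build_image_block(image_bytes: bytes, media_type: str = "image/png") -> dict:
    """Wrap raw image bytes in a base64 'image' content block (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "type": "image",
        "source": {
            "type": "base64",        # the tag the error message asked for
            "media_type": media_type,
            "data": encoded,
        },
    }


if __name__ == "__main__":
    # Stand-in bytes; in practice these would be the downloaded image,
    # for example requests.get(url).content.
    block = build_image_block(b"\x89PNG\r\n\x1a\n")
    print(block["source"]["type"])
```

The media_type passed in should match the actual encoding of the bytes, which is why the final program re-saves every image as PNG before encoding it.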

Greg: This only handles jpg. I need it to also handle png
(Claude output omitted for brevity)

Greg: Output the results to a new csv file. The columns of the csv file should be (1) image url and (2) image description.
(Claude output omitted for brevity)

Greg: add gif support including animated gif
(Claude output omitted for brevity)

(I had to install PIL, which is distributed as the Pillow package)

Greg: Can you add webp support?
(Claude output omitted for brevity)

Greg: I want the option to pass a parameter to the program. The parameter will be a URL of a webpage. Read the web page and use all of the image URLs in the page instead of the input file currently used
(Claude output omitted for brevity)

(I had to install BeautifulSoup, which is distributed as the beautifulsoup4 package)
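
The omitted script evidently relied on BeautifulSoup for this step. As a rough illustration of the same idea using only the standard library, the sketch below collects the src of every img tag and resolves it against the page URL. The names ImgSrcCollector and collect_image_urls are hypothetical, and fetching the page itself is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Collects absolute URLs for every <img src=...> it sees (hypothetical class)."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Relative paths are resolved against the page's own URL.
            self.urls.append(urljoin(self.base_url, src))


def collect_image_urls(html: str, base_url: str) -> list:
    collector = ImgSrcCollector(base_url)
    collector.feed(html)
    return collector.urls


if __name__ == "__main__":
    page = '<p><img src="a.png"><img src="/b/c.jpg"/></p>'
    for url in collect_image_urls(page, "https://example.com/docs/page.html"):
        print(url)
```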

Greg: Modify the program to output the same HTML that it reads but insert each of the image descriptions as the "alt" of the image tag.
(Claude output omitted for brevity)

Greg: write the output to stdout in addition to the file
(Claude output omitted for brevity)

Greg: Don't process img tags that already have the alt attribute
(Claude output omitted for brevity)
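
One subtlety with this request: the final program checks `img.get('alt')`, which treats an empty alt="" the same as a missing attribute. In HTML, an empty alt is the usual way to mark an image as decorative, so overwriting it may not be what you want. A minimal sketch of a stricter check is below; the function name needs_generated_alt is an assumption for illustration.

```python
def needs_generated_alt(attrs: dict) -> bool:
    """Return True only when the alt attribute is absent (hypothetical helper).

    An explicit alt="" is left alone, since it usually marks a decorative image.
    """
    return "alt" not in attrs


if __name__ == "__main__":
    print(needs_generated_alt({"src": "diagram.png"}))             # no alt at all
    print(needs_generated_alt({"src": "spacer.gif", "alt": ""}))   # deliberately empty
```

With BeautifulSoup, a tag's attributes are available as the dictionary `img.attrs`, so the same check could be applied there.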

Greg: Change the program to take a single parameter that is an HTML file. The output will continue to be stdout.

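Resulting program (untouched by me). Note that the generated code requests the model ID claude-3-sonnet-20240229, which is Claude 3 Sonnet; the Claude 3.5 Sonnet ID is claude-3-5-sonnet-20240620: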

import anthropic
import base64
import argparse
import sys
from io import BytesIO
from urllib.parse import urljoin
from PIL import Image
from bs4 import BeautifulSoup
import os
import requests

# Initialize the Anthropic client
client = anthropic.Anthropic(api_key="your_api_key_here")

def download_image(url):
    response = requests.get(url)
    return BytesIO(response.content)

def image_to_base64(image):
    buffered = BytesIO()
    image.save(buffered, format="PNG")
    return base64.b64encode(buffered.getvalue()).decode('utf-8')

def get_image_type(path):
    ext = os.path.splitext(path)[1].lower()
    if ext == '.png':
        return 'image/png'
    elif ext in ['.jpg', '.jpeg']:
        return 'image/jpeg'
    elif ext == '.gif':
        return 'image/gif'
    elif ext == '.webp':
        return 'image/webp'
    else:
        return 'image/jpeg'  # Default to JPEG if unable to determine

def process_gif(gif_image):
    if gif_image.is_animated:
        gif_image.seek(0)
    return gif_image.convert('RGB')

def get_image_description(image_path):
    try:
        media_type = get_image_type(image_path)

        with Image.open(image_path) as img:
            if media_type == 'image/gif':
                img = process_gif(img)
            elif media_type == 'image/webp' and hasattr(img, 'is_animated') and img.is_animated:
                img.seek(0)

            if img.mode != 'RGB':
                img = img.convert('RGB')

            base64_image = image_to_base64(img)

        message = client.messages.create(
            model="claude-3-sonnet-20240229",
            max_tokens=125,
            temperature=0.7,
            system="You are a helpful assistant that provides concise image descriptions.",
            messages=[
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "image",
                            "source": {
                                "type": "base64",
                                "media_type": "image/png",
                                "data": base64_image
                            }
                        },
                        {
                            "type": "text",
                            "text": "Describe this image in exactly 125 characters. If it's an animated image, describe the first frame."
                        }
                    ]
                }
            ]
        )

        return message.content[0].text
    except Exception as e:
        return f"Error processing image: {str(e)}"

def process_html(html_file):
    try:
        with open(html_file, 'r', encoding='utf-8') as file:
            soup = BeautifulSoup(file, 'html.parser')

        base_dir = os.path.dirname(os.path.abspath(html_file))
        img_tags = soup.find_all('img')

        for img in img_tags:
            if img.get('alt'):
                print(f"Skipping image with existing alt text: {img.get('src')}", file=sys.stderr)
                continue

            img_src = img.get('src')
            if img_src:
                img_path = os.path.join(base_dir, img_src)
                if os.path.exists(img_path):
                    print(f"Processing image: {img_path}", file=sys.stderr)
                    description = get_image_description(img_path)
                    img['alt'] = description
                    print(f"Added description: {description}", file=sys.stderr)
                else:
                    print(f"Image file not found: {img_path}", file=sys.stderr)

        return soup.prettify()
    except Exception as e:
        print(f"Error processing HTML: {str(e)}", file=sys.stderr)
        return None

def main():
    parser = argparse.ArgumentParser(description="Generate image descriptions and modify HTML.")
    parser.add_argument("html_file", help="Path to the HTML file to process")
    args = parser.parse_args()

    print(f"Processing HTML file: {args.html_file}", file=sys.stderr)
    modified_html = process_html(args.html_file)

    if modified_html:
        # Print to stdout
        print(modified_html)
    else:
        print("Failed to process the HTML file.", file=sys.stderr)

if __name__ == "__main__":
    main()
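
The program above hard-codes a placeholder API key. A common alternative is to read the key from the ANTHROPIC_API_KEY environment variable, which the Anthropic Python SDK also picks up on its own when no api_key argument is passed. The sketch below shows one way to fail early with a clear message; the helper name load_api_key is an assumption, not part of the generated program.

```python
import os


def load_api_key(env_var: str = "ANTHROPIC_API_KEY") -> str:
    """Read the API key from the environment, failing early if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key


# Possible use in the program above (not executed here):
#   client = anthropic.Anthropic(api_key=load_api_key())
```

This keeps the key out of the source file, which matters if the script ends up in a blog post or a repository.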