Category: Uncategorized

  • The Next Million Rails Apps

    TL;DR

    • The cost to develop software has collapsed over the last 50 years — and demand has exploded.
    • LLMs now let non-engineers build working software, but most AI builders break down during real project maintenance.
    • Rails is uniquely suited for AI-driven development because it gives LLMs a deterministic, opinionated structure — the one thing AI agents need to reliably modify and maintain code.
    • The remaining barrier is accessibility: Rails has historically been hard for non-technical people to install, run, or even start.
    • Leonardo removes that barrier by giving users a zero-setup, browser-based Rails environment powered by an AI operator-agent — no terminal, no VSCode, no local environment required.
    • Leonardo combines the accessibility of vibe-coding tools (Lovable/Bolt/Replit) with the power, stability, and maintainability of Rails.
    • This is how the next 1,000,000 Rails apps will get created, by people who deeply understand real-world problems, not just engineers.

    My goal: help accelerate the creation of the next 1 million Rails apps by non-traditional developers using LLMs.

    Introduction: The cost to develop software has steadily decreased

    Since 1970, the cost to develop software has plummeted. Higher-level languages, better tooling, and shared knowledge have led to higher-quality software that is more affordable.

    Open Source frameworks & development infrastructure decrease the amount of time and effort to create new custom software.

    It’s hard to objectively measure the cost to develop software. Counting lines of code is misleading, because good engineers often write less code, and comparisons across programming languages and frameworks don’t hold up either.

    For example, printing Hello World in C takes more lines of code than in a higher-level language like Python. Here’s a good illustrative example: counting error lines in a log file.

    High Level (Python):

    Straightforward, expressive, hides the ugly parts:

    count = 0
    with open("log.txt") as f:
        for line in f:
            if "error" in line:
                count += 1
    
    print(count)
    

    Lines of code: 6
    Memory management: Automatic
    I/O abstraction: Built-in
    String ops: Built-in
    Error handling: Could be added, but not required for basic functionality
    Developer mental load: Minimal


    Low Level (C):

    You handle your own memory, buffers, file pointers, comparisons, and cleanup:

    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    
    int main() {
        FILE *file = fopen("log.txt", "r");
        if (file == NULL) {
            perror("Error opening file");
            return 1;
        }
    
        char *line = NULL;
        size_t len = 0;
        ssize_t read;
        int count = 0;
    
        while ((read = getline(&line, &len, file)) != -1) {
            if (strstr(line, "error") != NULL) {
                count++;
            }
        }
    
        printf("%d\n", count);
    
        free(line);
        fclose(file);
    
        return 0;
    }
    

    Lines of code: ~35
    Memory management: Manual (free, malloc via getline)
    I/O: Manual (fopen, getline, fclose)
    String ops: Manual (strstr)
    Error handling: Required
    Developer mental load: High — and mistakes cause security bugs

    ____

    While this example illustrates how writing software has gotten easier, it also shows why lines of code is a misleading metric.

    To solve this problem, an IBM engineer, Allan Albrecht, invented function point analysis.

    Albrecht’s insight was to measure the size of the problem being solved, not the amount of code used to solve it.

    So he created the now-standard 5 categories:

    1. External Inputs
    2. External Outputs
    3. External Inquiries
    4. Internal Logical Files
    5. External Interface Files

    Weighted → summed → Function Points.
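
    As a rough, illustrative sketch of how a count works in practice (using the commonly cited IFPUG “average” complexity weights; a real count rates each item’s complexity individually):

    # Illustrative only: unadjusted function point count for a hypothetical small CRUD app,
    # using the commonly cited IFPUG "average" complexity weights.
    AVERAGE_WEIGHTS = {
        "external_inputs": 4,
        "external_outputs": 5,
        "external_inquiries": 4,
        "internal_logical_files": 10,
        "external_interface_files": 7,
    }

    counts = {  # hypothetical application
        "external_inputs": 6,
        "external_outputs": 4,
        "external_inquiries": 3,
        "internal_logical_files": 2,
        "external_interface_files": 1,
    }

    unadjusted_fp = sum(AVERAGE_WEIGHTS[k] * counts[k] for k in counts)
    print(unadjusted_fp)  # 6*4 + 4*5 + 3*4 + 2*10 + 1*7 = 83 function points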

    Software companies have also tracked and published data on their own function points, so we have some reference for how this has evolved over time. While no comprehensive data source exists, we can still derive a rough approximation of cost per function point since the 1970s.

    Sources: Capers Jones, The Economics of Software Quality, and ISBSG / Czarnacka-Chrobot (case study using ISBSG “Software Project Costs”).

    It’s safe to claim that since the 1970s, the cost has dropped by roughly a factor of 3, despite the growing cost of software engineering labor.

    These productivity gains come from better tooling, open source projects, online knowledge sharing, and the rise of higher level programming languages like Python, Ruby, and JavaScript.

    As is true for any technology: as the cost decreases, the demand increases.

    Demand for New Software is Still Growing.

    The world has a plethora of problems that can still be addressed by new software. But the cost to develop new software has always been a bottleneck, because engineers are expensive, and code is complicated.

    With the costs to develop new software decreasing, along with new AI capabilities such as AI vision, LLMs, and generative AI, we can now build software that was previously unimaginable, for much cheaper than ever before in history.

    Increasingly, software isn’t being created by just professional developers. A new class is emerging, known as “citizen developers.” These people do not have Computer Science degrees, and they rely increasingly on other tools to generate code for them.

    Before LLMs, a plethora of no-code and low-code tools like Bubble, Make, and even open source projects like WordPress made building new software accessible. But as AI scaled, LLMs have become extremely good at writing code.

    LLMs help non-engineers build software.

    This has led to an explosion in popularity of “vibe coding” tools, such as Lovable.dev, Bolt.new, and Replit. You can simply type your idea into an AI chat box, and have a fully functioning software prototype in a matter of seconds.

    But LLMs writing code isn’t all sunshine and roses. Difficult problems arise, such as:

    • Expensive token costs & high usage
    • Bugs that are inevitable and hard for LLMs to fix
    • Security and auth
    • Scalability and performance
    • User experience and learning curve
    • Integration and export
    • Community and support

    But despite these problems, millions of “citizen developers” are continuing to adopt and pay for AI generated coding tools. As AI models become increasingly affordable and capable, these problems will become easier to solve.

    Another way we can reduce these errors is by helping the LLM out, the same way we’d help a Jr. Software Engineer out: give it the right tools. That’s where a framework like Ruby on Rails comes into play.

    My Thesis: Rails is the perfect framework for non-engineers to build software using LLMs

    There are very strong, structural advantages to Ruby on Rails, as explained by its creator, David Heinemeier Hansson:

    It brings us to the ideal of The One Person Framework. A toolkit so powerful that it allows a single individual to create modern applications upon which they might build a competitive business. The way it used to be.

    There’s so much to learn these days, if you want to be an expert in all the latest tools and techniques. The conventional path [to software engineering], as paved by solutions extracted from giant tech companies, is a journey akin to The Oregon Trail. You might well die of dysentery before you ever get to your destination!

    Rails 7 seeks to be the wormhole that folds the time-learning-shipping-continuum, and allows you to travel grand distances without knowing all the physics of interstellar travel. Giving the individual rebel a fighting chance against The Empire.

    You simply can’t play by the same rules against an opponent exponentially stronger than you.

    The key engine powering this assault is conceptual compression. Like a video codec that throws away irrelevant details such that you might download the film in real-time rather than buffer for an hour.

    Source: The One Person Framework, David Heinemeier Hansson, Creator of Rails

    Rails + LLMs Are A Perfect “Unfair Leverage Stack”.

    If a non-engineer is going to build real software with an AI assistant, Rails is simply the highest-probability path to success.

    Why?

    1. Rails is batteries-included: migrations, ActiveRecord, jobs, mailers, caching, etc.
    2. Rails already embodies 20 years of engineering best practices, so the LLM doesn’t have to reinvent fundamentals.
    3. Convention over configuration massively reduces hallucination surface area.
    4. Rails gives LLMs a predictable file structure, which is critical for deterministic code modifications.
    5. Rails’ monolith avoids the fragility of microservice-style code generation.
    6. Rails’ robust ecosystem of gems, such as Devise for security/auth patterns, reduces catastrophic LLM mistakes and provides powerful, working code on day one.
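
    To make point 4 concrete, here’s a minimal sketch (a hypothetical helper, not Leonardo’s actual code) of how a Python agent can exploit Rails naming conventions to know exactly where a resource’s files live, without searching the project tree:

    # Hypothetical helper: derive the conventional Rails file paths for a resource.
    # Convention over configuration means these locations are predictable, so an LLM
    # agent can read or edit the right file without exploring the whole repository.
    def conventional_rails_paths(resource: str) -> dict[str, str]:
        singular = resource.rstrip("s")  # naive singularization, fine for a sketch
        plural = singular + "s"
        return {
            "model":      f"app/models/{singular}.rb",
            "controller": f"app/controllers/{plural}_controller.rb",
            "views_dir":  f"app/views/{plural}/",
            "routes":     "config/routes.rb",
            "migrations": "db/migrate/",
        }

    print(conventional_rails_paths("invoices"))
    # {'model': 'app/models/invoice.rb', 'controller': 'app/controllers/invoices_controller.rb', ...}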

    Non-engineers using LLMs do not want:

    • Modern JS dependency churn
    • 18 dependencies for basic CRUD
    • Vite → Next → Bun → Node → pnpm → esbuild → whatever

    They want:

    • One framework
    • One mental model
    • One cohesive toolkit

    The Rails superpower is not just technical — it’s psychological. It removes 80% of the cognitive load that overwhelms not only engineers, but also LLMs.

    This is a truth nobody in the JavaScript hype cycle wants to admit, and it’s exactly why Rails is superior for AI coding agents.

    The Real Problem Isn’t Code Generation — It’s Project Lifecycle

    Existing vibe coding tools can generate code, but cannot maintain code.

    That’s their fatal flaw.

    LLMs fall apart when:

    • File structures change unpredictably
    • Auth patterns drift
    • API shape diverges
    • The app becomes a tangle of idiosyncratic user prompts

    Rails solves this by giving the LLM:

    • Deterministic MVC structure
    • Deterministic naming conventions
    • Deterministic routing
    • Deterministic file placement

    LLMs thrive when the environment is deterministic. Rails is deterministic.

    But determinism alone isn’t enough.

    Ruby on Rails has always had one major weakness: it’s hard to start. Rails is unbelievably productive once you’re inside it, but getting there traditionally requires terminals, version managers, local databases, and dev environments that instantly overwhelm non-technical people.

    This is the real reason browser-based AI coding tools exploded.
    Not because their stacks are better, but because they’re accessible.

    Cursor and Claude Code are incredible tools, but most people don’t have VSCode installed. Most have never touched a terminal. They see a blinking prompt and shut their laptop.

    This is where tools like Lovable.dev, Bolt.new, and Replit actually shine: they’re frictionless.

    You click a button and you’re building software in the browser.
    No installs. No setup. No environment debugging.
    My 63-year-old mother could start a project that way.

    Rails doesn’t have this yet.

    If Rails is going to empower a new class of builders — if the next million Rails apps will be created by people outside traditional engineering — then Rails needs something radically simpler:

    1. A brand-new Rails app that spins up instantly in the browser
    2. Zero installation or local environment
    3. A simple chat interface where they never have to view or edit the code manually

    You need an AI agent that can operate Rails (run it, modify it, migrate it, refactor it, etc.) inside a stable environment where the user never touches the terminal.

    This is exactly what Leonardo provides.

    Leonardo runs a full Rails app in the cloud, accessible entirely through the browser. It gives users a chat interface to create and modify features without ever installing Ruby, opening VSCode, or configuring PostgreSQL.

    And because Rails is a monolith (concise, high-level, migration-aware, schema-aware), Leonardo’s agent can maintain the app over time without collapsing it. It doesn’t rely on Supabase or an external backend. It doesn’t generate sprawling TypeScript/React projects that explode token usage. Rails is far more token efficient and far more structurally predictable for an LLM.

    We also did something fundamentally different with pricing:
    users pay exactly the LLM token cost, no markup.

    Or they can bring their own API keys entirely.

    This makes Leonardo dramatically more cost-effective than Lovable, Bolt, or Replit. But, more importantly, Rails + Leonardo produces software that is maintainable, stable, and evolvable.

    Leonardo gives you the accessibility of browser-based AI builders…
    with the power, determinism, and maintainability of Ruby on Rails.

    This is the missing layer the ecosystem needed.

    For the first time in history, the bottleneck to building software is no longer technical. It’s imaginative. The world is full of people who understand business problems deeply, but cannot code. If we give them a stable, powerful, opinionated framework, we unlock a new class of builders!

    The next million Rails apps will not come from Silicon Valley dev teams. They will come from accountants automating workflows, operations leads replacing spreadsheets, founders validating ideas, and domain experts who’ve never touched a terminal.

    If you’re a founder, operator, freelancer, or domain expert with a problem you wish were automated, then build it with Rails and an AI partner. Leonardo is the first step.

    Use Leonardo here: https://llamapress.ai

  • System Prompt Caching Allows Broke Founder to Pay for Groceries

    Last night, I noticed emails showing that Anthropic had charged my credit card $13-$15, multiple times a day.

    When I dug into this more, I saw that yesterday I had 47M input tokens for Claude!

    And it cost me ~$50. This was shocking because I recently switched to Claude Haiku 4.5 to avoid this precise issue.

    Leonardo is a hungry beast for tokens.

    I recently gave Leonardo the ability to write & run RSpec tests. And these tests consume a lot of input tokens, especially if it’s dumping all your SQL transaction queries into the LLM context window. 🤷

    But beyond turning off verbose SQL statement logging to reduce the amount of input tokens, another thing stood out to me on Anthropic’s pricing page.

    Cached tokens are only $0.10 per million tokens! That’s 1/10th the cost. That sounds amazing to me, so I decided to implement it into Leonardo’s test building mode.

    But first, let’s just discuss costs and do an experiment.

    Notice how many tokens this uses just to ask it to explore my project and tell me about it.

    Claude Haiku ingested 139K tokens. (The system prompt alone is 16K tokens.)

    BUT, notice it had 8 turns! Meaning, this system prompt got fed back into Claude Haiku 8 times! 16K * 8 = 128K tokens, just on ingesting the system prompt alone!

    That single exchange cost 15 cents. And as the conversation drags on, the cost skyrockets, because with each new message it re-ingests the entire conversation, including all the previous tokens.
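
    A rough back-of-the-envelope sketch of why this compounds (the prices are assumptions for illustration: roughly $1 per million regular input tokens for a Haiku-class model and a tenth of that for cache reads; check Anthropic’s pricing page for current numbers):

    # Illustrative: cost of re-ingesting the same 16K-token system prompt on every turn.
    SYSTEM_PROMPT_TOKENS = 16_000
    TURNS = 8
    INPUT_PRICE_PER_M = 1.00    # assumed ~$1 per million regular input tokens
    CACHED_PRICE_PER_M = 0.10   # cache reads are roughly 1/10th the price

    uncached = SYSTEM_PROMPT_TOKENS * TURNS * INPUT_PRICE_PER_M / 1_000_000
    # With caching: the first turn pays (roughly) full price to write the cache,
    # and the remaining turns read it at the cached rate.
    cached = (SYSTEM_PROMPT_TOKENS * INPUT_PRICE_PER_M
              + SYSTEM_PROMPT_TOKENS * (TURNS - 1) * CACHED_PRICE_PER_M) / 1_000_000

    print(f"system prompt alone, uncached: ${uncached:.3f}")  # $0.128
    print(f"system prompt alone, cached:   ${cached:.3f}")    # ~$0.027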

    If this were Claude Sonnet, it’d be even more expensive: these 139K tokens would cost 45 cents. But just wait, it gets worse.

    I then asked Leonardo to run RSpec, and the cost jumped up again. Now at 340,000 tokens, this additional user message cost 36 cents.

    For Claude Sonnet, this would have been $1.08.

    And now, this is where my budget for groceries really gets murdered: asking Leonardo to implement unit tests where there’s gaps in coverage!

    I asked Leonardo to implement new tests to boost test coverage. Leonardo sent over 50 (!) messages and implemented a handful of new tests using RSpec.

    This run cost $2.48.

    With Claude Sonnet, this would have been approximately $7.44.

    Now you can see how, after spending 12 hours on this yesterday, I ended up with a bill of $47 (even when using the affordable Claude Haiku). This makes both me and my girlfriend sad. (Sorry babe, we have to cancel our plans to see Wicked 2; Leonardo ate all our money.)

    But.. I don’t want to overly complain, because this is still pretty awesome.

    Even if it’s expensive, Leonardo generated 1,244 new lines of high quality Ruby/RSpec code that tests our Ruby on Rails project. It boosted test coverage and wrote over 80 new tests.

    To do this, it ingested thousands of lines of code to understand the project well enough. I mean, just look at this beautiful Ruby code.

    Pre-LLMs, assume a Jr. Dev spent a full working day doing this. If you’re paying said Jr. Dev a $75K/year salary, we’re looking at ~$322.95 for that full day’s worth of work.

    So even if we used Claude Sonnet, and it cost $10 for this, that’s an order of magnitude lower on cost, and multiple orders of magnitude lower on time efficiency.

    That being said, I’m a broke founder.

    So if I’m choosing between paying for API credits and paying for groceries (not hyperbole if this is $50/day in token costs), then I’m going with groceries so I don’t starve.

    Luckily, I can have my LLM tokens and eat them too, by implementing something called prompt caching.

    The cost savings are dramatic. Let me show you.

    The main code change was to switch away from LangChain’s SystemMessage object and use good ol’ JSON.

    At first, Claude Code tried to add additional_kwargs={"cache_control": {"type": "ephemeral"}} as an input argument to the SystemMessage constructor, but I wasn’t able to get caching to trigger that way.

    In LangChain’s documentation on Anthropic Caching, they actually have the example where they just use a JSON object for the system message, so that’s what I decided to go with.

    Source: https://reference.langchain.com/python/integrations/langchain_anthropic/ChatAnthropic/#langchain_anthropic.chat_models.ChatAnthropic

    from langchain_anthropic import ChatAnthropic
    
    model = ChatAnthropic(model="claude-sonnet-4-5-20250929")
    
    messages = [
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "Below is some long context:",
                },
                {
                    "type": "text",
                    "text": f"{long_text}",
                    "cache_control": {"type": "ephemeral"},
                },
            ],
        },
        {
            "role": "user",
            "content": "What's that about?",
        },
    ]
    
    response = model.invoke(messages)
    response.usage_metadata["input_token_details"]

    As a reminder, here’s the costs without caching:

    And here’s the exact same experiment run, but with caching enabled:

    By just caching the system prompt, I cut the cost down to almost half of what it was previously!

    What’s awesome too, is that I’m only caching the system prompt.

    If I were properly caching the entire thread of messages, the cost would come down even more.
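
    If I do get around to it, my understanding is that the pattern is to move the cache breakpoint forward each turn, so the whole prior conversation becomes part of the cached prefix. A hedged sketch (assuming the plain-dict message format from the example above; not code I’ve shipped yet):

    # Hypothetical sketch: put cache_control on the newest message each turn, so the
    # system prompt plus all earlier messages are served from the cache on the next call.
    def with_thread_caching(system_blocks, history, new_user_text):
        """Build a message list whose last content block carries the cache breakpoint."""
        messages = [{"role": "system", "content": system_blocks}] + list(history)
        messages.append({
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": new_user_text,
                    # Anthropic caches the prefix up to and including this block.
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        })
        return messages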

    But, since I’m a lazy founder just trying to ship capability increases (such as making Leonardo 10x smarter for coding Ruby on Rails projects), I’ll hold off on more LLM caching work until I get my next crazy high bill from Anthropic. 🙂

    In summary, if you’re a broke founder worried about the costs of groceries, yet you’re hopelessly addicted to using Claude to write code, and especially if you’re writing your own system prompts, consider implementing Anthropic’s prompt caching mechanism.

    With ❤️,

    • Kody

  • How to debug langgraph/agents/leo/nodes.py in a Leonardo project

    How to debug langgraph/agents/leo/nodes.py

    1. Run docker compose down.
    2. In docker-compose-dev.yml, comment out command: bash -c "python init_pg_checkpointer.py && uvicorn main:app --host 0.0.0.0 --port 8000" and uncomment # command: tail -f /dev/null.
    3. Run bash bin/dev again. This time, the LlamaBot container will start, but Uvicorn (FastAPI) won’t run, so you won’t be able to access localhost:8000 yet.
    4. Run docker compose exec -it llamabot uvicorn main:app --host 0.0.0.0 --port 8000 to manually start Uvicorn (FastAPI). You should now be able to access localhost:8000.
    5. Now, you can add breakpoint() in langgraph/agents/leo/nodes.py, like so:

    # Node
    def leo(state: LlamaPressState):
        breakpoint() # add this line to test that this file is being loaded correctly, and that we hit the breakpoint.
        llm = ChatOpenAI(model="gpt-4.1")
        llm_with_tools = llm.bind_tools(tools)
        custom_prompt_instructions_from_llamapress_dev = state.get("agent_prompt")
        full_sys_msg = SystemMessage(content=f"""{sys_msg} Here are additional instructions provided by the developer: {custom_prompt_instructions_from_llamapress_dev}""")
        return {"messages": [llm_with_tools.invoke([full_sys_msg] + state["messages"])]}
  • How to Deploy Leonardo (AI Full Stack Coding Agent) from LocalHost to Production Using AWS Lightsail + Route53

    Leonardo is our coding agent that builds Ruby on Rails applications, with a built-in AI agent orchestration runtime written in Python and LangGraph.

    So in parts, this is a guide on deploying Ruby on Rails applications, but also deploying LangGraph agents.

    To do this effectively, we use Docker Containers.

    Docker is an amazing tool I knew almost nothing about 3 months ago, and yet by using ChatGPT, I’ve learned how powerful and amazing Docker is.

    In general, the LlamaPress tech stack (including LlamaBot, our LangGraph agent runtime, and Leonardo, our actual coding agent) lets us develop powerful AI applications with rich agent experiences inside the application.

    We get the benefit of these three powerful open source frameworks:

    • Ruby on Rails for its powerful full-stack scaffolding features & rapid web application development cycles.
    • LangGraph for its powerful agent orchestration framework.
    • Docker for its architecture-agnostic dev setup + project deployments, perfect for going from a working prototype running on localhost to production.

    To deploy a Leonardo Application to production from localhost, I recommend taking the following approach:

    Initial Installation of LlamaBot & Leonardo

    aws configure # set up aws cli on your machine
    git clone https://github.com/KodyKendall/LlamaBot
    cd LlamaBot
    bash bin/deploy_llamabot_on_aws.sh

    This bin/deploy_llamabot_on_aws.sh script does the following:

    1. Collects important information for setting up your AWS Lightsail Instance.

    
    read -p "Name of instance: " INSTANCE
    read -p "Path to identity file: (defaults to ~/.ssh/LightsailDefaultKey-us-east-2.pem)" IDENTITY_FILE
    export INSTANCE
    export REGION=us-east-2
    export AZ=${REGION}a
    export BLUEPRINT=ubuntu_24_04
    export BUNDLE=small_2_0
    export IDENTITY_FILE=${IDENTITY_FILE:-~/.ssh/LightsailDefaultKey-us-east-2.pem}

    Type your instance name (no spaces allowed). In our case I’m naming it: “HistoryEducation”

    2. Launches an AWS LightSail Instance ($12/mo.)

    
    aws lightsail create-instances \
      --instance-names "$INSTANCE" \
      --availability-zone "$AZ" \
      --blueprint-id "$BLUEPRINT" \
      --bundle-id "$BUNDLE" \
      --region "$REGION"
    
    IPADDRESS=$(aws lightsail get-instance \
                  --instance-name "$INSTANCE" \
                  --region "$REGION" \
                  --query 'instance.publicIpAddress' \
                  --output text)
    
    echo $IPADDRESS
    
    cat >> ~/.ssh/config <<EOF
    Host $INSTANCE
            HostName $IPADDRESS
            User ubuntu
            IdentityFile $IDENTITY_FILE
            IdentitiesOnly yes
    EOF

    3. Sets up DNS records through Route 53.

    
    export DOMAIN=llamapress.ai.   
    export ZONE_ID=$(aws route53 list-hosted-zones-by-name \
      --dns-name "$DOMAIN" --query 'HostedZones[0].Id' --output text | sed 's|/hostedzone/||')
    echo $ZONE_ID
    
    TARGET_FQDN=$INSTANCE.llamapress.ai.
    RAILS_TARGET_FQDN=rails-$TARGET_FQDN
    
    cat > new-a-record.json <<EOF
    {
      "Comment": "Add A records for $TARGET_FQDN for LlamaBot Agent Deploy",
      "Changes": [
        {
          "Action": "UPSERT",
          "ResourceRecordSet": {
            "Name": "${TARGET_FQDN}",
            "Type": "A",
            "TTL": 60,
            "ResourceRecords": [
              { "Value": "${IPADDRESS}" }
            ]
          }
        },
        {
          "Action": "UPSERT",
          "ResourceRecordSet": {
            "Name": "${RAILS_TARGET_FQDN}",
            "Type": "A",
            "TTL": 60,
            "ResourceRecords": [
              { "Value": "${IPADDRESS}" }
            ]
          }
        }
      ]
    }
    EOF
    
    aws route53 change-resource-record-sets \
      --hosted-zone-id "$ZONE_ID" \
      --change-batch file://new-a-record.json

    4. Opens up Port 443 for HTTPS access

    echo "Instance created! Now, waiting to open port 443..."
    sleep 20
    
    # Open port 443: 
    aws lightsail open-instance-public-ports \
      --instance-name "$INSTANCE" \
      --port-info fromPort=443,toPort=443,protocol=TCP \
      --region "$REGION"
    
    #Check port is open on instance
    aws lightsail get-instance-port-states \
      --instance-name "$INSTANCE" \
      --region "$REGION" \
      --query 'portStates[?fromPort==`443`]'
    

    5. Allows you to SSH into instance directly, to install LlamaBot on your production Ubuntu server

    echo "Instance is ready to be used! type command ssh $INSTANCE to connect to it, then paste the following command to install the agent: "
    echo "curl -fsSL "https://raw.githubusercontent.com/KodyKendall/LlamaBot/refs/heads/main/bin/install_llamabot_prod.sh" -o install_llamabot_prod.sh && bash install_llamabot_prod.sh"
    
    ssh $INSTANCE

    After you’ve done this, you should be able to SSH into the server.

    Step 6. SSH into your LlamaBot & Leonardo Instance, and run the install script.

    Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
    applicable law.
    
    To run a command as administrator (user "root"), use "sudo <command>".
    See "man sudo_root" for details.
    
    ubuntu@ip-172-26-14-43:~$ 

    Paste in the following command:

    curl -fsSL "https://raw.githubusercontent.com/KodyKendall/LlamaBot/refs/heads/main/bin/install_llamabot_prod.sh" -o install_llamabot_prod.sh && bash install_llamabot_prod.sh

    You’ll see this:

       _      _                           _          _   
      | |    | |  __ _  _ __  __    __ _ | | _  ___ | |_ 
      | |    | | / _` || '_ \ _ \  / _` ||  _ \/ _ \| __|
      | |___ | || (_| || | | | | || (_| || |_)  |_| | |_ 
      |_____||_| \__,_||_| |_| |_| \__,_||____/\___/ \__|
                                                               
                       (\     (\
                      (  )   (  )
                     (   )___(  )
             ______ /           )
            {                   )
             |                 ))   L L A M A B O T   I N S T A L L E R
              \______         ))    LangGraph + Rails + Docker + Caddy
                 (           ))
                 |           ))     LlamaBot (LangGraph) • LlamaPress (Rails)
                 |           ))
                 |           ))     v0.2.6
                 |           ))
        
    
    → Kickstarting setup... (press Ctrl+C to abort)
    
    🦙🤖 Paste your OpenAI API Key: 
    

    Paste in your OpenAI API Key, and hit enter.

    You’ll see a request for putting in your hosted domain.

    
    🦙🤖 Paste your OpenAI API Key: sk-proj-*******
    
    🌐 Enter your hosted domain (e.g., example.com): 
    

    This domain format must match this: <INSTANCE>.<AWS_ROUTE_53_DOMAIN>

    Make the first part of the domain match EXACTLY what you entered as the INSTANCE name in the install script. In my case, it’s HistoryEducation.

    Then, for the second part of the domain, put in the actual domain name that’s configured in AWS Route 53. In my case, it’s llamapress.ai as the AWS_ROUTE_53_DOMAIN.

    So, this means I’m pasting in this:

    HistoryEducation.llamapress.ai

    Which ends up looking like this:

    🌐 Enter your hosted domain (e.g., example.com): HistoryEducation.llamapress.ai

    Now, the following things will get installed on your Ubuntu 24 server automatically.

    1. Docker
    2. Caddy
    3. Github CLI
    4. LlamaBot & Leonardo.

    This should take approximately 5 minutes or less.

    You should see this if it succeeded.

    🎉 Leonardo is deployed!

    Now, you should be able to navigate to your URL. In this case, it should be:

    https://HistoryEducation.llamapress.ai

    Sign into Leonardo

    The default username is: kody

    The default password is: kody

    You should now be able to see the Leonardo interface.

    Connect your Leonardo instance to GitHub by authenticating and adding an origin.

    Back in your ssh terminal:

    gh auth login
    
    > Github.com
    > HTTPS
    > Y
    > Login with a web browser
    
    ! First copy your one-time code: C881-1E51
    Press Enter to open github.com in your browser...
    

    Copy the code.

    Go to https://github.com/login/device

    Paste the code.

    Continue -> Authorize.

    You may need to install the Github mobile app and give it an access code.

    cd llamapress
    git init
    git remote add origin <your github repo url>
    

    in my case it’s:

    git remote add origin https://github.com/History-Education-Foundation/StoryBook
    

    Then, git fetch, checkout main, and run docker compose up -d:

    git fetch
    git checkout main
    docker compose up -d 

    If you get messages like:

    ? Volume "llamapress_rails_storage" exists but doesn't match configuration in compose file. Recreate (data will be lost)? Yes
    

    Then always select “yes”.

  • 🚀 Customizing Leonardo with New Gems & Ruby on Rails Extensions

    By default, Leonardo runs your Rails app (llamapress-simple) as a Docker image baked into the system. To add new Gems or deeper Rails customization, you’ll need to:

    1. Clone & modify the Rails app
    2. Build & push a new Docker image
    3. Update Leonardo’s Docker Compose config to use your new image
    4. Restart services inside your Leonardo instance

    All changes will persist across EC2 stops/starts because the instance root volume lives on EBS.


    1. Clone the Rails app locally

    git clone https://github.com/KodyKendall/LlamaPress-Simple
    cd LlamaPress-Simple
    

    2. Add new gems

    Add your gem with bundle add or edit the Gemfile directly:

    bundle add devise
    bundle install
    

    This updates Gemfile and Gemfile.lock. Commit these changes if you want to keep them versioned.


    3. Build a new Docker image

    Use Docker Buildx to build for linux/amd64 (the platform EC2 runs):

    docker buildx build \
      --platform linux/amd64 \
      -t your-dockerhub-username/llamapress-simple:0.1.18 .
    

    Push the image to Docker Hub:

    docker push your-dockerhub-username/llamapress-simple:0.1.18
    

    4. SSH into your Leonardo instance

    From your local machine:

    ssh ubuntu@your-instance-ip
    

    Navigate to the Docker Compose directory:

    cd ~/llamapress
    

    5. Update docker-compose.yml

    Open the file:

    nano docker-compose.yml
    

    Find the service for llamapress and update the image tag:

    llamapress:
      image: your-dockerhub-username/llamapress-simple:0.1.18
      build: .
      ports:
        - "3000:3000"
      # other options…
    

    Save and exit.


    6. Pull & restart the container

    docker compose pull llamapress
    docker compose up -d llamapress
    

    This pulls your new image and restarts the container in detached mode.


    7. Verify

    Check logs:

    docker compose logs llamapress
    

    Visit your Leonardo instance in the browser → your Rails app is now running with the new gems.


    ⚡ Notes

    • Changes to docker-compose.yml persist across stops/starts. (Your instance’s filesystem is preserved).
    • To make new users get the updated config by default, bake a new AMI with the updated compose file.
    • Use semantic versioning for your images (e.g., 0.1.18, 0.1.19) to avoid confusion.
  • Using Leonardo on Windows – Commands

    We use WSL to run Docker, which runs Leonardo.

    When editing the source code in VSCode, you will often run into file permission issues. To fix this, run this command:

    sudo chown -R user:user /home/user/llamapress

    Replace user with your WSL username (the one in the /home/<user> path).

  • A Chatbot that Builds Rails Apps

    Monday, 08/18/2025.

    I’ve been steadily working towards this idea for the last 12 months. Back when I first started I knew this was technically feasible, even inevitable, but there were so many unknowns that I hadn’t worked out.

    The first version was letting LLMs generate HTML/CSS/JavaScript code, in a fully formatted HTML document, like so:

    <!DOCTYPE html> <html> ... (llm generated content) </html>

    By saving this LLM generated code into a database table named “Pages”, we simply dump that raw HTML content into the response when a user requests the page, like so:

    
    class PagesController < ApplicationController
      # GET /pages/1 or /pages/1.json
      # (@page is loaded by a before_action, omitted here for brevity)
      def show
        content = @page.render_content
        render inline: content.html_safe, layout: 'page'
      end
    end
    

    This was the easiest way to release a product without needing to worry about partitioning file systems for each user, etc. A friend and I ultimately wrapped this all into a Ruby on Rails app, launched it under the name “LlamaPress”, and marketed it as a webpage builder for people who want to build simple websites & webpages.

    Using OpenAI’s free credit program (free credits in exchange for opting into sharing our prompts with them), we offered this as a freemium service, and we’ve had over 8,000 user registrations and over 40,000 pages created.

    We even got featured in Business Insider, because it turns out scammers had used LlamaPress to clone a victim’s online store, downloaded the HTML out of our platform, and then used it to phish them. Yikes. More about that later.

    While having an LLM write HTML code directly into a database and render it in real time was a great start, it still fell short of the original dream of letting LLMs write and execute Rails scaffolding commands and build Rails apps.

    Now, I’m ready to make the leap.

    Here’s how it’s going to work, starting with a screenshot of what I’m currently running locally.

    This is a FastAPI app running on localhost:8000, with a chat interface on the left, and an iFrame loading the front-page of the Rails app, LlamaPress, that’s running on localhost:3000.

    The chat interface on the left is similar to ChatGPT or Lovable, where you can chat with an LLM that’s tied into the filesystem of the Rails app.

    There’s a websocket connection from this webpage to FastAPI that sends messages to LangGraph. Within LangGraph, there’s a ReAct agent, with tool calls to read & write files from the file-system, run Rails terminal commands (like scaffolding, rake tasks, console, etc), commit code to the git history, etc.

    This all runs from a single docker-compose.yml file, where there are four containers.

    1. A Ruby on Rails docker container that runs on localhost:3000.
    2. A FastAPI/LangGraph container that runs on localhost:8000
    3. A Postgres container that acts as the database for Rails and to save LangGraph agent checkpoints.
    4. A Redis container for ActionCable web socket communication from the browser to Rails.

    I don’t have a final version to show yet, but I will soon.

    This is an ongoing effort, and I’ll continue to make updates on this post as I make progress.

    Kody

    Tue, 08/19/2025.

    I have a breakpoint triggering inside the FastAPI application, and we can view the file contents inside the Ruby on Rails app docker container.

    We do this with a shared, mounted volume between the Ruby on Rails Docker container and the FastAPI Docker container.

      llamapress:
        # image: kody06/llamapress:0.1.16
        build:
          context: LlamaPress
          dockerfile: Dockerfile.dev
        stdin_open: true     # <-- enables byebug/irb input
        tty: true             # <-- attaches terminal output
        env_file:
          - .env
        command: bash -c "rm -f tmp/pids/server.pid && bundle exec rails db:prepare && bundle exec rails s -b '0.0.0.0'"
        platform: linux/amd64       # ← tells Compose “run this container as amd64 under QEMU”
        volumes:
          - ./app:/rails/app
          - ./config/routes.rb:/rails/config/routes.rb    # mount just one file
          - ./db:/rails/db                               # mount entire db folder
        ports:
          - "3000:3000"
        networks:
          - llama-network
        environment:
          - RAILS_ENV=development
          - DATABASE_URL=postgres://user:password@db:5432/llamapress
          - BUNDLE_PATH=/usr/local/bundle
          - REDIS_URL=redis://redis:6379/1
          - BOOTSNAP_CACHE_DIR=/rails/tmp/cache/bootsnap
        depends_on:
          - db
          - redis
    
      llamabot:
        # image: kody06/llamabot:0.1.16
        build: 
          context: LlamaBot
          dockerfile: Dockerfile
        volumes:
          - ./app:/app/app/rails/app
          - ./config/routes.rb:/app/app/rails/config/routes.rb    # mount just one file
          - ./db:/app/app/rails/db                               # mount entire db folder
          - ./LlamaBot/app:/app/app
        env_file:
          - .env


    And here’s a picture where we can see the beginnings of how our LangGraph agent could view the contents of the Rails app.

    Here’s a breakpoint() session where we can view and inspect all of the internals of the Rails app:

    INFO:app.websocket.web_socket_handler:Received message: {'message': 'test', 'thread_id': '2025-08-19_09-57-36', 'agent_name': 'llamabot'}
    INFO:app.websocket.web_socket_handler:Waiting for message from LlamaPress
    INFO:app.websocket.request_handler:Created state with keys: ['messages']
    > /app/app/agents/llamabot/nodes.py(42)read_rails_file()
    -> print(os.listdir(APP_DIR / "rails"))
    (Pdb) 
    (Pdb) n
    ['app', 'config', 'db']
    > /app/app/agents/llamabot/nodes.py(43)read_rails_file()
    -> # ['app', 'config', 'db'] <- these are the folders we're able to see in the container.
    (Pdb) os.listdir(APP_DIR / "rails" / "app")
    ['.DS_Store', 'mailers', 'strategies', 'models', 'instruments', 'llama_bot', 'javascript', 'jobs', 'controllers', 'views', 'assets', 'helpers', 'services', 'channels']
    (Pdb) os.listdir(APP_DIR / "rails" / "app" / "controllers")
    ['llama_bot_controller.rb', 'application_controller.rb', 'organizations_controller.rb', 'pages_controller.rb', 'admin', 'users_controller.rb', 'users', 'sites_controller.rb', 'api', 'page_histories_controller.rb', 'submissions_controller.rb', 'concerns']
    (Pdb) os.listdir(APP_DIR / "rails" / "app" / "models")
    ['site.rb', 'page_history.rb', 'page.rb', 'organization.rb', 'submission.rb', 'checkpoint.rb', 'application_record.rb', 'concerns', 'user.rb']
    (Pdb) os.listdir(APP_DIR / "rails" / "app" / "views")
    ['organizations', 'devise', '.DS_Store', 'admin', 'shared', 'sites', 'layouts', 'chat_conversations', 'users', 'pwa', 'submissions', 'page_histories', 'llama_bot', 'pages']
    (Pdb) 

    This is interesting because it means we can take these files, and feed them into the LLM if we want to, and it would be able to understand the Rails application.

    We could also let the LLM overwrite these files directly, with new code. And the user would be able to see the changes in real time through the iFrame preview.

    Let’s see if we can get the LLM to update the current home-page title from “Build a website in seconds!” to something more based like: “Hello from Leonardo da Llama”.

    Let’s change the system prompt to tell the LLM that it can view Rails files, and let’s give it a tool call to read.

    Old System Prompt:

    You are Leonardo, a Llama that reads and writes HTML/CSS/JavaScript code.

    Your task is to help the user to modify and create webpages, using HTML/CSS/JavaScript.

    All outputted code is saved into a single HTML file, including the CSS and JavaScript code.

    Here is the current page that the user is viewing: <HTML_PAGE>{content}</HTML_PAGE>

    New System Prompt:

    You are Leonardo, a Llama that can read and write changes to a Ruby on Rails application.

    Your task is to help the user with their Ruby on Rails application, by answering questions, making modifications, etc.

    You can list the contents of the Rails directory to explore the app.

    And here’s the tool calls we’ll pass the agent, using LangGraph’s tool decorator.

    ```
@tool
def read_rails_file(filepath: str) -> str:
    """Read the contents of a Rails file."""
    # Construct the full path
    full_path = APP_DIR / "rails" / filepath
    
    # Check if file exists
    if not full_path.exists():
        return f"File not found: {filepath}"
    
    # Read the file contents
    try:
        # Option 1: Using pathlib (recommended)
        contents = full_path.read_text()
        
        # Option 2: Using traditional open()
        # with open(full_path, 'r') as f:
        #     contents = f.read()
        
        return contents
    except Exception as e:
        return f"Error reading file: {e}"

@tool 
def list_directory_contents(directory: str = "") -> str:
    """
    List the contents of a directory.
    If directory is empty, lists the rails root directory.
    """
    # Build path - if directory is empty, just use rails root
    dir_path = APP_DIR / "rails" / directory if directory else APP_DIR / "rails"
    
    if not dir_path.exists():
        return f"Directory not found: {dir_path}"
    
    return os.listdir(dir_path)

# Global tools list
tools = [list_directory_contents, read_rails_file]

# System message
sys_msg = SystemMessage(content=f"""You are Leonardo, 
a Llama that can read and write changes to a Ruby on Rails application.
Your task is to help the user with their Ruby on Rails application, 
by answering questions, making modifications, etc.
You can list the contents of the Rails directory to explore the app.
""")```

    Will it work? Let’s test it.

    Ok, great. The agent is reading directory contents and file contents properly. Next step, let’s see if we can get the LLM to overwrite a file.

    I’ve added an additional tool method for the LLM to use.

    We’ll help the LLM by giving it the path to the file in question, which is: app/views/devise/registrations/new.html.erb.

    This is acting as the home page for unregistered users, and says: “Build a website in seconds!”

    Here’s the message I’ll send Leonardo:

    Can you please view the contents of the file app/views/devise/registrations/new.html.erb, and tell me what it says?

    Leonardo responds with:

    🔨 Using tool: read_rails_file

    The app/views/devise/registrations/new.html.erb file provides a modern, multi-step user experience for new user registration and sign-in, with a focus on quickly building a website using an AI prompt. Here’s what it contains:

    1. Landing/Prompt Section:
      • Displays the LlamaPress logo.
      • Headline: “Build a website in seconds!”
      • Users are prompted to enter their website idea in a textarea, with a typing effect suggesting example prompts (e.g., “Build me a landing page for my plumbing business…”).
      • Button: “Generate your website” (triggers the signup form).
    2. Signup Section (hidden by default, revealed after clicking the button):
      • Form fields for email, password, confirmation, and a hidden prompt (the idea entered above).
      • Automatically fills a hidden “organization name” based on the user’s email.
      • Error and validation messages are clearly styled.
      • Button: “Create my website”.
      • Option to switch to the sign-in form for returning users.
    3. Signin Section (hidden by default, revealed via links):
      • Standard Devise login form with email and password.
      • Displays validation/flash messages.
      • Option to switch to sign-up.
    4. JavaScript Enhancements:
      • Placeholder typing animation for example prompts in the textarea.
      • Transition logic to slide between prompt, signup, and sign-in sections, carrying the user’s prompt through the process.
      • Organization name is dynamically set when a user enters their email.
    5. Styling:
      • Modern, clean interface with Tailwind CSS and FontAwesome styling.
      • Responsive, mobile-friendly layout.

    Summary: This page is designed to quickly capture a website idea (prompt), then guide the user through signup or login while making the process engaging and clear. It keeps the user’s input all the way to account creation, enabling a fast start for generating their website. Let me know if you want to see the actual rendered HTML, a certain section, or need the content in a different format!

    I then said:

    Can you please change the h1 from “Build a website in seconds!” to say: “Hello world from Leonardo?”

    He responded with:

    The heading (h1) in your registrations new user page has been changed from “Build a website in seconds!” to: Hello world from Leonardo? You’ll now see this new heading when users visit the signup page! If you’d like any further customizations, just let me know.

    And here’s the change Leonardo made:

    In order to view the changes, I had to bring down the Rails Docker container and then bring it up again. Ideally, the iFrame could refresh automatically instead of my needing to run:

    docker compose down -v llamapress
    docker compose up -d llamapress

    I’m sure there’s a way to hotload the files in, or give a tool-call for Leonardo to pull the docker container down, and then recreate it.

    But most importantly, we’ve verified that because the Rails files are in a mounted volume that lives on my local computer, the code changes Leonardo wrote in the Rails directory persist.

    The next steps to me seem clear:

    1. Figure out how to get the Rails changes working immediately, by either restarting the Rails server inside the docker container, or enabling some sort of hot-reload by putting Rails in dev mode.
    2. Improve user experience while the user is waiting for the LLM response, when the LLM is calling tools, and while the LLM is streaming.
    3. Figure out how to commit file changes in case a mistake is made, and we need to rollback commits.

    For the first one, we could start with a tool call that lets the agent restart the Rails server from inside the container, or potentially even restart the entire Docker container. Either way, we’d want a good user experience that communicates what’s happening.

    For the second issue, we’ve already solved similar problems for LlamaPress: we have JavaScript on the client side that listens for LangGraph base_message tool calls, switches on the tool name and input arguments, and formats HTML elements depending on which tool call was made. I’ve been meaning to consolidate all of this into a JavaScript library that can be reused across projects: across LlamaPress, across LlamaBot, etc. I’ve found myself rewriting a lot of the front-end code needed to format agent tool-calling outputs, and LLM outputs in general.

    The third one seems easy enough: we can add another git command tool for Leonardo.

    We’d need to figure out how to do this from within the Python/FastAPI Docker container that shares the mounted volume with the rails folder. Is it as simple as just running git commit -m "message"?
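
    To partially answer my own question, here’s a minimal, untested sketch of what such a tool call might look like from the FastAPI container. Note this assumes git is installed in that container and that the .git directory is visible inside it, which the current volume mounts (app, config/routes.rb, db) don’t actually include, so the mounts would need to be widened first:

    import subprocess
    from pathlib import Path
    from langchain_core.tools import tool

    APP_DIR = Path("/app/app")  # assumed mount point inside the FastAPI container

    @tool
    def commit_rails_changes(message: str) -> str:
        """Stage and commit all changes in the mounted Rails directory."""
        repo_dir = APP_DIR / "rails"  # same mounted volume the other tools use
        try:
            subprocess.run(["git", "add", "-A"], cwd=repo_dir, check=True)
            result = subprocess.run(
                ["git", "commit", "-m", message],
                cwd=repo_dir, check=True, capture_output=True, text=True,
            )
            return result.stdout
        except subprocess.CalledProcessError as e:
            return f"git commit failed: {e.stderr or e}"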

  • LlamaBot as a super tool

    Turning LlamaBot into a full blown MCP client

    Why am I wanting LlamaBot to become an MCP Client, similar to Claude and ChatGPT?


    • I don’t want to rely on Claude and ChatGPT to be my only access to the MCP server world. (I want control over the full stack.)
    • I want access to the client-side agentic workflows (ReAct, CodeAct, etc.).
    • I want other people to have access to the client-side agentic workflows, so that Claude & OpenAI aren’t black-box magic. (I assume they’re mostly implementing a similar type of ReAct and CodeAct agent workflow out of the box.)

    Creating TypeScript/React front-end for LlamaBot

    I decided I want conversation threads for LlamaBot, so we can have multiple, unique persistent conversations with our agent (similar to other Chat agents like ChatGPT and Claude Sonnet).

    While having a basic HTML front-end made sense for our initial project, if we’re turning this into a full-blown MCP client (including multiple conversation threads, previewing LlamaBot projects, etc.), then it makes sense to branch into something that puts our project on stable ground. Hence: React with TypeScript.

    I actually haven’t personally used TypeScript before, but I’m a big fan of React (my first startup’s front-end relied heavily on React and React Native).

    React is amazing because of its hooks and state-update propagation, which make for 10X cleaner and more reusable front-end code once you understand them.

    Typed languages in general are nice because they provide compile-time checking that catches bugs before runtime. Although I haven’t used TypeScript extensively, I’m excited to bring it into LlamaBot, because it will lead to a more stable user experience and allow us to build some amazing functionality into the front-end.

    Being a lazy vibe-coder, I decided to let Cursor Agent take the first stab at the entire setup of creating our front-end with TypeScript and React. Let’s see how it does!

    ^ Cursor banging out a TypeScript/React front-end effortlessly.

    Cursor and Claude Sonnet 4 coming up with a banger design for the interface.

    It wouldn’t be LlamaBot without our beloved mascot staring down the user with his piercing gaze.

    Let’s display our list of Agents that the user can select & run (pulled from langgraph.json).
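
    Assuming our langgraph.json follows the standard LangGraph CLI format (agent names as the keys of its “graphs” map), the backend can expose that list with something as small as this sketch:

    import json
    from pathlib import Path

    def list_agents(config_path: str = "langgraph.json") -> list[str]:
        """Return the agent names defined in langgraph.json (the keys of the "graphs" map)."""
        config = json.loads(Path(config_path).read_text())
        return sorted(config.get("graphs", {}).keys())

    print(list_agents())  # e.g. ['llamabot', 'leo']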

    Adding front-end code to detect tool calls and format them as messages:

    Next step, getting LlamaBot to write these in as “artifacts”, similar to how Claude creates artifacts.

    I created a new folder structure “artifacts” that can house individual projects.

    From here, we can equip our agent to write directly to artifacts/<artifact_id>/page.html, artifacts/<artifact_id>/assets/script.js, and artifacts/<artifact_id>/assets/styles.css.
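
    A minimal sketch of what that tool call could look like (hypothetical, not the actual LlamaBot implementation):

    from pathlib import Path
    from langchain_core.tools import tool

    ARTIFACTS_DIR = Path("artifacts")  # assumed location of the artifacts folder

    @tool
    def write_artifact_file(artifact_id: str, relative_path: str, contents: str) -> str:
        """Write a file (e.g. page.html or assets/script.js) inside one artifact's folder."""
        target = ARTIFACTS_DIR / artifact_id / relative_path
        target.parent.mkdir(parents=True, exist_ok=True)  # create assets/ etc. as needed
        target.write_text(contents)
        return f"Wrote {len(contents)} characters to {target}"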

    We could also have a model.py file and a controller.py file that expose backend functionality for our front-end to interact with (maybe even giving it the ability to trigger additional agent flows and display the output!)

    One example of this would be a storybook generator side project I’ve worked on previously, which generates “chapters” of text, an associated audio recording of each chapter, and pictures to go along with it.

    A very fun project that lived in its own FastAPI application and used LangGraph. Once we have artifacts properly working, we could have LlamaBot recreate it as an artifact! (More to come on this.)

  • Giving our Coding Agent Playwright as a Tool Call

    Playwright is a powerful tool for our AI Agent. Playwright can launch a browser that can be controlled via Python code.

    Source

    It can do the following:

    1. Query HTML/CSS elements and interact with the browser dom.
    2. Take screenshots of the page it’s viewing.
    3. View JavaScript logs
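
    Here’s a minimal sketch of that kind of helper (assuming Playwright is installed and playwright install chromium has been run; this is illustrative, not the exact helper used later in this post):

    from playwright.sync_api import sync_playwright

    def screenshot_and_logs(url: str, out_path: str = "screenshot.png"):
        """Capture a page's HTML, its JavaScript console output, and a full-page screenshot."""
        console_logs = []
        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()
            page.on("console", lambda msg: console_logs.append(msg.text))  # JS console output
            page.goto(url, wait_until="networkidle")
            html = page.content()
            page.screenshot(path=out_path, full_page=True)
            browser.close()
        return html, console_logs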

    Why is giving Playwright to our coding agent useful?

    1. Our agent could use Playwright to take screenshots and recreate it as HTML/CSS.
    2. Our agent could inspect HTML page DOMs & styling.
    3. Our agent could dynamically test the code we’re building in real time, (as if it were a human).
      • It could detect if our page wasn’t loading, read JavaScript logs, and feed it back into the agent in a type of “self-healing” verification loop.
        • (This is what real web software developers do, they load the browser and test the app in real time as they’re building it out).

    Challenges with Directly Copying HTML/CSS Source

    While directly copying HTML and CSS from existing websites seems straightforward, it comes with challenges: these files can be extremely massive, and they are transpiled. Modern web pages, particularly those built with no-code platforms or advanced page builders (like Wix, SquareSpace, Elementor, etc.), typically produce large, complex, and heavily transpiled codebases: excessive CSS styles, many nested elements, and hugely bloated JavaScript files.

    This causes:

    Massive File Sizes: Transpiled code from visual builders is enormous, making it difficult for Large Language Models (LLMs) or agents to parse efficiently. Current LLMs have input token limits, restricting the amount of content they can understand and process at once.

    Editability: We’re not looking to infringe on copyrighted work, so we need to be able to make our own derivations, ideally using the LLM. But code copied from these tools often lacks readability and is challenging to edit or debug, making it difficult for the LLM to understand, let alone change effectively.

    Instead, a vision-first approach, along with passing in a pared-down version of the HTML structure, helps generate clean, understandable, and editable code, overcoming these direct-copy challenges effectively.

    Using AI Vision for Web Page Cloning is Now Possible Due to the Latest Advancements in AI Vision Reasoning Capabilities.

    AI models have gotten really good at visually reasoning about image information. Take OpenAI’s announcement back in December 2024, about O3’s breakthrough in visual reasoning tasks, such as the ARC AGI data-set.

    ARC AGI is an evaluation set for comparing the performance of AI systems. It was intended to be a set of questions and tasks that would be very challenging for AI models to solve, and its creators didn’t anticipate a solution appearing as rapidly as it did.

    See the announcement here:

    We want to test the models ability to learn new skills on the fly. … ARC AGI version 1 took 5 years to go from 0% [solved] to 5% [solved]. However today, O3 has scored a new state of the art score that we have verified. O3 was able to score 87.5%, human performance is comparable at 85% threshold, so being above this is a major milestone.

    Gregory Kamradt, president of the ARC foundation.

    Source: InfoQ

    Given these breakthroughs, an AI model like O3 should be able to reason about the image we give it and provide a very clear verbal representation of the webpage, which can then be passed to another LLM to create the HTML/CSS code.

    Our Approach to Cloning WebPages using AI, and our Agent Architecture:

    Here’s the video going over the implementation!

    06/04/2025, 7:59PM:

    Feeling a little discouraged! I made the decision to add the playwright screenshot in as a two-step tool-calling process. (two new additional tools).

    That means our agent has the following tool calls at its disposal:

    1. Write HTML
    2. Write CSS
    3. Write JavaScript
    4. get_screenshot_and_html_content_using_playwright
    5. clone_and_write_html_to_file

    There are two main problems happening right now.

    1. The LLM is correctly picking “get_screenshot_and_html_content_using_playwright” when I send in a prompt like

    Please clone this webpage: https://fieldrocket.us

    2. The LLM is not including image sources for some reason, even though the trimmed HTML that we get from Playwright does indeed have the image src tags included.

    Furthermore, our tracing is lame, because when we get into clone_and_write_html_to_file we aren’t using the langchain_openai SDK, so it’s not logging the LLM input & output in LangSmith (making it harder to observe & debug).

    But, roughly 30% of the time, it’s jumping straight from the get_screenshot tool call, into the write_html tool call, rather than going to the clone_and_write_html_to_file.

    It does make me wonder: what does this @tool decorator even do?

    Is the LLM just seeing the function name of the tool call, or is it also seeing the comment just below the method signature? In the LangChain academy course on LangGraph, Lance doesn’t specify. But he has the comment in there right below the signature, so I assumed the LLM could see it.

    According to this guide

    You must include a docstring which will serve as the function description and help LLM in understanding its use.

    Which is what I assumed, and how Lance appeared to present it in the LangChain academy course.

    One workaround that could work is collapsing the two separate tool calls into a single one. That way the LLM doesn’t have to make two right decisions in a row, just a single right decision.

    I bet that would solve this first problem.

    I now collapsed the two separate tools into one:

    @tool
    def get_screenshot_and_html_content_using_playwright(url: str) -> tuple[str, list[str]]:
        """
        Get the screenshot and HTML content of a webpage using Playwright. Then, generate the HTML as a clone, and save it to the file system. 
        """
        html_content, image_sources = asyncio.run(capture_page_and_img_src(url, "assets/screenshot-of-page-to-clone.png"))
    
        llm = ChatOpenAI(model="o3")
    
        # Getting the Base64 string
        base64_image = encode_image("assets/screenshot-of-page-to-clone.png")
    
        print(f"Making our call to o3 vision right now")
        
        response = llm.invoke(
            [
                SystemMessage(content="""
                    ### SYSTEM
    You are “Pixel-Perfect Front-End”, a senior web-platform engineer who specialises in
     * redesigning bloated, auto-generated pages into clean, semantic, WCAG-conformant HTML/CSS
     * matching the *visual* layout of the reference screenshot to within ±2 px for all major breakpoints
    
    When you reply you MUST:
    1. **Think step-by-step silently** (“internal reasoning”), then **output nothing but the final HTML inside a single fenced code block**.
    2. **Inline zero commentary** – the code block is the entire answer.
    3. Use **only system fonts** (font-stack: `Roboto, Arial, Helvetica, sans-serif`) and a single `<style>` block in the `<head>`.
    4. Avoid JavaScript unless explicitly asked; replicate all interactions with pure HTML/CSS where feasible.
    5. Preserve all outbound links exactly as provided in the RAW_HTML input.
    7. Ensure the layout is mobile-first responsive (Flexbox/Grid) and maintains the same visual hierarchy:  
       e.g) **header ➔ main (logo, search box, buttons, promo) ➔ footer**.
    
    ### USER CONTEXT
    You will receive two payloads:
    
    **SCREENSHOT** – a screenshot of the webpage.  
    **RAW_HTML** – the stripped, uglified DOM dump (may include redundant tags, hidden dialogs, etc.).
    
    ### TASK
    1. **Infer the essential visual / UX structure** of the page from SCREENSHOT.  
    2. **Cross-reference** with RAW_HTML only to copy:
       * anchor `href`s & visible anchor text
       * any aria-labels, alt text, or titles that improve accessibility.
    3. **Discard** every element not visible in the screenshot (menus, dialogs, split-tests, inline JS blobs).
    4. Re-create the page as a **single HTML document** following best practices described above.
    
    ### OUTPUT FORMAT
    Return one fenced code block starting with <!DOCTYPE html> and ending with </html>
    No extra markdown, no explanations, no leading or trailing whitespace outside the code block.
                     
                     Here is the trimmed down HTML:
                     {trimmed_html_content}
                """),
                HumanMessage(content=f"Here is the trimmed down HTML: {html_content}"),
                # Pass the screenshot as an image content block so the vision model can actually see it
                HumanMessage(content=[
                    {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{base64_image}"}}
                ])
            ]
        )
    
        breakpoint()
    
        with open("/Users/kodykendall/SoftEngineering/LLMPress/Simple/LlamaBotSimple/page.html", "w") as f:
            f.write(response.content)
        
        return "Cloned webpage written to file"

    Let’s try it now.

  • Part 18: LangGraph Pre-Built Components: An Easier Way to Build Agents

    Previously when building our coding agent, LlamaBot, we made a very simple Agent workflow, as described by this diagram here:

    This is an intuitive & simple implementation of a coding agent that can take a user’s initial message, decide if the user wants to write code or not, create a “design plan”, and then write code that gets saved to the file system in order to implement that plan.

    I like this approach because it’s simple, straightforward, and easy to understand.

    There’s a natural progression of thinking from first principles that arrives at this simple agent workflow. That thought process can be seen in real time, by watching videos 1-16 of building our coding agent from scratch, here: https://www.youtube.com/watch?v=AadSBNKglMM&list=PLjxwvRWwj8anN2aTUhX2P0oKc0ghnhXHQ&pp=gAQB

    We use LangGraph because it helps us build more reliable agents. One definition of an agent is allowing the LLM to decide the control flow of our application. By representing our application as a set of nodes and edges, we give the LLM autonomy to decide what code should be executed next.

    There’s a fatal flaw of our current implementation: the LLM has limited authority to decide the control flow of our application.

    Our current implementation sits on the far left of this curve: it’s essentially just a router.
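
    To make “just a router” concrete, here’s a minimal hedged sketch of that style of graph in LangGraph (hypothetical node names, not the actual LlamaBot code):

    from langgraph.graph import StateGraph, MessagesState, START, END

    # Two stand-in nodes for the real ones in our coding agent.
    def design_plan(state: MessagesState):
        return {"messages": [("ai", "Here is a design plan...")]}

    def respond(state: MessagesState):
        return {"messages": [("ai", "Answering without writing code.")]}

    # The "router": a function (or an LLM call) picks which node runs next.
    def route(state: MessagesState) -> str:
        last = state["messages"][-1].content.lower()
        return "design_plan" if "build" in last else "respond"

    builder = StateGraph(MessagesState)
    builder.add_node("design_plan", design_plan)
    builder.add_node("respond", respond)
    builder.add_conditional_edges(START, route, ["design_plan", "respond"])
    builder.add_edge("design_plan", END)
    builder.add_edge("respond", END)
    graph = builder.compile()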

    And this is great, because it’s very reliable and simple.

    BUT, if we want to build a more impressive and capable coding agent, we need to give the LLM more autonomy.

    Our current agent only writes a single file to our file-system right now, into page.html.

    This is simple, but limited.

    What if we wanted our agent to be able to write the styles.css or script.js into separate files in our file system?

    Under our current paradigm, we would need to add two separate nodes, two additional LLM prompts, and four additional edges to make this workflow work.

    New Agent Architecture (ReAct agent architecture)
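
    LangGraph ships a pre-built ReAct agent, so we don’t have to wire the tool-calling loop by hand. Here’s a minimal sketch (assuming langgraph and langchain-openai are installed; the real tools list would be our existing file-writing tools):

    from langgraph.prebuilt import create_react_agent
    from langchain_core.tools import tool
    from langchain_openai import ChatOpenAI

    @tool
    def write_file(path: str, contents: str) -> str:
        """Write contents to a file on disk."""
        with open(path, "w") as f:
            f.write(contents)
        return f"Wrote {path}"

    llm = ChatOpenAI(model="gpt-4.1")

    # One line replaces our hand-built router/plan/write graph with a ReAct loop:
    agent = create_react_agent(llm, [write_file])

    result = agent.invoke({"messages": [("user", "Create page.html with a hello world heading")]})
    print(result["messages"][-1].content)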