Test Driving Prompting Strategies

September 02, 2024

Natural Language to SQL is an area of continued research and one I’ve explored previously. Today, I will implement prompting techniques from two research papers. The first explores restructuring the prompt and providing supplementary SQL knowledge. The second uses few-shot Chain of Thought reasoning.

C3: Updated Question and Supplementary SQL Knowledge

In the C3: Zero-shot Text-to-SQL with ChatGPT paper, the authors introduce C3 prompting, by which they mean clear prompting, calibration of model bias, and consistent output. The goal of clear prompting is to make it easier for the LLM to parse the information we pass in via the prompt, clearly separating input data and the table structure. To calibrate against model bias, the authors found common mistakes the model made and provided example SQL queries that used preferred syntax. To ensure consistent output, the authors seed the model response with a SELECT at the end of the prompt and instruct the model to only respond with a valid SQL statement.
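
As a rough illustration of that layout (not the paper's exact template), a prompt along these lines could be assembled in Python. The build_c3_prompt helper, the schema string, and the calibration hints below are my own placeholders, not taken from the paper or my agent's code:

# Illustrative sketch of a C3-style prompt: clearly separated sections,
# calibration hints against known SQL mistakes, and a trailing SELECT
# that seeds the model's response.
def build_c3_prompt(table_schema: str, calibration_hints: str, question: str) -> str:
    return (
        "### Database schema\n"
        f"{table_schema}\n\n"
        "### SQL guidelines\n"
        f"{calibration_hints}\n\n"
        "### Question\n"
        f"{question}\n\n"
        "Respond with a single valid SQL statement and nothing else.\n"
        "SELECT"
    )

prompt = build_c3_prompt(
    table_schema="players(id, name)\nstats_by_game(player_id, season, season_type, passing_yards)",
    calibration_hints="Prefer explicit JOIN ... ON over implicit joins; qualify columns with table aliases.",
    question="How many total yards did Josh Allen pass for in 2023?",
)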

When comparing this approach with the basic agent I developed before exploring existing research, the only methodological improvement is the avoidance of common pitfalls. I was already passing in the table schema, albeit in a different form, and was instructing the LLM to only return a valid SQL statement. When comparing the performance of the two agents, there was no noticeable difference in the questions the agents were able to answer correctly. Recent research has shown that constraining the response format can negatively impact a model's reasoning ability, which may explain the lack of improvement I saw in the C3 agent.

Few-Shot Chain of Thought Prompting

The seminal Chain of Thought paper leverages few-shot prompting with Chain of Thought examples to help the LLM understand the type of answers it should provide. Few-shot prompting means giving the LLM multiple question/answer pairs to show it how to respond. A Chain of Thought question/answer pair walks the model through how to arrive at an answer. An example of Chain of Thought reasoning is given in the paper:

Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.
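
In prompt form, few-shot Chain of Thought amounts to concatenating several worked question/answer pairs ahead of the new question. A minimal sketch of that assembly in Python (the build_cot_prompt helper and its formatting are my own, not taken from the paper):

# Sketch: join worked Q/A pairs, then append the new question so the model
# continues the pattern with its own step-by-step reasoning.
def build_cot_prompt(examples: list[tuple[str, str]], question: str) -> str:
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{shots}\n\nQ: {question}\nA:"

examples = [
    (
        "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
        "Each can has 3 tennis balls. How many tennis balls does he have now?",
        "Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. "
        "5 + 6 = 11. The answer is 11.",
    ),
]
prompt = build_cot_prompt(examples, "How many total yards did Josh Allen pass for in 2023?")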

For the SQL agent, I added a handful of worked examples to give the model an idea of how to use the tables and their relationships. Here is one example I provided to the model:

Question: How many total yards did Josh Allen pass for in 2023?

Answer: To answer the question 'How many yards did Josh Allen pass for in 2023?', we can follow a step-by-step process.

First, let's identify the columns needed from the schema:

  1. season: We need this to select the 2023 season.
  2. player_id: We need this to filter by Josh Allen.
  3. passing_yards: This column gives us the total yards passed for each game.
  4. name: This will help us verify that we're filtering by the correct player.

Given these requirements, our query should be:

SELECT p.name as player_name, sum(s.passing_yards) as total_passing_yards
FROM stats_by_game s
JOIN players p on p.id = s.player_id
WHERE season_type='REG' and season=2023 and p.name='Josh Allen';

This query will return the total yards Josh Allen passed for in 2023. The season_type filter is included to ensure we're only considering regular-season games.
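
In the agent, each worked example like this can be supplied as a user/assistant message pair ahead of the real question, with the schema given once in the system message. A rough sketch of that structure, assuming an OpenAI-style chat message list (the schema string and variable names are illustrative, not my agent's actual code):

# Sketch: schema in the system message, worked examples as few-shot
# user/assistant pairs, then the real question at the end.
SCHEMA = "players(id, name)\nstats_by_game(player_id, season, season_type, passing_yards)"

FEW_SHOT = [
    {"role": "user", "content": "How many total yards did Josh Allen pass for in 2023?"},
    {
        "role": "assistant",
        "content": (
            "We need season, player_id, passing_yards, and name, then sum passing "
            "yards over regular-season games in 2023:\n"
            "SELECT p.name as player_name, sum(s.passing_yards) as total_passing_yards "
            "FROM stats_by_game s JOIN players p on p.id = s.player_id "
            "WHERE season_type='REG' and season=2023 and p.name='Josh Allen';"
        ),
    },
]

def build_messages(question: str) -> list[dict]:
    system = {"role": "system", "content": f"You write SQL queries for this schema:\n{SCHEMA}"}
    return [system, *FEW_SHOT, {"role": "user", "content": question}]

messages = build_messages("Which player had the most passing yards in 2022?")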

This agent performed markedly better than all previous agents and was able to correctly answer fairly complex questions such as "List all running backs who switched teams from 2022 to 2023, their previous and current team and their total fantasy points for each year", provided the question was phrased clearly.

[Figure: Highest Points in a Game]

The ability to answer questions at this level of complexity opens the door for fantasy football managers to find truly novel insights about player performance and value. In future iterations, I would attempt to increase the robustness of the agent, allowing it to handle poorly worded questions, domain-specific jargon, and misspellings of player names. If you’d like to view the code, it’s available here. Thanks for reading!