The Aim:-
I want to create an agent connected to a dataset. The user should be able to ask questions about the data, and the agent should return responses by running SQL-like analytical queries.
Specifically, I expect the agent to handle:
- Selection : Filtering Rows based on conditions across multiple columns
- Projection : Returning values from another column.
- Perform Aggregations : Like average, sum, min, max, count on filtered subsets
- Simple Computations : Like sum/ difference/ percentage if needed (i doubt i will need this a lot)
What I did:-
- Created a Default Generative Playbook and connected it with a datastore tool.
- For now, I only want it to handle one test question reliably. The question is “What are all the pitch results when the event is a strikeout on 26th April, 2024“
To answer the above question,I have defined the steps to the agent as follows :
- Filter rows by “
game_date" column name. (Do I need to tell it in which format the date data is in the datset? or will it be able to undersand the YYYY-MM-DD format on it’s own?) - Filter again by “
events" column for a specific event. - Return all corresponding values from the
descriptioncolumn.
Constraints I Set :
-
The agent should never guess answers; all outputs must come directly from the data.
-
The agent is allowed to use aggregation if the question requires it.
Datastore Tool Setup
This playbook is connected to structured dataset datastore that I have fetched from bigquery.
I configured the datastore so only the relevant columns are enabled with the right properties:
-
description(key property): Searchable, Indexable, Retrievable -
events: Searchable, Indexable, Retrievable, Dynamic Facetable -
game_date: Indexable, Retrievable, Dynamic Facetable -
All other columns have Searchable, Indexable, Retrievable, Dynamic Facetable turned off.
Problems
-
When I ask the one test question to the agent, it says that I can’t do aggregations on the dataset. It is however able to access the differnt columns. in the dataset.
-
While uploading the dataset to bigquery, I set it to detect the schema of the dataset automatically. It automatically defines one of the columns called “description“ as the key property. Now, to test if the agent is able to see the dataset, I asked it a simple question. “Give me all the unique entries “xyz“ column name. “. The agent is able to fetch some (or sometimes all) the unique entries under a specific column name but it can’t give me any values from the description column.
So my question is why does auto schema detection set description as the key value and then second, is that the reason why I am unable to fetch values from the description column? If so, How do I fix it?
- The agent is connected to structured dataset from bigquery. why is it that, when I ask it the question “Give me all the unique entries “xyz“ column name. “, for the events column it would give all of the unique entries under that column name and then in a different test of the same agent, it is not able to fetch all the entries from the events column, it only returns two or three.