I’m experimenting with the OpenAI Agents SDK (Python) and ran into unexpected behavior with parallel_tool_calls.
According to the docs, setting parallel_tool_calls=False should ensure that the model makes at most one tool call per turn. However, I’m still seeing multiple tool calls returned in the assistant response.
My Code:
import asyncio

from openai import AsyncOpenAI
from agents import Agent, ModelSettings, OpenAIChatCompletionsModel, Runner

client = AsyncOpenAI(
    api_key=GEMINI_API_KEY,
    base_url=BASE_URL,
)
model = OpenAIChatCompletionsModel(
    openai_client=client,
    model="gemini-2.0-flash",  # using the OpenAI SDK against Gemini's OpenAI-compatible endpoint
)

# Omitted some code for brevity (UserInfo, user_info, and the two tool definitions)

async def main():
    agent = Agent[UserInfo](
        name="Assistant",
        model=model,
        tools=[fetch_user_uid, fetch_weather],  # provided two tools
        model_settings=ModelSettings(parallel_tool_calls=False),  # setting `parallel_tool_calls` to False
    )
    result = await Runner.run(
        starting_agent=agent,
        input="what's user uid and what's the weather in karachi?",  # a question that plausibly needs both tools
        context=user_info,
    )
    print(result.final_output)

if __name__ == "__main__":
    asyncio.run(main())
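For anyone reproducing this: the "LLM resp:" dump below is the kind of output the SDK's built-in debug logging prints. A minimal sketch of enabling it, using the SDK's documented enable_verbose_stdout_logging helper, called before Runner.run():

from agents import enable_verbose_stdout_logging

# Print the SDK's debug logs (outgoing LLM requests and raw responses)
# to stdout; useful for checking whether parallel_tool_calls actually
# made it into the request payload.
enable_verbose_stdout_logging()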
Logs:
...
LLM resp:
{
    "content": null,
    "refusal": null,
    "role": "assistant",
    "annotations": null,
    "audio": null,
    "function_call": null,
    "tool_calls": [  # Model is asking to call both tools even when `parallel_tool_calls` is set to `False`
        {
            "id": "",
            "function": {
                "arguments": "{}",
                "name": "fetch_user_uid"
            },
            "type": "function"
        },
        {
            "id": "",
            "function": {
                "arguments": "{\"city\":\"karachi\"}",
                "name": "fetch_weather"
            },
            "type": "function"
        }
    ]
}
...
Expected results:
The model makes at most one tool call per turn.
Actual results:
The model returns both tool calls even though parallel_tool_calls is set to False.
With parallel_tool_calls the model may request all tools at the same time - to make it faster. Without it, the model may call the next tool only after getting the result from the previous one - which lets you use the result of one tool as input for the next. The parallel_tool_calls parameter goes directly into the chat.completions.create() request, and the OpenAI docs state that with it disabled the model will not return multiple tools to call.
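Since the parameter is forwarded to chat.completions.create(), a direct Chat Completions call can show whether the Gemini OpenAI-compatibility endpoint honors it at all. A minimal sketch, assuming the same GEMINI_API_KEY and BASE_URL as in the question; the tool schemas are hand-written stand-ins for the ones the SDK generates:

import asyncio
from openai import AsyncOpenAI

async def check():
    client = AsyncOpenAI(api_key=GEMINI_API_KEY, base_url=BASE_URL)
    resp = await client.chat.completions.create(
        model="gemini-2.0-flash",
        messages=[{"role": "user",
                   "content": "what's user uid and what's the weather in karachi?"}],
        tools=[
            {"type": "function",
             "function": {"name": "fetch_user_uid",
                          "parameters": {"type": "object", "properties": {}}}},
            {"type": "function",
             "function": {"name": "fetch_weather",
                          "parameters": {"type": "object",
                                         "properties": {"city": {"type": "string"}},
                                         "required": ["city"]}}},
        ],
        parallel_tool_calls=False,  # passed verbatim to the endpoint
    )
    # Two entries here would suggest the endpoint ignores the parameter,
    # rather than the Agents SDK dropping it.
    print(resp.choices[0].message.tool_calls)

asyncio.run(check())

If this still returns two tool calls, the limitation would be on the Gemini compatibility layer's side rather than in the Agents SDK.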