
I am new to Python and I am trying to run multiple PubMed searches and save the number of results for each one in a CSV file. The code I have right now will not run unless I remove my for loop. I would like the code to run through a column of terms ("Term") provided in a CSV file, but I don't know where to place the for loop, and I don't know how to set the variable the loop should iterate over. Here is what I have that produces the KeyError:

import requests
import time
import pandas as pd

def get_pubmed_results_count(search_terms, delay=1):
    base_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    results = {}

    for term in search_terms:
        # Define parameters for the API request
        params = {
            "db": "pubmed",
            "term": term,
            "retmode": "json"
        }

        try:
            # Make the request to the PubMed API
            response = requests.get(base_url, params=params)
            response.raise_for_status()

            # Parse the response
            data = response.json()
            count = data['esearchresult']['count']
            results[term] = count

        except requests.exceptions.RequestException as e:
            print(f"Error retrieving data for term '{term}': {e}")
            results[term] = None

        # Respectful delay between requests
        time.sleep(delay)

    return results

# Example usage

df_searchterms = pd.read_csv('search1.csv')
print(df_searchterms)

if __name__ == "__main__":
    for index, row in df_searchterms.iterrows():
        search_terms = (row['Term']) 
        result_counts = get_pubmed_results_count(search_terms)
    
    for term, count in result_counts.items():
       df_results = pd.DataFrame(result_counts.items(), columns=['term','count'])
       print (df_results)
       df_results.to_csv('TestRestults1.csv', index=False)

And here is what my search terms data frame looks like:

 Term
0   APOE AND Alzheimer's
1  PSEN1 AND Alzheimer's
2  PSEN2 AND Alzheimer's
3    APP AND Alzheimer's
4    CLU AND Alzheimer's

My question is: how do I get this to run without the KeyError?

  • Don't comment out the code that's getting the error. Post the problem code, and include the full traceback of the error. Commented Jul 3, 2024 at 19:02
  • Is the code after the for loop supposed to be inside the loop? If not, search_terms will just be the value from the last row of the df. Commented Jul 3, 2024 at 19:03
  • Are you trying to make a list of all the search terms in the df? You're just setting it to the value from one row, not a list of all rows. Commented Jul 3, 2024 at 19:04
  • If you want a list of all the terms, use search_terms = list(df_searchterms['Term']). You don't need a for loop. Commented Jul 3, 2024 at 19:05
  • @Barmar "Is the code after the for loop supposed to be inside the loop?" - There's more than one for loop, so to clarify, you're talking about for index, row, right? I would say the same thing. Commented Jul 3, 2024 at 19:12
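Following the suggestion in the comments, here is a minimal sketch that reads the whole "Term" column as a list, calls the search once per term, and writes the CSV once after the loop instead of on every pass. It assumes search1.csv has a column named exactly Term. The request itself (ESearch URL, parameters, and the esearchresult/count path) is taken from the question; the helper names fetch_pubmed_count, count_terms and main, the count_fn hook (added so the loop can be tried without network access), and the 30-second timeout are hypothetical choices for illustration.

```python
import time

import pandas as pd

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def fetch_pubmed_count(term):
    """Return PubMed's hit count for one term via ESearch (same request as the question)."""
    import requests  # imported lazily so count_terms can be tried without network access

    params = {"db": "pubmed", "term": term, "retmode": "json"}
    response = requests.get(ESEARCH_URL, params=params, timeout=30)
    response.raise_for_status()
    return int(response.json()["esearchresult"]["count"])


def count_terms(search_terms, count_fn=fetch_pubmed_count, delay=1):
    """Call count_fn once per term; search_terms must be a list of strings, not one string."""
    rows = []
    for term in search_terms:
        try:
            count = count_fn(term)
        except Exception as e:  # broad catch for the sketch: log the failure and keep going
            print(f"Error retrieving data for term '{term}': {e}")
            count = None
        rows.append({"term": term, "count": count})
        time.sleep(delay)  # respectful delay between requests
    return pd.DataFrame(rows, columns=["term", "count"])


def main(csv_in="search1.csv", csv_out="TestRestults1.csv",
         count_fn=fetch_pubmed_count, delay=1):
    """Read the whole 'Term' column once, count every term, then write one CSV."""
    df_searchterms = pd.read_csv(csv_in)
    search_terms = df_searchterms["Term"].tolist()  # all rows, not just the last one
    df_results = count_terms(search_terms, count_fn=count_fn, delay=delay)
    df_results.to_csv(csv_out, index=False)
    return df_results
```

Calling main() from an if __name__ == "__main__": block would run it against search1.csv. Writing the file once, after every term has been counted, avoids rewriting the CSV on each iteration.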

1 Answer


I notice that your variable search_terms is just a single string (the value from one row), not a list...

If you want a list of the entire column, you need to do:

search_terms = df_searchterms['Term'].tolist()
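To show why a single string breaks the loop inside get_pubmed_results_count, here is a short sketch comparing what the for loop would iterate over in each case. The two-row DataFrame is a hypothetical stand-in for search1.csv, built from the printout in the question.

```python
import pandas as pd

# Hypothetical stand-in for search1.csv, using two rows from the question's printout
df_searchterms = pd.DataFrame({"Term": ["APOE AND Alzheimer's", "PSEN1 AND Alzheimer's"]})

one_term = (df_searchterms.loc[0, "Term"])  # parentheses alone do not make a tuple
print(type(one_term).__name__)  # str
print(list(one_term)[:4])  # iterating a str yields characters: ['A', 'P', 'O', 'E']

all_terms = df_searchterms["Term"].tolist()  # the whole column as a list of strings
print(type(all_terms).__name__)  # list
print(all_terms)  # ["APOE AND Alzheimer's", "PSEN1 AND Alzheimer's"]
```

Passed a string, the loop would send one request per character; passed the list, it sends one request per search term.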

I can't run your code myself because you didn't provide the "search1.csv" file, but you can add a counter and loop over the entire list.

If the for loop still executes only once, please try:

for i in range(len(search_terms)):
    # Define parameters for the API request
    params = {
        "db": "pubmed",
        "term": search_terms[i],
        "retmode": "json"
    }
    # ...the rest of the loop body stays the same, using search_terms[i] in place of term
Or give us an example of your "search1.csv".
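The full traceback was never posted, so which key is missing is unknown. If the KeyError names 'Term', one thing worth checking (an assumption, not something the thread confirms) is whether the CSV header matches exactly; pandas keeps stray spaces in header names. The CSV text below is hypothetical, written with a trailing space after "Term" to illustrate the check.

```python
import io

import pandas as pd

# Hypothetical CSV whose header has a trailing space after "Term"
csv_text = "Term \nAPOE AND Alzheimer's\nCLU AND Alzheimer's\n"
df_searchterms = pd.read_csv(io.StringIO(csv_text))

print(df_searchterms.columns.tolist())  # ['Term '], so row['Term'] would raise KeyError
df_searchterms.columns = df_searchterms.columns.str.strip()  # normalise header names
print(df_searchterms.columns.tolist())  # ['Term']
```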

Edit: It took me so long to write a good answer that the commenters already said what I was thinking, and I need more reputation points.


1 Comment

Hello, yes, I did include the info from search1.csv.
