I've been querying a few API's with Python to individually create CSV's for a table.
I would like to try and instead of recreating the table each time, update the existing table with any new API data.
At the moment the way the Query is working, I have a table that looks like this,
From this I am taking the suburbs of each state and copying them into a csv for each different state.
Then using this script I am cleaning them into a list (the api needs the %20 for any spaces),
"%20"
#suburbs = ["want this", "want this (meh)", "this as well (nope)"]
suburb_cleaned = []
#dont_want = frozenset( ["(meh)", "(nope)"] )
for urb in suburbs:
cleaned_name = []
name_parts = urb.split()
for part in name_parts:
if part in dont_want:
continue
cleaned_name.append(part)
suburb_cleaned.append('%20'.join(cleaned_name))
Then taking the suburbs for each state and putting them into this API to return a csv,
timestr = time.strftime("%Y%m%d-%H%M%S")
Name = "price_data_NT"+timestr+".csv"
url_price = "http://mwap.com/api"
string = 'gxg&state='
api_results = {}
n = 0
y = 2
for urbs in suburb_cleaned:
url = url_price + urbs + string + "NT"
print(url)
print(urbs)
request = requests.get(url)
api_results[urbs] = pd.DataFrame(request.json())
n = n+1
if n == y:
dfs = pd.concat(api_results).reset_index(level=1, drop=True).rename_axis(
'key').reset_index().set_index(['key'])
dfs.to_csv(Name, sep='\t', encoding='utf-8')
y = y+2
continue
print("made it through"+urbs)
# print(request.json())
# print(api_results)
dfs = pd.concat(api_results).reset_index(level=1, drop=True).rename_axis(
'key').reset_index().set_index(['key'])
dfs.to_csv(Name, sep='\t', encoding='utf-8')
Then adding the states manually in excel, and combining and cleaning the suburb names.
# use pd.concat
df = pd.concat([act, vic,nsw,SA,QLD,WA]).reset_index().set_index(['key']).rename_axis('suburb').reset_index().set_index(['state'])
# apply lambda to clean the %20
f = lambda s: s.replace('%20', ' ')
df['suburb'] = df['suburb'].apply(f)
and then finally inserting it into a db
engine = create_engine('mysql://username:password@localhost/dbname')
with engine.connect() as conn, conn.begin():
df.to_sql('Price_historic', conn, if_exists='replace',index=False)
Leading this this sort of output
Now, this is a hek of a process. I would love to simplify it and make the database only update the values that are needed from the API, and not have this much complexity in getting the data.
Would love some helpful tips on achieving this goal - I'm thinking I could do an update on the mysql database instead of insert or something? and with the querying of the API, I feel like I'm overcomplicating it.
Thanks!

