For context, I have a dataset made up of the USA's states and territories (let's call it USA_ALL). From it I have made a new data frame containing only the 50 states, excluding territories (let's call it States_Only). That part is complete. However, USA_ALL has NY and NYC as independent rows, meaning that the values attributed to NY do not already include NYC's recorded data. Because both data frames originate from the same data set, their columns match. All values are either NaN/null or integers. For my States_Only data to be complete, the NYC values from USA_ALL need to be added to the NY row in States_Only. How can I achieve this? For clarity, I do not want to append NYC as its own row, and I do not see how to use groupby(), because nothing on the software side ties these two rows together (such as a shared identifier); there is only the knowledge that NYC is within NY.
import requests
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
if __name__ == '__main__':
    #data prep
    data_path = './assets/'
    out_path = './output'
    #make sure the output folder exists before writing to it
    os.makedirs(out_path, exist_ok=True)
    #fetch the map data that the page's JavaScript loads (response parsed as JSON)
    endpoint = "https://covid.cdc.gov/covid-data-tracker/COVIDData/getAjaxData"
    data = requests.get(endpoint, params={"id": "US_MAP_DATA"}).json()
    #convert to df and export raw data as csv
    df = pd.DataFrame(data["US_MAP_DATA"])
    path = os.path.join(out_path, 'Raw_CDC_Data.csv')
    df.to_csv(path)
    #Remove last data point (Total USA)
    df.drop(df.tail(1).index, inplace=True)
    #Create DF of just the states (this list also includes DC)
    state_abbr = ["AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DC", "DE", "FL", "GA",
                  "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD",
                  "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ",
                  "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC",
                  "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"]
    states = df[df['abbr'].isin(state_abbr)]
    # Add NYC from df to NY's existing values (sum of each column) to states
Here is an Excel spreadsheet showing the expected final values in the States_Only dataset. I am including it as a link because this data would be hard to read if formatted directly on this forum: Expected Values
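For reference, one possible way to do the "add NYC to NY" step is sketched below. This is only a minimal sketch under stated assumptions, not a tested solution against the live CDC data: it uses a small toy frame instead of the real response, the column names `tot_cases` and `tot_death` are hypothetical stand-ins (only `abbr` comes from the code above), and it assumes the value columns have a numeric dtype. The idea is to sum the NYC row over the numeric columns, then add that to the NY row with `fill_value=0` so that a NaN on one side does not wipe out a real number on the other.

```python
import numpy as np
import pandas as pd

# Toy stand-in for USA_ALL. Only 'abbr' is taken from the question;
# 'tot_cases' and 'tot_death' are hypothetical column names.
usa_all = pd.DataFrame({
    "abbr": ["NJ", "NY", "NYC", "PR"],
    "tot_cases": [10, 100, 50, 7],
    "tot_death": [1, np.nan, 5, 2],
})

# Toy stand-in for States_Only (a copy, so editing it leaves usa_all alone).
states_only = usa_all[usa_all["abbr"].isin(["NJ", "NY"])].copy()

# Only numeric columns can be summed; 'abbr' and other text columns are skipped.
num_cols = usa_all.select_dtypes("number").columns

# NYC values per column; min_count=1 keeps an all-NaN column as NaN instead of 0.
nyc_totals = usa_all.loc[usa_all["abbr"] == "NYC", num_cols].sum(min_count=1)

# Current NY values in the states-only frame.
ny_mask = states_only["abbr"] == "NY"
ny_totals = states_only.loc[ny_mask, num_cols].iloc[0]

# Element-wise NY + NYC. With fill_value=0, a NaN on one side counts as 0;
# the result is NaN only where both sides are NaN.
combined = ny_totals.add(nyc_totals, fill_value=0)

# Write the combined values back into the NY row only.
states_only.loc[ny_mask, num_cols] = combined.to_numpy()

print(states_only)
```

In the script above, `usa_all` would correspond to `df` and `states_only` to `states` (taken with `.copy()` so that the assignment does not trigger a chained-assignment warning). If the real columns come back as `object` dtype, they would need converting first, for example with `pd.to_numeric`.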