I have a DataFrame (data) with mixed data types (outliers were previously removed and marked with the string 'Outlier'). I want to summarize this data in a new DataFrame (analysis), but I'm running into problems with the data types.
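Because of the 'Outlier' marker, any numeric column that still contains it is stored by pandas as object dtype, and numeric-only reductions then skip it along with the descriptor columns. A minimal sketch, assuming the marker appears in a hypothetical Q1 column, converts it to NaN with pd.to_numeric(errors="coerce"). Apply this only to columns that should be numeric, because coercing a descriptor column such as ORG would turn every value into NaN.

```python
import pandas as pd

# Hypothetical example: Q1 still holds the 'Outlier' marker in one row,
# so pandas stores the whole column as object dtype
df = pd.DataFrame({
    "ORG": ["USA", "CAN", "UK"],
    "Q1": [4.75806451612903, "Outlier", 4.13617021276596],
})
print(df.dtypes)

# Coerce anything that is not a number (here, 'Outlier') to NaN;
# only do this for columns that are meant to be numeric
df["Q1"] = pd.to_numeric(df["Q1"], errors="coerce")
print(df.dtypes)
print(df["Q1"].mean())  # NaN is skipped, so this averages the two remaining values
```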
The problem is that some of the columns are descriptors (categories, names, countries, etc.), so they are not included in the numeric lists (mean, med, sd). This leaves the lists with different lengths (len(title) = 64, len(mean) = 61).
I'd like the summary DataFrame to have all 64 rows, with the descriptor columns shown as NaN in numeric fields such as the mean (since you can't take the mean of ['Blue','Red','Yellow']).
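The length mismatch arises because a numeric-only reduction such as data.mean(numeric_only=True) (and likewise median and std) returns a Series indexed only by the numeric column names; converting it to a list throws those labels away. A minimal sketch on a small hypothetical frame shows that keeping the Series and reindexing it on data.columns restores one entry per column, with NaN for the descriptor column.

```python
import pandas as pd

# Small hypothetical frame: one descriptor column, two numeric columns
data = pd.DataFrame({
    "ORG": ["USA", "CAN", "UK"],
    "YEAR": [2018, 2017, 2018],
    "Responses": [64, 247, 236],
})

means = data.mean(numeric_only=True)   # Series indexed by the numeric column names only
print(len(data.columns), len(means))   # 3 vs 2: ORG is skipped

# Reindexing on the full column list gives one entry per column, NaN for ORG
means_all = means.reindex(data.columns)
print(means_all)
```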
Sample Data:
ORG|PROGRAM|YEAR|INDUSTRY|Responses|# of Questions|New Zone|Q1|Q2
USA|MO|2018|PRD - LF|64|44|High|4.75806451612903|4.70967741935484
CAN|ALB|2017|FS - B|247|43|Medium|4.61382113821138|4.66803278688525
UK|IRE|2018|RES - U|236|46|Low|4.13617021276596|4.30932203389831
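To make the sample reproducible, the pipe-delimited rows above can be loaded with pd.read_csv and sep="|". This is a sketch, assuming the data arrives as text in exactly this format; pandas infers a dtype per column, so ORG, PROGRAM, INDUSTRY and New Zone come out non-numeric.

```python
import io

import pandas as pd

sample = """ORG|PROGRAM|YEAR|INDUSTRY|Responses|# of Questions|New Zone|Q1|Q2
USA|MO|2018|PRD - LF|64|44|High|4.75806451612903|4.70967741935484
CAN|ALB|2017|FS - B|247|43|Medium|4.61382113821138|4.66803278688525
UK|IRE|2018|RES - U|236|46|Low|4.13617021276596|4.30932203389831"""

# sep="|" splits on the pipe character; dtypes are inferred column by column
data = pd.read_csv(io.StringIO(sample), sep="|")
print(data.dtypes)
```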
Code:
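import pandas as pd

# data is the DataFrame described above
title = list(data.columns)                    # all 64 column names
n = list(data.count())                        # non-null count for every column (64 values)
mean = list(data.mean(numeric_only=True))     # numeric columns only (61 values)
med = list(data.median(numeric_only=True))
sd = list(data.std(numeric_only=True))
analysis = pd.DataFrame({'Mean': mean, 'Median': med, 'SD': sd})
print(analysis)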
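One way to get every row is to keep each statistic as a pandas Series instead of a list, and build analysis from a dict of Series with index set to the full column list. pandas then aligns each Series on that index and fills NaN wherever a column has no numeric value. This is a minimal sketch on a four-column subset of the sample data (the subset is only for brevity):

```python
import pandas as pd

# Four-column subset of the sample data
data = pd.DataFrame({
    "ORG": ["USA", "CAN", "UK"],
    "YEAR": [2018, 2017, 2018],
    "New Zone": ["High", "Medium", "Low"],
    "Q1": [4.75806451612903, 4.61382113821138, 4.13617021276596],
})

# Each value is a Series indexed by column name; index=data.columns
# reindexes all of them to every column, so descriptors get NaN
analysis = pd.DataFrame(
    {
        "Count": data.count(),
        "Mean": data.mean(numeric_only=True),
        "Median": data.median(numeric_only=True),
        "SD": data.std(numeric_only=True),
    },
    index=data.columns,
)
print(analysis)
```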
Current Output:
Desired Output: the additional rows should be present, with NaN in the numeric fields when a column has no numeric values (e.g. a category or country). That would bring the row count to 64 instead of 61 and let me add the other columns (Title, count, etc.).
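For the extra columns (Title, count, etc.), an alternative is DataFrame.describe(include="all"), which summarizes every column and leaves the numeric statistics as NaN for non-numeric ones. Its "50%" row is the median, and its "std" row is the same sample standard deviation that DataFrame.std returns. This is a sketch, assuming Title, Count, Mean, Median and SD are the output column names you want:

```python
import pandas as pd

# Four-column subset of the sample data
data = pd.DataFrame({
    "ORG": ["USA", "CAN", "UK"],
    "YEAR": [2018, 2017, 2018],
    "New Zone": ["High", "Medium", "Low"],
    "Q1": [4.75806451612903, 4.61382113821138, 4.13617021276596],
})

# describe(include="all") covers every column; .T puts one column per row
summary = data.describe(include="all").T
analysis = (
    summary[["count", "mean", "50%", "std"]]
    .astype(float)  # describe on mixed data returns object dtype
    .rename(columns={"count": "Count", "mean": "Mean", "50%": "Median", "std": "SD"})
    .rename_axis("Title")  # name the index so reset_index creates a Title column
    .reset_index()
)
print(analysis)
```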



Comment: ...data.head(), so we have a MCVE (stackoverflow.com/help/mcve)
Comment: analysis = pd.DataFrame(list-of-the-series, index=the_index_you_want)
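The pd.DataFrame(list-of-the-series, index=...) suggestion can be read as building analysis from a list of Series, one per statistic, and then transposing so each original column becomes a row. This is a minimal sketch of that reading on a hypothetical four-column subset; the final reindex pins the rows to the original column order.

```python
import pandas as pd

# Four-column subset of the sample data
data = pd.DataFrame({
    "ORG": ["USA", "CAN", "UK"],
    "YEAR": [2018, 2017, 2018],
    "New Zone": ["High", "Medium", "Low"],
    "Q1": [4.75806451612903, 4.61382113821138, 4.13617021276596],
})

# One Series per statistic; each becomes a row labelled by index=...
stats = [
    data.count(),
    data.mean(numeric_only=True),
    data.median(numeric_only=True),
    data.std(numeric_only=True),
]
wide = pd.DataFrame(stats, index=["Count", "Mean", "Median", "SD"])

# Transpose so columns become rows, then restore the original column order
analysis = wide.T.reindex(data.columns)
print(analysis)
```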