I am indexing data from a pandas DataFrame into Elasticsearch. I have null_value set for some ES fields but not others. When turning each row into a dict, how do I drop the null cells for columns whose field has no null_value, but keep the null cells (as None) for columns whose field does?
es mapping:
"properties": {
"sa_start_date": {"type": "date", "null_value": "1970-01-01T00:00:00+00:00"},
"location_name": {"type": "text"},
code:
import numpy as np
import pandas as pd

cols_with_null_value = ['sa_start_date']
orig = [{
    'meter_id': 'M1',
    'sa_start_date': '',
    'location_name': ''
}, {
    'meter_id': 'M1',
    'sa_start_date': '',
    'location_name': 'a'
}]
df = pd.DataFrame.from_dict(orig)
df['sa_start_date'] = df['sa_start_date'].apply(pd.to_datetime, utc=True, errors='coerce')
df.replace({'': np.nan}, inplace=True)
df:
meter_id sa_start_date location_name
0 M1 NaT NaN
1 M1 NaT a
dicts needed for elasticsearch index:
{"meter_id": M1, "sa_start_date": None}
{"meter_id": M1, "sa_start_date": None, "location_name": "a"}
Note that the location_name cells with NaN are dropped from the dicts entirely, while the sa_start_date cells with NaT are kept as None (so ES applies the null_value). I've tried many things, each more ridiculous than the last, and have nothing worth showing. Any ideas appreciated!
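To state the rule explicitly: keep a cell if it has a value, or if its column is in cols_with_null_value (emitting None so ES substitutes the null_value); drop it otherwise. As a predicate, roughly (untested):

keep_cell = lambda col, val: pd.notna(val) or col in cols_with_null_value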
Tried this, but the Nones get dropped along with the NaNs:
df[cols_with_null_value] = df[cols_with_null_value].replace({np.nan: None})
df:
meter_id sa_start_date location_name
0 M1 None NaN
1 M1 None a
for _, ser in df.iterrows():
    ser = ser.dropna()  # drops the Nones as well as the NaNs
    lc = dict(ser)
lc: {'meter_id': 'M1'}
lc: {'meter_id': 'M1', 'location_name': 'a'}
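So I'm guessing I need to skip the blanket dropna and filter per cell instead, something like this untested sketch (using cols_with_null_value from above), but it feels clunky and I suspect there's a cleaner pandas way:

docs = []
for _, ser in df.iterrows():
    # keep a cell if it has a value, or if its column has an ES null_value (emit None)
    doc = {col: (None if pd.isna(val) else val)
           for col, val in ser.items()
           if pd.notna(val) or col in cols_with_null_value}
    docs.append(doc)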