
I am indexing data from a pandas DataFrame into Elasticsearch. I have null_value set for some ES fields, but not others. How do I drop the columns without null_value, but keep the ones that have it (setting their value to None)?

es mapping:

    "properties": {
        "sa_start_date": {"type": "date", "null_value": "1970-01-01T00:00:00+00:00"},
        "location_name": {"type": "text"},

code:

import numpy as np
import pandas as pd

cols_with_null_value = ['sa_start_date']
orig = [{
    'meter_id': 'M1',
    'sa_start_date': '',
    'location_name': ''
}, {
    'meter_id': 'M1',
    'sa_start_date': '',
    'location_name': 'a'
}]
df = pd.DataFrame.from_dict(orig)

# '' coerces to NaT in the date column; remaining '' become NaN
df['sa_start_date'] = pd.to_datetime(df['sa_start_date'], utc=True, errors='coerce')
df.replace({'': np.nan}, inplace=True)
df:
   meter_id sa_start_date location_name
0       M1           NaT           NaN
1       M1           NaT             a

dicts needed for elasticsearch index:

{"meter_id": M1, "sa_start_date": None}
{"meter_id": M1, "sa_start_date": None, "location_name": "a"}

Note that location_name cells with NaN are not indexed, but sa_start_date cells with NaT are. I've tried many things, each more ridiculous than the last, and have nothing worth showing. Any ideas appreciated!

Tried this, but the Nones are dropped along with the NaNs (dropna() treats None as missing, just like NaN/NaT):

df[cols_with_null_value] = df[cols_with_null_value].replace({np.nan: None})

df:
   meter_id sa_start_date location_name
0       M1          None           NaN
1       M1          None             a

for row in df.iterrows():
    ser = row[1]
    ser.dropna(inplace=True)

    lc = dict(ser)

lc: {'meter_id': 'M1'}
lc: {'meter_id': 'M1', 'location_name': 'a'}

1 Answer

Don't use .dropna() here. It will drop either entire rows or entire columns, and you want to keep everything except the empty location names.

You can do this in the following way:

df.replace({'': None}, inplace=True)  # replace '' with None instead of np.nan

for idx, row in df.iterrows():
    lc = {k: v for k, v in row.items() if not (k == 'location_name' and v is None)}
    print(lc)

Result:

{'meter_id': 'M1', 'sa_start_date': None}
{'meter_id': 'M1', 'sa_start_date': None, 'location_name': 'a'}
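
If more than one column lacks null_value, the same dict comprehension can be driven by the question's cols_with_null_value list instead of a hard-coded column name. A minimal sketch (pd.isna() catches None, NaN, and NaT alike, so it also works on a DataFrame whose dates were already coerced):

for idx, row in df.iterrows():
    # keep a field if it has a value, or if its mapping defines null_value;
    # missing values in the kept null_value columns are sent as None
    lc = {k: (None if pd.isna(v) else v)
          for k, v in row.items()
          if k in cols_with_null_value or not pd.isna(v)}
    print(lc)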

3 Comments

Replacing NaN with None has the side effect that all data types of the DataFrame become 'object'. Any solutions for this?
@Chiel: what downstream issue(s) does that cause? I'm asking because any possible solution might depend on what else you want to do with the DataFrame.
I have a DataFrame with various types: int64, float64, and datetime64. Whenever I use df = df.replace({np.nan: None}), it works in the sense that it properly replaces NaN, NaT, and NA with None. However, all the different datatypes in the df change to object. Sending all these objects to Elastic using eland.pandas_to_eland() still works, but since all the df columns are of type object, everything in Elastic becomes keyword. As a workaround I'm now using the es_type_overrides parameter of pandas_to_eland() to override the types, but I'm wondering if there is a cleaner solution.
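
A dtype-preserving alternative is to skip the replace entirely and map missing values to None only while generating the documents, e.g. when feeding the bulk helper of the plain elasticsearch client. This is a sketch, not eland-specific; the index name 'meters' and the local cluster URL are assumptions:

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch('http://localhost:9200')  # assumed local cluster

def actions(df, index_name, cols_with_null_value):
    # df is never mutated, so its dtypes stay intact
    for _, row in df.iterrows():
        doc = {k: (None if pd.isna(v) else v)
               for k, v in row.items()
               if k in cols_with_null_value or not pd.isna(v)}
        yield {'_index': index_name, '_source': doc}

helpers.bulk(es, actions(df, 'meters', cols_with_null_value))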
