I'm working with a dataframe that contains multiple columns. My goal is to create one extra column that contains a list of the values from those columns, and then explode the dataframe on that new column.
This is the original dataset:
id day_a1 day_a2 ... day_a6
13804 002n 25.0 25.0 ... 25.0
30842 002c 30.0 30.0 ... 30.0
1624 002k 25.0 NaN ... 25.0
8959 002j 25.0 25.0 ... 25.0
21216 003t 25.0 25.0 ... 25.0
I use df['vector'] = df[['day_a1','day_a2','day_a3','day_a4','day_a5','day_a6']].astype(str).apply(lambda x: ','.join(x), axis=1) to create this extra column, which should be a list of all the values in the day columns from 1 to 6.
print(df['vector']) returns the following output:
13804 25.0,25.0,24.0,25.0,25.0,25.0
30842 30.0,30.0,31.0,28.0,31.0,30.0
1624 25.0,nan,nan,nan,nan,25.0
8959 25.0,25.0,25.0,25.0,25.0,25.0
This is not being interpreted as a list, so if I try new_df = df.explode('vector'), nothing happens.
I've also tried using the following to convert the vector column into a list:
def listing(row):
    val = list(row['vector'])
    return val

df['vector_b'] = df.apply(listing, axis=1)
This doesn't work either, because each row is interpreted as a string, so the list is built character by character:
13804 [2, 5, ., 0, ,, 2, 5, ., 0, ,, 2, 4, ., 0, ,, ...
30842 [3, 0, ., 0, ,, 3, 0, ., 0, ,, 3, 1, ., 0, ,, ...
1624 [2, 5, ., 0, ,, n, a, n, ,, n, a, n, ,, n, a, ...
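A minimal sketch of why this happens, using a small made-up Series in place of the real vector column (the two sample rows are shortened and only illustrative): list() iterates a string one character at a time, whereas str.split(',') cuts it at the separator. The pieces are still strings, so this sketch assumes casting back to float after exploding is acceptable.

```python
import pandas as pd

# Hypothetical, shortened stand-in for df['vector'] (comma-joined strings).
s = pd.Series(["25.0,25.0,24.0", "25.0,nan,nan"], index=[13804, 1624], name="vector")

# list() walks the string character by character: '2', '5', '.', '0', ',', ...
as_chars = s.apply(list)

# str.split(',') splits on the separator, giving one string per original value.
as_parts = s.str.split(",")

# explode() works on real lists; the elements are strings, so cast back to float.
# The string 'nan' becomes a float NaN in the cast.
exploded = as_parts.explode().astype(float)
print(exploded)
```

This keeps the joined-string column as the starting point. Building the list directly from the numeric columns would avoid the round trip through strings.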
How can I create an extra column with the values of the columns day_a1,day_a2, to day_a6 that will be interpreted as a list to later use explode on?
- I've also tried using ast.literal_eval() in a custom function, but it didn't work because it raised an error.
- I need to use .astype(str) before applying the lambda; otherwise I get an error saying a string was expected but a float was received.
Thanks.
The expected output would be this:
id vector
13804 002n 25.0
13804 002n 25.0
.... ....
13804 002n 25.0
30842 002c 30.0
30842 002c 30.0
... ... ...
30842 002c 30.0
1624 002k 25.0
1624 002k NaN
... ... ...
1624 002k 25.0
Instead of astype(str).apply(','.join, axis=1), do astype(str).apply(list, axis=1)?
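The suggestion above can be sketched roughly as follows. The sample frame is rebuilt by hand from the rows shown in the question, so the exact values are only illustrative. This version also assumes that astype(str) can be dropped, since nothing is joined any more; that keeps the values as floats and the missing entries as real NaN rather than the string 'nan'.

```python
import numpy as np
import pandas as pd

day_cols = [f"day_a{i}" for i in range(1, 7)]

# Hand-built sample modelled on the question's data (illustrative only).
df = pd.DataFrame(
    [
        ["002n", 25.0, 25.0, 24.0, 25.0, 25.0, 25.0],
        ["002c", 30.0, 30.0, 31.0, 28.0, 31.0, 30.0],
        ["002k", 25.0, np.nan, np.nan, np.nan, np.nan, 25.0],
    ],
    columns=["id"] + day_cols,
    index=[13804, 30842, 1624],
)

# Build one real Python list per row from the day columns.
df["vector"] = df[day_cols].apply(list, axis=1)

# explode() emits one row per list element and repeats the index and id.
new_df = df[["id", "vector"]].explode("vector")
print(new_df)
```

After explode the vector column has object dtype; new_df['vector'].astype(float) would turn it back into a numeric column if that is needed downstream.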