I have a list of values that I want to write into multiple columns. This works fine for a single row. However, when I try to do it over multiple rows, it simply overwrites the whole column with the last value.
The list for each row looks like this (note: the list length varies from row to row):
['2016-03-16T09:53:05',
'2016-03-16T16:13:33',
'2016-03-17T13:30:31',
'2016-03-17T13:39:09',
'2016-03-17T16:59:01',
'2016-03-23T12:20:47',
'2016-03-23T13:22:58',
'2016-03-29T17:26:26',
'2016-03-30T09:08:17']
I can store this in empty columns by using:
for i in range(len(trans_dates)):
    df[('T' + str(i + 1) + ' - Date')] = trans_dates[i]
However, this sets the whole column to the single trans_dates[i] value.
I thought looping over each row with the above code would work, but it still overwrites the column:
for issues in all_issues:
    for i in range(len(trans_dates)):
        df[('T' + str(i + 1) + ' - Date')] = trans_dates[i]
- How do I only update my current row in the loop?
- Am I even going about this the right way? Or is there a faster vectorised way of doing it?
Full code snippet below:
for issues in all_issues:
    print(issues)
    changelog = issues.changelog
    trans_dates = []
    from_status = []
    to_status = []
    for history in changelog.histories:
        for item in history.items:
            if item.field == 'status':
                trans_dates.append(history.created[:19])
                from_status.append(item.fromString)
                to_status.append(item.toString)
    trans_dates = list(reversed(trans_dates))
    from_status = list(reversed(from_status))
    to_status = list(reversed(to_status))
    print(trans_dates)
    # Store raw data in created columns and convert dates to pd.to_datetime
    for i in range(len(trans_dates)):
        df[('T' + str(i + 1) + ' - Date')] = trans_dates[i]
    for i in range(len(to_status)):
        df[('T' + str(i + 1) + ' - To')] = to_status[i]
    for i in range(len(from_status)):
        df[('T' + str(i + 1) + ' - From')] = from_status[i]
    for i in range(len(trans_dates)):
        df['T' + str(i + 1) + ' - Date'] = pd.to_datetime(df['T' + str(i + 1) + ' - Date'])
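To make the overwriting behaviour concrete, here is a minimal sketch of what I think is going on. Assigning a scalar with df[col] = value broadcasts it to every row, whereas df.loc[row, col] = value touches a single cell. The frame, the index labels and the dates_by_row dict below are made up for illustration and are not my real data.

```python
import pandas as pd

# Hypothetical frame: one row per issue, columns created up front as object dtype.
rows = [17, 18, 19]
cols = ['T1 - Date', 'T2 - Date']
df = pd.DataFrame(index=rows, columns=cols, dtype=object)

# Hypothetical per-issue lists standing in for trans_dates.
dates_by_row = {
    17: ['2016-03-16T09:53:05', '2016-03-16T16:13:33'],
    18: ['2017-03-16T09:53:05'],
    19: [],
}

# df[col] = scalar fills the whole column, so the last assignment wins.
broadcast = df.copy()
broadcast['T1 - Date'] = '2016-03-16T09:53:05'
broadcast['T1 - Date'] = '2017-03-16T09:53:05'

# df.loc[row, col] = scalar only writes the cell for that row.
per_row = df.copy()
for row, trans_dates in dates_by_row.items():
    for i, value in enumerate(trans_dates):
        per_row.loc[row, f'T{i + 1} - Date'] = value

print(broadcast)
print(per_row)
```

In the sketch the row label is known because it is the dict key. In my real loop I would need some way to map each issue to its row label, which I have not shown here.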
- EDIT: Sample input and output added.
input: issue/row #1 list (note year changes):
['2016-03-16T09:53:05',
'2016-03-16T16:13:33',
'2016-03-17T13:30:31',
'2016-03-17T13:39:09']
issue #2
['2017-03-16T09:53:05',
'2017-03-16T16:13:33',
'2017-03-17T13:30:31']
issue #3
['2018-03-16T09:53:05',
'2018-03-16T16:13:33',
'2018-03-17T13:30:31']
issue #4
['2015-03-16T09:53:05',
'2015-03-16T16:13:33']
output:
col  T1                     T2                     T3                     T4
17   '2016-03-16T09:53:05'  '2016-03-16T16:13:33'  '2016-03-17T13:30:31'  '2016-03-17T13:39:09'
18   '2017-03-16T09:53:05'  '2017-03-16T16:13:33'  '2017-03-17T13:30:31'  np.nan
19   '2018-03-16T09:53:05'  '2018-03-16T16:13:33'  '2018-03-17T13:30:31'  np.nan
20   '2015-03-16T09:53:05'  '2015-03-16T16:13:33'  np.nan                 np.nan
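One possible way to get this shape without writing cell by cell is to collect the per-issue lists first and build the wide frame in one go. This is only a sketch under assumptions: the dict name trans_dates_by_issue is hypothetical, the keys 17 to 20 are the row labels from the expected output, and the lists are the sample input above. pandas pads the shorter lists with missing values when given ragged lists.

```python
import pandas as pd

# Sample lists from the input above, keyed by the row labels of the expected output.
trans_dates_by_issue = {
    17: ['2016-03-16T09:53:05', '2016-03-16T16:13:33',
         '2016-03-17T13:30:31', '2016-03-17T13:39:09'],
    18: ['2017-03-16T09:53:05', '2017-03-16T16:13:33', '2017-03-17T13:30:31'],
    19: ['2018-03-16T09:53:05', '2018-03-16T16:13:33', '2018-03-17T13:30:31'],
    20: ['2015-03-16T09:53:05', '2015-03-16T16:13:33'],
}

# One row per issue; shorter lists are padded with missing values.
wide = pd.DataFrame.from_dict(trans_dates_by_issue, orient='index')
wide.columns = [f'T{i + 1} - Date' for i in range(wide.shape[1])]

# Convert each column to datetime; missing entries become NaT.
wide = wide.apply(pd.to_datetime)

print(wide)
```

If the row labels match the index of the existing frame, something like df.join(wide) should attach these columns to it. The same pattern would presumably work for the to_status and from_status lists, minus the datetime conversion.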