Need help creating new columns in csv file with dates

Question

I have a CSV file with multiple columns, one of which is named 'dates'. Each row of 'dates' contains one or more dates, so some rows have more dates than others. Here is an example of what the column looks like in the file:

dates
31-Mar-24
Nov 21, 2024, Apr 14, 2025, May 18, 2025
21-Oct-23
26-Sep-24
22-Nov-23
24-Sep-24
13-Nov-23
10-Apr-24
23-Sep-23
Apr 16, 2025, Jun 04, 2025

I would like each date to be placed in its own column. Taking the row 'Nov 21, 2024, Apr 14, 2025, May 18, 2025' from the example above, I would like the output to look like this:

date1           date2           date3
Nov 21, 2024    Apr 14, 2025    May 18, 2025

And so on. Every row with more than one date should have its extra dates placed in additional columns, but they must stay in the same row.

Here is the code that I have tried but it did not work:

import csv

with open('input.csv', 'r') as csv_input_file, open('output.csv', 'w', newline='') as csv_output_file:
    reader = csv.reader(csv_input_file)
    writer = csv.writer(csv_output_file)

    header = next(reader)
    new_columns = []
    for column in header:
        if column == 'dates':
            new_columns.extend(['date' + str(i+1) for i in range(10)])  # maximum of 10 date columns
        else:
            new_columns.append(column)
    writer.writerow(new_columns)

    for row in reader:
        dates_str = row[header.index('dates')]
        dates_list = dates_str.split(',')
        dates_list = [date.strip() for date in dates_list]

        new_row = []
        for column in row:
            if column == dates_str:
                new_row.extend(dates_list)
                new_row.extend([''] * (len(new_columns) - len(new_row)))
            else:
                new_row.append(column)

        writer.writerow(new_row)

This is my current code, but it splits the dates at every comma, so the years are separated from the month and day and end up in their own columns. I can't seem to find a solution and was hoping someone here could help.

Saxtheowl · Accepted Answer · 2023-04-17 21:17:28Z

1

Use a regular expression to match each complete date instead of splitting on commas, so the comma inside a date such as 'Nov 21, 2024' is not treated as a separator.

import csv
import re

with open('input.csv', 'r') as csv_input_file, open('output.csv', 'w', newline='') as csv_output_file:
    reader = csv.reader(csv_input_file)
    writer = csv.writer(csv_output_file)

    header = next(reader)
    new_columns = []
    for column in header:
        if column == 'dates':
            new_columns.extend(['date' + str(i+1) for i in range(10)])  # maximum of 10 date columns
        else:
            new_columns.append(column)
    writer.writerow(new_columns)

    date_pattern = re.compile(r'\d{1,2}-\w{3}-\d{2,4}|\w{3}\s\d{1,2},\s\d{4}')

    for row in reader:
        dates_str = row[header.index('dates')]
        dates_list = date_pattern.findall(dates_str)
        dates_list = [date.strip() for date in dates_list]

        new_row = []
        for idx, column in enumerate(row):
            if header[idx] == 'dates':
                new_row.extend(dates_list)
                # Pad only up to the 10 date columns declared in the header,
                # so any columns after 'dates' stay aligned.
                new_row.extend([''] * (10 - len(dates_list)))
            else:
                new_row.append(column)

        writer.writerow(new_row)
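
To check the pattern against the sample rows from the question without touching any files, a minimal sketch could look like the following. The helper name `split_dates` and its `width` parameter are hypothetical, introduced only for illustration; the pattern itself is the one used above.

```python
import re

# Same pattern as in the answer above: matches "31-Mar-24" or "Nov 21, 2024".
DATE_PATTERN = re.compile(r'\d{1,2}-\w{3}-\d{2,4}|\w{3}\s\d{1,2},\s\d{4}')

def split_dates(cell, width=3):
    """Return the dates found in `cell`, padded with '' up to `width` entries.

    `split_dates` and `width` are hypothetical names used for illustration.
    """
    dates = DATE_PATTERN.findall(cell)
    return dates + [''] * (width - len(dates))

samples = [
    '31-Mar-24',
    'Nov 21, 2024, Apr 14, 2025, May 18, 2025',
    'Apr 16, 2025, Jun 04, 2025',
]
for cell in samples:
    print(split_dates(cell))
# ['31-Mar-24', '', '']
# ['Nov 21, 2024', 'Apr 14, 2025', 'May 18, 2025']
# ['Apr 16, 2025', 'Jun 04, 2025', '']
```

Because `findall` returns whole matches, the year stays attached to each date. A cell holding more dates than `width` is returned unpadded rather than truncated.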

edited Apr 17, 2023 at 21:17

answered Apr 17, 2023 at 21:07


2 Comments

Camol1 Over a year ago

This worked perfectly, except it got rid of all the years attached to the dates! How do we keep those? Edit: The full dates are formatted like this: Dec 01, 2023,

Saxtheowl Over a year ago

I modified my code slightly

Timeless · Accepted Answer · 2023-04-17 21:15:24Z

1

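With pandas, you can try this (it reads the file as raw lines, so it assumes the file contains only the 'dates' column):

import pandas as pd

with open("input.csv", "r") as f:
    data = f.read()

# Split each line on commas that are followed by an uppercase letter (the start
# of a month name), so the comma inside "Nov 21, 2024" is left alone.
df = (pd.Series(data.splitlines())
          .loc[1:].str.split(r",\s*(?=[A-Z]+)", expand=True)
          .rename(lambda x: x+1, axis=1)
          .add_prefix("date")
     )

#df.to_csv("output.csv", index=False) #uncomment this line to make a csv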

Output:

print(df)

           date1         date2         date3
1      31-Mar-24          None          None
2   Nov 21, 2024  Apr 14, 2025  May 18, 2025
3      21-Oct-23          None          None
4      26-Sep-24          None          None
5      22-Nov-23          None          None
6      24-Sep-24          None          None
7      13-Nov-23          None          None
8      10-Apr-24          None          None
9      23-Sep-23          None          None
10  Apr 16, 2025  Jun 04, 2025          None
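
The splitting rule in this answer can be tried on its own with the standard `re` module. This is a rough sketch, assuming every month name starts with an uppercase letter as in the sample data; `split_on_month` is a hypothetical helper name.

```python
import re

# Split on a comma only when the next non-space character is an uppercase
# letter, i.e. the start of a month name such as "Apr". The comma in
# "Nov 21, 2024" is followed by a digit, so it is not a split point.
SEPARATOR = re.compile(r',\s*(?=[A-Z])')

def split_on_month(cell):
    """Hypothetical helper: split one 'dates' cell into individual dates."""
    return SEPARATOR.split(cell)

print(split_on_month('Nov 21, 2024, Apr 14, 2025, May 18, 2025'))
# ['Nov 21, 2024', 'Apr 14, 2025', 'May 18, 2025']
print(split_on_month('31-Mar-24'))
# ['31-Mar-24']
```

The lookahead `(?=[A-Z])` checks the next character without consuming it, so the month name stays part of the following date.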

answered Apr 17, 2023 at 21:15


1 Comment

Camol1 Over a year ago

Did not try this due to the above answer working for me but I appreciate your help anyways :)
