
I have a CSV file with multiple columns, one of which is named 'dates'. Each row of 'dates' holds one or more dates, and some rows have more dates than others. Here is an example of what the column looks like in the file:

dates
31-Mar-24
Nov 21, 2024, Apr 14, 2025, May 18, 2025
21-Oct-23
26-Sep-24
22-Nov-23
24-Sep-24
13-Nov-23
10-Apr-24
23-Sep-23
Apr 16, 2025, Jun 04, 2025

I would like each date to be placed in its own column. Taking the example above, one of the rows contains 'Nov 21, 2024, Apr 14, 2025, May 18, 2025'. I would like it to look like this in the output:

date1               date2               date3
Nov 21, 2024        Apr 14, 2025        May 18, 2025

And so on: for every row with more than one date, each additional date should go into its own column, but it must stay in the same row.

Here is the code that I have tried but it did not work:

import csv

with open('input.csv', 'r') as csv_input_file, open('output.csv', 'w', newline='') as csv_output_file:
    reader = csv.reader(csv_input_file)
    writer = csv.writer(csv_output_file)

    header = next(reader)
    new_columns = []
    for column in header:
        if column == 'dates':
            new_columns.extend(['date' + str(i+1) for i in range(10)])  # maximum of 10 date columns
        else:
            new_columns.append(column)
    writer.writerow(new_columns)

    for row in reader:
        dates_str = row[header.index('dates')]
        dates_list = dates_str.split(',')
        dates_list = [date.strip() for date in dates_list]

        new_row = []
        for column in row:
            if column == dates_str:
                new_row.extend(dates_list)
                new_row.extend([''] * (len(new_columns) - len(new_row)))
            else:
                new_row.append(column)

        writer.writerow(new_row)

This is my current code, but it splits the dates at every comma, so the years are separated from the month and day and end up in new columns. I can't seem to find a solution and was hoping someone here could help.

2 Answers


Use a regular expression to match each complete date instead of splitting on commas. That way the comma inside a date such as 'Nov 21, 2024' is not treated as a separator.

import csv
import re

with open('input.csv', 'r') as csv_input_file, open('output.csv', 'w', newline='') as csv_output_file:
    reader = csv.reader(csv_input_file)
    writer = csv.writer(csv_output_file)

    header = next(reader)
    new_columns = []
    for column in header:
        if column == 'dates':
            new_columns.extend(['date' + str(i+1) for i in range(10)])  # maximum of 10 date columns
        else:
            new_columns.append(column)
    writer.writerow(new_columns)

    date_pattern = re.compile(r'\d{1,2}-\w{3}-\d{2,4}|\w{3}\s\d{1,2},\s\d{4}')

    for row in reader:
        dates_str = row[header.index('dates')]
        dates_list = date_pattern.findall(dates_str)
        dates_list = [date.strip() for date in dates_list]

        new_row = []
        for idx, column in enumerate(row):
            if header[idx] == 'dates':
                new_row.extend(dates_list)
                # pad only up to the 10 date columns, so columns after 'dates' stay aligned
                new_row.extend([''] * (10 - len(dates_list)))
            else:
                new_row.append(column)

        writer.writerow(new_row)
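
To check the pattern on its own, here is a minimal sketch that runs it against a few of the sample values from the question. The helper name `split_dates` is made up for illustration, and the sketch assumes every date is in one of the two formats shown in the question.

```python
import re

# Same pattern as above: matches "31-Mar-24" style or "Nov 21, 2024" style dates.
DATE_PATTERN = re.compile(r'\d{1,2}-\w{3}-\d{2,4}|\w{3}\s\d{1,2},\s\d{4}')

def split_dates(cell):
    """Return every complete date found in one 'dates' cell (hypothetical helper)."""
    return DATE_PATTERN.findall(cell)

samples = [
    "31-Mar-24",
    "Nov 21, 2024, Apr 14, 2025, May 18, 2025",
    "Apr 16, 2025, Jun 04, 2025",
]
for cell in samples:
    print(split_dates(cell))
# ['31-Mar-24']
# ['Nov 21, 2024', 'Apr 14, 2025', 'May 18, 2025']
# ['Apr 16, 2025', 'Jun 04, 2025']
```

Because `findall` returns whole matches, the year stays attached to its month and day. A value in any other format would be silently dropped, so it may be worth checking for rows where the result is empty.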

2 Comments

This worked perfectly, except it got rid of all the years attached to the dates! How do we keep those? Edit: the full dates are formatted like this: Dec 01, 2023
I modified my code slightly.

With pandas, you can try this:

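import pandas as pd

with open("input.csv", "r") as f:
    data = f.read()

# splitlines() avoids a spurious empty last row when the file ends with a newline
df = (pd.Series(data.splitlines())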
          .loc[1:].str.split(r",\s*(?=[A-Z]+)", expand=True)
          .rename(lambda x: x+1, axis=1)
          .add_prefix("date")
     )

#df.to_csv("output.csv", index=False) #uncomment this line to make a csv

Output:

print(df)

           date1         date2         date3
1      31-Mar-24          None          None
2   Nov 21, 2024  Apr 14, 2025  May 18, 2025
3      21-Oct-23          None          None
4      26-Sep-24          None          None
5      22-Nov-23          None          None
6      24-Sep-24          None          None
7      13-Nov-23          None          None
8      10-Apr-24          None          None
9      23-Sep-23          None          None
10  Apr 16, 2025  Jun 04, 2025          None
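
The split works because of the lookahead in the pattern: a comma counts as a separator only when the next non-space character is an uppercase letter, which here means the start of a month name. The comma before a year is followed by a digit, so it is left alone. A minimal sketch of the same idea with the standard `re` module (the name `SEPARATOR` is just for illustration, and it assumes month names always start with a capital letter):

```python
import re

# Split on a comma only when an uppercase letter follows (after optional spaces).
# The comma inside "Nov 21, 2024" is followed by a digit, so it does not match.
SEPARATOR = re.compile(r",\s*(?=[A-Z])")

print(SEPARATOR.split("Nov 21, 2024, Apr 14, 2025, May 18, 2025"))
# ['Nov 21, 2024', 'Apr 14, 2025', 'May 18, 2025']
print(SEPARATOR.split("31-Mar-24"))
# ['31-Mar-24']
```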

1 Comment

Did not try this due to the above answer working for me but I appreciate your help anyways :)
