pandas reading csv file with dates and data

Question

The first column contains dates and the remaining columns are nodes (6100 of them). When I read the csv, all the values end up in one column (picture added), so I split on whitespace and added the headers. Each date has 4 rows of values for every node, but the values from the rows without a date get filled in under the date column.

What it does:

Date 1   2  3  4 ...
2/12 14 14  14 14
14   13 13  13  nan
14   13 13  13  nan
14   13 13  13  nan

What I am trying to get

Date 1   2  3  4 ...
2/12 14 14  14 14
     14 13 13  13 
     14 13 13  13 
     14 13 13  13 
2/13 ......

I am very stuck on this. Any suggestions would help. Thanks in advance :)

import glob

import numpy as np
import pandas as pd

path = r'Paco'  # use your path
# filenames = glob.glob(path + "/*_drh.csv")
filenames = glob.glob(path + "/FM2Sim_GW_HeadAllOut.txt")
for filename in filenames:
    if len(filenames) == 1:
        # The setup: column 0 holds the dates, columns 1..6100 are the nodes
        column_name = np.arange(0, 6101).tolist()
        daa = pd.read_csv(filename, comment='C', header=None, sep=r'\s+', names=column_name, parse_dates=[0], low_memory=False, index_col=False)
        daa = daa.iloc[6:]
        # the column labels are integers, so the key is 0 rather than "0"
        daa = daa.rename(columns={0: "Dates"})

Data File

Trenton McKinney · Accepted Answer · 2020-05-25 04:05:19Z

- The code below skips the first 6 rows of the txt file.
- The 7th row has a date, which is saved to date_var.
- The subsequent rows with no date are filled with date_var.
- Each row is a list and is appended to data.
  - Because the file is split on single spaces, every extra space between values becomes an empty string '' in row.
  - list(filter(None, row)) removes all '' from the list.
- data is loaded into pandas.
- Now do what you want with the data.

import csv
import pandas as pd

# forward fill the missing dates
data = list()
with open('FM2Sim_GW_HeadAllOut.txt', 'r') as f:
    csv_reader = csv.reader(f, delimiter=' ')
    date_var = ''
    for i, row in enumerate(csv_reader):
        if i > 5:  # skip first 6 rows
            if row[0]:
                date_var = row[0]
                # date_var = row[0][:10]  # if you want to get rid of time, use this line
                date_var = date_var.replace('24:00', '23:59')  # remove this line, if removing time
                row[0] = date_var

            else:
                row[0] = date_var

            row = list(filter(None, row)) 
            data.append(row)

# create the dataframe
df = pd.DataFrame(data)

# rename the date column
df.rename(columns={0: 'date'}, inplace=True)

# format as datetime
df.date = pd.to_datetime(df.date, format='%m/%d/%Y_%H:%M')  # format='%m/%d/%Y' if time was removed

# save a new file
df.to_csv('new_file.csv', index=False)

print(df.iloc[:10, :10])

|    | date                |       1 |       2 |       3 |       4 |       5 |       6 |       7 |       8 |       9 |
|---:|:--------------------|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|
|  0 | 1899-09-30 23:59:00 | 14.3833 | 14.3833 | 14.3833 | 14.3833 | 14.3833 | 14.3833 | 14.3833 | 17.2917 | 17.2917 |
|  1 | 1899-09-30 23:59:00 | 14.3833 | 14.3833 | 14.3833 | 14.3833 | 14.3833 | 14.3833 | 14.3833 | 17.2917 | 17.2917 |
|  2 | 1899-09-30 23:59:00 | 14.3833 | 14.3833 | 14.3833 | 14.3833 | 14.3833 | 14.3833 | 14.3833 | 17.2917 | 17.2917 |
|  3 | 1899-09-30 23:59:00 | 14.3833 | 14.3833 | 14.3833 | 14.3833 | 14.3833 | 14.3833 | 14.3833 | 17.2917 | 17.2917 |
|  4 | 1899-10-31 23:59:00 | 13.5019 | 13.6842 | 14.0841 | 13.698  | 13.7531 | 13.9286 | 14.0963 | 15.9466 | 16.6629 |
|  5 | 1899-10-31 23:59:00 | 13.9378 | 14.0566 | 14.3744 | 14.0539 | 14.0964 | 14.1527 | 14.3827 | 15.4823 | 16.2371 |
|  6 | 1899-10-31 23:59:00 | 14.4266 | 14.5391 | 14.833  | 14.567  | 14.582  | 14.6196 | 14.9055 | 15.7093 | 16.4724 |
|  7 | 1899-10-31 23:59:00 | 14.8438 | 14.8858 | 15.1216 | 14.9813 | 14.9525 | 14.9419 | 15.1824 | 15.8385 | 16.5648 |
|  8 | 1899-11-30 23:59:00 | 13.0963 | 13.3783 | 13.9715 | 13.3591 | 13.444  | 13.7413 | 14.0693 | 15.3191 | 16.8376 |
|  9 | 1899-11-30 23:59:00 | 13.7826 | 13.9578 | 14.4    | 13.9429 | 13.9827 | 14.1416 | 14.4996 | 15.1693 | 16.3612 |
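
To see why the `list(filter(None, row))` step is needed, the following is a minimal sketch run on a made-up two-line sample rather than the real file. The sample string and its values are assumptions chosen only to mimic the layout: a dated row followed by an undated continuation row.

```python
import csv
import io

# Made-up sample (not from the real file): one dated row, one undated row.
sample = (
    "09/30/1899_24:00  14.38  14.38\n"
    "                  13.50  13.68\n"
)

rows = []
date_var = ''
for row in csv.reader(io.StringIO(sample), delimiter=' '):
    if row[0]:
        # a non-empty first field is the date; remember it
        date_var = row[0]
    else:
        # the line starts with a space, so the first field is ''; fill in the last date
        row[0] = date_var
    # splitting on single spaces leaves '' between values; drop those
    rows.append(list(filter(None, row)))

print(rows)
```

Both resulting lists start with the same date string, which is the forward-fill behaviour the answer relies on.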

edited May 25, 2020 at 4:05

answered May 13, 2020 at 22:46

4 Comments

Daisy Guitron Over a year ago

What date format is that? I currently have %m/%d/%Y_%H:%M, but I get this error (time data '01/31/1900_24:00' does not match format '"%m/%d/%Y_%H:%M"')

Trenton McKinney Over a year ago

@DaisyGuitron The answer has been updated. The problem is that 24:00 isn't a valid hour. While processing the data row by row, replace 24:00 with 23:59; converting to datetime will then work.
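
If shifting the timestamp by one minute is not acceptable, one alternative (not what the answer does) is to treat 24:00 as midnight of the following day. The sketch below assumes every string has the form `MM/DD/YYYY_HH:MM`; the helper name `parse_end_of_day` is hypothetical.

```python
import pandas as pd


def parse_end_of_day(stamps):
    """Parse 'MM/DD/YYYY_HH:MM' strings, mapping 24:00 to 00:00 of the next day."""
    s = pd.Series(stamps, dtype="object")
    is_24 = s.str.endswith("_24:00")
    # 24:00 cannot be parsed, so parse it as 00:00 and add one day afterwards
    cleaned = s.str.replace("_24:00", "_00:00", regex=False)
    parsed = pd.to_datetime(cleaned, format="%m/%d/%Y_%H:%M")
    return parsed + pd.to_timedelta(is_24.astype(int), unit="D")


result = parse_end_of_day(["01/31/1900_24:00", "02/28/1900_12:00"])
print(result)
```

With this approach the end-of-month rows land on the first day of the next month, so it suits cases where the exact instant matters more than keeping the original calendar date.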

Daisy Guitron Over a year ago

Only some of the dates follow the format; for example, the first ten shown here do not.

Trenton McKinney Over a year ago

@DaisyGuitron it works for the entire file you shared. If you have other data, you'll have to adapt the script to deal with those changes.
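
When adapting the script to other data, it can help to list the date strings that do not match the expected format before converting. A minimal sketch, using made-up values and assuming the same `%m/%d/%Y_%H:%M` format as the answer:

```python
import pandas as pd

# Made-up values: two match the expected format, two do not.
dates = pd.Series(["09/30/1899_23:59", "10/31/1899_23:59", "1899-11-30", "n/a"])

# errors="coerce" turns unparseable entries into NaT instead of raising
parsed = pd.to_datetime(dates, format="%m/%d/%Y_%H:%M", errors="coerce")

# the original strings that failed to parse
bad = dates[parsed.isna()]
print(bad)
```

The rows listed in `bad` show which formats still need to be handled.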
