The first column holds dates and the remaining columns hold nodes (6100 of them). When I read the CSV, all the values end up in one column (I added a picture), so I split on whitespace and added headers. Each node has 4 values per date, but on the rows without a date the values shift left and fill the Date column.

What it does:

Date 1   2  3  4 ...
2/12 14 14  14 14
14   13 13  13  nan
14   13 13  13  nan
14   13 13  13  nan

What I am trying to get

Date 1   2  3  4 ...
2/12 14 14  14 14
     14 13 13  13 
     14 13 13  13 
     14 13 13  13 
2/13 ...... 

I am very stuck on this. Any suggestions would help. Thanks in advance :)

import glob

import numpy as np
import pandas as pd

path = r'Paco'  # use your path
# filenames = glob.glob(path + "/*_drh.csv")
filenames = glob.glob(path + "/FM2Sim_GW_HeadAllOut.txt")
for filename in filenames:
    if len(filenames) == 1:
        # The setup
        column_name = np.arange(0, 6101).tolist()
        daa = pd.read_csv(filename, comment='C', header=None, sep=r'\s+', names=column_name, parse_dates=[0], low_memory=False, index_col=False)
        daa = daa.iloc[6:]
        # column labels are integers here, so the key is 0, not "0"
        daa = daa.rename(columns={0: "Dates"})

Data File

1 Answer

  • The code below skips the first 6 rows of the text file.
  • The 7th row starts with a date, which is saved to date_var.
  • Subsequent rows that have no date get date_var as their first value.
  • Each row is a list, which is appended to data.
    • Because the delimiter is a single space, every extra space between values becomes an empty string '' in row.
    • list(filter(None, row)) removes all '' entries from the list.
  • data is loaded into a pandas DataFrame.
  • From there, process the data however you need.
import csv
import pandas as pd

# forward fill the missing dates
data = list()
with open('FM2Sim_GW_HeadAllOut.txt', 'r') as f:
    csv_reader = csv.reader(f, delimiter=' ')
    date_var = ''
    for i, row in enumerate(csv_reader):
        if i > 5:  # skip first 6 rows
            if row[0]:
                date_var = row[0]
                # date_var = row[0][:10]  # if you want to get rid of time, use this line
                date_var = date_var.replace('24:00', '23:59')  # remove this line, if removing time
                row[0] = date_var

            else:
                row[0] = date_var

            row = list(filter(None, row)) 
            data.append(row)

# create the dataframe
df = pd.DataFrame(data)

# rename the date column
df.rename(columns={0: 'date'}, inplace=True)

# format as datetime
df.date = pd.to_datetime(df.date, format='%m/%d/%Y_%H:%M')  # format='%m/%d/%Y' if time was removed

# save a new file
df.to_csv('new_file.csv', index=False)

print(df.iloc[:10, :10])

|    | date                |       1 |       2 |       3 |       4 |       5 |       6 |       7 |       8 |       9 |
|---:|:--------------------|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|
|  0 | 1899-09-30 23:59:00 | 14.3833 | 14.3833 | 14.3833 | 14.3833 | 14.3833 | 14.3833 | 14.3833 | 17.2917 | 17.2917 |
|  1 | 1899-09-30 23:59:00 | 14.3833 | 14.3833 | 14.3833 | 14.3833 | 14.3833 | 14.3833 | 14.3833 | 17.2917 | 17.2917 |
|  2 | 1899-09-30 23:59:00 | 14.3833 | 14.3833 | 14.3833 | 14.3833 | 14.3833 | 14.3833 | 14.3833 | 17.2917 | 17.2917 |
|  3 | 1899-09-30 23:59:00 | 14.3833 | 14.3833 | 14.3833 | 14.3833 | 14.3833 | 14.3833 | 14.3833 | 17.2917 | 17.2917 |
|  4 | 1899-10-31 23:59:00 | 13.5019 | 13.6842 | 14.0841 | 13.698  | 13.7531 | 13.9286 | 14.0963 | 15.9466 | 16.6629 |
|  5 | 1899-10-31 23:59:00 | 13.9378 | 14.0566 | 14.3744 | 14.0539 | 14.0964 | 14.1527 | 14.3827 | 15.4823 | 16.2371 |
|  6 | 1899-10-31 23:59:00 | 14.4266 | 14.5391 | 14.833  | 14.567  | 14.582  | 14.6196 | 14.9055 | 15.7093 | 16.4724 |
|  7 | 1899-10-31 23:59:00 | 14.8438 | 14.8858 | 15.1216 | 14.9813 | 14.9525 | 14.9419 | 15.1824 | 15.8385 | 16.5648 |
|  8 | 1899-11-30 23:59:00 | 13.0963 | 13.3783 | 13.9715 | 13.3591 | 13.444  | 13.7413 | 14.0693 | 15.3191 | 16.8376 |
|  9 | 1899-11-30 23:59:00 | 13.7826 | 13.9578 | 14.4    | 13.9429 | 13.9827 | 14.1416 | 14.4996 | 15.1693 | 16.3612 |
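
To see why the empty-string filtering is needed, here is a minimal sketch on a made-up in-memory sample. The sample text and the `parse_rows` helper are hypothetical and not taken from the shared file; only the layout (a date on the first row of each block, continuation rows starting with spaces) follows the question. It uses only the standard library and also converts the values to floats, since `csv` returns strings.

```python
import csv
import io

# Hypothetical sample mimicking the layout described in the question.
SAMPLE = (
    "09/30/1899_24:00   14.38   14.38\n"
    "                   14.38   17.29\n"
    "10/31/1899_24:00   13.50   13.68\n"
    "                   13.93   14.05\n"
)


def parse_rows(text):
    """Return rows with the date carried forward and empty fields dropped."""
    rows = []
    date_var = ""
    for row in csv.reader(io.StringIO(text), delimiter=" "):
        if not row:  # csv.reader yields [] for blank lines
            continue
        if row[0]:  # a non-empty first field means the row starts with a date
            date_var = row[0]
        # every extra space produced an empty string; keep only real values
        values = [float(field) for field in row[1:] if field]
        rows.append([date_var] + values)
    return rows


rows = parse_rows(SAMPLE)
for r in rows:
    print(r)
```

The resulting list of rows can be passed to `pd.DataFrame` in the same way as `data` in the answer.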

4 Comments

What date format is that? I currently have %m/%d/%Y_%H:%M, but I get this error: time data '01/31/1900_24:00' does not match format '"%m/%d/%Y_%H:%M"'
@DaisyGuitron The answer has been updated. The problem is that 24:00 isn't a valid hour. When you process the data row by row, replace 24:00 with 23:59; converting to datetime will then work.
Only some of the dates follow the format; the first ten shown here, for example, do not.
@DaisyGuitron It works for the entire file you shared. If you have other data, you'll have to adapt the script to handle the differences.
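
Replacing 24:00 with 23:59 moves each timestamp by one minute. If the exact instant matters, one alternative is sketched below, under the assumption that 24:00 means midnight at the end of the given day. The helper name `parse_end_of_day` is hypothetical, and the sketch uses only the standard library.

```python
from datetime import datetime, timedelta


def parse_end_of_day(stamp):
    """Parse 'MM/DD/YYYY_HH:MM', treating 24:00 as the following midnight."""
    date_part, time_part = stamp.split("_")
    if time_part == "24:00":
        # strptime rejects hour 24, so parse the date and add one day
        return datetime.strptime(date_part, "%m/%d/%Y") + timedelta(days=1)
    return datetime.strptime(stamp, "%m/%d/%Y_%H:%M")


print(parse_end_of_day("01/31/1900_24:00"))  # 1900-02-01 00:00:00
print(parse_end_of_day("01/31/1900_12:30"))  # 1900-01-31 12:30:00
```

Applied in the row loop, this would replace the `date_var.replace('24:00', '23:59')` step, and the later `pd.to_datetime` call would no longer need a format string.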
