I have not used pandas before, and it looks like I need some initial help. I could not find this specific example anywhere.

I have a CSV file, say file1.csv, as follows:

ID     value1     value2
1       100        200
2       101        201

I need to read one line at a time from file1.csv, append data for two new columns, and then write everything to a new file called file2.csv. file2.csv is supposed to look like the following:

ID     value1     value2     value3     value4
1       100        200        10         20
2       101        201        11         21

Can anyone guide or give a short example showing how to do this (reading file1, appending the new data (value3 and value4 columns), and writing it to file2)?

ADDENDUM: I need to read 1 line at a time from file1 and write 1 line at a time to file2.

  • pandas has very good tools for reading all kinds of formats. See pandas.read_csv. Equivalently, you can then save your DataFrame to a CSV with DataFrame.to_csv. Commented Jul 11, 2018 at 2:10
  • If you're set on reading one line at a time, I don't think pandas is the tool for you (and there will likely be some slowdowns because of it). A simple with open('file.csv') as f: ... will suffice. Commented Jul 11, 2018 at 3:08
  • @Arda Arslan, thanks for this additional comment. Performance is not the issue in my specific case, but memory is. Pandas is also something I want to use more going forward, so it is a good exercise for me. Commented Jul 11, 2018 at 3:18
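
For reference, the plain-Python route suggested in the comments above could look roughly like the sketch below. It uses the standard csv module so that only one row is held in memory at a time. The helper names append_columns and compute are hypothetical, and compute is just a stand-in for whatever processing produces value3 and value4; the temporary directory exists only to make the demo self-contained.

```python
import csv
import os
import tempfile


def append_columns(src_path, dst_path, compute):
    """Copy src_path to dst_path row by row, adding value3 and value4."""
    with open(src_path, newline='') as src, open(dst_path, 'w', newline='') as dst:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames + ['value3', 'value4']
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        for row in reader:  # one row in memory at a time
            row['value3'], row['value4'] = compute(row)
            writer.writerow(row)


def compute(row):
    # Hypothetical stand-in for the real processing step.
    return int(row['value1']) - 90, int(row['value2']) - 180


# Demo with the sample data from the question.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'file1.csv')
dst = os.path.join(workdir, 'file2.csv')
with open(src, 'w', newline='') as f:
    f.write('ID,value1,value2\n1,100,200\n2,101,201\n')

append_columns(src, dst, compute)

with open(dst, newline='') as f:
    result = f.read()
print(result)
```

Values read by csv.DictReader are strings, which is why compute converts them with int() before doing arithmetic.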

4 Answers

3

The following will load file1.csv, add in columns 'value3' and 'value4' and output the resulting dataframe as a csv.

import pandas as pd

df = pd.read_csv('file1.csv')
df['value3'] = [10, 11]
df['value4'] = [20, 21]
df.to_csv('file2.csv')

Contents of file1.csv:

ID,value1,value2
1,100,200
2,101,201

Contents of file2.csv:

,ID,value1,value2,value3,value4
0,1,100,200,10,20
1,2,101,201,11,21

8 Comments

This is great! But I need to read one line at a time and then write one line at a time to file2. It was my mistake not to mention that above. If I read it right, your code reads/writes the entire file at once. How can I read and write one line at a time?
@edn, but why do you need to do one line at a time? The major power of pandas is that it allows you to move away from doing things one row at a time, and instead you can perform vectorized operations on the entire DataFrame.
I agree with @ALollz. I don't think I understand why you specifically would want to read one line at a time.
I see your point. It is because I will be using the data from file1 for other purposes, and the value3 and value4 columns above will be the result of that processing. file1 is a big file and I cannot process everything at once. I could actually process 5 or 10 rows at a time as well, but if I see how to do this for one row at a time, I believe it will be easier to configure later on.
And at the very least, you should be able to fit more than a single row into memory. At least you could process the file in larger chunks, maybe several thousand at a time.
2

Though there are typically better solutions, like using Dask, changing the dtypes or using categorical variables, one alternative is to simply process the file in chunks.

import pandas as pd

# Read one line at a time. Change chunksize to process more lines at a time.
reader = pd.read_csv('test.csv', chunksize=1)
write_header = True  # Needed to get header for first chunk

for chunk in reader:
    # Do some stuff
    chunk['val3'] = chunk.val1**2
    chunk['val4'] = chunk.val2*4

    # Save the file to a csv, appending each new chunk you process. mode='a' means append.
    chunk.to_csv('final.csv', mode='a', header=write_header, index=False)
    write_header = False  # Update so later chunks don't write header

Sample Data: test.csv

val1,val2
1,2
3,4
5,6
7,8
9,10
11,12
13,14
15,16

Output: final.csv

val1,val2,val3,val4
1,2,1,8
3,4,9,16
5,6,25,24
7,8,49,32
9,10,81,40
11,12,121,48
13,14,169,56
15,16,225,64
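
One thing to watch with mode='a' in the loop above: if final.csv already exists from an earlier run, the new rows (and a second header) are appended to the old contents. A possible variation, sketched below under the assumption that overwriting the output on each run is what you want, opens the output file once in write mode and passes the file handle to to_csv. The function name process_in_chunks and the temporary paths are made up for this demo.

```python
import os
import tempfile

import pandas as pd

# Build a small input file so the sketch is self-contained.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'test.csv')
dst = os.path.join(workdir, 'final.csv')
pd.DataFrame({'val1': [1, 3, 5], 'val2': [2, 4, 6]}).to_csv(src, index=False)


def process_in_chunks(src, dst, chunksize=2):
    # Opening with 'w' truncates any previous output, so reruns start clean.
    with open(dst, 'w', newline='') as out:
        for i, chunk in enumerate(pd.read_csv(src, chunksize=chunksize)):
            chunk['val3'] = chunk['val1'] ** 2
            chunk['val4'] = chunk['val2'] * 4
            # Write the header only for the first chunk.
            chunk.to_csv(out, header=(i == 0), index=False)


process_in_chunks(src, dst)
process_in_chunks(src, dst)  # running twice does not duplicate rows

result = pd.read_csv(dst)
print(result)
```

Using enumerate to detect the first chunk replaces the separate write_header flag; either approach works.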

1 Comment

It seems we submitted similar answers at the same time, but yours is more elegant: it provides a solution with less code. Thank you! Solved!
2

Use read_csv and to_csv. Use the index keyword arg in to_csv to keep or remove the index.

In [117]: df = pd.read_csv('eg.csv')

In [118]: df
Out[118]:
   col 1  col 2  col 3
0      4      5      6
1      7      8      9

In [119]: df['new col'] = 'data'

In [120]: df
Out[120]:
   col 1  col 2  col 3 new col
0      4      5      6    data
1      7      8      9    data

In [121]: df.to_csv('eg.new.csv')

In [122]: new_df = pd.read_csv('eg.new.csv')      # includes the index

In [123]: new_df
Out[123]:
   Unnamed: 0  col 1  col 2  col 3 new col
0           0      4      5      6    data
1           1      7      8      9    data

In [124]: df.to_csv('eg.new.csv', index=False)    # excludes index

In [125]: new_df = pd.read_csv('eg.new.csv')

In [126]: new_df
Out[126]:
   col 1  col 2  col 3 new col
0      4      5      6    data
1      7      8      9    data
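
If a file has already been written with the index included, a related option is to tell read_csv to treat the first column as the index, which avoids the extra 'Unnamed: 0' column when reading it back. The sketch below uses an in-memory buffer instead of a real file, purely to keep the example self-contained.

```python
import io

import pandas as pd

df = pd.DataFrame({'col 1': [4, 7], 'col 2': [5, 8], 'col 3': [6, 9]})
df['new col'] = 'data'

buffer = io.StringIO()
df.to_csv(buffer)  # index is written as an unnamed first column
buffer.seek(0)

# index_col=0 uses that first column as the index again.
restored = pd.read_csv(buffer, index_col=0)
print(restored)
```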

1

It looks like the following code snippet solves my problem. Thanks to @aydow and @Arda Arslan for the inspiration.

The following piece of code creates file2 with the header names only; the rest of the file is empty.

import pandas as pd

column_names = ['ID', 'value1', 'value2', 'value3', 'value4']
# An empty DataFrame that has only column names writes just the header row.
pd.DataFrame(columns=column_names).to_csv("file2.csv", index=False)

And the following piece of code reads 1 line at a time from file1 and appends it to file2.

for df in pd.read_csv('file1.csv', chunksize=1):
    df['value3'] = 11
    df['value4'] = 22
    df.to_csv("file2.csv", header=False, index=False, mode='a')

Changing the value of the chunksize parameter changes the number of rows that are read and written at a time. Improvement suggestions are more than welcome if you think this can be done more elegantly.
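
To see how chunksize affects the iteration, the small sketch below reads the same seven-row CSV with three different chunk sizes and records how many rows each chunk contains. The data is generated in memory and is only illustrative.

```python
import io

import pandas as pd

# Seven data rows in the same layout as file1.csv.
csv_text = 'ID,value1,value2\n' + ''.join(
    f'{i},{100 + i},{200 + i}\n' for i in range(1, 8)
)

chunk_sizes = {}
for size in (1, 3, 10):
    reader = pd.read_csv(io.StringIO(csv_text), chunksize=size)
    # Each chunk is a DataFrame with at most `size` rows.
    chunk_sizes[size] = [len(chunk) for chunk in reader]

print(chunk_sizes)
```

The last chunk is simply shorter when the row count is not a multiple of chunksize, so the loop body does not need special handling for it.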
