Pandas: how to add column names to a DataFrame read from a CSV file

Question

I'm new to Python but excited about it, and I need your advice. I came up with the following code to compare two CSV files based on an nmap scan:

import pandas as pd
from pandas import DataFrame
import os
file = raw_input('\nEnter the Old CSV file: ')
file1 = raw_input('\nEnter the New CSV file: ')
A=set(pd.read_csv(file, index_col=False, header=None)[0])
B=set(pd.read_csv(file1, index_col=False, header=None)[0])
final=list(A-B)
df = pd.DataFrame(final, columns=["host"])
df.to_csv('DIFF_'+file)

print "Completed!"

When I run it, I get the following result:

,host
0,82.214.228.71;dsl-radius-02.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3;
1,82.214.228.70;dsl-radius-01.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3;

My question is how to add a label/name to columns 2, 3, etc., for example: hostname, port, port name, state, etc. I have tried df['hostname'] = range(1, len(df) + 1), but this adds hostname to the first column along with host when I open the file in Excel.

Do you want to compare all columns or only the first?

– jezrael, Commented Aug 14, 2017 at 11:01

jezrael · Accepted Answer · 2017-08-14 11:27:45Z

3

I think you need read_csv with the parameter sep=';' and names to define the column names first:

file = raw_input('\nEnter the Old CSV file: ')
file1 = raw_input('\nEnter the New CSV file: ')

cols = ['hostname','port','portname', ...]
A= pd.read_csv(file, index_col=False, header=None, sep=';', names=cols)
B= pd.read_csv(file1, index_col=False, header=None, sep=';', names=cols)

Then, if you need to compare all columns, use merge with indicator=True and filter the result with boolean indexing:

df = pd.merge(A, B, how='outer', indicator=True)
df = df[df['_merge']=='left_only'].drop('_merge',axis=1)

df.to_csv('DIFF_'+file)

print "Completed!"
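
The steps above can be combined into one small function. This is only a sketch written for Python 3 (the snippets in this thread use Python 2's raw_input and print statement); the function name write_left_only and its parameters are made up for illustration and are not part of pandas. It writes the result with sep=';' and index=False so the output has the same layout as the input and no extra index column:

```python
import pandas as pd


def write_left_only(old_path, new_path, out_path, cols, sep=";"):
    """Write rows found in old_path but not in new_path to out_path.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Read both files without a header row and assign the column names.
    old = pd.read_csv(old_path, sep=sep, header=None, names=cols, index_col=False)
    new = pd.read_csv(new_path, sep=sep, header=None, names=cols, index_col=False)

    # Outer merge on all common columns; _merge records where each row came from.
    merged = pd.merge(old, new, how="outer", indicator=True)
    left_only = merged[merged["_merge"] == "left_only"].drop("_merge", axis=1)

    # index=False keeps the row index out of the file.
    left_only.to_csv(out_path, sep=sep, index=False)
    return left_only
```

It could be called as write_left_only(file, file1, 'DIFF_' + file, cols), with cols being the full list of column names for your nmap export.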

Sample:

import pandas as pd
from io import StringIO  # pandas.compat.StringIO was removed in newer pandas versions

temp=u"""82.214.228.71;dsl-radius-02.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3;
82.214.228.70;dsl-radius-01.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3;
82.214.228.74;dsl-radius-02.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3;
82.214.228.75;dsl-radius-01.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3;"""
# after testing, replace StringIO(temp) with 'filename.csv'
cols = ['hostname','port','portname', 'a','b','c','d','e','f','g','h','i', 'j']
A = pd.read_csv(StringIO(temp), sep=";", names=cols)
print (A)
        hostname                         port portname    a    b        c  \
0  82.214.228.71  dsl-radius-02.direcpceu.com      PTR  tcp  111  rpcbind   
1  82.214.228.70  dsl-radius-01.direcpceu.com      PTR  tcp  111  rpcbind   
2  82.214.228.74  dsl-radius-02.direcpceu.com      PTR  tcp  111  rpcbind   
3  82.214.228.75  dsl-radius-01.direcpceu.com      PTR  tcp  111  rpcbind   

      d   e   f        g   h  i   j  
0  open NaN NaN  syn-ack NaN  3 NaN  
1  open NaN NaN  syn-ack NaN  3 NaN  
2  open NaN NaN  syn-ack NaN  3 NaN  
3  open NaN NaN  syn-ack NaN  3 NaN

temp=u"""82.214.228.75;dsl-radius-02.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3;
82.214.228.70;dsl-radius-01.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3;
82.214.228.77;dsl-radius-02.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3;
"""
# after testing, replace StringIO(temp) with 'filename.csv'
cols = ['hostname','port','portname', 'a','b','c','d','e','f','g','h','i', 'j']
B = pd.read_csv(StringIO(temp), sep=";", names=cols)
print (B)
        hostname                         port portname    a    b        c  \
0  82.214.228.75  dsl-radius-02.direcpceu.com      PTR  tcp  111  rpcbind   
1  82.214.228.70  dsl-radius-01.direcpceu.com      PTR  tcp  111  rpcbind   
2  82.214.228.77  dsl-radius-02.direcpceu.com      PTR  tcp  111  rpcbind   

      d   e   f        g   h  i   j  
0  open NaN NaN  syn-ack NaN  3 NaN  
1  open NaN NaN  syn-ack NaN  3 NaN  
2  open NaN NaN  syn-ack NaN  3 NaN

df1 = pd.merge(A, B, how='outer', indicator=True)

print (df1)

        hostname                         port portname    a    b        c  \
0  82.214.228.71  dsl-radius-02.direcpceu.com      PTR  tcp  111  rpcbind   
1  82.214.228.70  dsl-radius-01.direcpceu.com      PTR  tcp  111  rpcbind   
2  82.214.228.74  dsl-radius-02.direcpceu.com      PTR  tcp  111  rpcbind   
3  82.214.228.75  dsl-radius-01.direcpceu.com      PTR  tcp  111  rpcbind   
4  82.214.228.75  dsl-radius-02.direcpceu.com      PTR  tcp  111  rpcbind   
5  82.214.228.77  dsl-radius-02.direcpceu.com      PTR  tcp  111  rpcbind   

      d   e   f        g   h  i   j      _merge  
0  open NaN NaN  syn-ack NaN  3 NaN   left_only  
1  open NaN NaN  syn-ack NaN  3 NaN        both  
2  open NaN NaN  syn-ack NaN  3 NaN   left_only  
3  open NaN NaN  syn-ack NaN  3 NaN   left_only  
4  open NaN NaN  syn-ack NaN  3 NaN  right_only  
5  open NaN NaN  syn-ack NaN  3 NaN  right_only

#only values in A
df1 = df1[df1['_merge']=='left_only'].drop('_merge',axis=1)
print (df1)
        hostname                         port portname    a    b        c  \
0  82.214.228.71  dsl-radius-02.direcpceu.com      PTR  tcp  111  rpcbind   
2  82.214.228.74  dsl-radius-02.direcpceu.com      PTR  tcp  111  rpcbind   
3  82.214.228.75  dsl-radius-01.direcpceu.com      PTR  tcp  111  rpcbind   

      d   e   f        g   h  i   j  
0  open NaN NaN  syn-ack NaN  3 NaN  
2  open NaN NaN  syn-ack NaN  3 NaN  
3  open NaN NaN  syn-ack NaN  3 NaN

#only values in B
df1 = pd.merge(A, B, how='outer', indicator=True)
df11 = df1[df1['_merge']=='right_only'].drop('_merge',axis=1)
print (df11)
        hostname                         port portname    a    b        c  \
4  82.214.228.75  dsl-radius-02.direcpceu.com      PTR  tcp  111  rpcbind   
5  82.214.228.77  dsl-radius-02.direcpceu.com      PTR  tcp  111  rpcbind   

      d   e   f        g   h  i   j  
4  open NaN NaN  syn-ack NaN  3 NaN  
5  open NaN NaN  syn-ack NaN  3 NaN

#same values in both dataframes
df12 = df1[df1['_merge']=='both'].drop('_merge',axis=1)
print (df12)
        hostname                         port portname    a    b        c  \
1  82.214.228.70  dsl-radius-01.direcpceu.com      PTR  tcp  111  rpcbind   

      d   e   f        g   h  i   j  
1  open NaN NaN  syn-ack NaN  3 NaN

But if you need to compare only the first column, hostname, use isin to build a mask and ~ to invert it with boolean indexing:

df2 = A[~A['hostname'].isin(B['hostname'])]
print (df2)
        hostname                         port portname    a    b        c  \
0  82.214.228.71  dsl-radius-02.direcpceu.com      PTR  tcp  111  rpcbind   
2  82.214.228.74  dsl-radius-02.direcpceu.com      PTR  tcp  111  rpcbind   

      d   e   f        g   h  i   j  
0  open NaN NaN  syn-ack NaN  3 NaN  
2  open NaN NaN  syn-ack NaN  3 NaN

edited Aug 14, 2017 at 11:27

answered Aug 14, 2017 at 10:55

jezrael


11 Comments

Ivan Madolev Over a year ago

Hey Jez. Thanks! Will try as well and get back.

jezrael Over a year ago

Yes, sure. A small note: if the CSV already has a header row, remove the parameters header=None and names.
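
As a minimal sketch of that case (the header names and addresses below are invented sample data, not taken from the question's files), read_csv picks the column names up from the first row when header=None and names are left out:

```python
import pandas as pd
from io import StringIO

# Invented sample data whose first line is a header row.
data = u"""host;hostname;port
10.0.0.1;a.example.com;111
10.0.0.2;b.example.com;111"""

# No header=None and no names: the first row supplies the column names.
df = pd.read_csv(StringIO(data), sep=";")
print(df.columns.tolist())
```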

Ivan Madolev Over a year ago

Perfect, Jez! Worked like a charm! I just had to add sep=';' to the writing statement: df.to_csv('DIFF_'+file, sep=';') and I got what I wanted :). I am accepting this answer, and just one more thing if you don't mind. I am getting the following: host hostname hostname_type protocol port \ 24 82.214.228.70 dsl-radius-01.direcpceu.com PTR tcp 111 32 82.214.228.71 dsl-radius-02.direcpceu.com PTR tcp 111

Ivan Madolev Over a year ago

I was thinking the same :). All set! Thank you.

jezrael Over a year ago

Change df1['_merge']=='both' to df1['_merge']!='both' to select the rows that are only in the left or only in the right DataFrame.
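
A small sketch of that filter, using two tiny made-up frames in place of the real scans (the host and port values are invented for illustration):

```python
import pandas as pd

# Made-up data standing in for the old and new scans.
A = pd.DataFrame({"host": ["10.0.0.1", "10.0.0.2"], "port": [111, 111]})
B = pd.DataFrame({"host": ["10.0.0.2", "10.0.0.3"], "port": [111, 111]})

merged = pd.merge(A, B, how="outer", indicator=True)

# Keep every row that is not present in both frames; the _merge column
# still tells you whether a row came from A (left_only) or B (right_only).
changed = merged[merged["_merge"] != "both"]
print(changed)
```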


Amit · Answer · 2017-08-14 10:54:58Z

1

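You can add the labels where you are defining the dataframe. Note that list.append returns None, so the labels have to be joined with + instead, and each line has to be split on ';' so that the data has as many columns as there are labels. For example, the following should work:

df = pd.DataFrame([line.split(';') for line in final])
df.columns = ["host"] + list(range(1, df.shape[1]))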

answered Aug 14, 2017 at 10:54

Amit


1 Comment

Ivan Madolev Over a year ago

Thanks Amit! Will try and get back
