convert .dat into .csv in python

Question

I want to convert a data set from a .dat file into a CSV file. The data format looks like this:

Each row begins with the sentiment score followed by the text associated with that rating.

Image of the .dat file

I want the sentiment value (-1 or 1) in one column and the review text corresponding to that sentiment value in a second column.

WHAT I TRIED SO FAR

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np  
import csv

# read train.dat into a list of lists (split() with no arguments splits on every whitespace run)
datContent = [i.strip().split() for i in open("train.dat").readlines()]

# write it as a new CSV file
with open("train.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(datContent)
def your_func(row):
    return row['Sentiments'] / row['Review']

columns_to_keep = ['Sentiments', 'Review']
dataframe = pd.read_csv("train.csv", usecols=columns_to_keep)
dataframe['new_column'] = dataframe.apply(your_func, axis=1)

print dataframe

Sample screenshot of the resulting train.csv: it has a comma after every word in the review.

Output of the train.csv

So what did you learn about pandas' read_csv? It's a one-liner.
– sascha, Commented Oct 9, 2017 at 1:51
What is separating the score from the text? A space or a tab?
– Evan Nowak, Commented Oct 9, 2017 at 1:52
@sascha That keeps giving an error, probably because the file is not in .csv format. I did df = pd.read_csv("train.dat")
– KoushikProgrammer, Commented Oct 9, 2017 at 1:56
read_csv has parameters, and CSV is a very general format! But Evan is right: it might be easier if it's a tab. If it's a space, you can do it too, but it will be harder.
– sascha, Commented Oct 9, 2017 at 1:57
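
For the space-separated case mentioned in the comments, here is a minimal sketch in plain Python. It assumes each line is a score, then whitespace, then the review. The commas after every word in the question's output come from calling split() with no limit. Splitting only once keeps the review in a single field. The sample string and the column names are stand-ins, not the asker's real data.

```python
import csv
import io

# Hypothetical sample standing in for the contents of train.dat
sample = "-1  ieafxf  rjzy xfxk ymi wuy\n+1  lqqm  ceegjnbjpxnidygr\n"

rows = []
for line in io.StringIO(sample):
    line = line.strip()
    if not line:
        continue  # skip blank lines
    # Split on the first whitespace run only, so the review stays whole.
    # Assumes every non-blank line has both a score and some text.
    score, review = line.split(None, 1)
    rows.append([int(score), review])

# Write to an in-memory buffer; open("train.csv", "w", newline="") works the same way
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["Sentiments", "Review"])
writer.writerows(rows)
csv_text = out.getvalue()
print(csv_text)
```

csv.writer quotes a review automatically if it happens to contain a comma, so the output stays a valid two-column file.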

cs95 · Accepted Answer · 2019-06-21 18:00:17Z

4

If all your rows follow that consistent format, you can use pd.read_fwf. This is a little safer than using read_csv, in the event that your second column also contains the delimiter you are attempting to split on.

df = pd.read_fwf('data.txt', header=None, 
        widths=[2, int(1e5)], names=['label', 'text'])

print(df)
   label                       text
0     -1  ieafxf  rjzy xfxk ymi wuy
1      1     lqqm  ceegjnbjpxnidygr
2     -1  zss awoj anxb rfw  kgbvnl

data.txt

-1  ieafxf  rjzy xfxk ymi wuy
+1  lqqm  ceegjnbjpxnidygr
-1  zss awoj anxb rfw  kgbvnl
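
To finish the conversion the question asks for, the frame can then be written out with to_csv. The following is a self-contained sketch under the same assumptions as above: a two-character score at the start of each line, and a second width (int(1e5)) that is just an arbitrarily large number assumed to exceed the longest line, so the rest of the line lands in one column. The in-memory string stands in for data.txt.

```python
import io

import pandas as pd

# In-memory stand-in for data.txt
data = (
    "-1  ieafxf  rjzy xfxk ymi wuy\n"
    "+1  lqqm  ceegjnbjpxnidygr\n"
    "-1  zss awoj anxb rfw  kgbvnl\n"
)

# First field: 2 characters (the score). Second field: a width assumed to be
# larger than any line, so it captures all remaining text.
df = pd.read_fwf(
    io.StringIO(data),
    header=None,
    widths=[2, int(1e5)],
    names=["label", "text"],
)

# With no path, to_csv returns the CSV as a string;
# df.to_csv("train.csv", index=False) would write it to disk instead.
csv_text = df.to_csv(index=False)
print(csv_text)
```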

edited Jun 21, 2019 at 18:00

answered Oct 9, 2017 at 2:05


3 Comments

KoushikProgrammer Over a year ago

@COLDSPEED hey the problem lies in the fact I have no headers as in label and text, do I just make them up?

cs95 Over a year ago

@KoushikProgrammer I know you don't have them, I made them up for you. You don't have to modify your data file.

KoushikProgrammer Over a year ago

@COLDSPEED thanks. Hey what is the purpose of int(1e15)?

Evan Nowak · Answer · 2017-10-09 02:24:22Z

0

As mentioned in the comments, read_csv would be appropriate here, assuming the score and the text are separated by a tab.

df = pd.read_csv('train_csv.csv', sep='\t', names=['Sentiments', 'Review'])

  Sentiments     Review
0         -1    alskjdf
1          1      asdfa
2          1       afsd
3         -1        sdf
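
A self-contained version of the same idea, as a sketch only: it assumes the file really is tab-separated, and the in-memory string below is a stand-in for the real file, built from the sample rows above.

```python
import io

import pandas as pd

# Hypothetical tab-separated stand-in for the input file
tab_data = "-1\talskjdf\n1\tasdfa\n1\tafsd\n-1\tsdf\n"

# names= supplies the column headers, since the file itself has none
df = pd.read_csv(
    io.StringIO(tab_data),
    sep="\t",
    names=["Sentiments", "Review"],
)

# Returns the comma-separated text; pass a path such as "train.csv" to write a file
csv_text = df.to_csv(index=False)
print(csv_text)
```

If the reviews themselves contain tabs, this approach would split them into extra fields, which is the situation the read_fwf answer guards against.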

answered Oct 9, 2017 at 2:24
