Data formatting and manipulation in Python

Question

I want to convert data from a text file into a specific format. My data file contains more than 120,000 rows, but I have posted truncated data here. The file holds R, L, G and C data at different frequencies (here, 3 frequencies in 3 rows). It has only 2 columns: the 1st column is "Freq" and the 2nd column is one of the R, L, G or C values. I want to rearrange the data into another format (let's say, the target .txt). Here is the link to the data. I want to convert it to a target file like this.

Here is my code:

import pandas as pd

#create DataFrame from csv with columns f and v 
df = pd.read_csv('data_in.txt', sep="\s+", names=['freq','v'])
#df = df.astype(float).convert_objects()

#boolean mask for identify columns of new df   
m = df['v'].str.endswith('R', 'L', 'G', 'C')
#new column by replace NaNs by forward filling
df['g'] = df['v'].where(m).ffill()
#get original ordering for new columns
cols = df['g'].unique()
#remove rows with same values in v and g columns
df = df[df['v'] != df['g']]
#reshape by pivoting with change ordering of columns by reindex
df = df.pivot('freq', 'g', 'v').rename_axis(None, axis=1).reindex(columns=cols).reset_index()

df.columns = [x.replace('R','R1:1').replace('L','L1:1').replace('G','G1:1').replace('C','C1:1') for x in df.columns]
df.to_csv('target.txt', index=False, sep='\t')

But it gives the following error:

TypeError: wrapper3() takes from 2 to 3 positional arguments but 5 were given

Can anyone help me format it into the target file?

I also need a second format, different from the target file, like "target_2.txt". This is a more unusual layout that I need as well. Each of the R1:1, L1:1, G1:1 and C1:1 entries now looks like a block of an array (though it is not an array). Looking closely, the header starts with FORMAT Freq, then a tab, then :, then another tab, then R1:1; that is, FORMAT Freq+tab+:+tab+R1:1. Then comes a new line, 2 tabs, and L1:1; then another new line, 2 tabs, and G1:1; and finally the same for C1:1. After that comes a blank line, followed by the 1st row of data, the 2nd row of data, and so on. The data values are laid out according to the header line.

How can I produce that 2nd target file?

I am using Spyder 3.2.6 with the bundled Python 3.6.4 (64-bit).

Ben.T · Accepted Answer · 2018-08-14 20:25:56Z

2

You can't use str.endswith this way: it takes a single pattern, so passing four strings raises the TypeError. For what you seem to be looking for, str.contains is a better fit, with a pattern that matches R or L or G or C:

m = df['v'].str.contains('R|L|G|C')
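To make the effect of that mask concrete, here is a minimal sketch on a small hand-made Series. The sample values are an assumption about what the 'v' column holds (label rows mixed with numbers stored as text); the real file is not shown in the question.

```python
import pandas as pd

# Hypothetical sample of the 'v' column: label rows mixed with numeric text
v = pd.Series(['R', '2.66E+00', 'L', '3.00E-07',
               'G', '2.76E-16', 'C', '1.58E-10'])

# 'R|L|G|C' is a regex alternation: True where the text contains any of the letters
m = v.str.contains('R|L|G|C')

# Keep the labels, blank out the numbers, then carry each label downward
g = v.where(m).ffill()

print(m.tolist())
print(g.tolist())
```

Note that the exponent marker E is not among the matched letters, so the numeric strings stay False in the mask.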

Then keep your code up to the pivot. I got an error at the pivot line caused by rows with NaN, so you may need a dropna; you can also rename the columns in the same step:

# keyword arguments for pivot work on both older and newer pandas versions
df = (df.dropna().pivot(index='freq', columns='g', values='v')
        .rename_axis(None, axis=1)
        .reindex(columns=cols).reset_index()
        .rename(columns={col:'{}1:1'.format(col) for col in cols}))

and df looks like:

       freq      R1:1      L1:1      G1:1      C1:1
0  0.00E+00  2.66E+00  3.00E-07  2.76E-16  1.58E-10
1  1.00E+06  2.89E+00  3.10E-07  1.72E-05  1.46E-10
2  2.00E+06  2.98E+00  3.13E-07  3.43E-05  1.45E-10
3  3.00E+06  3.07E+00  3.15E-07  5.15E-05  1.44E-10
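For reference, the whole approach can be sketched end to end on an in-memory sample. The layout of the sample text is an assumption (a "freq <letter>" header row before each group of data rows), since data_in.txt itself is not reproduced here; only two quantities and two frequencies are included to keep it short.

```python
import io
import pandas as pd

# Assumed layout of data_in.txt: a "freq <letter>" header row, then data rows
sample = """freq R
0.00E+00 2.66E+00
1.00E+06 2.89E+00
freq L
0.00E+00 3.00E-07
1.00E+06 3.10E-07
"""

# Read everything as text so label rows and numeric rows can share a column
df = pd.read_csv(io.StringIO(sample), sep=r'\s+',
                 names=['freq', 'v'], dtype=str)

m = df['v'].str.contains('R|L|G|C')   # rows that carry a label
df['g'] = df['v'].where(m).ffill()    # label carried down to its data rows
cols = df['g'].unique()               # original label order
df = df[df['v'] != df['g']]           # drop the label rows themselves

wide = (df.pivot(index='freq', columns='g', values='v')
          .rename_axis(None, axis=1)
          .reindex(columns=cols)
          .reset_index()
          .rename(columns={c: '{}1:1'.format(c) for c in cols}))

# Tab-separated text, as it would be written to target.txt
print(wide.to_csv(sep='\t', index=False))
```

Passing a file name to to_csv instead of printing would write the same text to disk.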

answered Aug 14, 2018 at 20:25

Ben.T


2 Comments

aguntuk Over a year ago

thanks a lot. It worked. I have updated the question for another unusual formatting which is most important. Can you check on that? I have uploaded the 2nd target file as well.

aguntuk Over a year ago

It worked, but when my data set is large and the step size between data points decreases, the data is no longer monotonic when saved in the new dataframe. I have created another question. Can you look at it? Here is the link

ALollz · Accepted Answer · 2018-08-14 20:25:33Z

2

You can do this with pivot after some initial clean-up.

import pandas as pd
df = pd.read_table('data_in.txt', sep=r'\s+', names=['freq','v'])

# Determine where `'freq'` occurs
mask = df.freq == 'freq'

# Create the column headers you want for each measurement
df.loc[mask, 'col_names'] = df.loc[mask, 'v']
df['col_names'] = df.col_names.ffill() + '1:1'

# Pivot to desired output
df = df.loc[~mask].pivot(index = 'freq', 
                         columns ='col_names', 
                         values = 'v').reset_index()
df.columns.name=None
df = df.astype('float')

Output:

        freq          C1:1          G1:1          L1:1      R1:1
0        0.0  1.580132e-10  2.763283e-16  2.997629e-07  2.661409
1  1000000.0  1.459912e-10  1.716549e-05  3.096696e-07  2.892461
2  2000000.0  1.447848e-10  3.434434e-05  3.130131e-07  2.981991
3  3000000.0  1.440792e-10  5.152409e-05  3.151563e-07  3.066247
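One thing to watch with a pivot on text values: pivot sorts its index, and while freq is still a string that sort is lexicographic, so '1.00E+07' lands before '2.00E+06'. This may be related to the non-monotonic output mentioned in the comments on this page, though that is only a guess. A minimal sketch with hypothetical rows (the 1.00E+07 row and its value are invented for illustration), converting to float before pivoting:

```python
import pandas as pd

# Hypothetical long-format rows, with freq still stored as text
long_df = pd.DataFrame({
    'freq': ['0.00E+00', '2.00E+06', '1.00E+07', '1.00E+06'],
    'col_names': ['R1:1'] * 4,
    'v': ['2.66E+00', '2.98E+00', '3.50E+00', '2.89E+00'],
})

# Pivoting on text: the index is sorted as strings
text_order = long_df.pivot(index='freq', columns='col_names',
                           values='v').index.tolist()

# Converting first: the index is sorted as numbers
numeric = long_df.astype({'freq': float, 'v': float})
num_order = numeric.pivot(index='freq', columns='col_names',
                          values='v').index.tolist()

print(text_order)
print(num_order)
```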

answered Aug 14, 2018 at 20:25

ALollz


2 Comments

Ben.T Over a year ago

Good idea to rename what is going to be the column names during the ffill :) +1

aguntuk Over a year ago

thanks a lot. It also worked. I have updated the question for another unusual formatting which is most important. Can you check on that? I have uploaded the 2nd target file as well.

user1763510 · Accepted Answer · 2018-08-14 20:36:30Z

I would do it with regular string manipulations like this:

#open file and read its contents (the with block closes it automatically)
filename='data_in.txt'
with open(filename,'r') as file:
    fileData=file.read()

#remove carriage returns
fileData=fileData.replace("\r","")


chunkNumber=0
data=[]

for chunk in fileData.split("\n\n\n"):
    chunkNumber+=1
    chunkType=chunk.split("\n")[0].split("\t")[1]
    firstData=["freq"]
    thisData=["%s:%s"%(chunkType,chunkNumber)]
    for line in chunk.split("\n")[1:]:
        entries=line.split("    ")
        thisData.append(entries[1])
        firstData.append(entries[0])
    data.append(thisData)
data=[firstData]+data

string=""
for j in range(5):
    for k in data:
        string+=k[j]+"\t"
    string=string[:-1]+"\n"

#write the assembled text in one call
filename='output.txt'
with open(filename,'w') as file:
    file.write(string)
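In the same string-manipulation spirit, the second layout described in the question (target_2.txt) could be produced along these lines. This is only a sketch: the header layout follows the description in the question, but the layout of the data records is an assumption (each record is taken to mirror the header block), and format_block is a hypothetical helper name.

```python
# Hypothetical wide table: header names, then rows of text values
header = ['freq', 'R1:1', 'L1:1', 'G1:1', 'C1:1']
rows = [
    ['0.00E+00', '2.66E+00', '3.00E-07', '2.76E-16', '1.58E-10'],
    ['1.00E+06', '2.89E+00', '3.10E-07', '1.72E-05', '1.46E-10'],
]

def format_block(first, values):
    """Lay out one record as: first, tab, ':', tab, first value,
    then each remaining value on its own line behind two tabs."""
    lines = ['{}\t:\t{}'.format(first, values[0])]
    lines += ['\t\t{}'.format(v) for v in values[1:]]
    return '\n'.join(lines)

# Header block as described in the question
blocks = [format_block('FORMAT Freq', header[1:])]
# Assumption: every data record mirrors the header layout
blocks += [format_block(r[0], r[1:]) for r in rows]

# Blank line after the header block, then one record after another
text = blocks[0] + '\n\n' + '\n'.join(blocks[1:]) + '\n'
print(text)
```

If the real target_2.txt lays out its data rows differently, only the second call to format_block needs to change.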
