I want to convert data from a text file into a specific format. My data file contains more than 120,000 rows, but I have posted truncated data here. The file holds R, L, G, and C data for different frequencies (here, 3 frequencies in 3 rows). It has only two columns: the first is "Freq" and the second is one of the R, L, G, or C values. I want to reshape the data into another format (call it the target .txt). Here is the link to the data. I want to convert it to a target file like this.

Here is my code:

import pandas as pd

#create DataFrame from csv with columns f and v 
df = pd.read_csv('data_in.txt', sep="\s+", names=['freq','v'])
#df = df.astype(float).convert_objects()

#boolean mask for identify columns of new df   
m = df['v'].str.endswith('R', 'L', 'G', 'C')
#new column by replace NaNs by forward filling
df['g'] = df['v'].where(m).ffill()
#get original ordering for new columns
cols = df['g'].unique()
#remove rows with same values in v and g columns
df = df[df['v'] != df['g']]
#reshape by pivoting with change ordering of columns by reindex
df = df.pivot('freq', 'g', 'v').rename_axis(None, axis=1).reindex(columns=cols).reset_index()

df.columns = [x.replace('R','R1:1').replace('L','L1:1').replace('G','G1:1').replace('C','C1:1') for x in df.columns]
df.to_csv('target.txt', index=False, sep='\t')

But it gives the following error:

TypeError: wrapper3() takes from 2 to 3 positional arguments but 5 were given

Can anyone help me format it into the target file?

I now also need a second format, "target_2.txt". This unusual layout is required as well. Each of R1:1, L1:1, G1:1, and C1:1 now looks like a block (though it is not an actual array). The header starts with FORMAT Freq, then a tab, then :, then another tab, then R1:1, that is, FORMAT Freq+tab+:+tab+R1:1. Then comes a new line, two tabs, and L1:1; another new line, two tabs, and G1:1; and finally the same for C1:1. After that comes a blank line, followed by the first row of data, the second row of data, and so on. The data values are laid out to match the header line.

How can I produce that second target file?

I am using Spyder 3.2.6 with Python 3.6.4 (64-bit).

3 Answers 3

2

You can't use str.endswith this way: it takes a single pattern, but your code passes four, which is what the TypeError is complaining about. For what you seem to want, str.contains is a better fit, with a regular expression that matches R or L or G or C:

m = df['v'].str.contains('R|L|G|C')
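To see what that mask selects, here is a minimal sketch on a made-up Series that stands in for df['v'] (the values are illustrative, not read from your file). The exponent letter E is not in the pattern, so numeric strings such as 2.66E+00 are not matched. If the header cells are exactly R, L, G, or C, isin should give the same mask without a regular expression.

```python
import pandas as pd

# Made-up stand-in for df['v']: header cells carry the parameter name,
# the other cells carry values in scientific notation.
v = pd.Series(["R", "2.66E+00", "L", "3.00E-07", "G", "2.76E-16", "C", "1.58E-10"])

# Regex alternation: True where the cell contains R, L, G or C.
m = v.str.contains("R|L|G|C")

# Exact-match alternative (assumes header cells are exactly one of these names).
m_exact = v.isin(["R", "L", "G", "C"])

headers = v[m].tolist()
print(headers)
```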

After that, your code works up to the pivot. The pivot line raised an error for me because of rows containing NaN, so you may need a dropna first, and you can rename the columns in the same chain:

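# keyword arguments: recent pandas versions no longer accept positional ones in pivot
df = (df.dropna().pivot(index='freq', columns='g', values='v').rename_axis(None, axis=1)
        .reindex(columns=cols).reset_index()
        .rename(columns={col:'{}1:1'.format(col) for col in cols}))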

and df looks like:

       freq      R1:1      L1:1      G1:1      C1:1
0  0.00E+00  2.66E+00  3.00E-07  2.76E-16  1.58E-10
1  1.00E+06  2.89E+00  3.10E-07  1.72E-05  1.46E-10
2  2.00E+06  2.98E+00  3.13E-07  3.43E-05  1.45E-10
3  3.00E+06  3.07E+00  3.15E-07  5.15E-05  1.44E-10
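For the second target file described in the question, a minimal sketch could start from a dataframe shaped like the one above. Two things here are assumptions, not taken from the real target_2.txt: the helper names block and to_block_format are hypothetical, and each data row is assumed to follow the same block layout as the header (frequency, tab, colon, tab, first value, then the remaining values on their own lines after two tabs). Adjust the layout if your file differs.

```python
import pandas as pd

def block(lead, cells):
    # First cell shares a line with the lead text; the others follow on
    # their own lines, each indented by two tabs.
    lines = ["{}\t:\t{}".format(lead, cells[0])]
    lines += ["\t\t{}".format(c) for c in cells[1:]]
    return "\n".join(lines)

def to_block_format(df, freq_col="freq"):
    # Hypothetical helper: header block, blank line, then one block per row.
    # ASSUMPTION: data rows mirror the header layout.
    value_cols = [c for c in df.columns if c != freq_col]
    parts = [block("FORMAT Freq", value_cols), ""]
    for _, row in df.iterrows():
        parts.append(block(row[freq_col], [row[c] for c in value_cols]))
    return "\n".join(parts) + "\n"

# First two rows of the output shown above, kept as strings.
df = pd.DataFrame({
    "freq": ["0.00E+00", "1.00E+06"],
    "R1:1": ["2.66E+00", "2.89E+00"],
    "L1:1": ["3.00E-07", "3.10E-07"],
    "G1:1": ["2.76E-16", "1.72E-05"],
    "C1:1": ["1.58E-10", "1.46E-10"],
})
text = to_block_format(df)
print(text)
```

The resulting string can then be written out with an ordinary open('target_2.txt', 'w') and write(text).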

2 Comments

Thanks a lot, it worked. I have updated the question with another unusual format, which is the most important one. Can you check that? I have uploaded the second target file as well.
It worked, but when my data set is large and the step size between data points decreases, the data is no longer monotonic when saved in the new dataframe. I have created another question about this. Can you look at it? Here is the link
2

You can do this with pivot after some initial clean-up.

import pandas as pd
df = pd.read_table('data_in.txt', sep=r'\s+', names=['freq','v'])  # raw string avoids an invalid-escape warning

# Determine where `'freq'` occurs
mask = df.freq == 'freq'

# Create the column headers you want for each measurement
df.loc[mask, 'col_names'] = df.loc[mask, 'v']
df['col_names'] = df.col_names.ffill() + '1:1'

# Pivot to desired output
df = df.loc[~mask].pivot(index = 'freq', 
                         columns ='col_names', 
                         values = 'v').reset_index()
df.columns.name=None
df = df.astype('float')

Output:

        freq          C1:1          G1:1          L1:1      R1:1
0        0.0  1.580132e-10  2.763283e-16  2.997629e-07  2.661409
1  1000000.0  1.459912e-10  1.716549e-05  3.096696e-07  2.892461
2  2000000.0  1.447848e-10  3.434434e-05  3.130131e-07  2.981991
3  3000000.0  1.440792e-10  5.152409e-05  3.151563e-07  3.066247
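Two details of this output may matter for larger files. The columns come out in alphabetical order (C, G, L, R) rather than R, L, G, C, and the pivot runs on strings, so its sorted index is ordered lexicographically ('1.00E+07' sorts before '2.00E+06'). That is one possible cause of frequencies that are not monotonic. A sketch of a variant that converts to numbers before pivoting and restores the original column order is below. The sample text is made up, and the layout (a "freq <name>" row followed by "<frequency> <value>" rows) is an assumption about the input file.

```python
import io
import pandas as pd

# Made-up sample in the assumed layout; only R and L blocks, for brevity.
sample = """freq R
0.00E+00 2.66E+00
2.00E+06 2.98E+00
1.00E+07 3.50E+00
freq L
0.00E+00 3.00E-07
2.00E+06 3.13E-07
1.00E+07 3.20E-07
"""

df = pd.read_csv(io.StringIO(sample), sep=r"\s+", names=["freq", "v"])

# Header rows mark where each measurement block starts.
mask = df["freq"] == "freq"
df["col_names"] = df["v"].where(mask).ffill() + "1:1"
order = df.loc[mask, "col_names"].unique().tolist()  # order of appearance

# Convert to numbers BEFORE pivoting so the index sorts numerically.
data = df.loc[~mask].copy()
data["freq"] = pd.to_numeric(data["freq"])
data["v"] = pd.to_numeric(data["v"])

wide = (data.pivot(index="freq", columns="col_names", values="v")
            .reindex(columns=order)
            .reset_index())
wide.columns.name = None
print(wide)
```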

2 Comments

Good idea to rename what is going to be the column names during the ffill :) +1
Thanks a lot, it also worked. I have updated the question with another unusual format, which is the most important one. Can you check that? I have uploaded the second target file as well.
1

I would do it with regular string manipulations like this:

# open the file and read its full contents
filename='data_in.txt'
with open(filename,'r') as file:
    fileData=file.read()

#remove carriage returns
fileData=fileData.replace("\r","")


chunkNumber=0
data=[]

for chunk in fileData.split("\n\n\n"):
    chunkNumber+=1
    chunkType=chunk.split("\n")[0].split("\t")[1]
    firstData=["freq"]
    thisData=["%s:%s"%(chunkType,chunkNumber)]
    for line in chunk.split("\n")[1:]:
        entries=line.split("    ")
        thisData.append(entries[1])
        firstData.append(entries[0])
    data.append(thisData)
data=[firstData]+data

string=""
for j in range(len(firstData)):  # one output line per header/data row, not a fixed 5
    for k in data:
        string+=k[j]+"\t"
    string=string[:-1]+"\n"

filename='output.txt'
with open(filename,'w') as file:
    file.write(string)
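The same string-only idea can be sketched without depending on the exact separators or on a fixed number of rows. The function name reshape is hypothetical, and the sketch rests on these assumptions: each block starts with a "freq <name>" line, every other non-blank line holds a frequency and a value separated by whitespace, and all blocks list the same frequencies in the same order. It names the columns R1:1, L1:1, and so on, as in the question, rather than numbering the chunks.

```python
def reshape(text):
    # Hypothetical helper: turn stacked "freq <name>" blocks into one
    # tab-separated table with a column per block.
    freqs, columns = [], []
    for line in text.splitlines():
        parts = line.split()          # any run of spaces or tabs
        if len(parts) != 2:
            continue                  # skip blank or malformed lines
        first, second = parts
        if first.lower() == "freq":
            columns.append([second + "1:1"])   # start a new column
        else:
            columns[-1].append(second)
            if len(columns) == 1:     # take frequencies from the first block only
                freqs.append(first)
    # zip transposes the per-block columns into output rows.
    rows = zip(["freq"] + freqs, *columns)
    return "\n".join("\t".join(r) for r in rows) + "\n"

# Made-up two-block sample using values from the thread.
sample = (
    "freq\tR\n"
    "0.00E+00    2.66E+00\n"
    "1.00E+06    2.89E+00\n"
    "\n\n"
    "freq\tL\n"
    "0.00E+00    3.00E-07\n"
    "1.00E+06    3.10E-07\n"
)
result = reshape(sample)
print(result)
```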
