3

I have a dataset with temperature as one column. Due to how the heater works, there are a number of gaps in the data. In order to make different data sets directly comparable, I want to fill in these missing temperatures and add corresponding NaNs in another column.

I've tried to use the answer given here, which seems to be exactly what I want: link. But that doesn't work - I get a data frame with the new temperature values I want, but the corresponding data has gone:

import pandas as pd 
import numpy as np           
A1 = pd.read_table('Test data.tsv', encoding='ISO-8859-1', header = 2) 
A1.columns = ['time',2,3,4,5,6,7,'freq',9,10,11,12,13,'temp',15,16,17,18,19] 
A1truncated = A1[A1.temp >= 25]; A1truncated=A1truncated[A1truncated.temp <= 350.1]
A1averaged = A1truncated.groupby(['temp'], as_index=False)['freq'].mean() 
A1averaged = np.around(A1averaged, decimals=1)

A1averaged.set_index('temp') 
new_index = pd.Index(np.arange(25, 350, 0.1), name='temp')
A1indexed = A1averaged.set_index('temp').reindex(new_index).reset_index() 

Turns my 19 columns into 1 with temperature as the index (A1averaged), then into 2 columns with new temperature list and a column of empty data (A1indexed). Any ideas why this doesn't work? Or another method to do the same?

1 Answer 1

2

Index with float can reindex with problem, inconsistency is probably because of floating point accuracies. So I use little hack - Int64Index instead of Float64Index.

I try to set subset easier way:

A1truncated = A1[(A1.temp >= 25) & ( A1.temp <= 350.1)]

then omit first set index, because is set twice:

A1averaged.set_index('temp')

set new_index to Int64Index:

new_index = pd.Index(np.arange(250, 3500), name='temp')

and use Int64Index by multiply column temp by 10 and last this column is divided by 10.

A1averaged['temp'] = A1averaged['temp'] * 10
A1indexed['temp'] = A1indexed['temp'] / 10

All together:

import pandas as pd 
import numpy as np           
A1 = pd.read_table('Test data.tsv', encoding='ISO-8859-1', header = 2) 

A1.columns = ['time',2,3,4,5,6,7,'freq',9,10,11,12,13,'temp',15,16,17,18,19] 

A1truncated = A1[(A1.temp >= 25) & ( A1.temp <= 350.1)]

A1averaged = A1truncated.groupby(['temp'], as_index=False)['freq'].mean() 
A1averaged = np.around(A1averaged, decimals=1)
new_index = pd.Index(np.arange(250, 3500), name='temp')

A1averaged['temp'] = A1averaged['temp'] * 10
A1indexed = A1averaged.set_index('temp').reindex(new_index).reset_index()
A1indexed['temp'] = A1indexed['temp'] / 10
print A1indexed.tail()
#       temp       freq
#3245  349.5  5830065.6
#3246  349.6  5830043.5
#3247  349.7  5830046.3
#3248  349.8  5830025.3
#3249  349.9  5830015.6
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.