Pandas dataframe - creating dataframe with one record

Question

I would like to create a DataFrame without reading it from a CSV file.

For example, I would like to define the columns and add a single record, something like this:

   Feature1  Feature2  Feature3  ...  FeatureN
1        20     False       3.2  ...      True

I built a classifier and would like to make a prediction with classifier.predict(dataframe).

I receive the record as a string with "," between the features, and I use split to extract the list of features:

record_features = "16,713,Danny, ..."
features = record_features.split(',')

After that, I convert the list into a Series:

series = pd.Series(features)

Then I try to create a DataFrame:

column_names = ['feature1', 'feature2', ..., 'feature102']
df = pd.DataFrame(series, columns=column_names)

I get an error:

ValueError: Shape of passed values is (1, 102), indices imply (102, 102)

I really do have 102 features, and I would like to create a DataFrame with those columns and a single record.

Any suggestions?

jezrael · Accepted Answer · 2016-10-20 09:19:29Z

Wrap the list in an extra pair of brackets ([features]) so that it is treated as a single row:

import pandas as pd

column_names = ['Feature1','Feature2','Feature102']
record_features = "16,713,Danny"
features = record_features.split(',')

# The outer list makes `features` one row instead of one column
df = pd.DataFrame([features], columns=column_names)
print(df)
  Feature1 Feature2 Feature102
0       16      713      Danny
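
One thing to watch with this approach: str.split returns strings, so every column holds text even when the values look numeric, and a classifier will usually expect numbers. A minimal sketch of the conversion, assuming (purely for illustration) that Feature1 and Feature2 are the integer columns:

```python
import pandas as pd

column_names = ["Feature1", "Feature2", "Feature102"]
features = "16,713,Danny".split(",")

# One row: the outer list wraps the flat list of values
df = pd.DataFrame([features], columns=column_names)

# Assumption: only the first two columns are integers; adjust the
# mapping to match the real feature types used to train the classifier
df = df.astype({"Feature1": int, "Feature2": int})

print(df.dtypes)
```

The dtype mapping is hypothetical. In a real pipeline it should mirror the dtypes of the training data.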

Another solution uses NumPy's reshape to turn the flat list into a 2D array with one row:

import numpy as np

df = pd.DataFrame(
    np.array(features).reshape(len(features) // len(column_names), len(column_names)),
    columns=column_names,
)
print(df)
  Feature1 Feature2 Feature102
0       16      713      Danny
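
For the single-record case, the row count can be written directly and NumPy can infer the column count. This is a small variation on the reshape call above, not a different technique:

```python
import numpy as np
import pandas as pd

column_names = ["Feature1", "Feature2", "Feature102"]
features = "16,713,Danny".split(",")

# reshape(1, -1): exactly one row, with the number of columns inferred
# from the length of the array
row = np.array(features).reshape(1, -1)

df = pd.DataFrame(row, columns=column_names)
print(df.shape)
```

If the number of values does not match len(column_names), the DataFrame constructor raises a ValueError, which is a useful sanity check on the incoming record.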

Timings:

column_names = ['Feature' + str(x) for x in range(102)]
record_features = "16,713,Danny"
features = record_features.split(',')
features = features * 34

In [222]: %timeit pd.DataFrame([features], columns=column_names)
100 loops, best of 3: 5.94 ms per loop

In [223]: %timeit pd.DataFrame(dict(zip(column_names, features)), index=[0], columns=column_names)
The slowest run took 4.48 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 5.25 ms per loop

In [224]: %timeit pd.DataFrame(np.array(features).reshape(len(features) // len(column_names), len(column_names)), columns=column_names)
The slowest run took 5.60 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 206 µs per loop
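
The numbers above come from an IPython session in 2016 and will vary with the pandas version and the machine. To repeat the comparison outside IPython, here is a rough sketch using the standard-library timeit module (the labels and loop counts are arbitrary choices):

```python
import timeit

import numpy as np
import pandas as pd

column_names = ["Feature" + str(x) for x in range(102)]
features = "16,713,Danny".split(",") * 34  # 3 values * 34 = 102 features

# The three constructions compared in the timings above
candidates = {
    "list of lists": lambda: pd.DataFrame([features], columns=column_names),
    "dict + index": lambda: pd.DataFrame(
        dict(zip(column_names, features)), index=[0], columns=column_names
    ),
    "numpy reshape": lambda: pd.DataFrame(
        np.array(features).reshape(1, -1), columns=column_names
    ),
}

results = {}
for name, fn in candidates.items():
    # Best of 3 repeats, 100 calls each, reported as seconds per call
    results[name] = min(timeit.repeat(fn, number=100, repeat=3)) / 100
    print(f"{name}: {results[name] * 1e6:.1f} µs per call")
```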

kezzos · Accepted Answer · 2016-10-20 09:16:54Z

You can pass in a dictionary to the DataFrame constructor:

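import pandas as pd

column_names = ['Feature1','Feature2','Feature102']
record_features = "16", 713, "Danny"  # a tuple of already-separated values

# index=[0] is required because all the dictionary values are scalars
print(pd.DataFrame(dict(zip(column_names, record_features)), index=[0], columns=column_names))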

  Feature1  Feature2 Feature102
0       16       713      Danny
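
The index=[0] argument is needed because a dictionary of plain scalars gives pandas no way to know how many rows to build. Without it the constructor raises a ValueError. A closely related variant, sketched here as an alternative rather than as part of the answer above, wraps the dictionary in a list so that it is read as one record:

```python
import pandas as pd

column_names = ["Feature1", "Feature2", "Feature102"]
record_features = ("16", 713, "Danny")

# Map each column name to its value for this single record
record = dict(zip(column_names, record_features))

# A list of dictionaries is treated as a list of rows, so no explicit
# index is required; columns= fixes the column order
df = pd.DataFrame([record], columns=column_names)
print(df)
```

Unlike the split-based approaches, this keeps the original Python types of the values (713 stays an integer).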

edited Oct 20, 2016 at 9:16

answered Oct 20, 2016 at 9:07
