I'm working with Python pandas 0.19.

I want to create a new dataframe (df2) as a subset of an existing dataframe (df1). df1 looks like this:

In [1]: df1.head()
Out [1]:
    col1_name    col2_name    col3_name
0          23           42           55
1          27           55           57
2          52           20           52
3          99           18           53   
4          65           32           51

The logic is:

df2 = []

for i in range(0,N):
    loc = some complicated logic
    df1_sub = df1.ix[loc,]
    df2.append(df1_sub)

df2 = pd.DataFrame.from_records(df2)

The result df2 is indeed a dataframe, but every row consists of the column names of df1. It looks like this:

In [2]: df2.head()
Out [2]:
    col1_name    col2_name    col3_name
0   col1_name    col2_name    col3_name
1   col1_name    col2_name    col3_name
2   col1_name    col2_name    col3_name
3   col1_name    col2_name    col3_name
4   col1_name    col2_name    col3_name

I know it's probably related to the conversion from list to dataframe, but I'm not sure what exactly I'm missing here. Or is there a better way of doing this?

  • Please include df1.head() and the final result that you want. That makes the problem easier to understand. Commented Jan 6, 2017 at 14:40
  • I'm not sure exactly what you are asking but there are many things that need to be addressed. Do not use .ix unless absolutely necessary. You shouldn't have to create a list of dataframes to do this but if you do, the last line should be changed to pd.concat(df2). Please provide more info as it might be possible to not use a for loop to construct the logic. Also the name df2 implies you have a DataFrame. Use something like df_list instead. Commented Jan 6, 2017 at 14:43
  • In the for loop, check the value of loc; it may tell you if something is wrong. Commented Jan 6, 2017 at 14:49
  • @Ted Petrou pd.concat(df2) is the way to go. The logic is indeed complicated. I'll even have to do a while loop within the for loop: take a slice from df1 called df1_sub, take out one row of df1_sub if a condition is met, and check the remaining df1_sub until the condition is no longer met. Commented Jan 6, 2017 at 14:49

3 Answers

How about just slicing the dataframe?

import pandas as pd
DF1 = pd.DataFrame()
DF1['x'] = ['a','b','c','a','c','b']
DF1['y'] = [1,3,2,-1,-2,-3]

DF2 = DF1[[(x == 'a' and y > 0) for x,y in zip(DF1['x'], DF1['y'])]]

This should be far more efficient than appending. DF1[complicated_condition] accepts any Boolean mask (a list or Series of True/False values with one entry per row).
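The same selection can also be written without the Python-level list comprehension. The following is a minimal sketch, assuming the condition can be expressed as element-wise comparisons on the columns; the variable name mask is introduced here for illustration only.

```python
import pandas as pd

DF1 = pd.DataFrame({'x': ['a', 'b', 'c', 'a', 'c', 'b'],
                    'y': [1, 3, 2, -1, -2, -3]})

# Each comparison returns a Boolean Series; combine them with & rather than
# `and`, and wrap each comparison in parentheses.
mask = (DF1['x'] == 'a') & (DF1['y'] > 0)

# Keep only the rows where the mask is True.
DF2 = DF1[mask]
```

If the condition really cannot be vectorized, the list-comprehension form above still works, since both produce one True/False value per row.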


You can take advantage of pandas' Boolean indexing (the same Boolean-mask idea that numpy arrays support).

import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': ['a', 'b', 'c', 'd', 'e'],
                    'c': [10, 11, 12, 13, 14]})

#      a  b   c
#   0  1  a  10
#   1  2  b  11
#   2  3  c  12
#   3  4  d  13
#   4  5  e  14

Let's assume that df2 should be a subset of df1: it should have columns b and c and only the rows where column a has an even value:

df2 = df1.loc[df1['a'] % 2 == 0, ['b', 'c']]
#    b   c
# 1  b  11
# 3  d  13


As per Ted Petrou, the solution is simply:

pd.concat(df2)

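I was confused by the data type of df2: it is a list of dataframes, so it has to be concatenated rather than converted from records.

Given the logic within the for loop, it is not possible to select the rows from df1 directly with a single index expression.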
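For reference, here is a minimal sketch of the loop with pd.concat, using the five rows shown in the question. The value assigned to loc is a hypothetical stand-in for the complicated logic, the list is named df_list as suggested in the comments, and .loc replaces .ix (an assumption that label-based selection is what is wanted; .ix was deprecated in pandas 0.20 and removed in 1.0).

```python
import pandas as pd

df1 = pd.DataFrame({'col1_name': [23, 27, 52, 99, 65],
                    'col2_name': [42, 55, 20, 18, 32],
                    'col3_name': [55, 57, 52, 53, 51]})

df_list = []  # a plain Python list that collects DataFrame slices

for i in range(2):
    # Hypothetical placeholder for the real selection logic:
    # pick two row labels per iteration.
    loc = [2 * i, 2 * i + 1]
    df1_sub = df1.loc[loc]   # a DataFrame holding the selected rows
    df_list.append(df1_sub)

# Stack the collected slices row-wise into a single DataFrame.
df2 = pd.concat(df_list)

# Iterating over a DataFrame yields its column labels, not its rows.
labels = list(df_list[0])
```

The last line hints at why the original attempt likely produced rows of column names: a constructor that iterates over each list element sees only the column labels of each slice.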
