
I have a large dataset of 7,000 rows with 40 features. I want to create two new dataframes with rows from the original. I want to select which rows go into which dataframe using the values from a 1D NumPy array: compare the values in the array against the index of the original dataframe and, if they match, take the entire row of the original dataframe and add it to the new dataframe.

#reading in my cleaned customer data and creating the original dataframe.
customer_data = pd.read_excel('Clean Customer Data.xlsx', index_col = 0)
#this is the 1D array; each element corresponds to an index number of customer_data
group_list = np.array([2045,323,41,...,n])
# slicing group_list into the row indexes for each group
group_1 = np.array(group_list[:1972])
group_2 = np.array(group_list[1972:])
for X in range(len(group_list):
    i = 0
    #this is where I get stuck
    if group_1[i] == **the index of the original dataframe**
        group1_df = pd.append(customer_data)
    else:
        group2_df = pd.append(customer_data)
    i = i+1
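For comparison, here is a minimal sketch of the loop this attempt seems to be reaching for, using a small made-up customer_data and group_list (both hypothetical stand-ins). Instead of comparing positions one by one, it checks whether each row's index label is in group 1. Note that pd.append is not a pandas function, and DataFrame.append was deprecated and then removed in pandas 2.0, so the sketch collects rows in lists and builds each DataFrame once at the end. Looping with iterrows is slow on 7,000 rows, so vectorized selection is usually preferable.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for customer_data: 10 rows with index labels 0..9
customer_data = pd.DataFrame(
    {"feature_a": range(10), "feature_b": [x * 10 for x in range(10)]}
)
# Hypothetical stand-in for group_list: every index label appears exactly once
group_list = np.array([7, 2, 9, 0, 4, 1, 8, 3, 6, 5])
group_1 = group_list[:4]  # first 4 labels go to group 1, the rest to group 2

# A set makes each "is this label in group 1?" check fast
group_1_labels = set(group_1.tolist())

rows_1, rows_2 = [], []
for label, row in customer_data.iterrows():
    # label is the row's index value; row is a Series holding its features
    if label in group_1_labels:
        rows_1.append(row)
    else:
        rows_2.append(row)

# Build each DataFrame once; each Series' name (its index label) becomes the row index
group1_df = pd.DataFrame(rows_1)
group2_df = pd.DataFrame(rows_2)
print(len(group1_df), len(group2_df))  # 4 6
```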

Obviously, I have some serious syntax and possibly some serious logic issues with what I'm doing, but I've been beating my head against this wall for a week now, and my brain is mush.

What I expect is that the row at index 2045 of the original dataframe would go into group1_df.

Ultimately, I'm looking to create two data frames (group1_df and group2_df) that have the same features as the original dataset, the first one having 1,972 records and the second having 5,028.
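Since 1,972 + 5,028 = 7,000, the two slices of group_list should together cover every row exactly once. A quick sanity check along these lines might look like the sketch below; it assumes (hypothetically) that group_list is a shuffled copy of the index, and uses a made-up 7,000-row frame in place of the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for customer_data: 7,000 rows, default index 0..6999
customer_data = pd.DataFrame({"feature_a": np.arange(7000)})

# Hypothetical group_list: a shuffled copy of the index labels
rng = np.random.default_rng(0)
group_list = rng.permutation(customer_data.index.to_numpy())

group_1 = group_list[:1972]
group_2 = group_list[1972:]

# Expected sizes from the question: 1,972 and 5,028 rows
assert len(group_1) == 1972 and len(group_2) == 5028
# No label should land in both groups
assert np.intersect1d(group_1, group_2).size == 0
# Every label in group_list should exist in customer_data's index
assert np.isin(group_list, customer_data.index.to_numpy()).all()
```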

The dataset looks something like this: Copy of the data set I'm working with

  • Welcome to StackOverflow. Your question is described well; the only thing missing is some example data (5-10 rows) and what your expected output looks like based on that example data. Commented Jul 13, 2019 at 19:49

2 Answers


Consider DataFrame.reindex to align each group's values with the index of customer_data.

import numpy as np
import pandas as pd

customer_data = pd.read_excel('Clean Customer Data.xlsx', index_col=0)

group_list = np.array([2045,323,41,...,n])

group1_df = customer_data.reindex(group_list[:1972], axis = 'index')
group2_df = customer_data.reindex(group_list[1972:], axis = 'index')
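A small sketch of how reindex behaves, on a made-up frame with hypothetical labels: rows come back in the order of the labels passed in, and a label that is not in the index produces an all-NaN row rather than an error. If every label is guaranteed to exist, customer_data.loc[labels] selects the same rows, but since pandas 1.0 it raises a KeyError for missing labels, which can be a useful safeguard.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame whose index labels are 10, 20, 30, 40
customer_data = pd.DataFrame({"spend": [1.0, 2.0, 3.0, 4.0]}, index=[10, 20, 30, 40])
labels = np.array([30, 10, 99])  # 99 is deliberately absent from the index

# reindex keeps the order of `labels` and fills missing labels with NaN
reindexed = customer_data.reindex(labels, axis="index")

# .loc gives the same rows when all labels exist...
strict = customer_data.loc[labels[:2]]

# ...but raises KeyError if any label is missing
try:
    customer_data.loc[labels]
    loc_raised = False
except KeyError:
    loc_raised = True

print(reindexed)
print(loc_raised)  # True
```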

If your numpy array is a, and your dataframe is df,

group1_df = df.loc[df.index.isin(a[:1972]), :]
group2_df = df.loc[df.index.isin(a[1972:]), :]
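A sketch of the isin approach on a made-up frame and array (both hypothetical). Unlike reindex, the result keeps the DataFrame's own row order rather than the order of a, and labels in a that are missing from the index are silently skipped. Because the two slices of a are complementary, the second group can also be taken with the negated mask (~), assuming, as in the question, that a covers every row exactly once.

```python
import numpy as np
import pandas as pd

# Hypothetical frame and label array
df = pd.DataFrame({"spend": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=[10, 20, 30, 40, 50])
a = np.array([30, 10, 50, 20, 40])

# Boolean mask: True where a row's index label is in the first slice of a
mask = df.index.isin(a[:2])

group1_df = df.loc[mask, :]   # rows 10 and 30, in df's order (not a's)
group2_df = df.loc[~mask, :]  # every remaining row: 20, 40, 50
```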
