I have a large dataset of 7000 rows with 40 features. I want to split it into two new dataframes made up of rows from the original. Which rows go into which dataframe is determined by the values in a 1D numpy array: I want to compare the values in the array against the index of the original dataframe and, where they match, take the entire row of the original dataframe and add it to the new dataframe.
import numpy as np
import pandas as pd

# reading in my cleaned customer data and creating the original dataframe
customer_data = pd.read_excel('Clean Customer Data.xlsx', index_col=0)
# this is the 1D array; each element corresponds to an index value of customer_data
group_list = np.array([2045,323,41,...,n])
# slicing group_list into two arrays that hold the row indexes for each group
group_1 = np.array(group_list[:1972])
group_2 = np.array(group_list[1972:])
for X in range(len(group_list)):
    i = 0
    # this is where I get stuck
    if group_1[i] == **the index of the original dataframe**
        group1_df = pd.append(customer_data)
    else:
        group2_df = pd.append(customer_data)
    i = i+1
Obviously, I have some serious syntax and possibly some serious logic issues with what I'm doing, but I've been beating my head against this wall for a week now, and my brain is mush.
What I expect to happen is that the row of the original dataframe with index 2045 goes into group1_df.
Ultimately, I'm looking to create two data frames (group1_df and group2_df) that have the same features as the original dataset, the first one having 1,972 records and the second having 5,028.
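For reference, here is a minimal sketch of one way this kind of split could be done without a loop, using label-based selection with `.loc`. The small `customer_data` and `group_list` below are made-up stand-ins (10 rows instead of 7000, a split point of 4 instead of 1972), and the sketch assumes every value in `group_list` appears exactly once in the dataframe index; a missing label would raise a `KeyError`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for customer_data: 10 rows, 3 features,
# with index labels 100..109 instead of the real customer index.
customer_data = pd.DataFrame(
    np.arange(30).reshape(10, 3),
    columns=["a", "b", "c"],
    index=range(100, 110),
)

# Hypothetical stand-in for group_list: every index label once, in arbitrary order.
group_list = np.array([105, 101, 109, 100, 103, 108, 102, 107, 104, 106])

split = 4  # stand-in for 1972 in the question
group_1 = group_list[:split]
group_2 = group_list[split:]

# .loc selects whole rows by index label, in the order the labels are given.
group1_df = customer_data.loc[group_1]
group2_df = customer_data.loc[group_2]

# Alternative: a boolean mask keeps the original row order instead.
# Note that ~mask means "every row not in group_1", which matches group_2
# only because group_list covers the whole index here.
mask = customer_data.index.isin(group_1)
group1_sorted = customer_data[mask]
group2_sorted = customer_data[~mask]
```

With the real data, `split` would be 1972, which should give the 1,972 and 5,028 row dataframes described above, both with the same columns as `customer_data`.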
