I have a dataframe with several columns and I need to re-sample from that data with more weight to one category. I think np.random.choice should work but not sure how to implement it. Following is the example data from which I want to sample randomly but want 70% probability of getting expensive home (based on the Expensive_home column, value = 1) and 30% probability for Expensive_home=0. How can I create the re-sampled data file? Thank you!
ID Lot_Area Year_Built Full_Bath Bedroom Sale_Price Expensive_home
1 31770 1960 1 3 215000 0
2 11622 1961 1 2 105000 0
3 5389 1995 2 2 236500 0
4 8402 1998 2 3 180400 0
5 10176 1990 1 2 171500 0
6 6820 1985 1 1 212000 0
7 53504 2003 3 4 538000 1
8 12134 1988 2 4 164000 0
9 11394 2010 1 1 394432 1
10 19138 1951 1 2 141000 0
11 13175 1978 2 3 210000 0
12 11751 1977 2 3 190000 0
13 10625 1974 2 3 170000 0
14 7500 2000 2 3 216000 0
15 11241 1970 1 2 149000 0
16 2280 1978 2 3 146000 0
17 12858 2009 2 3 376162 1
18 12883 2009 2 3 290941 0
19 12182 2005 2 3 220000 0
20 11520 2005 2 3 275000 0
similar data file but with more of randomly picked 1s in the last column
df.samplewith aweightsarguments, eg:df.sample(n, weights=df['Expensive_home'].replace({0:0.3, 1:0.7}))- not going to make that an answer at the moment though as not sure if that gives you the results you want... (you might have to do something different to provide it the weights that gives the desired result)replace=Trueso it can select the same row more than once, eg:df.sample(len(df), replace=True, weights=df['Expensive_home'].replace({0:0.3, 1:0.7}))- is that (close to) what you want?