1

I am new to pandas and Python, so I will explain my question with an example. I have data such as:

df = pd.DataFrame([[1, 2], [1, 3], [4, 6], [5,6], [7,8], [9,10], [11,12], [13,14]], columns=['A', 'B'])
df
    A   B
0   1   2
1   1   3
2   4   6
3   5   6
4   7   8
5   9  10
6  11  12
7  13  14

I am taking 3 samples from each column.

x = df['A'].sample(n=3)
x = x.reset_index(drop=True)
x

0     7
1     9
2    11

y = df['B'].sample(n=3)
y = y.reset_index(drop=True)
y

0     6
1    12
2     2

I would like to repeat this sample(n=3) step 10 times. I tried [y] * 10, but that only repeats the same sample (6, 12, 2) 10 times. I want 10 fresh samples drawn from the original data, and then I would like to build a new data frame from the new columns generated from A and B. I thought maybe I should write a for loop, but I am not very familiar with them.

Thanks for the help.

3
  • So, let me get this right... you want the same 3 values repeated 10 times? Commented Mar 16, 2018 at 19:43
  • What is your desired output? Are you looking to sample with replacement? Commented Mar 16, 2018 at 19:44
  • Oh no, I do not want the same 3 values. I want a new sample from the df dataframe each time: 10 new samples, from columns A and B. Commented Mar 16, 2018 at 20:02

2 Answers

1

As WeNYoBen showed, it is good practice to split the task into

  1. generating the sample replicates,
  2. concatenating the data frames.

My suggestion: Write a generator function that is used to create a generator (instead of a list) of your sample replicates. Then you can concatenate the items (in this case, data frames) that the generator yields.

# a generator function that yields one random sample of n rows per replicate
def sample_rep(dframe, n, replicates):
    for _ in range(replicates):
        yield dframe.sample(n)

d = pd.concat(sample_rep(df, n=3, replicates=10),
              keys=range(1, 11), names=["replicate"])

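The generator produces each sample on the fly, so you do not have to build and keep a list of samples yourself. pd.concat() consumes sample_rep(), which yields the sampled data frames one at a time; note that pd.concat() still has to collect all of them before combining, so any memory saving is modest.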
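As a usage sketch, here is how the concatenated result could be inspected afterwards. The seed parameter is my own addition for reproducibility (it is not part of the code above), and the data frame is copied from the question. Note that this samples whole rows, so the values of A and B stay paired.

```python
import pandas as pd

# Data frame copied from the question.
df = pd.DataFrame(
    [[1, 2], [1, 3], [4, 6], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14]],
    columns=["A", "B"],
)


def sample_rep(dframe, n, replicates, seed=None):
    """Yield `replicates` random samples of `n` rows each.

    `seed` is a hypothetical extra parameter added for this sketch.
    """
    for i in range(replicates):
        # Offsetting the seed keeps the replicates different but reproducible.
        state = None if seed is None else seed + i
        yield dframe.sample(n, random_state=state)


d = pd.concat(sample_rep(df, n=3, replicates=10, seed=0),
              keys=range(1, 11), names=["replicate"])

# Select one replicate through the outer index level.
first = d.loc[1]

# Check that every replicate contains 3 rows.
sizes = d.groupby(level="replicate").size()
```

The outer index level "replicate" identifies the draw, and the inner level keeps the original row labels of df, so you can always trace a sampled row back to its source.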

0

Seems like you need

df.apply(lambda x : x.sample(3)).apply(lambda x : sorted(x,key=pd.isnull)).dropna().reset_index(drop=True)
Out[353]: 
      A     B
0   7.0   2.0
1  11.0   6.0
2  13.0  12.0

Sorry for the misleading answer, I overlooked the "10 times" part. To repeat the sampling 10 times:

samples = []
for _ in range(10):
    samples.append(df.apply(lambda x : x.sample(3)).apply(lambda x : sorted(x,key=pd.isnull)).dropna().reset_index(drop=True))

pd.concat(samples)
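A possibly simpler variant, as a minimal sketch: sample each column on its own, reset the index as in the question, and label each draw when concatenating. The helper name sample_columns is hypothetical and the data frame is copied from the question.

```python
import pandas as pd

# Data frame copied from the question.
df = pd.DataFrame(
    [[1, 2], [1, 3], [4, 6], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14]],
    columns=["A", "B"],
)


def sample_columns(dframe, n):
    """Sample n values from each column independently (hypothetical helper)."""
    return pd.DataFrame(
        {col: dframe[col].sample(n).reset_index(drop=True)
         for col in dframe.columns}
    )


# Ten independent draws of 3 values per column.
replicates = [sample_columns(df, n=3) for _ in range(10)]

# Stack the draws and label them 1..10 in an outer index level.
result = pd.concat(replicates, keys=range(1, 11), names=["replicate"])
```

Because every column is reset to the index 0, 1, 2 before the frame is built, no NaN values appear, so the sorted/dropna step is not needed here.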
