I have a table of 10000 rows loaded into a DataFrame.

The code below pushes these rows to another system using the PATCH method. I do not want to execute and push all 10000 rows at once. Instead, I want the first 100 rows of the table to be executed and pushed, then the next 100, and so on until the end of the table. My table doesn't have a row number column. How can this be done as a loop in Python?

batch = clientlink.create_batch()
changeset = clientlink.create_changeset()
for row in dfpatch.rdd.collect():
    changeset.add_request(clientlink.entity_sets.cc.update_entity(obj=row.obj, method='PATCH').set(seg=row.segment))
    print(row.obj, row.segment)
batch.add_request(changeset)
response = batch.execute()

2 Answers

1

I don't know what clientlink does, but if you only look at the part that processes 100 rows at a time in the for loop, you can implement it by adding a counter as shown below. Since I don't know how a batch is supposed to be reinitialized, I have noted that step in the comments.

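batch = clientlink.create_batch()
changeset = clientlink.create_changeset()
count = 0
for row in dfpatch.rdd.collect():
    changeset.add_request(clientlink.entity_sets.CorporateAccountCollection.update_entity(ObjectID=row.ObjectID, method='PATCH').set(CLMSegment_KUT=row.segment))
    count += 1
    print(row.ObjectID, row.segment)
    if count == 100:
        batch.add_request(changeset)
        response = batch.execute()
        # Start a fresh batch and changeset for the next 100 rows
        # (assumes create_batch()/create_changeset() return new, empty objects)
        batch = clientlink.create_batch()
        changeset = clientlink.create_changeset()
        count = 0
# Push the remaining rows (fewer than 100), if any
if count > 0:
    batch.add_request(changeset)
    response = batch.execute()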
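As an alternative to keeping a counter, the list returned by collect() can be cut into slices of 100 up front, so that each slice maps to one batch. This is only a minimal sketch: slice_into_batches is a hypothetical helper name, and the rows list is made-up stand-in data, not the real dfpatch contents.

```python
def slice_into_batches(rows, batch_size=100):
    """Return consecutive slices of `rows`, each holding at most `batch_size` items."""
    return [rows[start:start + batch_size]
            for start in range(0, len(rows), batch_size)]


# Stand-in for dfpatch.rdd.collect(): 250 fake (ObjectID, segment) pairs.
rows = [(f"obj{i}", f"seg{i % 3}") for i in range(250)]

batches = slice_into_batches(rows)
print([len(b) for b in batches])  # [100, 100, 50]
```

With the real data you would presumably loop over slice_into_batches(dfpatch.rdd.collect()) and create, fill, and execute one batch per slice. The last slice is simply shorter, so no separate "leftover" step is needed.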

0

Try:

import numpy as np

# Assumes dfpatch is a pandas DataFrame.
# Number of splits needed for at most 100 rows per split.
# Ceiling division guarantees nSplits * 100 >= the number of rows in the frame.
nSplits = -(-dfpatch.shape[0] // 100)

listOfFrames = np.array_split(dfpatch, nSplits)

After that, it's pretty easy:

for subFrame in listOfFrames:
    # Do something with the frame of 100 rows
    pass

7 Comments

I get the error below when trying to use shape: AttributeError: 'DataFrame' object has no attribute 'shape'
Eh? Is it a pandas DataFrame? Try len(dfpatch.index)
AttributeError: 'DataFrame' object has no attribute 'index'
What kind of DataFrame is this? It's obviously not a pandas DataFrame. If it's an RDD, use dfpatch.count()
dfpatch: pyspark.sql.dataframe.DataFrame with columns AccountID: string, Name: string, ObjectID: string, segment: string
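Since dfpatch turns out to be a PySpark DataFrame, shape and index are not available, but its rows can still be consumed 100 at a time. The sketch below relies on DataFrame.toLocalIterator(), which streams rows to the driver instead of materializing them all as collect() does. iter_row_batches and FakeFrame are hypothetical names; FakeFrame only mimics that one method so the example runs without a Spark session.

```python
from itertools import islice


def iter_row_batches(df, batch_size=100):
    """Yield lists of up to `batch_size` rows from an object with toLocalIterator()."""
    rows = iter(df.toLocalIterator())
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            return  # source exhausted
        yield batch


class FakeFrame:
    """Hypothetical stand-in for a PySpark DataFrame (assumption, for demo only)."""

    def __init__(self, rows):
        self._rows = rows

    def toLocalIterator(self):
        return iter(self._rows)


sizes = [len(b) for b in iter_row_batches(FakeFrame(list(range(1050))))]
print(sizes[:2], sizes[-1], len(sizes))  # [100, 100] 50 11
```

In the real job you would probably call iter_row_batches(dfpatch) and build and execute one clientlink batch per yielded list.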