
I'd like to know the PySpark equivalent of the reset_index() command used in pandas. When I use the pandas-style command:

data.reset_index()

I get an error:

AttributeError: 'DataFrame' object has no attribute 'reset_index'

  • Can you provide more detail in your question - what are you trying to achieve? What is the expected outcome in tabular format? Commented Nov 6, 2020 at 5:57
  • You cannot use reset_index because Spark has no concept of an index. The DataFrame is distributed and is fundamentally different from a pandas DataFrame. Commented Nov 6, 2020 at 6:53
  • If you just want to assign a numerical ID to the rows, you can use monotonically_increasing_id. Commented Nov 6, 2020 at 8:23
  • If your problem is as simple as mine, this can help: https://stackoverflow.com/questions/52318016/pyspark-add-sequential-and-deterministic-index-to-dataframe Commented Jul 16, 2021 at 22:30

1 Answer


Like the other comments mentioned, if you do need to add an index to your DataFrame, you can use:

from pyspark.sql.functions import monotonically_increasing_id

df = df.withColumn("index_column", monotonically_increasing_id())
