
In my workflow there are multiple CSVs with four columns: OID, Value, Count, and unique_id. I am trying to figure out how to generate incremental values in the unique_id column. Using apply(), I can do something like df.apply(lambda x : x + 1) (where x = 0), which sets every value in unique_id to 1. However, I am confused about how to use apply() to generate incrementing values row by row for a specific column.

# Current Dataframe 
   OID  Value  Count  unique_id
0   -1      1      5          0
1   -1      2     46          0
2   -1      3     32          0
3   -1      4      3          0
4   -1      5     17          0

# Trying to accomplish
   OID  Value  Count  unique_id
0   -1      1      5          0
1   -1      2     46          1
2   -1      3     32          2
3   -1      4      3          3
4   -1      5     17          4

Sample code (I understand the syntax is incorrect, but it approximates what I am trying to accomplish):

def numbers():
    for index, row in RG_Res_df.iterrows():
        return index

RG_Res_df = RG_Res_df['unique_id'].apply(numbers)
Comments:

  • you can just do df['unique_id'] = np.arange(df.shape[0]) – Commented Mar 2, 2017 at 16:42

1 Answer


Don't loop. You can directly assign a NumPy array to generate the ids, here using np.arange and passing the number of rows, which is df.shape[0]:

In [113]:
df['unique_id'] = np.arange(df.shape[0])
df

Out[113]:
   OID  Value  Count  unique_id
0   -1      1      5          0
1   -1      2     46          1
2   -1      3     32          2
3   -1      4      3          3
4   -1      5     17          4

Or use a pure pandas method with RangeIndex; the default start is 0, so we only need to pass stop=df.shape[0]:

In [114]:
df['unique_id'] = pd.RangeIndex(stop=df.shape[0])
df

Out[114]:
   OID  Value  Count  unique_id
0   -1      1      5          0
1   -1      2     46          1
2   -1      3     32          2
3   -1      4      3          3
4   -1      5     17          4
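Since the question mentions multiple CSVs, either assignment can simply be repeated per file. A minimal sketch, using in-memory StringIO buffers as stand-ins for the real files on disk (the data here is hypothetical):

```python
import numpy as np
import pandas as pd
from io import StringIO

# Two in-memory "files" standing in for CSVs on disk.
csv_files = [
    StringIO("OID,Value,Count,unique_id\n-1,1,5,0\n-1,2,46,0\n"),
    StringIO("OID,Value,Count,unique_id\n-1,3,32,0\n-1,4,3,0\n-1,5,17,0\n"),
]

frames = []
for f in csv_files:
    df = pd.read_csv(f)
    # Overwrite the placeholder column with 0..n-1 for this file.
    df['unique_id'] = np.arange(df.shape[0])
    frames.append(df)

print(frames[1]['unique_id'].tolist())  # [0, 1, 2]
```

With real files you would replace the StringIO list with a glob over the CSV paths; the per-frame assignment is unchanged.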

5 Comments

This worked beautifully. Are NumPy functions preferred over pandas, or are they pretty comparable? Also, df['unique_id'] = pd.RangeIndex(stop=df.shape[0]) gives me AttributeError: 'module' object has no attribute 'RangeIndex'. Any idea? I was able to iterate using its index earlier.
You may need to add import pandas as pd. Also, generally there isn't much difference, but NumPy methods will be faster, so they should be preferred where they do what you want.
I found the problem: I am using an older version of pandas at work. Also, could you point out why the following np.arange syntax, df['unique_id'] = np.arange(57), throws this error: ValueError: Length of values does not match length of index?
Well, the error is telling you the lengths are different. What you tried makes an array from 0 to 56, i.e. 57 values; does your DataFrame have 57 rows?
Sort of. I just realized that I only need to generate the unique values on a subset of rows, from 0 to 56. I assumed that since np.arange works much like Python's range, providing a stop value (i.e. 57) would generate values only for those rows, with start=0 by default. Sorry for the confusion!
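To resolve the point in that last comment: the ValueError disappears once the selection being assigned to has the same length as the array. A minimal sketch, using a hypothetical 100-row frame where only rows 0 through 56 should receive incremental ids:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 100 rows, unique_id initialised to 0.
df = pd.DataFrame({'Value': range(100), 'unique_id': 0})

# .loc label slices are end-inclusive, so :56 selects 57 rows,
# matching the 57 values produced by np.arange(57).
df.loc[:56, 'unique_id'] = np.arange(57)
```

Rows 57 onward keep their original value of 0, while rows 0 to 56 get 0 through 56.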
