I have a DataFrame (df) with a column called Id that looks like this:

        Id
 0       3
 1      67
 2     356
 3      
 :
50      P4
51      P5
52     678
53 
54       2

The column has dtype object. I have worked out the maximum Id value and assigned it to a variable called maxId (678 here). I want to fill the empty elements with sequentially increasing values starting one above maxId, so in this example my output would be:

        Id
 0       3
 1      67
 2     356
 3     679
 :
50      P4
51      P5
52     678
53     680
54       2

Elements 3 and 53 are assigned 679 and 680 respectively.

I have tried the following code, where I loop through the column looking for null elements and then assign the incremented maxId to them:

for item, frame in df['Id'].iteritems():
        if pd.isnull(frame):
            maxId = maxId + 1
            frame['Id'] = maxId 

But I get an error:

TypeError: 'float' object is not subscriptable

What do I need to do for a fix?

  • In general with Pandas, you should look to avoid row-wise for loops. Vectorised column-wise operations are possible. Commented Jan 10, 2019 at 11:33
  • Please provide a minimal reproducible example, as well as the entire error message. Commented Apr 24, 2020 at 0:28
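A minimal reproducible version of the frame shown in the question might look like the sketch below. It keeps only the rows actually shown (the rows elided as ":" are left out) and assumes the blank cells are missing values (NaN) rather than empty strings, since the question does not say which.

```python
import numpy as np
import pandas as pd

# Rows copied from the question; the elided ":" rows are omitted.
# Assumption: the blank cells are NaN, not empty strings.
df = pd.DataFrame(
    {"Id": [3, 67, 356, np.nan, "P4", "P5", 678, np.nan, 2]},
    index=[0, 1, 2, 3, 50, 51, 52, 53, 54],
)

print(df["Id"].dtype)           # object, because numbers and strings are mixed
print(df["Id"].isnull().sum())  # how many blank cells need a new Id
```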

3 Answers

Using pd.Series.isnull and np.arange:

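import numpy as np
import pandas as pd

# calculate maximum numeric value; non-numeric entries such as 'P4' become NaN and are ignored
maxId = int(pd.to_numeric(df['Id'], errors='coerce').max())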

# calculate Boolean series of nulls
nulls = df['Id'].isnull()

# assign range starting from one above maxId
df.loc[nulls, 'Id'] = np.arange(maxId + 1, maxId + 1 + nulls.sum())

print(df)

#      Id
# 0     3
# 1    67
# 2   356
# 3   679
# 50   P4
# 51   P5
# 52  678
# 53  680
# 54    2
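Note that isnull() only flags NaN and None. If the "empty" cells in the real data are empty or whitespace-only strings (the question does not say which), they will not be picked up. A sketch under that assumption, normalising such strings to NaN first and then applying the same np.arange fill:

```python
import numpy as np
import pandas as pd

# Assumption: the blank cells hold "" or whitespace-only strings, not NaN.
df = pd.DataFrame({"Id": [3, 67, 356, "", "P4", "P5", 678, "  ", 2]})

# isnull() does not treat "" as missing, so turn blank strings into NaN first.
# Non-string values (3, 678, ...) are left untouched by the regex replace.
df["Id"] = df["Id"].replace(r"^\s*$", np.nan, regex=True)

maxId = int(pd.to_numeric(df["Id"], errors="coerce").max())
nulls = df["Id"].isnull()
df.loc[nulls, "Id"] = np.arange(maxId + 1, maxId + 1 + nulls.sum())
print(df)
```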

Since you have already worked out maxId, you can try this vectorized solution:

>>df

    Id
0   3
1   67
2   356
3   NaN
5   P4
6   P5
7   678
8   NaN
9   2

import numpy as np

n = 678 + 1  # first new Id, one above maxId
nulls = df.Id.isnull()
df.loc[nulls, 'Id'] = np.arange(n, n + nulls.sum())
>>df

Output:

    Id
0   3
1   67
2   356
3   679
5   P4
6   P5
7   678
8   680
9   2
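An alternative way to generate the same numbers (not taken from the answer above) is a running count of the nulls: cumsum() over the Boolean mask numbers the missing rows 1, 2, 3, ... in order. A sketch using the example frame from this answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Id": [3, 67, 356, np.nan, "P4", "P5", 678, np.nan, 2]},
    index=[0, 1, 2, 3, 5, 6, 7, 8, 9],
)
n = 678  # maxId from the question

nulls = df["Id"].isnull()
# nulls.cumsum() is 1 at the first missing row, 2 at the second, and so on;
# selecting it with [nulls] keeps only those rows, and assignment aligns on the index.
df.loc[nulls, "Id"] = n + nulls.cumsum()[nulls]
print(df)
```

Because the right-hand side is a Series, .loc aligns it by index label, so there is no need to compute the number of nulls separately.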

Do you need values like 'P4' and 'P5'? I tried to reproduce a DataFrame similar to yours, but without those values, and this works:

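import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [20, 4, np.nan, np.nan, 12, np.nan, 6, 10]})

maxID = df['A'].max()

# iterate over index labels and write back with df.loc[label, column]
# (avoids chained assignment such as df['A'].loc[i] = ...)
for i in df.index:
    if pd.isnull(df.loc[i, 'A']):
        maxID += 1
        df.loc[i, 'A'] = maxID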

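Your error occurs because iteritems() yields (index, value) pairs, so frame is a single value, not a row. For a missing entry that value is NaN, which is a float, and frame['Id'] then tries to index into a float the way you would index into a list.

Example: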

my_float = 3.0 
my_float[0]

TypeError: 'float' object is not subscriptable
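Applied to the loop in the question, the fix is to write the new value back through the DataFrame using the index label, instead of indexing into the scalar value. A sketch reusing the sample rows from the question (blanks assumed to be NaN). Series.iteritems() was deprecated in pandas 1.5 and removed in 2.0; Series.items() is the equivalent:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Id": [3, 67, 356, np.nan, "P4", "P5", 678, np.nan, 2]},
    index=[0, 1, 2, 3, 50, 51, 52, 53, 54],
)
maxId = 678

# items() yields (index label, value) pairs; `value` is a scalar, not a row,
# so assign through the DataFrame with the label instead of value['Id'].
for label, value in df["Id"].items():
    if pd.isnull(value):
        maxId = maxId + 1
        df.at[label, "Id"] = maxId

print(df)
```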
