Operations on specific elements of a dataframe in Python

Question

I'm trying to convert kilometer values in one column of a dataframe to mile values. I've tried various things and this is what I have now:

def km_dist(column, dist):
    length = len(column)
    for dist in zip(range(length), column):
        if (column == data["dist"] and dist in data.loc[(data["dist"] > 25)]):
            return dist / 5820
        else:
            return dist
    
data = data.apply(lambda x: km_dist(data["dist"], x), axis=1)

The dataset I'm working with looks something like this:

   past_score         dist        income  lab      score  gender  race  income_bucket  plays_sports  student_id  lat  long
0    8.091553    11.586920  67111.784934    0   7.384394    male     H              3             0           1  0.0   0.0
1    8.091553    11.586920  67111.784934    0   7.384394    male     H              3             0           1  0.0   0.0
2    7.924539  7858.126614  93442.563796    1  10.219626       F     W              4             0           2  0.0   0.0
3    7.924539  7858.126614  93442.563796    1  10.219626       F     W              4             0           2  0.0   0.0
4    7.726480    11.057883  96508.386987    0   8.544586       M     W              4             0           3  0.0   0.0

With my code above, I'm trying to loop through all the "dist" values and, if those values are in the right column (data["dist"]) and greater than 25, divide them by 5820 (the number of feet in a kilometer). More generally, I'd like to find a way to operate on specific elements of dataframes. I'm sure this is at least a somewhat common question; I just haven't been able to find an answer for it. If someone could point me towards somewhere with an answer, I would be just as happy.

jezrael · Accepted Answer · 2021-12-13 06:17:05Z


Instead of your solution, filter the rows with a boolean mask and divide the dist column by 5820:

data.loc[data["dist"] > 25, 'dist'] /= 5820

This works the same as:

data.loc[data["dist"] > 25, 'dist'] = data.loc[data["dist"] > 25, 'dist'] / 5820
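To make the mask explicit, here is a minimal sketch on a small made-up frame. The dist values are copied from the question; the variable name mask is only for illustration. The comparison produces a boolean Series, and .loc[rows, column] uses it to pick which rows of dist to overwrite.

```python
import pandas as pd

# Small stand-in for the question's data; only the "dist" column is needed here
data = pd.DataFrame({"dist": [11.586920, 7858.126614, 11.057883]})

# Boolean Series: True for the rows that should be changed
mask = data["dist"] > 25

# First argument of .loc selects rows, second selects the column
data.loc[mask, "dist"] = data.loc[mask, "dist"] / 5820

print(mask.tolist())
print(data)
```

Rows where the mask is False are left untouched, which is why no else branch is needed.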

data.loc[data["dist"] > 25, 'dist'] /= 5820
print(data)
   past_score       dist        income  lab      score gender race  \
0    8.091553  11.586920  67111.784934    0   7.384394   male    H   
1    8.091553  11.586920  67111.784934    0   7.384394   male    H   
2    7.924539   1.350194  93442.563796    1  10.219626      F    W   
3    7.924539   1.350194  93442.563796    1  10.219626      F    W   
4    7.726480  11.057883  96508.386987    0   8.544586      M    W   

   income_bucket  plays_sports  student_id  lat  long  
0              3             0           1  0.0   0.0  
1              3             0           1  0.0   0.0  
2              4             0           2  0.0   0.0  
3              4             0           2  0.0   0.0  
4              4             0           3  0.0   0.0
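A note on the divisor: 5820 is taken from the question as written. It is not a kilometer-to-mile factor (a kilometer is about 3281 feet, and a mile is 5280 feet). If the values above 25 really are kilometers and the goal is miles, the usual conversion is 1 mile = 1.609344 km. The sketch below assumes that; KM_PER_MILE is a name introduced here for illustration, and the threshold of 25 is kept from the question.

```python
import pandas as pd

# Assumption: masked values are kilometers and should become miles.
# 1.609344 km per mile is exact by definition of the international mile.
KM_PER_MILE = 1.609344

data = pd.DataFrame({"dist": [11.586920, 7858.126614, 11.057883]})

# Same masking pattern as the answer, with a different divisor
mask = data["dist"] > 25
data.loc[mask, "dist"] /= KM_PER_MILE

print(data)
```

Only the divisor changes; the row selection works exactly as in the answer.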

edited Dec 13, 2021 at 6:17

answered Dec 13, 2021 at 6:07


2 Comments

Aaron Over a year ago

Thank you @jezrael this is perfect! Would you mind pointing me towards somewhere that talks about that half-bracket mask you've done with the 'dist' ('dist'])? I've found the mask() function for the pandas package but I've had trouble finding more about the half-bracket mask, but I'd love to read more about it!

jezrael Over a year ago

@Aaron - Yes, I removed the parentheses for only one reason: there is only one condition. If there are two or more conditions, they are necessary, e.g. data.loc[(data["dist"] > 25) & (data["gender"] == 25), 'dist'] /= 5820, if I understand your question correctly.
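A rough sketch of the two-condition case, on made-up rows. The second condition here (gender equal to "F") is a hypothetical stand-in chosen so that it matches something in the sample data. In Python, & binds more tightly than > and ==, so each comparison needs its own parentheses before the two masks are combined.

```python
import pandas as pd

# Made-up rows; the column names follow the question's dataset
data = pd.DataFrame({
    "dist": [11.586920, 7858.126614, 7858.126614],
    "gender": ["male", "F", "M"],
})

# Parenthesize each comparison, then combine element-wise with &
mask = (data["dist"] > 25) & (data["gender"] == "F")

# Only rows where both conditions hold are divided
data.loc[mask, "dist"] /= 5820

print(mask.tolist())
print(data)
```

Use | instead of & for "either condition", and ~ to negate a mask; the plain Python keywords and, or, not do not work element-wise on a Series.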
