I'm trying to convert kilometer values in one column of a dataframe to mile values. I've tried various things and this is what I have now:

def km_dist(column, dist):
    length = len(column)
    for dist in zip(range(length), column):
        if (column == data["dist"] and dist in data.loc[(data["dist"] > 25)]):
            return dist / 5820
        else:
            return dist
    
data = data.apply(lambda x: km_dist(data["dist"], x), axis=1)

The dataset I'm working with looks something like this:

   past_score         dist        income  lab      score  gender  race  income_bucket  plays_sports  student_id  lat  long
0    8.091553    11.586920  67111.784934    0   7.384394    male     H              3             0           1  0.0   0.0
1    8.091553    11.586920  67111.784934    0   7.384394    male     H              3             0           1  0.0   0.0
2    7.924539  7858.126614  93442.563796    1  10.219626       F     W              4             0           2  0.0   0.0
3    7.924539  7858.126614  93442.563796    1  10.219626       F     W              4             0           2  0.0   0.0
4    7.726480    11.057883  96508.386987    0   8.544586       M     W              4             0           3  0.0   0.0

With the code above, I'm trying to loop through all the "dist" values and, if a value is in the right column (data["dist"]) and greater than 25, divide it by 5820 (my conversion factor). More generally, I'd like to find a way to operate on specific elements of a dataframe. I'm sure this is at least a somewhat common question; I just haven't been able to find an answer to it. If someone could point me to an existing answer, I would be just as happy.

1 Answer

Instead of your solution, filter the rows with a boolean mask and divide the dist column by 5820:

data.loc[data["dist"] > 25, 'dist'] /= 5820

This works the same as:

data.loc[data["dist"] > 25, 'dist'] = data.loc[data["dist"] > 25, 'dist'] / 5820

data.loc[data["dist"] > 25, 'dist'] /= 5820
print (data)
   past_score       dist        income  lab      score gender race  \
0    8.091553  11.586920  67111.784934    0   7.384394   male    H   
1    8.091553  11.586920  67111.784934    0   7.384394   male    H   
2    7.924539   1.350194  93442.563796    1  10.219626      F    W   
3    7.924539   1.350194  93442.563796    1  10.219626      F    W   
4    7.726480  11.057883  96508.386987    0   8.544586      M    W   

   income_bucket  plays_sports  student_id  lat  long  
0              3             0           1  0.0   0.0  
1              3             0           1  0.0   0.0  
2              4             0           2  0.0   0.0  
3              4             0           2  0.0   0.0  
4              4             0           3  0.0   0.0  
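
For anyone who wants to try this without the full dataset, here is a minimal, self-contained sketch of the same idea. The two-column frame is only a stand-in built from the sample values in the question, and the name needs_conversion is just an illustrative label for the mask. The factor 5820 is kept from the question as-is; a mile is 5280 feet and a kilometre is roughly 0.6214 miles, so swap in whichever constant matches your units.

```python
import pandas as pd

# Stand-in frame: "dist" and "gender" values copied from the question's sample;
# the remaining columns are omitted for brevity.
data = pd.DataFrame({
    "dist": [11.586920, 11.586920, 7858.126614, 7858.126614, 11.057883],
    "gender": ["male", "male", "F", "F", "M"],
})

# Boolean Series, one entry per row: True where the value should be converted.
needs_conversion = data["dist"] > 25

# .loc[row_mask, column_label] addresses only the matching cells,
# so the division is written back to those rows and nowhere else.
data.loc[needs_conversion, "dist"] /= 5820

print(data)
```

Naming the mask first makes it easy to inspect which rows will change (for example with needs_conversion.sum()) before modifying anything.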

2 Comments

Thank you @jezrael this is perfect! Would you mind pointing me towards somewhere that talks about that half-bracket mask you've done with the 'dist' ('dist'])? I've found the mask() function for the pandas package but I've had trouble finding more about the half-bracket mask, but I'd love to read more about it!
@Aaron - Yes, I removed the parentheses for only one reason: there is only one condition. If there are two or more, they are necessary, e.g. data.loc[(data["dist"] > 25) & (data["gender"] == 25), 'dist'] /= 5820 - if I understand your question correctly.
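
To illustrate the point about parentheses, here is a small sketch combining two conditions. The frame, the 9000.0 value in the last row, and the gender == "F" condition are made up for illustration and are not from the original data or answer.

```python
import pandas as pd

# Hypothetical three-row frame; the last row is invented for this example.
data = pd.DataFrame({
    "dist": [11.586920, 7858.126614, 9000.0],
    "gender": ["male", "F", "M"],
})

# & binds more tightly than > and ==, so each comparison
# must be wrapped in its own parentheses before combining.
mask = (data["dist"] > 25) & (data["gender"] == "F")

# Only rows where both conditions hold are divided.
data.loc[mask, "dist"] /= 5820

print(data)
```

Without the parentheses, Python would try to evaluate 25 & data["gender"] first, which is not what is intended. Use | for "or" and ~ for "not" in the same way.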
