1

just need a little bit of help in returning some values from a dataframe.


I've got a dataframe (call it df1) with some values:

ID      X    Y    Distance   Date
1       1    2    2.2        01/01/2000
2       2    3    1.8        02/02/2001
3       3    4    1.2        03/03/2002
4       4    5    2.7        04/04/2003 
5       5    6    3.8        05/05/2004

Currently I have code which creates a new column - df1['Within 2k'] - which returns True if the Distance is within 2 km. For example, this would look like:

df1['Within 2k'] = df1['distance'] <= 2
print("df1")

ID      X    Y    Distance   Date         Within 2k
1       1    2    2.2        01/01/2000   False
2       2    3    1.8        02/02/2001   True
3       3    4    1.2        03/03/2002   True
4       4    5    2.7        04/04/2003   False
5       5    6    3.8        05/05/2004   False

I also have code which changes ID & Distance to "Null" if they aren't within 2km. For instance, this looks like:

df1['ID'] = np.where((df1['Distance'] <= 2), df1['ID'], "Null")
df1['Distance'] = np.where((df1['Distance'] <= 2), df1['Distance'], "Null")
print(df1)

ID     X    Y    Distance   Date
Null   1    2    Null       01/01/2000
2      2    3    1.8        02/02/2001
3      3    4    1.2        03/03/2002
Null   4    5    Null       04/04/2003 
Null   5    6    Null       05/05/2004

The aim of my code is to return the first record (chronologically) where the distance is within 2km. Currently I have code which returns the value where the Date value is minimum, however that includes Null values.

My code at the moment looks a bit like this:

Site2km = df1.loc[df1['Date'].idxmin(),'ID']
Dist2km = df1.loc[df1['Date'].idxmin(),'Distance']

return pd.Series([Site2km, Dist2km])

I need some code which will:

1) Return the first ID & Distance where the Distance is less than 2

2) If every value in the table is outside the Distance 2km, return the strings "Null" for both ID & Distance.

1 Answer 1

2

actually you don't need additional columns:

In [35]: df
Out[35]:
   ID  X  Y  Distance       Date
0   1  1  2       2.2 2000-01-01
1   2  2  3       1.8 2001-02-02
2   3  3  4       1.2 2002-03-03
3   4  4  5       2.7 2003-04-04
4   5  5  6       3.8 2004-05-05

In [36]: df.loc[df['Distance'] <= 2].nsmallest(1, 'Date')[['ID','Distance']]
Out[36]:
   ID  Distance
1   2       1.8

UPDATE:

In [47]: df
Out[47]:
   ID  X  Y  Distance       Date
0   1  1  2       2.2 2000-01-01
1   2  2  3       1.8 2001-02-02
2   3  3  4       1.2 2002-03-03
3   4  4  5       2.7 2003-04-04
4   5  5  6       3.8 2004-05-05

In [48]: r = df.loc[df['Distance'] <= 2].nsmallest(1, 'Date')[['ID','Distance']]

In [49]: r
Out[49]:
   ID  Distance
1   2       1.8

let's simulate the situation when we don't have any points within 2km:

In [50]: df.Distance += 10

In [51]: r = df.loc[df['Distance'] <= 2].nsmallest(1, 'Date')[['ID','Distance']]

In [52]: r
Out[52]:
Empty DataFrame
Columns: [ID, Distance]
Index: []

In [53]: if r.empty:
    ...:     r.loc[0] = [np.nan, np.nan]
    ...:

In [54]: r
Out[54]:
   ID  Distance
0 NaN       NaN
Sign up to request clarification or add additional context in comments.

4 Comments

What should I do if every value in df1 is outside of the distance?
@CTaylor19, what is your desired output format if you have rows within 2km?
Just something like "Null" or anything that is easily noticed when I come to manipulate the data further
At the moment, this code (which i'm applying on a different dataframe) results in many 'Empty Dataframes'

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.