If/else statement within loop over dataframe

Question

I have a dataframe with three columns: Depth, Shale Volume and Density.

What I need to do is calculate porosity based on the shale volume and density: where the shale volume is > 0.7 I apply certain parameters for the porosity calculation, and where the volume is < 0.2 I apply other parameters.

For example, if the shale volume is < 0.2:

 porosity=density*2.3

and if the shale volume is > 0.7:

 porosity=density*1.7
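The rule above can be sketched as a plain scalar function. This is only an illustration: the name `porosity_from_rule` is hypothetical, and returning `None` for 0.2 <= VSH <= 0.7 is an assumption, because the question does not say what should happen in that range.

```python
def porosity_from_rule(vsh, density):
    """Hypothetical helper: factor 2.3 when VSH < 0.2, factor 1.7 when VSH > 0.7."""
    if vsh < 0.2:
        return density * 2.3
    if vsh > 0.7:
        return density * 1.7
    # Assumption: the rule is undefined between 0.2 and 0.7, so signal that with None
    return None

# First row of the sample data below (VSH > 0.7, so the 1.7 factor applies)
print(porosity_from_rule(0.8347083, 2.126))
```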

This is an example of part of the dataframe I have:

 depth       density    VSH
 5517        2.126      0.8347083
 5517.5      2.123      0.8310949
 5518        2.124      0.8012414
 5518.5      2.121      0.7838615
 5519        2.116      0.7674243
 5519.5      2.127      0.8405414

This is the code I am trying. I want it to be a for loop because it will serve future purposes:

 for index, row in data.iterrows():
     # Use this row's values, not the whole data['density'] column
     if row['VSH'] < 0.2:
         data.loc[index, 'porosity'] = row['density'] * 2.3
     elif row['VSH'] > 0.7:
         data.loc[index, 'porosity'] = row['density'] * 1.7

This is the error I am getting; any help would be appreciated:

 TypeError: '<' not supported between instances of 'str' and 'float'

You are trying to compare a string to a float. Try casting your VSH column to float.
– Capie, commented May 9, 2019 at 13:37
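A minimal sketch of what the comment describes, assuming VSH was loaded as text (for example from a CSV containing stray strings). The two-row frame is a hypothetical reproduction built from the sample data. It triggers the same TypeError and then casts the column with `astype(float)`.

```python
import pandas as pd

# Hypothetical reproduction: the VSH values are stored as strings, not floats
data = pd.DataFrame({
    "depth": [5517.0, 5517.5],
    "density": [2.126, 2.123],
    "VSH": ["0.8347083", "0.8310949"],
})

error_message = None
try:
    data.loc[0, "VSH"] < 0.2  # str < float raises TypeError
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", exc)

# Casting the column to float, as suggested, makes the comparison valid
data["VSH"] = data["VSH"].astype(float)
print(data.loc[0, "VSH"] < 0.2)
```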

jezrael · Accepted Answer · 2019-05-09 13:36:52Z


Here iterrows is a bad choice because it is slow and a vectorized solution exists; see "Does pandas iterrows have performance issues?"

So use numpy.select:

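import numpy as np

# Boolean masks for the two shale-volume ranges
m1 = data['VSH'] < 0.2
m2 = data['VSH'] > 0.7
# Porosity values for each range
s1 = data['density']*2.3
s2 = data['density']*1.7

# Take s1 where m1 is True, s2 where m2 is True (0 where neither matches)
data['porosity'] = np.select([m1, m2], [s1, s2])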
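Another vectorized option, swapped in here as an alternative rather than taken from the answer, is boolean-mask assignment with `DataFrame.loc`. Rows that match neither condition stay NaN instead of 0. The first row comes from the question's data; the VSH values of the other two rows are hypothetical, chosen so that every branch is exercised.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "depth": [5517.0, 5517.5, 5518.0],
    "density": [2.126, 2.123, 2.124],
    "VSH": [0.8347083, 0.1, 0.5],  # 0.1 and 0.5 are hypothetical values
})

low = data["VSH"] < 0.2
high = data["VSH"] > 0.7

# Start with NaN so rows matching neither condition remain visibly undefined
data["porosity"] = np.nan
data.loc[low, "porosity"] = data.loc[low, "density"] * 2.3
data.loc[high, "porosity"] = data.loc[high, "density"] * 1.7
print(data)
```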

print (data)
    depth  density       VSH  porosity
0  5517.0    2.126  0.834708    3.6142
1  5517.5    2.123  0.831095    3.6091
2  5518.0    2.124  0.801241    3.6108
3  5518.5    2.121  0.783861    3.6057
4  5519.0    2.116  0.767424    3.5972
5  5519.5    2.127  0.840541    3.6159

It is also better to define what happens between 0.2 and 0.7, because by default np.select returns 0 there. For example, return the value of column data['density'] through the default parameter:

data['porosity'] = np.select([m1, m2], [s1, s2], default=data['density'])

print (data)
    depth  density       VSH  porosity
0  5517.0    2.126  0.834708    3.6142
1  5517.5    2.123  0.831095    3.6091
2  5518.0    2.124  0.801241    3.6108
3  5518.5    2.121  0.783861    3.6057
4  5519.0    2.116  0.767424    3.5972
5  5519.5    2.127  0.840541    3.6159
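All sample rows have VSH > 0.7, so both outputs above are identical. The sketch below uses hypothetical densities and VSH values (one low, one middle, one high) to show where the `default` argument actually makes a difference.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: VSH below 0.2, between 0.2 and 0.7, and above 0.7
data = pd.DataFrame({"density": [2.0, 2.0, 2.0], "VSH": [0.1, 0.5, 0.8]})

conditions = [data["VSH"] < 0.2, data["VSH"] > 0.7]
choices = [data["density"] * 2.3, data["density"] * 1.7]

# Without default, the middle row gets 0
no_default = np.select(conditions, choices)
# With default=data["density"], the middle row keeps its density value
with_default = np.select(conditions, choices, default=data["density"])

print(no_default)
print(with_default)
```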


6 Comments

akkab Over a year ago

Thanks for the reply, but is there any way to use iteration within the numpy approach to solve this issue?

akkab Over a year ago

I have implemented the code you provided, but the error still persists: TypeError: '<' not supported between instances of 'str' and 'float'

jezrael Over a year ago

@KamranAbbasov - The problem is non-numeric values, so try data['VSH'] = data['VSH'].astype(float). If that does not work because some values are strings that cannot be parsed, use data['VSH'] = pd.to_numeric(data['VSH'], errors='coerce')

jezrael Over a year ago

Use data['VSH'] = pd.to_numeric(data['VSH'], errors='coerce') and data['density'] = pd.to_numeric(data['density'], errors='coerce'), and if necessary also data['depth'] = pd.to_numeric(data['depth'], errors='coerce')
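A small sketch of the difference between the two suggestions, using a hypothetical column that holds numbers as text plus one unparseable entry ("missing"). `astype(float)` raises on the bad entry, while `pd.to_numeric(..., errors='coerce')` turns it into NaN.

```python
import pandas as pd

# Hypothetical raw data: numbers stored as text plus one bad entry
data = pd.DataFrame({"VSH": ["0.8347083", "0.8310949", "missing"]})

astype_failed = False
try:
    data["VSH"].astype(float)
except ValueError as exc:
    astype_failed = True
    print("astype(float) failed:", exc)

# errors='coerce' converts unparseable entries to NaN instead of raising
data["VSH"] = pd.to_numeric(data["VSH"], errors="coerce")
print(data["VSH"].dtype)         # float64
print(data["VSH"].isna().sum())  # 1
```

Note that NaN compares False against both 0.2 and 0.7, so such rows match neither condition in np.select and receive the default value.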

akkab Over a year ago

Yes, that's what I forgot to do, which is why I deleted the message. Thanks, it seems to be working very well! Any advice on using something instead of iterrows for the iterative approach?
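If an explicit loop is really needed, one common pattern (a sketch, not the answerer's code) is to iterate with `itertuples`, which is generally faster than `iterrows`, collect the results in a list, and assign the column once. The first row is from the question; the other two VSH values are hypothetical, and using NaN for the 0.2 to 0.7 range is an assumption.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "density": [2.126, 2.123, 2.124],
    "VSH": [0.8347083, 0.15, 0.5],  # 0.15 and 0.5 are hypothetical values
})

porosity = []
# itertuples yields lightweight namedtuples instead of building a Series per row
for row in data.itertuples(index=False):
    if row.VSH < 0.2:
        porosity.append(row.density * 2.3)
    elif row.VSH > 0.7:
        porosity.append(row.density * 1.7)
    else:
        porosity.append(np.nan)  # assumption: rule undefined between 0.2 and 0.7

# Assign the whole column once instead of writing cell by cell with .loc
data["porosity"] = porosity
print(data)
```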
