
I have a dataframe with three columns: Depth, Shale Volume and Density.

I need to calculate porosity from the shale volume and the density. Where the shale volume is > 0.7 I apply one set of parameters for the porosity calculation, and where the shale volume is < 0.2 I apply another.

For example, if the shale volume is < 0.2:

 porosity=density*2.3

and if the shale volume is > 0.7:

 porosity=density*1.7

This is an example of part of the dataframe I have:

 depth       density    VSH
 5517        2.126      0.8347083
 5517.5      2.123      0.8310949
 5518        2.124      0.8012414
 5518.5      2.121      0.7838615
 5519        2.116      0.7674243
 5519.5      2.127      0.8405414

This is the code I am trying to write. I want it to be a for loop because that will be useful for future purposes:

 for index, row in data.iterrows():
     if data.loc[index, 'VSH']<0.2:
          data.loc[index,'porosity']=(data['density']*2.3)
     elif data.loc[index, 'VSH'] > 0.7:
          data.loc[index,'porosity']=(data['density']*1.7)

I am getting the following error. Any help would be appreciated:

 TypeError: '<' not supported between instances of 'str' and 'float'
  • You are trying to compare a string to a float. Try casting your VSH column to float. Commented May 9, 2019 at 13:37

1 Answer

Here iterrows is a bad choice because it is slow and a vectorized solution exists; see "Does pandas iterrows have performance issues?"

So use numpy.select:

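import numpy as np

# Boolean masks for the two shale-volume ranges
m1 = data['VSH'] < 0.2
m2 = data['VSH'] > 0.7
# Porosity formula for each range
s1 = data['density']*2.3
s2 = data['density']*1.7

# Pick s1 where m1 is True, s2 where m2 is True, 0 elsewhere
data['porosity'] = np.select([m1, m2], [s1, s2])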

print (data)
    depth  density       VSH  porosity
0  5517.0    2.126  0.834708    3.6142
1  5517.5    2.123  0.831095    3.6091
2  5518.0    2.124  0.801241    3.6108
3  5518.5    2.121  0.783861    3.6057
4  5519.0    2.116  0.767424    3.5972
5  5519.5    2.127  0.840541    3.6159

It is also better to define what happens between 0.2 and 0.7, e.g. return the values of the column data['density'] via the default parameter:

data['porosity'] = np.select([m1, m2], [s1, s2], default=data['density'])

print (data)
    depth  density       VSH  porosity
0  5517.0    2.126  0.834708    3.6142
1  5517.5    2.123  0.831095    3.6091
2  5518.0    2.124  0.801241    3.6108
3  5518.5    2.121  0.783861    3.6057
4  5519.0    2.116  0.767424    3.5972
5  5519.5    2.127  0.840541    3.6159
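
Both outputs above are identical because every VSH value in the sample is greater than 0.7, so the default is never used. A minimal sketch with a made-up frame (the `demo` data and the `porosity_*` column names are assumptions for illustration, not from the question) shows what happens to a row that falls between 0.2 and 0.7:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one row in each VSH range (not the question's data)
demo = pd.DataFrame({
    'density': [2.0, 2.0, 2.0],
    'VSH': [0.1, 0.5, 0.8],
})

m1 = demo['VSH'] < 0.2
m2 = demo['VSH'] > 0.7
s1 = demo['density'] * 2.3
s2 = demo['density'] * 1.7

# Without default, rows matching no condition get 0
demo['porosity_no_default'] = np.select([m1, m2], [s1, s2])
# With default=np.nan, unmatched rows are marked as missing instead
demo['porosity_nan'] = np.select([m1, m2], [s1, s2], default=np.nan)

print(demo)
```

Whether 0, NaN, or the density itself is the right fallback depends on how the middle range should be treated, which the question does not specify.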

6 Comments

Thanks for the reply, but is there any way to use iteration within the numpy approach to solve this issue?
I have implemented the code that you provided, but the error still persists: TypeError: '<' not supported between instances of 'str' and 'float'
@KamranAbbasov - The problem is non-numeric values, so try data['VSH'] = data['VSH'].astype(float). If that fails because some values are strings that cannot be parsed, use data['VSH'] = pd.to_numeric(data['VSH'], errors='coerce')
Use data['VSH'] = pd.to_numeric(data['VSH'], errors='coerce') and data['density'] = pd.to_numeric(data['density'], errors='coerce'); if necessary, also data['depth'] = pd.to_numeric(data['depth'], errors='coerce')
Yes, that's what I forgot to do, which is why I deleted the message. Thanks, it seems to be working very well! Any advice on using something other than iterrows for the iterative approach?
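
The conversion suggested in the comments, together with one possible row-by-row alternative to iterrows, can be sketched as follows. This is only an illustration: the `raw` frame and its values are made up, using itertuples is a swapped-in suggestion rather than something proposed in the thread, and returning NaN outside both ranges is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose columns were read as strings, with one bad value
raw = pd.DataFrame({
    'depth': ['5517', '5517.5', '5518'],
    'density': ['2.126', '2.123', '2.124'],
    'VSH': ['0.8347083', '0.15', 'n/a'],
})

# Convert every column to numeric; unparseable entries become NaN
for col in ['depth', 'density', 'VSH']:
    raw[col] = pd.to_numeric(raw[col], errors='coerce')

# Row-by-row alternative to iterrows: itertuples yields lightweight namedtuples
porosity = []
for row in raw.itertuples(index=False):
    if row.VSH < 0.2:
        porosity.append(row.density * 2.3)
    elif row.VSH > 0.7:
        porosity.append(row.density * 1.7)
    else:
        porosity.append(np.nan)  # assumption: undefined outside both ranges
raw['porosity'] = porosity

print(raw)
```

Comparisons with NaN are always False, so the row with the unparseable VSH falls through to the else branch. The vectorized np.select version is still likely to be faster on large frames.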