Index by condition in python-numpy?

Question

I'm trying to migrate from Matlab to Python, and I'm rewriting some of my Matlab code in Python for testing. I've installed Anaconda and am currently using the Spyder IDE. In Matlab I created a function that returns the commercial API 5L pipe diameters (diametro) and thicknesses (espesor) that are closest to the function's input parameters. I did this using a Matlab table.

Note that the input diameter (diametro_entrada) and thickness (espesor_entrada) are in meters [m], while the thicknesses inside the function are in millimeters [mm]. That's why at the end I had to multiply espesor_entrada*1000.

function tabla_seleccion=tablaAPI(diametro_entrada,espesor_entrada)
%Returns the API 5L pipe table; enter diameter in [m] and thickness
%in [m]
    Diametro_m=[0.3556;0.3556;0.3556;0.3556;0.3556;0.3556;0.3556;0.3556;0.3556;0.3556;0.3556;0.3556;0.3556;0.3556;0.3556;0.3556;0.3556;0.3556;0.3556;0.3556;0.3556;0.3556;0.3556;0.3556;...
    0.4064;0.4064;0.4064;0.4064;0.4064;0.4064;0.4064;0.4064;0.4064;0.4064;0.4064;0.4064;0.4064;0.4064;0.4064;0.4064;0.4064;0.4064;0.4064;0.4064;0.4064;0.4064;0.4064;0.4064;...
    0.4570;0.4570;0.4570;0.4570;0.4570;0.4570;0.4570;0.4570;0.4570;0.4570;0.4570;0.4570;0.4570;0.4570;0.4570;0.4570;0.4570;0.4570;0.4570;0.4570;0.4570;0.4570;0.4570;...
    0.5080;0.5080;0.5080;0.5080;0.5080;0.5080;0.5080;0.5080;0.5080;0.5080;0.5080;0.5080;0.5080;0.5080;0.5080;0.5080;0.5080;0.5080;0.5080;0.5080;0.5080;0.5080;0.5080;0.5080;...
    0.559;0.559;0.559;0.559;0.559;0.559;0.559;0.559;0.559;0.559;0.559;0.559;0.559;0.559;0.559;0.559;0.559;0.559;0.559;0.559;0.559;0.559;0.559;0.559;0.559;0.559;...
    0.610;0.610;0.610;0.610;0.610;0.610;0.610;0.610;0.610;0.610;0.610;0.610;0.610;0.610;0.610;0.610;0.610;0.610;0.610;0.610;0.610;0.610;0.610;0.610;0.610;0.610;...
    0.660;0.660;0.660;0.660;0.660;0.660;0.660;0.660;0.660;0.660;0.660;0.660;0.660;0.660;0.660;0.660;0.660;...
    0.711;0.711;0.711;0.711;0.711;0.711;0.711;0.711;0.711;0.711;0.711;0.711;0.711;0.711;0.711;0.711;0.711;...
    0.762;0.762;0.762;0.762;0.762;0.762;0.762;0.762;0.762;0.762;0.762;0.762;0.762;0.762;0.762;0.762;0.762;0.762;0.762;0.762;0.762;...
    0.813;0.813;0.813;0.813;0.813;0.813;0.813;0.813;0.813;0.813;0.813;0.813;0.813;0.813;0.813;0.813;0.813;0.813;0.813;0.813;0.813];

Espesor_mm=[4.8;5.2;5.3;5.6;6.4;7.1;7.9;8.7;9.5;10.3;11.1;11.9;12.7;14.3;15.9;17.5;19.1;20.6;22.2;23.8;25.4;27.0;28.6;31.8;...
    4.8;5.2;5.6;6.4;7.1;7.9;8.7;9.5;10.3;11.1;11.9;12.7;14.3;15.9;17.5;19.1;20.6;22.2;23.8;25.4;27.0;28.6;30.2;31.8;...
    4.8;5.6;6.4;7.1;7.9;8.7;9.5;10.3;11.1;11.9;12.7;14.3;15.9;17.5;19.1;20.6;22.2;23.8;25.4;27.0;28.6;30.2;31.8;...
    5.6;6.4;7.1;7.9;8.7;9.5;10.3;11.1;11.9;12.7;14.3;15.9;17.5;19.1;20.6;22.2;23.8;25.4;27.0;28.6;30.2;31.8;33.3;34.9;...
    5.6;6.4;7.1;7.9;8.7;9.5;10.3;11.1;11.9;12.7;14.3;15.9;17.5;19.1;20.6;22.2;23.8;25.4;27.0;28.6;30.2;31.8;33.3;34.9;36.5;38.1;...
    6.4;7.1;7.9;8.7;9.5;10.3;11.1;11.9;12.7;14.3;15.9;17.5;19.1;20.6;22.2;23.8;25.4;27.0;28.6;30.2;31.8;33.3;34.9;36.5;38.1;39.7;...
    6.4;7.1;7.9;8.7;9.5;10.3;11.1;11.9;12.7;14.3;15.9;17.5;19.1;20.6;22.2;23.8;25.4;...
    6.4;7.1;7.9;8.7;9.5;10.3;11.1;11.9;12.7;14.3;15.9;17.5;19.1;20.6;22.2;23.8;25.4;...
    6.4;7.1;7.9;8.7;9.5;10.3;11.1;11.9;12.7;14.3;15.9;17.5;19.1;20.6;22.2;23.8;25.4;27.0;28.6;30.2;31.8;...
    6.4;7.1;7.9;8.7;9.5;10.3;11.1;11.9;12.7;14.3;15.9;17.5;19.1;20.6;22.2;23.8;25.4;27.0;28.6;30.2;31.8];

TablaAPI=table(Diametro_m,Espesor_mm);
tabla_seleccion=TablaAPI(abs(TablaAPI.Diametro_m-diametro_entrada)<0.05 & abs(TablaAPI.Espesor_mm-(espesor_entrada*1000))<1.2,:);
end

Given the input diameter (diametro_entrada) and the input thickness (espesor_entrada), I get the commercial pipes whose diameter is within 0.05 m and whose thickness is within 1.2 mm of the inputs.

I want to reproduce this in Python with Numpy or another package. First I defined two Numpy arrays with the same names as in Matlab, but comma-separated instead of semicolon-separated and without the "..." at the end of each line. Then I defined another Numpy array as:

TablaAPI=numpy.array([Diametro_m,Espesor_mm])

I want to know whether I can index that array the way I did in Matlab, or whether I have to define something totally different.

Thanks a lot!

No, you don't, there's the thickness missing. You might do better to include the MATLAB approach (which I'm unlikely myself to be familiar with). searchsorted won't work here. I was originally in chem. eng. And I find it hard to believe that there's just an indexing issue going on here.
– roganjosh, Commented Jan 5, 2019 at 1:19

Daniel Scott · Accepted Answer · 2019-01-05 04:22:38Z

You sure can!

Here's an example of how you can use numpy:

Using Numpy

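import numpy as np

# Declare your Diametro_m and Espesor_mm here as 1-D arrays, just like you did in your example

# Stack the two 1-D arrays as the columns of an (n, 2) table
arr = np.column_stack((Diametro_m, Espesor_mm))

# Boolean mask: one True/False per row, combining both conditions with &
mask = ((np.abs(arr[:, 0] - diametro_entrada) < 0.05) &
        (np.abs(arr[:, 1] - espesor_entrada * 1000) < 1.2))
selection = arr[mask]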

Example usage from John Zwinck's answer
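
For a quick check of the NumPy route on a handful of rows, here is a minimal sketch. The function name `tabla_api`, its tolerance parameters `tol_d` and `tol_e`, and the five-row table are made up for illustration; the inputs are assumed to be in metres, as in the MATLAB function in the question.

```python
import numpy as np

# Hypothetical five-row excerpt of the table: diameter [m], thickness [mm]
Diametro_m = np.array([0.3556, 0.3556, 0.4064, 0.4064, 0.4570])
Espesor_mm = np.array([4.8, 6.4, 4.8, 5.6, 4.8])


def tabla_api(diametro_entrada, espesor_entrada, tol_d=0.05, tol_e=1.2):
    """Return rows close to the requested diameter [m] and thickness [m]."""
    # Stack the two 1-D arrays as columns of an (n, 2) table
    tabla = np.column_stack((Diametro_m, Espesor_mm))
    # One boolean per row; & combines the two conditions element-wise
    mask = (np.abs(tabla[:, 0] - diametro_entrada) < tol_d) & (
        np.abs(tabla[:, 1] - espesor_entrada * 1000) < tol_e
    )
    return tabla[mask]


seleccion = tabla_api(0.4, 0.005)  # 0.4 m diameter, 5 mm thickness
print(seleccion)
```

The parentheses around each comparison matter, because `&` binds more tightly than `<` in Python.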

Using Dataframes

Dataframes may also be great for your application in case you need to do heavier queries or mix column datatypes. This code should work for you, if you choose that option:

# These imports go at the top of your document
import pandas as pd

# Declare your Diametro_m and Espesor_mm here as 1-D arrays, just like you did in your example

# One single-column dataframe per array; the default integer index lines the rows up
df_d = pd.DataFrame({'Diametro_m': Diametro_m})
df_e = pd.DataFrame({'Espesor_mm': Espesor_mm})

# Merge the dataframes on their index
merged_df = pd.merge(left=df_d, right=df_e,
                     left_index=True, right_index=True,
                     how='inner')

# Now you can perform your selections like this:
selection = merged_df.loc[
    ((merged_df['Diametro_m'] - diametro_entrada).abs() < 0.05)
    & ((merged_df['Espesor_mm'] - espesor_entrada * 1000).abs() < 1.2)
]

# This "mask" of the dataframe will return all results that satisfy your query.
print(selection)

edited Jan 5, 2019 at 4:22

answered Jan 5, 2019 at 1:34

Daniel Scott

2 Comments

Azimuth Over a year ago

Hello, thanks for the reply. I can't get it to work with either option (Numpy or dataframes). I have edited my question to make it clearer, because the "e" that I put there wasn't the universal constant; it stood for "espesor". I didn't edit your answer because I don't know whether changing this is the right way to make it work the way I want. Here are the errors that came up. With the Numpy method, at arr: AxisError: axis 1 is out of bounds for array of dimension 1. With the df method: construction error passed, implied)), ValueError: Shape of passed values is (1,223), indices imply(222,222)

Daniel Scott Over a year ago

Could you post either version of your code here? That makes it much easier for us to test it ourselves and help you quickly.

ahed87 · Accepted Answer · 2019-01-06 13:07:44Z

Since you have not given an example of your expected output, I'm guessing a bit at what you are really after, but here is one version with numpy.

# rewritten arrays for numpy
Diametro_m=[0.3556,0.3556,0.3556,0.3556,0.3556,0.3556,0.3556,0.3556,0.3556,0.3556,0.3556,0.3556,0.3556,0.3556,0.3556,0.3556,0.3556,0.3556,0.3556,0.3556,0.3556,0.3556,0.3556,0.3556,
    0.4064,0.4064,0.4064,0.4064,0.4064,0.4064,0.4064,0.4064,0.4064,0.4064,0.4064,0.4064,0.4064,0.4064,0.4064,0.4064,0.4064,0.4064,0.4064,0.4064,0.4064,0.4064,0.4064,0.4064,
    0.4570,0.4570,0.4570,0.4570,0.4570,0.4570,0.4570,0.4570,0.4570,0.4570,0.4570,0.4570,0.4570,0.4570,0.4570,0.4570,0.4570,0.4570,0.4570,0.4570,0.4570,0.4570,0.4570,
    0.5080,0.5080,0.5080,0.5080,0.5080,0.5080,0.5080,0.5080,0.5080,0.5080,0.5080,0.5080,0.5080,0.5080,0.5080,0.5080,0.5080,0.5080,0.5080,0.5080,0.5080,0.5080,0.5080,0.5080,
    0.559,0.559,0.559,0.559,0.559,0.559,0.559,0.559,0.559,0.559,0.559,0.559,0.559,0.559,0.559,0.559,0.559,0.559,0.559,0.559,0.559,0.559,0.559,0.559,0.559,0.559,
    0.610,0.610,0.610,0.610,0.610,0.610,0.610,0.610,0.610,0.610,0.610,0.610,0.610,0.610,0.610,0.610,0.610,0.610,0.610,0.610,0.610,0.610,0.610,0.610,0.610,0.610,
    0.660,0.660,0.660,0.660,0.660,0.660,0.660,0.660,0.660,0.660,0.660,0.660,0.660,0.660,0.660,0.660,0.660,
    0.711,0.711,0.711,0.711,0.711,0.711,0.711,0.711,0.711,0.711,0.711,0.711,0.711,0.711,0.711,0.711,0.711,
    0.762,0.762,0.762,0.762,0.762,0.762,0.762,0.762,0.762,0.762,0.762,0.762,0.762,0.762,0.762,0.762,0.762,0.762,0.762,0.762,0.762,
    0.813,0.813,0.813,0.813,0.813,0.813,0.813,0.813,0.813,0.813,0.813,0.813,0.813,0.813,0.813,0.813,0.813,0.813,0.813,0.813,0.813]

Espesor_mm=[4.8,5.2,5.3,5.6,6.4,7.1,7.9,8.7,9.5,10.3,11.1,11.9,12.7,14.3,15.9,17.5,19.1,20.6,22.2,23.8,25.4,27.0,28.6,31.8,
    4.8,5.2,5.6,6.4,7.1,7.9,8.7,9.5,10.3,11.1,11.9,12.7,14.3,15.9,17.5,19.1,20.6,22.2,23.8,25.4,27.0,28.6,30.2,31.8,
    4.8,5.6,6.4,7.1,7.9,8.7,9.5,10.3,11.1,11.9,12.7,14.3,15.9,17.5,19.1,20.6,22.2,23.8,25.4,27.0,28.6,30.2,31.8,
    5.6,6.4,7.1,7.9,8.7,9.5,10.3,11.1,11.9,12.7,14.3,15.9,17.5,19.1,20.6,22.2,23.8,25.4,27.0,28.6,30.2,31.8,33.3,34.9,
    5.6,6.4,7.1,7.9,8.7,9.5,10.3,11.1,11.9,12.7,14.3,15.9,17.5,19.1,20.6,22.2,23.8,25.4,27.0,28.6,30.2,31.8,33.3,34.9,36.5,38.1,
    6.4,7.1,7.9,8.7,9.5,10.3,11.1,11.9,12.7,14.3,15.9,17.5,19.1,20.6,22.2,23.8,25.4,27.0,28.6,30.2,31.8,33.3,34.9,36.5,38.1,39.7,
    6.4,7.1,7.9,8.7,9.5,10.3,11.1,11.9,12.7,14.3,15.9,17.5,19.1,20.6,22.2,23.8,25.4,
    6.4,7.1,7.9,8.7,9.5,10.3,11.1,11.9,12.7,14.3,15.9,17.5,19.1,20.6,22.2,23.8,25.4,
    6.4,7.1,7.9,8.7,9.5,10.3,11.1,11.9,12.7,14.3,15.9,17.5,19.1,20.6,22.2,23.8,25.4,27.0,28.6,30.2,31.8,
    6.4,7.1,7.9,8.7,9.5,10.3,11.1,11.9,12.7,14.3,15.9,17.5,19.1,20.6,22.2,23.8,25.4,27.0,28.6,30.2,31.8]


import numpy as np

diametro_entrada = 0.4  # [m]
espesor_entrada = 5     # [mm] here, unlike the Matlab function, which takes [m] and multiplies by 1000

Diametro_m = np.array(Diametro_m)
Espesor_mm = np.array(Espesor_mm)
# Diametro_m and Espesor_mm should both have shape (223,);
# if they do not, reshape them so that they do
table = np.array([Diametro_m, Espesor_mm]).T

mask = np.where((np.abs(Diametro_m - diametro_entrada) < 0.05) &
                (np.abs(Espesor_mm - espesor_entrada) < 1.2)
                )
result = table[mask]
print('with numpy')
print(result)
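
A side note on the `np.where` call above: as far as I know it is optional here, because a boolean array can index the rows directly. A small sketch with made-up three-row data (the names `tabla` and `cond` are only for illustration):

```python
import numpy as np

# Made-up three-row table: diameter [m], thickness [mm]
tabla = np.array([[0.3556, 4.8],
                  [0.4064, 5.6],
                  [0.5080, 5.6]])

# Boolean condition, one entry per row
cond = (np.abs(tabla[:, 0] - 0.4) < 0.05) & (np.abs(tabla[:, 1] - 5) < 1.2)

via_where = tabla[np.where(cond)]  # index with the tuple of row positions
via_mask = tabla[cond]             # index with the boolean array itself

print(np.array_equal(via_where, via_mask))
```

Both forms select the same rows; the `np.where` version is mainly useful when the row positions themselves are needed.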

or you can do it with just python...

# redo with python only
# based on a simple dict and list comprehension
D_m = [0.3556, 0.4064, 0.4570, 0.5080, 0.559, 0.610, 0.660, 0.711, 0.762, 0.813]
E_mm = [[4.8,5.2,5.3,5.6,6.4,7.1,7.9,8.7,9.5,10.3,11.1,11.9,12.7,14.3,15.9,17.5,19.1,20.6,22.2,23.8,25.4,27.0,28.6,31.8],
    [4.8,5.2,5.6,6.4,7.1,7.9,8.7,9.5,10.3,11.1,11.9,12.7,14.3,15.9,17.5,19.1,20.6,22.2,23.8,25.4,27.0,28.6,30.2,31.8],
    [4.8,5.6,6.4,7.1,7.9,8.7,9.5,10.3,11.1,11.9,12.7,14.3,15.9,17.5,19.1,20.6,22.2,23.8,25.4,27.0,28.6,30.2,31.8],
    [5.6,6.4,7.1,7.9,8.7,9.5,10.3,11.1,11.9,12.7,14.3,15.9,17.5,19.1,20.6,22.2,23.8,25.4,27.0,28.6,30.2,31.8,33.3,34.9],
    [5.6,6.4,7.1,7.9,8.7,9.5,10.3,11.1,11.9,12.7,14.3,15.9,17.5,19.1,20.6,22.2,23.8,25.4,27.0,28.6,30.2,31.8,33.3,34.9,36.5,38.1],
    [6.4,7.1,7.9,8.7,9.5,10.3,11.1,11.9,12.7,14.3,15.9,17.5,19.1,20.6,22.2,23.8,25.4,27.0,28.6,30.2,31.8,33.3,34.9,36.5,38.1,39.7],
    [6.4,7.1,7.9,8.7,9.5,10.3,11.1,11.9,12.7,14.3,15.9,17.5,19.1,20.6,22.2,23.8,25.4],
    [6.4,7.1,7.9,8.7,9.5,10.3,11.1,11.9,12.7,14.3,15.9,17.5,19.1,20.6,22.2,23.8,25.4],
    [6.4,7.1,7.9,8.7,9.5,10.3,11.1,11.9,12.7,14.3,15.9,17.5,19.1,20.6,22.2,23.8,25.4,27.0,28.6,30.2,31.8],
    [6.4,7.1,7.9,8.7,9.5,10.3,11.1,11.9,12.7,14.3,15.9,17.5,19.1,20.6,22.2,23.8,25.4,27.0,28.6,30.2,31.8]]

table2 = dict(zip(D_m, E_mm))
result2 = []
for D, E in table2.items():
    if abs(D - diametro_entrada) < 0.05:
        Et = [t for t in E if abs(t - espesor_entrada) < 1.2]
        result2 += [(D, t) for t in Et]
print('with vanilla python')
print('\n'.join((str(r) for r in result2)))

Once you are in python there are endless ways to do this; you could easily do the same with pandas or sqlite. My personal preference leans towards as few dependencies as possible, so in this case I would use a csv file as input and do it without numpy. If it were a truly large-scale problem, I would consider sqlite/numpy/pandas.
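
The csv route mentioned above could look roughly like the sketch below, using only the standard library. The column names `diametro_m` and `espesor_mm`, the helper `seleccionar`, and the four data rows are assumptions for illustration; an in-memory string stands in for a real file so the example runs as-is.

```python
import csv
import io

# Stand-in for a real CSV file; the column names and rows are illustrative only
csv_text = """diametro_m,espesor_mm
0.3556,4.8
0.3556,6.4
0.4064,5.6
0.4570,4.8
"""


def seleccionar(filas, diametro_entrada, espesor_entrada_mm):
    """Keep (diameter, thickness) pairs within the tolerances used above."""
    return [
        (d, e)
        for d, e in filas
        if abs(d - diametro_entrada) < 0.05 and abs(e - espesor_entrada_mm) < 1.2
    ]


# csv.DictReader yields one dict of strings per row, so convert to float
reader = csv.DictReader(io.StringIO(csv_text))
filas = [(float(r["diametro_m"]), float(r["espesor_mm"])) for r in reader]

resultado = seleccionar(filas, 0.4, 5)
print(resultado)
```

With a real file, `io.StringIO(csv_text)` would be replaced by `open(path, newline="")`, where `path` is wherever the table is stored.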

Good luck with the transition, I don't think you will regret it.

Thanks a lot, this was what I was looking for. I tried the Numpy approach and it works like a charm.
Nino problemos. Btw, when I copied your data, did a search-and-replace of the semicolons with commas, and passed it directly to np.array, it took some tricks to get it working. I therefore made a python-friendly dataset in the answer; maybe that is what tripped up the earlier answer. In any case, the normal shape in the numpy world for a 1d array is (m,), not (m,1); I have seen that trip up some people used to matlab when they start using numpy.
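
To make the shape remark concrete, here is a small sketch with made-up values. It shows the (m,) versus (m, 1) difference and why concatenating two 1-D arrays along `axis=1` fails with the kind of error reported in the comments on the first answer.

```python
import numpy as np

d = np.array([0.3556, 0.4064, 0.4570])   # shape (3,): the usual 1-D array
e = np.array([4.8, 5.6, 6.4])

print(d.shape)                 # 1-D, one axis only
print(d.reshape(-1, 1).shape)  # explicit column, closer to a MATLAB column vector

# 1-D arrays have no axis 1, so this raises an error instead of building columns
try:
    np.concatenate((d, e), axis=1)
    failed = False
except ValueError as err:      # AxisError is a subclass of ValueError
    failed = True
    print(type(err).__name__, err)

# column_stack treats each 1-D array as one column
tabla = np.column_stack((d, e))
print(tabla.shape)
```

If the arrays really are (m, 1) columns, `np.concatenate(..., axis=1)` or `np.hstack` will join them side by side; for plain (m,) arrays, `np.column_stack` is the simpler choice.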
