I have a large catalog that I am selecting data from according to the following criteria:
columns = ["System", "rp", "mp", "logg"]
catalog = pd.read_csv('data.txt', skiprows=1, sep ='\s+', names=columns)
# CUTS
i = (catalog.rp != -1) & (catalog.mp != -1)
new_catalog = pd.DataFrame(catalog[i])
print("{0} targets after cuts".format(len(new_catalog)))
When I perform the above cuts the code is working fine. Next, I want to add one more cut: I want to select all the targets that have 4.0 < logg < 5.0. However, some of the targets have logg = -1 (which stands for the fact that the value is not available). Luckily, I can calculate logg from the other available parameters. So here is my updated cuts:
# CUTS
i = (catalog.rp != -1) & (catalog.mp != -1)
if catalog.logg[i] == -1:
catalog.logg[i] = catalog.mp[i] / catalog.rp[i]
i &= (4 <= catalog.logg) & (catalog.logg <= 5)
However, I am receiving an error:
if catalog.logg[i] == -1:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Can someone please explain what I am doing wrong and how I can fix it. Thank you
Edit 1
My dataframe looks like the following:
Data columns:
System 477 non-null values
rp 477 non-null values
mp 477 non-null values
logg 477 non-null values
dtypes: float64(37), int64(3), object(3)None
Edit 2
System rp mp logg FeH FeHu FeHl Mstar Mstaru Mstarl
0 target-01 5196 24 24 0.31 0.04 0.04 0.905 0.015 0.015
1 target-02 5950 150 150 -0.30 0.25 0.25 0.950 0.110 0.110
2 target-03 5598 50 50 0.04 0.05 0.05 0.997 0.049 0.049
3 target-04 6558 44 -1 0.14 0.04 0.04 1.403 0.061 0.061
4 target-05 6190 60 60 0.05 0.07 0.07 1.194 0.049 0.050
....
[5 rows x 43 columns]
Edit 3
My code in a format that I understand should be:
for row in range(len(catalog)):
parameter = catalog['logg'][row]
if parameter == -1:
parameter = catalog['mp'][row] / catalog['rp'][row]
if parameter > 4.0 and parameter < 5.0:
# select this row for further analysis
However, I am trying to write my code in a more simple and professional way. I don't want to use the for loop. How can I do it?
EDIT 4
Consider the following small example:
System rp mp logg
target-01 2 -1 2 # will NOT be selected since mp = -1
target-02 -1 3 4 # will NOT be selected since rp = -1
target-03 7 6 4.3 # will be selected since mp != -1, rp != -1, and 4 < logg <5
target-04 3.2 15 -1 # will be selected since mp != -1, rp != -1, logg = mp / rp = 15/3.2 = 4.68 (which is between 4 and 5)
dfhas more columns than the one I posted. I removed them for simplicity.mp[i]andrp[i]? Should it be ascatalog.mp[i]andcatalog.rp[i]?describeoutput but could you show actual data? Likedf.head(10)?