I have a 1D "from"-array (call it "frm") containing values with an associated Boolean mask-array: "mask" (same shape as frm). Then I have a third "replace" array: "repl", also 1D but shorter in length than the other two.

With these, I would like to generate a new array ("to") which contains the frm values, except where mask==True, in which case it should take the values from repl in order. (Note that the number of True elements in mask equals the length of repl.)

I was looking for a "clever" numpy way of implementing this? I looked at methods like np.where, np.take, np.select, np.choose but none seem to "fit the bill"?

"Cutting to the code", here's what I have thus far. It works fine but doesn't seem "Numpythonic"? (or even Pythonic for that matter)

frm  = [1, 2, 3, 4, 5]
mask = [False, True, False, True, True]
repl = [200, 400, 500]
i = 0; to = []
for f,m in zip(frm,mask):
    if m:
        to.append(repl[i])
        i += 1
    else:
        to.append(f)
print(to)

Yields: [1, 200, 3, 400, 500]

(Background: the reason I need to do this is that I'm subclassing the Pandas pd.DataFrame class and need a "setter" for the Columns/Index. As a pd.Index does not support item assignment, I need to first copy the index/column array, replace some of the elements in the copy based on the mask, and then have the setter set the complete new value. Let me know if anyone knows a more elegant solution to this.)

1 Answer

numpy solution:

It's pretty straightforward, like this:

import numpy as np

# convert frm to a numpy array:
frm = np.array(frm)
# create a copy of frm so you don't modify the original array:
to = frm.copy()

# index "to" with the Boolean mask and assign the replacement values in order:
to[mask] = repl

Then to contains:

>>> to
array([  1, 200,   3, 400, 500])
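One caveat with the mask assignment: NumPy broadcasts a length-1 repl across every True position, so a length mismatch is not always reported. Here is a minimal sketch of the same idea wrapped in a function that checks the lengths first. The name replace_masked is hypothetical (it is not a NumPy function), and the checks are my assumption about what you would want.

```python
import numpy as np

def replace_masked(frm, mask, repl):
    """Return a copy of frm with the masked positions replaced, in order, by repl.

    Hypothetical helper for illustration; not part of NumPy.
    """
    frm = np.asarray(frm)
    mask = np.asarray(mask, dtype=bool)
    repl = np.asarray(repl)
    if mask.shape != frm.shape:
        raise ValueError("mask must have the same shape as frm")
    # Require exactly one replacement value per True entry in the mask,
    # so that a single value is not silently broadcast.
    if repl.shape != (int(mask.sum()),):
        raise ValueError("repl must have one value per True entry in mask")
    to = frm.copy()      # leave the input untouched
    to[mask] = repl      # Boolean-mask assignment fills the True slots in order
    return to

frm = [1, 2, 3, 4, 5]
mask = [False, True, False, True, True]
repl = [200, 400, 500]
result = replace_masked(frm, mask, repl)
print(result.tolist())
```

With the question's data this gives the same values as the loop version, and frm itself is left unchanged.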

pandas solution:

If your DataFrame looks like:

>>> df
   column
0       1
1       2
2       3
3       4
4       5

Then you can use loc:

df.loc[mask,'column'] = repl

Then your dataframe looks like:

>>> df
   column
0       1
1     200
2       3
3     400
4     500

3 Comments

Re the numpy solution: Nice! Here I am looking for "special methods" completely overlooking the fact that I can simply assign to a variable using the mask for the indexing! :-)
Re the Pandas solution: I'm aware of using "loc" for this on the contents of a DataFrame. As far as I can tell, there's no equivalent for the Axes (the "index" and "column" names, not the actual values inside the dataframe). For example: df.columns[3] works. But df.columns[3] = "new-name" gives TypeError: "Index does not support mutable operations" (which prompted me to take it into numpy for the solution).
Oh I guess I misunderstood what you were going for... yeah it's probably best to do it in numpy and use the resulting array as your index (IIUC)
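For the column-label case discussed in these comments, a rough sketch of the "do it in numpy and assign the result back" approach could look like the following. The DataFrame, its labels, and the names new_cols, mask and repl are made up for illustration; the pattern assumes you are happy to replace df.columns wholesale.

```python
import numpy as np
import pandas as pd

# Illustrative data only; the labels are invented for this example.
df = pd.DataFrame([[1, 2, 3, 4]], columns=["a", "b", "c", "d"])
mask = np.array([False, True, False, True])
repl = ["B", "D"]

# A pd.Index is immutable, so work on a writable copy of the labels.
new_cols = df.columns.to_numpy(dtype=object, copy=True)
new_cols[mask] = repl      # replace the masked labels in order
df.columns = new_cols      # assign the complete new set of labels

print(list(df.columns))
```

The same pattern should apply to df.index, since it is also a pd.Index.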
