
I'm trying to extract values at specified indices from the arrays stored in each row of a specific column.

As a dummy example, suppose my dataframe has a column called 'arr', where each array below is one row:

[1, 2, 3, 4, 5]

[6, 7, 8, 9, 10]

[11, 12, 13, 14, 15]

[16, 17, 18, 19, 20]

I've tried:

for row in df.itertuples(): 
    i1 = [0,1,2]
    r1 = np.array(df.arr)[i1]

    i2 = [2,3]
    r2 = np.array(df.arr)[i2]

which gives rows 0, 1 and 2 of the dataframe instead of elements from within each row.

And I've tried:

for row in df.itertuples(): 
    i1 = [0,1,2]
    r1 = np.array(row.arr)[i1]

    i2 = [2,3]
    r2 = np.array(row.arr)[i2]

which gives the values from only the last row. I don't understand why.

What I want are the elements at the indices specified in i1 and i2, for each row, stored in two different variables (r1 and r2). So-

r1 should give-

[1, 2, 3]

[6, 7, 8]

[11, 12, 13]

[16, 17, 18]

And r2 should give-

[3, 4]

[8, 9]

[13, 14]

[18, 19]

I've also used iterrows() with no luck.

  • Does this answer your question? How to iterate over rows in a DataFrame in Pandas? Commented May 31, 2020 at 18:00
  • No, because the answers in the link don't talk about extracting values from arrays for each row. Commented May 31, 2020 at 18:04
  • @Sp_95 do you want two columns for each row, where 1st column contains the df[a] whereas 2nd column contains df[b] ? Commented May 31, 2020 at 18:07
  • I would like 2 columns containing the values of r1 and r2. So basically the extracted elements. Commented May 31, 2020 at 18:09
  • Please post an example of your desired output Commented May 31, 2020 at 18:10

2 Answers

1

If you want columns r1 and r2 in the same dataframe, you can use:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
# Build a column that holds one list per row
df['arr'] = df[['b', 'c', 'd', 'e']].values.tolist()
# Slice each row's list to keep only the wanted elements
df['r1'] = df['arr'].apply(lambda x: x[0:3])
df['r2'] = df['arr'].apply(lambda x: x[2:4])

I have applied a lambda that does the work. Is this what you want?

If you want a new dataframe with columns r1 and r2, you can use:

from operator import itemgetter

import numpy as np
import pandas as pd

# Index lists kept in variables instead of hard-coded slices
a = [0, 1, 2]
b = [2, 3]
df = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
df['arr'] = df[['b', 'c', 'd', 'e']].values.tolist()
data = pd.DataFrame()
# itemgetter(*a)(x) picks the elements of x at the positions listed in a
data['r1'] = df['arr'].apply(lambda x: itemgetter(*a)(x))
data['r2'] = df['arr'].apply(lambda x: itemgetter(*b)(x))
data

Does this edit help you?
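For reference, here is a minimal sketch of the same idea applied to the example data from the question, keeping the index lists in the variables i1 and i2. The helper name `pick` is hypothetical, and a list comprehension is swapped in for itemgetter so that each result stays a list rather than a tuple.

```python
import pandas as pd

# Example data from the question: one list per row in column 'arr'.
df = pd.DataFrame({
    "arr": [
        [1, 2, 3, 4, 5],
        [6, 7, 8, 9, 10],
        [11, 12, 13, 14, 15],
        [16, 17, 18, 19, 20],
    ]
})

i1 = [0, 1, 2]
i2 = [2, 3]


def pick(values, indices):
    """Return the elements of `values` at the given positions (hypothetical helper)."""
    return [values[i] for i in indices]


# The index lists are passed in as variables, not hard-coded slices.
df["r1"] = df["arr"].apply(lambda x: pick(x, i1))
df["r2"] = df["arr"].apply(lambda x: pick(x, i2))
```

Because `pick` only uses plain integer indexing, it should work whether each row holds a list or a NumPy array.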


4 Comments

If I have another array from which I pull the indices and store them in different variables, how can I apply that to the lambda function? In other words, is there a way to avoid explicitly specifying [0:3] and [2:4] and instead use the variables where I saved the indices (which in my case are i1 and i2)?
The output is exactly what I want, except I don't want to hard-code those indices into the solution.
Thanks for your solution, but what I'm looking for is this: is there a way to keep i1=[0, 1, 2] and pass i1 to the lambda function instead of coding it as [0:3] for r1, and similarly for r2?
@Sp_95 see the new edit in the answer. I have used a and b directly instead of slicing, so this should solve your problem.
1

Try:

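import numpy as np

i1, i2 = [0, 1, 2], [2, 3]
number_rows = 4
r1, r2 = np.zeros((number_rows, 3)), np.zeros((number_rows, 2))
# Assumes each entry of df.arr is a NumPy array, so it accepts a list of indices
arr = np.array(df.arr)
for i in range(number_rows):
    # Store each row's selection in its own slot instead of overwriting it
    r1[i] = arr[i][i1]
    r2[i] = arr[i][i2]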

The problem with your first attempt is that indexing an array such as np.array(df.arr) with a single list of indices selects along the first axis, so it returns the whole row for each index.

In your second attempt, you actually get the results you want for each row, but every iteration overwrites the results of the earlier rows, so you are left with only the values of the last row. You can fix this by inserting the results of each row into your result arrays, as done above.
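As a rough illustration of both points, the sketch below uses the example data from the question and assumes each row holds a plain list. The names `r1_rows`, `r2_rows`, `arr2d`, `r1_vec` and `r2_vec` are made up for this example. It first collects one result per row instead of overwriting, then shows a loop-free variant that selects columns from a true 2D array.

```python
import numpy as np
import pandas as pd

# Example data from the question: one list per row in column 'arr'.
df = pd.DataFrame({
    "arr": [
        [1, 2, 3, 4, 5],
        [6, 7, 8, 9, 10],
        [11, 12, 13, 14, 15],
        [16, 17, 18, 19, 20],
    ]
})
i1, i2 = [0, 1, 2], [2, 3]

# Collect per-row results instead of overwriting r1 and r2 on each pass.
r1_rows, r2_rows = [], []
for row in df.itertuples():
    values = np.array(row.arr)  # an ndarray can be indexed with a list
    r1_rows.append(values[i1])
    r2_rows.append(values[i2])
r1 = np.array(r1_rows)
r2 = np.array(r2_rows)

# Loop-free variant: build a 2D array, then pick columns for every row at once.
arr2d = np.array(df["arr"].tolist())
r1_vec = arr2d[:, i1]
r2_vec = arr2d[:, i2]
```

The column selection `arr2d[:, i1]` only works when every row has the same length, so that the lists stack into a regular 2D array.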

2 Comments

Ah, I see what you mean by overwriting previous results. I tried out your solution, but it gives the error 'list indices must be integers or slices, not list'.
Hm, that is weird, because it works for me. Maybe you forgot to convert your array to a numpy array: numpy array indices can be lists, whereas list indices cannot be lists. But as I see, the other solution worked for you, which is actually more elegant ;)
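The difference described in the last comment can be reproduced in isolation. This is a small sketch with made-up variable names (`row_as_list`, `error_message`, `selected`), not code from the thread: a plain list rejects a list of indices, while the same data converted to a NumPy array accepts it.

```python
import numpy as np

row_as_list = [1, 2, 3, 4, 5]
i1 = [0, 1, 2]

# Indexing a plain Python list with another list raises TypeError.
try:
    row_as_list[i1]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# After conversion to an ndarray, the same index list selects elements.
selected = np.array(row_as_list)[i1]
```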
