Extract a sub-array from an array according to a condition in Python

Question

I would like to extract some rows and columns (a kind of sub-array) based on a condition.

Here is an example of the input:

[["00:00:01","data_update","data1",10.5,"blabla"],
 ["00:00:02","proc_call","xxx","xxx","blalla"],
 ["00:00:15","data_update","data2",34.5,"blabla"],
 ["00:00:25","proc_call","xxx","xxx","blalla"]]

Desired output (keep only the "data_update" rows, with columns 0, 2 and 3):

[["00:00:01","data1",10.5],
 ["00:00:15","data2",34.5]]

Is there a simple way to do that in Python?

FloLie · Accepted Answer · 2020-05-15 07:26:45Z

4

You can either use a for loop like this:

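reduced_array = []

for i in range(len(full_array)):
  # keep only rows whose second column is 'data_update'
  if full_array[i][1] == 'data_update':
    # i is an index, so the row has to be read through full_array[i]
    reduced_array.append([full_array[i][0], full_array[i][2], full_array[i][3]])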

or a list comprehension:

reduced_array = [[i[0],i[2],i[3]] for i in full_array if i[1] == 'data_update']

If you need to handle more columns, you could also use:

cols = [0,2,3]
reduced_array = [[i[col] for col in cols] for i in full_array if i[1] == 'data_update']
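The comprehension above can be wrapped in a small helper so that the columns and the condition become parameters. This is only a sketch run against the sample data from the question; the name select_rows and its parameters are made up for illustration.

```python
def select_rows(rows, cols, key_col, key_value):
    """Return the chosen columns of every row whose key_col equals key_value."""
    # select_rows is a hypothetical helper, not part of any library
    return [[row[c] for c in cols] for row in rows if row[key_col] == key_value]


full_array = [
    ["00:00:01", "data_update", "data1", 10.5, "blabla"],
    ["00:00:02", "proc_call", "xxx", "xxx", "blalla"],
    ["00:00:15", "data_update", "data2", 34.5, "blabla"],
    ["00:00:25", "proc_call", "xxx", "xxx", "blalla"],
]

reduced_array = select_rows(full_array, cols=[0, 2, 3], key_col=1, key_value="data_update")
print(reduced_array)
# [['00:00:01', 'data1', 10.5], ['00:00:15', 'data2', 34.5]]
```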

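adnanmuttaleb's answer uses map and filter with lambda functions instead. In the timing below it looks far faster than my list comprehension, but note that in Python 3 map returns a lazy iterator, so the measurement covers only creating the iterator and not the filtering itself; wrap the call in list() for a like-for-like comparison. It can also be harder to read if you are not familiar with the concept. For completeness, and without wanting to take credit for his answer, I add it here.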

reduced_array = map(lambda sub: [sub[i] for i in cols], filter(lambda sub: "data_update" in sub, full_array))

Runtime comparison:

import random as rd
import time

# 1,000,000 random rows; roughly 20% have "data_update" in column 1
full_array = [[rd.random(),"data_update" if rd.random()< 0.2 else "no",rd.random(),rd.random()] for i in range(1000000)]
cols = [0,2,3]

# map/filter with lambdas (in Python 3 this only builds a lazy iterator; no rows are processed yet)
start1 = time.time()
reduced_array = map(lambda sub: [sub[i] for i in cols], filter(lambda sub: "data_update" in sub, full_array))
print(time.time()-start1)

# list comprehension (builds the complete list immediately)
start2 = time.time()
reduced_array2 = [[i[col] for col in cols] for i in full_array if i[1] == 'data_update']
print(time.time()-start2)

This prints (times in seconds):

#Lambda function:
0.004003286361694336
#List comprehension
0.254199743270874
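Because map is lazy in Python 3, a fairer comparison has to materialize both results. The following is a minimal sketch of such a measurement using timeit; the function names, the reduced row count and the fixed seed are choices made for this example, and both variants test column 1 so that they do the same work. No timings are quoted here since they depend on the machine and the Python version.

```python
import random
import timeit

random.seed(0)  # fixed seed so repeated runs use the same data

# 100,000 random rows; roughly 20% have "data_update" in column 1
full_array = [
    [random.random(), "data_update" if random.random() < 0.2 else "no",
     random.random(), random.random()]
    for _ in range(100_000)
]
cols = [0, 2, 3]


def with_map_filter():
    # list(...) forces the lazy map/filter chain to actually run
    return list(map(lambda sub: [sub[i] for i in cols],
                    filter(lambda sub: sub[1] == "data_update", full_array)))


def with_comprehension():
    return [[row[c] for c in cols] for row in full_array if row[1] == "data_update"]


# both approaches must produce identical output before timing them
assert with_map_filter() == with_comprehension()

runs = 5
for fn in (with_map_filter, with_comprehension):
    seconds = timeit.timeit(fn, number=runs)
    print(f"{fn.__name__}: {seconds / runs:.4f} s per run")
```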

edited May 15, 2020 at 7:26

answered May 14, 2020 at 9:04

FloLie


2 Comments

sleli Over a year ago

Thank you very much, it does the job perfectly and I really appreciate all the explanations

sleli Over a year ago

looks like the first example should refer to full_array[i][1] and not i[1]

adnanmuttaleb · Accepted Answer · 2020-05-14 09:11:13Z

1

For Inputs:

l = [["00:00:01","data_update","data1",10.5,"blabla"],
 ["00:00:02","proc_call","xxx","xxx","blalla"],
 ["00:00:15","data_update","data2",34.5,"blabla"],
 ["00:00:25","proc_call","xxx","xxx","blalla"]]

cols = (0, 2, 3)

Do:

result = map(lambda sub: [sub[i] for i in cols], filter(lambda sub: "data_update" in sub, l))
print(list(result))

Output:

[['00:00:01', 'data1', 10.5], ['00:00:15', 'data2', 34.5]]
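One detail worth noting: "data_update" in sub tests every column of the row, whereas sub[1] == "data_update" tests only the second one. The sketch below shows the difference; the second row is a made-up example that carries the marker in its last column.

```python
rows = [
    ["00:00:01", "data_update", "data1", 10.5, "blabla"],
    # hypothetical row: the marker appears in the last column, not in column 1
    ["00:00:02", "proc_call", "xxx", "xxx", "data_update"],
]
cols = (0, 2, 3)

# membership test: matches the marker in any column
any_column = [[r[i] for i in cols] for r in rows if "data_update" in r]

# positional test: matches only when column 1 holds the marker
second_column = [[r[i] for i in cols] for r in rows if r[1] == "data_update"]

print(any_column)     # [['00:00:01', 'data1', 10.5], ['00:00:02', 'xxx', 'xxx']]
print(second_column)  # [['00:00:01', 'data1', 10.5]]
```

If the marker can only ever appear in one column, both tests give the same result and the positional one avoids scanning the whole row.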

answered May 14, 2020 at 9:11

adnanmuttaleb


Zhd Zilin · Accepted Answer · 2020-05-14 09:16:46Z

0

result = filter(lambda x: "data_update" in x, a)
result = [[item[0],item[2],item[3]] for item in result]

The first line finds all rows that contain "data_update". The second line rebuilds the result with the three columns you need.
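Keep in mind that in Python 3 filter returns a lazy iterator that can be consumed only once. A short sketch, assuming a holds the sample rows from the question:

```python
a = [
    ["00:00:01", "data_update", "data1", 10.5, "blabla"],
    ["00:00:02", "proc_call", "xxx", "xxx", "blalla"],
    ["00:00:15", "data_update", "data2", 34.5, "blabla"],
    ["00:00:25", "proc_call", "xxx", "xxx", "blalla"],
]

result = filter(lambda x: "data_update" in x, a)

# the first pass consumes the iterator
first_pass = [[item[0], item[2], item[3]] for item in result]

# the second pass finds nothing left to iterate over
second_pass = [[item[0], item[2], item[3]] for item in result]

print(first_pass)   # [['00:00:01', 'data1', 10.5], ['00:00:15', 'data2', 34.5]]
print(second_pass)  # []
```

Wrapping the filter call in list() avoids this if the filtered rows are needed more than once.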

answered May 14, 2020 at 9:16

Zhd Zilin


FloLie · Accepted Answer · 2020-05-14 09:27:04Z

0

What about looping through the list?

needle = 'data_update'
haystack = [
    ["00:00:01","data_update","data1",10.5,"blabla"],
    ["00:00:02","proc_call","xxx","xxx","blalla"],
    ["00:00:15","data_update","data2",34.5,"blabla"],
    ["00:00:25","proc_call","xxx","xxx","blalla"]
]

container = []
for x in range(len(haystack)):
    if needle in haystack[x]:
        container.append([haystack[x][0], haystack[x][2], haystack[x][3]])

This loops through each row in the list and tests whether your needle is present in that row. If it is, it appends the requested columns to a new output container.
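The same loop can also iterate over the rows directly instead of over indices. The sketch below additionally uses operator.itemgetter from the standard library to pick the columns and compares only column 1 rather than searching the whole row; the name pick is chosen for this example.

```python
from operator import itemgetter

needle = "data_update"
haystack = [
    ["00:00:01", "data_update", "data1", 10.5, "blabla"],
    ["00:00:02", "proc_call", "xxx", "xxx", "blalla"],
    ["00:00:15", "data_update", "data2", 34.5, "blabla"],
    ["00:00:25", "proc_call", "xxx", "xxx", "blalla"],
]

# itemgetter(0, 2, 3) returns a callable that yields a tuple of those fields
pick = itemgetter(0, 2, 3)

container = []
for row in haystack:
    if row[1] == needle:  # compare one column instead of scanning the row
        container.append(list(pick(row)))

print(container)
# [['00:00:01', 'data1', 10.5], ['00:00:15', 'data2', 34.5]]
```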

edited May 14, 2020 at 9:27

FloLie

answered May 14, 2020 at 9:24

user1952604

1 Comment

FloLie Over a year ago

Correct result, but the loop is slower than both the list comprehension and the lambda version, an answer had already been provided, and x in y is a computationally expensive search.
