0

I would like to extract some rows and columns (a kind of sub-array) based on conditions.

Here is an example of the input:

[["00:00:01","data_update","data1",10.5,"blabla"],
 ["00:00:02","proc_call","xxx","xxx","blalla"],
 ["00:00:15","data_update","data2",34.5,"blabla"],
 ["00:00:25","proc_call","xxx","xxx","blalla"]]

Desired output (keep only the "data_update" rows, with columns 0, 2 and 3):

[["00:00:01","data1",10.5],
 ["00:00:15","data2",34.5]]

Is there a simple way to do that in Python?

4 Answers

4

You can either use a for loop like this:

reduced_array = []

# iterate over the rows directly; each row is itself a list
for row in full_array:
  if row[1] == 'data_update':
    reduced_array.append([row[0], row[2], row[3]])

or a list comprehension:

reduced_array = [[i[0],i[2],i[3]] for i in full_array if i[1] == 'data_update']

If you need to handle more columns, you could also use:

cols = [0,2,3]
reduced_array = [[i[col] for col in cols] for i in full_array if i[1] == 'data_update']
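
As a runnable sketch of the same idea on the data from the question: the helper name select_rows and its key_col parameter are my own illustrative choices, not part of the original answer.

```python
def select_rows(rows, key, cols, key_col=1):
    """Keep rows whose key_col equals key, reduced to the columns in cols."""
    return [[row[c] for c in cols] for row in rows if row[key_col] == key]


full_array = [
    ["00:00:01", "data_update", "data1", 10.5, "blabla"],
    ["00:00:02", "proc_call", "xxx", "xxx", "blalla"],
    ["00:00:15", "data_update", "data2", 34.5, "blabla"],
    ["00:00:25", "proc_call", "xxx", "xxx", "blalla"],
]

reduced_array = select_rows(full_array, "data_update", [0, 2, 3])
print(reduced_array)
# [['00:00:01', 'data1', 10.5], ['00:00:15', 'data2', 34.5]]
```

Wrapping the comprehension in a function keeps the filter value and the column list in one place if the same extraction is needed more than once.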

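With regard to adnanmuttaleb's answer: in my timing below, the lambda version looks much faster than the list comprehension I proposed, although it is also harder to read for someone not familiar with the concept. Keep in mind that in Python 3 map and filter are lazy, so the timed line only builds an iterator and does no filtering until it is consumed, for example with list(). For completeness, and without wanting to take credit for his answer, I add it here.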

reduced_array = map(lambda sub: [sub[i] for i in cols], filter(lambda sub: "data_update" in sub, full_array))

Runtime comparison:

import random as rd
import time

full_array = [[rd.random(),"data_update" if rd.random()< 0.2 else "no",rd.random(),rd.random()] for i in range(1000000)]
cols = [0,2,3]

start1 = time.time()
reduced_array = map(lambda sub: [sub[i] for i in cols], filter(lambda sub: "data_update" in sub, full_array))
print(time.time()-start1)

start2 = time.time()
reduced_array2 = [[i[col] for col in cols] for i in full_array if i[1] == 'data_update']
print(time.time()-start2)

results in

#Lambda function:
0.004003286361694336
#List comprehension
0.254199743270874
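
The map(...) result in the timing above is never consumed, so the first number mostly reflects creating the iterator. A fairer comparison might look like the sketch below. It makes a few assumptions of my own: the result is materialised with list(), both variants test column 1 with == so they do the same work, the array is smaller (100,000 rows) and timeit is used instead of time.time(). Timings will vary by machine, so none are shown.

```python
import random
import timeit

rows = [
    [random.random(), "data_update" if random.random() < 0.2 else "no",
     random.random(), random.random()]
    for _ in range(100_000)
]
cols = [0, 2, 3]


def with_map_filter():
    # list() forces the lazy map/filter pipeline to actually run
    return list(map(lambda sub: [sub[i] for i in cols],
                    filter(lambda sub: sub[1] == "data_update", rows)))


def with_comprehension():
    return [[row[c] for c in cols] for row in rows if row[1] == "data_update"]


# both approaches must produce the same rows before comparing their speed
assert with_map_filter() == with_comprehension()

print("map/filter:   ", timeit.timeit(with_map_filter, number=5))
print("comprehension:", timeit.timeit(with_comprehension, number=5))
```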

2 Comments

Thank you very much, it does the job perfectly and I really appreciate all the explanations
looks like the first example should refer to full_array[i][1] and not i[1]
1

For Inputs:

l = [["00:00:01","data_update","data1",10.5,"blabla"],
 ["00:00:02","proc_call","xxx","xxx","blalla"],
 ["00:00:15","data_update","data2",34.5,"blabla"],
 ["00:00:25","proc_call","xxx","xxx","blalla"]]

cols = (0, 2, 3)

Do:

result = map(lambda sub: [sub[i] for i in cols], filter(lambda sub: "data_update" in sub, l))
print(list(result))

Output:

[['00:00:01', 'data1', 10.5], ['00:00:15', 'data2', 34.5]]
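
One thing to be aware of: "data_update" in sub checks every column of the row, not only column 1. On the question's data this gives the same result, but the two tests can differ. The second row in this small sketch is hypothetical, made up to show the difference:

```python
rows = [
    ["00:00:01", "data_update", "data1", 10.5, "blabla"],
    # hypothetical row: the marker appears in the last column, not in column 1
    ["00:00:02", "proc_call", "xxx", "xxx", "data_update"],
]
cols = (0, 2, 3)

by_membership = [[r[i] for i in cols] for r in rows if "data_update" in r]
by_column = [[r[i] for i in cols] for r in rows if r[1] == "data_update"]

print(by_membership)  # both rows are kept
print(by_column)      # only the first row is kept
```

If the event type always lives in column 1, comparing that column directly is probably the safer test.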

0
result = filter(lambda x: "data_update" in x, a)
result = [[item[0],item[2],item[3]] for item in result]

The first line keeps every row that contains "data_update". The second line rebuilds each remaining row with only the 3 columns you need.

0

What about looping through the list?

needle = 'data_update'
haystack = [
    ["00:00:01","data_update","data1",10.5,"blabla"],
    ["00:00:02","proc_call","xxx","xxx","blalla"],
    ["00:00:15","data_update","data2",34.5,"blabla"],
    ["00:00:25","proc_call","xxx","xxx","blalla"]
]

container = []
for x in range(len(haystack)):
    if needle in haystack[x]:
        container.append([haystack[x][0], haystack[x][2], haystack[x][3]])

This loops through each row in the list and tests whether your needle is present in it. If it is, it appends a new row to the output container, made up of only the columns that you asked for.
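
The same loop can also be written without indexing, by iterating over the rows and unpacking them. This is only a sketch: it assumes every row has exactly five fields, the variable names are my own, and it compares the second field with == instead of searching the whole row.

```python
needle = "data_update"
haystack = [
    ["00:00:01", "data_update", "data1", 10.5, "blabla"],
    ["00:00:02", "proc_call", "xxx", "xxx", "blalla"],
    ["00:00:15", "data_update", "data2", 34.5, "blabla"],
    ["00:00:25", "proc_call", "xxx", "xxx", "blalla"],
]

container = []
# unpack each row into named fields; the last field is not needed
for timestamp, event, name, value, _ in haystack:
    if event == needle:
        container.append([timestamp, name, value])

print(container)
# [['00:00:01', 'data1', 10.5], ['00:00:15', 'data2', 34.5]]
```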

1 Comment

Correct result, but first, the loop is slower than both the list comprehension and the lambda approach; second, an answer had already been provided; and third, x in y is a linear search over the whole row, which costs more than comparing a single column.
