
I have a dataset whose rows are C/C++ functions. I want to take each function, split it into a list of words (A), and append each list A to a list B, producing a list of lists in Python.

So far I have been using the code below, but my dataset has 128,312 items and it is slow.

Can we improve this? If yes, I am open to suggestions.

functionSourceDF = hdf.get('functionSource')

.
.
.

FSDarray = []
for i in range(size):
    FSDarray.append(functionSourceDF[i].split(" "))
FSDarray = np.array(FSDarray)

Thank you.

Do you know where you're spending the time? In the split? In getting elements from the hdf data structure? I would start by trying to profile the code, either by using a profiler or by removing one piece at a time and seeing the performance impact. Commented Jul 1, 2020 at 6:20
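Following the commenter's suggestion, here is a minimal profiling sketch using the standard library's `cProfile`. The `functionSourceDF` list below is a hypothetical stand-in for the real hdf dataset, and `build_word_lists` is a hypothetical wrapper around the original loop so the profiler can attribute time to it:

```python
import cProfile
import io
import pstats

# Hypothetical stand-in for the real hdf dataset: a list of function strings.
functionSourceDF = ["int main ( ) { return 0 ; }"] * 10000

def build_word_lists():
    # The original loop from the question, wrapped so it shows up by name
    # in the profiler output.
    FSDarray = []
    for i in range(len(functionSourceDF)):
        FSDarray.append(functionSourceDF[i].split(" "))
    return FSDarray

profiler = cProfile.Profile()
profiler.enable()
result = build_word_lists()
profiler.disable()

# Print the five most expensive calls by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

If most of the time turns out to be spent fetching elements from the hdf structure rather than in `split`, that points at a different fix (e.g. reading the column into memory once) than if the splitting itself dominates.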

1 Answer


You can actually use numpy for this kind of problem.

import numpy as np
a = ["This is a test", "of numpy", "splitting words"]
a = np.array(a)
a = np.char.split(a)
print(a)

Output

[list(['This', 'is', 'a', 'test']) list(['of', 'numpy']) list(['splitting', 'words'])]
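Applied to the question's setup, a sketch might look like the following. The `functionSourceDF` array here is a hypothetical stand-in for the column read from the hdf file; passing `sep=" "` matches the original `.split(" ")` behaviour (by default, `np.char.split` splits on any whitespace):

```python
import numpy as np

# Hypothetical stand-in for functionSourceDF loaded from the hdf file.
functionSourceDF = np.array(["int main ( ) { return 0 ; }",
                             "void foo ( int x ) { }"])

# Vectorised split over the whole array; returns an object array of lists.
FSDarray = np.char.split(functionSourceDF, sep=" ")

# A plain list comprehension is the equivalent pure-Python form and is
# often comparably fast, since the splitting itself is string work either way.
FSDlist = [s.split(" ") for s in functionSourceDF]
```

Note that the result is an object array whose elements are Python lists, so downstream code that expects a list of lists may be just as happy with the comprehension version.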

