
I have a dataset whose rows are C/C++ functions. I want to take each function, split it into a list of words (A), and append each list A to a list B, producing a list of lists in Python.

So far I have been using the code below, but my dataset has 128,312 items and it is slow.

Can we improve this? If yes, I am open to suggestions.

functionSourceDF = hdf.get('functionSource')

.
.
.

FSDarray = []
for i in range(size):
    FSDarray.append(functionSourceDF[i].split(" "))
FSDarray = np.array(FSDarray)

Thank you.

Do you know where you're spending the time? In the split? In getting elements from the hdf data structure? I would start by trying to profile the code, either by using a profiler or by removing one piece at a time and seeing the performance impact. Commented Jul 1, 2020 at 6:20
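Following the commenter's suggestion, here is a minimal profiling sketch using the standard library's `cProfile`. The `functionSourceDF` list below is a hypothetical stand-in for the real hdf dataset, and `build_word_lists` is a hypothetical wrapper around the original loop so the profiler can attribute time to it:

```python
import cProfile
import io
import pstats

# Hypothetical stand-in for the real hdf dataset: a list of function strings.
functionSourceDF = ["int main ( ) { return 0 ; }"] * 10000

def build_word_lists():
    # The original loop from the question, wrapped so it shows up by name
    # in the profiler output.
    FSDarray = []
    for i in range(len(functionSourceDF)):
        FSDarray.append(functionSourceDF[i].split(" "))
    return FSDarray

profiler = cProfile.Profile()
profiler.enable()
result = build_word_lists()
profiler.disable()

# Print the five most expensive calls by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

If most of the time turns out to be spent fetching elements from the hdf structure rather than in `split`, that points at a different fix (e.g. reading the column into memory once) than if the splitting itself dominates.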

1 Answer


You can actually use numpy for this kind of problem.

import numpy as np
a = ["This is a test", "of numpy", "splitting words"]
a = np.array(a)
a = np.char.split(a)
print(a)

Output

[list(['This', 'is', 'a', 'test']) list(['of', 'numpy']) list(['splitting', 'words'])]
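Applied to the question's setup, a sketch might look like the following. The `functionSourceDF` array here is a hypothetical stand-in for the column read from the hdf file; passing `sep=" "` matches the original `.split(" ")` behaviour (by default, `np.char.split` splits on any whitespace):

```python
import numpy as np

# Hypothetical stand-in for functionSourceDF loaded from the hdf file.
functionSourceDF = np.array(["int main ( ) { return 0 ; }",
                             "void foo ( int x ) { }"])

# Vectorised split over the whole array; returns an object array of lists.
FSDarray = np.char.split(functionSourceDF, sep=" ")

# A plain list comprehension is the equivalent pure-Python form and is
# often comparably fast, since the splitting itself is string work either way.
FSDlist = [s.split(" ") for s in functionSourceDF]
```

Note that the result is an object array whose elements are Python lists, so downstream code that expects a list of lists may be just as happy with the comprehension version.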

