Using python split leaves empty string

Question

I'm using the Python split method to manipulate some file paths. I split the file path into a list and then do some slicing on it:

array = "/home/ask/Git/Zeeguu-API/zeeguu_core/user_statistics/main.py"
split = array.split("/")

Which outputs:

['', 'home', 'ask', 'Git', 'Zeeguu-API', 'zeeguu_core', 'user_statistics', 'main.py']

The issue is the empty string at the beginning of the list. It makes sense that it is there, but it is annoying and messes with the slicing I want to do.

How can I split but omit the empty strings? I would rather avoid an extra O(n) operation just to filter out the empty string; I really hope there is some way to avoid it in the call to split().

Would it be fine if you just popped the starting string in the list?
– Thavas Antonio, Commented May 13, 2021 at 22:01
You understand that the leading / in the path conveys information (it distinguishes a relative path from an absolute one), right? What are you trying to do with the split-up path information? Do you also want to omit empty path components in the middle (like in foo//bar)? Why? Have you considered using the built-in standard library support for manipulating file paths (pathlib, or at the very least os.path)?
– Karl Knechtel, Commented May 13, 2021 at 22:01
I would prefer not to. I have to work with file paths both with and without a leading "/".
– Garsty100, Commented May 13, 2021 at 22:02
And I'm going to use the file paths to build a graph that can model Python projects. I therefore need lists that can represent directory paths, so the slashes are not important.
– Garsty100, Commented May 13, 2021 at 22:04
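
For reference, the pathlib route suggested in the comments could look roughly like the sketch below. The helper name path_components is made up for illustration, and the sketch assumes POSIX-style paths with "/" as the separator. Note that pathlib also collapses doubled slashes in the middle of a path, which may or may not be what you want.

```python
from pathlib import PurePosixPath

def path_components(path):
    """Return the names in a POSIX-style path, without the root."""
    p = PurePosixPath(path)
    # p.anchor is "/" for an absolute path and "" for a relative one.
    # When it is present, it is the first entry of p.parts, so skip it.
    return list(p.parts[1:]) if p.anchor else list(p.parts)

print(path_components("/home/ask/Git/main.py"))  # ['home', 'ask', 'Git', 'main.py']
print(path_components("home/ask/Git/main.py"))   # ['home', 'ask', 'Git', 'main.py']
```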

Muhd Mairaj · Accepted Answer · 2021-05-13 22:16:09Z

2

You can do the following to solve the problem:

array.strip("/").split("/")

The empty string is useful when you want to reverse the operation with "/".join() and get the original string back.
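
A small sketch of how this behaves for both kinds of path; the shortened paths and the variable names absolute and relative are only for illustration:

```python
absolute = "/home/ask/Git/main.py"
relative = "home/ask/Git/main.py"

# Plain split keeps the empty string produced by the leading "/".
print(absolute.split("/"))             # ['', 'home', 'ask', 'Git', 'main.py']

# Stripping first gives the same list for both kinds of path.
print(absolute.strip("/").split("/"))  # ['home', 'ask', 'Git', 'main.py']
print(relative.strip("/").split("/"))  # ['home', 'ask', 'Git', 'main.py']

# The empty string is what lets join() rebuild the original exactly.
assert "/".join(absolute.split("/")) == absolute

# Caveat: strip() only touches the ends, so a doubled slash in the
# middle still yields an empty component.
print("a//b".strip("/").split("/"))    # ['a', '', 'b']
```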


4 Comments

StyleZ Over a year ago

Out of curiosity, isn't this answer a bad one? As far as I know, OP wants to avoid an additional O(n) operation, yet that's exactly what strip does (according to this SO post -> stackoverflow.com/a/27684660/9559884). In my opinion, the time complexity of this code is O(2n), which is equal to O(n), but why run through the same string twice? (If I am mistaken, please let me know, I will gladly learn something new :) )

tdelaney Over a year ago

@StyleZ - it's not really even O(2n), as any one string creation in the split is as expensive as popping the list. And there isn't really such a thing as O(2n)... it's still O(n) in growth rate.

StyleZ Over a year ago

@tdelaney thank you for the clarification. What I meant by O(2n) was that, in my opinion, OP does not want to iterate through the same string twice (I might be mistaken). By calling strip you create a copy of the string with the leading and trailing characters removed, which is one pass over the string, and then there is an additional pass where it is split. To make things clear: I like this solution, and now it's up to OP whether this is the solution they are looking for.

Muhd Mairaj Over a year ago

The reason I mentioned this solution is that OP said they work with file paths that both do and don't have "/" at the beginning. The other solution would be to use if array.startswith("/"): array[1:].split("/"). I'm not sure if that has a lower time complexity; if you know, please let me know so I can add that to my answer.
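
Spelled out, the startswith variant from the last comment might look like the sketch below (the function name split_path is hypothetical). Slicing a string also creates a copy, so this is still O(n) overall, and unlike strip("/") it leaves a trailing slash alone.

```python
def split_path(path):
    # Drop at most one leading "/" before splitting;
    # relative paths pass through unchanged.
    if path.startswith("/"):
        path = path[1:]
    return path.split("/")

print(split_path("/home/ask/Git/main.py"))  # ['home', 'ask', 'Git', 'main.py']
print(split_path("home/ask/Git/main.py"))   # ['home', 'ask', 'Git', 'main.py']
print(split_path("/home/ask/"))             # ['home', 'ask', '']
```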

StyleZ · Accepted Answer · 2021-05-13 22:23:09Z

0

I would rather like to avoid having to do a O(n)

In case this is algorithmic homework, I recommend you do not read this solution until you have figured it out on your own... But if it's not, then instead of using the split() function, I recommend doing your own split. You iterate char by char over the string and manually build the output list:

string = "/home/ask/Git/Zeeguu-API/zeeguu_core/user_statistics/main.py"
index = 0
output_array = ['']

for character in string:
    if character != '/':
        # ordinary character: extend the current component
        output_array[index] += character
    elif output_array[index] != '':   # character == '/'
        # separator after a non-empty component: start a new one
        output_array.append('')
        index += 1

The above code is only a sketch, so adapt it to your needs. One caveat: it outputs [''] when the path is empty (and leaves a trailing '' when the path ends with '/'), but that is an easy fix :)


4 Comments

tdelaney Over a year ago

But a split written in Python is going to be more expensive than the built-in one implemented in C. Each one of those characters is really a string, with the overhead needed to both create and concatenate them.

StyleZ Over a year ago

That's a good point... in that case it should be run using indexes of the string too, right? (Instead of the for loop, there would be while character_pointer < len(string).) Do you think that this would solve the problem?

tdelaney Over a year ago

Python doesn't implement a special character object, just a str with a single character in it. Any time you reference a single character in a string, Python has to create a str to represent it. Using an index is more expensive than iterating over the characters because there is also a dereference of the string involved. You could write this comparison in C or perhaps Cython, but as long as you stick with Python, you have its inherent inefficiencies.

StyleZ Over a year ago

thanks for clarification. I understand the problem now.
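
To check the point about the C implementation on your own machine, a rough timing sketch could look like this. The function names and the call count are arbitrary choices for illustration, and the absolute numbers will vary between machines and Python versions.

```python
import timeit

PATH = "/home/ask/Git/Zeeguu-API/zeeguu_core/user_statistics/main.py"

def builtin_split(path):
    # strip() and split() both run in C inside the interpreter.
    return path.strip("/").split("/")

def manual_split(path):
    # Pure-Python loop: one str object and one concatenation per character.
    parts, current = [], ""
    for ch in path:
        if ch != "/":
            current += ch
        elif current:
            parts.append(current)
            current = ""
    if current:
        parts.append(current)
    return parts

# Both approaches should agree on the result before timing them.
assert builtin_split(PATH) == manual_split(PATH)

for fn in (builtin_split, manual_split):
    seconds = timeit.timeit(lambda: fn(PATH), number=20_000)
    print(f"{fn.__name__}: {seconds:.3f} s for 20,000 calls")
```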

aterzgar · Accepted Answer · 2021-05-14 08:25:02Z

0

array = "/home/ask/Git/Zeeguu-API/zeeguu_core/user_statistics/main.py"
list1 = array.split("/")
list_without_space = []
for element in list1:
    if element.strip():
        list_without_space.append(element)

print(list_without_space)


1 Comment

Thavas Antonio Over a year ago

Answers are great. But please add some text around your code to help the OP understand your answer. This encourages them not to copy/paste the answer. Thank you!

G.T. · Accepted Answer · 2022-01-18 09:28:06Z

0

There is another possibility using a list comprehension:

array1 = "/home/ask/Git/Zeeguu-API/zeeguu_core/user_statistics/main.py"
split1 = array1.split(sep="/")
noempty_strings = [x for x in split1 if x != '']
print(noempty_strings)

I think this method is the fastest among all the ones mentioned above.

P.S. Actually, I received the inspiration for this answer from @paxdiablo's answer to this question. So, credit goes to paxdiablo.
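
Two equivalent spellings of the same filter, as a sketch (the helper names are made up). Both rely on the empty string being falsy. Unlike strip("/"), filtering also drops empty components from the middle and the end of the path.

```python
def components_comprehension(path):
    # An empty string is falsy, so "if x" keeps only non-empty parts.
    return [x for x in path.split("/") if x]

def components_filter(path):
    # filter(None, ...) drops every falsy item, i.e. the empty strings.
    return list(filter(None, path.split("/")))

print(components_comprehension("/home//ask/Git/"))  # ['home', 'ask', 'Git']
print(components_filter("/home//ask/Git/"))         # ['home', 'ask', 'Git']
```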

