I need to extract the path components of a URL string at different depth levels. If the input is:

http//10.6.7.9:5647/folder1/folder2/folder3/folder4/df.csv

Output should be:

    folder1_path = 'http//10.6.7.9:5647/folder1'
    folder2_path = 'http//10.6.7.9:5647/folder1/folder2'
    folder3_path = 'http//10.6.7.9:5647/folder1/folder2/folder3' 
    folder4_path = 'http//10.6.7.9:5647/folder1/folder2/folder3/folder4'

The goal is to create these new string variables by performing string operations on my_url_path.

3 Answers

1

You can use a clever combination of string split and join. Something like this should work:

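def path_to_folder_n(url, n):
    """
    url: str, full URL as a string
    n: int, number of directory levels to include, counted from the root
    """
    # Splitting on '/' yields the scheme, an empty string, and the host
    # before the first folder, so the first 3 parts are always kept.
    base = 3
    s = url.split('/')
    return '/'.join(s[:base + n])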


my_url_path = 'http//10.6.7.9:5647/folder1/folder2/folder3/folder4/df.csv'

# folder 1
print(path_to_folder_n(my_url_path, 1))

# folder 4
print(path_to_folder_n(my_url_path, 4))

# folder 3
print(path_to_folder_n(my_url_path, 3))

Output:

>> http//10.6.7.9:5647/folder1
>> http//10.6.7.9:5647/folder1/folder2/folder3/folder4
>> http//10.6.7.9:5647/folder1/folder2/folder3

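Keep in mind that you may want to add error checks so that n does not exceed the number of folders in the URL: slicing past the end of the list raises no error and silently returns the whole URL, file name included.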

See it in action here: https://repl.it/repls/BelovedUnhealthyBase#main.py
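The error check mentioned above could look something like the sketch below. The name path_to_folder_n_checked and the folder_paths dictionary are made up for illustration, and the code assumes the last component of the URL is a file name rather than a folder.

```python
def path_to_folder_n_checked(url, n):
    """Like path_to_folder_n, but rejects depths the URL does not have."""
    base = 3  # scheme, empty string, and host come before the first folder
    parts = url.split('/')
    # Assumption: the final component is a file name, so it is not a folder.
    max_depth = len(parts) - base - 1
    if not 1 <= n <= max_depth:
        raise ValueError(f"n must be between 1 and {max_depth}, got {n}")
    return '/'.join(parts[:base + n])


my_url_path = 'http//10.6.7.9:5647/folder1/folder2/folder3/folder4/df.csv'

# Build every folder path at once instead of one variable per level.
folder_paths = {f'folder{n}_path': path_to_folder_n_checked(my_url_path, n)
                for n in range(1, 5)}
```

Collecting the results in a dictionary keyed by name is one way to avoid hand-writing a separate variable for each depth.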

0

To get the parent directory from a string in this format, you can simply do:

my_url_path.split('/')[-2]

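For a parent further up, decrease the index accordingly: [-3] gives the grandparent, and so on. Note that this returns only the folder name (here 'folder4'), not the full path up to that folder.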
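If the full path up to a parent is needed rather than just its name, one option (a sketch, not part of the original answer) is str.rsplit with a maxsplit argument. The variable names below are only illustrative.

```python
my_url_path = 'http//10.6.7.9:5647/folder1/folder2/folder3/folder4/df.csv'

parts = my_url_path.split('/')
parent_name = parts[-2]        # name of the folder containing the file
grandparent_name = parts[-3]   # name of the folder one level above that

# rsplit with maxsplit=k cuts off the last k components and keeps the rest intact.
parent_path = my_url_path.rsplit('/', 1)[0]
grandparent_path = my_url_path.rsplit('/', 2)[0]
```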

0

I've written a function that addresses your problem.

It uses only the split() and join() methods of the str class, plus the takewhile() function from the itertools module, which yields elements from an iterable for as long as the predicate (its first argument) is true.

from itertools import takewhile


def manipulate_path(target, url):
    path_parts = url.split('/')
    partial_output = takewhile(lambda x: x != target, path_parts)
    return "/".join(partial_output) + f'/{target}'

You can use it as follows:

manipulate_path('folder1', my_url_path)    # returns 'http//10.6.7.9:5647/folder1'
manipulate_path('folder2', my_url_path)    # returns 'http//10.6.7.9:5647/folder1/folder2'
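One caveat: if target does not appear in the URL, takewhile() never stops early, so manipulate_path returns the whole URL with the target appended. A possible guard is sketched below; manipulate_path_checked is a hypothetical name, and like the original it stops at the first component equal to target.

```python
from itertools import takewhile


def manipulate_path_checked(target, url):
    path_parts = url.split('/')
    # Refuse targets that are not an exact component of the URL.
    if target not in path_parts:
        raise ValueError(f"{target!r} is not a component of {url!r}")
    # Keep every component that comes before the first occurrence of target.
    kept = list(takewhile(lambda part: part != target, path_parts))
    return '/'.join(kept + [target])
```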
