
How can I select the characters (the rest of the file path) after Dev\ or dev\ from a column in a PySpark DataFrame?

Sample rows of the pyspark column:

\\D\Dev\johnny\Desktop\TEST
\\D\Dev\matt\Desktop\TEST\NEW
\\D\Dev\matt\Desktop\TEST\OLD\TEST
\\E\dev\peter\Desktop\RUN\SUBFOLDER\New

Expected Output

johnny\Desktop\TEST
matt\Desktop\TEST\NEW
matt\Desktop\TEST\OLD\TEST
peter\Desktop\RUN\SUBFOLDER\New

I tried to use the code below.

from pyspark.sql import functions as F

# Split on "Dev\" and keep everything after the last match.
df = df.withColumn(
    "sub_path",
    F.element_at(F.split(F.col("path"), "Dev\\\\"), -1)
)

It's only giving partially correct results. I'd appreciate it if someone could help.

1 Answer


The following modification uses the character class [Dd] in the split pattern, which matches both upper- and lower-case d.

df = df.withColumn(
    "sub_path",
    F.element_at(F.split(F.col("path"), "[Dd]ev\\\\"), -1)
)
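
For reference, here is a minimal, self-contained sketch (assuming a local SparkSession and the column name "path" from the question's code) that applies this split to the sample rows:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Sample rows from the question.
df = spark.createDataFrame(
    [
        (r"\\D\Dev\johnny\Desktop\TEST",),
        (r"\\D\Dev\matt\Desktop\TEST\NEW",),
        (r"\\D\Dev\matt\Desktop\TEST\OLD\TEST",),
        (r"\\E\dev\peter\Desktop\RUN\SUBFOLDER\New",),
    ],
    ["path"],
)

# Split on "Dev\" or "dev\" and keep everything after the last match.
df = df.withColumn(
    "sub_path",
    F.element_at(F.split(F.col("path"), "[Dd]ev\\\\"), -1),
)

df.select("sub_path").show(truncate=False)
# johnny\Desktop\TEST
# matt\Desktop\TEST\NEW
# matt\Desktop\TEST\OLD\TEST
# peter\Desktop\RUN\SUBFOLDER\New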

Let me know if this works for you.


3 Comments

Thank you for the answer. Is there a way to choose by the number of backslashes, i.e. take the rest of the string after the 4th backslash? We have a large number of rows with different characters.
Please add this as another question with sample data and expected results so that we can test possible solutions and look at it as well.
I have added a new question: stackoverflow.com/questions/69024095/… Thank you for the support.
