I have converted a PDF to a DataFrame and am almost at the final format I want. However, I am stuck on the following step. I have a column that looks like this:

Column A
1234[321]
321[3]
123
456[456]

and I want to split it into two columns, B and C, like this:

Column B          Column C
1234              321
321               3
123               0
456               456

How can this be achieved? I did try something along the lines of

df["Column A"].str.strip(r"\[\d+\]")

but I have not been able to get it to work after trying different variations. Any help will be greatly appreciated, as this is the final part of this task. Many thanks in advance!

2 Answers

An alternative could be:

import pandas as pd

# Create the two new columns by splitting on the opening bracket
df[["Column B", "Column C"]] = df["Column A"].str.split("[", expand=True)
# Get rid of the trailing bracket (literal match, not a regex)
df["Column C"] = df["Column C"].str.replace("]", "", regex=False)
# Replace the NaN values with 0 and drop the original column
df = df.fillna(0).drop("Column A", axis=1)
# Convert all columns to numeric
df = df.apply(pd.to_numeric)
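
As a side note on why the str.strip attempt in the question leaves the numbers unsplit: pandas' Series.str.strip follows Python's built-in str.strip, which treats its argument as a set of characters to trim from both ends, not as a regular expression. Below is a minimal sketch with plain Python strings; the sample values come from the question, and split_bracket is a hypothetical helper name used only for illustration.

```python
# str.strip treats its argument as a set of characters, not a regex.
pattern_like = r"\[\d+\]"  # interpreted as the characters: \ [ d + ]
values = ["1234[321]", "321[3]", "123", "456[456]"]

# Only the trailing "]" is trimmed; the digits and "[" stay in place.
stripped = [v.strip(pattern_like) for v in values]
print(stripped)


def split_bracket(value):
    """Hypothetical helper: split 'B[C]' into ('B', 'C'), with C defaulting to '0'."""
    head, _, tail = value.partition("[")
    return head, tail.rstrip("]") or "0"


pairs = [split_bracket(v) for v in values]
print(pairs)
```

The same partition-then-trim idea is what the split/replace steps above do column-wise.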

You may use

import pandas as pd
df = pd.DataFrame({'Column A': ['1234[321]', '321[3]', '123', '456[456]']})
df[['Column B', 'Column C']] = df['Column A'].str.extract(r'^(\d+)(?:\[(\d+)])?$', expand=False)
# If you need to drop Column A here, use
# df[['Column B', 'Column C']] = df.pop('Column A').str.extract(r'^(\d+)(?:\[(\d+)])?$', expand=False)
df['Column C'] = df['Column C'].fillna(0)  # avoids chained assignment
df
#    Column A Column B Column C
# 0  1234[321]     1234      321
# 1     321[3]      321        3
# 2        123      123        0
# 3   456[456]      456      456

The regex matches:

  • ^ - start of string
  • (\d+) - Group 1: one or more digits
  • (?:\[(\d+)])? - an optional non-capturing group matching [, then capturing into Group 2 one or more digits, and then a ]
  • $ - end of string.
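
To see what each group captures outside of pandas, the same pattern can be tried with Python's standard re module. This is only a sketch: parse is a hypothetical helper name, and converting to int with a default of 0 for a missing bracket part is an assumption based on the desired output in the question.

```python
import re

# Same pattern as above: digits, then an optional "[digits]" suffix.
PATTERN = re.compile(r'^(\d+)(?:\[(\d+)])?$')


def parse(value):
    """Hypothetical helper: return (B, C) as ints, or None if the string does not match."""
    match = PATTERN.match(value)
    if match is None:
        return None
    group_b, group_c = match.groups()  # group_c is None when there is no bracket part
    return int(group_b), (int(group_c) if group_c is not None else 0)


results = [parse(v) for v in ['1234[321]', '321[3]', '123', '456[456]']]
print(results)
```

Because of the ^ and $ anchors, a value such as '12[a]' does not match at all, so parse returns None for it rather than a partial result.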

1 Comment

The regex solution is powerful indeed, but may be hard for beginners to comprehend. regex101 is a really helpful site for understanding the syntax. Upvoted this answer.
