
I am trying to concatenate all files in the list file_list:

result = pd.concat([pd.read_csv(f).set_index(['a', 'b', 'c']) for f in file_list])

The challenge is that I would like to replace the string 'xyz' with nothing (an empty string) in column b before calling set_index. How can I achieve this in the same line?

1 Answer


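I believe you need DataFrame.replace with a nested dict ({column: {old value: new value}}). Without regex=True, this only replaces cells whose entire value is 'xyz':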

dfs=[pd.read_csv(f).replace({'b':{'xyz':''}}).set_index(['a', 'b', 'c']) for f in file_list]
result = pd.concat(dfs)
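A minimal self-contained sketch of the nested-dict approach above, assuming two small in-memory CSV files built with io.StringIO stand in for the real file_list (the data and the extra column d are hypothetical). It also shows that a value such as 'xyzabc' is left untouched, because replace without regex=True matches whole cell values only.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the files in file_list
file_list = [
    io.StringIO("a,b,c,d\n1,xyz,x,10\n2,keep,y,20\n"),
    io.StringIO("a,b,c,d\n3,xyz,z,30\n4,xyzabc,w,40\n"),
]

# Replace the exact value 'xyz' in column b only, then build the MultiIndex
dfs = [
    pd.read_csv(f).replace({"b": {"xyz": ""}}).set_index(["a", "b", "c"])
    for f in file_list
]
result = pd.concat(dfs)

print(result.index.get_level_values("b").tolist())  # ['', 'keep', '', 'xyzabc']
```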

Or, if the 'xyz' strings do not appear in columns a and c, you can create the MultiIndex first with index_col and then rename every 'xyz' label in the index:

dfs = [pd.read_csv(f, index_col=['a','b','c']).rename({'xyz':''}) for f in file_list]
result = pd.concat(dfs)
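A sketch of the index_col/rename variant, again assuming hypothetical in-memory CSVs. DataFrame.rename with a plain dict targets the index by default, and on a MultiIndex it maps labels in every level, which is why this only works when 'xyz' cannot occur in levels a and c.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the files in file_list
file_list = [
    io.StringIO("a,b,c,d\n1,xyz,x,10\n2,keep,y,20\n"),
    io.StringIO("a,b,c,d\n3,xyz,z,30\n"),
]

# index_col builds the MultiIndex directly; rename then maps the label
# 'xyz' to '' in all index levels (mapper targets the index by default)
dfs = [pd.read_csv(f, index_col=["a", "b", "c"]).rename({"xyz": ""}) for f in file_list]
result = pd.concat(dfs)

print(result.index.get_level_values("b").tolist())  # ['', 'keep', '']
```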

Finally, if "nothing" should mean a missing value, use {'xyz': np.nan} instead of {'xyz': ''} (this requires import numpy as np).
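A minimal sketch of the NaN variant, assuming one hypothetical in-memory CSV. The 'xyz' cell becomes a missing value in level b of the resulting MultiIndex.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the files in file_list
file_list = [io.StringIO("a,b,c,d\n1,xyz,x,10\n2,keep,y,20\n")]

# Map the exact value 'xyz' in column b to NaN instead of an empty string
dfs = [
    pd.read_csv(f).replace({"b": {"xyz": np.nan}}).set_index(["a", "b", "c"])
    for f in file_list
]
result = pd.concat(dfs)

print(result.index.get_level_values("b").isna().tolist())  # [True, False]
```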

EDIT (based on a comment):

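To replace using a regular expression (with regex=True, matches are substituted inside the string, not only whole values; note that in a regex 'xyz*' means 'xy' followed by zero or more 'z'):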

dfs= [pd.read_csv(f).replace({'b':{'xyz*':''}}, regex=True).set_index(['a', 'b', 'c']) for f in file_list]
result = pd.concat(dfs)
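A small sketch contrasting the literal pattern 'xyz' with the commenter's 'xyz*' under regex=True, using a hypothetical in-memory CSV. Both remove 'xyz' from inside longer strings, but 'xyz*' also strips a bare 'xy', so the literal pattern is the safer choice if only 'xyz' should disappear.

```python
import io

import pandas as pd

# Hypothetical sample data
csv_text = "a,b,c,d\n1,xyz,x,10\n2,xyzabc,y,20\n3,xyabc,z,30\n"

# regex=True substitutes every match inside the string, not just whole values
literal = pd.read_csv(io.StringIO(csv_text)).replace({"b": {"xyz": ""}}, regex=True)
# In a regex, 'xyz*' is 'xy' followed by zero or more 'z', so it also strips 'xy'
star = pd.read_csv(io.StringIO(csv_text)).replace({"b": {"xyz*": ""}}, regex=True)

print(literal["b"].tolist())  # ['', 'abc', 'xyabc']
print(star["b"].tolist())     # ['', 'abc', 'abc']
```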

2 Comments

Just to add, I used a regular expression: dfs=[pd.read_csv(f).replace({'b':{'xyz*':''}}, regex=True).
I have added it to the answer too.
