I have a Pandas Dataframe read in from a CSV file.

I want to create a larger Dataframe that includes some of the columns from the CSV file. However, the header names are different, so the column names need to be translated.

Columns of the larger Dataframe that are not in the CSV file should be set to some default value.

My best idea so far is to start with the Dataframe from the CSV and use a dictionary to translate the column names, then add the remaining columns to the result. This feels a bit clunky though. Any suggestions on how best to approach this?

An illustrative example

Initial CSV file:

Name,Age,Address,PhoneNumber

Column mapping (output column = CSV column): Age=Age, FullName=Name, HomeAddress=Address.

Default values for the remaining output columns could be, for example: Nationality="USA", WorkAddress="Google", StarSign="Leo".

PhoneNumber is not used in the output at all.

Desired Dataframe output:

Age,Nationality,FullName,HomeAddress,WorkAddress,StarSign
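One way to write the dictionary-based idea from the question, sketched on two made-up rows (the names, ages, addresses and phone numbers below are invented for illustration). It is an alternative to the answer below, not the answer's own method: `DataFrame.rename` does the header translation, `DataFrame.assign` adds the default columns, and `reindex` sets the column order from the example.

```python
import io

import pandas as pd

# Hypothetical sample rows using the example CSV header
csv_text = """Name,Age,Address,PhoneNumber
Alice,30,1 Main St,555-0100
Bob,25,2 Oak Ave,555-0101
"""
init_df = pd.read_csv(io.StringIO(csv_text))

# CSV header -> output header; CSV columns not listed here (PhoneNumber) are dropped
mapping = {"Age": "Age", "Name": "FullName", "Address": "HomeAddress"}
# Output columns that do not come from the CSV, with their default values
defaults = {"Nationality": "USA", "WorkAddress": "Google", "StarSign": "Leo"}
# Column order taken from the desired output in the example
output_columns = ["Age", "Nationality", "FullName", "HomeAddress", "WorkAddress", "StarSign"]

out = (
    init_df[list(mapping)]  # keep only the mapped CSV columns
    .rename(columns=mapping)  # translate CSV headers to output headers
    .assign(**defaults)  # broadcast each scalar default to every row
    .reindex(columns=output_columns)  # arrange columns in the example's order
)
print(out.columns.tolist())
# ['Age', 'Nationality', 'FullName', 'HomeAddress', 'WorkAddress', 'StarSign']
```

Keeping the translation and the defaults in two plain dicts means a new column only needs a new dict entry rather than another statement.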

1 Answer

IIUC, you can combine rename with a nested pd.concat, i.e.:

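import pandas as pd

# Translate the CSV headers to the output names
init_df = init_df.rename(columns={'Name': 'FullName', 'Address': 'HomeAddress'})
# One-row frame holding the default values
df = pd.DataFrame({'Nationality': ["USA"], 'WorkAddress': ["Google"], 'StarSign': ["Leo"]})
# Repeat the defaults once per row, give them init_df's index, then concatenate column-wise
final_df = pd.concat([init_df, pd.concat([df] * len(init_df)).set_index(init_df.index)], axis=1)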
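A run of the answer's code on two made-up rows (the sample values are invented for illustration). It shows that the renamed CSV columns keep their original order, PhoneNumber is still present, and the defaults are appended on the right. Every copy of the one-row frame carries index 0, which is why `set_index(init_df.index)` is needed before the column-wise concat.

```python
import io

import pandas as pd

# Hypothetical sample data using the question's CSV header
csv_text = """Name,Age,Address,PhoneNumber
Alice,30,1 Main St,555-0100
Bob,25,2 Oak Ave,555-0101
"""
init_df = pd.read_csv(io.StringIO(csv_text))

init_df = init_df.rename(columns={"Name": "FullName", "Address": "HomeAddress"})
# One-row frame holding the defaults
df = pd.DataFrame({"Nationality": ["USA"], "WorkAddress": ["Google"], "StarSign": ["Leo"]})
# Stack the one-row frame len(init_df) times; all copies have index 0,
# so re-index them with init_df's index so the axis=1 concat lines rows up
repeated = pd.concat([df] * len(init_df)).set_index(init_df.index)
final_df = pd.concat([init_df, repeated], axis=1)
print(final_df.columns.tolist())
# ['FullName', 'Age', 'HomeAddress', 'PhoneNumber', 'Nationality', 'WorkAddress', 'StarSign']
```

If the CSV has no data rows, `[df] * len(init_df)` is an empty list and `pd.concat` raises `ValueError`, so an empty file would need a special case with this approach.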

2 Comments

Use of pd.concat([df]*len(init_df)) is nice. If my init_df contained columns that I wanted to remove from final_df, is there a way to do that without a separate statement to drop them?
Just select the columns you want, i.e. pd.concat([init_df[cols], ...]) where cols is a list of the columns to keep.
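The comment's suggestion spelled out on hypothetical sample data (values invented for illustration): selecting `init_df[cols]` inside the outer `pd.concat` means PhoneNumber never reaches `final_df`, so no separate drop is needed. The last line, which reorders the columns to match the question's desired output, is an optional addition and not part of the comment.

```python
import io

import pandas as pd

# Hypothetical sample data using the question's CSV header
csv_text = """Name,Age,Address,PhoneNumber
Alice,30,1 Main St,555-0100
Bob,25,2 Oak Ave,555-0101
"""
init_df = pd.read_csv(io.StringIO(csv_text)).rename(
    columns={"Name": "FullName", "Address": "HomeAddress"}
)
df = pd.DataFrame({"Nationality": ["USA"], "WorkAddress": ["Google"], "StarSign": ["Leo"]})

# Columns to keep from the CSV; PhoneNumber is simply left out
cols = ["Age", "FullName", "HomeAddress"]
final_df = pd.concat(
    [init_df[cols], pd.concat([df] * len(init_df)).set_index(init_df.index)],
    axis=1,
)
# Optional: match the column order shown in the question
final_df = final_df[["Age", "Nationality", "FullName", "HomeAddress", "WorkAddress", "StarSign"]]
```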
