35

I have two pandas.DataFrames which I would like to combine into one. The dataframes have the same number of columns, in the same order, but have column headings in different languages. How can I efficiently combine these dataframes?

df_ger
index  Datum   Zahl1   Zahl2
0      1-1-17  1       2
1      2-1-17  3       4

df_uk
index  Date    No1     No2
0      1-1-17  5       6
1      2-1-17  7       8

desired output
index  Datum   Zahl1   Zahl2
0      1-1-17  1       2
1      2-1-17  3       4
2      1-1-17  5       6
3      2-1-17  7       8

The only approach I came up with so far is to rename the column headings and then use pd.concat([df_ger, df_uk], axis=0, ignore_index=True). However, I hope to find a more general approach.

3
  • What is your concern with renaming? Commented Aug 9, 2017 at 13:08
  • The dataframe I am working with is quite large. (>30 columns). I get it from an external source, the labels could change. Commented Aug 9, 2017 at 13:13
  • Maybe there is a more general way that works with the column index, ignoring the set column names, but I couldn't find anything, yet. Commented Aug 9, 2017 at 13:13

4 Answers 4

19

If the columns are always in the same order, you can mechanically rename the columns and the do an append like:

Code:

new_cols = {x: y for x, y in zip(df_uk.columns, df_ger.columns)}
df_out = df_ger.append(df_uk.rename(columns=new_cols))

Test Code:

df_ger = pd.read_fwf(StringIO(
    u"""
        index  Datum   Zahl1   Zahl2
        0      1-1-17  1       2
        1      2-1-17  3       4"""),
    header=1).set_index('index')

df_uk = pd.read_fwf(StringIO(
    u"""
        index  Date    No1     No2
        0      1-1-17  5       6
        1      2-1-17  7       8"""),
    header=1).set_index('index')

print(df_uk)
print(df_ger)

new_cols = {x: y for x, y in zip(df_uk.columns, df_ger.columns)}
df_out = df_ger.append(df_uk.rename(columns=new_cols))

print(df_out)

Results:

         Date  No1  No2
index                  
0      1-1-17    5    6
1      2-1-17    7    8

        Datum  Zahl1  Zahl2
index                      
0      1-1-17      1      2
1      2-1-17      3      4

        Datum  Zahl1  Zahl2
index                      
0      1-1-17      1      2
1      2-1-17      3      4
0      1-1-17      5      6
1      2-1-17      7      8
Sign up to request clarification or add additional context in comments.

1 Comment

this answer is outdated as of Pandas 2.0, since append has been removed
10

Provided you can be sure that the structures of the two dataframes remain the same, I see two options:

  1. Keep the dataframe column names of the chosen default language (I assume en_GB) and just copy them over:

    df_ger.columns = df_uk.columns
    df_combined = pd.concat([df_ger, df_uk], axis=0, ignore_index=True)
    

    This works whatever the column names are. However, technically it remains renaming.

  2. Pull the data out of the dataframe using numpy.ndarrays, concatenate them in numpy, and make a dataframe out of it again:

    np_ger_data = df_ger.as_matrix()
    np_uk_data = df_uk.as_matrix()
    np_combined_data = numpy.concatenate([np_ger_data, np_uk_data], axis=0)
    df_combined = pd.DataFrame(np_combined_data, columns=["Date", "No1", "No2"])
    

    This solution requires more resources, so I would opt for the first one.

Comments

8

You can concat the dataframe values:

df = pd.DataFrame(np.vstack([df1.values, df2.values]), columns=df1.columns)
# or
df = pd.DataFrame(np.concatenate([df1.values, df2.values], axis=0), columns=df1.columns)
print(df)

  index   Datum Zahl1 Zahl2
0     0  1-1-17     1     2
1     1  2-1-17     3     4
2     0  1-1-17     5     6
3     1  2-1-17     7     8

If you want to reindex the index column

df['index'] = range(len(df))
print(df)

   index   Datum Zahl1 Zahl2
0      0  1-1-17     1     2
1      1  2-1-17     3     4
2      2  1-1-17     5     6
3      3  2-1-17     7     8

Comments

5

I am not sure if this will be simpler than what you had in mind, but if the main goal is for something general then this should be fine with one assumption: The columns in the two files match for example if date is the first column, the translated version will also be the first column.

# number of columns
n_columns = len(df_ger.columns)

# save final columns names
columns = df_uk.columns

# rename both columns to numbers
df_ger.columns = range(n_columns)
df_uk.columns = range(n_columns)

# concat columns
df_out = pd.concat([df_ger, df_uk], axis=0, ignore_index=True)

# rename columns in new dataframe
df_out.columns = columns

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.