Python Pandas - Concat dataframes with different columns ignoring column names

Question

I have two pandas.DataFrames which I would like to combine into one. The dataframes have the same number of columns, in the same order, but have column headings in different languages. How can I efficiently combine these dataframes?

df_ger
index  Datum   Zahl1   Zahl2
0      1-1-17  1       2
1      2-1-17  3       4

df_uk
index  Date    No1     No2
0      1-1-17  5       6
1      2-1-17  7       8

desired output
index  Datum   Zahl1   Zahl2
0      1-1-17  1       2
1      2-1-17  3       4
2      1-1-17  5       6
3      2-1-17  7       8

The only approach I came up with so far is to rename the column headings and then use pd.concat([df_ger, df_uk], axis=0, ignore_index=True). However, I hope to find a more general approach.

The dataframe I am working with is quite large. (>30 columns). I get it from an external source, the labels could change. — Axel
– Axel, Commented Aug 9, 2017 at 13:13
Maybe there is a more general way that works with the column index, ignoring the set column names, but I couldn't find anything, yet. — Axel
– Axel, Commented Aug 9, 2017 at 13:13

Stephen Rauch · Accepted Answer · 2017-08-09 13:28:03Z

19

If the columns are always in the same order, you can mechanically rename the columns and the do an append like:

Code:

new_cols = {x: y for x, y in zip(df_uk.columns, df_ger.columns)}
df_out = df_ger.append(df_uk.rename(columns=new_cols))

Test Code:

df_ger = pd.read_fwf(StringIO(
    u"""
        index  Datum   Zahl1   Zahl2
        0      1-1-17  1       2
        1      2-1-17  3       4"""),
    header=1).set_index('index')

df_uk = pd.read_fwf(StringIO(
    u"""
        index  Date    No1     No2
        0      1-1-17  5       6
        1      2-1-17  7       8"""),
    header=1).set_index('index')

print(df_uk)
print(df_ger)

new_cols = {x: y for x, y in zip(df_uk.columns, df_ger.columns)}
df_out = df_ger.append(df_uk.rename(columns=new_cols))

print(df_out)

Results:

         Date  No1  No2
index                  
0      1-1-17    5    6
1      2-1-17    7    8

        Datum  Zahl1  Zahl2
index                      
0      1-1-17      1      2
1      2-1-17      3      4

        Datum  Zahl1  Zahl2
index                      
0      1-1-17      1      2
1      2-1-17      3      4
0      1-1-17      5      6
1      2-1-17      7      8

answered Aug 9, 2017 at 13:28

Stephen Rauch♦

50.1k32 gold badges118 silver badges143 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Ondřej Janča Over a year ago

this answer is outdated as of Pandas 2.0, since append has been removed

C. Nitschke · Accepted Answer · 2017-08-09 13:29:34Z

Provided you can be sure that the structures of the two dataframes remain the same, I see two options:

Keep the dataframe column names of the chosen default language (I assume en_GB) and just copy them over:
```
df_ger.columns = df_uk.columns
df_combined = pd.concat([df_ger, df_uk], axis=0, ignore_index=True)
```
This works whatever the column names are. However, technically it remains renaming.

Pull the data out of the dataframe using numpy.ndarrays, concatenate them in numpy, and make a dataframe out of it again:

np_ger_data = df_ger.as_matrix()
np_uk_data = df_uk.as_matrix()
np_combined_data = numpy.concatenate([np_ger_data, np_uk_data], axis=0)
df_combined = pd.DataFrame(np_combined_data, columns=["Date", "No1", "No2"])

This solution requires more resources, so I would opt for the first one.

Ynjxsjmh · Accepted Answer · 2022-04-28 16:07:41Z

8

You can concat the dataframe values:

df = pd.DataFrame(np.vstack([df1.values, df2.values]), columns=df1.columns)
# or
df = pd.DataFrame(np.concatenate([df1.values, df2.values], axis=0), columns=df1.columns)

print(df)

  index   Datum Zahl1 Zahl2
0     0  1-1-17     1     2
1     1  2-1-17     3     4
2     0  1-1-17     5     6
3     1  2-1-17     7     8

If you want to reindex the index column

df['index'] = range(len(df))

print(df)

   index   Datum Zahl1 Zahl2
0      0  1-1-17     1     2
1      1  2-1-17     3     4
2      2  1-1-17     5     6
3      3  2-1-17     7     8

answered Apr 28, 2022 at 16:07

Ynjxsjmh

30.3k7 gold badges43 silver badges64 bronze badges

Comments

osbon123 · Accepted Answer · 2017-08-09 13:33:53Z

5

I am not sure if this will be simpler than what you had in mind, but if the main goal is for something general then this should be fine with one assumption: The columns in the two files match for example if date is the first column, the translated version will also be the first column.

# number of columns
n_columns = len(df_ger.columns)

# save final columns names
columns = df_uk.columns

# rename both columns to numbers
df_ger.columns = range(n_columns)
df_uk.columns = range(n_columns)

# concat columns
df_out = pd.concat([df_ger, df_uk], axis=0, ignore_index=True)

# rename columns in new dataframe
df_out.columns = columns

answered Aug 9, 2017 at 13:33

osbon123

4993 silver badges11 bronze badges

Collectives™ on Stack Overflow

Python Pandas - Concat dataframes with different columns ignoring column names

4 Answers 4

Code:

Test Code:

Results:

1 Comment

Comments

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

Code:

Test Code:

Results:

1 Comment

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related