0

I have an Excel file that contains a column with header 'Original Translation'. I also have a DataFrame with column 'Original Translation - {language}', based on the language I am using and some manipulations.

My goal is to open the Excel file, write over the column with header 'Original Translation' with all the data from my DataFrame column 'Original Translation - {language}', preserve the original Excel file formatting, and save to a new output folder.

Here is the code I currently have:

def output_formatted_capstan_file(df, original_file, country, language):
    # this is where you generate the file:
    # https://stackoverflow.com/questions/20219254/how-to-write-to-an-existing-excel-file-without-overwriting-data-using-pandas
    try:
        print(original_file)
        book = load_workbook(original_file)
        writer = pd.ExcelWriter(original_file, engine='openpyxl')
        writer.book = book
        df.to_excel(writer, ['Original Translation - {}'.format(language)])
        writer.save()
    except:
        print('Failed')

1 Answer 1

1

I would approach this using the following method.

1) Import the excel file using a function such as pandas.read_excel, thus taking the data from excel into a dataframe. I'll call this exceldf

2) Merge this dataframe with the data you already have in the Pandas DataFrame. I will call your existing translated dataframe translateddf

3) Reorder the newly merged dataframe newdf and then export out the data. Further options for how to reorder are shown here: re-ordering data frame

4) Export the data to Excel. I'll leave you to integrate it into your initial code. For the generic answer to the question, others may want to look into the integrated Pandas options here to_excel

Example Code

import pandas

# Read in the Excel file
exceldf = pandas.read_excel(open('your_xls_xlsx_filename'), sheetname='Sheet 1')

# Create a new dataframe with your merged data, merging on 'key1'.
# We then drop the column of the original translation, as it should no longer be needed
# I've included the rename argument in case you need it.
newdf = exceldf.merge(translateddf, left_on=['key1'], \
                right_on=['key1']) \
.rename(columns={'Original Translation {language}': 'Original Translation {language}'}) \
.drop(['Original Translation'], axis=1)

# Re-order your data. 
# Note that if you renamed anything above, you have to update it here too
newdf = newdf[['0', '1', '2', 'Original Translation {language}']]

# An example export, that uses the generic implementation, not your specific code
pandas.newdf.to_excel("output.xlsx")
Sign up to request clarification or add additional context in comments.

5 Comments

This is great. I appreciate the detail tremendously. My only issue, I am losing the formatting of the original excel, which contains color coded stuff, etc. I would ideally like to use something that does change those things.. do you have an suggestions?
Formatting is often excel proprietary settings (especially conditional formatting). Instead of trying to read this into a dataframe, could I suggest that you copy > paste special > formatting, between the old and new excel file?
Sure thing, this is what I have been doing currently. Unfortunately, when I start processing thousands of Excel files, I won't have a chance to do that..
In that case, you will likely be forced to write the rules in xlsxwriter: xlsxwriter.readthedocs.io/example_pandas_conditional.html
You're welcome. Sorry, but I've not been forced to use that yet. Best of luck!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.