It seems the default for pd.read_csv() is to read in the column names as str. I can't find the behavior documented and thus can't find where to change it.

Is there a way to tell read_csv() to read in the column names as integer?

Or maybe the solution is specifying the datatype when calling pd.DataFrame.to_csv(). Either way, at the time of writing to csv, the column names are integers and that is not preserved on read.

The code I'm working with is loosely related to this (credit):

import pandas as pd

df = pd.DataFrame(index=pd.MultiIndex.from_arrays([[], []]))
for row_ind1 in range(3):
    for row_ind2 in range(3, 6):
        for col in range(6, 9):
            entry = row_ind1 * row_ind2 * col
            df.loc[(row_ind1, row_ind2), col] = entry

df.to_csv("df.csv")

dfr = pd.read_csv("df.csv", index_col=[0, 1])
print(dfr.loc[(0, 3), 6])       # KeyError
print(dfr.loc[(0, 3), "6"])     # No KeyError
2 Comments
  • "at the time of writing to the csv, the column names are integers and that is not preserved on read." It is a text file so there's no way to preserve data types. I looked at the documentation for read_csv() and didn't see anything that could be helpful, so you may want to look at other formats of writing the DataFrame which would preserve data types. Commented Oct 26, 2021 at 22:33
  • Does this answer your question? converting column names to integer with read_csv Commented Mar 3, 2022 at 12:31
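
The first comment's suggestion of a format that preserves data types can be illustrated with pickle. This is only a sketch: the small frame and the file name `df.pkl` are made up for illustration, and pickle is just one possible choice of format. Pickle files should only be loaded from trusted sources.

```python
import os
import tempfile

import pandas as pd

# Illustrative frame with integer column labels (not the frame from the question).
original = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=[6, 7, 8])

# Write to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "df.pkl")
    original.to_pickle(path)
    restored = pd.read_pickle(path)

# The labels are still integers, so integer lookups keep working.
print(restored.columns.tolist())
print(restored.loc[0, 6])
```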

2 Answers

My temporary solution is:

dfr.columns = dfr.columns.map(int)
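
A minimal round-trip sketch of that conversion, using an in-memory buffer instead of a file on disk. The frame and the variable names here are made up for illustration; they are not the data from the question.

```python
from io import StringIO

import pandas as pd

# Illustrative frame whose column labels are integers.
original = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=[6, 7, 8])

# Round-trip through CSV text held in memory.
buffer = StringIO(original.to_csv(index=False))
restored = pd.read_csv(buffer)

# read_csv returns the header labels as strings.
labels_after_read = restored.columns.tolist()

# Convert every label back to int; this assumes all labels are integer-like.
restored.columns = restored.columns.map(int)
print(labels_after_read, restored.columns.tolist())
```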

1 Comment

You could also (if the columns are known) overwrite the existing ones by passing names:

dfr = pd.read_csv("df.csv", index_col=[0, 1], header=0, names=[6, 7, 8])

However, converting to int after reading is probably the best solution.

A slight variation on @mcp's solution:

dfr = dfr.rename(columns=int)

This only works if every column name is convertible to an integer. Otherwise int() raises a ValueError.
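
To make that failure mode concrete, here is a small sketch for flat (non-MultiIndex) columns. The helper `to_int_if_possible` is a hypothetical name introduced for this example; it falls back to the original label when the conversion fails.

```python
import pandas as pd

frame = pd.DataFrame([[1, 0, 0]], columns=["1", "2", "three"])

# rename(columns=int) calls int() on every label, so "three" raises ValueError.
try:
    frame.rename(columns=int)
    strict_failed = False
except ValueError:
    strict_failed = True


def to_int_if_possible(label):
    """Hypothetical helper: return int(label), or the label unchanged."""
    try:
        return int(label)
    except ValueError:
        return label


# rename returns a new frame; the original frame keeps its string labels.
tolerant = frame.rename(columns=to_int_if_possible)
print(strict_failed, tolerant.columns.tolist())
```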

It also gets more complicated if your columns are a MultiIndex. I wrote a function that deals with mixed str/int columns and the MultiIndex case:

def change_strings_to_integers(names):
    new_names = []
    for name in names:
        try:
            new_name = int(name)
        except ValueError:
            new_name = name
        new_names.append(new_name)
    return new_names


def change_column_names_to_integers(frame, levels=None):
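    """Try to convert any column names that look like integers
    (e.g. '10') back to integers. Other names are left unchanged.

    Handles both flat and MultiIndex columns. Modifies `frame` in
    place and returns it.
    """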
    if levels is None:
        try:
            levels = frame.columns.levels
        except AttributeError:
            pass
        else:
            levels = range(len(levels))
    if levels is None:
        frame.columns = change_strings_to_integers(frame.columns)
    else:
        for level in levels:
            new_names = change_strings_to_integers(frame.columns.levels[level])
            frame.columns = frame.columns.set_levels(new_names, level=level)
    return frame

Example:

from io import StringIO

import pandas as pd

csv_text = """1,2,three\n1,0,0\n0,1,0\n0,0,1\n"""
df1 = pd.read_csv(StringIO(csv_text))
assert df1.columns.tolist() == ['1', '2', 'three']
csv_text = """4,5,6\n1,0,0\n0,1,0\n0,0,1\n"""
df2 = pd.read_csv(StringIO(csv_text))
assert df2.columns.tolist() == ['4', '5', '6']
df = pd.concat([df1, df2], keys=['100', '200'], axis=1)
print(df.columns.tolist())
print(change_column_names_to_integers(df).columns.tolist())

Output:

[('100', '1'), ('100', '2'), ('100', 'three'), ('200', '4'), ('200', '5'), ('200', '6')]
[(100, 1), (100, 2), (100, 'three'), (200, 4), (200, 5), (200, 6)]
