
I have a dataframe as follows:

        A   B   
1   2   3   4   5
4   5   6   7   8

I am trying to fetch data from this dataframe in the following way:

print (file_dataframe.columns)

Index(['A', 'B', 'Unnamed: 2'], dtype='object')

file_dataframe_values = [cell for column in file_dataframe.columns for cell in file_dataframe[column].values.tolist()]
print (file_dataframe_values )

['3', '6', '4', '7', '5', '8']

Why do the extracted values start at 3 instead of at the first value in the first row?

When I use the following dataframe:

    A
1   2   3   4   5
4   5   6   7   8


print (file_dataframe.columns)

Index(['A', 'Unnamed: 1', 'Unnamed: 2','Unnamed: 3'], dtype='object')

file_dataframe_values = [cell for column in file_dataframe.columns for cell in file_dataframe[column].values.tolist()]
print (file_dataframe_values )

['2','5','3', '6', '4', '7', '5', '8']

When I use the following dataframe, where the first row is empty:

1   2   3   4   5
4   5   6   7   8


print (file_dataframe.columns)

Index(['Unnamed: 0', 'Unnamed: 1', 'Unnamed: 2','Unnamed: 3','Unnamed: 4'], dtype='object')

file_dataframe_values = [cell for column in file_dataframe.columns for cell in file_dataframe[column].values.tolist()]
print (file_dataframe_values )

['1','4','2','5','3', '6', '4', '7', '5', '8']

Can anyone please explain this behavior?

I think pandas interprets unnamed column(s) before the first named one as a (Multi)Index unless you specify 'header = 1'. In contrast, if all columns are unnamed, pandas labels all of them 'Unnamed', since a dataframe can't be made of index columns only. Commented Jun 21, 2018 at 8:24

1 Answer

Don't judge a dataframe by print

In the first instance, you have a dataframe with a MultiIndex:

import pandas as pd

df = pd.DataFrame([[3, 4, 5], [6, 7, 8]],
                  columns=['A', 'B', ''],
                  index=pd.MultiIndex.from_tuples([(1, 2), (4, 5)]))

print(df)

     A  B   
1 2  3  4  5
4 5  6  7  8

In the second instance, you have a dataframe with a regular Index:

df = pd.DataFrame([[2, 3, 4, 5], [5, 6, 7, 8]],
                  columns=['A', '', '', ''],
                  index=[1, 4])

print(df)

   A         
1  2  3  4  5
4  5  6  7  8

When you extract columns, index and values from each dataframe, you will get different results. This shouldn't be surprising. However, it does require you to learn about Pandas indexing, which is a useful exercise in any case. The indexing sections of the official docs may be helpful.

Unfortunately, there's no shortcut. This is purely API-specific logic.
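
As a rough illustration of the point above, here is a minimal sketch that rebuilds the first dataframe by hand (not read from the asker's Excel file, which isn't available here) and inspects its index and columns separately. The variable names are only for illustration.

```python
import pandas as pd

# Hand-built stand-in for the first dataframe in the question.
df = pd.DataFrame(
    [[3, 4, 5], [6, 7, 8]],
    columns=["A", "B", ""],
    index=pd.MultiIndex.from_tuples([(1, 2), (4, 5)]),
)

# Index levels are stored separately from the columns.
index_values = df.index.tolist()
column_labels = df.columns.tolist()

# Iterating over df.columns never visits the index,
# so 1, 2, 4 and 5 do not appear in the flattened list.
flattened = [cell for column in df.columns for cell in df[column].tolist()]

print(index_values)   # [(1, 2), (4, 5)]
print(column_labels)  # ['A', 'B', '']
print(flattened)      # [3, 6, 4, 7, 5, 8]
```

The flattened list matches the one in the question (apart from ints instead of strings), which suggests the "missing" values are sitting in the index rather than being dropped.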


3 Comments

I don't actually have an index in my data; it's Excel sheet data, and pandas is treating part of it as an index. Can I do something so that it is not treated as an index?
@PiyushS.Wanare, Use df = df.reset_index() to elevate indices to columns.
@jpg, I got it working by passing header=None while reading the Excel file.
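
For reference, a small sketch of the reset_index() suggestion, again on a hand-built dataframe rather than the asker's actual Excel data. The `level_0` / `level_1` column names are what pandas assigns to unnamed index levels; the other variable names are hypothetical.

```python
import pandas as pd

# Hand-built stand-in for the dataframe with a two-level index.
df = pd.DataFrame(
    [[3, 4, 5], [6, 7, 8]],
    columns=["A", "B", ""],
    index=pd.MultiIndex.from_tuples([(1, 2), (4, 5)]),
)

# Move both index levels back into ordinary columns.
restored = df.reset_index()

# The same column-wise flattening now includes the former index values.
all_values = [
    cell for column in restored.columns for cell in restored[column].tolist()
]

print(restored.columns.tolist())  # ['level_0', 'level_1', 'A', 'B', '']
print(all_values)                 # [1, 4, 2, 5, 3, 6, 4, 7, 5, 8]
```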
