0

I am a Python newbie, so please excuse this basic question. My .xlsx file looks like this:

Unnamed:1     A     Unnamed:2     B
2015-01-01    10    2015-01-01    10
2015-01-02    20    2015-01-01    20
2015-01-03    30    NaT           NaN

When I read it in Python using pandas.read_excel(...), pandas automatically uses the first column as the time index.

Is there a one-liner that tells pandas that every second column is a time index belonging to the time series right next to it?

The desired output would look like this:

date          A     B
2015-01-01    10    10
2015-01-02    20    20
2015-01-03    30    NaN

2 Answers

1

In order to parse chunks of adjacent columns and align on their respective datetime indexes, you can do the following:

Starting with df:

Int64Index: 3 entries, 0 to 2
Data columns (total 4 columns):
Unnamed: 0    3 non-null datetime64[ns]
A             3 non-null int64
Unnamed: 1    2 non-null datetime64[ns]
B             2 non-null float64
dtypes: datetime64[ns](2), float64(1), int64(1)

You could iterate over chunks of 2 columns and merge on index like so:

def chunks(l, n):
    """ Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

merged = df.loc[:, list(df)[:2]].set_index(list(df)[0])
for cols in chunks(list(df)[2:], 2):
    merged = merged.merge(df.loc[:, cols].set_index(cols[0]).dropna(), left_index=True, right_index=True, how='outer')

to get:

             A   B
2015-01-01  10  10
2015-01-01  10  20
2015-01-02  20 NaN
2015-01-03  30 NaN
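
For anyone who wants to reproduce this result without the spreadsheet, here is a minimal sketch. It assumes the frame can be rebuilt by hand from the question's table (including the repeated 2015-01-01 in the second date column); the column names follow the df summary shown earlier, and the hand-built frame is only a stand-in for what read_excel would return.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the frame read from the .xlsx file (an assumption,
# copied from the question's table, duplicate date included).
df = pd.DataFrame({
    "Unnamed: 0": pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-03"]),
    "A": [10, 20, 30],
    "Unnamed: 1": pd.to_datetime(["2015-01-01", "2015-01-01", None]),
    "B": [10.0, 20.0, np.nan],
})


def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]


# Start with the first (date, value) pair, indexed by its date column.
merged = df.loc[:, list(df)[:2]].set_index(list(df)[0])

# Outer-merge every further (date, value) pair on the date index.
for cols in chunks(list(df)[2:], 2):
    piece = df.loc[:, cols].set_index(cols[0]).dropna()
    merged = merged.merge(piece, left_index=True, right_index=True, how="outer")

print(merged)
```

Because 2015-01-01 appears twice in the second date column, the row for that date is repeated once per matching B value, which is why the result has four rows rather than three.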

pd.concat unfortunately doesn't work here because it can't align on an index that contains duplicate entries; otherwise one could use a list comprehension:

pd.concat([df.loc[:, cols].set_index(cols[0]) for cols in chunks(list(df), 2)], axis=1)
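
As a rough sketch of that list-comprehension variant: assuming the dates within each pair of columns are unique (here the second date column is changed to 2015-01-01, 2015-01-02, which is an assumption and not the data used above), pd.concat lines the series up on the dates. The frame is hand-built for illustration, and the index name "date" is chosen only to match the desired output in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the data with unique dates in every date column.
df = pd.DataFrame({
    "Unnamed: 0": pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-03"]),
    "A": [10, 20, 30],
    "Unnamed: 1": pd.to_datetime(["2015-01-01", "2015-01-02", None]),
    "B": [10.0, 20.0, np.nan],
})


def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]


# One single-column frame per (date, value) pair; dropna() removes the
# padding rows (NaT date, NaN value) of the shorter series.
pieces = [df.loc[:, cols].set_index(cols[0]).dropna()
          for cols in chunks(list(df), 2)]

# Align all pieces on their date index.
result = pd.concat(pieces, axis=1).sort_index()
result.index.name = "date"
print(result)
```

With duplicate dates in any of the pieces, this call is expected to fail, so the merge loop remains the safer choice for the data as posted.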

4 Comments

Hi Stefan. Assume that in my example series A and B switch their indexes, so that B is now the longest series. Wouldn't that lead to missing index values (i.e. a missing "2015-01-03") if I choose index_col=0 by default?
Indeed. One would also think you'd like to merge the columns on the dates. If that's not necessary, we can add a step where the longest column becomes the index. If you want to merge instead, we would of course have to take a different approach.
What exactly is the difference between merging and taking the longest index and aligning all other series on it? I come from R, where the magic word is indeed "merge" or cbind...
Difference is alignment - do you want the values to align on dates or just maintain the row order?
0

I use xlrd to import the data, and then pandas to display it:

import xlrd
import pandas as pd

# xls_name is the path to the workbook.
# Note: xlrd 2.0 and later read only .xls files, not .xlsx.
workbook = xlrd.open_workbook(xls_name, encoding_override="cp1252")
worksheet = workbook.sheet_by_index(0)

# The first row holds the column names
first_row = []
for col in range(worksheet.ncols):
    first_row.append(worksheet.cell_value(0, col))

# One dict per data row; this starts at row 10 as in my file,
# use 1 if the data begins right below the header row
data = []
for row in range(10, worksheet.nrows):
    elm = {}
    for col in range(worksheet.ncols):
        elm[first_row[col]] = worksheet.cell_value(row, col)
    data.append(elm)

# Three separate lists (a chained "a = b = c = []" would alias a single list)
first_column, second_column, third_column = [], [], []
for elm in data:
    first_column.append(elm[first_row[0]])
    second_column.append(elm[first_row[1]])
    third_column.append(elm[first_row[2]])

dict1 = {}
dict1[first_row[0]] = first_column
dict1[first_row[1]] = second_column
dict1[first_row[2]] = third_column
# The column names must match the dict keys, so reuse the header row
res = pd.DataFrame(dict1, columns=first_row[:3])
print(res)
