How to create a certain dataframe based on an Excel-sheet?

Question

I am working with Google OR-Tools, and one of its examples defines the data structure below. I would like to build this data structure from an Excel sheet instead.

This is the given data structure:

jobs = [[[(3, 0), (1, 1), (5, 2)], 
         [(2, 0), (4, 1), (6, 2)],
         [(2, 0), (3, 1), (1, 2)]],
        [[(2, 0), (3, 1), (4, 2)], 
         [(1, 0), (5, 1), (4, 2)],
         [(2, 0), (1, 1), (4, 2)]], 
        [[(2, 0), (1, 1), (4, 2)],
         [(2, 0), (3, 1), (4, 2)],
         [(3, 0), (1, 1), (5, 2)]]]
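To see how this structure lines up with the spreadsheet below, it can help to flatten it back into rows. This is a sketch that assumes each tuple is (value, machine index), with the value coming from the M1/M2/M3 column of the same index; in the OR-Tools flexible job shop example the tuple is described as (processing time, machine id). The helper name to_rows is hypothetical.

```python
# Hypothetical helper: flatten the nested `jobs` list into sheet-like rows.
# Assumes each task is a list of (value, machine_index) tuples, one per machine.

jobs = [[[(3, 0), (1, 1), (5, 2)],
         [(2, 0), (4, 1), (6, 2)],
         [(2, 0), (3, 1), (1, 2)]],
        [[(2, 0), (3, 1), (4, 2)],
         [(1, 0), (5, 1), (4, 2)],
         [(2, 0), (1, 1), (4, 2)]],
        [[(2, 0), (1, 1), (4, 2)],
         [(2, 0), (3, 1), (4, 2)],
         [(3, 0), (1, 1), (5, 2)]]]


def to_rows(jobs):
    """Return (job, task, M1, M2, M3, ...) tuples, numbering jobs and tasks from 1."""
    rows = []
    for job_no, tasks in enumerate(jobs, start=1):
        for task_no, alternatives in enumerate(tasks, start=1):
            # Sort by machine index so the values come out in M1, M2, M3 order.
            values = [value for value, machine in sorted(alternatives, key=lambda t: t[1])]
            rows.append((job_no, task_no, *values))
    return rows


for row in to_rows(jobs):
    print(*row)  # first line printed: 1 1 3 1 5
```

Read this way, each row of the sheet corresponds to one innermost list, and the rows of one job form one middle-level list.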

What I would like to do is build jobs from an Excel sheet containing the following data:

Job  Task  M1  M2  M3
1    1     3   1   5
1    2     2   4   6
1    3     2   3   1
2    1     2   3   4
2    2     2   5   4
2    3     2   1   4
3    1     2   3   4
3    2     3   1   5

What have you tried so far? Did you consider pandas.read_excel? – luca.vercelli, Commented Apr 1, 2019 at 13:45
Yes, I tried pandas.read_excel (which works fine), but I am not sure how to turn the result into the data type above (a list of lists of tuples?). – StefanOverFlow, Commented Apr 1, 2019 at 14:07

luca.vercelli · Accepted Answer · 2019-04-02 12:15:46Z

1

You should reorganize all your data, grouping by Job. For example:

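import pandas as pd

df = pd.read_excel('bb.xlsx')

jobs = sorted(set(df['Job']))  # unique job ids; sorted() because set order is not guaranteed
# one inner list per row of the job: [(M1, 0), (M2, 1), (M3, 2)]
result = [[[(df['M1'][i], 0), (df['M2'][i], 1), (df['M3'][i], 2)]
           for i in df.index if df['Job'][i] == job]
          for job in jobs]
print(result)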
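To try the comprehension without bb.xlsx, the sheet can be rebuilt in memory with pd.DataFrame. This is a sketch: the DataFrame simply copies the table from the question (Job 3 has only two rows there), and the int(...) calls are an added assumption, for the case where plain Python integers are wanted rather than NumPy scalars.

```python
import pandas as pd

# In-memory stand-in for bb.xlsx, copied from the table in the question.
df = pd.DataFrame(
    [[1, 1, 3, 1, 5], [1, 2, 2, 4, 6], [1, 3, 2, 3, 1],
     [2, 1, 2, 3, 4], [2, 2, 2, 5, 4], [2, 3, 2, 1, 4],
     [3, 1, 2, 3, 4], [3, 2, 3, 1, 5]],
    columns=['Job', 'Task', 'M1', 'M2', 'M3'],
)

jobs = sorted(set(df['Job']))  # unique job ids in ascending order
# Same comprehension as the answer, with values converted to plain int.
result = [[[(int(df['M1'][i]), 0), (int(df['M2'][i]), 1), (int(df['M3'][i]), 2)]
           for i in df.index if df['Job'][i] == job]
          for job in jobs]
print(result)
```

Comparing the printed list with the hand-written jobs list shows the differences the answer mentions, for example (2, 0) instead of (1, 0) for Job 2, Task 2.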

WARNING: the result is not exactly what you wrote. I think you misspelled some of the data. Tell me if I am wrong.
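One way to locate such discrepancies is to compare the two nested lists task by task. The function diff_jobs below is a hypothetical helper, not part of pandas or OR-Tools; it uses itertools.zip_longest so that a task present on only one side (such as Job 3 having only two rows in the sheet) is reported as well. The actual list is what the answer's code yields for the table in the question.

```python
from itertools import zip_longest


def diff_jobs(expected, actual):
    """Yield (job, task, expected_task, actual_task) for each mismatch, 1-based.

    A task missing on one side is reported with None on that side.
    """
    for job_no, (exp_job, act_job) in enumerate(
            zip_longest(expected, actual, fillvalue=[]), start=1):
        for task_no, (exp_task, act_task) in enumerate(zip_longest(exp_job, act_job), start=1):
            if exp_task != act_task:
                yield job_no, task_no, exp_task, act_task


# Hand-written structure from the question.
expected = [[[(3, 0), (1, 1), (5, 2)], [(2, 0), (4, 1), (6, 2)], [(2, 0), (3, 1), (1, 2)]],
            [[(2, 0), (3, 1), (4, 2)], [(1, 0), (5, 1), (4, 2)], [(2, 0), (1, 1), (4, 2)]],
            [[(2, 0), (1, 1), (4, 2)], [(2, 0), (3, 1), (4, 2)], [(3, 0), (1, 1), (5, 2)]]]
# Structure built from the sheet in the question.
actual = [[[(3, 0), (1, 1), (5, 2)], [(2, 0), (4, 1), (6, 2)], [(2, 0), (3, 1), (1, 2)]],
          [[(2, 0), (3, 1), (4, 2)], [(2, 0), (5, 1), (4, 2)], [(2, 0), (1, 1), (4, 2)]],
          [[(2, 0), (3, 1), (4, 2)], [(3, 0), (1, 1), (5, 2)]]]

for mismatch in diff_jobs(expected, actual):
    print(mismatch)
```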

edited Apr 2, 2019 at 12:15

answered Apr 1, 2019 at 14:52

luca.vercelli


4 Comments

StefanOverFlow Over a year ago

Yes, you are right. M1 of Task 2 in Job 2 should be '1' instead of '2'. I tried to run your code, but unfortunately I get a syntax error:

print [[[ (df['M1'][i],0), (df['M2'][i],1), (df['M3'][i],2) ] for i in df.index if df['Job'][i] == job] for job in jobs]
SyntaxError: invalid syntax

luca.vercelli Over a year ago

Are you using Python 2 or 3? In Python 3, print is a function, so the syntax is print(...).

StefanOverFlow Over a year ago

Using print ([[ (df['M1'][i],0), (df['M2'][i],1), (df['M3'][i],2) ] for i in df.index if df['Job'][i] == job] for i in jobs) returns <generator object <genexpr> at 0x00000000094E4750>. How can I get the data structure shown in the question? The data structure is used within a couple of loops, for example.

luca.vercelli Over a year ago

I have edited my answer so that it works in both Python 2 and 3. You were missing some brackets.
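The generator object in the comment above comes from how Python parses the call: a comprehension placed directly inside a call's parentheses, without its own outer square brackets, is a generator expression, so print shows the generator object instead of its items. A minimal illustration with a toy range(3) rather than the DataFrame:

```python
# Inside print(...), a comprehension without its own [ ] is a generator expression.
print(n * n for n in range(3))    # prints something like <generator object <genexpr> at 0x...>

# Adding the outer square brackets makes it a list comprehension instead.
print([n * n for n in range(3)])  # [0, 1, 4]

# An existing generator can be turned into a list with list().
gen = (n * n for n in range(3))
squares = list(gen)  # consumes the generator
print(squares)       # [0, 1, 4]
```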

luca.vercelli · Accepted Answer · 2019-04-11 09:58:12Z

0

Here is another answer, using pandas' groupby API:

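import pandas as pd

df = pd.read_excel('bb.xlsx')

# groupby('Job') yields (job, sub-DataFrame) pairs, sorted by job id;
# each row of a sub-DataFrame becomes one task: [(M1, 0), (M2, 1), (M3, 2)]
result = [[[(row['M1'], 0), (row['M2'], 1), (row['M3'], 2)]
           for _, row in grpdf.iterrows()]
          for _, grpdf in df.groupby('Job')]
print(result)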
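If the sheet may contain more machine columns than M1 to M3, the column list does not have to be hard-coded. This sketch assumes every machine column name starts with 'M' and that the machine index follows the column order; it also sorts rows by Task within each job and converts values to plain int (json.dumps, for instance, rejects NumPy int64 values). The sheet is rebuilt in memory from the question's table instead of reading bb.xlsx.

```python
import pandas as pd

# In-memory stand-in for bb.xlsx (same data as the table in the question).
df = pd.DataFrame(
    [[1, 1, 3, 1, 5], [1, 2, 2, 4, 6], [1, 3, 2, 3, 1],
     [2, 1, 2, 3, 4], [2, 2, 2, 5, 4], [2, 3, 2, 1, 4],
     [3, 1, 2, 3, 4], [3, 2, 3, 1, 5]],
    columns=['Job', 'Task', 'M1', 'M2', 'M3'],
)

# Assumption: machine columns are the ones whose name starts with 'M', in machine order.
machine_cols = [c for c in df.columns if str(c).startswith('M')]

result = [
    # enumerate() over each row's machine values gives (value, machine_index) pairs
    [[(int(value), machine) for machine, value in enumerate(task)]
     for task in grp.sort_values('Task')[machine_cols].itertuples(index=False)]
    for _, grp in df.groupby('Job', sort=True)
]
print(result)
```

The design choice here is to let the DataFrame's columns drive the tuple construction, so adding an M4 column to the sheet would not require changing the comprehension.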

answered Apr 11, 2019 at 9:58

luca.vercelli

