I have the following data frame.

                     'a1'              'f1'             'a0'
    0  [5261, 5247, 5246]  [526, 557, 5246]    [1, 32, 5261]
    1   [521, 5547, 5246]             'NaN'  [61, 5247, 246]

I want to join [5261, 5247, 5246] with [526, 557, 5246] so that the resulting array has no duplicates.
Required answer: [5261, 5247, 5246, 526, 557].
The same applies to the remaining pairs:

[5261, 5247, 5246] with 'NaN'
[521, 5547, 5246] with [526, 557, 5246]
[521, 5547, 5246] with 'NaN'

These results (4 in total) need to be stored somewhere, and the same joining is then repeated with 'a0'.

I have tried many approaches, but none of them worked. Any help is appreciated.

Thanks, Sonia

4
  • It looks like you have lists in the entries? If you provide a piece of code to construct the dataframe it might make it easier for people to help. Commented Mar 29, 2020 at 12:11
  • {'a1': [['5261', '5247', '5246'], ['521', '5547', '5246']], 'f1': [['526', '557', '5246']], 'a0': [['1', '32', '5261'], ['61', '5247', '246']]} df=pd.DataFrame() for x in result1: df[x]=pd.Series((result1[x])) Commented Mar 29, 2020 at 12:17
  • You can edit the question too. Commented Mar 29, 2020 at 12:18
  • This is how the dataframe was created. Commented Mar 29, 2020 at 12:18
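
The construction described in the comments above can be written out as a runnable sketch. The data and the name `result1` are taken from the comment. The dict comprehension is a restatement of the loop shown there, not the asker's exact code. Wrapping each value in `pd.Series` is what lets the shorter `f1` column be padded with NaN in row 1.

```python
import pandas as pd

# Data copied from the comment; values are strings, as posted there.
result1 = {
    'a1': [['5261', '5247', '5246'], ['521', '5547', '5246']],
    'f1': [['526', '557', '5246']],
    'a0': [['1', '32', '5261'], ['61', '5247', '246']],
}

# One Series per key; Series of unequal length are aligned on the index,
# so the missing second entry of 'f1' becomes NaN.
df = pd.DataFrame({name: pd.Series(lists) for name, lists in result1.items()})

print(df)
```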

1 Answer

I would try to get it into tidy format (a term that, I think, comes from the R community; look it up).

    In [58]: s = pd.Series({'a1': [['5261', '5247', '5246'], ['521', '5547', '5246']], 'f1': [['526', '557', '5246']], 'a0': [['1', '32', '26'], ['61', '47', '246']]})                                                

    In [59]: s                                                                                                                                                                                                         
    Out[59]: 
    a1    [[5261, 5247, 5246], [521, 5547, 5246]]
    f1                         [[526, 557, 5246]]
    a0               [[1, 32, 26], [61, 47, 246]]
    dtype: object

    In [60]: s.exp                                                                                                                                                                                                     
    s.expanding s.explode   
    In [60]: s.explode()                                                                                                                                                                                               
    Out[60]: 
    a1    [5261, 5247, 5246]
    a1     [521, 5547, 5246]
    f1      [526, 557, 5246]
    a0           [1, 32, 26]
    a0         [61, 47, 246]
    dtype: object

    In [61]: s.explode().explode()                                                                                                                                                                                     
    Out[61]: 
    a1    5261
    a1    5247
    a1    5246
    a1     521
    a1    5547
    a1    5246
    f1     526
    f1     557
    f1    5246
    a0       1
    a0      32
    a0      26
    a0      61
    a0      47
    a0     246
    dtype: object

    In [62]: s.index                                                                                                                                                                                                   
    Out[62]: Index(['a1', 'f1', 'a0'], dtype='object')

    In [63]: s.values                                                                                                                                                                                                  
    Out[63]: array([list([['5261', '5247', '5246'], ['521', '5547', '5246']]), list([['526', '557', '5246']]), list([['1', '32', '26'], ['61', '47', '246']])], dtype=object)

    In [68]: d = s.explode().explode()

    In [69]: d = d.reset_index()

    In [70]: d
    Out[70]: 
       index     0
    0     a1  5261
    1     a1  5247
    2     a1  5246
    3     a1   521
    4     a1  5547
    5     a1  5246
    6     f1   526
    7     f1   557
    8     f1  5246
    9     a0     1
    10    a0    32
    11    a0    26
    12    a0    61
    13    a0    47
    14    a0   246

    In [71]: d.columns = ['A', 'B'] # whatever

    In [72]: d.to_parquet('here.parquet')
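
For the pairwise join the question asks for, here is a minimal sketch that works on the original (non-exploded) frame. It is not part of the session above, and the helper names `as_list` and `ordered_union` are made up for illustration. It assumes the missing `f1` entry is a real NaN (or any non-list value) and that "repeated with 'a0'" means each of the 4 results is joined with every `a0` row, which is only one reading of the question.

```python
from itertools import product

import pandas as pd

# Same data as in the question's comment; the short 'f1' column is padded with NaN.
df = pd.DataFrame({
    'a1': pd.Series([['5261', '5247', '5246'], ['521', '5547', '5246']]),
    'f1': pd.Series([['526', '557', '5246']]),
    'a0': pd.Series([['1', '32', '5261'], ['61', '5247', '246']]),
})


def as_list(cell):
    """Return the cell if it is a list, else an empty list (covers the NaN padding)."""
    return cell if isinstance(cell, list) else []


def ordered_union(first, second):
    """Concatenate two cells and drop duplicates, keeping first-seen order."""
    return list(dict.fromkeys(as_list(first) + as_list(second)))


# Every a1 row paired with every f1 row (including the NaN one): 2 x 2 = 4 results.
unions = [ordered_union(a, f) for a, f in product(df['a1'], df['f1'])]

# Assumed reading: each of the 4 results is then joined with every a0 row.
with_a0 = [ordered_union(u, z) for u, z in product(unions, df['a0'])]

print(unions[0])  # ['5261', '5247', '5246', '526', '557']
```

`dict.fromkeys` is used instead of `set` so that the result keeps the order shown in the required answer.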

6 Comments

Could you please explain the logic behind it? I did not totally understand.
Have a look at the docs for explode. Run it in the REPL (IPython) and see the results; use <dot><tab> completion in the REPL on Python objects to see command suggestions. I would just try not to get the data in that format if you can. But explode just unrolls list elements into rows of a Series. With pandas and the REPL you often don't need to think too hard: just mash away in the REPL until you get roughly what you want, and then think through whether the code makes sense. It's kind of a thinking tool.
It didn't totally work out. Any help is appreciated.
There is no function like d.columns and d.to_parquet.
Mash the keyboard in the IPython REPL, mash the keyboard in Google. Head to the docs. This is definitely something you learn by trying 100 things quickly, not by staring and thinking. Not trying to be mean, but I see a lot of people starting off with this stuff thinking they need to think a lot. Just read and type and try and you will get there faster. But make sure you are in the REPL. <dot><TAB> is your friend. pandas.pydata.org/pandas-docs/stable/reference/api/…