1

I have the following code that is supposed to sort a dataframe called "coint_results"

results = np.array([combinations[pair][0], combinations[pair][1], a, b, egtest[0]])
coint_results = coint_results.append(pd.DataFrame(results).T)
coint_results = coint_results.reset_index().dropna().drop("index", axis=1)
coint_results.columns = ['stock_1', 'stock_2', 'a', 'b', 't-stat']
coint_results.sort_values(by=['t-stat'], ascending=True, inplace=True, ignore_index=True)
print(coint_results)

I thought it should worked well, but here the result

  stock_1 stock_2  ...                      b               t-stat
0    EXO.MI  TIT.MI  ...    0.03767853882823848   -10.03626667966398
1   BAMI.MI  TIT.MI  ...   0.002448683245146175  -10.054806761275424
2    STM.MI  TIT.MI  ...  -0.002184095752726706  -10.068964759967473
3    SRG.MI  TIT.MI  ...     0.4478998680510796  -10.115909021808724
4    SPM.MI  TIT.MI  ...    0.06460752028453558  -10.354583264537016
5   ENEL.MI  TIT.MI  ...    0.11502811550372381  -11.004160730546342
6    SRG.MI  UNI.MI  ...     11.243141736537956   -3.906435255815229
7     IP.MI  STM.MI  ...      -4.73690752582681  -3.9585359373191573
8   STLA.MI  STM.MI  ...      5.160916664191976   -4.121749682409465
9   BAMI.MI  STM.MI  ...     0.5317077958371442   -4.311274260473334
10   TIT.MI  UCG.MI  ...      6.132134120131706   -4.475906614947818
11   ISP.MI  STM.MI  ...     0.6185417904201413   -4.506308325127962
12   BZU.MI  UCG.MI  ...    -4.5712728593642025  -5.0446571567547585
13     G.MI  UCG.MI  ...     5.2911521693982175  -5.1304167685585265
14   SRG.MI  UCG.MI  ...    -17.988827453255993  -5.1784094787560635
15  BMED.MI  UCG.MI  ...      9.924937184752496   -5.406007568494866
16    IP.MI  UCG.MI  ...    -10.742575715380323    -5.54071721843284
17   STM.MI  UCG.MI  ...     2.3788486425865387   -5.620706678454393
18   ISP.MI  UCG.MI  ...      8.092421011705971   -6.016276738559553
19  STLA.MI  UCG.MI  ...       21.6916172567256   -6.176169724335034
20  BAMI.MI  UCG.MI  ...     1.5867053473179111   -6.749604800157616
21     G.MI  TIT.MI  ...     0.2427544870802295   -9.048408747110315
22    IP.MI  TIT.MI  ...    0.03859781538585183    -9.87594779908127
23   ISP.MI  TIT.MI  ...   0.017164281671812998   -9.887616220510456
24  BMED.MI  TIT.MI  ...   -0.04082879951061795     -9.8910162081357
25  STLA.MI  TIT.MI  ...    0.09583782780784517   -9.910261629446035
26    MB.MI  TIT.MI  ...     0.1551557504463611    -9.91316985348034
27   BZU.MI  TIT.MI  ...    0.03270215832874518   -9.944786934999167

Apparently Pandas sorts value in a descending order, but the first 6 values should go at the bottom of the dataframe. It doesn't make any sense to me.

Could someone help me to figure it out?

2
  • Unfortunately is not a str, it's an object... @Psidom Commented Jul 29, 2021 at 17:25
  • 1
    @HenryEcker Ahh, you're correct. Commented Jul 29, 2021 at 17:42

1 Answer 1

4

That is lexicographic ordering. For some reason, t-stat is of type object.

Convert to_numeric before sorting:

coint_results['t-stat'] = pd.to_numeric(coint_results['t-stat'])
coint_results.sort_values(by=['t-stat'], ascending=True,
                          inplace=True, ignore_index=True)

Default display behaviour will hide some values can set_option display precision to see more:

pd.set_option('display.precision', 15)
print(coint_results)

coint_results:

                t-stat
0  -11.004160730546342
1  -10.354583264537016
2  -10.115909021808724
3  -10.068964759967473
4  -10.054806761275424
5  -10.036266679663980
6   -9.944786934999167
7   -9.913169853480341
8   -9.910261629446035
9   -9.891016208135699
10  -9.887616220510456
11  -9.875947799081271
12  -9.048408747110315
13  -6.749604800157616
14  -6.176169724335034
15  -6.016276738559553
16  -5.620706678454393
17  -5.540717218432840
18  -5.406007568494866
19  -5.178409478756063
20  -5.130416768558526
21  -5.044657156754758
22  -4.506308325127962
23  -4.475906614947818
24  -4.311274260473334
25  -4.121749682409465
26  -3.958535937319157
27  -3.906435255815229

loc can be used to select indexes based on the sorted column:

coint_results = (
    coint_results.loc[
        pd.to_numeric(coint_results['t-stat'])
            .sort_values(ascending=True).index
    ].reset_index(drop=True)
)

Or, a temporary column can bed created to sort by, then drop from the DataFrame:

coint_results['sortby'] = pd.to_numeric(coint_results['t-stat'])
coint_results.sort_values(by='sortby', ascending=True, inplace=True,
                          ignore_index=True)
coint_results.drop('sortby', axis=1, inplace=True)
print(coint_results)

Both of these options create a separate Series to sort with so no change in the values in the t-stat column.

coint_results:

                 t-stat
0   -11.004160730546342
1   -10.354583264537016
2   -10.115909021808724
3   -10.068964759967473
4   -10.054806761275424
5    -10.03626667966398
6    -9.944786934999167
7     -9.91316985348034
8    -9.910261629446035
9      -9.8910162081357
10   -9.887616220510456
11    -9.87594779908127
12   -9.048408747110315
13   -6.749604800157616
14   -6.176169724335034
15   -6.016276738559553
16   -5.620706678454393
17    -5.54071721843284
18   -5.406007568494866
19  -5.1784094787560635
20  -5.1304167685585265
21  -5.0446571567547585
22   -4.506308325127962
23   -4.475906614947818
24   -4.311274260473334
25   -4.121749682409465
26  -3.9585359373191573
27   -3.906435255815229

DataFrame Constructor:

coint_results = pd.DataFrame({
    't-stat': ['-10.03626667966398', '-10.054806761275424',
               '-10.068964759967473', '-10.115909021808724',
               '-10.354583264537016', '-11.004160730546342',
               '-3.906435255815229', '-3.9585359373191573',
               '-4.121749682409465', '-4.311274260473334', '-4.475906614947818',
               '-4.506308325127962', '-5.0446571567547585',
               '-5.1304167685585265', '-5.1784094787560635',
               '-5.406007568494866', '-5.54071721843284', '-5.620706678454393',
               '-6.016276738559553', '-6.176169724335034', '-6.749604800157616',
               '-9.048408747110315', '-9.87594779908127', '-9.887616220510456',
               '-9.8910162081357', '-9.910261629446035', '-9.91316985348034',
               '-9.944786934999167']
})
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.