I have a function that returns series.index and series.values. How can I write the returned results to a dataframe?

Generate random data

import string
import random
import pandas as pd

# 20 random letters drawn from 'a'..'d'
text = [random.choice(string.ascii_letters[:4]) for _ in range(20)]

# 10 'True' and 10 'False' labels (strings, not Python booleans)
boolean = ['True', 'False'] * 10
bool1 = random.sample(boolean, 20)
bool2 = random.sample(boolean, 20)
bool3 = random.sample(boolean, 20)
bool4 = random.sample(boolean, 20)

d = {'c1':text, 'c2':bool1, 'c3':bool2, 'c4':bool3, 'y':bool4}
dd = pd.DataFrame(data=d)

dd.head(2)

    c1  c2  c3  c4  y
0   b   False   False   False   True
1   a   True    True    False   True

The function

def relative_frequency(df, col):
    series = df.groupby(col)['y'].value_counts(normalize=True)
    true_cnt = series.xs('True', level=1)  # a series with single layer index
    max_index = true_cnt.index[true_cnt.argmax()]
    max_val = true_cnt[max_index]
    true_cnt_dropped = true_cnt.drop(max_index)
    ans = max_val / true_cnt_dropped
    ans.index = [(col + ' ' + max_index + '/' + idx) for idx in ans.index]
    return ans.index, ans.values
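To make the intermediate steps of the function easier to follow, here is a minimal sketch on a tiny fixed frame. The data is made up for illustration and is not the random data from the question; `idxmax` is used here as a shorthand for the `index[argmax()]` lookup above.

```python
import pandas as pd

# Tiny fixed frame (illustrative data, not from the question)
df = pd.DataFrame({
    'c2': ['True', 'True', 'True', 'False', 'False'],
    'y':  ['True', 'True', 'False', 'True', 'False'],
})

# Per-group share of each y value; the result has a two-level index (c2, y)
shares = df.groupby('c2')['y'].value_counts(normalize=True)

# Keep only the entries where y == 'True'; the y level is dropped
true_share = shares.xs('True', level=1)

# Label of the group with the largest share of 'True'
best = true_share.idxmax()

# Ratio of the largest share to each of the remaining shares
ratios = true_share[best] / true_share.drop(best)
print(ratios)
```

Here the 'True' group has a 2/3 share of y == 'True' and the 'False' group has 1/2, so the single remaining ratio is (2/3) / (1/2).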

Run the function

for i in dd.columns[:-1]:
    print(relative_frequency(dd, i))

It returns

(Index(['c1 c/a', 'c1 c/b', 'c1 c/d'], dtype='object'), array([1.8 , 1.05, 1.2 ]))
(Index(['c2 False/True'], dtype='object'), array([1.5]))
(Index(['c3 True/False'], dtype='object'), array([2.33333333]))
(Index(['c4 False/True'], dtype='object'), array([1.5]))

I would like to build a dataframe like this:

[image: desired dataframe]

1 Answer

In the last part (where you run the function), do this instead:

  1. Convert the output of the function into a DataFrame
  2. Transpose it with .T (swap rows and columns)
  3. Append it to an initially empty list called dfs with dfs.append()
  4. Combine the pieces vertically as rows with pd.concat
  5. Add the column names

dfs = []

for i in dd.columns[:-1]:
    dfs.append(pd.DataFrame(relative_frequency(dd, i)).T)
    
result = pd.concat(dfs)
result.columns = ['features', 'relative_freq']
result
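A possible variation, offered as a sketch rather than a drop-in replacement: building each piece from a dict of named columns avoids the transpose, keeps relative_freq as a float column instead of object, and `ignore_index=True` gives the combined frame a fresh 0..n-1 index. The `pairs` list below is a stand-in for the (index, values) tuples that relative_frequency returns, using the first two results printed in the question.

```python
import numpy as np
import pandas as pd

# Stand-ins for the (index, values) tuples returned by relative_frequency,
# copied from the first two results printed in the question
pairs = [
    (pd.Index(['c1 c/a', 'c1 c/b', 'c1 c/d']), np.array([1.8, 1.05, 1.2])),
    (pd.Index(['c2 False/True']), np.array([1.5])),
]

# One small frame per feature, with the column names set up front
frames = [
    pd.DataFrame({'features': idx, 'relative_freq': vals})
    for idx, vals in pairs
]

# Stack the frames as rows and renumber the index from 0
result = pd.concat(frames, ignore_index=True)
print(result)
```

With the real data, `pairs` would be `[relative_frequency(dd, col) for col in dd.columns[:-1]]`.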
