dynamically create string from three data frames

Question

Dynamically create string from pandas column

I have three data frames like the ones below. The first (df) holds the counts, the second (anomalies) holds the anomaly flags, and the third holds countries:

d = {'10028': [0], '1058': [25], '20120': [29], '20121': [22],'20122': [0], '20123': [0], '5043': [0], '5046': [0]}

df1 = pd.DataFrame(data=d)

Basically, anomalies is a mirror copy of df, except that its values are only 0 or 1: 1 marks an anomaly and 0 marks a non-anomaly.

d = {'10028': [0], '1058': [1], '20120': [1], '20121': [0],'20122': [0], '20123': [0], '5043': [0], '5046': [0]}

df2 = pd.DataFrame(data=d)

And a third data frame like the one below:

d = {'10028': ['US,IN'], '1058': ['NA, JO, US'], '20120': [''], '20121': ['US,PK'],'20122': ['IN'], '20123': ['Us,LN'], '5043': ['AI,AL'], '5046': ['AA,AB']}

df3 = pd.DataFrame(data=d)

I am converting these into a specific format with the code below:

details = (
        '\n' + 'Metric Name' + '\t' + 'Count' + '\t' + 'Anomaly' + '\t' + 'Country' 
        '\n' + '10028:' + '\t'+ '\t' + str(df1.tail(1)['10028'][0]) + '\t' + str(df2['10028'][0]) + '\t'+ str(df3['10028'][0]) + 
        '\n' + '1058:' + '\t' + '\t' + str(df1.tail(1)['1058'][0]) + '\t' + str(df2['1058'][0]) + '\t'+ str(df3['1058'][0]) +
        '\n' + '20120:' + '\t' +'\t' + str(df1.tail(1)['20120'][0]) + '\t' + str(df2['20120'][0]) + '\t'+ str(df3['20120'][0]) +
        '\n' + '20121:' + '\t' + '\t' +str(round(df1.tail(1)['20121'][0], 2)) + '\t' + str(df2['20121'][0]) + '\t'+ str(df3['20121'][0]) +
        '\n' + '20122:' + '\t' + '\t' +str(round(df1.tail(1)['20122'][0], 2)) + '\t' + str(df2['20122'][0]) + '\t'+str(df3['20122'][0]) +
        '\n' + '20123:' + '\t' + '\t' +str(round(df1.tail(1)['20123'][0], 3)) + '\t' + str(df2['20123'][0]) + '\t'+str(df3['20123'][0]) +
        '\n' + '5043:' + '\t' + '\t' +str(round(df1.tail(1)['5043'][0], 3)) + '\t' + str(df2['5043'][0]) + '\t'+str(df3['5043'][0]) +
        '\n' + '5046:' + '\t' + '\t' +str(round(df1.tail(1)['5046'][0], 3)) + '\t' + str(df2['5046'][0]) + '\t'+str(df3['5046'][0]) +
        '\n\n' + 'message:' + '\t' +
        'Something wrong with the platform as there is a spike in [values where anomalies == 1].'
            )

The problem is that the column names change on every run. In this run they are '10028', '1058', '20120', '20121', '20122', '20123', '5043', '5046', but in the next run they may be '10029', '1038', '20121', '20122', '20123', '5083', '5946'.

How can I create the details string dynamically, depending on which columns are present in the data frames? I don't want to hard-code the column names, and in the message I want to list the names of the columns whose anomaly value is 1.

The values will always be either 1 or 0 for df1 and df2, and for df3 either a list or blank.

Expected Output:-

For two data frames I have a working solution, shown below:

# first part of the string
s = '\n' + 'Metric Name' + '\t' + 'Count' + '\t' + 'Anomaly' 

# dynamically add the data
for idx, val in df1.iloc[-1].items():  # Series.iteritems() was removed in pandas 2.0
    s += f'\n{idx}\t{val}\t{df2[idx][0]}' 
# last part
s += ('\n\n' + 'message:' + '\t' +
      'Something wrong with the platform as there is a spike in [values where anomalies == 1].'
     )

If a matching value is not present in one of the data frames, it should print null.
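As a rough sketch of that requirement, the two-frame loop above could be extended to three frames with a fallback for missing columns. The helper name `last_value` and the trimmed sample frames are made up for illustration; the columns dropped from df2 and df3 are there only to show the fallback.

```python
import pandas as pd

# Trimmed sample data; some columns are left out on purpose (assumption for the demo).
df1 = pd.DataFrame({'10028': [0], '1058': [25], '20120': [29]})
df2 = pd.DataFrame({'10028': [0], '1058': [1]})            # no '20120'
df3 = pd.DataFrame({'10028': ['US,IN'], '20120': ['']})    # no '1058'


def last_value(frame, column, default='null'):
    """Hypothetical helper: last row's value for `column`, or `default` if absent."""
    if column not in frame.columns:
        return default
    return frame[column].iloc[-1]


# Header first, then one tab-separated line per column found in df1.
lines = ['Metric Name\tCount\tAnomaly\tCountry']
for column in df1.columns:
    count = last_value(df1, column)
    anomaly = last_value(df2, column)
    country = last_value(df3, column)
    lines.append(f'{column}:\t{count}\t{anomaly}\t{country}')

details = '\n' + '\n'.join(lines)
print(details)
```

This only walks the columns of df1, so a column that exists solely in df2 or df3 would not appear in the output.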

anurag · Accepted Answer · 2021-02-02 08:06:15Z

1

To obtain the expected result, you can do the following (the input data must be dictionaries as shown in the question; if not, please provide the real input data):

import pandas as pd

final_d = []
d = {'10028': 0, '1058': 25, '20120': 29, '20121': 22,'20122': 0, '20123': 0, '5043': 0, '5046': 0}
final_d.append(d)

d = {'10028': 0, '1058': 1, '20120': 1, '20121': 0,'20122': 0, '20123': 0, '5043': 0, '5046': 0, '91111':0}
final_d.append(d)

d = {'10028': ['US','IN'], '1058': ['NA', 'JO', 'US'], '20120': [''], '20121': ['US','PK'],'20122': ['IN'], '20123': ['Us','LN'], '5043': ['AI','AL'], '5046': ['AA','AB'], '00000':['kk','dd','ee']}
final_d.append(d)

# Now, we will merge the dictionaries on key
data = {}
for i, dt in enumerate(final_d):
    for k, v in dt.items():
        # every metric gets one slot per dictionary; slots without a value stay ''
        row = data.setdefault(k, [''] * len(final_d))
        # lists (countries) are joined into a comma-separated string
        row[i] = ','.join(v) if isinstance(v, list) else v

# Creating the base dataframe
df = pd.DataFrame.from_dict(data)

# Converting the column headers (metric names) into a row in the dataframe
df = pd.concat([pd.DataFrame.from_dict({k:[v] for k,v in zip(df.columns.tolist(), df.columns.tolist())}), df], ignore_index=True)

# removing column names
df.columns = [''] * len(df.columns)

# organising the dataframe according to your required output
result = df.T.reset_index(drop=True)

# Adding the column names as required
result.columns = ['Metric Name', 'Count', 'Anomaly', 'Country']

# Voila!
print(result.to_string(index=False))

The generated dataframe:

Metric Name Count Anomaly   Country
      10028     0       0     US,IN
       1058    25       1  NA,JO,US
      20120    29       1          
      20121    22       0     US,PK
      20122     0       0        IN
      20123     0       0     Us,LN
       5043     0       0     AI,AL
       5046     0       0     AA,AB
      91111             0          
      00000                kk,dd,ee
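If the inputs are data frames rather than dictionaries, a similar table can be built without going through dictionaries at all. The following is a minimal sketch, not part of the original answer: it assumes the last row of each frame is the one of interest, uses trimmed sample data with an extra '11111' column in df1, and introduces the hypothetical helper `fmt` to print null for missing values. It also fills in the message with the metrics whose anomaly flag is 1.

```python
import pandas as pd

# Trimmed sample frames; '11111' exists only in df1 (assumption for the demo).
df1 = pd.DataFrame({'10028': [0], '1058': [25], '20120': [29], '11111': [7]})
df2 = pd.DataFrame({'10028': [0], '1058': [1], '20120': [1]})
df3 = pd.DataFrame({'10028': ['US,IN'], '1058': ['NA, JO, US'], '20120': ['']})

# Last row of each frame as a Series indexed by metric name.
# astype(object) keeps integers from turning into floats when NaN is introduced.
# pd.concat(axis=1) aligns on the union of metric names; gaps become NaN.
table = pd.concat(
    [frame.iloc[-1].astype(object) for frame in (df1, df2, df3)],
    axis=1,
    keys=['Count', 'Anomaly', 'Country'],
)


def fmt(value):
    """Hypothetical helper: render missing values as 'null'."""
    return 'null' if pd.isna(value) else str(value)


lines = ['Metric Name\tCount\tAnomaly\tCountry']
for metric, row in table.iterrows():
    lines.append(
        f"{metric}:\t{fmt(row['Count'])}\t{fmt(row['Anomaly'])}\t{fmt(row['Country'])}"
    )

# Metric names whose anomaly flag is 1 (a NaN flag compares as False).
anomalous = table.index[table['Anomaly'] == 1].tolist()

details = (
    '\n' + '\n'.join(lines)
    + '\n\nmessage:\t'
    + f"Something wrong with the platform as there is a spike in {', '.join(anomalous)}."
)
print(details)
```

Because the frames are aligned on the union of their columns, an extra or missing column in any frame shows up as null instead of raising a KeyError.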

edited Feb 2, 2021 at 8:06

answered Feb 2, 2021 at 6:44

anurag


7 Comments

abhi Over a year ago

Hey Anurag, thank you for your answer, but how will we handle the key error? Suppose there is one extra column 11111 in df1; how will we handle that?

anurag Over a year ago

That would be an easy fix, let me update the answer

anurag Over a year ago

Finalized from my end! Have a look!

anurag Over a year ago

Kindly upvote/accept iff this answers your question!

abhi Over a year ago

It's giving me an error! You built that from dictionaries, whereas I am doing this with df columns. TypeError: 'int' object is not subscriptable
