python pandas data comparing

Question

I'm trying to compare two excels, one is the user matrix, the other one is I generated from a host. I want to know if the user settings are correct as of the matrix.

the results I got the from the host, I imported to pandas: the user groups here is as column names!

    Name Users  Domain Admins     Administrators   Schema Admins 
0   xxx   NaN             Yes                Yes             NaN

the problem is :

the excel matrix is like

user:         groups
xxx           administrators
              schema admins
              domain admins

here is what I have tried:

I will replace all the Yes with the columns name:

for i in df.columns:
df[i].replace('Yes',i,inplace=True)

remove the null from it.

group=df.dropna(axis='columns',how='all')

now it's like this:

  Name Users  Domain  Admins     Administrators  Schema Admins 

   0     xxx   Domain admins    Administrators  Schema Admins

the other one like:

User Account Name    Group
0    xxx             Domain Admins, Local admin,Administrators

I don't know what to do next. please guide me how to compare the index values in a loop for all the indexs.

the original two excel like this:

user:         groups
xxx           administrators
              schema admins
              domain admins

yyy           administrators
              domain admins

zzz           administrators
              schema admins

the other file like:

username   administrators   schema admins  domain admins
xxx               yes            yes            NaN
yyy               yes            NaN            yes

Your excel matrix , does it have blanks for the user row when it is associated with multiple groups or it is repeated multiple times ? — Rahul Agarwal
– Rahul Agarwal, Commented May 17, 2019 at 9:50
No, After I imported the matrix into the pandas , actually the record has \n in between.Domain Admins\nLocal admin\nAdministrators. — LLY
– LLY, Commented May 17, 2019 at 10:07

Serge Ballesta · Accepted Answer · 2019-05-17 10:37:43Z

1

I would let the pandas imported from the host (let us call it df_host) unchanged, and create columns for groups in the pandas imported from the matrix (called df_matrix):

groups = ['Users', 'Domain Admins', 'Administrators', 'Schema Admins']

for g in groups:
    df_matrix[g] = df_matrix.Group.str.contains(g)

Next I would use the user name as index in both dataframes:

df_matrix.set_index('Account Name', inplace=True)
df_host.set_index('Name', inplace=True)

You can now easily join the dataframes:

df_comp = df_matrix.join(df_host, how='outer', lsuffix='_matrix', rsuffix='_host')

You finally should have a single dataframe with one row per user, and one column for groups seen from the host and seen from the excel matrix, which should allow easy comparisons.

answered May 17, 2019 at 10:37

Serge Ballesta

150k13 gold badges137 silver badges267 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Rahul Agarwal · Accepted Answer · 2019-05-17 11:40:06Z

0

This is how it can be done:

Step 1: Transform host df

cols = ['administrators', 'schema admins', 'domain admins']
df1['merged'] = df1[cols].apply(lambda x: ', '.join(x[x.notnull()]), axis = 1) ##df1 is host df

Step 2: Transform your matrix df

df.user = df.user.ffill()  ## Fill the empty rows with same user name
grouped_df = df.groupby("user")['groups'].apply(','.join).reset_index() ## merge same user name to 1 row

Step 3: Comparing both df

result_df = pd.merge(df1, grouped_df, how='inner', left_on="merged", right_on="user") ## The left_on/right_on will change according to the column name you have

edited May 17, 2019 at 11:40

answered May 17, 2019 at 10:10

Rahul Agarwal

4,1168 gold badges33 silver badges56 bronze badges

7 Comments

LLY Over a year ago

Hi Rahul, it doesn't work. maybe my question is not clear. let me explain again. ``` user: groups xxx administrators schema admins domain admins yyy administrators domain admins zzz administrators schema admins ``` the other file like: ```` username administrators schema admins domain admins xxx yes yes NaN yyy yes NaN yes

Rahul Agarwal Over a year ago

You need to check the sample df which I created in my answer and what you have..are those look similar or different. If it is different, kindly put a image of how you df look in pandas!1

LLY Over a year ago

it's different, I've updated the question, I put the original excel at the end.

Rahul Agarwal Over a year ago

Changed according to your new excel format..you can upvote the answer if you found useful

LLY Over a year ago

can you run it?

|

rnso · Accepted Answer · 2019-05-17 11:18:32Z

0

You can add data to a dictionary to make things easier. If following is data file:

user:         groups
xxx           administrators
              schema admins
              domain admins
user:         groups
yyy           administrators
              domain admins
user:         groups
zzz           administrators
              schema admins

Following code will create a dictionary:

with open('userdata.txt', 'r') as f:
    # read data file and split into lines; also trim lines; 
    datalist = list(map(lambda x: x.strip(), f.readlines())) 
    userdict = {}                               # dictionary to collect data; 
    username=""; grplist = []; newuser = True   # variable to read data from file: 
    for line in datalist: 
        if line.startswith('user:'):
            if not(username=="" and len(grplist)==0):   # omit at first run
                userdict[username] = grplist            # put user data into dictionary
                username=""; grplist=[]; newuser=True       # clear variable for new user; 
        elif newuser:
            username, grpname = list(map(lambda x: x.strip(), line.split()))
            grplist.append(grpname)     # append group name to temporary list
            newuser = False
        else: 
            grplist.append(line)        # append more groups; 

userdict[username] = grplist
print(userdict)

Output:

{'yyy': ['administrators', 'domain admins'], 'zzz': ['administrators', 'schema admins'], 'xxx': ['administrators', 'schema admins', 'domain admins']}

If data in second file is as follows:

  Account Name                               Group
          xxx  administrators , schema admins, domain admins
          yyy  administrators , domain admins
          zzz  administrators , schema admins

Following code will get dictionary from it:

with open('userdata2.txt', 'r') as f:
    # read data file and split into lines; also trim lines; 
    datalines = list(map(lambda x: x.strip(), f.readlines())) 
    userdict2={}
    for line in datalines[1:]:  # omit first line which is only header
        infolist = list(map(lambda x: x.strip(), line.split(" ",1)))
        username = infolist[0].strip()
        grplist = list(map(lambda x: x.strip(), infolist[1].split(",")))
        userdict2[username] = grplist

print(userdict2)

Output:

{'zzz': ['administrators', 'schema admins'], 'xxx': ['administrators', 'schema admins', 'domain admins'], 'yyy': ['administrators', 'domain admins']}

To compare 2 dictionaries, just use ==:

print(userdict == userdict2)

Output:

True

To compare groups of a particular user:

print(userdict['xxx'] == userdict1['xxx'])

Output:

True

edited May 17, 2019 at 11:18

answered May 17, 2019 at 10:13

rnso

24.7k26 gold badges127 silver badges270 bronze badges

4 Comments

LLY Over a year ago

yes, it's a very good idea to use the dictionary. I think this will be the right way, I've updated the excel at the end. I want to know if user xxx 's group matches in both excel.

rnso Over a year ago

To compare groups of a particular user, one can use print(userdict['xxx'] == userdict1['xxx']). I have added this in my answer above. You should also upvote/accept answer that is helpful to you.

LLY Over a year ago

hi, the second file is different, please see my update.

rnso Over a year ago

You should accept this answer and post another question with new comparison. Otherwise answers will become very long.

Collectives™ on Stack Overflow

python pandas data comparing

3 Answers 3

Comments

7 Comments

4 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

7 Comments

4 Comments

Your Answer

Sign up or log in

Post as a guest

Related