0

I'm trying to compare two excels, one is the user matrix, the other one is I generated from a host. I want to know if the user settings are correct as of the matrix.

the results I got the from the host, I imported to pandas: the user groups here is as column names!

    Name Users  Domain Admins     Administrators   Schema Admins 
0   xxx   NaN             Yes                Yes             NaN                                  

the problem is :

the excel matrix is like

user:         groups
xxx           administrators
              schema admins
              domain admins

here is what I have tried:

I will replace all the Yes with the columns name:

for i in df.columns:
df[i].replace('Yes',i,inplace=True)

remove the null from it.

group=df.dropna(axis='columns',how='all')

now it's like this:

  Name Users  Domain  Admins     Administrators  Schema Admins 

   0     xxx   Domain admins    Administrators  Schema Admins 

the other one like:

User Account Name    Group
0    xxx             Domain Admins, Local admin,Administrators

I don't know what to do next. please guide me how to compare the index values in a loop for all the indexs.

the original two excel like this:

user:         groups
xxx           administrators
              schema admins
              domain admins

yyy           administrators
              domain admins

zzz           administrators
              schema admins

the other file like:

username   administrators   schema admins  domain admins
xxx               yes            yes            NaN
yyy               yes            NaN            yes
3
  • Your excel matrix , does it have blanks for the user row when it is associated with multiple groups or it is repeated multiple times ? Commented May 17, 2019 at 9:50
  • No, After I imported the matrix into the pandas , actually the record has \n in between.Domain Admins\nLocal admin\nAdministrators. Commented May 17, 2019 at 10:07
  • Answer given let me know if it works Commented May 17, 2019 at 10:14

3 Answers 3

1

I would let the pandas imported from the host (let us call it df_host) unchanged, and create columns for groups in the pandas imported from the matrix (called df_matrix):

groups = ['Users', 'Domain Admins', 'Administrators', 'Schema Admins']

for g in groups:
    df_matrix[g] = df_matrix.Group.str.contains(g)

Next I would use the user name as index in both dataframes:

df_matrix.set_index('Account Name', inplace=True)
df_host.set_index('Name', inplace=True)

You can now easily join the dataframes:

df_comp = df_matrix.join(df_host, how='outer', lsuffix='_matrix', rsuffix='_host')

You finally should have a single dataframe with one row per user, and one column for groups seen from the host and seen from the excel matrix, which should allow easy comparisons.

Sign up to request clarification or add additional context in comments.

Comments

0

This is how it can be done:

Step 1: Transform host df

cols = ['administrators', 'schema admins', 'domain admins']
df1['merged'] = df1[cols].apply(lambda x: ', '.join(x[x.notnull()]), axis = 1) ##df1 is host df 

Step 2: Transform your matrix df

df.user = df.user.ffill()  ## Fill the empty rows with same user name
grouped_df = df.groupby("user")['groups'].apply(','.join).reset_index() ## merge same user name to 1 row

Step 3: Comparing both df

result_df = pd.merge(df1, grouped_df, how='inner', left_on="merged", right_on="user") ## The left_on/right_on will change according to the column name you have

7 Comments

Hi Rahul, it doesn't work. maybe my question is not clear. let me explain again. ``` user: groups xxx administrators schema admins domain admins yyy administrators domain admins zzz administrators schema admins ``` the other file like: ```` username administrators schema admins domain admins xxx yes yes NaN yyy yes NaN yes
You need to check the sample df which I created in my answer and what you have..are those look similar or different. If it is different, kindly put a image of how you df look in pandas!1
it's different, I've updated the question, I put the original excel at the end.
Changed according to your new excel format..you can upvote the answer if you found useful
can you run it?
|
0

You can add data to a dictionary to make things easier. If following is data file:

user:         groups
xxx           administrators
              schema admins
              domain admins
user:         groups
yyy           administrators
              domain admins
user:         groups
zzz           administrators
              schema admins

Following code will create a dictionary:

with open('userdata.txt', 'r') as f:
    # read data file and split into lines; also trim lines; 
    datalist = list(map(lambda x: x.strip(), f.readlines())) 
    userdict = {}                               # dictionary to collect data; 
    username=""; grplist = []; newuser = True   # variable to read data from file: 
    for line in datalist: 
        if line.startswith('user:'):
            if not(username=="" and len(grplist)==0):   # omit at first run
                userdict[username] = grplist            # put user data into dictionary
                username=""; grplist=[]; newuser=True       # clear variable for new user; 
        elif newuser:
            username, grpname = list(map(lambda x: x.strip(), line.split()))
            grplist.append(grpname)     # append group name to temporary list
            newuser = False
        else: 
            grplist.append(line)        # append more groups; 

userdict[username] = grplist
print(userdict)

Output:

{'yyy': ['administrators', 'domain admins'], 'zzz': ['administrators', 'schema admins'], 'xxx': ['administrators', 'schema admins', 'domain admins']}

If data in second file is as follows:

  Account Name                               Group
          xxx  administrators , schema admins, domain admins
          yyy  administrators , domain admins
          zzz  administrators , schema admins

Following code will get dictionary from it:

with open('userdata2.txt', 'r') as f:
    # read data file and split into lines; also trim lines; 
    datalines = list(map(lambda x: x.strip(), f.readlines())) 
    userdict2={}
    for line in datalines[1:]:  # omit first line which is only header
        infolist = list(map(lambda x: x.strip(), line.split(" ",1)))
        username = infolist[0].strip()
        grplist = list(map(lambda x: x.strip(), infolist[1].split(",")))
        userdict2[username] = grplist

print(userdict2)

Output:

{'zzz': ['administrators', 'schema admins'], 'xxx': ['administrators', 'schema admins', 'domain admins'], 'yyy': ['administrators', 'domain admins']}

To compare 2 dictionaries, just use ==:

print(userdict == userdict2)

Output:

True

To compare groups of a particular user:

print(userdict['xxx'] == userdict1['xxx'])

Output:

True

4 Comments

yes, it's a very good idea to use the dictionary. I think this will be the right way, I've updated the excel at the end. I want to know if user xxx 's group matches in both excel.
To compare groups of a particular user, one can use print(userdict['xxx'] == userdict1['xxx']). I have added this in my answer above. You should also upvote/accept answer that is helpful to you.
hi, the second file is different, please see my update.
You should accept this answer and post another question with new comparison. Otherwise answers will become very long.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.