
I used SQL to convert a social security number to an MD5 hash. I am wondering if there is a module or function in Python/pandas that can do the same thing.

My SQL script is:

CREATE OR REPLACE FUNCTION MD5HASH(STR IN VARCHAR2) RETURN VARCHAR2 IS
  V_CHECKSUM VARCHAR2(32);

BEGIN
  V_CHECKSUM := LOWER(RAWTOHEX(UTL_RAW.CAST_TO_RAW(SYS.DBMS_OBFUSCATION_TOOLKIT.MD5(INPUT_STRING => STR))));
  RETURN V_CHECKSUM;
EXCEPTION
  WHEN NO_DATA_FOUND THEN
    NULL;
  WHEN OTHERS THEN
    RAISE;
END MD5HASH;

SELECT HRPRO.MD5HASH('555555555') FROM DUAL

thanks.

I apologize, now that I read back over my initial question it is quite confusing.

I have a data frame that contains the following headings:

df[['ssno','regions','occ_ser','ethnicity','veteran','age','age_category']][:10]

Here ssno is personal information that I would like to convert to an MD5 hash and store in a new column of the DataFrame.

thanks... sorry for the confusion.

Right now I have to send my file to Oracle, convert the SSN to a hash there, and then export it back out so that I can continue working with it in pandas. I want to eliminate this step.

  • Have you tried googling python md5? Second result for me is: docs.python.org/2/library/hashlib.html Commented Jan 28, 2015 at 11:36
  • Isn't this simply hashlib.md5(ssn).hexdigest()? Though sha256 would be a better choice. Commented Jan 28, 2015 at 11:39
  • @timkofu: I think david wants to use MD5 to be compatible with his existing SQL code. But I could be totally wrong. :) In which case, SHA256 would be a better choice if he needs the extra security it affords. Commented Jan 28, 2015 at 11:45
  • @timkofu: thank you for the response. I do not want to use the SQL code. It is an extra step in my process that I would like to eliminate. Commented Jan 28, 2015 at 13:37
  • @david: select correct answer if your issue resolved Commented Jan 29, 2015 at 6:04

2 Answers

Using the standard hashlib module:

import hashlib

hasher = hashlib.md5()
hasher.update(b'555555555')  # Python 3 requires bytes, not str
print(hasher.hexdigest())

output

3665a76e271ada5a75368b99f774e404

As mentioned in timkofu's comment, you can also do this more simply, using

print(hashlib.md5(b'555555555').hexdigest())

The .update() method is useful when you want to generate a checksum in stages. Please see the hashlib documentation (or the Python 3 version) for further details.
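
To illustrate the staged use of .update(), here is a minimal sketch (Python 3, with the input split into two arbitrary chunks chosen only for this example). Feeding the data piece by piece should give the same digest as hashing it in one call:

```python
import hashlib

# Feed the same nine digits in two arbitrary chunks.
staged = hashlib.md5()
staged.update(b"555")
staged.update(b"555555")

# Hash the whole value in a single call for comparison.
one_shot = hashlib.md5(b"555555555")

# Both approaches produce the same digest.
print(staged.hexdigest() == one_shot.hexdigest())
```

This is mainly useful for large inputs, such as a file read in blocks, where the whole value never needs to be held in memory at once.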


The md5 constructor in hashlib might be of interest to you.

import hashlib
hashlib.md5(b"Nobody inspects the spammish repetition").hexdigest()

output:

bb649c83dd1ea5c9d9dec9a18df0ffe9

Constructors for hash algorithms that are always present in this module are md5(), sha1(), sha224(), sha256(), sha384(), and sha512().
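
As a rough sketch of how these constructors compare, the loop below uses hashlib.new() to look each one up by name and records the length of its hex digest. The `lengths` dictionary is just a name chosen for this example:

```python
import hashlib

data = b"Nobody inspects the spammish repetition"

# Map each algorithm name to the length of its hex digest.
lengths = {}
for name in ("md5", "sha1", "sha224", "sha256", "sha384", "sha512"):
    lengths[name] = len(hashlib.new(name, data).hexdigest())

print(lengths)
```

MD5 gives the shortest digest of the six (32 hex characters); every SHA variant listed here is longer.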

If you want a stronger hash, you may try the SHA series. Note that these digests are longer than MD5's, not shorter.

output for sha224:

'a4337bc45a8fc544c03f52dc550cd6e1e87021bc896588bd79e901e2'

For more details, see the hashlib documentation.
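
To connect this back to the DataFrame in the question, one possible approach is a small helper that hashes a single value and can then be mapped over the ssno column. The helper name `md5_hex` and the sample values are made up for this sketch, and UTF-8 encoding is an assumption (for plain digits it matches any ASCII-compatible database character set):

```python
import hashlib

def md5_hex(value):
    """Return the lowercase hex MD5 digest of the value's string form."""
    return hashlib.md5(str(value).encode("utf-8")).hexdigest()

# Made-up sample values standing in for the ssno column.
ssnos = ["555555555", "123456789"]
hashed = [md5_hex(s) for s in ssnos]
print(hashed)
```

With pandas, the same helper could be applied as `df['ssno_md5'] = df['ssno'].astype(str).map(md5_hex)`, where `ssno_md5` is a hypothetical column name. If ssno is stored as an integer, any leading zeros are already lost, so the hashes would differ from those computed on the original text.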

1 Comment

My point is: he explicitly asked for an MD5 hash solution. Your answer is a little confusing because for some reason you focus on sha224.
