How can I apply a hash algorithm instead of for loops to reduce time complexity in Python?

Question

This code compares two files: when the first field of a line in x.csv matches a line in y.txt, it writes that line to another text file as output.

I guess its time complexity is O(n^2). It takes too much time as the number of lines increases.

I think that using a hash could be more effective.

How can I apply one to the following code?

Thanks.

fin = open('x.csv')
file1 = open("y.txt","r")
file_output = open("z.txt","w")

lines = file1.readlines()
a = []
for line in lines:
    a.append(line.split("\n")[0])

for line in fin:
    id = line.split(',')[0]
    for w in a:
        if w == id:
            file_output.write(line)

Would you mind explaining what your code is supposed to do, apart from "This code find the lines"? – timgeb, Commented Jan 1, 2018 at 22:36
An explanation was added; the "found" part was unnecessary and was deleted. – Mr.Hyde, Commented Jan 1, 2018 at 22:49

algrid · Accepted Answer · 2018-01-01 22:46:19Z

Make a set out of a. Then, to check whether id is present in a, you won't need the inner loop; a membership test, id in set_a, is enough.
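A minimal sketch of this suggestion, assuming the IDs and the CSV rows are already in memory as lists of strings. The function name matching_lines and the sample data are made up for illustration and are not part of the original code:

```python
def matching_lines(csv_lines, ids):
    """Return the lines whose first comma-separated field is in ids."""
    set_a = set(ids)  # build the hash-based set once: O(len(ids))
    result = []
    for line in csv_lines:
        first_field = line.split(",")[0]
        # Average-case O(1) membership test replaces the inner loop.
        if first_field in set_a:
            result.append(line)
    return result


# Hypothetical sample data standing in for x.csv and y.txt.
csv_lines = ["1,apple\n", "2,banana\n", "3,cherry\n"]
ids = ["3", "1"]
print(matching_lines(csv_lines, ids))
```

One behavioural difference from the nested loop: a matching line is kept once, even if the same ID appears several times in the ID list.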


arunk2 · Accepted Answer · 2018-01-02 06:23:11Z

A dict/mapping object maps hashable values (keys) to arbitrary objects. Python's built-in immutable types such as str, int, and float are hashable by default. For custom objects that should compare by value, you have to implement __hash__ and __eq__ yourself for correct behaviour. See "Object of custom type as dictionary key" for more details.
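As a small illustration of the custom-object case, here is a sketch of a class that can be used as a dictionary key. The Point class and the labels dict are hypothetical names chosen for this example:

```python
class Point:
    """Hypothetical value-like class used only for illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Two points are equal when their coordinates match.
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Equal objects must produce equal hashes, so hash the same
        # fields that __eq__ compares.
        return hash((self.x, self.y))


labels = {Point(1, 2): "start"}
# A different but equal Point instance finds the same entry.
print(labels[Point(1, 2)])
```

Objects used as keys should not have these fields changed after insertion, otherwise the entry can no longer be found under its new hash.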

Internals of the 'hashtable/hashmap/dictionary/dict' data structure:

The important aspect of the 'dict' data structure is that it provides dictionary lookups in O(1) on average. Other structures, such as balanced trees or plain lists, need at least O(log n) or O(n) per lookup.
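To see the difference on your own machine, a rough timing sketch such as the following can help. It uses a set rather than a dict (both are hash-based), and the function name compare_lookup_times and the sizes are arbitrary choices for illustration; absolute timings will vary between machines:

```python
import timeit


def compare_lookup_times(n=10_000, repeats=200):
    """Time a worst-case membership test on a list and on a set of n strings."""
    items = [str(i) for i in range(n)]
    as_list = items
    as_set = set(items)
    target = str(n - 1)  # last element: the list has to scan everything
    list_time = timeit.timeit(lambda: target in as_list, number=repeats)
    set_time = timeit.timeit(lambda: target in as_set, number=repeats)
    return list_time, set_time


list_time, set_time = compare_lookup_times()
print(f"list: {list_time:.6f}s  set: {set_time:.6f}s")
```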

To provide this O(1) complexity, the dict requires key objects to provide a "hash" function.

This hash function takes the information in a key object and uses it to produce an integer, called a "hash value".
This hash value is then used to determine which "bucket" the (key, value) pair should be placed in. A lookup recomputes the hash, goes to that bucket, and either reads the value directly or compares against the few entries stored there to find the one for the given key.
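The bucket idea described above can be sketched as a toy hash table. This is a teaching example under simplifying assumptions (a fixed number of buckets, collisions handled with a list per bucket, no resizing); the class name ToyHashTable is made up, and this is not how CPython's dict is actually laid out:

```python
class ToyHashTable:
    """Teaching sketch of a chained hash table with a fixed bucket count."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket_for(self, key):
        # hash() yields an integer; the modulo maps it to a bucket index.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket_for(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key):
        # Only the entries that landed in this one bucket are traversed.
        for existing_key, value in self._bucket_for(key):
            if existing_key == key:
                return value
        raise KeyError(key)


table = ToyHashTable()
table.put("42", 1)
print(table.get("42"))
```

With a fixed bucket count the lists grow as more keys are added, which is why real implementations resize the table to keep lookups close to O(1).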

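The following code should run in O(n + m) on average, where m is the number of lines in y.txt (used to build the dict) and n is the number of lines in x.csv (each checked with one lookup).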

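# Build a dict keyed by the IDs in y.txt; only the keys matter here.
a = {}
with open("y.txt", "r") as file1:
    for line in file1:
        key = line.split("\n")[0]  # strip the trailing newline
        a[key] = 1

# Scan x.csv once; each "id in a" check is an average-case O(1) hash lookup.
# The with blocks close the files, so z.txt is flushed to disk.
with open("x.csv") as fin, open("z.txt", "w") as file_output:
    for line in fin:
        id = line.split(',')[0]
        if id in a:
            file_output.write(line)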
