0

This code compare two files and when the find the same lines, it'll write the line another text file as output.

I guess its time complexity is O(n^2). It takes too much time when the increase the lines.

I think that using Hash could be more effective.

How can I apply for the following code?

Thanks.

fin = open('x.csv')
file1 = open("y.txt","r")
file_output = open("z.txt","w")

lines = file1.readlines()
a = []
for line in lines:
     a.append(line.split("\n")[0])




for line in fin:
    id=line.split(',')[0]
    for w in a:
        if w==id:
           file_output.write(line)
3
  • 3
    Would you mind explaining what your code is supposed to do, apart from "This code find the lines"? Commented Jan 1, 2018 at 22:36
  • 1
    What is the purpose of found? Commented Jan 1, 2018 at 22:38
  • 1
    Explanation was added and "found" part was unnecessary and deleted. Commented Jan 1, 2018 at 22:49

2 Answers 2

1

Make a set out of a and then to check for presence of id in a you won't need a loop, you'll need just id in set_a.

Sign up to request clarification or add additional context in comments.

Comments

0

A dict/mapping object maps hashable values(keys) to arbitrary objects. By default all the primitive immutable objects in python have this hash function implemented. For custom objects you have to implement them for correct functionality. Refer - Object of custom type as dictionary key for more details.

Internals of 'hashtable/hashmap/dictionary/dict' DataStructure:

The important aspect of 'dict' datastructure is - it provides complexity of 'dictionary lookups' in O(1). While others need atleast O(log n) or O(n) for lookup.

For providing this O(1) complexity, it requires the key objects to provide a "hash" function.

  1. This hash function takes the information in a key object and uses it to produce an integer, called a "hash value".
  2. This hash value is then used to determine which "bucket" this (key, value) pair should be placed. Then, go to the bucket and get the value directly or by traversing the list to get correct value for given key.

Following code should run in O(n), where n is the no.of lines in your 2nd File.

fin = open('x.csv')
file1 = open("y.txt","r")
file_output = open("z.txt","w")

lines = file1.readlines()
a = {}
for line in lines:
    key = line.split("\n")[0]
    a[key] = 1

for line in fin.readlines():
    id=line.split(',')[0]
    if id in a:
       file_output.write(line)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.