1

I am trying to find the words/text that two different files have in common, but I am not getting the result I am looking for.

I have tried comparing the files line by line, without success:

with open('top_1k_domain.txt', 'r') as file1:
    with open('latesteasylist.txt', 'r') as file2:
        same = set(file1).intersection(file2)

same.discard('\n')

with open('some_output_file1.txt', 'w') as file_out:
    for line in same:
        file_out.write(line)

For example, my first file contains:

 google.com
 youtube.com
 facebook.com
 doublepimp.com
 uod2quk646.com
 qq.com
 yahoo.com
 tmall.com

whereas the second file contains:

 ||doublepimp.com^$third-party
 ||uod2quk646.com^$third-party
 ....etc

It does not produce the output I am looking for: some_output_file1.txt should contain doublepimp.com and uod2quk646.com, but it is empty. Can anybody help me out here?

2
  • Hello, could you give us an example of the two files you use, and the desired output? Thank you in advance. Commented Mar 23, 2019 at 9:53
  • The first file contains domain names, whereas the second file contains filter rules. I have to check for which domain names a rule is defined in the filter list, so I am trying to extract the domain names that are common to both files. Your response will be appreciated, @GuillaumeLastecoueres. Thanks. Commented Mar 23, 2019 at 11:07

2 Answers 2

1

By using set intersection, the items in the two sets will only match if they are identical, which they are not in the case of the two files, since the lines in the second file contain not just the domain names, but also other AdBlock syntax.
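A quick check with the sample lines from the question illustrates this. The set names `domains` and `rules` are made up for the illustration; iterating over a file yields lines with their trailing newline, so the strings below include it:

```python
# Sample lines from the question, as they would come out of file iteration.
domains = {"doublepimp.com\n", "uod2quk646.com\n", "google.com\n"}
rules = {"||doublepimp.com^$third-party\n", "||uod2quk646.com^$third-party\n"}

# Intersection compares whole strings, so a rule line never equals a bare domain.
common = domains.intersection(rules)
print(common)  # set()
```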

You should extract the domain name portion from the lines in the second file before you perform a set intersection with lines in the first file:

import re
same = set(file1).intersection((re.findall(r'[a-z0-9.-]+', line) or [''])[0] + '\n' for line in file2)
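For a fuller picture, here is a minimal sketch of the same idea as a self-contained script. The helper name `common_domains` is hypothetical, and `io.StringIO` objects stand in for the two opened files. As a small variation on the one-liner above, it strips the newline from both sides instead of re-adding it, which should also cope with a last line that has no trailing newline:

```python
import io
import re

# Same character class as in the one-liner above; the helper name is made up.
DOMAIN_RE = re.compile(r'[a-z0-9.-]+')

def common_domains(domain_lines, rule_lines):
    """Return the domains that occur both as a listed domain and in some rule."""
    domains = {line.strip() for line in domain_lines if line.strip()}
    found = set()
    for line in rule_lines:
        match = DOMAIN_RE.search(line)
        if match:  # lines without any domain-like token are skipped
            found.add(match.group())
    return domains & found

# Stand-ins for open('top_1k_domain.txt') and open('latesteasylist.txt').
file1 = io.StringIO("google.com\ndoublepimp.com\nuod2quk646.com\nqq.com\n")
file2 = io.StringIO("||doublepimp.com^$third-party\n||uod2quk646.com^$third-party\n\n")

same = common_domains(file1, file2)
print(sorted(same))  # ['doublepimp.com', 'uod2quk646.com']
```

With real files, the two `open(...)` file objects can be passed in place of the `StringIO` objects, and each result written out followed by `'\n'`.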

4 Comments

It raises AttributeError: 'NoneType' object has no attribute 'group'. What am I missing here?
That's because some of the lines in your second file do not have a domain name at all. I've updated my answer so that those lines are ignored.
I have another question, and I would be thankful if you could help, @blhsing. I am also trying to fetch only the rules of this category: /example.js $script,domain=example.com. Could you write a pattern so that I can fetch this type of rule from the filter list?
Glad to be of help. That really is out of the scope of this question though. Please ask about this in a new question with formatted code so that people can better help.
0

The core idea is okay, but since the second file contains more than just the domain, you will need to strip that out first.

||example.com^$third-party will never equal example.com

One possibility:

same = set(file1).intersection(x[2:x.index('^')] + '\n' for x in file2)
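Note that `str.index` raises `ValueError: substring not found` for any line that has no `^`, such as comments or blank lines. A minimal sketch of a more defensive variant follows, assuming that only rules of the form `||domain^...` are of interest. The helper name `domain_from_rule` is hypothetical:

```python
def domain_from_rule(rule):
    """Return the text between a leading '||' and the first '^', or None."""
    rule = rule.strip()
    if not rule.startswith('||'):
        return None
    end = rule.find('^')  # find() returns -1 instead of raising ValueError
    if end == -1:
        return None
    return rule[2:end]

# Sample lines: one matching rule and two lines that should be skipped.
rules = ["||doublepimp.com^$third-party\n", "! a comment line\n", "....etc\n"]
extracted = {d for d in map(domain_from_rule, rules) if d}
print(extracted)  # {'doublepimp.com'}
```

The extracted set can then be intersected with the stripped lines of the first file.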

1 Comment

It raises an error saying "substring not found". Could you please complete my code, @mhhollomon? I am still at the learning stage.
