
I am currently working on the mathematics of machine learning (NLP, to be precise), and I have run into a problem. I want to print the lines of a file that contain any of the following strings:

1) fbchat

2) fb_timeline

3) Facebook Wall Post

into separate text files, one for each of the strings listed above.

Then, in each of the resulting text files, I would like to sort the lines by the thread ID field of the database mentioned in the very first line of messages.dmp. I am a theoretical person with very little programming experience.

The download link for the database dump is given below:

messages.dmp

Update:

This is the script I tried to write:

import re
from sys import argv

scrip, file_name = argv

dfile = open(file_name, 'r')

for line in dfile:
    if re.match("fbchat", line):
        print line

But the script prints nothing.

  • I understand that you are a theoretical person with very little programming experience, but please refer to the help center: you can't ask questions you haven't tried to find an answer for, and you need to show your work. Commented Aug 17, 2014 at 13:28
  • @KobiK I have updated my question; please take a look. Commented Aug 17, 2014 at 13:49

1 Answer

Given the following text file, file.txt:

text1
fbchat !
text2
Facebook Wall Post line

You can use the following code:

# open the file
with open('c:\\file.txt') as f:
    # read all lines as a list
    lines = f.readlines()
# iterate over the list
for line in lines:
    # if any of the strings in the list is in the line, print it
    # (any() takes a generator expression, a close relative of list comprehensions)
    if any(s in line for s in ['fbchat', 'fb_timeline', 'Facebook Wall Post']):
        # print, but first remove the newline character
        # (the call form of print works in both Python 2 and Python 3)
        print(line.strip("\n"))

Output:

fbchat !
Facebook Wall Post line
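
As for the script in the question, it most likely prints nothing because re.match() only succeeds when the pattern matches at the very beginning of the string, whereas re.search() scans the whole line. The sample lines below are made up for illustration (they are not taken from messages.dmp); this is a minimal sketch of the difference, not a fix tested against the real dump:

```python
import re

# Made-up sample lines; only the second one *starts* with "fbchat".
lines = ["text1", "fbchat !", "text2", "a line mentioning fbchat later"]

# re.match only succeeds if the pattern matches at position 0 of the string.
matched = [line for line in lines if re.match("fbchat", line)]

# re.search looks for the pattern anywhere in the string.
searched = [line for line in lines if re.search("fbchat", line)]

print(matched)
print(searched)
```

If the search strings in the dump do not appear at the start of each line, replacing re.match with re.search (or using the plain `in` test shown above) should make the original loop print the expected lines.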

You can read more about Python's with statement, list comprehensions, and strip().
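
The question also asks for the matching lines to go into separate files, one per string, rather than being printed. A minimal sketch of that step follows. The output file names, the `split_lines` helper, and the use of a temporary directory are assumptions made for illustration, not part of the original answer:

```python
import os
import tempfile

# Hypothetical mapping from each search string to an output file name.
OUTPUTS = {
    "fbchat": "fbchat.txt",
    "fb_timeline": "fb_timeline.txt",
    "Facebook Wall Post": "facebook_wall_post.txt",
}


def split_lines(lines, out_dir):
    """Write each line to the file of every search string it contains."""
    handles = {
        key: open(os.path.join(out_dir, name), "w")
        for key, name in OUTPUTS.items()
    }
    try:
        for line in lines:
            for key, handle in handles.items():
                if key in line:
                    # Make sure every written line ends with a newline.
                    handle.write(line if line.endswith("\n") else line + "\n")
    finally:
        for handle in handles.values():
            handle.close()


# Demo on the sample file contents from above, written to a temp directory.
sample = ["text1\n", "fbchat !\n", "text2\n", "Facebook Wall Post line\n"]
out_dir = tempfile.mkdtemp()
split_lines(sample, out_dir)
```

For the real dump, an open file object can be passed in place of `sample`, since iterating over a file yields its lines one at a time without loading the whole file into memory.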


3 Comments

Thanks, this worked. Could you please point out what mistakes there were in my script? That would be helpful, as I intend to learn. And thanks for sharing the additional resources; it is very hard for a newbie to filter the enormous amount of material that Google churns up for any reference query.
Glad it helped. Your problem was with understanding re.match(). You can read this tutorial about regex and Python; it's short and easy. You can also try this post.
But I am unable to come up with a solution to the second part: if I get all the lines containing fbchat into a chat file, how can I sort these lines based on the thread ID field in them? Please take a look. Thank you.
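
On the sorting question in the last comment, one possible approach is to pass a key function to sorted() that extracts the thread ID from each line. The layout of messages.dmp is not shown in this thread, so the sketch below assumes tab-separated fields with an integer thread ID in the second column. The delimiter, the column index, and the sample lines are all hypothetical and would need adjusting to the real dump:

```python
# ASSUMPTIONS (not confirmed by the thread): fields are tab-separated and
# the thread ID is an integer stored in column 1 (zero-based).
DELIMITER = "\t"
THREAD_ID_COLUMN = 1


def thread_id(line):
    """Extract the thread ID from one line as an integer."""
    fields = line.rstrip("\n").split(DELIMITER)
    return int(fields[THREAD_ID_COLUMN])


def sort_by_thread_id(lines):
    """Return the lines ordered by thread ID; ties keep their original order."""
    return sorted(lines, key=thread_id)


# Made-up sample lines in the assumed layout.
sample = [
    "m3\t42\tfbchat\thello\n",
    "m1\t7\tfbchat\thi\n",
    "m2\t42\tfbchat\tbye\n",
]

for line in sort_by_thread_id(sample):
    print(line.rstrip("\n"))
```

Because sorted() is stable, lines that share a thread ID stay in the order they had in the input, which keeps messages within a thread in their original sequence.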
