I have a text file containing lines like the following (the first field is the keyword, the second is the keyword's frequency, and the third is related text):

anorexia nervosa    1       在专利网看到一
glaucoma    10      want to suck out my eyeballs and have them replaced with
cancer  691     there is a drug that helps fight cancer called avastin
gene therapy    1       writing a review paper on gene therapy 
hormone 35      glad my hormone injections end in a month 
depression  259     depression? just made depression cake: recipe here

I want to parse the file into this (keywords sorted by their frequency, highest first):

cancer  691
depression  259
hormone 35
glaucoma    10
anorexia nervosa    1
gene therapy    1

I checked other questions about sorting and ordering, but I couldn't find a good example. sort() doesn't seem to work. Please point me to a good starting point!

  • Please don't modify the question substantially after posting. The way you got the data is distracting. Post another question if you want to optimize that or such. Commented Oct 8, 2011 at 8:15

2 Answers

The solution by eudoxos will work, but you have to split on tabs (\t), i.e.:

# open() replaces the Python 2-only file() builtin
with open(yourFile) as f:
    data = f.readlines()
data.sort(key=lambda l: float(l.split('\t')[1]), reverse=True)

Here, judging by your input text, I assume that the fields are delimited by tabs.

However, delimiting the fields with commas would be a better solution, because tabs and spaces can easily get mixed up.
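
As a rough illustration of the tab-based approach, here is a minimal Python 3 sketch that produces the keyword/frequency listing the question asks for. It assumes every line has exactly three tab-separated fields; the function name sort_by_frequency is made up for this example, and the sample lines are copied from the question with tabs put in as the separators.

```python
# Sample lines from the question; tab separators are an assumption.
lines = [
    "anorexia nervosa\t1\t在专利网看到一",
    "glaucoma\t10\twant to suck out my eyeballs and have them replaced with",
    "cancer\t691\tthere is a drug that helps fight cancer called avastin",
    "gene therapy\t1\twriting a review paper on gene therapy",
    "hormone\t35\tglad my hormone injections end in a month",
    "depression\t259\tdepression? just made depression cake: recipe here",
]


def sort_by_frequency(lines):
    """Return (keyword, frequency) pairs, highest frequency first."""
    pairs = []
    for line in lines:
        # Split into at most three fields so tabs inside the text are harmless.
        keyword, freq, _text = line.rstrip("\n").split("\t", 2)
        pairs.append((keyword, int(freq)))
    # The sort is stable, so equal frequencies keep their input order.
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


result = sort_by_frequency(lines)
for keyword, freq in result:
    print(f"{keyword}\t{freq}")
```

To run this against a real file, the list could be replaced by the lines read from that file with open(); a line without two tabs would raise ValueError in the unpacking step.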

1 Comment

Yeah, you're right. I've already fixed it and tried the code. Thanks!

If you have your lines in a list, pass the key parameter to the sort method; the lambda splits each line at spaces/tabs, takes the second column, converts it to float, and uses that for comparison. reverse=True makes the order descending (sorry, not tested, but 99% works modulo typing errors):

# open() replaces the Python 2-only file() builtin
with open(yourFile) as f:
    data = f.readlines()
data.sort(key=lambda l: float(l.split()[1]), reverse=True)

4 Comments

I got an error: "ValueError: could not convert string to float: nervosa". Do you have any idea why?
Ah, I see, you have spaces between words in the first column. I would suggest adjusting your data generation procedure to write just one word there; then it will work. As a side note, since you generate that text file yourself, why don't you work on the data structures you already have? Another thing: if you have a hard time understanding two-liners in Python (no offense, I am a beginner in many other things), open your data in a spreadsheet and sort it there.
Thanks for your help and comment! Actually, my data is over 10,000 lines, so I couldn't use a spreadsheet.
Yeah, it worked very well! I accepted your comments, so I'm now changing my code to generate the data structure differently. Thanks a lot!
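
If regenerating the data is not an option, one possible workaround for the ValueError discussed above is to locate the frequency with a regular expression instead of a plain whitespace split. The sketch below is only an illustration: LINE_RE and parse_line are hypothetical names, and the pattern assumes the frequency is the first standalone integer on the line, so a keyword that itself contains a standalone number would be parsed incorrectly.

```python
import re

# Hypothetical pattern: keyword (may contain spaces), whitespace,
# an integer frequency, then optional free text.
LINE_RE = re.compile(r"^(.+?)\s+(\d+)(?:\s+(.*))?$")


def parse_line(line):
    """Return (keyword, frequency) for one line, or raise ValueError."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized line: {line!r}")
    keyword, freq, _text = match.groups()
    return keyword, int(freq)


# Sample lines as they appear in the question (space-separated).
lines = [
    "anorexia nervosa    1       在专利网看到一",
    "glaucoma    10      want to suck out my eyeballs and have them replaced with",
    "cancer  691     there is a drug that helps fight cancer called avastin",
    "gene therapy    1       writing a review paper on gene therapy",
    "hormone 35      glad my hormone injections end in a month",
    "depression  259     depression? just made depression cake: recipe here",
]

# Sort by the numeric frequency, highest first; ties keep input order.
ranked = sorted((parse_line(l) for l in lines), key=lambda p: p[1], reverse=True)
for keyword, freq in ranked:
    print(f"{keyword}\t{freq}")
```

Because the keyword group is matched lazily, multi-word keywords such as "anorexia nervosa" stay intact, which is the case that broke the plain split() version.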
