Hi all, I am new to Python. Please help me with this requirement.

http://www.example.com/ratings/ratings-rationales.jsp?date=true&result=true

On this page I first have to choose a date range, and then the rating company lists its publications as links. I want to find the links whose text contains a word of interest, say "stable". I tried the following using Python 3.4.2:

from bs4 import BeautifulSoup
from urllib.parse import urljoin
import requests

url = "http://www.example.com/ratings/ratings-rationales.jsp?date=true&result=true"   
r = requests.get(url)
soup = BeautifulSoup(r.content)

example_links = lambda tag: getattr(tag, 'name', None) == 'a' and 'stable' in tag.get_text().lower() and 'href' in tag.attrs
results = soup.find_all(example_links)
result_links = [urljoin(url, tag['href']) for tag in results]
print (result_links)

This does not find any links. This is the output I see:

>>>
[]

Obviously, I am not giving the dates as input.
1. How do I submit the from and to dates, both set to today's date? (The goal is to check periodically for new links containing a word of interest, which will be a question for a later time.)
For example after giving from date: 31-12-2014 to date: 31-12-2014 as inputs

is the output I need as hyperlink.

Any suggestion will be much useful. Thanks in advance

Here is the updated code. I am still not able to get the result; the output is still [].

from datetime import datetime
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import requests

#Getting the current date
today = datetime.today()

#For the sake of brevity some parameters are missing on the payload
payload = {
    'selArchive': 1,
    'selDay': 31, 
    'selMonth': 12, 
    'selYear': 2014,
    'selDay1': 31, 
    'selMonth1': 12, 
    'selYear1': 2014,
    'selSector': '',
    'selIndustry': '',
    'selCompany': ''
}

example_url = "http://www.example.com/"
r = requests.post(example_url, data=payload)    
rg = requests.get(example_url)
soup = BeautifulSoup(rg.content)

example_links = lambda tag: getattr(tag, 'name', None) == 'a' and 'stable' in tag.get_text().lower() and 'href' in tag.attrs
results = soup.find_all(example_links)
result_links = [urljoin(example_url, tag['href']) for tag in results]
print (result_links)
  • You should consider that the dates cannot be equal, and also that they cannot differ by more than one month. Commented Jan 1, 2015 at 14:53
  • But on the website I entered the same date for both (31 Dec 2014). I see only this condition and two more, but no same-date condition: if(todate-fromdate>2678400000){ alert('The Date range can not exceeds one month'); document.frmCrisil.selDay.focus(); return false; } Commented Jan 1, 2015 at 14:55
  • Yep, but when you try to select a range with the same dates, an error message is displayed. Commented Jan 1, 2015 at 15:05
  • I am not sure I understood. If you key in today's date it will not display anything, because there are no updates yet for today. But if you enter any other date (other than a Sunday) you can see the results. Once again, sorry if I am taking up your time. So the same date is valid, I think. Isn't it? Commented Jan 1, 2015 at 15:09
  • Yes, you are right... I was trying a date in the future. Commented Jan 1, 2015 at 15:26
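
The date rules discussed in these comments could be checked before sending anything to the site. The following is only a sketch: the 31-day limit comes from the JavaScript condition quoted above (2678400000 ms), while the "no future dates" and "from date not after to date" rules are assumptions drawn from the discussion, and `is_valid_range` is a hypothetical helper name.

```python
from datetime import date, timedelta

# Limit taken from the form's JavaScript check quoted in the comments:
# 2678400000 milliseconds is exactly 31 days.
MAX_RANGE = timedelta(milliseconds=2678400000)


def is_valid_range(from_date, to_date, today=None):
    """Return True if the date range looks acceptable to the form.

    Assumed rules: no dates in the future, the from date is not after
    the to date, and the span does not exceed 31 days. Equal dates pass.
    """
    today = today or date.today()
    if from_date > to_date or to_date > today:
        return False
    return (to_date - from_date) <= MAX_RANGE


print(is_valid_range(date(2014, 12, 31), date(2014, 12, 31)))
```

Passing `today` explicitly is only there to make the check easy to try with fixed dates.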

1 Answer

You should send a POST instead of a GET for this particular site (see this link on how to form a POST request with parameters).

Check this example:

from datetime import datetime
from urllib.parse import urljoin

from bs4 import BeautifulSoup

import requests

#Getting the current date
today = datetime.today()

#The from and to dates are hard-coded to 31-12-2014 here (use today.day, today.month
#and today.year to send the current date); the other entries are the remaining form fields
payload = {
    'selDay': 31, 
    'selMonth': 12, 
    'selYear': 2014,
    'selDay1': 31, 
    'selMonth1': 12, 
    'selYear1': 2014,
    'selIndustry': '',
    'txtPhrase': '',
    'txtInclude': '',
    'txtExclude': '',
    'selSubServices': 'ALL',
    'selServices': 'all',
    'maxresults': 10,
    'pageno': 1,
    'srchInSrchCol': '01',
    'sortOptions': 'date',
    'isSrchInSrch': '01',
    'txtShowQuery': '01',
    'tSearch': 'Find a Rating',
    'txtSearch': '',
    'selArchive': 1,
    'selSector': 148,
    'selCompany': '',
    'x': 40,
    'y': 11,
}

crisil_url = "http://www.crisil.com/ratings/ratings-rationales.jsp?result=true&Sector=true"
r = requests.post(crisil_url, data=payload)

soup = BeautifulSoup(r.content, "html.parser")

crisil_links = lambda tag: getattr(tag, 'name', None) == 'a' and 'stable' in tag.get_text().lower() and 'href' in tag.attrs
results = soup.find_all(crisil_links)
result_links = [urljoin(crisil_url, tag['href']) for tag in results]
print (result_links)

You will need to check the ids of the industries you are filtering on, so be sure to look them up via Inspect Element by selecting the industries select box in the browser.
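
As an alternative to reading the ids by hand, they could be pulled out of the form's HTML with BeautifulSoup. This is a rough sketch on made-up markup: `SAMPLE_FORM`, its option labels and ids, and the helper `select_options` are all hypothetical, and it assumes the select box carries the same name as the `selIndustry` payload key.

```python
from bs4 import BeautifulSoup

# Made-up sample markup; the real page's options and ids will differ.
SAMPLE_FORM = """
<form name="frmCrisil">
  <select name="selIndustry">
    <option value="">All</option>
    <option value="101">Banks</option>
    <option value="102">Cement</option>
  </select>
</form>
"""


def select_options(html, select_name):
    """Map each option's visible label to its value for the named <select>."""
    soup = BeautifulSoup(html, "html.parser")
    select = soup.find("select", attrs={"name": select_name})
    if select is None:
        return {}
    return {
        option.get_text(strip=True): option.get("value", "")
        for option in select.find_all("option")
    }


print(select_options(SAMPLE_FORM, "selIndustry"))
```

On the real site, the HTML passed in would be the content of the page that shows the search form.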

After that, you will get the response and do the parsing via BeautifulSoup, as you are doing now.
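
Since the question asks for today's date as both the from and the to date, the six date entries of the payload could be generated instead of typed in. A minimal sketch, assuming `selDay`/`selMonth`/`selYear` are the from date and `selDay1`/`selMonth1`/`selYear1` the to date, with months counted from 1 as in the example above; `date_fields` is a hypothetical helper.

```python
from datetime import date


def date_fields(from_date, to_date):
    """Return the six date entries used in the payload above."""
    return {
        "selDay": from_date.day,
        "selMonth": from_date.month,
        "selYear": from_date.year,
        "selDay1": to_date.day,
        "selMonth1": to_date.month,
        "selYear1": to_date.year,
    }


today = date.today()
payload_dates = date_fields(today, today)
# These entries can then be merged into the full payload,
# for example with payload.update(payload_dates).
print(payload_dates)
```

Keep in mind that, as noted in the comments on the question, a search for today's date can legitimately come back empty if nothing has been published yet.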

Checking periodically: To check this periodically you should consider crontab if using Linux/Unix or a Scheduled task if using Windows.
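
When the script runs on a schedule, it usually helps to remember which links were already reported so that each run only shows new ones. The sketch below keeps that state in a small JSON file; `report_new_links` and the state file name are hypothetical and not part of the code above.

```python
import json
from pathlib import Path


def report_new_links(links, state_file):
    """Return links not seen in earlier runs and record them in state_file."""
    path = Path(state_file)
    seen = set(json.loads(path.read_text())) if path.exists() else set()
    new = []
    for link in links:
        if link not in seen:
            seen.add(link)
            new.append(link)
    # Persist everything seen so far for the next scheduled run.
    path.write_text(json.dumps(sorted(seen)))
    return new
```

The cron job or scheduled task would then simply run the script, passing `result_links` to this function and printing or mailing whatever it returns.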

13 Comments

Maybe adding an actual example of a POST request instead of a link would be better.
@avenet Can you also suggest how to get the response from the website to which we passed the inputs? Below is what I used, but it returns nothing: r = requests.post(crisil_url, data=payload); rg = requests.get(crisil_url); soup = BeautifulSoup(rg.content)
What's the result? I do get a response from the request, but the list of results is empty because some parameters from the list I sent you are missing, so you need to provide them all.
BTW, this site takes a long time (5 or 6 seconds) to process your POST request, so you should take that into account!
Yeah, it's a little slow. Can you please guide me on how to find the other parameters I need to provide values for, or is what you sent the full set? I found this section (pardon me for not knowing HTML): var day = document.frmCrisil.selDay.value; var month = document.frmCrisil.selMonth.value; month = month-1; var year = document.frmCrisil.selYear.value; var day1 = document.frmCrisil.selDay1.value; var month1 = document.frmCrisil.selMonth1.value; month1 = month1-1; var year1 = document.frmCrisil.selYear1.value;
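
Two points from these comments can be put together in one sketch: the results are in the body of the POST response itself (a follow-up `requests.get` of the same URL does not carry the submitted form data), and a timeout leaves room for the slow response. The helper names `find_links` and `search`, the `post` parameter, and the 30-second timeout are illustrative assumptions.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def find_links(html, base_url, keyword):
    """Return absolute URLs of <a href> tags whose text contains keyword."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        urljoin(base_url, a["href"])
        for a in soup.find_all("a", href=True)
        if keyword.lower() in a.get_text().lower()
    ]


def search(url, payload, keyword, post=requests.post, timeout=30):
    """POST the form data and parse the body of that same response."""
    # The timeout is an arbitrary value above the 5 to 6 seconds reported.
    response = post(url, data=payload, timeout=timeout)
    response.raise_for_status()
    return find_links(response.content, url, keyword)
```

With the payload from the answer, `search(crisil_url, payload, "stable")` would replace the separate POST and GET calls shown in the comment above.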