I want to scrape the Sector Weightings Table from the following link:

http://portfolios.morningstar.com/fund/summary?t=SPY&region=usa&culture=en-US&ownerCountry=USA

The table I want is the sixth table in the page's source code. I have written the following script in R:

 library(rvest)
 turl = 'http://portfolios.morningstar.com/fund/summary?t=SPY'
 turlr = read_html(turl) 
 df6<-html_table(html_nodes(turlr, 'table')[[6]], fill = TRUE) 

However, when I run the last line of the script, I get the following error message:

Error in out[j + k, ] : subscript out of bounds

  • You should see How to create a Minimal, Complete, and Verifiable example Commented Nov 26, 2017 at 23:44
  • Specifically, you didn't include the code that produced this error. Commented Nov 26, 2017 at 23:47
  • There are embedded charts and groupings in your target table. You will need to alter the returned node before it will be accepted by html_table. See this question for some guidance. Commented Nov 27, 2017 at 0:10
  • There are nigh countless R + scraping + morningstar posts on SO. Which ones did not have info that could have helped you? I'm constantly mystified about this, since it takes more energy to create a question than to do an actual search. Commented Nov 27, 2017 at 1:25

1 Answer

The required table contains embedded charts and grouped rows, so rvest's html_table is not able to convert it into a proper table. Using readHTMLTable from the XML package, you can do it quite easily.

library(XML)
library(dplyr)

#read required table
turl = 'http://portfolios.morningstar.com/fund/summary?t=SPY'
temp_table <- readHTMLTable(turl)[[6]]

#process table to readable format
final_table <- temp_table %>%
  select(V2, V3, V4, V5) %>%
  na.omit() %>%
  `colnames<-` (c("","% Stocks","Benchmark","Category Avg")) %>%
  `rownames<-` (seq_len(nrow(.)))
final_table

Output is:

                          % Stocks Benchmark Category Avg
1                Cyclical                                
2         Basic Materials     2.79      3.16         3.22
3       Consumer Cyclical    11.06     11.42        11.15
4      Financial Services    16.39     16.50        17.22
5             Real Estate     2.24      3.18         2.00
6               Sensitive                                
7  Communication Services     3.56      3.37         3.50
8                  Energy     5.83      5.79         5.79
9             Industrials    10.37     10.89        11.70
10             Technology    22.16     21.41        19.72
11              Defensive                                
12     Consumer Defensive     8.20      7.60         8.56
13             Healthcare    14.24     13.57        14.57
14              Utilities     3.15      3.11         2.59
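
If you would rather not load dplyr for the clean-up step, the same processing can be sketched in base R. The data frame below is a made-up stand-in for temp_table: I am assuming the raw table has columns V1 to V5, with empty rows coming through as NA. The column name "Sector" is my own choice, not something from the page.

```r
# Hypothetical stand-in for temp_table (assumed layout; the real raw
# table returned by readHTMLTable may differ).
temp_table <- data.frame(
  V1 = c(NA, NA, NA, NA),
  V2 = c("Cyclical", "Basic Materials", NA, "Energy"),
  V3 = c("", "2.79", NA, "5.83"),
  V4 = c("", "3.16", NA, "5.79"),
  V5 = c("", "3.22", NA, "5.79"),
  stringsAsFactors = FALSE
)

# Keep the four useful columns and drop rows containing NA
final_table <- temp_table[, c("V2", "V3", "V4", "V5")]
final_table <- final_table[complete.cases(final_table), ]

# Rename the columns and reset the row names to 1..n
names(final_table) <- c("Sector", "% Stocks", "Benchmark", "Category Avg")
rownames(final_table) <- NULL

# Convert the weight columns to numeric; the empty cells in the
# group header rows ("Cyclical" etc.) become NA
num_cols <- c("% Stocks", "Benchmark", "Category Avg")
final_table[num_cols] <- lapply(final_table[num_cols], as.numeric)

final_table
```

Converting to numeric is optional, but it makes the weights usable for sums or comparisons afterwards.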

Hope it helps!
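
If you want to stay with rvest, one possible workaround is to skip html_table and read the cells row by row yourself. This is only a sketch: the HTML string below is invented to imitate a table with a group header row, and the real Morningstar markup may well be structured differently, so the selectors and the "four cells per row" filter are assumptions you would need to adapt.

```r
library(rvest)

# Made-up HTML imitating a table whose group header row has a
# different number of cells than the data rows (assumed structure).
html <- '<table>
  <tr><th colspan="4">Cyclical</th></tr>
  <tr><td>Basic Materials</td><td>2.79</td><td>3.16</td><td>3.22</td></tr>
  <tr><td>Real Estate</td><td>2.24</td><td>3.18</td><td>2.00</td></tr>
</table>'

page <- read_html(html)
rows <- html_nodes(page, "tr")

# Collect the text of the <td> cells in each row
cells <- lapply(rows, function(r) html_text(html_nodes(r, "td"), trim = TRUE))

# Keep only the rows that have exactly four data cells
cells <- cells[lengths(cells) == 4]

# Bind the rows into a data frame and name the columns
sector_df <- as.data.frame(do.call(rbind, cells), stringsAsFactors = FALSE)
names(sector_df) <- c("Sector", "% Stocks", "Benchmark", "Category Avg")

sector_df
```

Because each row is handled on its own, rows with an irregular number of cells are simply filtered out instead of breaking the conversion.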

1 Comment

This is great. Thanks a lot!
