
This should be easy but I can't get it to work. I want to read the following URL, which is a CSV file but doesn't have the ".csv" suffix:

http://www.nwrfc.noaa.gov/water_supply/ws_text.cgi?id=TDAO3&wy=2013&per=APR-SEP&type=ESP10

One additional "non-standard" aspect of the data structure is that there are two comment lines at the beginning of the file, which start with "#". Below are the first few lines of the file:

# Water Supply Forecast for COLUMBIA - THE DALLES DAM (TDAO3) 
# ESP Generated Forecasts with 10 day QPF
ID,Forecast Date,Start Month,End Month,90% FCST,50% FCST,10% FCST
TDAO3,2013-04-29,APR,SEP,82280.5,87857.86,93216.58
TDAO3,2013-04-28,APR,SEP,81707.62,87079.18,93104.28
TDAO3,2013-04-27,APR,SEP,81298.03,86753.18,92658.75
TDAO3,2013-04-26,APR,SEP,83142.93,88694.89,94804.66
TDAO3,2013-04-25,APR,SEP,83937.66,89378.74,95840.54
TDAO3,2013-04-24,APR,SEP,83045.52,88362.98,95224.37
TDAO3,2013-04-23,APR,SEP,82921.77,88242.32,95658.01
TDAO3,2013-04-22,APR,SEP,82992.71,88539.25,95768.61
TDAO3,2013-04-20,APR,SEP,82637.34,88036.47,95859.98
TDAO3,2013-04-19,APR,SEP,83258.96,88906.11,96523.07
TDAO3,2013-04-18,APR,SEP,82768.39,88486.72,96165.99

I thought the syntax would be simple:

fname <- "http://www.nwrfc.noaa.gov/water_supply/ws_text.cgi?id=TDAO3&wy=2013&per=APR-SEP&type=ESP10"
df <- read.table(fname, header=TRUE, sep=",", skip=2)

Any assistance would be greatly appreciated.

  • That's not a csv file, it's actually HTML. (View the source in your browser.) Commented May 1, 2013 at 2:50
  • ok....thanks, so i guess read.html Commented May 1, 2013 at 3:03
  • Oh that is just perverse. Copy the page and paste it into a text editor, save it with a .csv extension then send it to NOAA and tell them to stop being idiots. Commented May 1, 2013 at 3:28
  • @Gavin: Perverse because NOAA didn't provide it in csv format in the first place? Or something about my question? Commented May 1, 2013 at 5:12
  • No, the way NOAA provided the file. Your Q is well written with fully working code. I know the file is being produced by a CGI, but it's not that hard to get the script to spit out a header and pipe the data stream to the browser as a file, or even just send back a plain text file. Dropping this into a html file, wrapped in <pre> tags with odd <br> line breaks is pretty perverse. Commented May 1, 2013 at 5:35

2 Answers


Here's how to approach it from within R, but I'm sure there are gurus with better approaches:

fname <- "http://www.nwrfc.noaa.gov/water_supply/ws_text.cgi?id=TDAO3&wy=2013&per=APR-SEP&type=ESP10"

x <-readLines(fname)
y <- unlist(strsplit(x[[3]], "<br>"))
y2 <- y[4:194]
dat <- strsplit(y2, ",")
dat <- data.frame(do.call(rbind, dat))
colnames(dat) <- unlist(strsplit(y[3], ","))
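
One thing to watch: every column of dat comes back as character (or factor), because the values were split out of plain text. A small follow-up sketch, assuming base R's type.convert() is acceptable for the coercion:

# columns are character after the rbind; let type.convert() pick
# sensible classes (numeric for the forecast columns)
dat[] <- lapply(dat, type.convert, as.is = TRUE)
str(dat)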

1 Comment

Thank you for your response. I've never used do.call before and it looks handy. Thanks again.

Here is another approach, using a regex to replace the HTML tags appropriately:

x <- readLines(fname)
# you want the "third" line
xx <- x[3]
## replace <br> with \n
xn <- gsub('<br>', '\n', xx)
## remove all other html tags (<pre>, <body>, etc.)
xtext <- gsub("<(.|\n)*?>", "", xn)
## lines starting with # are automagically read as comments (and discarded)
## because comment.char = '#' by default

mydata <- read.table(textConnection(xtext), header = TRUE, sep = ',')
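
A quick sanity check on the result: with read.table's default check.names = TRUE, the "Forecast Date" header should come through as Forecast.Date (and the "% FCST" columns get X-prefixed names), so something along these lines should work:

str(mydata)
# convert the forecast date to Date class (column name assumes default check.names)
mydata$Forecast.Date <- as.Date(mydata$Forecast.Date)
head(mydata)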

1 Comment

Thank you for your response. I really appreciated the comments you added to explain the thought process.
