
This should be easy but I can't get it to work. I want to read the following URL, which is a CSV file but doesn't have the ".csv" suffix:

http://www.nwrfc.noaa.gov/water_supply/ws_text.cgi?id=TDAO3&wy=2013&per=APR-SEP&type=ESP10

One additional "non-standard" aspect of the data structure is that there are two comment lines at the beginning of the file, which start with "#". Below are the first few lines of the file:

# Water Supply Forecast for COLUMBIA - THE DALLES DAM (TDAO3) 
# ESP Generated Forecasts with 10 day QPF
ID,Forecast Date,Start Month,End Month,90% FCST,50% FCST,10% FCST
TDAO3,2013-04-29,APR,SEP,82280.5,87857.86,93216.58
TDAO3,2013-04-28,APR,SEP,81707.62,87079.18,93104.28
TDAO3,2013-04-27,APR,SEP,81298.03,86753.18,92658.75
TDAO3,2013-04-26,APR,SEP,83142.93,88694.89,94804.66
TDAO3,2013-04-25,APR,SEP,83937.66,89378.74,95840.54
TDAO3,2013-04-24,APR,SEP,83045.52,88362.98,95224.37
TDAO3,2013-04-23,APR,SEP,82921.77,88242.32,95658.01
TDAO3,2013-04-22,APR,SEP,82992.71,88539.25,95768.61
TDAO3,2013-04-20,APR,SEP,82637.34,88036.47,95859.98
TDAO3,2013-04-19,APR,SEP,83258.96,88906.11,96523.07
TDAO3,2013-04-18,APR,SEP,82768.39,88486.72,96165.99

I thought the syntax would be simple:

fname <- "http://www.nwrfc.noaa.gov/water_supply/ws_text.cgi?id=TDAO3&wy=2013&per=APR-SEP&type=ESP10"
df <- read.table(fname, header=TRUE, sep=",", skip=2)

Any assistance would be greatly appreciated.

  • That's not a csv file, it's actually HTML. (View the source in your browser.) Commented May 1, 2013 at 2:50
  • ok....thanks, so i guess read.html Commented May 1, 2013 at 3:03
  • Oh that is just perverse. Copy the page and paste it into a text editor, save it with a .csv extension then send it to NOAA and tell them to stop being idiots. Commented May 1, 2013 at 3:28
  • @Gavin: Perverse because NOAA didn't provide it in csv format in the first place? Or something about my question? Commented May 1, 2013 at 5:12
  • No, the way NOAA provided the file. Your Q is well written with fully working code. I know the file is being produced by a CGI, but it's not that hard to get the script to spit out a header and pipe the data stream to the browser as a file, or even just send back a plain text file. Dropping this into a html file, wrapped in <pre> tags with odd <br> line breaks is pretty perverse. Commented May 1, 2013 at 5:35

2 Answers


Here's how to approach it from within R, but I'm sure there are gurus with better approaches:

fname <- "http://www.nwrfc.noaa.gov/water_supply/ws_text.cgi?id=TDAO3&wy=2013&per=APR-SEP&type=ESP10"

x <-readLines(fname)
y <- unlist(strsplit(x[[3]], "<br>"))
y2 <- y[4:194]
dat <- strsplit(y2, ",")
dat <- data.frame(do.call(rbind, dat))
colnames(dat) <- unlist(strsplit(y[3], ","))
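
One thing to watch: every column of dat comes back as character (or factor), because the values were split out of plain text. A small follow-up sketch, assuming base R's type.convert() is acceptable for the coercion:

# columns are character after the rbind; let type.convert() pick
# sensible classes (numeric for the forecast columns)
dat[] <- lapply(dat, type.convert, as.is = TRUE)
str(dat)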

1 Comment

Thank you for your response. I've never used do.call before and it looks handy. Thanks again.

Here is another approach, using a regex to replace the HTML tags appropriately:

x <- readLines(fname)
# you want the "third" line
xx <- x[3]
## replace <br> with \n
xn <- gsub('<br>', '\n', xx)
## remove all other html tags (<pre>, <body>, etc.)
xtext <- gsub("<(.|\n)*?>", "", xn)
## lines starting with # are automagically read as comments (and discarded)
## because comment.char = '#' by default

mydata <- read.table(textConnection(xtext), header = TRUE, sep = ',')
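
A quick sanity check on the result: with read.table's default check.names = TRUE, the "Forecast Date" header should come through as Forecast.Date (and the "% FCST" columns get X-prefixed names), so something along these lines should work:

str(mydata)
# convert the forecast date to Date class (column name assumes default check.names)
mydata$Forecast.Date <- as.Date(mydata$Forecast.Date)
head(mydata)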

1 Comment

Thank you for your response. I really appreciated the comments you added to explain the thought process.
