
I'd like to parse RSS feeds and download podcasts on my ReadyNAS, which is running 24/7 anyway.

So I'm thinking of having a shell script check the feeds periodically and spawn wget to download the files.

What is the best way to do the parsing?

Thanks!

  • Perhaps I should add: I'm on a very slow line, which is why I'm not running my workstation. Commented Jan 14, 2009 at 17:48

5 Answers


Sometimes a simple one-liner using standard shell commands is enough for this:

 wget -q -O- "http://www.rss-specifications.com/rss-podcast.xml" | grep -o '<enclosure url="[^"]*' | grep -o '[^"]*$' | xargs wget -c

Sure, this doesn't work in every case (it assumes url is the first attribute of <enclosure> and is double-quoted), but it's often good enough.
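A sketch of how the one-liner could be wrapped in a script that cron runs periodically, as the question describes. The feed list file (feeds.txt), the download directory, the script path and the crontab schedule are all hypothetical; the extraction keeps exactly the same two grep stages, so it has the same limitations.

```shell
#!/bin/sh
# Sketch: fetch each feed listed in a file and download its enclosures.
# Hypothetical crontab entry, checking every 6 hours:
#   0 */6 * * * /path/to/podcasts.sh /path/to/feeds.txt /path/to/downloads
# feeds.txt (hypothetical) holds one feed URL per line.

# Print the enclosure URLs of an RSS document read on stdin, one per line.
# Same two grep stages as the one-liner, with the same assumptions.
enclosure_urls() {
    grep -o '<enclosure url="[^"]*' |  # '<enclosure url="...' up to the closing quote
    grep -o '[^"]*$'                   # keep only the text after the last quote
}

main() {
    feeds_file=$1
    dest_dir=${2:-.}
    while IFS= read -r feed; do
        [ -n "$feed" ] || continue     # skip blank lines
        wget -q -O- "$feed" | enclosure_urls |
        while IFS= read -r url; do
            wget -c -P "$dest_dir" "$url"   # -c resumes, -P sets the download directory
        done
    done < "$feeds_file"
}

# Only run when given arguments, so the function can be sourced and reused.
if [ "$#" -gt 0 ]; then
    main "$@"
fi
```

Reading the URLs in a loop instead of using xargs is just a choice that makes it easy to pass the download directory per file; xargs wget -c -P "$dest_dir" would work as well.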


7 Comments

Brilliant. A cautionary note (wget_1.13.4-3 with polipo_1.0.4.1-1.2): the "-c" option (to continue interrupted downloads) may not work if you also use a proxy server. It seems to keep retrying files which are already complete.
This works. Just one small thing: is there a way to download only the latest item in the RSS feed, so you can run it via cron? I don't want to download 400 episodes of a show :/
Hmm, that didn't work on this one: wget -q -O- "feeds.twit.tv/sn_video_hd.xml" | grep -o '<enclosure url="[^"]*' | grep -o '[^"]*$' | xargs wget -c
My question, again: how do I download only the single latest entry in an RSS feed?
@wiak It means that it takes the top entry of the feed, which is usually the newest.
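One possible sketch of the "latest entry only" case for cron: take just the first enclosure URL (usually the newest item, though that depends on how the feed orders its items) and record it in a state file so later runs skip it. The state file path and function names are hypothetical. This variant swaps in tr and sed for the grep stages: splitting on "<" first means it also works on feeds that are all on one line, and url no longer has to be the first attribute, but values must still be double-quoted.

```shell
#!/bin/sh
# Sketch: download only the newest enclosure of a feed, and only once.
# STATE_FILE is a hypothetical location recording URLs already fetched.
STATE_FILE=${STATE_FILE:-"$HOME/.podcast_seen"}

# Print the first enclosure URL of an RSS document read on stdin.
first_enclosure_url() {
    tr '<' '\n' |                                        # one tag per line, even for one-line feeds
    sed -n 's/^enclosure[^>]*url="\([^"]*\)".*/\1/p' |  # url value of each <enclosure>
    head -n 1                                            # first one, usually the newest item
}

fetch_latest() {
    url=$(wget -q -O- "$1" | first_enclosure_url)
    [ -n "$url" ] || return 0                 # feed had no enclosure
    touch "$STATE_FILE"
    if grep -qxF -e "$url" "$STATE_FILE"; then
        return 0                              # already downloaded on an earlier run
    fi
    wget -c "$url" && printf '%s\n' "$url" >> "$STATE_FILE"
}

if [ "$#" -gt 0 ]; then
    fetch_latest "$1"
fi
```

Run from cron with the feed URL as the only argument; the URL is only appended to the state file if wget succeeds, so a failed download is retried on the next run.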

Do you have access to awk? Maybe you could use XMLGawk.
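XMLGawk needs the gawk XML extension, which may not be available on a NAS. As a fallback, here is a sketch in plain POSIX awk (a swapped-in technique, not XMLGawk): it scans each line for <enclosure ...> tags and prints their url attribute, including when url is not the first attribute. It still assumes double-quoted attributes and tags that don't span lines; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: print every <enclosure> url attribute using plain POSIX awk
# (not XMLGawk). Assumes double-quoted attributes and single-line tags.
enclosure_urls_awk() {
    awk '{
        rest = $0
        # Find each <enclosure ... url="..." occurrence on the line;
        # [^>]* keeps a match from running past the end of one tag.
        while (match(rest, /<enclosure[^>]*url="[^"]*"/)) {
            tag = substr(rest, RSTART, RLENGTH)
            sub(/.*url="/, "", tag)    # drop everything up to the opening quote
            sub(/"$/, "", tag)         # drop the closing quote
            print tag
            rest = substr(rest, RSTART + RLENGTH)
        }
    }'
}
```

It slots into the same pipeline, for example wget -q -O- "$feed" | enclosure_urls_awk | xargs wget -c.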


I've read about XMLStarlet here and there.

But is there a port available for the ReadyNAS NV+?
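If XMLStarlet can be installed, its sel command can pull the enclosure URLs out with an XPath expression. A minimal sketch, assuming the binary is named xmlstarlet (some packages install it as xml) and a plain RSS 2.0 feed with no namespace on <item>; the function name and the feed URL in the usage line are hypothetical.

```shell
#!/bin/sh
# Sketch: list enclosure URLs with XMLStarlet.
#   sel -t : start a template
#       -m : loop over the nodes matched by the XPath expression
#       -v : print the value of an expression relative to each node
#       -n : print a newline after each value
# With no file arguments, the feed is read from stdin.
enclosure_urls_xmlstarlet() {
    xmlstarlet sel -t -m '//item/enclosure' -v '@url' -n "$@"
}

# Usage (hypothetical URL):
#   wget -q -O- "http://example.com/feed.xml" | enclosure_urls_xmlstarlet | xargs wget -c
```

Unlike the grep approaches, this parses the XML properly, so attribute order and quoting style don't matter.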


I wrote the following simple script for downloading the files listed in an Amazon S3 XML listing; the same approach should be useful for parsing other kinds of XML files:

#!/bin/bash
#
# Download all files from the Amazon feed
#
# Usage:
#  ./dl_amazon_feed_files.sh http://example.s3.amazonaws.com/
# Note: don't forget the trailing slash
#

wget -qO- "$1" | grep -o '<Key>[^<]*' | grep -o "[^>]*$" | xargs -I% -L1 wget -c "$1%"

This is a similar approach to @leo's answer.
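The same two-grep pattern generalizes to other XML documents by making the element name a parameter. A minimal sketch; the function name xml_tag_text is hypothetical, and it assumes the element has no attributes, holds plain text (no nested tags or CDATA), and has a name without regex metacharacters.

```shell
#!/bin/sh
# Sketch: print the text of every <TAG>text</TAG> element read on stdin,
# using the same grep -o trick as the S3 script above.
# $1: element name, e.g. Key for S3 listings or title for RSS items.
xml_tag_text() {
    grep -o "<$1>[^<]*" |   # '<TAG>text' up to the next tag
    grep -o '[^>]*$'        # keep only what follows the '>'
}
```

For example, wget -qO- "http://example.s3.amazonaws.com/" | xml_tag_text Key would list the same keys the script above downloads.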

1 Comment

I don't remember whether I used any part of it, as this is very basic syntax that I use very often offhand. However, I've linked the post just in case.

You can use xsltproc (from libxslt, which is built on libxml2) and write a simple XSL stylesheet that parses the RSS and outputs a list of links.
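A minimal sketch of that approach, assuming a plain RSS 2.0 feed (no namespace on <item>) and that mktemp is available: the stylesheet is written to a temporary file, and <xsl:output method="text"/> makes xsltproc print one enclosure URL per line instead of XML. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: extract enclosure URLs from an RSS 2.0 feed with xsltproc.
# $1: path of a feed file already downloaded, e.g. with wget -O.
enclosure_urls_xslt() {
    xsl=$(mktemp) || return 1
    cat > "$xsl" <<'EOF'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Plain text output: no XML declaration, no markup. -->
  <xsl:output method="text"/>
  <!-- One line per enclosure url attribute. -->
  <xsl:template match="/">
    <xsl:for-each select="//item/enclosure/@url">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
EOF
    xsltproc "$xsl" "$1"
    status=$?
    rm -f "$xsl"          # clean up the temporary stylesheet
    return "$status"
}
```

Usage (hypothetical URL): wget -q -O feed.xml "http://example.com/feed.xml" && enclosure_urls_xslt feed.xml | xargs wget -c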
