I would like to parse an HTML form and pull out the filenames of any embedded images.

So the string could look like:

{ 

... random HTML content

    image1.png 

 more random HTML content

    image3.png

... }

From the above I would like to write a function in Java that returns to me {image1.png, image3.png}.

I have a regular expression that returns to me only the last image name (image3.png) but it disregards previous image names. How can I capture all of them using regex?

All / any help would be appreciated.

  • This is probably a task best suited to a parsing API such as JSoup or JTidy. While a RegEx is a powerful tool, it has been shown time and again to be insufficient for extracting information from real WWW HTML. Commented Dec 21, 2011 at 0:41

1 Answer

https://stackoverflow.com/a/2059614/684934 gives a good hint. More specifically, you're probably looking for something like [a-zA-Z0-9_\-]+\.(png|jpg|gif|jpeg|tif)
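As a rough sketch of how that pattern could collect every match rather than just the last one: the question's original regex isn't shown, so this only assumes the usual culprit, which is matching once against the whole string instead of calling Matcher.find() in a loop. The class and method names (ImageNameFinder, findImageNames) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative, not from the question.
final class ImageNameFinder {
    // The pattern suggested above, escaped for a Java string literal.
    private static final Pattern IMAGE_NAME =
            Pattern.compile("[a-zA-Z0-9_\\-]+\\.(png|jpg|gif|jpeg|tif)");

    static List<String> findImageNames(String html) {
        List<String> names = new ArrayList<>();
        Matcher matcher = IMAGE_NAME.matcher(html);
        // find() advances to the next match on each call, so the loop
        // visits every occurrence in order of appearance.
        while (matcher.find()) {
            names.add(matcher.group());
        }
        return names;
    }
}
```

Calling ImageNameFinder.findImageNames(html) on a string containing image1.png and image3.png should then return both names in document order.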

Note, however, that a regex only looks for sequences of characters. If the site serves dynamic images, for example through servlets, and the resource URI ends in something like .jsp or .do rather than a normal image file extension, then the regex will miss those images entirely. It will also pick up any text that happens to match the pattern, even when that text does not represent an image on the page.

To do the job properly, you will need to use some sort of DOM and traverse the <img> elements. (And the <input> elements, which may be of type image... there are probably more tags that can have images.)
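To illustrate the tag-based approach without pulling in a third-party library, here is a minimal sketch using the HTML parser that ships with the JDK (javax.swing.text.html.parser.ParserDelegator). It is a callback parser rather than a full DOM, and it targets an old HTML dialect, so for real-world pages a dedicated library such as the ones mentioned in the comment above is likely the better choice. The class and method names (ImgTagExtractor, imageSources) are invented for this example, and it only looks at <img> tags; other tags could be handled the same way.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper; the names are illustrative only.
final class ImgTagExtractor {
    static List<String> imageSources(String html) {
        List<String> sources = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // <img> has no closing tag, so it is reported as a simple tag.
                if (tag == HTML.Tag.IMG) {
                    Object src = attrs.getAttribute(HTML.Attribute.SRC);
                    if (src != null) {
                        sources.add(src.toString());
                    }
                }
            }
        };
        try {
            // 'true' tells the parser to ignore any charset declaration in the markup.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sources;
    }
}
```

Unlike the regex, this collects only values that actually appear in a src attribute, so a stray "image2.png" in body text is ignored. The returned values are the raw src strings, so a path such as pics/image3.png would still need its directory part stripped to get the bare filename.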

1 Comment

"there are probably more tags that can have images." Background images. Using CSS, they can be applied to a variety of elements. +1 for "To do the job properly, you will need to use some sort of DOM" ( and sorry to break your 4x4 rep score ;)
