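import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.openqa.selenium.By;
import org.openqa.selenium.Keys;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class Unsplash {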

    public static void main(String[] args) {
        // TODO Auto-generated method stub
        System.setProperty("webdriver.firefox.marionette","d:\\selenium\\gecko\\geckodriver.exe");
        WebDriver driver = new FirefoxDriver();

        driver.manage().timeouts().implicitlyWait(30,TimeUnit.SECONDS);     
        driver.manage().window().maximize();
        //driver.manage().window().setPosition(new Point(1920,0));
        //driver.manage().window().setSize(new Dimension(1920/2,1080));
        driver.get("http://unsplash.com/");
        driver.findElement(By.className("_32SMR")).click();
        for(int i=0;i<30;i++)
        {
            driver.findElement(By.tagName("body")).sendKeys(Keys.PAGE_DOWN);

        }
        //driver.getPageSource();
        Pattern p = Pattern.compile("/?photo=(.*?)");
        Matcher m = p.matcher(driver.getPageSource());
        while(m.find())
        {

            driver.get("https://unsplash.com"+m.group());
            System.out.println(m.group());
        }

        driver.quit();
    }

}

I am trying to extract href links from unsplash.com so that I can automate downloading from the website. The href links have the format href="/photos/9l_326FISzk".

With System.out.println(m.group()); I only get "/photos/" as output. How can I get the full href URL, for example "/photos/9l_326FISzk", as output?

  • What did you try to verify the regEx expression? Commented Jul 14, 2017 at 15:28

2 Answers


Instead of matching a regex against the entire driver.getPageSource(), the more "Selenium"-ish way to do this is to locate the elements that carry the href attribute and then apply your regex to each attribute value.

Assuming you only want to get hrefs from all <a> tags on the page:

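// '?' is a regex metacharacter, so it is escaped; the group captures the photo ID.
Pattern p = Pattern.compile("/\\?photo=(.+)");

List<WebElement> aTags = driver.findElements(By.tagName("a"));
for (WebElement aTag : aTags) {
    String href = aTag.getAttribute("href");
    if (href == null) {
        continue; // anchor without an href attribute
    }
    Matcher m = p.matcher(href);
    // find() searches anywhere in href; matches() would require the whole string to match
    if (m.find()) {
        // do something with href (m.group(1) is the photo ID)
    }
}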
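
As for why the pattern in the question prints only the prefix: a reluctant group such as (.*?) at the very end of a pattern is satisfied by the empty string, so m.group() returns just the literal part before it. The following is a minimal sketch without Selenium, assuming hrefs of the "/photos/<id>" form described in the question. The class name HrefRegexDemo, the helper findAll, and the sample HTML are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefRegexDemo {

    // Reluctant group at the end of the pattern: it matches as little as possible, i.e. nothing.
    public static final Pattern LAZY = Pattern.compile("/photos/(.*?)");

    // Negated character class: consumes everything up to the closing quote of the attribute value.
    public static final Pattern FIXED = Pattern.compile("href=\"(/photos/[^\"]+)\"");

    // Collects the requested group from every match of the pattern in the input.
    public static List<String> findAll(Pattern pattern, int group, String html) {
        List<String> results = new ArrayList<>();
        Matcher m = pattern.matcher(html);
        while (m.find()) {
            results.add(m.group(group));
        }
        return results;
    }

    public static void main(String[] args) {
        // Hypothetical page fragment; the first href is from the question, the second is invented.
        String html = "<a href=\"/photos/9l_326FISzk\">one</a> <a href=\"/photos/abc123\">two</a>";
        System.out.println(findAll(LAZY, 0, html));  // [/photos/, /photos/]
        System.out.println(findAll(FIXED, 1, html)); // [/photos/9l_326FISzk, /photos/abc123]
    }
}
```

The same idea applies to the "/?photo=" form: give the group something definite to stop at (a closing quote, or the end of the attribute value) instead of a trailing reluctant quantifier.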


Here is the Answer to your Question:

A simpler approach is to collect the matching link elements with findElements() and store their title and href attributes in Java collections. The following code prints the link of each image together with its artist:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

public class Q45106505_REGEX 
{

    public static void main(String[] args) 
    {


        System.setProperty("webdriver.gecko.driver", "C:\\Utility\\BrowserDrivers\\geckodriver.exe");
        WebDriver driver = new FirefoxDriver();

        driver.manage().timeouts().implicitlyWait(5,TimeUnit.SECONDS);     
        driver.manage().window().maximize();
        driver.get("http://unsplash.com/");
        driver.findElement(By.xpath("//button[@class='_2OLVr _21rCr']/*[name()='svg' and @class='_32SMR']")).click();
        List<WebElement> elem_list = driver.findElements(By.xpath("//div[@id='app']//div[@id='gridSingle']/div[@class='y5w1y' and @data-test='photo-component']//a[contains(@href,'/?photo=')]"));
        List<String> title_list = new ArrayList<String>();
        List<String> href_list = new ArrayList<String>();
        for (WebElement we:elem_list)
        {
            String my_title = we.getAttribute("title");
            title_list.add(my_title);
            String my_href = we.getAttribute("href");
            href_list.add(my_href);
        }

        for(int i=0; i<title_list.size(); i++)
        {
            System.out.println(title_list.get(i)+" at : "+href_list.get(i));
        }

        driver.quit(); // close the browser and end the WebDriver session
    }

}

The Output on the console is as follows:

View the photo By timothy muza at : https://unsplash.com/?photo=6VjPmyMj5KM
View the photo By Stephanie McCabe at : https://unsplash.com/?photo=_Ajm-ewEC24
View the photo By John Moore at : https://unsplash.com/?photo=Fdhyrhb9x7o
View the photo By Jason Blackeye at : https://unsplash.com/?photo=KUgDg__TMGk
View the photo By Mahkeo at : https://unsplash.com/?photo=m76_jjV-rRI
View the photo By Samara Doole at : https://unsplash.com/?photo=5VuLCwvZCQU
View the photo By Craig  Whitehead at : https://unsplash.com/?photo=2pdDHpqbKr8
View the photo By Chris Marquardt at : https://unsplash.com/?photo=5KmkrOjOBrE
View the photo By Annie Spratt at : https://unsplash.com/?photo=MN31CWOoEmc
View the photo By Alexandra Kusper at : https://unsplash.com/?photo=T8kr3JLALFU
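
If the goal is to download the photos afterwards, the ID can be pulled out of each collected href without a regex. This is only a sketch using java.net.URI, assuming hrefs of the "?photo=<id>" form shown in the output above. PhotoIdExtractor and photoId are names introduced here for illustration, and URI.create throws IllegalArgumentException for strings that are not valid URIs.

```java
import java.net.URI;
import java.util.Optional;

public class PhotoIdExtractor {

    private static final String KEY = "photo=";

    // Returns the value of the "photo" query parameter, or an empty Optional if it is absent.
    public static Optional<String> photoId(String href) {
        String query = URI.create(href).getQuery();
        if (query == null) {
            return Optional.empty(); // URL has no query string at all
        }
        for (String pair : query.split("&")) {
            if (pair.startsWith(KEY) && pair.length() > KEY.length()) {
                return Optional.of(pair.substring(KEY.length()));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(photoId("https://unsplash.com/?photo=6VjPmyMj5KM").orElse("none")); // 6VjPmyMj5KM
        System.out.println(photoId("https://unsplash.com/").orElse("none"));                    // none
    }
}
```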

Let me know if this Answers your Question.

1 Comment

@mythreya Glad to be able to help you. Thanks
