
Okay, so here's the thing: you are all probably thinking the same thing, namely that I can just use

driver.getPageSource();

And this is partially true. The only issue is that the source comes back in a rather strange form, where all through the code

\"

starts showing up. I tried removing this manually, but that still doesn't fix the problem completely.

One example of what I mean:

normal source code:

\"query_title\":null}",encoded_title:"WyJoZW5rIl0",ref:"unknown",logger_source:"www_main",typeahead_sid:"",tl_log:false,impression_id:"bbdb1882",filter_ids:

Selenium output:

\\\"query_title\\\":null}\",\"encoded_title\":\"WyJoZW5rIl0\",\"ref\":\"br_tf\",\"logger_source\":\"www_main\",\"typeahead_sid\":\"0.6583900225217523\",\"tl_log\":false,\"impression_id\":\"e00060b4\",\"filter_ids\"

It seems to be the same kind of thing as having to put a backslash in front of certain symbols inside quotes to stop Java from treating them as special, but I don't fully understand this behaviour and have no idea how to fix it... hope you can help :)
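For illustration only, here is a minimal sketch of the kind of escaping being described. The helper `escapeOnce` is hypothetical (it is not a Selenium method); it assumes that one level of escaping means putting a backslash in front of every backslash and every double quote. Under that assumption it turns the quoted-key form into the form seen in the Selenium output.

```java
public class EscapeDemo {

    // Hypothetical helper: adds one level of escaping by prefixing
    // every backslash and every double quote with a backslash.
    public static String escapeOnce(String s) {
        return s.replace("\\", "\\\\")   // \  becomes  \\
                .replace("\"", "\\\"");  // "  becomes  \"
    }

    public static void main(String[] args) {
        // String content: }","encoded_title":
        String plain = "}\",\"encoded_title\":";
        System.out.println(plain);
        // prints: }\",\"encoded_title\":
        System.out.println(escapeOnce(plain));
    }
}
```

Note that the Java source needs its own layer of backslashes on top of the string content, which is why the literals look heavier than what gets printed.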

edit: replacing doesn't work because of the way this got compiled. An example of why it won't work is actually in the example I included earlier:

original:

}",encoded_title:

compiled version:

}\",\"encoded_title\":

Replacing \" with " would change it into:

}","encoded_title":

which differs from the original...

And if I were to replace \" with nothing, I would get:

},encoded_title:

which, sadly, still differs from the original. The way this is compiled I just don't think replacing is a viable option...
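To make the point concrete, here is a rough sketch that undoes one level of escaping in a single left-to-right pass instead of using chained replacements. The helper `unescapeOnce` is hypothetical and assumes every backslash simply escapes the character after it. It restores the `\"query_title\"` part, but the keys still come out quoted, so the difference from the original is apparently not only a matter of backslashes.

```java
public class UnescapeOnce {

    // Hypothetical helper: removes one level of backslash escaping.
    // Each backslash is dropped and the character after it is kept as-is.
    public static String unescapeOnce(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                out.append(s.charAt(++i)); // keep the escaped character
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // String content: \\\"query_title\\\":null}\",\"encoded_title\":
        String seleniumOutput =
                "\\\\\\\"query_title\\\\\\\":null}\\\",\\\"encoded_title\\\":";
        // prints: \"query_title\":null}","encoded_title":
        System.out.println(unescapeOnce(seleniumOutput));
    }
}
```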

2 Answers


You can use JavaScript to get the HTML via outerHTML or innerHTML (see "How do I get the HTML source from the page?"):

((JavascriptExecutor) driver).executeScript("return document.documentElement.outerHTML;")
((JavascriptExecutor) driver).executeScript("return document.all[0].outerHTML")
((JavascriptExecutor) driver).executeScript("return new XMLSerializer().serializeToString(document);")

10 Comments

I'm sorry for misspeaking earlier. I experimented a bit more, and this actually doesn't work... It still gives me the same junk in between...
Try @Amit replace solution
I edited the question, to explain why that won't work
Did you try driver.getPageSource().replaceAll("\\\\\"", ""); ?
The part you shared is json? Can you share full html?

You can use the replaceAll method of Java's String class to replace the unwanted characters with the ones you want.

Old solution:

 driver.getPageSource().replaceAll("\\\\\"", "\"").replaceAll("\\\\", "");

New approximate solution, since the page source can contain anything in HTML:

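public class CheckString {

    // Sample from the Selenium output in the question, written as a Java string literal.
    static String str = "\\\\\\\"query_title\\\\\\\":null}\\\",\\\"encoded_title\\\":\\\"WyJoZW5rIl0\\\",\\\"ref\\\":\\\"br_tf\\\",\\\"logger_source\\\":\\\"www_main\\\",\\\"typeahead_sid\\\":\\\"0.6583900225217523\\\",\\\"tl_log\\\":false,\\\"impression_id\\\":\\\"e00060b4\\\",\\\"filter_ids\\\"";

    public static void main(String[] args) {
        // replaceAll takes a regex, so one literal backslash is written as \\\\ in Java source.
        System.out.println(str.replaceAll("\\\\\",", "\",")        // \",  becomes  ",
                              .replaceAll(":\\\\\"", ":\"")        // :\"  becomes  :"
                              .replaceAll("\\\\\"", "")            // drop every remaining \"
                              .replaceAll("\\\\\\\\", "\\\\\""));  // \\  becomes  \"
    }

}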

Output:

\"query_title\":null}",encoded_title:"WyJoZW5rIl0",ref:"br_tf",logger_source:"www_main",typeahead_sid:"0.6583900225217523",tl_log:false,impression_id:"e00060b4",filter_ids

Note: in the earlier approach I forgot to escape the backslash character, which has to be escaped both in the Java string literal and in the regex that replaceAll takes.
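As a side sketch, the same chain can be written with `String.replace`, which treats its arguments as literal text rather than as a regex, so only the Java string-literal layer of escaping is needed. The class and method names here are made up for illustration, and the replacements are still tailored to the sample string from the question rather than to a full page source.

```java
public class LiteralReplace {

    // Same four steps as the replaceAll chain, but with literal matching.
    public static String clean(String s) {
        return s.replace("\\\",", "\",")   // \",  becomes  ",
                .replace(":\\\"", ":\"")   // :\"  becomes  :"
                .replace("\\\"", "")       // drop every remaining \"
                .replace("\\\\", "\\\"");  // \\  becomes  \"
    }

    public static void main(String[] args) {
        // Shortened sample; content starts with \\\"query_title\\\":null}\",
        String sample = "\\\\\\\"query_title\\\\\\\":null}\\\",\\\"encoded_title\\\":"
                + "\\\"WyJoZW5rIl0\\\",\\\"ref\\\":\\\"br_tf\\\",\\\"tl_log\\\":false";
        // prints: \"query_title\":null}",encoded_title:"WyJoZW5rIl0",ref:"br_tf",tl_log:false
        System.out.println(clean(sample));
    }
}
```

The order of the steps matters: the two context-specific replacements have to run before the blanket removal of `\"`, otherwise the quotes around the values are lost as well.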

5 Comments

Thanks for your contribution, I will try this option now!
I edited the question, to explain why your answer won't work
Solution is not recommended for full page source but this works for the string you have given. Thanks.
Sadly, that will not suffice. I need loads of different parts of the source code, so I require a fix to rectify the entire source code.
I don't think there will be a fix for the source itself, because in client-side HTML the reserved HTML characters used in the page are replaced in the way they appear in the getPageSource output, e.g. < and > are replaced with &lt; and &gt;. Please share whatever solution you find for this. Thanks
