
In an HTML page, I want to read the value of a JavaScript variable.
Below is a snippet of the HTML page:

<input id="hidval" value="" type="hidden"> 
<form method="post" style="padding: 0px;margin: 0px;" name="profile" autocomplete="off">
<input name="pqRjnA" id="pqRjnA" value="" type="hidden">
<script type="text/javascript">
    key="pqRjnA";
</script>

My aim is to read the value of the variable key from this page using jsoup.
Is this possible with jsoup? If so, how?

  • You'd have to get the script content, then either parse it manually or see if you could use Rhino to get context out of an executed JS fragment. Commented Feb 15, 2013 at 23:09
  • @Reimeus: no. The initialization can be done somewhere else; here a value is only being assigned to the variable key. Commented Feb 15, 2013 at 23:19
  • Added the kotlin tag because a similar Kotlin question is marked as a duplicate and is linked to this question. Commented Nov 14, 2021 at 15:24

2 Answers


Since jsoup isn't a JavaScript library, you have two ways to solve this:

A. Use a JavaScript library

  • Pro:

    • Full JavaScript support
  • Con:

    • Additional library / dependencies

B. Use Jsoup + manual parsing

  • Pro:

    • No extra libraries required
    • Enough for simple tasks
  • Con:

    • Not as flexible as a JavaScript library

Here's an example of how to get the key with jsoup and some "manual" code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

Document doc = ...
Element script = doc.select("script").first(); // Get the first <script> element

Pattern p = Pattern.compile("(?is)key=\"(.+?)\""); // Regex for the value of the key
Matcher m = p.matcher(script.html()); // Use html() here and NOT text(): text() is empty for script content

while( m.find() )
{
    System.out.println(m.group()); // the whole match (key="value")
    System.out.println(m.group(1)); // value only
}

Output (using your HTML snippet):

key="pqRjnA"
pqRjnA
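
The pattern above expects exactly key="..." with no spaces and double quotes. If the page might instead contain key = 'pqRjnA'; a slightly more tolerant pattern can help. Below is a minimal sketch under that assumption; it works on the script source as a plain string (in practice, whatever script.html() returns), and the class and method names (KeyExtractor, extractKey) are made up for illustration:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyExtractor {
    // \bkey      : "key" as a whole word, so "monkey" does not match
    // \s*=\s*    : optional whitespace around '='
    // (["'])     : opening quote, captured so that \1 requires the same closing quote
    // (.*?)      : the value, as short as possible
    private static final Pattern KEY =
            Pattern.compile("\\bkey\\s*=\\s*([\"'])(.*?)\\1", Pattern.DOTALL);

    // Returns the first value assigned to "key", or empty if there is none.
    static Optional<String> extractKey(String scriptSource) {
        Matcher m = KEY.matcher(scriptSource);
        return m.find() ? Optional.of(m.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        String script = "\n    key=\"pqRjnA\";\n"; // same content as the question's script
        System.out.println(extractKey(script).orElse("<not found>"));
    }
}
```

This is still plain text matching, not JavaScript parsing: it would also match an assignment such as obj.key = "..." or one inside a JS comment, so it is only suitable for simple pages like the one in the question.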

4 Comments

Hey, jsoup + manual parsing is a very good solution for this, but it breaks when the JS variable is an array, e.g. keyArray = [1, 2, 3]. Can you please give me a solution for this?
You can use this regex instead: (?s)(keyArray)\\s??=\\s??\\[(.*?)\\]. It defines two groups: group 1 = variable name, group 2 = value (the part within [ ]).
And what if I have something like abc.xyz.init({requiredJsonObjectAsAnArgument}); inside script tags and I want to parse only requiredJsonObjectAsAnArgument? Can you suggest an applicable regex for this case?
Please try (?s)\\.init\\(\\{(.+?)\\}\\); - group #1 contains the requiredJsonObjectAsAnArgument.
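
For reference, the two regexes from the comments above can be tried out with a small sketch like the following. The patterns are copied as given; the class name, method names and sample inputs are hypothetical, chosen only to illustrate which group holds what:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentRegexDemo {
    // Regex from the comment about arrays: group 1 = variable name, group 2 = text inside [ ]
    static final Pattern ARRAY = Pattern.compile("(?s)(keyArray)\\s??=\\s??\\[(.*?)\\]");
    // Regex from the comment about init({...}): group 1 = text inside { }
    static final Pattern INIT_ARG = Pattern.compile("(?s)\\.init\\(\\{(.+?)\\}\\);");

    // Returns the array body (without the brackets), if present.
    static Optional<String> arrayBody(String js) {
        Matcher m = ARRAY.matcher(js);
        return m.find() ? Optional.of(m.group(2)) : Optional.empty();
    }

    // Returns the object body (without the outer braces), if present.
    static Optional<String> initArgument(String js) {
        Matcher m = INIT_ARG.matcher(js);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical sample inputs
        System.out.println(arrayBody("keyArray = [1, 2, 3];").orElse("<none>"));
        System.out.println(initArgument("abc.xyz.init({\"a\": 1, \"b\": \"two\"});").orElse("<none>"));
    }
}
```

Two caveats: \\s?? allows at most one whitespace character on each side of the = sign, and the init pattern stops at the first }); it sees, so an argument that itself contains that sequence would be cut short. The captured group also excludes the outer braces, so they have to be added back before handing the text to a JSON parser.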

The Kotlin question is marked as duplicate and is directed to this question.
So, here is how I did that with Kotlin:

val (key, value) = document
    .select("script")
    .map(Element::data)
    .first { "key" in it } // OR single { "key" in it }
    .split("=")
    .map(String::trim)
val pureValue = value.replace(Regex("""["';]"""), "")
println("$key::$pureValue") // key::pqRjnA

Another version:

val (key, value) = document
    .select("script")
    .first { Regex("""key\s*=\s*["'].*["'];""") in it.data() }
    .data()
    .split("=")
    .map { it.replace(Regex("""[\s"';]"""), "") }
println("$key::$value") // key::pqRjnA

Footnote

To get the document you can do this:

  • From a file:
    val input = File("my-document.html")
    val document = Jsoup.parse(input, "UTF-8")
    
  • From a server:
    val document = Jsoup.connect("the/target/url")
        .userAgent("Mozilla")
        .get()
    
