I am trying to extract integers and variable values defined in JavaScript in an HTML file using Python 3's re.findall method.
However, I am having difficulty matching digits enclosed in double quotes (") with \d*, and matching an alphanumeric string enclosed in double quotes too.
Case 1:
s = """
<script>
var i = 1636592595;
var j = i + Number("6876" + "52907");
</script>
"""
pattern = r'var j = i + Number(\"(\d*)\" + \"(\d*)\");'
m = re.findall(pattern, s)
print(m) # Output: []
The desired output should contain 6876 and 52907, but an empty list [] was obtained.
Case 2:
s = """
xhr.send(JSON.stringify({
"bm-foo": "AAQAAAAE/////4ytkgqq/oWI",
"pow": j
}));
"""
pattern = r'"bm-foo": \"(\w*)\",'
m = re.findall(pattern, s)
print(m) # Output: []
The desired output should contain AAQAAAAE/////4ytkgqq/oWI, but an empty list [] was obtained.
Can someone explain why my regex patterns are not matching?
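You need to escape the (, ) and + characters, because they are regex metacharacters: unescaped parentheses create a group and + is a quantifier. You don't need to escape " characters. \w only matches letters, digits and _, so it won't match the / characters in the second example.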
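As a minimal sketch of those fixes applied to the two cases from the question: the variable names s1, s2, pattern1, pattern2, m1 and m2 are made up here for illustration, and the character class [\w/] is just one possible choice (a negated class such as [^"]* would also work if the value may contain other symbols).

```python
import re

# Case 1: escape the literal (, ) and + so they are not treated as
# a group and a quantifier. The double quotes need no escaping.
s1 = """
<script>
var i = 1636592595;
var j = i + Number("6876" + "52907");
</script>
"""
pattern1 = r'var j = i \+ Number\("(\d*)" \+ "(\d*)"\);'
m1 = re.findall(pattern1, s1)
print(m1)  # [('6876', '52907')]

# Case 2: \w alone does not cover '/', so allow it explicitly
# in a character class.
s2 = """
xhr.send(JSON.stringify({
"bm-foo": "AAQAAAAE/////4ytkgqq/oWI",
"pow": j
}));
"""
pattern2 = r'"bm-foo": "([\w/]*)",'
m2 = re.findall(pattern2, s2)
print(m2)  # ['AAQAAAAE/////4ytkgqq/oWI']
```

Note that with two capture groups, re.findall returns a list of tuples, so the two numbers in Case 1 come back as one tuple rather than as two separate list items.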