
I want to parse a string, such as:

package: name='jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'
uses-permission:'android.permission.WRITE_APN_SETTINGS'
uses-permission:'android.permission.RECEIVE_BOOT_COMPLETED'
uses-permission:'android.permission.ACCESS_NETWORK_STATE'

I want to get:

string1: jp.tjkapp.droid1lwp

string2: 1.1

Because there are multiple uses-permission lines, I want to get the permissions as a list containing WRITE_APN_SETTINGS, RECEIVE_BOOT_COMPLETED and ACCESS_NETWORK_STATE.

Could you help me write Python regular expressions to get the strings I want? Thanks.

  • Is that entire code block one giant string? Commented Oct 16, 2012 at 6:03
  • Did you write any Regex already? Commented Oct 16, 2012 at 6:04
  • You could consider it as a giant string in a text file, but you can retrieve line-by-line from the file. Commented Oct 16, 2012 at 6:04
  • @shiplu.mokadd.im wrote some re already, but need some suggestions Commented Oct 16, 2012 at 6:06

3 Answers

Assuming the code block you provided is one long string, here stored in a variable called input_string:

import re

name = re.search(r"(?<=name\=\')[\w\.]+?(?=\')", input_string).group(0)
versionName = re.search(r"(?<=versionName\=\')\d+?\.\d+?(?=\')", input_string).group(0)
permissions = re.findall(r'(?<=android\.permission\.)[A-Z_]+(?=\')', input_string)

Explanation:

name

  • (?<=name\=\'): a positive lookbehind. It looks just before the main string so that only strings preceded by name=' are returned. The \ in front of = and ' escapes them so that they are read as literal characters (the escapes are not strictly required here, but they do no harm). name=' is not returned as part of the result; we just know that every result we get was preceded by it.
  • [\w\.]+?: This is the main string we're searching for. \w means any alphanumeric character or underscore. \. is an escaped period, so the regex knows we mean a literal . and not the regex command represented by an unescaped period. Putting these in [] means we accept any one of the characters listed in the brackets, so we're saying that we'll accept any alphanumeric character, _, or a period. The + afterwards means at least one of the previous thing, meaning at least one (but possibly more) of [\w\.]. Finally, the ? means don't be greedy: we're telling the regex to get the smallest possible group that meets these specifications, since + could otherwise go on for an unlimited number of repeats of anything matched by [\w\.].
  • (?=\'): a positive lookahead. It looks just after the main string so that only strings followed by ' are returned. The \ is also an escape, since otherwise the regex or Python's string parsing might misinterpret '. This final ' is not returned with our results; we just know that in the original string, it followed any result we do end up getting.
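As a quick check, the three patterns from this answer can be run against the sample input from the question. This is only a minimal sketch: input_string is built by hand here and is assumed to match what the asker's file contains, and the unnecessary escapes in front of = and ' are dropped.

```python
import re

# Sample input copied from the question, joined into one string.
input_string = (
    "package: name='jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'\n"
    "uses-permission:'android.permission.WRITE_APN_SETTINGS'\n"
    "uses-permission:'android.permission.RECEIVE_BOOT_COMPLETED'\n"
    "uses-permission:'android.permission.ACCESS_NETWORK_STATE'\n"
)

# Lookbehind anchors the match after name=' and lookahead stops it before '.
name = re.search(r"(?<=name=')[\w.]+?(?=')", input_string).group(0)
version_name = re.search(r"(?<=versionName=')\d+?\.\d+?(?=')", input_string).group(0)
# findall returns every permission name that follows "android.permission."
permissions = re.findall(r"(?<=android\.permission\.)[A-Z_]+(?=')", input_string)

print(name)
print(version_name)
print(permissions)
```

Note that the match is case-sensitive, so the name=' lookbehind does not fire on versionName='.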

2 Comments

Could you tell me what (?<= and (?=\') mean?
You should really read the re documentation. (?<=...) is a positive lookbehind assertion: it checks whether the string you're searching for is preceded by another string, but doesn't return that string as part of the match. (?=...) is a positive lookahead assertion: it ensures that you get back only strings that have the specified string after them, but also doesn't return the lookahead string as part of the actual result.
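To make the difference concrete, here is a small sketch that contrasts lookarounds with a plain capturing group on the package line from the question. Both approaches give the same value; the variable names are only illustrative.

```python
import re

line = "package: name='jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'"

# Lookbehind + lookahead: name=' and the closing ' are checked,
# but they are not part of the returned match.
with_lookaround = re.search(r"(?<=name=')[^']+(?=')", line).group(0)

# Capturing group: the overall match includes name='...',
# while group(1) holds only the value between the quotes.
match = re.search(r"name='([^']+)'", line)
whole_match = match.group(0)
with_group = match.group(1)

print(with_lookaround)
print(whole_match)
print(with_group)
```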

You can do this without regex by reading the file content line by line.

>>> def split_string(s):
...     if s.startswith('package'):
...         # Take the value of every key='value' token, without the quotes.
...         return [i.split('=')[1].strip("'") for i in s.split() if "=" in i]
...     elif s.startswith('uses-permission'):
...         # Keep the part after the last dot, without the closing quote.
...         return s.split('.')[-1].strip("'")
... 
>>> split_string("package: name='jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'")
['jp.tjkapp.droid1lwp', '2', '1.1']
>>> split_string("uses-permission:'android.permission.WRITE_APN_SETTINGS'")
'WRITE_APN_SETTINGS'
>>> split_string("uses-permission:'android.permission.RECEIVE_BOOT_COMPLETED'")
'RECEIVE_BOOT_COMPLETED'
>>> split_string("uses-permission:'android.permission.ACCESS_NETWORK_STATE'")
'ACCESS_NETWORK_STATE'


Here is some example code:

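#!/usr/bin/env python
# Read test.txt and pull the package name and version name from the "package" line.
with open("test.txt", "r") as input_file:
    for line in input_file:
        if line.startswith("package"):
            words = line.split()
            # words[1] is name='...', words[3] is versionName='...'
            string1 = words[1].split("=")[1].replace("'", "")
            string2 = words[3].split("=")[1].replace("'", "")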

The test.txt file contains the input data you mentioned earlier.
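The question also asks for the permissions as a list, so here is a rough sketch of how the same line-by-line idea could be extended. The function name parse_lines is hypothetical, and a list of strings stands in for the lines of the file.

```python
def parse_lines(lines):
    """Return (package name, version name, list of permissions)."""
    name = version_name = None
    permissions = []
    for line in lines:
        line = line.strip()
        if line.startswith("package"):
            # Turn the key='value' tokens into a dict keyed by name.
            fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
            name = fields["name"].strip("'")
            version_name = fields["versionName"].strip("'")
        elif line.startswith("uses-permission"):
            # Keep only the part after the last dot, without the closing quote.
            permissions.append(line.rsplit(".", 1)[-1].strip("'"))
    return name, version_name, permissions


# Sample lines copied from the question; a real file object works the same way.
sample = [
    "package: name='jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'\n",
    "uses-permission:'android.permission.WRITE_APN_SETTINGS'\n",
    "uses-permission:'android.permission.RECEIVE_BOOT_COMPLETED'\n",
    "uses-permission:'android.permission.ACCESS_NETWORK_STATE'\n",
]
string1, string2, permission_list = parse_lines(sample)
print(string1, string2, permission_list)
```

Looking fields up by key rather than by position means the code does not break if the order of the attributes on the package line changes.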
