I need a way for my Java app to do some regex-based string analysis and replacement. Each replacement is a rule, and the app should be capable of reading a file containing these rules. This will allow users to download sets of rules, and the development of them will be sped up considerably this way, since the app does not need to be recompiled for each new or changed rule.

Here are some example rules, which are currently executed server-side in Python:

    # ---------- Copy ----------
    title = item['title']
    uri = item['action']['uri']

    # ---------- Spiegel Online ----------
    title = title.replace(" - SPIEGEL ONLINE - Nachrichten", "").replace(" - SPIEGEL ONLINE", "")

    if domain == "m.spiegel.de":
      uri = "http://www.spiegel.de" + uri[19:]

    if domain == "spon.de":
      r = requests.head(uri) # <----- resolve the url
      try:    uri = r.headers['location']
      except: traceback.print_exc()

    # ---------- Stack Overflow ----------
    if title.endswith(" - Stack Overflow"):
      title = title[:-17]

    # ---------- Google Play ----------
    if uri.startswith("https://play.google.com"):
      match = re.search(u'^Das könnte interessant sein: "(.+)"$', title, re.DOTALL)
      if match:
        title = match.group(1)

    # ---------- Prime Guide TV ----------
    if "@PrimeGuideTV" in uri:
      uri_segments = uri.split("\n")
      when = uri_segments[1].split(", ")
      when_times = when[1].split(" - ")
      dtfrom = datetime.datetime.strptime(when[0]+when_times[0], "%d.%m.%y%H:%M")
      dtto   = datetime.datetime.strptime(when[0]+when_times[1], "%d.%m.%y%H:%M")
      title += " -- " + dtfrom.strftime("%H:%M -- %a %d %b") + " -- " + when[2].strip()# + " -- " + str(int((dtto - dtfrom).total_seconds() / 60)) + "min" + " -- " + uri_segments[1]
      uri = uri_segments[2]

    # ---------- Wikipedia, enforce https and demobilize ----------
    if " - Wikipedia, " in title:
      title = title[:title.find(" - Wikipedia, ")]
      uri = re.sub(r"https?://(en\.)(?:m\.)?(wikipedia\.org/.+)", r"https://\1\2", uri, 0, re.DOTALL)

    # ---------- YouTube ----------
    if domain == "youtu.be":
      r = requests.head(uri) # <----- resolve the url
      try:    uri = r.headers['location'].replace('&feature=youtu.be', '')
      except: traceback.print_exc()
    match = re.search(u'^Schau dir "(.+)" auf YouTube an$', title, re.DOTALL)
    if match:
      title = match.group(1)

    # ---------- Update ----------
    item['title'] = title
    item['action']['uri'] = uri
    #print '--', title.encode('utf-8'), '--', uri

Considering that the requirements for title and URI parsing will change rapidly, I think it's best to offload the entire task to an interpreter instead of trying to find some way to express it in Java. Doing what is done above for Prime Guide TV via some flexible Java code could be too hard.

I thought about using a WebView and pushing the rules as JavaScript, together with the text, into the WebView, so that they can work on that text, and then retrieving the result. No GUI is required, and in some cases the Activity will use Theme.NoDisplay; I don't know whether that will cause trouble.

I've read a bit about Rhino, which may be an option, but I don't know whether its overhead is too big.

Is there a better way to accomplish this? Is it worth trying to access the internal v8 engine, as I've read in some posts, or will this be a problem regarding compatibility?

  • You can save the rules in a JSON file and process it in the Java app. Commented May 15, 2015 at 12:37
  • @jcubic Yes, I was going for JSON already, but the Java-side processing may be too inflexible. I added some example rules so that you can see what I mean. I have it in Python, and JavaScript would be just as easy, but in Java, I think that's wasted time and effort. Commented May 15, 2015 at 14:05
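
For reference, the data-driven approach discussed in these comments could be sketched in plain Java roughly as follows. This is only a minimal illustration under stated assumptions: the class `RegexRules`, its `Rule` type, and the tab-separated "regex, then replacement" line format are all hypothetical and not from the question (a JSON file would work the same way once parsed). It only covers the pure regex rules, such as the Stack Overflow, YouTube title, and Wikipedia ones. Unlike the Python original, it applies the Wikipedia URI rewrite unconditionally rather than only when the title matches. Rules that need an HTTP request or date parsing (spon.de, Prime Guide TV) do not fit this format, which is the inflexibility mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical data-driven rule engine; none of these names come from the question. */
final class RegexRules {

    /** One rule: every match of the pattern is replaced by the replacement string. */
    static final class Rule {
        final Pattern pattern;
        final String replacement;

        Rule(String regex, String replacement) {
            // DOTALL mirrors the re.DOTALL flag used in the Python rules.
            this.pattern = Pattern.compile(regex, Pattern.DOTALL);
            this.replacement = replacement;
        }

        String apply(String input) {
            return pattern.matcher(input).replaceAll(replacement);
        }
    }

    /**
     * Parses lines of the assumed form "regex<TAB>replacement".
     * Blank lines and lines starting with '#' are skipped;
     * a line without a tab means "replace with the empty string".
     */
    static List<Rule> parse(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] parts = line.split("\t", 2);
            rules.add(new Rule(parts[0], parts.length > 1 ? parts[1] : ""));
        }
        return rules;
    }

    /** Applies the rules in file order, feeding each result into the next rule. */
    static String applyAll(List<Rule> rules, String input) {
        String result = input;
        for (Rule rule : rules) {
            result = rule.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        // In a real app these lines would be read from a downloaded rule file.
        List<Rule> titleRules = parse(Arrays.asList(
                "# title rules",
                " - Stack Overflow$",
                "^Schau dir \"(.+)\" auf YouTube an$\t$1",
                " - Wikipedia, .*$"));
        List<Rule> uriRules = parse(Arrays.asList(
                "https?://(en\\.)(?:m\\.)?(wikipedia\\.org/.+)\thttps://$1$2"));

        System.out.println(applyAll(titleRules, "Some question - Stack Overflow"));
        System.out.println(applyAll(uriRules, "http://en.m.wikipedia.org/wiki/Regex"));
    }
}
```

Because the rules are plain data, new or changed rules can be shipped without recompiling the app, but anything beyond search-and-replace would still require an embedded interpreter.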

1 Answer

Basically, you have the following options:

  1. Use an invisible WebView. Pros: probably the easiest approach to start with. You can use injected Java objects (via WebView.addJavascriptInterface) for JS <-> Java interaction. As the JS is executed on V8, it runs really fast. Cons: high memory cost (WebView is a full-blown browser engine); also, the JS <-> Java bridge on Android KitKat+ has significant overhead if you need to perform thousands of calls per second.

  2. Run on the Java VM. You can run either JavaScript or Python on the Java VM. Pros: no extra native libraries needed, JS / Python <-> Java interaction is trivially simple, and you basically have full access to Java classes from your JS code. Cons: JS / Python execution will definitely be slower than on a native engine, so if you need pure performance, this isn't the way to go.

  3. Package V8 yourself. Unfortunately, it's not currently possible to re-use V8 from WebView without gross and fragile hacks, so you would instead need to package it as a native library and distribute it with your APK (and deal with both 32-bit and 64-bit devices). You will also need to implement your own (or re-use somebody else's) JS <-> Java bindings. This is a lot of work but feasible. Pros: speed! Cons: technically challenging; also, there is no good JavaScript debugger for a standalone V8, because WebView DevTools remote debugging is implemented in the rendering engine (called Blink).

3 Comments

Thanks, the second approach sounds interesting, as the parsing/modifying of the text occurs rarely and usually upon request of the user. I'd like to try that. Are there any well-established libraries you know about? I think I prefer JavaScript.
I would take a look at DynJS. Rhino is probably outdated now, as Oracle is replacing it with Nashorn.
OK, thanks. I couldn't find a tutorial for Android with a quick search. I'll use a WebView, and when I start recognizing patterns in the "filters" I'll parse and process them directly in Java. Maybe the WebView will have the additional benefit of allowing me to add an HTML5-based editor / debugger for the rules.
