I need a way for my Java app to do regex-based string analysis and replacement. Each replacement is a rule, and the app should be able to read these rules from a file. This would let users download rule sets, and it would speed up rule development considerably, since the app would not need to be recompiled for every new or changed rule.
Here are some example rules, which are currently executed server-side in Python:
# ---------- Copy ----------
title = item['title']
uri = item['action']['uri']
# ---------- Spiegel Online ----------
title = title.replace(" - SPIEGEL ONLINE - Nachrichten", "").replace(" - SPIEGEL ONLINE", "")
if domain == "m.spiegel.de":
    uri = "http://www.spiegel.de" + uri[19:]
if domain == "spon.de":
    r = requests.head(uri) # <----- resolve the url
    try: uri = r.headers['location']
    except: traceback.print_exc()
# ---------- Stack Overflow ----------
if title.endswith(" - Stack Overflow"):
    title = title[:-17]
# ---------- Google Play ----------
if uri.startswith("https://play.google.com"):
    match = re.search(u'^Das könnte interessant sein: "(.+)"$', title, re.DOTALL)
    if match:
        title = match.group(1)
# ---------- Prime Guide TV ----------
if "@PrimeGuideTV" in uri:
    uri_segments = uri.split("\n")
    when = uri_segments[1].split(", ")
    when_times = when[1].split(" - ")
    dtfrom = datetime.datetime.strptime(when[0]+when_times[0], "%d.%m.%y%H:%M")
    dtto = datetime.datetime.strptime(when[0]+when_times[1], "%d.%m.%y%H:%M")
    title += " -- " + dtfrom.strftime("%H:%M -- %a %d %b") + " -- " + when[2].strip()# + " -- " + str(int((dtto - dtfrom).total_seconds() / 60)) + "min" + " -- " + uri_segments[1]
    uri = uri_segments[2]
# ---------- Wikipedia, enforce https and demobilize ----------
if " - Wikipedia, " in title:
    title = title[:title.find(" - Wikipedia, ")]
    uri = re.sub(r"https?://(en\.)(?:m\.)?(wikipedia\.org/.+)", r"https://\1\2", uri, 0, re.DOTALL)
# ---------- YouTube ----------
if domain == "youtu.be":
    r = requests.head(uri) # <----- resolve the url
    try: uri = r.headers['location'].replace('&feature=youtu.be', '')
    except: traceback.print_exc()
    match = re.search(u'^Schau dir "(.+)" auf YouTube an$', title, re.DOTALL)
    if match:
        title = match.group(1)
# ---------- Update ----------
item['title'] = title
item['action']['uri'] = uri
#print '--', title.encode('utf-8'), '--', uri
Since the requirements for title and URI parsing will change rapidly, I think it's best to offload the entire task to an interpreter instead of trying to express it in Java. Reproducing something like the Prime Guide TV rule above with flexible, data-driven Java code could be too hard.
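For comparison, here is a minimal sketch of what the purely declarative alternative might look like in plain Java. It assumes a hypothetical tab-separated rule format with `field`, `pattern` and `replacement` columns; the `RuleEngine` class, its method names and the file format are made up for illustration and are not part of the existing app. It covers simple replacements such as the Stack Overflow and Wikipedia rules, but it has no way to express the HTTP lookup for spon.de or the date handling for Prime Guide TV, which is why an interpreter looks more attractive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical declarative rule engine: one "field<TAB>pattern<TAB>replacement" rule per line. */
public class RuleEngine {

    /** A single compiled rule; the replacement may reference groups as $1, $2, ... */
    static final class Rule {
        final String field;
        final Pattern pattern;
        final String replacement;

        Rule(String field, Pattern pattern, String replacement) {
            this.field = field;
            this.pattern = pattern;
            this.replacement = replacement;
        }
    }

    /** Parses rule text (e.g. the contents of a downloaded rule file). */
    static List<Rule> parse(String ruleText) {
        List<Rule> rules = new ArrayList<>();
        for (String line : ruleText.split("\\r?\\n")) {
            // Skip blank lines and comment lines starting with '#'.
            if (line.trim().isEmpty() || line.startsWith("#")) continue;
            String[] parts = line.split("\t", 3);
            if (parts.length < 2) {
                throw new IllegalArgumentException("Malformed rule: " + line);
            }
            // A missing third column means "replace with nothing".
            String replacement = parts.length == 3 ? parts[2] : "";
            rules.add(new Rule(parts[0], Pattern.compile(parts[1], Pattern.DOTALL), replacement));
        }
        return rules;
    }

    /** Applies, in order, every rule registered for the given field (assumed names: "title", "uri"). */
    static String apply(List<Rule> rules, String field, String input) {
        String result = input;
        for (Rule rule : rules) {
            if (rule.field.equals(field)) {
                result = rule.pattern.matcher(result).replaceAll(rule.replacement);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String ruleText =
              "# field<TAB>pattern<TAB>replacement\n"
            + "title\t - SPIEGEL ONLINE( - Nachrichten)?\t\n"
            + "title\t - Stack Overflow$\t\n"
            + "uri\t^https?://(en\\.)(?:m\\.)?(wikipedia\\.org/.+)$\thttps://$1$2\n";
        List<Rule> rules = parse(ruleText);
        System.out.println(apply(rules, "title", "Regex rules - Stack Overflow"));
        // prints "Regex rules"
        System.out.println(apply(rules, "uri", "http://en.m.wikipedia.org/wiki/Regex"));
        // prints "https://en.wikipedia.org/wiki/Regex"
    }
}
```

Note that Java's `replaceAll` uses `$1` for group references where the Python rules use `\1`, so rule files could not be shared between the two implementations without translation.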
I thought about using a WebView: push the rules as JavaScript into it together with the text, let them work on that text, and then retrieve the result. No GUI is required, and in some cases the Activity will use Theme.NoDisplay; I don't know whether that will cause trouble.
I've read a bit about Rhino, which may be an option, but I don't know whether its overhead is too big.
Is there a better way to accomplish this? Is it worth trying to access the internal V8 engine, as I've read in some posts, or would that cause compatibility problems?