4

I have a log file in the following format that I want to parse:

225:org.powertac.common.Competition::0::new::game-0
287:org.powertac.common.Competition::0::withSimulationBaseTime::1255132800000
288:org.powertac.common.Competition::0::withTimezoneOffset::-6
288:org.powertac.common.Competition::0::withLatitude::45
289:org.powertac.common.Competition::0::withBootstrapTimeslotCount::336
289:org.powertac.common.Competition::0::withBootstrapDiscardedTimeslots::24
290:org.powertac.common.Competition::0::withMinimumTimeslotCount::1400
290:org.powertac.common.Competition::0::withExpectedTimeslotCount::1440
291:org.powertac.common.Competition::0::withTimeslotLength::60
291:org.powertac.common.Competition::0::withSimulationRate::720
292:org.powertac.common.Competition::0::withTimeslotsOpen::24
292:org.powertac.common.Competition::0::withDeactivateTimeslotsAhead::1
300:org.powertac.du.DefaultBrokerService$LocalBroker::1::new::default broker
300:org.powertac.du.DefaultBrokerService$LocalBroker::1::setLocal::true
2074:org.powertac.common.RandomSeed::2::init::CompetitionControlService::0::game-setup::5354386935242895562
2157:org.powertac.common.TimeService::null::setCurrentTime::2009-10-10T00:00:00.000Z
2197:org.powertac.common.RandomSeed::3::init::AccountingService::0::interest::-8975848432442556652
2206:org.powertac.common.RandomSeed::4::init::TariffMarket::0::fees::-6239716112490883981
2213:org.powertac.common.msg.BrokerAccept::null::new::1
2214:org.powertac.common.msg.BrokerAccept::null::new::1::null
2216:org.powertac.common.RandomSeed::5::init::org.powertac.du.DefaultBrokerService::0::pricing::8741252857248937781
2226:org.powertac.common.TariffSpecification::6::new::1::CONSUMPTION
2231:org.powertac.common.Rate::7::new
2231:org.powertac.common.Rate::7::withValue::-0.5
2232:org.powertac.common.Rate::7::setTariffId::6

The pattern is as follows. For a new object:

<id>:<classname>::<order_of_execution>::<new>::<args>

for a method call:

 <id>:<classname>::<order_of_execution>::<method_name>::<args>

for an inner class:

 <id>:<classname$innerclass>::<order_of_execution>::<method_name or new>::<args>

for an init call:

 <id>:<classname>::<order_of_execution>::<init>::<args>

I want a regular expression that handles all of these cases and lets me retrieve each value shown above. When a line creates a new object, I would then use the Reflection API in Java to instantiate it. So, for example:

2231:org.powertac.common.Rate::7::new

would be parsed into "2231", "org.powertac.common.Rate", "7", "new", args = {}. How could I come up with such a regular expression?

3 Answers

2

Use a Matcher with capturing groups:

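import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "225:org.powertac.common.Competition::0::new::game-0";
// Group 3 accepts "null" as well as digits; the trailing ::args part is optional
Pattern p = Pattern.compile("([^:]+):([^:]+)::(\\d+|null)::([^:]+)(?:::(.+))?");
Matcher m = p.matcher(s);
if (m.matches()) {
  String id = m.group(1);
  String className = m.group(2);
  // null when the log records "null" instead of a number
  Integer orderOfExecution = "null".equals(m.group(3)) ? null : Integer.valueOf(m.group(3));
  String methodNameOrNew = m.group(4);
  // group 5 is null when the line carries no arguments
  String[] arguments = m.group(5) == null ? new String[0] : m.group(5).split("::");
}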

Or, more simply, use java.util.Scanner with the delimiter set to the regex ::? (one or two colons):

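import java.util.Scanner;

Scanner scanner = new Scanner(s);
scanner.useDelimiter("::?");
int id = scanner.nextInt();
String className = scanner.next();
// Read as text: some lines record "null" here instead of a number
String orderOfExecution = scanner.next();
String methodNameOrNew = scanner.next();
String[] arguments = new String[0];
// Lines such as "2231:org.powertac.common.Rate::7::new" have no arguments
if (scanner.hasNext()) {
  scanner.useDelimiter("$").skip("::");
  arguments = scanner.next().split("::");
}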
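
As a fuller illustration of the capturing-group approach from the start of this answer, here is a minimal self-contained sketch. It rests on two assumptions taken from the sample lines rather than from any format specification: the third field may be the literal text null, and the argument list may be missing entirely. The class name LogEntry and its members are hypothetical, chosen only for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** One parsed log line. All names here are illustrative. */
final class LogEntry {
    // <id>:<class>::<order or "null">::<method or "new">[::<args>]
    private static final Pattern LINE =
        Pattern.compile("(\\d+):([^:]+)::(\\d+|null)::([^:]+)(?:::(.*))?");

    final long id;
    final String className;
    final Integer order;      // null when the log records "null"
    final String method;      // "new" marks a constructor call
    final List<String> args;  // empty when the line has no arguments

    private LogEntry(long id, String className, Integer order,
                     String method, List<String> args) {
        this.id = id;
        this.className = className;
        this.order = order;
        this.method = method;
        this.args = args;
    }

    /** Returns the parsed entry, or null if the line does not fit the layout. */
    static LogEntry parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        Integer order = "null".equals(m.group(3)) ? null : Integer.valueOf(m.group(3));
        // Arguments are separated by "::"; a single ":" inside one argument is kept.
        List<String> args = m.group(5) == null
            ? Collections.<String>emptyList()
            : Arrays.asList(m.group(5).split("::"));
        return new LogEntry(Long.parseLong(m.group(1)), m.group(2), order, m.group(4), args);
    }

    boolean isConstructorCall() {
        return "new".equals(method);
    }
}
```

Calling LogEntry.parse on each line of the file and checking isConstructorCall() then tells you whether to create an object or invoke a method on an existing one.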

2 Comments

Joao, will your regex handle all the cases?
@philippe: Yes, they all follow the same pattern. You just need to interpret, for example, if group 4 is new then it's a new object, else, it's a method. I've added a much easier way using Scanner instead.
1

Don't try to shove all of this into a single regex. Write one regular expression for each pattern and, for every line, try each one in turn until one matches. Then you can parse the line accordingly.

Pseudocode:

for line in file:
    if re.match(patNew, line):
        parseNew(line)
    elif re.match(patMethod, line):
        parseMethod(line)
    ...

A regex to match <id>:<classname>::<order_of_execution>::<new>::<args> would look something like this:

([0-9]+):(.*?)::([0-9]+|null)::(new)(?:::(.*))?
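
In Java, the one-pattern-per-case idea could be sketched roughly as follows. This is only an illustration: the class name LineDispatcher, the method classify, and the second pattern for method calls are assumptions made for the example, and the third field is allowed to be the literal text null because some sample lines contain it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Tries one pattern per case, in order. All names are illustrative. */
final class LineDispatcher {
    // Constructor calls: the method slot is literally "new"; arguments are optional.
    static final Pattern NEW_OBJECT =
        Pattern.compile("([0-9]+):(.*?)::([0-9]+|null)::new(?:::(.*))?");

    // Any other line in the same layout is treated as a method call (init included).
    static final Pattern METHOD_CALL =
        Pattern.compile("([0-9]+):(.*?)::([0-9]+|null)::([^:]+)(?:::(.*))?");

    /** Returns a short description of the line, or "unrecognized". */
    static String classify(String line) {
        Matcher m = NEW_OBJECT.matcher(line);
        if (m.matches()) {
            return "new " + m.group(2);
        }
        m = METHOD_CALL.matcher(line);
        if (m.matches()) {
            return "call " + m.group(2) + "." + m.group(4);
        }
        return "unrecognized";
    }
}
```

The order of the checks matters here: the method-call pattern would also accept a line whose method slot is new, so the constructor pattern is tried first.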

2 Comments

I can determine each case, but how would I go about creating a regex for this case, for example: <id>:<classname>::<order_of_execution>::<new>::<args>
@philippe I've fixed a minor issue with the regex. Tested and working with regextester.com
-1

As the values are colon-separated and the first four fields cannot contain colons themselves, there is no need for escaping or quoting, so all you need is a simple

(.*):(.*)::(.*)::(.*)::(.*)

If the args are supposed to be optional, use

(.*):(.*)::(.*)::([^:]*)(?:::(.*))?

The values are in groups 1 to 5. For instance, to find out if the log entry is a constructor call, check if group 4 equals "new".
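
Here is a minimal Java sketch of reading the five groups. It deliberately uses a variant of the optional-args pattern with [^:]* in the first four groups instead of .*, because a greedy .* can split at a later separator on lines that carry several ::-separated arguments (the RandomSeed lines in the question, for example). The class name FieldSplitter and the method fields are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Splits a log line into its five fields. All names are illustrative. */
final class FieldSplitter {
    // [^:]* keeps each of the first four groups from running across a separator.
    static final Pattern FIELDS =
        Pattern.compile("([^:]*):([^:]*)::([^:]*)::([^:]*)(?:::(.*))?");

    /**
     * Returns groups 1 to 5 as an array (index 4 is null when the line has
     * no arguments), or null if the line does not match at all.
     */
    static String[] fields(String line) {
        Matcher m = FIELDS.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] out = new String[5];
        for (int i = 0; i < 5; i++) {
            out[i] = m.group(i + 1);
        }
        return out;
    }
}
```

With this variant, "new".equals(fields(line)[3]) identifies a constructor call, and the last element holds the raw argument string, which can be split on :: if individual arguments are needed.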

3 Comments

could you give more details? I like your approach
The colons that follow will prevent the first .* from matching everything. This regex works fine, as you could easily have tested yourself.
Nice trick of removing a wrong comment to make it seem like I replied with nonsense to another question, and at the same time not removing the downvote ;-)
