How to replace a string with a regex in Java using nested groups

Question

I have lines in the following format:

"123","45","{"VFO":[B501], "AGN":[605,B501], "AXP":[665], "QAV":[720,223R,251Q,496M,548A,799M]}","4"

A line can be longer, but it always has the form:

"number","number","someValues","digit"

I need to wrap each value inside someValues in quotes.

For the test string above, the expected result is:

"123","45","{"VFO":["B501"], "AGN":["605","B501"], "AXP":["665"], "QAV":["720","223R","251Q","496M","548A","799M"]}","4"

Please suggest the simplest solution in Java.

P.S.

My variant:

String valuePattern = "\\[(.*?)\\]";
Pattern valueR = Pattern.compile(valuePattern);
Matcher valueM = valueR.matcher(line);
List<String> list = new ArrayList<String>();
while (valueM.find()) {
    list.add(valueM.group(0));
}
String value = "";
for (String element : list) {
    element = element.substring(1, element.length() - 1);
    String[] strings = element.split(",");
    String singleGroup = "[";
    for (String el : strings) {
        singleGroup += "\"" + el + "\",";
    }
    singleGroup = singleGroup.substring(0, singleGroup.length() - 1);
    singleGroup = singleGroup + "]";
    value += singleGroup;
}
System.out.println(value);

Are these JSON objects? Have you looked into implementing a Java JSON library?
– Patrick J Abare II, Commented Jun 23, 2016 at 14:17
@Patrick J Abare II I understand that these rows are not valid JSON. That is the root cause.
– gstackoverflow, Commented Jun 23, 2016 at 14:19
@fartagaintuxedo I am trying to write longer code using String.replace and split, but I am not sure that this solution is the best.
– gstackoverflow, Commented Jun 23, 2016 at 14:23

fartagaintuxedo · Accepted Answer · 2016-06-23 17:22:37Z

1

EDITED

OK, here is the shortest way I found. It works very nicely in my opinion, except that I had to add the comma and the bracket back manually. Somebody might be able to do it in one step, but I found it tricky to handle replacements with nested groups.

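import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteValues {
  public static void main(String[] args) {
    // Group 2: the first value after '['; group 4: each later value after ','
    Pattern p = Pattern.compile("(\\[(\\w+))|(,(\\w+))");
    Matcher m = p.matcher("\"123\",\"45\",\"{\"VFO\":[B501], \"AGN\":[605,B501], \"AXP\":[665], \"QAV\":[720,223R,251Q,496M,548A,799M]}\",\"4\"");
    StringBuffer s = new StringBuffer();
    while (m.find()) {
      if (m.group(2) != null) {
        // Re-add the consumed '[' and wrap the value in quotes
        m.appendReplacement(s, "[\"" + m.group(2) + "\"");
      } else if (m.group(4) != null) {
        // Re-add the consumed ',' and wrap the value in quotes
        m.appendReplacement(s, ",\"" + m.group(4) + "\"");
      }
    }
    m.appendTail(s);
    System.out.println(s);
  }
}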
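
The manual re-adding of the bracket and comma can probably be avoided with lookarounds, which check the delimiters without consuming them. The following is only a sketch under the assumption that values consist of word characters, as in the sample line; the class name BracketValueQuoter and the method quoteValues are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class BracketValueQuoter {
  // A value is a run of word characters directly after '[' or ',' and
  // directly before ',' or ']'. The lookarounds keep the delimiters
  // out of the match, so only the bare value is replaced.
  private static final Pattern VALUE =
      Pattern.compile("(?<=[\\[,])\\w+(?=[,\\]])");

  static String quoteValues(String line) {
    // $0 refers to the whole match, i.e. the unquoted value
    return VALUE.matcher(line).replaceAll("\"$0\"");
  }

  public static void main(String[] args) {
    String line = "\"123\",\"45\",\"{\"VFO\":[B501], \"AGN\":[605,B501]}\",\"4\"";
    System.out.println(quoteValues(line));
  }
}
```

Note that this pattern would also quote an unquoted word that sits between two commas outside the brackets, so it relies on the other fields already being quoted, as they are in the question.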

edited Jun 23, 2016 at 17:22

answered Jun 23, 2016 at 14:50

fartagaintuxedo

4 Comments

fartagaintuxedo Over a year ago

I have tested the regex but not the complete code; it should work, perhaps with some minor corrections...

gstackoverflow Over a year ago

It creates separate entries for multi-valued keys: QAV-720, QAV-223R, etc.

gstackoverflow Over a year ago

It should be an array.

fartagaintuxedo Over a year ago

Check now: the output is the new string with all the values wrapped in quotes.

Daniel Pryden · Accepted Answer · 2016-06-23 15:36:00Z

As I commented above, I think the real solution here is to fix the thing that's generating this malformed output. In the general case I don't believe it's possible to parse correctly: if the strings contain embedded bracket or comma characters then it becomes impossible to determine which parts are which.
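
To make the ambiguity concrete, here is a small sketch. The writeUnquoted method is a hypothetical stand-in for whatever produces the malformed lines; it only assumes that values are joined with commas and never quoted or escaped.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of the original answer.
final class AmbiguityDemo {
  // Mimics the malformed format: values joined by commas inside brackets,
  // with no quoting or escaping.
  static String writeUnquoted(List<String> values) {
    return "[" + String.join(",", values) + "]";
  }

  public static void main(String[] args) {
    List<String> oneValue = Arrays.asList("605,B501");     // one value containing a comma
    List<String> twoValues = Arrays.asList("605", "B501"); // two separate values
    // Both lists serialize to the same text, so a reader cannot tell them apart.
    System.out.println(writeUnquoted(oneValue));
    System.out.println(writeUnquoted(twoValues));
  }
}
```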

You can get pretty close, though, by simply ignoring all quote characters and tokenizing the rest:

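import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public final class AlmostJsonSanitizer {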
  enum TokenType {
    COMMA(','),
    COLON(':'),
    LEFT_SQUARE_BRACKET('['),
    RIGHT_SQUARE_BRACKET(']'),
    LEFT_CURLY_BRACKET('{'),
    RIGHT_CURLY_BRACKET('}'),
    LITERAL(null);

    static Map<Character, TokenType> LOOKUP;
    static {
      Map<Character, TokenType> lookup = new HashMap<Character, TokenType>();
      for (TokenType tokenType : values()) {
        lookup.put(tokenType.ch, tokenType);
      }
      LOOKUP = Collections.unmodifiableMap(lookup);
    }

    private final Character ch;

    private TokenType(Character ch) {
      this.ch = ch;
    }
  }

  static class Token {
    final TokenType type;
    final String string;

    Token(TokenType type, String string) {
      this.type = type;
      this.string = string;
    }
  }

  private static class Tokenizer implements Iterator<Token> {
    private final String buffer;
    private int pos;

    Tokenizer(String buffer) {
      this.buffer = buffer;
      this.pos = 0;
    }

    @Override
    public boolean hasNext() {
      return pos < buffer.length();
    }

    @Override
    public Token next() {
      char ch = buffer.charAt(pos);
      TokenType type = TokenType.LOOKUP.get(ch);
      // If it's in the lookup table, return a token of that type
      if (type != null) {
        pos++;
        return new Token(type, null);
      }
      // Otherwise it's a literal
      StringBuilder sb = new StringBuilder();
      while (pos < buffer.length()) {
        ch = buffer.charAt(pos);
        // If we've reached a different type of token, stop without consuming it
        if (TokenType.LOOKUP.get(ch) != null) {
          break;
        }
        pos++;
        // Skip all quote characters
        if (ch != '"') {
          sb.append(ch);
        }
      }
      return new Token(TokenType.LITERAL, sb.toString());
    }

    @Override
    public void remove() {
      throw new UnsupportedOperationException();
    }
  }

  /** Convenience method to allow using a foreach loop below. */
  static Iterable<Token> tokenize(final String input) {
    return new Iterable<Token>() {
      @Override
      public Iterator<Token> iterator() {
        return new Tokenizer(input);
      }
    };
  }

  public static String sanitize(String input) {
    StringBuilder result = new StringBuilder();
    for (Token token : tokenize(input)) {
      switch (token.type) {
        case COMMA:
          result.append(", ");
          break;

        case COLON:
          result.append(": ");
          break;

        case LEFT_SQUARE_BRACKET:
        case RIGHT_SQUARE_BRACKET:
        case LEFT_CURLY_BRACKET:
        case RIGHT_CURLY_BRACKET:
          result.append(token.type.ch);
          break;

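        case LITERAL:
          // Drop surrounding whitespace, and skip literals that consisted
          // only of quotes or whitespace (for example the quote before '{')
          String text = token.string.trim();
          if (!text.isEmpty()) {
            result.append('"').append(text).append('"');
          }
          break;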
      }
    }
    return result.toString();
  }
}

If you wanted to, you could also add some sanity checks, such as ensuring the brackets are balanced. That's up to you; this is just an example.
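
One way such a balance check might look is sketched below. It is a standalone illustration that works on the raw string rather than on the tokens above, and the names BracketBalanceCheck and isBalanced are made up for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper, not part of the sanitizer above.
final class BracketBalanceCheck {
  /** Returns true if every '[' and '{' is closed by the matching bracket, in order. */
  static boolean isBalanced(String input) {
    Deque<Character> open = new ArrayDeque<>();
    for (char ch : input.toCharArray()) {
      switch (ch) {
        case '[':
        case '{':
          open.push(ch);
          break;
        case ']':
          if (open.isEmpty() || open.pop() != '[') return false;
          break;
        case '}':
          if (open.isEmpty() || open.pop() != '{') return false;
          break;
        default:
          break; // quotes, commas and literal text do not affect balance
      }
    }
    // Anything left on the stack is an unclosed bracket
    return open.isEmpty();
  }
}
```

A caller could run isBalanced on the input first and reject the line, or log it, before attempting to sanitize it.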
