
I have lines in the following format:

"123","45","{"VFO":[B501], "AGN":[605,B501], "AXP":[665], "QAV":[720,223R,251Q,496M,548A,799M]}","4"

It can be longer, but it always has the form

"number","number","someValues","digit"

I need to wrap the values inside someValues in quotes.

For the test string, the expected result is:

"123","45","{"VFO":["B501"], "AGN":["605","B501"], "AXP":["665"], "QAV":["720","223R","251Q","496M","548A","799M"]}","4"

Please suggest the simplest solution in Java.

P.S.

My variant:

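import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MyVariant {
  public static void main(String[] args) {
    String line = "\"123\",\"45\",\"{\"VFO\":[B501], \"AGN\":[605,B501], \"AXP\":[665], \"QAV\":[720,223R,251Q,496M,548A,799M]}\",\"4\"";
    // Collect every [...] group, brackets included
    String valuePattern = "\\[(.*?)\\]";
    Pattern valueR = Pattern.compile(valuePattern);
    Matcher valueM = valueR.matcher(line);
    List<String> list = new ArrayList<String>();
    while (valueM.find()) {
      list.add(valueM.group(0));
    }
    // Rebuild each group with quoted items; only the groups are printed, not the rest of the line
    String value = "";
    for (String element : list) {
      element = element.substring(1, element.length() - 1); // strip '[' and ']'
      String[] strings = element.split(",");
      String singleGroup = "[";
      for (String el : strings) {
        singleGroup += "\"" + el + "\",";
      }
      singleGroup = singleGroup.substring(0, singleGroup.length() - 1); // drop the trailing comma
      singleGroup = singleGroup + "]";
      value += singleGroup;
    }
    System.out.println(value);
  }
}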
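The same pattern can also rewrite each group in place, so the rest of the line is kept. This is a minimal sketch, assuming a value never contains `[`, `]` or `,` itself; `QuoteBracketGroups` and `quoteGroups` are illustrative names. `Matcher.appendReplacement` copies the text between matches unchanged, and `Matcher.quoteReplacement` keeps a `$` or `\` in a value from being read as a group reference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteBracketGroups {
  // Rewrites every [a,b,c] group as ["a","b","c"] and leaves the rest of the line untouched.
  static String quoteGroups(String line) {
    Matcher m = Pattern.compile("\\[(.*?)\\]").matcher(line);
    StringBuffer out = new StringBuffer(); // appendReplacement only accepts StringBuffer before Java 9
    while (m.find()) {
      StringBuilder group = new StringBuilder("[");
      String[] items = m.group(1).split(",");
      for (int i = 0; i < items.length; i++) {
        if (i > 0) {
          group.append(',');
        }
        group.append('"').append(items[i].trim()).append('"');
      }
      group.append(']');
      // Copies the text since the previous match, then the rebuilt group
      m.appendReplacement(out, Matcher.quoteReplacement(group.toString()));
    }
    m.appendTail(out); // copies whatever follows the last group
    return out.toString();
  }

  public static void main(String[] args) {
    String line = "\"123\",\"45\",\"{\"VFO\":[B501], \"AGN\":[605,B501], \"AXP\":[665], \"QAV\":[720,223R,251Q,496M,548A,799M]}\",\"4\"";
    System.out.println(quoteGroups(line));
  }
}
```

For the test string this prints exactly the expected result from the question. Note that an empty group `[]` would come out as `[""]`.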
  • Are these JSON objects? Have you looked into using a Java JSON library? Commented Jun 23, 2016 at 14:17
  • @Patrick J Abare II I understand that these rows are not valid JSON. That is the root cause of the problem. Commented Jun 23, 2016 at 14:19
  • Shouldn't you try something before asking? Commented Jun 23, 2016 at 14:20
  • @fartagaintuxedo I am trying to write long code using String.replace and split, but I am not sure that this solution is the best. Commented Jun 23, 2016 at 14:23
  • And I use Java 6, so I cannot use streams. Commented Jun 23, 2016 at 14:25

2 Answers


EDITED

OK, here is the shortest way I found. It works very nicely in my opinion, except for the comma and the bracket, which I had to add back manually. Somebody might be able to do it directly, but I found it tricky to handle replacements with nested groups.

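import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteValues {
  public static void main(String[] args) {
    String line = "\"123\",\"45\",\"{\"VFO\":[B501], \"AGN\":[605,B501], \"AXP\":[665], \"QAV\":[720,223R,251Q,496M,548A,799M]}\",\"4\"";
    // Group 2: a value right after '['; group 4: a value right after ','
    Pattern p = Pattern.compile("(\\[(\\w+))|(,(\\w+))");
    Matcher m = p.matcher(line);
    // appendReplacement() only accepts a StringBuffer before Java 9
    StringBuffer s = new StringBuffer();
    while (m.find()) {
      if (m.group(2) != null) {
        // The '[' is part of the match, so it has to be written back
        m.appendReplacement(s, "[\"" + m.group(2) + "\"");
      } else if (m.group(4) != null) {
        // Likewise for the leading ','
        m.appendReplacement(s, ",\"" + m.group(4) + "\"");
      }
    }
    m.appendTail(s);
    System.out.println(s);
  }
}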
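Re-adding the comma and bracket by hand can be avoided with lookarounds, which test the neighbouring characters without consuming them, so a single `replaceAll` is enough (lookbehind is supported in Java 6). This is a sketch, assuming every value consists only of word characters (`\w`: letters, digits, underscore) and that the plain CSV fields are always quoted; `QuoteWithLookarounds` and `quoteValues` are illustrative names.

```java
public class QuoteWithLookarounds {
  // Quotes a run of word characters only when it sits directly after '[' or ','
  // and directly before ',' or ']', i.e. when it is an element of a [...] list.
  static String quoteValues(String line) {
    return line.replaceAll("(?<=[\\[,])(\\w+)(?=[,\\]])", "\"$1\"");
  }

  public static void main(String[] args) {
    String line = "\"123\",\"45\",\"{\"VFO\":[B501], \"AGN\":[605,B501], \"AXP\":[665], \"QAV\":[720,223R,251Q,496M,548A,799M]}\",\"4\"";
    System.out.println(quoteValues(line));
  }
}
```

Because an already quoted value is preceded by `"` rather than `[` or `,`, running `quoteValues` a second time leaves the line unchanged.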

4 Comments

I have tested the regex but not the complete code; it should work, perhaps with some minor corrections.
It creates separate entries for multi-valued keys: QAV - 720, QAV - 223R, etc.
It should be an array.
Check now; the output is the new string with all the values between quotes.

As I commented above, I think the real solution here is to fix the thing that's generating this malformed output. In the general case I don't believe it's possible to parse correctly: if the strings contain embedded bracket or comma characters then it becomes impossible to determine which parts are which.
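To make the ambiguity concrete, the sketch below models the generator with a hypothetical `writeUnquoted` helper. It is an assumption about how these lines are produced: list items joined by commas, with no quotes. A list with two values and a list whose single value contains a comma come out as identical text, so no parser can tell which one was meant.

```java
public class AmbiguityDemo {
  // Hypothetical stand-in for the broken generator: joins items with ',' and never quotes them.
  static String writeUnquoted(String... items) {
    StringBuilder sb = new StringBuilder("[");
    for (int i = 0; i < items.length; i++) {
      if (i > 0) {
        sb.append(',');
      }
      sb.append(items[i]);
    }
    return sb.append(']').toString();
  }

  public static void main(String[] args) {
    String twoValues = writeUnquoted("605", "B501"); // two separate values
    String oneValue = writeUnquoted("605,B501");     // one value that contains a comma
    System.out.println(twoValues);                   // [605,B501]
    System.out.println(twoValues.equals(oneValue));  // true: the reader cannot tell them apart
  }
}
```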

You can get pretty close, though, by simply ignoring all quote characters and tokenizing the rest:

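import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public final class AlmostJsonSanitizer {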
  enum TokenType {
    COMMA(','),
    COLON(':'),
    LEFT_SQUARE_BRACKET('['),
    RIGHT_SQUARE_BRACKET(']'),
    LEFT_CURLY_BRACKET('{'),
    RIGHT_CURLY_BRACKET('}'),
    LITERAL(null);

    static final Map<Character, TokenType> LOOKUP;
    static {
      Map<Character, TokenType> lookup = new HashMap<Character, TokenType>();
      for (TokenType tokenType : values()) {
        // LITERAL has no delimiter character, so keep it out of the table
        if (tokenType.ch != null) {
          lookup.put(tokenType.ch, tokenType);
        }
      }
      LOOKUP = Collections.unmodifiableMap(lookup);
    }

    private final Character ch;

    private TokenType(Character ch) {
      this.ch = ch;
    }
  }

  static class Token {
    final TokenType type;
    final String string;

    Token(TokenType type, String string) {
      this.type = type;
      this.string = string;
    }
  }

  private static class Tokenizer implements Iterator<Token> {
    private final String buffer;
    private int pos;

    Tokenizer(String buffer) {
      this.buffer = buffer;
      this.pos = 0;
    }

    @Override
    public boolean hasNext() {
      return pos < buffer.length();
    }

    @Override
    public Token next() {
      char ch = buffer.charAt(pos);
      TokenType type = TokenType.LOOKUP.get(ch);
      // If it's in the lookup table, return a token of that type
      if (type != null) {
        pos++;
        return new Token(type, null);
      }
      // Otherwise it's a literal: read up to (but not including) the next delimiter
      StringBuilder sb = new StringBuilder();
      while (pos < buffer.length()) {
        ch = buffer.charAt(pos);
        // If we've found a different type of token then stop, leaving it for the next call
        if (TokenType.LOOKUP.get(ch) != null) {
          break;
        }
        pos++;
        // Skip all quote characters
        if (ch == '"') {
          continue;
        }
        sb.append(ch);
      }
      // Trim the whitespace between tokens, e.g. the space in , "AGN"
      return new Token(TokenType.LITERAL, sb.toString().trim());
    }

    @Override
    public void remove() {
      throw new UnsupportedOperationException();
    }
  }

  /** Convenience method to allow using a foreach loop below. */
  static Iterable<Token> tokenize(final String input) {
    return new Iterable<Token>() {
      @Override
      public Iterator<Token> iterator() {
        return new Tokenizer(input);
      }
    };
  }

  public static String sanitize(String input) {
    StringBuilder result = new StringBuilder();
    for (Token token : tokenize(input)) {
      switch (token.type) {
        case COMMA:
          result.append(", ");
          break;

        case COLON:
          result.append(": ");
          break;

        case LEFT_SQUARE_BRACKET:
        case RIGHT_SQUARE_BRACKET:
        case LEFT_CURLY_BRACKET:
        case RIGHT_CURLY_BRACKET:
          result.append(token.type.ch);
          break;

        case LITERAL:
          // Skip empty literals, such as the lone quote in front of '{'
          if (token.string.length() > 0) {
            result.append('"').append(token.string).append('"');
          }
          break;
      }
    }
    return result.toString();
  }
}

If you wanted to, you could also add sanity checks, such as ensuring the brackets are balanced. That's up to you; this is just an example.
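One way to do the bracket-balance check mentioned above is a stack of open brackets. This is a sketch, not part of the original answer; `BracketCheck` and `bracketsBalanced` are illustrative names, and quotes are deliberately ignored because they cannot be trusted in this input. `ArrayDeque` is available from Java 6.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class BracketCheck {
  // Returns true when every '[' or '{' is closed by the matching ']' or '}' in the right order.
  static boolean bracketsBalanced(String s) {
    Deque<Character> open = new ArrayDeque<Character>();
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c == '[' || c == '{') {
        open.push(c);
      } else if (c == ']' || c == '}') {
        char expected = (c == ']') ? '[' : '{';
        // A closer with nothing open, or one that closes the wrong kind of bracket, is an error
        if (open.isEmpty() || open.pop() != expected) {
          return false;
        }
      }
    }
    // Anything still on the stack was never closed
    return open.isEmpty();
  }

  public static void main(String[] args) {
    String line = "\"123\",\"45\",\"{\"VFO\":[B501], \"AGN\":[605,B501], \"AXP\":[665], \"QAV\":[720,223R,251Q,496M,548A,799M]}\",\"4\"";
    System.out.println(bracketsBalanced(line));        // true
    System.out.println(bracketsBalanced("[605,B501")); // false
  }
}
```

Running this before `sanitize` would let you reject a mangled line instead of producing output from it.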
