I have a text file made up of multiple lines of the form "(what I want to grab)","junk","junk","junk", separated by newlines. I'm reading the file into a list of strings and trying to use a regex to print out the part I want to grab, but I cannot get it to work.

The way I understand regex, ^ matches the start of a line, \" matches the first quotation mark after ^, . matches anything, and then \" matches the next quotation mark. What am I missing?

List<String> result = Files.readAllLines(Paths.get("file.txt"));

Pattern pattern = Pattern.compile("^\".\"");

for (int i = 0; i < result.size(); i++)
{
    System.out.println(result.get(i));
    Matcher matcher = pattern.matcher(result.get(i));
    System.out.println(matcher.find());
}
  • . matches any character once. You're missing a quantifier, such as * (any number of times), + (one or more times) or {n,m} (between n and m times). Append it after the token you want to modify, e.g. .* Commented Jun 25, 2019 at 13:14
  • An online regex tester lets you try out your regex, which will help you understand the pattern you created. Commented Jun 25, 2019 at 13:16
  • @Aaron - " is not a regex special character; it needs to be escaped only because it's in a string literal. Commented Jun 25, 2019 at 13:19
  • @KingoOfWhales - your file seems to be CSV. As such, the relevant part may contain escape characters and additional ". I would strongly recommend parsing a CSV file with a CSV parser. First, you wouldn't need to load all the lines into memory. Second, you won't run into problems like this. Third, it's then very easy to get just the first field or just the nth field. Commented Jun 25, 2019 at 13:22
  • @RealSkeptic right, don't know what I was thinking. Commented Jun 25, 2019 at 13:28
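
To make the quantifier point from the comments concrete, here is a minimal sketch that runs three patterns against one sample line. The class name QuantifierDemo, the helper finds, and the sample line are made up for illustration and are not from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Hypothetical helper: true if the regex finds a match anywhere in the line.
    static boolean finds(String regex, String line) {
        return Pattern.compile(regex).matcher(line).find();
    }

    public static void main(String[] args) {
        // Sample line shaped like the ones described in the question (assumed data).
        String line = "\"what I want to grab\",\"junk\",\"junk\",\"junk\"";

        // Pattern from the question: exactly ONE character between the quotes.
        System.out.println(finds("^\".\"", line));      // false

        // With a quantifier: one or more non-quote characters between the quotes.
        System.out.println(finds("^\"[^\"]+\"", line)); // true

        // Greedy .* also matches, but it runs to the LAST quote on the line,
        // so the match here is the entire line rather than just the first field.
        Matcher greedy = Pattern.compile("^\".*\"").matcher(line);
        if (greedy.find()) {
            System.out.println(greedy.group());
        }
    }
}
```

The greedy case is why a negated character class such as [^\"]+ tends to be a better fit here than .* when only the first field is wanted.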

1 Answer

Here's a simple regex which should solve your problem:

String regex = "(^[\"][^\"]+[\"])";

This matches the beginning of the line, immediately followed by a double quotation mark. It then matches one or more characters that are not quotation marks, up to and including the closing quotation mark.

Another (possibly more legible) version, from Aaron in the comments:

^\"[^\"]+\"

Choose which you prefer.
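
Either pattern matches the field together with its surrounding quotes. If only the text between the quotes is wanted, one option is to put a capturing group around the inner part. The following is a rough sketch under that assumption; the class name FirstFieldDemo, the method firstField, and the sample lines are invented for illustration and stand in for the lines read from file.txt in the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstFieldDemo {
    // Same shape as the pattern above, with group 1 capturing the text between the quotes.
    private static final Pattern FIRST_FIELD = Pattern.compile("^\"([^\"]+)\"");

    // Hypothetical helper: returns the first quoted field without its quotes, if present.
    static Optional<String> firstField(String line) {
        Matcher m = FIRST_FIELD.matcher(line);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for Files.readAllLines(Paths.get("file.txt")) from the question.
        List<String> lines = Arrays.asList(
            "\"first\",\"junk\",\"junk\",\"junk\"",
            "\"second\",\"junk\",\"junk\",\"junk\"");

        for (String line : lines) {
            // Prints the captured field for lines that match; skips the rest.
            firstField(line).ifPresent(System.out::println);
        }
    }
}
```

Because the pattern uses +, a line whose first field is empty ("") yields no match. As noted in the comments on the question, fields containing escaped quotes are not handled by a regex like this, and a CSV parser would be the more robust choice.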

Tested here.

6 Comments

[\"] can be simplified into \".
Yes, I'm aware. It's a matter of personal preference
Ah, no point to the outer capturing group either. You do you, but I feel like superfluous constructs hurt readability.
Sure, but then you can compare these styles. I believe (but might be wrong) that a person unfamiliar with our respective styles would have less trouble understanding ^\"[^\"]+\" than (^[\"][^\"]+[\"]) simply because there are fewer things to read.
I added it to my answer. Let the people choose, I guess