
I need to parse a BUFKIT weather model file, and it's quite a long file. An excerpt:

PRES TMPC TMWC DWPC THTE DRCT SKNT OMEG
CFRL HGHT
995.10 4.64 3.24 1.45 290.15 360.00 0.58 -0.10
0.00 292.82
990.40 5.04 2.18 -1.95 288.49 75.96 4.80 -0.10
0.00 331.43
985.70 6.44 2.36 -3.77 289.24 109.86 7.44 -0.10
0.00 370.34
976.00 8.64 3.43 -4.12 292.23 142.13 8.86 -0.10
0.00 410.44

Within the file, I only care about the numerical rows, each of which is broken across two lines. For example, I'd like to parse:

995.10 4.64 3.24 1.45 290.15 360.00 0.58 -0.10
0.00 292.82

I can find the individual decimals using:

-?[0-9]\d*(\.\d+)?

I can't figure out how to capture all 10 of them, including the line break.

So what is an expression I can use to parse all 10?

edit: There are also lines that conflict with some patterns. These need to be ignored:

722190 141106/2300 1013.10 981.20 17.94 292.30 0.00 55.00
1.68 0.00 290.50 0.00 0.00 27.00
0.00 3.90 -1.70 0.06 0.02 17.44
11.63 0.00 0.00 0.00 1.00 18.20
-4.80 -26.00 -0.02 60.00 979.00 0.19
15.80
722190 141107/0000 1014.00 981.90 16.44 291.50 0.00 56.00
0.69 0.00 290.50 0.00 0.00 0.00
0.00 4.40 -2.00 0.02 0.02 15.74
9.01 0.00 0.00 0.00 1.00 19.60
-5.50 -104.50 -0.04 60.00 951.30 20.00
11.94
  • Are there always 2 columns in 2nd continuing line as shown in your example? Commented Nov 3, 2014 at 21:28
  • @anubhava yes but I'd like regex to pull out both lines each time. Commented Nov 3, 2014 at 21:34

2 Answers


You can use this regex with the MULTILINE flag to capture each 10-value row. Because \s+ also matches line breaks, one match spans both physical lines:

^((?:-?\d+(?:\.\d+)?\s+){9}-?\d+(?:\.\d+)?)(?=\r?\n|\z)


Once the rows are separated out, you can use String#split on whitespace to get the individual cell values.
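
As a minimal sketch of those two steps in Java (the class name BufkitRows and the method parse are hypothetical, chosen only for illustration), the pattern above can be compiled with Pattern.MULTILINE and each match split on whitespace. Every backslash is doubled because the regex sits inside a Java string literal. This assumes the profile rows look like the sample in the question, with 8 values on the first line and 2 on the second.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BufkitRows {
    // Same regex as above; backslashes are doubled for the Java string literal.
    private static final Pattern ROW = Pattern.compile(
        "^((?:-?\\d+(?:\\.\\d+)?\\s+){9}-?\\d+(?:\\.\\d+)?)(?=\\r?\\n|\\z)",
        Pattern.MULTILINE);

    // Returns one String[10] per matched row; a row spans two physical lines.
    static List<String[]> parse(String text) {
        List<String[]> rows = new ArrayList<>();
        Matcher m = ROW.matcher(text);
        while (m.find()) {
            // Split on any whitespace run so the embedded line break is handled too.
            rows.add(m.group(1).split("\\s+"));
        }
        return rows;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "PRES TMPC TMWC DWPC THTE DRCT SKNT OMEG",
            "CFRL HGHT",
            "995.10 4.64 3.24 1.45 290.15 360.00 0.58 -0.10",
            "0.00 292.82",
            "990.40 5.04 2.18 -1.95 288.49 75.96 4.80 -0.10",
            "0.00 331.43");
        for (String[] row : parse(sample)) {
            System.out.println(String.join(", ", row));
        }
    }
}
```

Splitting on "\\s+" rather than a single space matters here, since the captured text contains the line break between the eighth and ninth values.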


4 Comments

We're getting close. There seem to be some rows at the bottom that I want to ignore, but this pattern is picking them up (I know, I'm sorry, I should have included those).
everything below: STN YYMMDD/HHMM PMSL PRES SKTC STC1 SNFL WTNS
Nope, it's still picking those up

Updated: based on the data posted, use this pattern with the g (global) and m (multiline) options:

^((?:-?\d+(?:\.\d+)? ){7}(?:-?\d+(?:\.\d+)?(?:\r?\n))(?:-?\d+(?:\.\d+)? )(?:-?\d+(?:\.\d+)?(?=\r?\n|$)))
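
For Java, here is a hedged sketch of applying this pattern (the class name BufkitProfile and the method parse are hypothetical). It assumes the regex101 m option corresponds to Pattern.MULTILINE and the g option to calling Matcher.find() in a loop, and it doubles every backslash for the string literal. Leaving out either the flag or the doubled backslashes is a plausible reason a pattern that works on regex101 fails to match in Java, though that is a guess rather than something confirmed in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BufkitProfile {
    // 7 values each followed by a space, an 8th followed by a line break,
    // then a 9th, a space, and a 10th at the end of the next line.
    private static final Pattern ROW = Pattern.compile(
        "^((?:-?\\d+(?:\\.\\d+)? ){7}(?:-?\\d+(?:\\.\\d+)?(?:\\r?\\n))"
        + "(?:-?\\d+(?:\\.\\d+)? )(?:-?\\d+(?:\\.\\d+)?(?=\\r?\\n|$)))",
        Pattern.MULTILINE);

    // Returns one double[10] per two-line profile row found in the text.
    static List<double[]> parse(String text) {
        List<double[]> rows = new ArrayList<>();
        Matcher m = ROW.matcher(text);
        while (m.find()) {                       // looping over find() plays the role of "g"
            String[] cells = m.group(1).split("\\s+");
            double[] values = new double[cells.length];
            for (int i = 0; i < cells.length; i++) {
                values[i] = Double.parseDouble(cells[i]);
            }
            rows.add(values);
        }
        return rows;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "995.10 4.64 3.24 1.45 290.15 360.00 0.58 -0.10",
            "0.00 292.82",
            "722190 141106/2300 1013.10 981.20 17.94 292.30 0.00 55.00",
            "1.68 0.00 290.50 0.00 0.00 27.00");
        System.out.println(parse(sample).size() + " profile row(s) matched");
    }
}
```

Because the pattern requires exactly eight values on the first line and two on the second, the six-per-line station rows quoted in the question do not fit it.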


2 Comments

That works nicely in the regex101 engine, but for some reason it doesn't match anything in Java.
@JamieMcIlroy updated my pattern above, hope it helps
