
I have a very long string like the sample below, and I'm struggling to find a regex that splits it into parts according to a pattern, for example: '1. OAS / AC' and '2. OAS / AD'.

Each of these delimiters has:

1) a varying number at the beginning

2) two capital letters, each varying from A to Z

I tried this:

x <- stringr::str_split(have, "([1-9])( OAS / )([A-Z]{2})")

but it does not work.

Thanks in advance, for any help!

Example

library(stringr)
have <- "1. OAS / AC 12345/this is a test string to regex, 2. OAS / AD     79856/this is another test string to regex, 3. OAS / AE 87987/this is a new test string to regex. 4. OAS / AZ 78798456/this is one mode test string to regex."
x <- stringr::str_split(have, "([1-9])( OAS / )([A-Z]{2})")  # my attempt; the string comes back unsplit

want <- list(
         "1. OAS / AC " = "12345/this is a test string to regex,",
         "2. OAS / AD " = "79856/this is another test string to regex,",
         "3. OAS / AE " = "87987/this is a new test string to regex.",
         "4. OAS / AZ " = "78798456/this is one mode test string to regex."
)
2
  • Try stringr::str_match_all(have, "(\\d+\\. OAS / [A-Z]{2})\\s*(.*?)(?=\\s*\\d+\\. OAS / [A-Z]{2}|\\z)") Commented Feb 12, 2019 at 0:17
  • Hi @WiktorStribiżew. I was far from getting such a solution. Thanks a lot for the help. Commented Feb 12, 2019 at 0:25

3 Answers 3

1

We could do this with a positive lookahead that looks for a number followed by a period:

str_split(have, "(?=\\d+\\.)")

[1] ""                                                             "1. OAS / AC 12345/this is a test string to regex, "          
[3] "2. OAS / AD     79856/this is another test string to regex, " "3. OAS / AE 87987/this is a new test string to regex. "      
[5] "4. OAS / AZ 78798456/this is one mode test string to regex."

And we can further clean it up:

str_split(have, "(?=\\d{1,2}\\.)") %>% unlist() %>% .[-1]

[1] "1. OAS / AC 12345/this is a test string to regex, "           "2. OAS / AD     79856/this is another test string to regex, "
[3] "3. OAS / AE 87987/this is a new test string to regex. "       "4. OAS / AZ 78798456/this is one mode test string to regex." 
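
If the goal is the named list from the question, the split pieces can be taken one step further. The following is only a sketch under some assumptions: it uses a shortened, made-up sample string, a stricter lookahead than the one above (so that other "digit + period" sequences in the text do not trigger a split), and it produces names without the trailing space shown in the question.

```r
library(stringr)

# Shortened, hypothetical sample in the same shape as `have`
have <- "1. OAS / AC 12345/first text, 2. OAS / AD     79856/second text."

# Split before each "<digits>. OAS / <two capitals>" delimiter.
# (?<!\\d) keeps a multi-digit number such as "12." from being split in the middle.
pieces <- str_split(have, "(?<!\\d)(?=\\d+\\. OAS / [A-Z]{2})")[[1]]
pieces <- str_trim(pieces[pieces != ""])  # drop empty pieces, trim whitespace

# Column 2 of the match matrix holds the key, column 3 the remaining text
m <- str_match(pieces, "^(\\d+\\. OAS / [A-Z]{2})\\s*(.*)$")
want <- as.list(setNames(m[, 3], m[, 2]))
```

Here `want[["2. OAS / AD"]]` should hold the text that follows the second delimiter, with the run of spaces removed.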

1 Comment

Hi @Mako212 . Thank you so much for your help. That will help a lot.
0

You may use

library(stringr)
have <- "1. OAS / AC 12345/this is a test string to regex, 2. OAS / AD     79856/this is another test string to regex, 3. OAS / AE 87987/this is a new test string to regex. 4. OAS / AZ 78798456/this is one mode test string to regex."
r <- stringr::str_match_all(have, "(\\d+\\. OAS / [A-Z]{2})\\s*(.*?)(?=\\s*\\d+\\. OAS / [A-Z]{2}|\\z)")
res <- r[[1]][,3]
names(res) <- r[[1]][,2]

Result:

dput(res)
# => structure(c("12345/this is a test string to regex,", "79856/this is another test string to regex,", 
#  "87987/this is a new test string to regex.", "78798456/this is one mode test string to regex."
#  ), .Names = c("1. OAS / AC", "2. OAS / AD", "3. OAS / AE", "4. OAS / AZ"
#  ))

See the regex demo.

Pattern details

  • (\d+\. OAS / [A-Z]{2}) - Capturing group 1:
    • \d+ - 1+ digits
    • \. - a .
    • OAS / - a literal OAS / substring
    • [A-Z]{2} - two uppercase letters
  • \s* - 0+ whitespaces
  • (.*?) - Capturing group 2: any 0+ chars other than line break chars, as few as possible
  • (?=\s*\d+\. OAS / [A-Z]{2}|\z) - a positive lookahead: immediately to the right of the current location, there must be either
    • \s*\d+\. OAS / [A-Z]{2} - 0+ whitespaces, 1+ digits, ., space, OAS, space, /, space, two uppercase letters
    • | - or
    • \z - end of string.
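
One consequence of the details above: since . does not match line break chars, a record that contains a line break cannot reach its lookahead. A minimal sketch of the effect, assuming a made-up two-record input (`multi` is hypothetical, not from the question) and the inline (?s) flag to let . match line breaks as well:

```r
library(stringr)

pattern <- "(\\d+\\. OAS / [A-Z]{2})\\s*(.*?)(?=\\s*\\d+\\. OAS / [A-Z]{2}|\\z)"

# Hypothetical input with a line break inside the first record
multi <- "1. OAS / AC 12345/first\nline, 2. OAS / AD 79856/second."

# As is: "." stops at "\n", so the first record is skipped
r1 <- str_match_all(multi, pattern)[[1]]

# With (?s), "." also matches "\n" and both records are captured
r2 <- str_match_all(multi, paste0("(?s)", pattern))[[1]]

nrow(r1)  # number of records found without (?s)
nrow(r2)  # number of records found with (?s)
```

If the real data has no line breaks inside records, the original pattern can be used unchanged.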

0

The way you described the issue is kinda unclear, but if you simply want to extract everything up to "OAS / AC",

library(qdap)
beg2char(have, " ", 4)  # looks for the fourth occurrence of \\s and extracts everything before it

For the above function to work on every record, the sentences should be individual strings in a character vector.

If your aim is to actually insert an "=" sign between the two letter sub-string and the number occurring after "OAS",

gsub("([A-Z])\\s*([0-9])", "\\1 = \\2", have, perl = TRUE)
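
Note that this pattern fires on any capital letter followed by a digit, not only on the delimiter. A small sketch of the difference, using a made-up sample string (the "B2" in it is hypothetical) and a stricter variant that anchors on the whole "<digits>. OAS / <two capitals>" prefix; the stricter pattern is my own assumption about what is wanted:

```r
# Hypothetical sample: "B2" inside the text is not a delimiter
have <- "1. OAS / AC 12345/see item B2, 2. OAS / AD     79856/other text."

# Loose pattern from above: also rewrites "B2" into "B = 2"
loose <- gsub("([A-Z])\\s*([0-9])", "\\1 = \\2", have, perl = TRUE)

# Stricter variant: insert " = " only after a full delimiter
strict <- gsub("(\\d+\\. OAS / [A-Z]{2})\\s*", "\\1 = ", have, perl = TRUE)

loose
strict
```

Which one is appropriate depends on whether the free text can contain a capital letter directly followed by a digit.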
