R: Delete first and last part of string based on pattern

Question

This string is a ticker for a bond: OAT 3 25/32 7/17/17. I want to extract the coupon rate, which is 3 25/32 and is read as 3 + 25/32, or 3.78125. I've been trying to delete the date and the name OAT with gsub, but I've run into some problems.

This is the code to delete the date:

tkr.bond <- 'OAT 3 25/32 7/17/17'
tkr.ptrn <- '[0-9][[:punct:]][0-9][[:punct:]][0-9]'
gsub(tkr.ptrn, "", tkr.bond)

However, it returns the same string. When I use [0-9][[:punct:]][0-9] as the pattern, I manage to delete part of the date, but it also deletes the fraction part of the bond's coupon rate.
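
One likely reason the first pattern matches nothing: each [0-9] matches exactly one digit, so the pattern needs digit, punctuation, digit, punctuation, digit in a row, while 7/17/17 has two-digit fields. A minimal sketch, assuming the date is always the last token and always has three numeric fields, adds + quantifiers and anchors the match at the end of the string. The names tickers, date.ptrn and no.date are illustrative and not from the original post.

```r
tickers <- c('OAT 3 25/32 7/17/17', 'OAT 0 7/17/17')

# One or more digits per field, anchored with $ so the coupon fraction
# (which has only one separator) cannot match.
date.ptrn <- '\\s*[0-9]+[[:punct:]][0-9]+[[:punct:]][0-9]+$'

no.date <- sub(date.ptrn, '', tickers)
no.date
# [1] "OAT 3 25/32" "OAT 0"
```

This removes only the date; the name still has to be stripped separately.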

The tricky part is finding a solution that doesn't rely on the pattern of the coupon. The tickers have the form Name Coupon Date, so a pattern specific to the coupon may limit the scope of the solution. For example, if the ticker is OAT 0 7/17/17, the coupon is zero.

Just some clarification questions... When you say that it's read as 3 + 25/32 or 3.78125, are you saying you wish to express it with this form programmatically? Or maybe that it can take those two forms in addition to the first?
– Dominic Comtois, Commented Apr 9, 2015 at 6:19
It works both ways for me, once I get 3 25/32 it is simpler to convert it to decimals. Of course, the less code I use to get 3.78125 the better.
– capm, Commented Apr 9, 2015 at 6:25
Gotcha. I think there's already some pretty good solutions for you then!
– Dominic Comtois, Commented Apr 9, 2015 at 6:34
In the update OAT 0 7/17/17, the fraction part is not given.
– akrun, Commented Apr 9, 2015 at 6:39
Exactly, sometimes it has fractions, other times it doesn't. That's why the first idea I had was to delete the Date and the Name part of the ticker so whatever the Coupon part was I made sure I always captured it.
– capm, Commented Apr 9, 2015 at 6:41

Avinash Raj · Accepted Answer · 2015-04-09 06:54:12Z

2

Just replace first and last word with an empty string.

> tkr.bond <- 'OAT 3 25/32 7/17/17'
> gsub("^\\S+\\s*|\\s*\\S+$", "", tkr.bond)
[1] "3 25/32"
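
To see what each half of that alternation does, the two branches can be run separately. This is only an illustrative breakdown of the pattern above; the variable names are made up for the example.

```r
tkr.bond <- 'OAT 3 25/32 7/17/17'

# Left branch: the first run of non-space characters plus trailing spaces.
no.name <- sub("^\\S+\\s*", "", tkr.bond)
no.name
# [1] "3 25/32 7/17/17"

# Right branch: the last run of non-space characters plus leading spaces.
no.date <- sub("\\s*\\S+$", "", tkr.bond)
no.date
# [1] "OAT 3 25/32"

# gsub applies both branches, so whatever sits in the middle survives.
coupon <- gsub("^\\S+\\s*|\\s*\\S+$", "", c(tkr.bond, 'OAT 0 7/17/17'))
coupon
# [1] "3 25/32" "0"
```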

OR

Use the gsubfn function (from the gsubfn package) in order to use a function in the replacement part.

> library(gsubfn)
> gsubfn("^\\S+\\s+(\\d+)\\s+(\\d+)/(\\d+).*", ~ as.numeric(x) + as.numeric(y)/as.numeric(z), tkr.bond)
[1] "3.78125"

Update:

> tkr.bond1 <- c(tkr.bond, 'OAT 0 7/17/17')
> m <- gsub("^\\S+\\s*|\\s*\\S+$", "", tkr.bond1)
> gsubfn(".+", ~ eval(parse(text=x)), gsub("\\s+", "+", m))
[1] "3.78125" "0"

edited Apr 9, 2015 at 6:54

answered Apr 9, 2015 at 6:26

Avinash Raj

akrun · Accepted Answer · 2015-04-09 07:08:57Z

1

Try

eval(parse(text=sub('[A-Z]+ ([0-9]+ )([0-9/]+) .*', '\\1 + \\2', tkr.bond)))
#[1] 3.78125

Or you may need

sub('^[A-Z]+ ([^A-Z]+) [^ ]+$', '\\1', tkr.bond)
#[1] "3 25/32"

Update

tkr.bond1 <- c(tkr.bond, 'OAT 0 7/17/17')
v1 <- sub('^[A-Z]+ ([^A-Z]+) [^ ]+$', '\\1', tkr.bond1)
unname(sapply(sub(' ', '+', v1), function(x) eval(parse(text=x))))
#[1] 3.78125 0.00000

Or

vapply(strsplit(tkr.bond1, ' '), function(x)  
  eval(parse(text= paste(x[-c(1, length(x))], collapse="+"))), 0)
#[1] 3.78125 0.00000

Or without eval(parse(...))

vapply(strsplit(gsub('^[^ ]+ | [^ ]+$', '', tkr.bond1), '[ /]'), function(x) {
  x1 <- as.numeric(x)
  sum(x1[1], x1[2]/x1[3], na.rm=TRUE)}, 0)
#[1] 3.78125 0.00000

edited Apr 9, 2015 at 7:08

answered Apr 9, 2015 at 6:06

akrun

6 Comments

capm Over a year ago

It works, but help me understand the logic. You use \\1 + \\2 in the replacement argument; how does it work?

akrun Over a year ago

@capm Those are capture groups. The parentheses in ([0-9]+ ) capture the first numeric element and the space that follows it. Similarly, the second capture group, ([0-9/]+), captures 25/32. The replacement then rebuilds the string from those two groups (\\1 and \\2).
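
A small sketch of the capture groups described in this comment, using the same pattern as the answer. Each backreference is substituted on its own first so the captured text is visible; the names ptrn, g1, g2 and expr are illustrative.

```r
tkr.bond <- 'OAT 3 25/32 7/17/17'
ptrn <- '[A-Z]+ ([0-9]+ )([0-9/]+) .*'

# The pattern matches the whole string, so the replacement is all that remains.
g1 <- sub(ptrn, '\\1', tkr.bond)   # first group: digits plus the trailing space
g2 <- sub(ptrn, '\\2', tkr.bond)   # second group: the fraction
g1
# [1] "3 "
g2
# [1] "25/32"

# Joining the groups with " + " yields text that R can evaluate as arithmetic.
expr <- sub(ptrn, '\\1 + \\2', tkr.bond)
eval(parse(text = expr))
# [1] 3.78125
```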

Dominic Comtois Over a year ago

Hmmm not sure I see why you're using the eval(parse(text= routine here, if I can call it that?

akrun Over a year ago

@DominicComtois It is based on the expected result in the OP's post (as I understand it), which is to sum the numeric elements, but I think your solution may also be his desired result. Not sure though.

Dominic Comtois Over a year ago

Ok I see... I asked a clarification question, we'll figure it out soon I guess.

Dominic Comtois · Accepted Answer · 2015-04-09 06:33:51Z

1

Similar to akrun's answer, using sub with a replacement. How it works: you put your "desired" pattern inside parentheses and leave the rest out (while still putting regex characters to match what's there and that you don't wish to keep). Then when you say replacement = "\\1" you indicate that the whole string must be substituted by only what's inside the parentheses.

sub(pattern = ".*\\s(\\d\\s\\d+\\/\\d+)\\s.*", replacement = "\\1", x = tkr.bond, perl = TRUE)

# [1] "3 25/32"

Then you can change it to numerical:

temp <- sub(pattern = ".*\\s(\\d\\s\\d+\\/\\d+)\\s.*", replacement = "\\1", x = tkr.bond, perl = TRUE)

eval(parse(text=sub(" ","+",x = temp)))

# [1] 3.78125
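
The pattern above expects a single-digit integer part followed by a fraction, so it leaves a ticker such as 'OAT 0 7/17/17' unchanged. A possible variant, offered as a sketch rather than as the answerer's method, makes the fraction optional and allows a multi-digit integer part. It assumes the name and the date each contain no spaces; tickers, coupon and values are illustrative names.

```r
tickers <- c('OAT 3 25/32 7/17/17', 'OAT 0 7/17/17')

# \\d+ allows a multi-digit integer part;
# (?:\\s\\d+/\\d+)? makes the fraction optional (non-capturing group, needs perl = TRUE).
coupon <- sub(pattern = "^\\S+\\s(\\d+(?:\\s\\d+/\\d+)?)\\s\\S+$",
              replacement = "\\1", x = tickers, perl = TRUE)
coupon
# [1] "3 25/32" "0"

# Same conversion step as above, applied to each element.
values <- vapply(coupon, function(s) eval(parse(text = sub(" ", "+", s))),
                 numeric(1), USE.NAMES = FALSE)
values
# [1] 3.78125 0.00000
```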

edited Apr 9, 2015 at 6:33

answered Apr 9, 2015 at 6:11

Dominic Comtois

inscaven · Accepted Answer · 2015-04-09 07:32:56Z

0

You can also use strsplit here, then evaluate the components excluding the first and the last, like this:

> tickers <- c('OAT 3 25/32 7/17/17', 'OAT 0 7/17/17')
> 
> unlist(lapply(lapply(strsplit(tickers, " "), 
+               function(x) {x[-length(x)][-1]}),
+        function(y) {sum(
+          sapply(y, function (z) {eval(parse(text = z))}) )} ) )
[1] 3.78125 0.00000
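
The same split-and-sum idea can be written without eval(parse()), so that ticker text is never executed as R code. The helper below is hypothetical (coupon_value is not from the thread) and assumes every ticker has the form Name Coupon Date with single spaces between tokens.

```r
# Hypothetical helper: numeric coupon from 'Name Coupon Date' tickers.
coupon_value <- function(tickers) {
  vapply(strsplit(tickers, " ", fixed = TRUE), function(parts) {
    middle <- parts[-c(1, length(parts))]   # drop the name and the date
    sum(vapply(middle, function(tok) {
      nums <- as.numeric(strsplit(tok, "/", fixed = TRUE)[[1]])
      # "25/32" becomes 25 / 32; a plain token such as "3" stays as it is
      if (length(nums) == 2) nums[1] / nums[2] else nums[1]
    }, numeric(1)))
  }, numeric(1))
}

coupon_value(c('OAT 3 25/32 7/17/17', 'OAT 0 7/17/17'))
# [1] 3.78125 0.00000
```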

answered Apr 9, 2015 at 7:32

inscaven
