1

This string is a ticker for a bond: OAT 3 25/32 7/17/17. I want to extract the coupon rate, which is 3 25/32 and is read as 3 + 25/32, or 3.78125. I've been trying to delete the date and the name OAT with gsub, but I've run into some problems.

This is the code to delete the date:

tkr.bond <- 'OAT 3 25/32 7/17/17'
tkr.ptrn <- '[0-9][[:punct:]][0-9][[:punct:]][0-9]'
gsub(tkr.ptrn, "", tkr.bond)

However, it returns the same string. When I use [0-9][[:punct:]][0-9] as the pattern, I manage to delete part of the date, but it also deletes the fraction part of the bond's coupon rate.

The tricky part is finding a solution that doesn't depend on the pattern of the coupon. The tickers have the form Name Coupon Date, so a pattern specific to the coupon may limit the scope of the solution. For example, in the ticker OAT 0 7/17/17 the coupon is zero.

6
  • Just some clarification questions... When you say that it's read as 3 + 25/32 or 3.78125, are you saying you wish to express it with this form programmatically? Or maybe that it can take those two forms in addition to the first? Commented Apr 9, 2015 at 6:19
  • Either works for me; once I get 3 25/32, it is simple to convert it to decimals. Of course, the less code I need to get 3.78125, the better. Commented Apr 9, 2015 at 6:25
  • Gotcha. I think there are already some pretty good solutions for you then! Commented Apr 9, 2015 at 6:34
  • In the update OAT 0 7/17/17, the fraction part is not given. Commented Apr 9, 2015 at 6:39
  • Exactly, sometimes it has fractions, other times it doesn't. That's why the first idea I had was to delete the Date and the Name part of the ticker so whatever the Coupon part was I made sure I always captured it. Commented Apr 9, 2015 at 6:41

4 Answers

2

Just replace the first and last words with an empty string.

> tkr.bond <- 'OAT 3 25/32 7/17/17'
> gsub("^\\S+\\s*|\\s*\\S+$", "", tkr.bond)
[1] "3 25/32"
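
To see what each half of the alternation removes, the same result can be reached in two steps. This is only an illustrative sketch; the variable names no.name and coupon are introduced here and are not part of the original answer.

```r
tkr.bond <- 'OAT 3 25/32 7/17/17'

# First alternative: strip the leading word (the name) and the whitespace after it
no.name <- sub("^\\S+\\s*", "", tkr.bond)

# Second alternative: strip the trailing word (the date) and the whitespace before it
coupon <- sub("\\s*\\S+$", "", no.name)

print(no.name)
print(coupon)
```

Because neither step looks at the coupon itself, the same two substitutions should also work when the coupon has no fraction part.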

OR

Use the gsubfn function (from the gsubfn package), which lets you use a function in the replacement part.

> library(gsubfn)
> gsubfn("^\\S+\\s+(\\d+)\\s+(\\d+)/(\\d+).*", ~ as.numeric(x) + as.numeric(y)/as.numeric(z), tkr.bond)
[1] "3.78125"

Update:

> tkr.bond1 <- c(tkr.bond, 'OAT 0 7/17/17')
> m <- gsub("^\\S+\\s*|\\s*\\S+$", "", tkr.bond1)
> gsubfn(".+", ~ eval(parse(text=x)), gsub("\\s+", "+", m))
[1] "3.78125" "0" 

1

Try

eval(parse(text=sub('[A-Z]+ ([0-9]+ )([0-9/]+) .*', '\\1 + \\2', tkr.bond)))
#[1] 3.78125
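
It may help to look at the intermediate string that sub builds before it is evaluated. The sketch below splits the one-liner into two steps; the names expr and value are introduced here for illustration only.

```r
tkr.bond <- 'OAT 3 25/32 7/17/17'

# \\1 holds the whole part plus its trailing space, \\2 holds the fraction;
# the replacement joins them with " + " to form an arithmetic expression
expr <- sub('[A-Z]+ ([0-9]+ )([0-9/]+) .*', '\\1 + \\2', tkr.bond)
print(expr)

# parse() turns the text into an R expression and eval() computes it
value <- eval(parse(text = expr))
print(value)
```

Note that eval(parse()) runs whatever text it is given, so this approach is best kept for input you trust.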

Or you may need

sub('^[A-Z]+ ([^A-Z]+) [^ ]+$', '\\1', tkr.bond)
#[1] "3 25/32"

Update

tkr.bond1 <- c(tkr.bond, 'OAT 0 7/17/17')
v1 <- sub('^[A-Z]+ ([^A-Z]+) [^ ]+$', '\\1', tkr.bond1)
unname(sapply(sub(' ', '+', v1), function(x) eval(parse(text=x))))
#[1] 3.78125 0.00000

Or

vapply(strsplit(tkr.bond1, ' '), function(x)  
  eval(parse(text= paste(x[-c(1, length(x))], collapse="+"))), 0)
#[1] 3.78125 0.00000

Or without eval(parse()):

vapply(strsplit(gsub('^[^ ]+ | [^ ]+$', '', tkr.bond1), '[ /]'), function(x) {
  x1 <- as.numeric(x)
  # x1[2]/x1[3] is NA when there is no fraction; na.rm=TRUE drops it from the sum
  sum(x1[1], x1[2]/x1[3], na.rm=TRUE)
}, 0)
#[1] 3.78125 0.00000

6 Comments

It works, but help me understand the logic: you use \\1 + \\2 in the replacement argument, so how does it work?
@capm Those are capture groups. We captured the first numeric element followed by a space with ([0-9]+ ), using parentheses. Similarly, the second capture group matches 25/32. The string is then replaced with those capture groups (\\1 and \\2).
Hmmm not sure I see why you're using the eval(parse(text= routine here, if I can call it that?
@DominicComtois It is based on the expected result in the OP's post (as I understand it), which is to sum the numeric elements, but I think your solution may also be his desired result. Not sure though.
Ok I see... I asked a clarification question, we'll figure it out soon I guess.
1

Similar to akrun's answer, this uses sub with a replacement. How it works: you put the pattern you want to keep inside parentheses and match the rest of the string outside them, without capturing it. Setting replacement = "\\1" then replaces the whole matched string with only what was captured inside the parentheses.

sub(pattern = ".*\\s(\\d\\s\\d+\\/\\d+)\\s.*", replacement = "\\1", x = tkr.bond, perl = TRUE)

# [1] "3 25/32"

Then you can convert it to a number:

temp <- sub(pattern = ".*\\s(\\d\\s\\d+\\/\\d+)\\s.*", replacement = "\\1", x = tkr.bond, perl = TRUE)

eval(parse(text=sub(" ","+",x = temp)))

# [1] 3.78125
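
The pattern above expects a single-digit whole part followed by a fraction, so a ticker such as 'OAT 0 7/17/17' would not match and sub would return it unchanged. One possible generalisation, offered as a sketch rather than a rule tested against every ticker format, makes the fraction optional and allows several digits in the whole part. The names coupon.ptrn and to.decimal are hypothetical and introduced here for illustration.

```r
tickers <- c('OAT 3 25/32 7/17/17', 'OAT 0 7/17/17')

# Name, then a whole number with an optional " num/den" fraction, then the date
coupon.ptrn <- "^\\S+\\s+(\\d+(?:\\s+\\d+/\\d+)?)\\s+\\S+$"
coupons <- sub(coupon.ptrn, "\\1", tickers, perl = TRUE)
print(coupons)

# Convert "3 25/32" or "0" to a number without eval(parse())
to.decimal <- function(s) {
  p <- as.numeric(strsplit(s, "[ /]")[[1]])
  if (length(p) == 3) p[1] + p[2] / p[3] else p[1]
}
decimals <- vapply(coupons, to.decimal, numeric(1), USE.NAMES = FALSE)
print(decimals)
```

The sketch assumes every ticker has exactly the form Name Coupon Date with a single-word name; other layouts would need a different pattern.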


0

You can also use strsplit here, then evaluate the components, excluding the first and the last, like this:

> tickers <- c('OAT 3 25/32 7/17/17', 'OAT 0 7/17/17')
> 
> unlist(lapply(lapply(strsplit(tickers, " "), 
+               function(x) {x[-length(x)][-1]}),
+        function(y) {sum(
+          sapply(y, function (z) {eval(parse(text = z))}) )} ) )
[1] 3.78125 0.00000
