I have a data.frame containing numeric values. I want to add a column to it that holds factor labels taken from letters[]. The labels should be built from a sequence of numbers that I supply, and that sequence can change on every run.

For example, my original DF has one column, x, containing numeric values, and I have the sequence of numbers (3, 7, 9). The new FLABEL column should be populated according to that sequence: the first 3 rows are a, the next 4 rows (up to row 7) are b, and so on.

x       FLABEL
0.23     a
0.21     a
0.19     a
0.27     b
0.25     b
0.22     b
0.15     b
0.09     c
0.32     c
0.19     d
0.17     d

I'm struggling with how to do this. I'm assuming some form of for loop is needed, given that my number sequence can vary in length every time I run it, so I could be populating only letters a and b, or many more.

  • why not DF$FLABEL <- rep(letters[1:4],c(3,4,2,2)) Commented Aug 7, 2015 at 7:41
  • because the number seq can change length & content every time I run my script Commented Aug 7, 2015 at 7:42
  • change according to what? How do you compute seq? Commented Aug 7, 2015 at 7:44
  • rep(letters[length(seq)],seq). But make sure length(seq) <= 26 Commented Aug 7, 2015 at 7:46
  • @scoa Minor correction: I think this should be rep(letters[1:length(series)],series) where series <- c(3,4,2,2) in this example. Commented Aug 7, 2015 at 7:53

1 Answer


Based on the comment by @scoa, I suggest the following modified approach:

series <- c(3, 7, 9)                   # last row index of each labelled block
series <- c(series, nrow(DF))          # extend the sequence to the last row of DF
series2 <- c(series[1], diff(series))  # block lengths: 3, 4, 2, 2
DF$FLABEL <- rep(letters[seq_along(series2)], series2)
#> DF
#      x FLABEL
#1  0.23      a
#2  0.21      a
#3  0.19      a
#4  0.27      b
#5  0.25      b
#6  0.22      b
#7  0.15      b
#8  0.09      c
#9  0.32      c
#10 0.19      d
#11 0.17      d

diff() computes the length of each block from the row indices in the input vector series. Here the index values 3, 7, 9 (plus the appended last row, 11) are converted into the number of repetitions of each successive letter, up to the last row of the data frame, and stored in series2: 3, 4, 2, 2.
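The same idea can be wrapped in a small helper. This is only a sketch: the function name label_by_breaks and its arguments are hypothetical, not part of the original answer, and it assumes the break indices are increasing and do not exceed the number of rows.

```r
# Hypothetical helper: turn cumulative end-row indices into one letter per row.
label_by_breaks <- function(n_rows, breaks) {
  # Append the last row so the final block runs to the end of the data;
  # unique() avoids a zero-length block when the last break is already n_rows.
  ends <- unique(c(breaks, n_rows))
  # Block lengths: the first end index, then the gaps between consecutive ends.
  block_lengths <- diff(c(0, ends))
  # Assumption: breaks are increasing, within range, and give at most 26 blocks.
  stopifnot(all(block_lengths > 0), length(block_lengths) <= 26)
  rep(letters[seq_along(block_lengths)], times = block_lengths)
}

DF <- data.frame(x = c(0.23, 0.21, 0.19, 0.27, 0.25, 0.22,
                       0.15, 0.09, 0.32, 0.19, 0.17))
DF$FLABEL <- factor(label_by_breaks(nrow(DF), c(3, 7, 9)))
```

Wrapping the result in factor() gives the factor column asked for in the question; rep() on its own returns a character vector.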

data

text <- "x       FLABEL
         0.23     x
         0.21     x
         0.19     x
         0.27     x
         0.25     x
         0.22     x
         0.15     x
         0.09     x
         0.32     x
         0.19     x
         0.17     x"
DF <- read.table(text = text, header = TRUE)

3 Comments

This errors because the final difference isn't being handled. What I mean is that the data frame labels only go as far as the last seq number, whereas they should run right to the last row (nrow) of the original data frame.
Please check if you still obtain an error with the code in the revised post.
I got it to work by using end <- (nrow(x) - tail(series, 1)) and then series2 <- c(series2, end). This lets the letter labelling run from the last diff point to the end of the data frame.
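For reference, the variant described in the last comment might look roughly like the sketch below. The data frame is named x as in that comment, the sample values are taken from the question, and the guard on end > 0 is an added assumption to skip an empty final block.

```r
# Sample data from the question; the data frame is called x as in the comment.
x <- data.frame(x = c(0.23, 0.21, 0.19, 0.27, 0.25, 0.22,
                      0.15, 0.09, 0.32, 0.19, 0.17))
series <- c(3, 7, 9)

# Block lengths up to the last given index.
series2 <- c(series[1], diff(series))

# Rows remaining after the last index form the final block.
end <- nrow(x) - tail(series, 1)
if (end > 0) series2 <- c(series2, end)  # assumption: skip an empty final block

x$FLABEL <- rep(letters[seq_along(series2)], series2)
```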
