Here's the file sample:

PG32 -13475.111367   9609.545216 -20675.190735   -194.319140                    
PG04 -15764.275182  19616.036013  -8378.361758     -9.567460                    
PG08 -23862.812721   9840.809904  -4415.011886     18.783955                    
PG10  25009.053940   9106.541565   2672.535304   -168.226094                    
PG14 -14188.519147  -9647.162991 -20079.808927     76.323202                    
PG13  12541.368512 -14252.727697  18475.956052    -99.144840                    
PG28  22638.858335  13831.226799   2650.716670    427.905209                    
PG21 -10609.714398 -12191.750707  21782.583544   -429.224611                    
PG11  -8677.979931  23944.136240  -7811.280190   -566.272355                    
PG22 -24991.333186  -9073.717145  -1692.043749    331.646741                    
PG20  25603.243214   5007.836647   5172.462172    302.625348                    
PG18 -19417.534666 -15923.466357   9597.721199    388.425996  

The real file is considerably larger. The first column is a satellite's "name" (e.g. "PG32"). I have a character vector of satellite ids:

>[1] "PG05" "PG07" "PG09" "PG10" "PG13" "PG16" "PG19" "PG20" "PG27"  "PG28" "PG30"

So I need to extract only the lines with those ids, either from a data.frame or directly from the file using read.pattern from the gsubfn package. I'm trying to learn regular expressions but don't have a complete understanding of the subject yet.

  • Try yourdf[grep(paste(v1, collapse='|'), yourdf$firstcolumn),] Commented Dec 27, 2015 at 13:04
  • Thanks, that seems good for a data.frame. I'd like to know how to get the same result without loading the entire file into a data.frame. It seems read.pattern can read lines from a file based on a regexp, and that's what I want to do here, but I can't figure out the appropriate regexp. Commented Dec 27, 2015 at 14:31
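
The data.frame subsetting suggested in the first comment can be sketched as follows. The sample rows and the column name x are hypothetical stand-ins; yourdf, v1 and firstcolumn follow the comment. One assumption worth stating: an unanchored alternation matches substrings, so the pattern here is anchored with ^ and $, and %in% is shown as a regex-free equivalent for exact ids.

```r
# Hypothetical sample data mirroring a few rows of the file in the question
yourdf <- data.frame(
  firstcolumn = c("PG32", "PG04", "PG08", "PG10", "PG14", "PG13"),
  x = c(-13475.111367, -15764.275182, -23862.812721,
        25009.053940, -14188.519147, 12541.368512),
  stringsAsFactors = FALSE
)

# Satellite ids from the question
v1 <- c("PG05", "PG07", "PG09", "PG10", "PG13", "PG16",
        "PG19", "PG20", "PG27", "PG28", "PG30")

# Regex approach: build "^(PG05|PG07|...)$" so only whole ids match
pattern <- paste0("^(", paste(v1, collapse = "|"), ")$")
by_regex <- yourdf[grepl(pattern, yourdf$firstcolumn), ]

# Exact matching without a regex gives the same rows
by_match <- yourdf[yourdf$firstcolumn %in% v1, ]
```

For exact ids like these, %in% is usually the simpler choice; the regex form becomes useful when the filter has to be applied to raw text lines.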

1 Answer

Consider reading the file line by line with scan, checking on each iteration whether the first column is in the satellite list:

## INITIAL VARS
file <- "C:\\Path\\To\\File.txt"
flines <- 12   # NUMBER OF LINES TO READ; ADJUST TO THE ACTUAL FILE

satnames <- c("PG05", "PG07", "PG09", "PG10", "PG13", "PG16", 
              "PG19", "PG20", "PG27", "PG28", "PG30", "PG32")

## OPEN CONNECTION
con <- file(description=file, open="r")

## LOOP OVER CONNECTION
dfList <- list()
for(i in 1:flines) {
  tmp <- scan(file=con, nlines=1, what = list("","","","",""), quiet=TRUE)
  names(tmp) <- c('sat', 'data1', 'data2', 'data3', 'data4')  

  # APPEND TO DFLIST ONLY IF IN SATNAMES LIST
  if (tmp$sat %in% satnames) {
    dfList <- c(dfList, list(tmp))   
  }      
}

# CLOSE CONNECTION
close(con)

# MIGRATE LIST TO DATA FRAME, CONVERTING DATA TYPES
# EACH LIST ELEMENT BECOMES A ONE-ROW DATA FRAME, THEN ROWS ARE STACKED
df <- do.call(rbind, lapply(dfList, as.data.frame, stringsAsFactors = FALSE))
df[2:5] <- lapply(df[2:5], as.numeric)

rm(con, dfList, tmp)
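
Since the question asks for a regular expression, here is a rough alternative sketch that filters raw lines with a pattern before parsing them. It uses base R (readLines, grepl, read.table) rather than gsubfn's read.pattern, so it is a swapped-in technique and not the function the question names. The temporary file, its sample rows, the chunk size of 1000 and the column names data1 to data4 are assumptions for illustration; satnames is the vector from the question.

```r
# Hypothetical stand-in for the real file: a few sample rows in a temp file
path <- tempfile(fileext = ".txt")
writeLines(c(
  "PG32 -13475.111367   9609.545216 -20675.190735   -194.319140",
  "PG04 -15764.275182  19616.036013  -8378.361758     -9.567460",
  "PG10  25009.053940   9106.541565   2672.535304   -168.226094",
  "PG13  12541.368512 -14252.727697  18475.956052    -99.144840",
  "PG20  25603.243214   5007.836647   5172.462172    302.625348"
), path)

# Satellite ids from the question
satnames <- c("PG05", "PG07", "PG09", "PG10", "PG13", "PG16",
              "PG19", "PG20", "PG27", "PG28", "PG30")

# Pattern: line starts with one of the ids, followed by whitespace
pattern <- paste0("^(", paste(satnames, collapse = "|"), ")[[:space:]]")

# Read in chunks so the whole file is never held in memory at once
con <- file(path, open = "r")
kept <- character(0)
while (length(chunk <- readLines(con, n = 1000)) > 0) {
  kept <- c(kept, chunk[grepl(pattern, chunk)])
}
close(con)

# Parse only the matching lines; numeric columns are detected automatically
df <- read.table(text = kept,
                 col.names = c("sat", "data1", "data2", "data3", "data4"),
                 stringsAsFactors = FALSE)

unlink(path)  # remove the temporary sample file
```

Unlike the loop above, this does not need the number of lines in advance, because reading stops when readLines returns nothing. If no line matches, kept is empty and read.table will raise an error, so a real script would want to check length(kept) first.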