3

I'm pretty new to R and just can't figure out how to do this, despite some similar but not-quite-the-same questions floating around. What I have is several (~10) CSV files that look like this:

time, value
0, 5
100, 4
200, 8
etc.

That is, they record a long series of times and the value at each time. I want to plot all of them on one chart in R using ggplot2, so that it looks something like this (image not shown). I've been trying all kinds of melts and merges and have been unsuccessful so far (though read.csv is working fine and I can plot the files one by one easily). One thing I can't figure out is whether to combine all the data before it gets to ggplot2, or somehow pass all the data individually to ggplot2.

I should probably note that each data series shares the exact same time points. By this I mean, if file 1 has values at times 100, 200, 300, ..., 1000 then so do all the other files. But ideally, I'd like the solution not to depend on that, because I could see a future situation where the times are similarly scaled but not exactly the same, e.g. file 1 has times 99, 202, 302, 399, ... and file 2 has times 101, 201, 398, 400, ...

Thanks much.

EDIT: I can do this with just regular plot like so (clunkily), this might illustrate the kind of thing I want to do:

f1 = read.csv("file1.txt")
f2 = read.csv("file2.txt")
f3 = read.csv("file3.txt")
plot(f1$time,f1$value,type="l",col="red")
lines(f2$time, f2$value, type="l",col="blue" )
lines(f3$time, f3$value, type="l",col="green" )

3 Answers

3

I would divide this into four tasks, which also makes it easier to search for answers to each one.

1. Reading a few files automatically, without hardcoding the file names
2. Merging these data.frames, using a "left join"
3. Reshaping the data for ggplot2
4. Plotting a line graph


# Define a "base" data.frame
max_time = 600
base_df <- data.frame(time=seq(1, max_time, 1))

# Get the file names (pattern is a regular expression, so anchor it to the extension)
all_files = list.files(pattern='\\.csv$')

# This reads the csv files, check if you need to make changes in read.csv
all_data <- lapply(all_files, read.csv)

# This joins each file to the "base" data.frame and keeps only its value column
# (named "values" rather than "ls" to avoid masking base::ls)
values = do.call(cbind, lapply(all_data, function(y){
  df = merge(base_df, y, all.x=TRUE, by="time")
  df[,-1]
}))

# This would have the data in "wide" format
data = data.frame(time=base_df$time, values)

# The plot
library(ggplot2)
library(reshape2)

mdf = melt(data, id.vars='time')
ggplot(mdf, aes(time, value, color=variable, group=variable)) +
  geom_line() +
  theme_bw()

1 Comment

Hey so thanks for this. This works. I think you meant "data" where you put "output", minor typo. The data ends up labelled X1, X2, and so forth, is there any way to control what each column is named? Ideally, the name of the column would be the name of the file name before the ".csv" if that's not too difficult.
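
One possible way to get file names as series labels, sketched under the assumption that every file has exactly a `time` and a `value` column: name the list of data frames with `tools::file_path_sans_ext()` and rename each `value` column before joining. The file names (`control.csv`, `treated.csv`) and the numbers below are made up for illustration, and the join here is a full outer join on `time` rather than the "base" data.frame approach from the answer.

```r
# Hypothetical demo files written to a temporary directory
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(time = c(0, 100, 200), value = c(5, 4, 8)),
          file.path(demo_dir, "control.csv"), row.names = FALSE)
write.csv(data.frame(time = c(0, 100, 200), value = c(6, 3, 9)),
          file.path(demo_dir, "treated.csv"), row.names = FALSE)

all_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
all_data <- lapply(all_files, read.csv)

# Name each list element after its file, minus directory and extension
names(all_data) <- tools::file_path_sans_ext(basename(all_files))

# Rename each "value" column to the series name, then join on time
renamed <- Map(function(df, nm) {
  names(df)[names(df) == "value"] <- nm
  df
}, all_data, names(all_data))
wide <- Reduce(function(a, b) merge(a, b, by = "time", all = TRUE), renamed)

names(wide)
```

Passing `wide` to `melt(wide, id.vars='time')` as in the answer should then label the series by file name instead of X1, X2, and so on.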
2
# Creating fake data
fNames <- c("file1.txt", "file2.txt", "file3.txt")

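# row.names=FALSE keeps write.csv from adding an extra index column
write.csv(data.frame(time=c(1, 2, 4), value=runif(3)), file=fNames[1], row.names=FALSE)
write.csv(data.frame(time=c(3, 4), value=runif(2)), file=fNames[2], row.names=FALSE)
write.csv(data.frame(time=c(5), value=runif(1)), file=fNames[3], row.names=FALSE)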

Here is my attempt:

fNames <- c("file1.txt", "file2.txt", "file3.txt")

allData <- do.call(rbind, # Read the data and combine into single data frame
               lapply(fNames,
                      function(f){
                        cbind(file=f, read.csv(f))
                      }))
library(ggplot2)
ggplot(allData)+
  geom_line(aes(x=time, y=value, colour=file)) # This way all series have a legend!

6 Comments

@marbel I am not sure I get you. I am using cbind only to add an additional column identifying the file name, while the files are being combined using rbind. Having different time points in different files should not affect this.
I see. Then, what about if in the first file there are the times: {1, 2, 4} and in the second {1, 3, 4}.
That will not be a problem, I have updated my answer to reflect the same. Now each file will have a random time stamp.
It's not only about being random, but different amounts. I've adapted your code with a more realistic case and it works. See the edit.
Thanks! I used sample(1000, 100) so that the number of time points differs as well. Anyway, I guess your way does improve readability.
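
To make the point from this exchange concrete, here is a minimal base-R sketch of the stacking step with two made-up series whose time points and lengths differ. The data frames and the names `file1` and `file2` are invented for illustration; no files are read.

```r
# Two made-up series with different time points and different lengths
series <- list(
  file1 = data.frame(time = c(1, 2, 4), value = c(0.2, 0.5, 0.1)),
  file2 = data.frame(time = c(1, 3, 4, 6), value = c(0.9, 0.4, 0.7, 0.3))
)

# Tag each data frame with its source, then stack the rows
all_data <- do.call(rbind, Map(function(df, nm) cbind(file = nm, df),
                               series, names(series)))
rownames(all_data) <- NULL

# One row per observation; no time point has to exist in every file
table(all_data$file)
```

Because each row carries its own time, a call such as `ggplot(all_data, aes(time, value, colour = file)) + geom_line()` should draw each series over whatever time points it has.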
0

There are four ways you can do this.

First

You can merge all the data into a single data frame and then plot each line separately. Below is the code using sample data:

library(ggplot2)
library(reshape2)
data1 <- data.frame(time=1:200, series1=rnorm(200))
data2 <- data.frame(time=1:200, series2=rnorm(200))

mergeData <- merge(data1, data2, by="time", all=TRUE)

# Fixed colors go outside aes(); inside aes() the strings would be treated as data values
g1 <- ggplot(mergeData, aes(time, series1)) + geom_line(color="blue") + ylab("")
g2 <- g1 + geom_line(aes(y=series2), color="red")
g2

Second

You can melt the merged data and then plot it with a single ggplot call. Below is the code:

library(reshape2)
meltData <- melt(mergeData, id="time")
ggplot(meltData, aes(time, value, color=variable)) + geom_line()
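
To see what the melt step produces, here is a small base-R sketch that builds the same long layout by hand for a made-up three-row table. It is only meant to show the shape of the result (one row per time and series pair), not to replace `melt()`; the column names `variable` and `value` are chosen to match what the plotting call above expects.

```r
# Made-up wide data: one column per series, NA where a series has no value
wide <- data.frame(time = c(1, 2, 3),
                   series1 = c(5, 4, 8),
                   series2 = c(1, NA, 2))

series_cols <- setdiff(names(wide), "time")

# Stack the series columns: one row per (time, series) pair
long <- data.frame(
  time     = rep(wide$time, times = length(series_cols)),
  variable = factor(rep(series_cols, each = nrow(wide)), levels = series_cols),
  value    = unlist(wide[series_cols], use.names = FALSE)
)

long
```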

Third

This is similar to your edit. The column names must be the same in every data frame.

library(ggplot2)
data1 <- data.frame(time=1:200, series1=rnorm(200))
data2 <- data.frame(time=1:200, series1=rnorm(200))

# Fixed colors go outside aes(); the second layer reuses the same column names
g1 <- ggplot(data1, aes(time, series1)) + geom_line(color="blue") + ylab("")
g2 <- g1 + geom_line(data=data2, color="red")
g2

Fourth

This is the most generic way of doing the task and makes the fewest assumptions. It does not assume that the column names are the same in every data set, but it requires more code, and a wrong column name in the code will raise an error.

library(ggplot2)

data1 <- data.frame(id=1:200, series1=rnorm(200))
data2 <- data.frame(id=1:200, series2=rnorm(200))

# Each layer gets its own data and mapping; fixed colors go outside aes()
g1 <- ggplot() + geom_line(data=data1, aes(x=id, y=series1), color="red") +
       geom_line(data=data2, aes(x=id, y=series2), color="blue")
g1

3 Comments

In the third method, please notice that I kept the column names the same in both data.frames. You can write similar code for your different csv files. Please upvote the answer if it helps you.
You are not reading the files in your answer. You are missing the first part.
I assumed that reading the csv files is not the issue. I thought the issue was plotting multiple line charts from different data.frames. Hence my answer.
