I have a set of data files. Each file holds a fixed number of samples per time frame, and that number depends on the system: one file might have 10 items per frame and another 20, but within a file the count never changes. The data is in columns of time, x, y, z, and I load it into a numpy array. The time value identifies the frame, so with 10 items per frame I get a (10,4) block of rows that all share one time value. With, say, 100 frames in the file, the whole array is (1000,4).

I want to plot time on the x-axis and manipulations of the other columns on the y-axis, but I am unsure how to do this with the line plot methods in matplotlib. As far as I know, supplying both x and y values like this normally means a scatter plot, so I'm hoping there's a better way. Ideally, each row within a frame is treated as its own series (so it gets its own colour), and the row at the same position in the next frame (the next time value) gets the same colour, which gives good contiguous lines. The number of items that share a time code can be read from the time column; let's call it "n". Sample code:

a = numpy.loadtxt('sampledata.txt')
plt.plot(a[:0,:,n],a[:1,:1])
plt.show()

I think this code expresses what I'm going for, though it doesn't work.

  • I haven't understood the text of the question at all, but note that a[:0] is an empty array, hence your code cannot produce anything useful. Commented Feb 14, 2019 at 20:07
  • @ImportanceOfBeingErnest you're probably right, my goal was to select the first column, but I often screw up numpy array slicing. Commented Feb 20, 2019 at 19:31
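
To illustrate the slicing point from the comments, here is a minimal sketch of the difference between `a[:0]` and `a[:, 0]`. The small array is a made-up stand-in for the loaded file (3 frames of 2 items, columns t, x, y, z), and it assumes every frame holds the same number of items.

```python
import numpy as np

# Hypothetical stand-in for the loaded file: 3 frames of 2 items each,
# columns are t, x, y, z.
a = np.array([
    [1.0, 0.1, 0.2, 0.3],
    [1.0, 0.4, 0.5, 0.6],
    [2.0, 0.7, 0.8, 0.9],
    [2.0, 1.0, 1.1, 1.2],
    [3.0, 1.3, 1.4, 1.5],
    [3.0, 1.6, 1.7, 1.8],
])

empty = a[:0]   # rows 0 up to (not including) 0: nothing, shape (0, 4)
t = a[:, 0]     # every row, column 0: the time column, shape (6,)
x = a[:, 1]     # every row, column 1: the x column, shape (6,)

# Items per frame ("n" in the question), assuming the count is the same
# in every frame: count the rows that share the first time value.
n = np.count_nonzero(t == t[0])
```

The comma separates the row selection from the column selection, so `a[:, 0]` reads as "all rows, first column".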

2 Answers

Edit: I hope this is what you wanted.

Seaborn's scatterplot can group the rows that share the same code (the time code in this case) and draw each group in its own color.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Use your own file here; the last column is a code that I added to the data.
df = pd.read_csv(r"E:\Programming\Python\Matplotlib\timecodes.csv",
                 names=["time", "x", "y", "z", "code"])

df["time"] = pd.to_datetime(df["time"])  # parse the time column as datetimes
# Keep only the day of the month and store it in the x column; easier to see on the graph.
df["x"] = df["time"].dt.day

# I just used random numbers for y and z in my data.
# hue does the grouping; x and y are passed by keyword, as recent seaborn versions require.
sns.scatterplot(data=df, x="x", y="y", hue="code")

plt.show()

I used a CSV file here, but the same works for your text file if you pass sep="\t" to read_csv. I also added a code column to the file. If you have one, hue can use it to group the data in the graph, so you don't have to split the data up or build a hierarchical index. If you want to change the colors or the grouping, please see the seaborn website.
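
If the file has no code column, one can be derived from the row order. The following is only a sketch under assumptions that are not in the original data: the frame is synthetic (10 items per frame, 100 frames, random positions), rows are sorted by frame, and the items appear in the same order inside every frame.

```python
import numpy as np
import pandas as pd

n = 10          # items per frame (assumed constant within a file)
n_frames = 100  # number of frames in the synthetic data

# Synthetic stand-in for the real file: each time value repeated n times,
# with random x, y, z positions.
rng = np.random.default_rng(0)
t = np.repeat(np.arange(n_frames), n)
xyz = rng.normal(size=(n_frames * n, 3))
df = pd.DataFrame(np.column_stack([t, xyz]), columns=["time", "x", "y", "z"])

# Number the rows inside each frame 0..n-1. The row at the same position
# in every frame gets the same code, which is what hue needs for grouping.
df["code"] = df.groupby("time").cumcount()
```

With that column in place, a call such as `sns.scatterplot(data=df, x="time", y="x", hue="code")` should colour each item consistently across frames.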

Hope this helps.

5 Comments

We're in the ballpark. Really, my y-axis should be one of my position columns and my x-axis should be time. The data is technically grouped by time, though the file just has the raw time code on each row alongside its xyz data. Say I have 10 items: I would have hundreds of time codes, but always 10 rows with the same time code, corresponding to that frame's data. I want to plot these as series where the 10 different items all have different colours and the data is plotted at that same time code, while avoiding building more arrays of x, y, z data and plotting them separately as series.
So do you mean 1 frame has 10 items that share the same time but have different x, y, z values, and there are 100 such frames? In that case hierarchical indexing might be a good way to start. Once I find a way to do it, I will edit my answer.
I didn't use a hierarchical index in my edit because it isn't necessary when there is a code to group by. Seaborn does the job. You can customize colors and plot options with Seaborn.
Hi Tim, I mean 1 frame has 10 items with the same time code, and x, y, z have different values. There may be more than 100 frames, but let's stick with 100 for now. The time codes are not dates and times, they're nanoseconds, so I don't think the datetime functions will help me. My goal is to get each of the 10 series plotted as t,x and, for instance, another plot as t,y, as efficiently as possible, since in the end I'm likely to have as many as 50 series with millions of lines of data.
So, in our example, they should be grouped as lines 1, 11, 21, etc., where line 1 is the first item and carries the time code that needs to be on the x-axis. I'm trying to avoid pulling the data out into individual series because of the sizes.

An alternative, and the method I ended up using, though Tim's answer is still accurate as well. Since the time codes are not date/time information, I modified my own code to add tags as a second column that I call "p" (they're polymers).

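import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Columns: time, polymer tag, then the x, y, z positions.
datain = np.loadtxt('somefile.txt')
df = pd.DataFrame(data=datain, columns=["t", "p", "x", "y", "z"])
# hue="p" gives each polymer its own colour.
ax = sns.scatterplot(data=df, x="t", y="x", hue="p")
plt.show()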

And of course the other columns can be plotted similarly if desired.
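
For the line plot the question originally asked about, a numpy-only route may also work without a tag column. This is a sketch, not a tested solution for the real files: it uses synthetic data in the question's four-column layout (t, x, y, z), and it assumes the rows are sorted by frame, every frame has the same number of items, and the items keep the same order in every frame.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for np.loadtxt(...): 100 frames of 10 items,
# columns t, x, y, z, giving shape (1000, 4).
rng = np.random.default_rng(0)
t = np.repeat(np.arange(100, dtype=float), 10)
a = np.column_stack([t, rng.normal(size=(1000, 3))])

# Items per frame: the number of rows sharing the first time value.
n = np.count_nonzero(a[:, 0] == a[0, 0])

# View the flat array as (frame, item, column) without copying the data.
frames = a.reshape(-1, n, 4)

times = frames[:, 0, 0]      # one time value per frame, shape (100,)
x_by_item = frames[:, :, 1]  # x of every item in every frame, shape (100, 10)

# Given a 2D y, plot() draws one line per column, so each item becomes
# its own series with its own colour from the default colour cycle.
fig, ax = plt.subplots()
lines = ax.plot(times, x_by_item)
ax.set_xlabel("t")
ax.set_ylabel("x")
```

Call `plt.show()` afterwards to display the figure. Plotting t against y is the same call with `frames[:, :, 2]`; with the tagged five-column layout above, the reshape would use 5 in place of 4 and the column indices shift by one.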
