1

I'm trying to create two time series from this data (https://gist.github.com/datomnurdin/33961755b306bc67e4121052ae87cfbc). The first is the number of rows per day. The second is the total sentiment per day.

Code for the second one, total sentiment per day:

import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('data_filtered.csv', parse_dates=['date'], index_col='date')

def plot_df(df, x, y, title="", xlabel='Date', ylabel='Value', dpi=100):
    plt.figure(figsize=(16,5), dpi=dpi)
    plt.plot(x, y, color='tab:red')
    plt.gca().set(title=title, xlabel=xlabel, ylabel=ylabel)
    plt.show()

plot_df(df, x=df.index, y=df.sentiment, title='Sentiment Over Time')

The second time-series graph does not make any sense to me. Also, is it possible to save the figure for future reference?

enter image description here

2 Answers

2

Try checking the source data.


date

If I try to plot a distribution of date with the following code:

import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('data_filtered.csv', parse_dates = ['date'])

df['date'].hist()
plt.show()

I get:

enter image description here

As you can see, most of the date values are concentrated around 2020-05-19 or 2020-05-30, with nothing in between. So it makes sense to see points only on the left and on the right side of your graph, not in the middle.
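
The same concentration can be checked numerically instead of visually. This is only a minimal sketch: it assumes the date column is already parsed as datetimes, and the small frame below is made-up illustration data, not the real file.

```python
import pandas as pd

# Hypothetical stand-in for data_filtered.csv (values made up for illustration)
df = pd.DataFrame({
    'date': pd.to_datetime(['2020-05-19', '2020-05-19', '2020-05-19',
                            '2020-05-30', '2020-05-30']),
    'sentiment': [1, -1, 0, 1, 1],
})

# Number of rows per calendar day, in chronological order
rows_per_day = df.groupby(df['date'].dt.normalize()).size()
print(rows_per_day)
```

Calling rows_per_day.plot() on the result should give the count-per-day line chart asked for in the question.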


sentiment

If I try to plot a distribution of sentiment with the following code:

import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('data_filtered.csv', parse_dates = ['date'])

df['sentiment'].hist()
plt.show()

I get:

enter image description here

As you can see, the sentiment values are concentrated in three groups: -1, 0 and 1, with no other values. So it makes sense to see points only at the bottom, at the center and at the top of your graph, not anywhere else.
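
A quick way to confirm which values occur, and how often, is value_counts. A small sketch on a made-up series (the real column may have different counts):

```python
import pandas as pd

# Hypothetical sentiment column (values made up for illustration)
sentiment = pd.Series([1, -1, 0, 1, 1, -1], name='sentiment')

# How many rows carry each sentiment value, ordered by value
counts = sentiment.value_counts().sort_index()
print(counts)
```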


scatterplot

Finally, I try to combine date and sentiment in a scatter plot:

import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('data_filtered.csv', parse_dates = ['date'])

fig, ax = plt.subplots(1, 1, figsize = (16, 5))

ax.plot(df['date'], df['sentiment'], 'o', markersize = 15)
ax.set_title('Sentiment Over Time')
ax.set_xlabel('Date')
ax.set_ylabel('Value')

plt.show()

And I get:

enter image description here

It is exactly your graph, but the points are not connected by a line. You can see how the values are concentrated in specific regions and are not scattered.


cumulative

If you want to aggregate the sentiment value by the date, check this code:

import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('data_filtered.csv', parse_dates = ['date'])

df_cumulate = df.groupby(['date']).sum()

def plot_df(df, x, y, title="", xlabel='Date', ylabel='Value', dpi=100):
    plt.figure(figsize=(16,5), dpi=dpi)
    plt.plot(x, y, color='tab:red')
    plt.gca().set(title=title, xlabel=xlabel, ylabel=ylabel)
    plt.savefig('graph.png')
    plt.show()

plot_df(df_cumulate, x=df_cumulate.index, y=df_cumulate.sentiment, title='Sentiment Over Time')

I aggregate the data through the groupby(['date']).sum() line, which adds up the sentiment values for each date. Here is the plot of the summed ("cumulative") sentiment over time:

enter image description here
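
If the dates also carry a time of day, or if days without any rows should show up as 0 instead of being bridged by the line, resampling by day is one possible variant. This is a sketch on made-up rows, not the real file: a day with 2 positive and 9 negative rows sums to -7, and the empty day in between becomes 0.

```python
import pandas as pd

# Hypothetical rows: 2 positives and 9 negatives on 2020-05-21,
# then nothing until a single positive on 2020-05-23
df = pd.DataFrame({
    'date': pd.to_datetime(['2020-05-21'] * 11 + ['2020-05-23']),
    'sentiment': [1, 1] + [-1] * 9 + [1],
})

# Sum sentiment per calendar day; days without rows get 0
daily = df.set_index('date')['sentiment'].resample('D').sum()
print(daily)
```

The resulting series can be passed to plot_df in the same way as above, with daily.index as x and daily as y.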

4 Comments

For the date (first use case) I think it's OK for me (though a line chart would be nicer). But for sentiment, maybe you misunderstood me: what I want is a time series. Let's say on 21/5/2020 there are 2 positives and 9 negatives. The value should be -7 for 21/5/2020, i.e. y=-7, x=21/5/2020. The y-axis should be the total sentiment value (positives + negatives) and the x-axis should be the date.
It is cumulative, now I get it
Thanks. One more thing, how do I save this figure? I already tried plt.savefig('graph.png') but just got an empty graph.
You have to put plt.savefig('graph.png') before plt.show(); see the updated code above.
1

The data that you linked to has eight separate dates.

If you simply copy/paste, the dates are not interpreted as timepoints, but rather as strings.

You can change this by converting them into datetime objects:

#convert to datetime
df['date'] = pd.to_datetime(df['date'])

The lines crossing the plot come from the fact that a datapoint's position in the DataFrame (its row order) determines when it is plotted, while the value of its x-coordinate (here: date) determines where it is plotted. Since plt.plot connects consecutive datapoints, points that are plotted one after another are joined by a line, irrespective of where they end up. You can align plotting order and position by sorting the data:

#then sort by date
df.sort_values(by='date', inplace=True)

This doesn't make for an easily interpretable plot, but now at least you know what lines come from where:

enter image description here
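
The effect of the two steps can be seen on a tiny frame. A minimal sketch with made-up rows that are deliberately out of chronological order:

```python
import pandas as pd

# Hypothetical rows, deliberately out of chronological order
df = pd.DataFrame({
    'date': ['2020-05-30', '2020-05-19', '2020-05-30', '2020-05-19'],
    'sentiment': [1, -1, 0, 1],
})

# Strings become real timestamps
df['date'] = pd.to_datetime(df['date'])

# Before sorting, a line plot would jump back and forth along the x-axis
before = df['date'].is_monotonic_increasing
print(before)

# After sorting, consecutive rows only move left to right
df = df.sort_values(by='date')
after = df['date'].is_monotonic_increasing
print(after)
```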

A nicer way of plotting the data would be a stacked bar chart:

# count rows per (date, sentiment) pair; one column per sentiment value
a = df.groupby(['date', 'sentiment']).size().unstack(fill_value=0)
a.plot(kind='bar', stacked=True)

enter image description here

1 Comment

Hmm, this is not what I want... please refer to the comment below.
