3

I am trying to use pandas in Python to plot the following higher-dimensional data: https://i.sstatic.net/34nbR.jpg

Here is my code:

import pandas
from pandas.tools.plotting import parallel_coordinates

data = pandas.read_csv('ParaCoords.csv')
parallel_coordinates(data,'Name')

The code fails to plot the data, and the traceback ends with:

KeyError: 'Name'

What is the second argument in parallel_coordinates supposed to say/do? How can I successfully plot the data?

2
  • I think the second argument must be the name of the column you want to use for your plot. That's why they use 'Name' with iris.data. Commented Jun 29, 2016 at 15:28
  • Any string I use in the second argument's position (i.e. 'column-name') results in a function error. Commented Jun 29, 2016 at 15:37

2 Answers

1

The second argument is supposed to be the name of the column that defines the class of each row. Think ['dog', 'dog', 'cat', 'bird', 'cat', 'dog'].

In the example online they use 'Name' as the second argument because that column holds the species name of each iris. A KeyError: 'Name' means your DataFrame has no column with that name.
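As a rough illustration of what a class column looks like, here is a minimal sketch. The DataFrame, its values, and the column name "animal" are made up for the example; only parallel_coordinates and its class_column argument come from pandas. It uses the current import path, pandas.plotting, and a non-interactive matplotlib backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to see the window
import pandas as pd
from pandas.plotting import parallel_coordinates

# Hypothetical data: four numeric columns plus one column of class labels.
frame = pd.DataFrame({
    "x1": [5.1, 7.0, 6.3, 4.9],
    "x2": [3.5, 3.2, 3.3, 3.0],
    "x3": [1.4, 4.7, 6.0, 1.4],
    "x4": [0.2, 1.4, 2.5, 0.2],
    "animal": ["dog", "cat", "bird", "dog"],
})

# The second argument names the class column. Each row becomes one line,
# and rows that share a class label share a color and a legend entry.
ax = parallel_coordinates(frame, "animal")
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
```

Each distinct value in the class column should show up once in the legend, while the remaining columns become the vertical axes.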

Doc

Signature: parallel_coordinates(*args, **kwargs)
Docstring:
Parallel coordinates plotting.

Parameters
----------
frame: DataFrame
class_column: str
    Column name containing class names
cols: list, optional
    A list of column names to use
ax: matplotlib.axis, optional
    matplotlib axis object
color: list or tuple, optional
    Colors to use for the different classes
use_columns: bool, optional
    If true, columns will be used as xticks
xticks: list or tuple, optional
    A list of values to use for xticks
colormap: str or matplotlib colormap, default None
    Colormap to use for line colors.
axvlines: bool, optional
    If true, vertical lines will be added at each xtick
axvlines_kwds: keywords, optional
    Options to be passed to axvline method for vertical lines
kwds: keywords
    Options to pass to matplotlib plotting method

3 Comments

I see! So, y is my dependent variable; and x1, x2, x3, and x4 are my independent variables. The second argument should be 'y'; or it could be 'x1', 'x2', etc.
Hi! Are there any options for customising the legend of a parallel_coordinates plot?
@cucurbit you can customize it the same way you would for any matplotlib plot. Essentially, assign the return value of the plot to a variable. It will be an axes object. Then manipulate it from there. You'll want to search for matplotlib legend.
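The legend suggestion in the last comment could be sketched roughly like this. The data and the legend settings (title, location, no frame) are arbitrary choices for illustration, not something from the thread.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to see the window
import pandas as pd
from pandas.plotting import parallel_coordinates

# Hypothetical rows shaped like the iris data, with a "Name" class column.
frame = pd.DataFrame({
    "x1": [5.1, 4.9, 7.0, 6.4],
    "x2": [3.5, 3.0, 3.2, 3.2],
    "x3": [1.4, 1.4, 4.7, 4.5],
    "x4": [0.2, 0.2, 1.4, 1.5],
    "Name": ["setosa", "setosa", "versicolor", "versicolor"],
})

# parallel_coordinates returns the matplotlib Axes it drew on.
ax = parallel_coordinates(frame, "Name")

# Calling ax.legend() again rebuilds the legend from the labeled lines,
# this time with the options we pass.
legend = ax.legend(title="Species", loc="upper left", frameon=False)
```

Anything else matplotlib allows on an Axes (titles, tick labels, limits) can be changed through the same ax object.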
1

The iris.data file that you download from UCI does not have headers. To make the pandas example work, you have to assign the headers explicitly as column names:

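import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import parallel_coordinates  # pandas.tools.plotting before pandas 0.20

# The iris.data file from UCI does not have headers, so pass header=None
# (otherwise the first data row is consumed as the header) and assign
# the column names explicitly.
data = pd.read_csv("data-iris-for-pandas/iris.data", header=None,
                   names=["x1", "x2", "x3", "x4", "Name"])
plt.figure()
parallel_coordinates(data, "Name")
plt.show()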

Pandas Parallel Coordinates Example

Basically, the pandas documentation is incomplete. Someone put the column names into the dataframe without letting us know.
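To see where the KeyError comes from, it can help to inspect the column names before plotting. The sketch below reads a few inline rows shaped like iris.data (an in-memory stand-in, not the real file) with and without explicit column names.

```python
import io
import pandas as pd

# A few headerless rows in the same shape as iris.data (stand-in for the file).
raw = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
)

# Default read: the first data row is taken as the header, so there is no
# 'Name' column and parallel_coordinates(data, 'Name') would raise KeyError.
wrong = pd.read_csv(io.StringIO(raw))
has_name_before = "Name" in wrong.columns

# Explicit names: every row is kept as data and 'Name' exists.
right = pd.read_csv(io.StringIO(raw), header=None,
                    names=["x1", "x2", "x3", "x4", "Name"])
has_name_after = "Name" in right.columns

print(list(wrong.columns))
print(list(right.columns))
```

Printing data.columns on your own file is a quick way to find which string to pass as the second argument.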
