I am writing a program in Jupyter for the statistical validation of a network. The final product is a large 5053x5053 pandas DataFrame:

import pandas as pd
network = pd.DataFrame (data = app, index = products, columns = products)

app is a binary matrix in which app[i,j] = 1 means that product i is linked to product j. I would like to plot the network, and I just learned that this is possible with networkx (and with other tools such as Cytoscape). Since the amount of data is large, I have no clue how to proceed. Which kind of representation is best, and how can I obtain a readable plot? I have tried writing some basic code, but the results are quite disappointing:

import networkx as nx
import matplotlib.pyplot as plt

G = nx.Graph()
G = nx.from_pandas_edgelist(network)
nx.draw_random(G)

Furthermore, I have a vector of 212 serial numbers of green products (they appear in the index and columns of the dataframe) that, if possible, I would like to draw in a different color on the same plot.

Edit: I used the suggested code and it works better than my attempt, but the graph is still not readable.

G = nx.from_numpy_matrix(gg)
G = nx.relabel_nodes(G, dict(enumerate(greenxgreen.columns)))
nx.draw(G)

Subnetwork 255x255

Solution

I used the dataframe (df) from the Dummy Data section below. This gives you a basic network diagram. I encourage you to dig further into the documentation (see the References section).

nx.draw_random() places the nodes at random positions. With as many nodes as you have, this produces clutter. To reduce it, you might select a subset of the dataframe, keeping only the nodes with at least a certain number of connections, and plot that instead.
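A minimal sketch of that filtering step. The toy matrix, the names keep and H, and the min_degree threshold are all illustrative assumptions, and nx.from_pandas_adjacency is used here as one way to build the graph directly from a labelled adjacency dataframe:

```python
import numpy as np
import pandas as pd
import networkx as nx

# Hypothetical 5x5 symmetric adjacency matrix standing in for the real one.
a = np.array([
    [0, 1, 1, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
])
nodes = list("ABCDE")
df = pd.DataFrame(a, index=nodes, columns=nodes)

# Build an undirected graph whose node names are the dataframe labels.
G = nx.from_pandas_adjacency(df)

# Keep only nodes with at least min_degree connections (assumed threshold).
min_degree = 2
keep = [n for n, d in G.degree() if d >= min_degree]

# Induced subgraph on the kept nodes; copy() makes it independent of G.
H = G.subgraph(keep).copy()
```

H can then be passed to nx.draw(H) in place of G. Note that degrees are measured in the full graph, so a node kept in H may have fewer neighbours inside H than the threshold.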

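# G = graph
# Note: from_numpy_matrix was removed in NetworkX 3.0; from_numpy_array replaces it
G = nx.from_numpy_array(df.values)
# Replace the integer node ids 0..n-1 with the dataframe's column labels
G = nx.relabel_nodes(G, dict(enumerate(df.columns)))
# Alternative layouts:
# nx.draw_spectral(G)
# nx.draw_random(G)
# nx.draw_circular(G)
nx.draw(G)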

To draw the whole network with node labels as well as nodes and edges, use nx.draw_networkx():

nx.draw_networkx(G)
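The question also asks how to draw a subset of nodes in a different color. One possible approach is to pass a per-node color list through the node_color argument. In this sketch the toy graph, the green set, the chosen colors, and the helper draw_highlighted are hypothetical names introduced only for illustration:

```python
import networkx as nx

# Hypothetical toy graph; in practice G would come from the adjacency dataframe.
G = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")])

# Hypothetical set of "green product" labels (the 212 serial numbers in the question).
green = {"B", "D"}

# One color per node, in the same order as G.nodes().
node_colors = ["green" if n in green else "lightgray" for n in G.nodes()]


def draw_highlighted(graph, colors):
    """Draw graph with one color per node; requires matplotlib."""
    pos = nx.spring_layout(graph, seed=0)  # fixed seed for a repeatable layout
    nx.draw_networkx(graph, pos=pos, node_color=colors, node_size=50,
                     edge_color="silver", width=0.5, with_labels=False)
```

Calling draw_highlighted(G, node_colors) in a notebook cell renders the plot. Small nodes, thin pale edges, and no labels tend to help when there are many nodes.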

Dummy Data

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import networkx as nx

%matplotlib inline

## To randomly generate array: a
#  Uncomment the following three lines
# seed = 0
# np.random.seed(seed=seed)
# a = (np.random.rand(25).reshape(5,5) >= 0.5).astype(int)

## To use a fixed representation of array: a
a = np.array([
    [1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0]
    ])

nodes = list('ABCDE')
df = pd.DataFrame(data=a, index=nodes, columns=nodes)
print(df)

References

  1. Construct NetworkX graph from Pandas DataFrame
  2. Documentation: networkx.convert_matrix.from_pandas_dataframe
  3. Documentation: networkx.convert_matrix.from_pandas_edgelist

4 Comments

@Rodolfo-Moreschi Is this what you were looking for?
Thanks, yes, pretty much. I'll try it right away. The only problem is that, with so many nodes, I don't know how to make a readable graph. I'll try it and post the code and image.
I've just added a sample of the graphs; it is a subnetwork smaller than the full network (which is 5053x5053).
I'm working on the plot parameters (node size and color, edge color and width), but it took me a day to learn how to do it.
