I'm looking to build a specific graph layout from Python data in a specific format, where the values may be numbers, strings, or other Python objects.

I can do this with XlsxWriter (see below), but the data overloads it (14 million lines... no joke, it's DNA data). Is it possible to do this with Matplotlib or pandas (or something else that is more stable with large data), and if so, how?

DETAILS: It has to work on the fly and make visual formatting decisions based on whether the data is "the same" (regardless of whether it is numeric, text, or another Python object).

The coloring of each cell is based on whether the two items within {1:"A", 2:"A"} are equal (equivalent, not necessarily the same object). Green for True, red for False. For example: "A" == "A" (as in the preceding sentence), or (10/5) == (20/10), or ["A", 1, <test object at 0x1052c9b70>] == ["A", 1, <test object at 0x1052c9b70>] would all be green.

The text that appears in the cells is just the __str__ representation of the object.
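As a minimal sketch of the per-cell rule described above (not taken from any library): the function name `cell_text_and_color` and the "/" separator between the two values are assumptions made for illustration. It compares the two items with `==` and builds the label from their `str()` forms.

```python
def cell_text_and_color(pair):
    """Return (display text, color) for one cell of the form {1: x, 2: y}.

    Hypothetical helper: the name and the "/" separator are illustrative only.
    """
    first, second = pair[1], pair[2]
    # Equality (==), not identity (is), decides the color.
    color = "green" if first == second else "red"
    # "{}".format(obj) falls back to str(obj) for ordinary objects.
    text = "{}/{}".format(first, second)
    return text, color


class Test:
    """Stand-in for an arbitrary Python object stored in a cell."""


shared = Test()
same_letters = cell_text_and_color({1: "A", 2: "A"})
same_numbers = cell_text_and_color({1: 10 / 5, 2: 20 / 10})
same_lists = cell_text_and_color({1: ["A", 1, shared], 2: ["A", 1, shared]})
different = cell_text_and_color({1: "A", 2: "T"})
```

Because the decision only relies on `==`, any object that defines `__eq__` sensibly can be used as a cell value.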

EXAMPLE ...

{
    1000:{
          "Sample1":{1:"A", 2:"A"}, 
          "Sample2":{1:"A", 2:"A"},
          "Sample3":{1:"A", 2:"A"},
          },

    1001:{
          "Sample1":{1:"A", 2:"A"}, 
          "Sample2":{1:"A", 2:"A"},
          "Sample3":{1:"A", 2:"A"}
          },

    1002:{
          "Sample1":{1:"C", 2:"A"}, 
          "Sample2":{1:"A", 2:"A"},
          "Sample3":{1:"A", 2:"A"}
          },

    (...)

    9999:{
          "Sample1":{1:"A", 2:"T"}, 
          "Sample2":{1:"A", 2:"A"},
          "Sample3":{1:"A", 2:"G"}
          },
}

[Image: Excel example]

  • Are you aware of the closing reason: "Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it."? You may want to edit your question to settle on one tool and show an attempted solution (or indicate the actual problem with using that tool). Commented Dec 5, 2017 at 12:31
  • I understand. But, to be honest, I was hoping to avoid spending 1 or 2 hours learning enough matplotlib, only to discover it can't do this. And then again with another tool. And another. Is there a Stack Exchange forum where this type of opinion question IS appropriate? Commented Dec 5, 2017 at 13:52
  • Maybe you want to see it from an answerer's perspective. They would not want to spend half an hour on a solution just to find out that it does not use the appropriate tool for you. (I know you said it wouldn't matter, but is that honest? Almost every time I answered such a question, a later comment said, "That's great, but I cannot use it, because of reason bla.", where "bla" had in no way been mentioned in the question.) E.g. here I might opt for a solution using pandas formatted tables in a Jupyter notebook. Would that solution be appropriate? Commented Dec 5, 2017 at 14:00
  • Fair enough. I changed the question to ask whether this can be done with Matplotlib or pandas. There really is a problem with the XLS version because it's such a huge volume of data. I know NumPy can handle large arrays... but not with that type of data. I don't know matplotlib or pandas well enough. Honestly, if I can just get reassurance that matplotlib COULD do it... then I'll take the time to learn how. I don't even need live code (but I'll take it if offered :D) Commented Dec 5, 2017 at 14:12
  • "E.g. here I might opt for a solution using pandas formatted tables in a Jupyter notebook. Would that solution be appropriate?" If you think that actually might work... I'll try it :D Honestly, I need the report. HOW is not material to my boss :) So, I'm unconstrained (except for buying a $10,000 piece of software or something :D ) Commented Dec 5, 2017 at 14:13

1 Answer

Using pandas and styled tables in a Jupyter notebook:

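import pandas as pd

# Named "data" rather than "json" to avoid shadowing the standard-library module.
data = {
    1000: {
        "Sample1": {1: "A", 2: "A"},
        "Sample2": {1: "A", 2: "A"},
        "Sample3": {1: "A", 2: "A"},
    },
    1001: {
        "Sample1": {1: "A", 2: "A"},
        "Sample2": {1: "A", 2: "A"},
        "Sample3": {1: "A", 2: "A"},
    },
    1002: {
        "Sample1": {1: "C", 2: "A"},
        "Sample2": {1: "A", 2: "A"},
        "Sample3": {1: "A", 2: "A"},
    },
    9999: {
        "Sample1": {1: "A", 2: "T"},
        "Sample2": {1: "A", 2: "A"},
        "Sample3": {1: "A", 2: "G"},
    },
}

# Rows are positions (1000, 1001, ...), columns are samples.
df = pd.DataFrame(data).transpose()

# Text shown in each cell: the repr of both items, separated by "/".
# Note: recent pandas versions rename DataFrame.applymap to DataFrame.map.
df0 = df.applymap(lambda x: "{}/{}".format(repr(x[1]), repr(x[2])))
df0

# Split each cell into its two items and compare them element-wise with ==.
df1 = df.applymap(lambda x: x[1])
df2 = df.applymap(lambda x: x[2])
booldf = df1 == df2

# Build a frame of CSS strings: green where the items are equal, red otherwise.
c = lambda x: 'background-color: {}'.format(x)
formatdf = booldf.applymap(lambda x: c("limegreen") if x else c("crimson"))

# axis=None passes the whole frame to the function, which returns the CSS frame.
df0.style.apply(lambda x: formatdf, axis=None)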
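For data on the scale mentioned in the question, styling everything in one notebook cell may not be practical. One possible alternative, sketched here with only the standard library, is to stream an HTML table one row at a time so that only a single row is held in memory. The function name `write_html_rows`, the "Position" header, and the table layout are assumptions for illustration, not part of pandas or of the approach above.

```python
import html
import io


def write_html_rows(records, samples, out):
    """Stream one <tr> per position to the file-like object `out`.

    `records` is any iterable of (position, {sample: {1: x, 2: y}}) pairs,
    so it can be a generator that reads the source data lazily.
    Hypothetical helper, shown only as a sketch.
    """
    out.write("<table>\n<tr><th>Position</th>")
    for name in samples:
        out.write("<th>{}</th>".format(html.escape(name)))
    out.write("</tr>\n")
    for position, row in records:
        out.write("<tr><td>{}</td>".format(html.escape(str(position))))
        for name in samples:
            pair = row[name]
            # Same rule as the styled table: equal items are green, others red.
            color = "limegreen" if pair[1] == pair[2] else "crimson"
            text = html.escape("{}/{}".format(pair[1], pair[2]))
            out.write('<td style="background-color: {}">{}</td>'.format(color, text))
        out.write("</tr>\n")
    out.write("</table>\n")


sample_data = {
    1000: {"Sample1": {1: "A", 2: "A"}, "Sample2": {1: "A", 2: "A"}},
    1002: {"Sample1": {1: "C", 2: "A"}, "Sample2": {1: "A", 2: "A"}},
}
buffer = io.StringIO()
write_html_rows(sample_data.items(), ["Sample1", "Sample2"], buffer)
report = buffer.getvalue()
```

In real use, `out` would presumably be an open file rather than a `StringIO`, and `records` a generator over the source data. Whether a browser can comfortably display millions of rows is a separate question, so splitting the output into several files may be needed.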
