I'm working on a program that hides user-specified data in WAV files (a steganography program, for educational use only, nothing very sophisticated). Besides the steganographic operations, I also need to visualize the content of the original and output WAV files, but I don't know how to do that in a feasible way.

At first, I thought I'd just use the canvas widget from tkinter, but it's hardly usable: the input WAV files can be quite large, so drawing that much data would be infeasible, not to mention that I'd need to handle zooming, scrolling, etc. myself.

I found matplotlib, which I thought could solve my problem. I loaded a 10 MB WAV file (16-bit, stereo), separated the samples for the two channels and converted them to signed 16-bit integers. Then I tried to plot the data for the first channel, but it seems that matplotlib cannot handle such a large number of points. At first I can see the waveform plot (although it takes quite a while), but when I resize the window (which causes the plot to be redrawn), the following exception occurs:

Exception in Tkinter callback
Traceback (most recent call last):
  File "C:\Python33\lib\tkinter\__init__.py", line 1475, in __call__
    return self.func(*args)
  File "C:\Python33\lib\site-packages\matplotlib\backends\backend_tkagg.py", line 276, in resize
    self.show()
  File "C:\Python33\lib\site-packages\matplotlib\backends\backend_tkagg.py", line 348, in draw
    FigureCanvasAgg.draw(self)
  File "C:\Python33\lib\site-packages\matplotlib\backends\backend_agg.py", line 451, in draw
    self.figure.draw(self.renderer)
  File "C:\Python33\lib\site-packages\matplotlib\artist.py", line 56, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "C:\Python33\lib\site-packages\matplotlib\figure.py", line 1035, in draw
    func(*args)
  File "C:\Python33\lib\site-packages\matplotlib\artist.py", line 56, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "C:\Python33\lib\site-packages\matplotlib\axes.py", line 2088, in draw
    a.draw(renderer)
  File "C:\Python33\lib\site-packages\matplotlib\artist.py", line 56, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "C:\Python33\lib\site-packages\matplotlib\lines.py", line 563, in draw
    drawFunc(renderer, gc, tpath, affine.frozen())
  File "C:\Python33\lib\site-packages\matplotlib\lines.py", line 939, in _draw_lines
    self._lineFunc(renderer, gc, path, trans)
  File "C:\Python33\lib\site-packages\matplotlib\lines.py", line 979, in _draw_solid
    renderer.draw_path(gc, path, trans)
  File "C:\Python33\lib\site-packages\matplotlib\backends\backend_agg.py", line 145, in draw_path
    self._renderer.draw_path(gc, path, transform, rgbFace)
OverflowError: Allocated too many blocks

The same error occurred when I tried to load a bigger WAV file (50 MB), even without plotting the waveform. So I need a different approach, but I don't quite know what it should be. After loading the samples, I could probably plot averages of subsets of the input samples, which matplotlib should be able to handle. But I don't know how to deal with zooming and scrolling through the plot: that would mean recomputing the averages for the current zoom level and view position ("window"), which would probably perform very poorly.

And this was only a sample plot, so I can't imagine plotting four times this amount of data (2 channels, original and output data) without running into performance issues or even failures/exceptions like the one above. On smaller files (a few hundred kB) it works well, but it's still somewhat slow.

Do you have any suggestions on this issue, please?

EDIT: I found out that I was interpreting the input data incorrectly in struct.pack() for the 16-bit samples (I used the format string <H instead of <h). With that fixed, I no longer have problems with the 10 MB WAV and there seems to be some speed-up, but plotting the waveform is still much slower than would be acceptable. The 50 MB WAV seems to plot fine, but when I resize the window (and therefore the matplotlib canvas), the aforementioned exception occurs, and after that the plot is no longer redrawn when I zoom to a certain area or resize the window back to its previous size.
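To illustrate the <H versus <h mix-up, here is a minimal sketch that assumes 16-bit little-endian PCM with interleaved stereo frames (left sample, then right sample). The helper name split_stereo_int16 and the four sample values are made up for illustration; they are not from the program described above.

```python
import struct


def split_stereo_int16(raw):
    """Decode interleaved 16-bit little-endian stereo PCM into two lists.

    `raw` is assumed to be the bytes of the WAV data chunk, laid out as
    left sample, right sample, left sample, right sample, ...
    """
    count = len(raw) // 2  # number of 16-bit samples in the buffer
    # "h" is a signed 16-bit integer. Using "H" (unsigned) would turn every
    # negative sample into a value above 32767 and distort the waveform.
    values = struct.unpack("<%dh" % count, raw[:count * 2])
    return list(values[0::2]), list(values[1::2])  # left, right


# Two stereo frames: (L=-1, R=1) and (L=-32768, R=32767)
raw = struct.pack("<4h", -1, 1, -32768, 32767)
left, right = split_stereo_int16(raw)

# The same bytes read with the mistaken unsigned format
unsigned = struct.unpack("<4H", raw)
```

With the unsigned format, the sample -1 is read as 65535 and -32768 as 32768, so the plotted waveform jumps between the extremes instead of oscillating around zero.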

Here's the code I used just for getting to know matplotlib a little bit (it's based on the simple matplotlib demo):

(EDIT2: I changed the code so that it behaves in exactly the same way but is, I hope, much simpler.)

import matplotlib
from matplotlib.figure import Figure
from matplotlib.backends.backend_tkagg import FigureCanvasTkAgg
from matplotlib.backends.backend_tkagg import NavigationToolbar2TkAgg
from tkinter import tix
from tkinter.tix import *
from random import randrange

matplotlib.use('TkAgg')

samples = [randrange(-32768, 32768) for i in range(int(1e7))]
fig = Figure(figsize=(20,8), dpi=50)
subplot1 = fig.add_subplot(111)
subplot1.plot(samples, "r")

root = tix.Tk()

canvas = FigureCanvasTkAgg(fig, master=root)
canvas.show()
canvas.get_tk_widget().pack(side=TOP, fill=BOTH, expand=1)

toolbar = NavigationToolbar2TkAgg(canvas, root)
toolbar.update()
canvas._tkcanvas.pack(side=TOP, fill=BOTH, expand=1)

root.mainloop()

Do you have any suggestions on how to solve this issue and how to plot the WAV data with reasonable performance and memory consumption? (This example uses more than 800 MB of memory before the exception occurs, which means my approach is not good at all.)

  • I don't think matplotlib will have a problem with a 50MB file. I just tried this on my system and it worked fine. On the other hand, I don't see much value in plotting so many more data points than one can feasibly look at. Some code and info about your system would be helpful. Commented Apr 13, 2014 at 15:57
  • @tom10: I added some more info and also a code used. I agree that there's actually no point in plotting all of the data in the array, but I'd thought matplotlib would be able to deal with this by itself, so I just provide it with the data and it would plot it in a way that would represent the data graphically, but without unnecessary overhead (i. e. plotting all the points). How could I improve this in some not-extremely-difficult way, please? Commented Apr 13, 2014 at 17:27
  • You are running into size limits in the Agg layer (which does the rasterizing step). Could you possibly write a minimal example (no embedding, synthetic data) which generates the same problem? Commented Apr 13, 2014 at 18:53
  • @tcaswell: Creating the synthetic data is very simple - the code I posted before can be used with just the load_data() method simplified to filling in random sample data: self._wd_chan1 = [randrange(-32768, 32767) for i in range(int(1e7))], and the same goes for self._wd_chan2. I know the whole approach is bad (memory consumption alone is around 900 MB for 10 million samples, which is really terrible). Feeding only reduced sets of sample values to matplotlib might be the way to go, but how exactly? :( Commented Apr 13, 2014 at 19:40
  • Then change your posted code to do that. The easier you make it for people to read/understand your question, the better answers you will get. As a rough guide, if your code has scroll bars I am not going to read it. Commented Apr 13, 2014 at 19:45

1 Answer

You can simplify it even further so that it can be run at an interactive prompt, but I digress:

import matplotlib
from matplotlib import pyplot as plt
from random import randrange


samples = [randrange(-32768, 32768) for i in range(int(1e7))]
fig, ax = plt.subplots(1, 1)
ax.plot(samples, "r-")

The problem is that you are trying to draw a number of line segments that is more than the Agg library can deal with (I am not sure what the limit is, and some path simplification should be done before the path is passed off to Agg, so it is probably not a point-count limit anyway).

To some degree this is not a huge problem. Your screen is only ~1k pixels across, so if you plotted all of the points you have, there would be 1e4 points per pixel, which is kind of silly. You need to down sample.

You can do this in a number of ways (which one is right depends on why you are plotting the data), including: blindly down sampling (x = x[::1000]), averaging sections (x = np.mean(x[:n * (len(x)//n)].reshape(-1, n), axis=1), with x as a NumPy array), or doing something more exotic (take an FFT and filter it to keep only the low frequencies).
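The three options can be sketched roughly as follows. This is only an illustration under the assumption that the samples fit in a NumPy array; the function names decimate_stride, decimate_mean and lowpass_fft are hypothetical and chosen here for readability.

```python
import numpy as np


def decimate_stride(x, step):
    """Blind down sampling: keep every `step`-th sample."""
    return np.asarray(x)[::step]


def decimate_mean(x, n):
    """Average consecutive blocks of `n` samples.

    Trailing samples that do not fill a whole block are dropped so that
    the reshape into rows of length `n` is valid.
    """
    x = np.asarray(x, dtype=float)
    usable = n * (len(x) // n)
    return x[:usable].reshape(-1, n).mean(axis=1)


def lowpass_fft(x, keep):
    """Zero all but the `keep` lowest FFT bins and transform back.

    The result has the same length as the input; it can then be thinned
    with decimate_stride without the high-frequency content aliasing.
    """
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    spectrum[keep:] = 0
    return np.fft.irfft(spectrum, n=len(x))


x = np.arange(10)
strided = decimate_stride(x, 3)       # every third sample
averaged = decimate_mean(x, 4)        # two full blocks of four samples
smooth = lowpass_fft(np.ones(8), 1)   # a constant signal is unchanged
```

Striding is the cheapest option but can skip over short peaks, while block averaging smooths them out. Which trade-off is acceptable depends on what the plot is meant to show.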

If you need to be able to zoom in and see all the points in the zoomed region, you may need to do something fancier to replace the data with a non-down sampled version as you zoom.
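One way to approach this, sketched here as an assumption rather than a tested recipe for the embedded Tk case, is to listen for the axes' "xlim_changed" callback and replace the line data with a slice thinned to fit the visible range. The names visible_slice, attach_dynamic_downsampling and max_points are hypothetical; the sketch also assumes the x axis is the sample index and that plain striding is acceptable.

```python
import numpy as np


def visible_slice(samples, xmin, xmax, max_points=2000):
    """Return (x, y) for the visible index range, thinned by striding.

    At most `max_points` points are returned. Once the visible range is
    narrower than `max_points` samples, the stride becomes 1 and every
    sample in view is returned unchanged.
    """
    n = len(samples)
    lo = max(int(np.floor(xmin)), 0)
    hi = min(int(np.ceil(xmax)) + 1, n)
    if hi <= lo:
        return np.empty(0), np.empty(0)
    step = max(int(np.ceil((hi - lo) / max_points)), 1)
    x = np.arange(lo, hi, step)
    return x, samples[lo:hi:step]


def attach_dynamic_downsampling(ax, line, samples, max_points=2000):
    """Update `line` with a freshly thinned slice whenever `ax` is zoomed
    or panned along x. `ax` is a matplotlib Axes and `line` a Line2D."""
    def on_xlim_changed(axes):
        xmin, xmax = axes.get_xlim()
        x, y = visible_slice(samples, xmin, xmax, max_points)
        line.set_data(x, y)
        axes.figure.canvas.draw_idle()

    ax.callbacks.connect("xlim_changed", on_xlim_changed)
    return on_xlim_changed


samples = np.arange(1000000)
overview_x, overview_y = visible_slice(samples, 0, len(samples) - 1, 1000)
zoom_x, zoom_y = visible_slice(samples, 100.2, 149.7, 1000)
```

To use it, plot the overview once (for example, line, = ax.plot(overview_x, overview_y)) and then call attach_dynamic_downsampling(ax, line, samples). From then on, zooming with the navigation toolbar swaps in denser data, down to the exact sample values once few enough samples are in view.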

2 Comments

Thank you for the answer. The measures you describe (downsampling/averaging sections) are certainly a solution to my problem, but I also need to be able to see the exact values of the samples, so that I can compare the original sound wave with the modified one (which would be added to the plot later). Is there any way for the matplotlib canvas component to tell me the range of the data that is about to be displayed, so that I can feed it the appropriate (reduced) set of data before the plot is redrawn?
...to be more clear: initially, I would fill the array with the filtered set of samples from, say, the sample at time A to the sample at time Z (the filtered sequence of samples in time order). Then I zoom in on a certain part of the plot (say, the sample range from time A to time C). Can the matplotlib figure call some function that provides it with a denser sample set, reduced only over time A to time C, when the zoom event occurs? This way I could avoid plotting unnecessary amounts of data and increase the plot precision as I zoom in.
