
I'm dealing with blobs of up to, I estimate, about 100 kilobytes in size. The data is compressed already.

Storage engine: InnoDB on MySQL 5.1

Frontend: PHP (Symfony with Propel ORM)

Some questions:

  • I've read somewhere that it's not good to update blobs, because it leads to reallocation, fragmentation, and thus bad performance. Is that true? Any reference on this?

  • Initially the blobs get constructed by appending data chunks. Each chunk is up to 16 kilobytes in size. Is it more efficient to use a separate chunk table instead, for example with fields as below? (A rough sketch of such a table follows this list.)

    parent_id, position, chunk

    Then, to get the entire blob, one would do something like:

    SELECT GROUP_CONCAT(chunk ORDER BY position) FROM chunks WHERE parent_id = 187

    The result would be used in a PHP script.

  • Is there any difference between the BLOB types (TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB), aside from the size needed for metadata, which should be negligible?
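
For illustration, a minimal sketch of such a chunk table (the table and column names are taken from the query above; the types and the key are just a first guess):

    CREATE TABLE chunks (
        parent_id INT UNSIGNED NOT NULL,
        position  INT UNSIGNED NOT NULL,
        chunk     BLOB NOT NULL,           -- chunks are at most 16 KB, so plain BLOB (64 KB max) would suffice
        PRIMARY KEY (parent_id, position)  -- in InnoDB this keeps a parent's chunks clustered together, in order
    ) ENGINE=InnoDB;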

  • GROUP_CONCAT() isn't a good candidate for this. By default it's limited to a max length of 1024 bytes (though you can change this with the group_concat_max_len system variable). You'd also have to be very careful constructing the query - what happens if the chunks are grouped/concatenated in the wrong order? Commented Jan 9, 2011 at 15:18
  • I don't see a problem with the limit, as it can be extended. Concerning getting the concatenation in the right order: that's why I propose the position field. Shouldn't that be enough? (See the sketch after these comments.) Commented Jan 9, 2011 at 16:05
  • Just select each chunk and concatenate them in PHP. Or, if you're outputting them, output them without concatenating them at all. Commented Jan 9, 2011 at 22:11
  • araqnid: Why should I do that? What's the advantage? In fact, I am assuming that MySQL is way more efficient for that step. By the way, the concatenated result is then post-processed. It is not sent to the screen. Commented Jan 9, 2011 at 23:07
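
Following up on the length-limit comment, a sketch of how the limit could be raised per session before running the query (the 1 MB value and the explicit empty separator are assumptions, not part of the original proposal):

    -- group_concat_max_len defaults to 1024 bytes; raise it for this session (1 MB is an arbitrary choice)
    SET SESSION group_concat_max_len = 1024 * 1024;

    -- ORDER BY position keeps the chunks in order; SEPARATOR '' prevents the
    -- default ',' separator from being inserted between the binary chunks
    SELECT GROUP_CONCAT(chunk ORDER BY position SEPARATOR '')
    FROM chunks
    WHERE parent_id = 187;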

1 Answer


If you're creating and deleting data in a table, you will get fragmentation of the table's data structures.

I don't think you can gain anything by splitting blobs into chunks; you don't gain anything by fragmenting the data before the DB fragments it :)

You can defragment the table's structure by rebuilding it (OPTIMIZE TABLE in MySQL).
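
For example (the table name blobs is just a placeholder):

    -- rebuilds the table; for InnoDB in 5.1 this maps to an ALTER TABLE that recreates it
    OPTIMIZE TABLE blobs;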

I could not find information on how MySQL stores blobs on disk. If it stores them alongside the other row data, then you could use a clustered index (the primary key in InnoDB, ALTER TABLE ... ORDER BY in MyISAM) to enforce a particular order of the data in the table's data file (e.g. ordered by popularity, to create a "hot" area which might improve caching and reduce seeking a bit).
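
A sketch of the MyISAM variant mentioned above (table and column names are hypothetical; in InnoDB the primary key already dictates the physical row order):

    -- one-off physical reordering of the MyISAM data file; not maintained on later inserts
    ALTER TABLE blobs ORDER BY popularity DESC;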

In addition to fragmentation of the database's own structures, there's the problem of fragmentation of the table's file in the filesystem.

Even if you only ever inserted data into the table, with zero fragmentation of the table itself, the filesystem that holds the table file will sooner or later fragment it on disk. That's unavoidable on safe filesystems, as they never update a file's data in place.

If fragmentation is a problem, then I'd attack it at the lowest level possible: don't store the blobs in the database, store only references to files on disk.
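
A minimal sketch of what that reference-only approach could look like on the schema side (all names are made up for illustration):

    -- keep only a pointer to the blob; the bytes themselves live as a file on disk
    CREATE TABLE blob_files (
        id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        file_path VARCHAR(255) NOT NULL,
        byte_size INT UNSIGNED NOT NULL
    ) ENGINE=InnoDB;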

Filesystems are closer to the physical disk, so they can deal with fragmentation much better than a DB query that sits a few levels of abstraction above it. Some filesystems even automatically defragment small files, but leave large files fragmented.

Or you might just throw hardware at the problem: use RAID, add a ton of RAM for disk/DB caches, or use an SSD.

And of course you've benchmarked it carefully and know that fragmentation is a problem in the first place, right?


1 Comment

In fact, I've discovered that, as of now, fragmentation is not an issue. After introducing compression, the blobs are generally quite small (up to a couple of kilobytes) and append operations are infrequent. So priorities have changed a bit. However, I am still interested in the difference between the BLOB data types. Also, I recall reading that MySQL stores big blobs in a different place than small blobs, and that this is much less efficient. That would be another reason to use a separate chunks table, aside from, yes, fragmentation. Not an issue at the moment, though.
