
I have two tables: one is the master table and the other is just a cache. From time to time I check whether the cache table is up to date and no data is missing. The cache table uses the MyISAM engine and the master table uses InnoDB.

To explain it in more detail, here is an example.

The cache table contains fields from the following two tables:

product_categories (cat_id, cat_name, parent_cat_id DEFAULT NULL, parent_cat_name DEFAULT NULL) 

products (product_num, product_name, product_desc, price, image, product_date, availability)

It is possible that the cache table is missing some products, or that it contains product data that is no longer accurate.

In the question Compare two MySQL databases, the tool Toad for MySQL is mentioned, but I want to do this using PHP.

Cache table schema

products_cache | CREATE TABLE `products_cache` (
  `product_num` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `cat_id` int(10) unsigned NOT NULL,
  `parent_cat_id` int(10) unsigned DEFAULT NULL,
  `cat_name` varchar(50) NOT NULL,
  `parent_cat_name` varchar(50) DEFAULT NULL,
  `product_desc` text NOT NULL,
  `price` float(10) unsigned NOT NULL,
  `image` varchar(65) NOT NULL DEFAULT '',
  `product_date` DATE DEFAULT NULL, 
  `availability` tinyint(1) NOT NULL DEFAULT '1',
  PRIMARY KEY (`product_num`)
) ENGINE=MyISAM

Possible solution

Compute an MD5 hash of the fields and store it in the cache table; on the next run, compare each row's current hash against the stored one to detect changed data. It should work fine except that there will be a performance cost (I run the cache fixer every month, so I think I can live with that). Please comment on this.
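To illustrate the idea, here is a rough sketch of the monthly fix-up loop. It uses Python's sqlite3 as a stand-in for MySQL/PHP; the `checksum` column and the `row_md5` helper are part of the proposal, not the current schema, and the tables are cut down to two columns for brevity.

```python
import hashlib
import sqlite3

def row_md5(*fields):
    # Hash a canonical representation of the row's fields; the \x1f
    # separator keeps ("ab", "c") distinct from ("a", "bc").
    joined = "\x1f".join("" if f is None else str(f) for f in fields)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE products (
        product_num INTEGER PRIMARY KEY,
        product_name TEXT, price REAL
    );
    CREATE TABLE products_cache (
        product_num INTEGER PRIMARY KEY,
        product_name TEXT, price REAL, checksum TEXT
    );
""")
db.execute("INSERT INTO products VALUES (1, 'Widget', 9.99)")
# The cache holds a stale copy of product 1 (old price, old checksum).
db.execute("INSERT INTO products_cache VALUES (1, 'Widget', 7.50, ?)",
           (row_md5("Widget", 7.50),))

# Monthly fixer: recompute each master row's hash, compare it with the
# stored cache checksum, and rewrite only the rows whose hash differs.
for num, name, price in db.execute(
        "SELECT product_num, product_name, price FROM products"):
    digest = row_md5(name, price)
    cached = db.execute(
        "SELECT checksum FROM products_cache WHERE product_num = ?",
        (num,)).fetchone()
    if cached is None or cached[0] != digest:
        db.execute("REPLACE INTO products_cache VALUES (?, ?, ?, ?)",
                   (num, name, price, digest))

print(db.execute(
    "SELECT price FROM products_cache WHERE product_num = 1").fetchone()[0])
```

The hash only saves work if comparing one stored string is cheaper than comparing all the columns directly; for a handful of columns, a plain column-by-column comparison query may be just as fast.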

5 Comments

  • Could you please post the output of SHOW CREATE TABLE cache_table? Commented Nov 10, 2010 at 13:54
  • I don't understand your motivation for storing the same data twice. Why do you do this? Commented Nov 10, 2010 at 14:00
  • @f00 To achieve better performance. Commented Nov 10, 2010 at 14:56
  • Any proof that a simple join doesn't perform as required? That's pretty hard to imagine. Commented Nov 10, 2010 at 15:13
  • In the question I said it's a sample scenario. In reality we have 4-table joins, which make queries slow and require more table scans. Consider the categories table: a product can be in a parent category or a child category. If it is in a child category we need two joins or two queries. The site is quite busy, so I am now building a cache table to reduce load on the main tables. Also, the main table updates frequently (i.e. ratings, views, etc.), which I do not need in the cache table, so this may improve performance greatly. Commented Nov 10, 2010 at 15:22

1 Answer


Instead of computing MD5 sums for all of your data every month, you could simply record changes to a log table using triggers.

CREATE TABLE changes (
  `table` char(30) NOT NULL, -- TODO: use an ENUM for better performance
  id int NOT NULL,
  UNIQUE KEY tableId (`table`, id)
) 

CREATE TRIGGER insert_products AFTER INSERT ON products FOR EACH ROW INSERT IGNORE INTO changes (`table`, id) VALUES ('products', NEW.id);
CREATE TRIGGER update_products AFTER UPDATE ON products FOR EACH ROW INSERT IGNORE INTO changes (`table`, id) VALUES ('products', NEW.id);
CREATE TRIGGER delete_products AFTER DELETE ON products FOR EACH ROW INSERT IGNORE INTO changes (`table`, id) VALUES ('products', OLD.id);

CREATE TRIGGER insert_product_categories AFTER INSERT ON product_categories FOR EACH ROW INSERT IGNORE INTO changes (`table`, id) VALUES ('product_categories', NEW.id);
CREATE TRIGGER update_product_categories AFTER UPDATE ON product_categories FOR EACH ROW INSERT IGNORE INTO changes (`table`, id) VALUES ('product_categories', NEW.id);
CREATE TRIGGER delete_product_categories AFTER DELETE ON product_categories FOR EACH ROW INSERT IGNORE INTO changes (`table`, id) VALUES ('product_categories', OLD.id);

-- do this for every involved table

Once in a while, you can then update the changed rows in a nightly batch job (pseudo code):

for {table, id} in query(select `table`, id from changes) {
  cacheRow = buildCacheRow($table, $id)
  doInTransaction {
    query(replace into products_cache values $cacheRow)
    query(delete from changes where `table` = $table and id = $id)
  }
}
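For illustration, here is the same trigger-plus-batch-job approach as runnable code, using Python's sqlite3 as a stand-in for MySQL (`INSERT OR IGNORE` plays the role of `INSERT IGNORE`, and `NEW`/`OLD` behave as in the MySQL triggers above). Only a single one-column `products` table is modeled; `buildCacheRow` is collapsed to a plain lookup.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE products (product_num INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE products_cache (product_num INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE changes (
        table_name TEXT NOT NULL,
        id INTEGER NOT NULL,
        UNIQUE (table_name, id)
    );
    -- Each write to products leaves at most one row per id in `changes`.
    CREATE TRIGGER insert_products AFTER INSERT ON products
    BEGIN INSERT OR IGNORE INTO changes VALUES ('products', NEW.product_num); END;
    CREATE TRIGGER update_products AFTER UPDATE ON products
    BEGIN INSERT OR IGNORE INTO changes VALUES ('products', NEW.product_num); END;
    CREATE TRIGGER delete_products AFTER DELETE ON products
    BEGIN INSERT OR IGNORE INTO changes VALUES ('products', OLD.product_num); END;
""")
db.execute("INSERT INTO products VALUES (1, 'Widget')")
db.execute("UPDATE products SET product_name = 'Gadget' WHERE product_num = 1")

# Nightly batch job: rebuild each changed cache row and clear its log
# entry inside one transaction, so a crash never loses a pending change.
for table, rid in db.execute("SELECT table_name, id FROM changes").fetchall():
    row = db.execute("SELECT * FROM products WHERE product_num = ?",
                     (rid,)).fetchone()
    with db:  # transaction: commit on success, roll back on error
        if row is None:
            db.execute("DELETE FROM products_cache WHERE product_num = ?", (rid,))
        else:
            db.execute("REPLACE INTO products_cache VALUES (?, ?)", row)
        db.execute("DELETE FROM changes WHERE table_name = ? AND id = ?",
                   (table, rid))

print(db.execute("SELECT product_name FROM products_cache").fetchone()[0])
```

Note that the insert and the update of product 1 collapse into a single `changes` row thanks to the UNIQUE key, so the batch job touches each changed row once regardless of how many times it was written.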

6 Comments

This is great, but it won't work in my case because the products table updates very frequently (views, ratings, etc.).
@jason Why do you think that's a problem?
@sfussenegger Because it is going to record every update, which I don't need; for example views, ratings, and other information that doesn't belong in the cache table. At the end of the month I would have all the product ids in the changes table. I want to record only changes that are approved by staff; rather, I want to ensure that the cache table contains accurate data.
@jason Two possibilities: a) check whether one of the columns you're interested in changed its value, or b) split your table in two, one for data that's more or less static and another for data that changes frequently. That way you might already fix the performance issues you have right now (faster reads due to higher cache hit rates, and probably faster writes as well).
@jason If I wasn't clear enough: I'd strongly recommend splitting your table before you start denormalizing your data. If the problem persists, you can still evaluate throwing in some caches.
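Option a) can be expressed directly in the update trigger. A minimal sketch, again with sqlite3 standing in for MySQL (sqlite uses a `WHEN` clause where a MySQL trigger body would use `IF NEW.col <> OLD.col THEN ... END IF`; the `price`/`views` columns are illustrative):

```python
import sqlite3

# Log an update only when a column the cache actually needs has changed:
# here `price` matters to the cache and `views` does not.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE products (product_num INTEGER PRIMARY KEY,
                           price REAL, views INTEGER);
    CREATE TABLE changes (table_name TEXT NOT NULL, id INTEGER NOT NULL,
                          UNIQUE (table_name, id));
    CREATE TRIGGER update_products AFTER UPDATE ON products
    WHEN NEW.price IS NOT OLD.price  -- IS NOT also handles NULLs
    BEGIN INSERT OR IGNORE INTO changes VALUES ('products', NEW.product_num); END;
""")
db.execute("INSERT INTO products VALUES (1, 9.99, 0)")

# A pure view-count bump does not touch the change log...
db.execute("UPDATE products SET views = views + 1 WHERE product_num = 1")
print(db.execute("SELECT COUNT(*) FROM changes").fetchone()[0])  # prints 0

# ...but a price change does.
db.execute("UPDATE products SET price = 12.50 WHERE product_num = 1")
print(db.execute("SELECT COUNT(*) FROM changes").fetchone()[0])  # prints 1
```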
