Thursday, January 6, 2011

A query optimizer practice about "force index"

Two tables:
table keywords (keyword_id int, keyword varchar(255));
table adwords (keyword_id int, local_traffic, global_traffic);

mysql> explain select * from keywords left join adwords on keywords.keyword_id = adwords.keyword_id where keyword like 'car%';
+----+-------------+----------+-------+---------------+----------+---------+------------------------------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+----------+---------+------------------------------+-------+--------------------------+
| 1 | SIMPLE | keywords | range | ukeyword | ukeyword | 152 | NULL | 92378 | Using where; Using index |
| 1 | SIMPLE | adwords | ref | PRIMARY | PRIMARY | 8 | keywords.keywords.keyword_id | 10735 | |
+----+-------------+----------+-------+---------------+----------+---------+------------------------------+-------+--------------------------+

mysql> explain select * from keywords left join adwords on keywords.keyword_id = adwords.keyword_id where keyword like 'car%' and local_traffic > 10000;
+----+-------------+----------+--------+------------------+---------+---------+-----------------------------+---------+-------------+
| id | select_type | table    | type   | possible_keys    | key     | key_len | ref                         | rows    | Extra       |
+----+-------------+----------+--------+------------------+---------+---------+-----------------------------+---------+-------------+
|  1 | SIMPLE      | adwords  | ALL    | PRIMARY          | NULL    | NULL    | NULL                        | 1073581 | Using where |
|  1 | SIMPLE      | keywords | eq_ref | PRIMARY,ukeyword | PRIMARY | 8       | keywords.adwords.keyword_id |       1 | Using where |
+----+-------------+----------+--------+------------------+---------+---------+-----------------------------+---------+-------------+

After using local_traffic in the query, seems Mysql doesn't use the proper index (ukeyword) first as expected. Make a change:

mysql> explain select * from keywords force index(ukeyword) left join adwords on keywords.keyword_id = adwords.keyword_id where keyword like 'car%' and local_traffic > 10000;
+----+-------------+----------+-------+---------------+----------+---------+------------------------------+-------+--------------------------+
| id | select_type | table    | type  | possible_keys | key      | key_len | ref                          | rows  | Extra                    |
+----+-------------+----------+-------+---------------+----------+---------+------------------------------+-------+--------------------------+
|  1 | SIMPLE      | keywords | range | ukeyword      | ukeyword | 152     | NULL                         | 92378 | Using where; Using index |
|  1 | SIMPLE      | adwords  | ref   | PRIMARY       | PRIMARY  | 8       | keywords.keywords.keyword_id | 10735 | Using where              |
+----+-------------+----------+-------+---------------+----------+---------+------------------------------+-------+--------------------------+


You see, we are using "force index(xx)" to tell Mysql how to use index to execute the query, the performance has a big jump!

Mysql Innodb performance tuning - innodb_flush_log_at_trx_commit

The official definition of innodb_flush_log_at_trx_commit is:

If the value of innodb_flush_log_at_trx_commit is 0, the log buffer is written out to the log file once per second and the flush to disk operation is performed on the log file, but nothing is done at a transaction commit. When the value is 1, the log buffer is written out to the log file at each transaction commit and the flush to disk operation is performed on the log file. When the value is 2, the log buffer is written out to the file at each commit, but the flush to disk operation is not performed on it. However, the flushing on the log file takes place once per second also when the value is 2. Note that the once-per-second flushing is not 100% guaranteed to happen every second, due to process scheduling issues.

Typically, most of Mysql experts ask us to set the value to 1, so we won't lose any data when system is crashed. But after reading more articles, I found the value "0" might be more suitable for me. The bad condition is I will lose 1 second data, but I will get lot of performance improvement. I don't care about the 1 second data losing if the select and insert actions can be faster than before. If you set the value to 0, please remember to monitor innodb_log_buffer_size. The default value is 1M, you might need to increase the buffer size when more data comes into buffer.

mysql> show status like 'innodb_log%';


+---------------------------+-------+
| Variable_name | Value |
+---------------------------+-------+
| Innodb_log_waits | 0 |
| Innodb_log_write_requests | 0 |
| Innodb_log_writes | 1 |
+---------------------------+-------+

If the Innodb_log_waits is close to Innodb_log_writes, you should increase innodb_log_buffer_size, for most time, 8M is big enough to handle all normal conditions.

Mysql Innodb performance tuning - Innodb_buffer_pool_size

Before tuning, Innodb_buffer_pool_size is set to default value: 8M
mysql> show status like 'Innodb_buffer_pool_%';

+-----------------------------------+-----------+
| Variable_name                     | Value     |
+-----------------------------------+-----------+
| Innodb_buffer_pool_pages_data     | 511       |
| Innodb_buffer_pool_pages_dirty    | 0         |
| Innodb_buffer_pool_pages_flushed  | 2542331   |
| Innodb_buffer_pool_pages_free     | 0         |
| Innodb_buffer_pool_pages_misc     | 1         |
| Innodb_buffer_pool_pages_total    | 512       |
| Innodb_buffer_pool_read_ahead_rnd | 4777      |
| Innodb_buffer_pool_read_ahead_seq | 7022      |
| Innodb_buffer_pool_read_requests  | 115968607 |
| Innodb_buffer_pool_reads          | 5413710   |
| Innodb_buffer_pool_wait_free      | 0         |
| Innodb_buffer_pool_write_requests | 43308713  |
+-----------------------------------+-----------+


Innodb_buffer_pool_pages_free is the number of unused data pages.
Innodb_buffer_pool_pages_total is the total number of pages.
Innodb_buffer_pool_pages_data is the total number of used data pages

For current settings, Innodb_buffer_pool_pages_free / Innodb_buffer_pool_pages_Total = 0, that indicates Innodb_buffer_pool_size should have a larger value. After set Innodb_buffer_pool_size t0 500M, try again

mysql> flush status;
wait for a whole to let Mysql process some SQL actions
mysql> show status like 'Innodb_buffer_pool_%';
+-----------------------------------+--------+
| Variable_name                     | Value  |
+-----------------------------------+--------+
| Innodb_buffer_pool_pages_data     | 4269   |
| Innodb_buffer_pool_pages_dirty    | 0      |
| Innodb_buffer_pool_pages_flushed  | 3391   |
| Innodb_buffer_pool_pages_free     | 28498  |
| Innodb_buffer_pool_pages_misc     | 1      |
| Innodb_buffer_pool_pages_total    | 32768  |
| Innodb_buffer_pool_read_ahead_rnd | 1      |
| Innodb_buffer_pool_read_ahead_seq | 0      |
| Innodb_buffer_pool_read_requests  | 269789 |
| Innodb_buffer_pool_reads          | 4177   |
| Innodb_buffer_pool_wait_free      | 0      |
| Innodb_buffer_pool_write_requests | 105595 |
+-----------------------------------+--------+

Now, Innodb_buffer_pool_pages_free / Innodb_buffer_pool_pages_Total = 86%. It indicates the buffer pool has been over allocated, we should decrease it. After doing more testing, the size should be exactly match your current usage.

Wednesday, January 5, 2011

Php Json-RPC source code (client side)

<?php

/**
* Author: James Hu
* Email: techman@navime.net
*/
class RpcClient {
  private $server_url;

  private $passcode;

  public function RpcClient($server_url, $passcode = '') {
    $this->__construct($server_url, $passcode);
  }

  public function __construct($server_url, $passcode = '') {
    $this->server_url = $server_url;
    $this->set_passcode($passcode);
  }

  public function set_passcode($passcode) {
    $this->passcode = $passcode;
  }

  public function call($func) {
    $url = $this->server_url . "?call=$func";
    
    $args = func_get_args();
    array_shift($args);
    $args = join(',', $args);

    $url .= "¶ms=" . $args;

    if(!empty($this->passcode)) $url .= "&passcode=" . $this->passcode;

    $response = file_get_contents($url);
    if(!$response) throw new Exception('no response from server');

    $data = @json_decode($response, true);
    if(!$data || !isset($data['ok']) || !isset($data['response'])) 
      throw new Exception("invalid response from server: $response");

    if($data['ok'] || $data['ok'] == 'true') return $data['response'];
    else throw new Exception($data['response']);
  }
};

?>

Php Json-RPC source code (server side)

<?php
/**
* Author: James Hu
*Email: techman@navime.net
*/
 
require_once(dirname(__FILE__) . '/debug.php');
 
Error::$ExceptionEnabled = true;
 
class RpcServer {
  //
  //
  private $passcode;
 
  //
  //
  private $public_funcs;

  //
  // 
  public function RpcServer() {
    $this->__construct();
  }
 
  //
  //
  public function __construct() {
    $this->passcode = md5('sunday');
    $this->public_funcs = array('rpc_echo', 'rpc_version');
  }
 
  public function run() {
    try {
      $this->request_handler();
    } catch (Exception $e) {
      // catch all exceptions
      //
      $this->output(false, $e->getMessage());
    }
  }
 
  public function set_passcode($passcode) {
    $this->passcode = md5($passcode);
  }
 
  public function set_public_func($func) {
    if(!preg_match('/^rpc_/i', $func)) $func = "rpc_$func";
    $this->public_funcs[] = $func;
  }
 
  private function request_handler() {
    //
    //
    $this->load_modules();
 
    // more flexible to specify the function name
    //
    $pnames = array('call', 'do', 'method');
 
    // iterate the possible function name
    //
    foreach($pnames as $pname) {
      $func = @$_REQUEST[$pname];
      if(!empty($func)) break;
    }
 
    // check if the rpc call exists
    //
    if(empty($func) || !function_exists("rpc_$func")) {
      throw new Exception("Rpc function does not exist");
    }
 
    $func = "rpc_$func";
 
    if(!in_array($func, $this->public_funcs)) {
      $this->authenticate();
    }
 
    $params = @$_REQUEST['params'];
 
    // extract raw params information
    //
    if(empty($params)) {
      $params = array();
    } else {
      $params = preg_split('/,/', $params);
 
      foreach($params as $k => $param) {
        $params[$k] = $this->json_decode($param);
      }
    }
 
    // get function's parameters
    //
    $funcRef = new ReflectionFunction($func);
    $paramsRef = $funcRef->getParameters();
 
    for($i = 0; $i < count($paramsRef); ++$i) {
      $paramName = $paramsRef[$i]->name;
      if(isset($_REQUEST[$paramName])) {
        for($j = 0; $j < $i; ++$j) {
          if(!isset($params[$j])) $params[$j] = false;
        }
        $params[$i] = $this->json_decode($_REQUEST[$paramName]);
      }
    }
 
    $result = @call_user_func_array($func, $params);
    $this->output(true, $result);
  }
 
  private function json_decode($v) {
    if(preg_match('/^\[.*?\]$/i', $v)) {
      $p = json_decode($v, true);
      if($p) return $p;
    }
 
    return $v;
  }
 
  private function load_modules() {
    if(!file_exists('./modules')) return;
 
    $dir = dir('./modules');
    while(false !== ($entry = $dir->read())) {
      if(preg_match('/.*\.php$/i', $entry) && $entry != 'index.php') {
        require_once("./modules/$entry");
      }
    }
  }
 
  private function output($error, $result) {
    if(@$_REQUEST['format'] == 'raw') {
      die($result);
    } else {
      die(json_encode(array('ok' => $error, 'response' => $result)));
    }
  }
 
  private function authenticate() {
    $passcode = md5(@$_REQUEST['passcode']);
    if($passcode != $this->passcode) {
      throw new Exception('access is denied');
    }
  }
};
 
function rpc_echo() {
  return func_get_args();
}
 
function rpc_version() {
  return "rpc server v1.1";
}
 
?>

procmail memory issue on VPS

I frequently got "can't allocate more memory" issue on VPS recently, I even can't use "ps" and "top" command to check any process has heavy memory usage because this memory issue.

I checked the system services when the issue was gone temporarily, there is a strange service: procmail, the memory usage is higher than others. The definition from wikipedia is:
Procmail is a mail delivery agent (MDA) capable of sorting incoming mail into various directories and filtering out spam messages. Procmail is widely used

After I killed this process, system will launch it again. I don't know which service is running this command, my solution is to set 0600 mode to /usr/bin/procmail and then kill the running process.
#> chmod 600 /usr/bin/procmail

The memory problem is gone now. I guess the problem is I have lots of cron services running, these services generated lots of messages to root user's mailbox. So, procmail has to allocate huge memory to process these messages.

sphinx plugin compiling issue for mysql about _mysql_plugin_interface_version_

I know many people are getting following problem when loading sphinx plugin to mysql.

mysql> install plugin sphinx soname 'sphinx.so';
ERROR 1127 (HY000):
Can't find symbol '_mysql_plugin_interface_version_' in library

Looks like there are only two threads discussing this issue from google search and there is no proper solution for it. I happened to read one article:

Run Configure, you can use the ” –with-plugins=sphinx ” switch if you want sphinx built static into your mysql binary.

Is this problem caused by using --with-plugins=sphinx option when running ./configure? The answer is yes! Everything works well after removing --with-plugins=sphinx option from ./configure!!

My steps to build sphinx plugin:

#> cp -R SPHINX_SRC_PATH/mysqlse MYSQL_SRC_PATH/storage/sphinx

#> cd MYSQL_SRC_PATH

#> sh BUILD/autorun.sh

#> ./configure (no --with-plugins=sphinx option if you just want to build ha_sphinx.so)

#> cd storage/sphinx

#> make

#> echo "show variables like '%plugin%' | mysql
Variable_name   Value
plugin_dir      /usr/lib/mysql/plugin

#> copy ha_sphinx.so /usr/lib/mysql/plugin

#> copy sphinx.so /usr/lib/mysql/plugin

#> mysql
mysql> install plugin sphinx soname 'ha_sphinx.so';
mysql> show plugins;
+------------+----------+----------------+--------------+---------+
| Name       | Status   | Type           | Library      | License |
+------------+----------+----------------+--------------+---------+
...
| SPHINX     | ACTIVE   | STORAGE ENGINE | ha_sphinx.so | GPL     |
+------------+----------+----------------+--------------+---------+

Actually, no need to run the patch like sphinx-xxx-.diff and no need to build all mysql source. The build process is very simple.