We have seen many articles on MySQL performance and optimization. In this article I try to summarize the key points; I hope it helps developers and makes life a little easier. :)
How MySQL stores data
Databases are stored as directories.
Tables are stored as files.
Columns are stored in the files in dynamic length or fixed size format. In BDB tables the data is stored in pages.
Memory-based tables are supported.
Databases and tables can be symbolically linked from different disks.
On Windows MySQL supports internal symbolic links to databases with .sym files.
MySQL table types
HEAP tables; Fixed row size tables that are only stored in memory and indexed with a HASH index.
ISAM tables; The old B-tree table format in MySQL 3.22.
MyISAM tables; New version of the ISAM tables with a lot of extensions:
Index on NULL columns.
Less fragmentation for dynamic-size rows than ISAM tables.
Support for big files.
Better index compression.
Better key statistics.
Better and faster auto_increment handling.
Berkeley DB (BDB) tables from Sleepycat: Transaction-safe (with BEGIN WORK / COMMIT | ROLLBACK).
MySQL row types (only relevant for ISAM/MyISAM tables)
MySQL will create the table in fixed size table format if all columns are of fixed length format (no VARCHAR, BLOB or TEXT columns). If not, the table is created in dynamic-size format.
The fixed-size format is much faster and less prone to corruption than the dynamic format.
The dynamic-size row format normally takes up less space but may be fragmented over time if the table is updated a lot.
In some cases it's worth it to move all VARCHAR, BLOB and TEXT columns to another table just to get more speed on the main table.
With myisampack (pack_isam for ISAM) you can create a read-only, packed table. This minimizes disk usage, which is very nice when using slow disks. Packed tables are perfect for log tables that will no longer be updated.
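The speed of the fixed-size format comes from simple addressing: row N lives at byte offset N * row_size, so a point lookup needs no per-row length bookkeeping. A minimal sketch of the idea (illustrative only, not MySQL's actual on-disk layout):

```python
import io
import struct

# One fixed-size row: a 4-byte INT plus a CHAR(20)-style padded string.
ROW = struct.Struct("<i20s")

buf = io.BytesIO()
for i, name in enumerate([b"alpha", b"beta", b"gamma"]):
    buf.write(ROW.pack(i, name.ljust(20, b"\x00")))

def read_row(f, n):
    # Fixed rows: the offset is just row_number * row_size.
    f.seek(n * ROW.size)
    row_id, name = ROW.unpack(f.read(ROW.size))
    return row_id, name.rstrip(b"\x00")

print(read_row(buf, 2))  # (2, b'gamma')
```

With dynamic-size rows this direct computation is impossible, which is why lookups and crash recovery are both cheaper in the fixed format.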
MySQL caches (shared between all threads, allocated once)
Key cache ; key_buffer_size, default 8M
Table cache ; table_cache, default 64
Thread cache ; thread_cache_size, default 0.
Hostname cache ; Changeable at compile time, default 128.
Memory mapped tables ; Currently only used for compressed tables.
How the MySQL table cache works
Each open instance of a MyISAM table uses an index file and a data file. If a table is used by two threads or used twice in the same query, MyISAM will share the index file but will open another instance of the data file.
The cache will temporarily grow larger than the table cache size if all tables in the cache are in use. If this happens, the next table that is released will be closed.
You can check if your table cache is too small by checking the mysqld variable Opened_tables. If this value is high you should increase your table cache!
Optimizing hardware for MySQL
- If you need big tables (> 2G), you should consider using 64-bit hardware such as Alpha, SPARC, or the upcoming IA-64. Since MySQL uses a lot of 64-bit integers internally, 64-bit CPUs will give much better performance.
- For large databases, the optimization order is normally RAM, Fast disks, CPU power.
More RAM can speed up key updates by keeping most of the used key pages in RAM.
- If you are not using transaction-safe tables, or have big disks and want to avoid long file checks, a UPS is a good idea so the system can be taken down cleanly in case of a power failure.
- For systems where the database is on a dedicated server, one should look at 1G Ethernet. Latency is as important as throughput.
Have one dedicated disk for the system, programs and temporary files. If you make very many changes, put the update logs and transaction logs on dedicated disks.
Low seek time is important for the database disk. For big tables you can estimate the number of seeks needed to find a row as:
log(row_count) / log(index_block_length / 3 * 2 / (key_length + data_ptr_length)) + 1
For a table with 500,000 rows indexed on a MEDIUMINT column:
log(500,000) / log(1024 / 3 * 2 / (3 + 4)) + 1 = 4 seeks
The above index would require about 500,000 * 7 * 3/2 = 5.2M. In real life, most of the blocks will be buffered, so probably only 1-2 seeks are needed.
For writes you will need (as above) 4 seek requests, however, to find where to place the new key, and normally 2 seeks to update the index and write the row.
For REALLY big databases, your application will be bound by the speed of your disk seeks, which increase by N log N as you get more data.
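The seek estimate above is easy to check numerically. A quick sketch (the 2/3 factor models index blocks that are about two-thirds full, as in the formula; the function name is made up):

```python
import math

def estimate_seeks(row_count, index_block_length, key_length, data_ptr_length):
    # Keys per index block, assuming blocks are about 2/3 full.
    keys_per_block = index_block_length * 2 / 3 / (key_length + data_ptr_length)
    # Tree depth (one seek per level) plus one seek for the data row.
    return math.log(row_count) / math.log(keys_per_block) + 1

# 500,000 rows, 1024-byte index blocks, a 3-byte MEDIUMINT key
# and a 4-byte data pointer:
seeks = estimate_seeks(500_000, 1024, 3, 4)
print(round(seeks))  # 4, matching the estimate in the text

# Index size: rows * (key + pointer) * 3/2 = about 5.2M
print(500_000 * (3 + 4) * 3 / 2)  # 5250000.0
```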
Split databases and tables over different disks. In MySQL you can use symbolic links for this.
Striping disks (RAID 0) will increase both read and write throughput.
Striping with mirroring (RAID 0+1) will give you safety and increase the read speed. Write speed will be slightly lower.
Don't use mirroring or RAID (except RAID 0) on the disk used for temporary files or for data that can easily be re-generated.
On Linux, run hdparm -m16 -d1 on the disks at boot to enable reading/writing of multiple sectors at a time, and DMA. This may improve performance by 5-50%.
On Linux, mount the disks with async (default) and noatime.
For some specific application, one may want to have a ram disk for some very specific tables, but normally this is not needed.
If you have memory problems, add more RAM instead or configure your system to use less memory.
Don't use NFS disks for data (you will have problems with NFS locking).
Increase number of open files for system and for the SQL server. (add ulimit -n # in the safe_mysqld script).
Increase the number of processes and threads for the system.
If you have relatively few big tables, tell your file system to not break up the file on different cylinders (Solaris).
Use file systems that support big files (Solaris).
Choose which file system to use; Reiserfs on Linux is very fast for open, read and write. File checks take just a couple of seconds.
If possible, run OPTIMIZE table once in a while. This is especially important on variable size rows that are updated a lot.
Update the key distribution statistics in your tables once in a while with myisamchk -a; Remember to take down MySQL before doing this!
If you get fragmented files, it may be worth it to copy all files to another disk, clear the old disk and copy the files back.
If you have problems, check your tables with myisamchk or CHECK table.
Monitor MySQL status with: mysqladmin -i10 processlist extended-status
With the MySQL GUI client you can monitor the process list and the status in different windows.
Use mysqladmin debug to get information about locks and performance.
Use SQL for the things it's good at, and do other things in your application. Use the SQL server to:
Find rows based on a WHERE clause.
Don't use an SQL server:
To validate data (like dates)
As a calculator
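For example, date validation is cheap to do in the application rather than in the server. A minimal Python sketch (the function name is made up):

```python
from datetime import date

def valid_date(s):
    # Validate 'YYYY-MM-DD' in application code, not in the SQL server.
    try:
        date.fromisoformat(s)
        return True
    except ValueError:
        return False

print(valid_date("2024-02-29"))  # True: 2024 is a leap year
print(valid_date("2023-02-29"))  # False: 2023 is not
```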
Use keys wisely.
Keys are good for searches, but bad for inserts / updates of key columns.
Keep your data in third normal form, but don't be afraid of duplicating information or creating summary tables if you need more speed.
Instead of doing a lot of GROUP BYs on a big table, create summary tables of the big table and query this instead.
UPDATE table set count=count+1 where key_column=constant is very fast!
For log tables, it's probably better to generate summary tables from them once in a while than try to keep the summary tables live.
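The summary-table idea can be sketched as follows (using SQLite for a self-contained example; the table and column names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A log table that would be expensive to GROUP BY over and over.
cur.execute("CREATE TABLE hits (page TEXT, day TEXT)")
cur.executemany("INSERT INTO hits VALUES (?, ?)",
                [("/home", "2024-01-01"), ("/home", "2024-01-01"),
                 ("/about", "2024-01-01"), ("/home", "2024-01-02")])

# Generate the summary table once in a while...
cur.execute("""CREATE TABLE hits_summary AS
               SELECT page, day, COUNT(*) AS cnt
               FROM hits GROUP BY page, day""")

# ...and point queries at the small summary instead of the big log table.
cnt = cur.execute("SELECT cnt FROM hits_summary "
                  "WHERE page = '/home' AND day = '2024-01-01'").fetchone()[0]
print(cnt)  # 2
```

Regenerating hits_summary periodically is usually cheaper than keeping it live with triggers or per-insert updates.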
Take advantage of default values on INSERT.
MySQL has a rich set of different types. You should try to use the most efficient type for each column.
The ANALYSE procedure can help you find the optimal types for a table: SELECT * FROM table_name PROCEDURE ANALYSE()
Use NOT NULL for columns which will not store null values. This is particularly important for columns which you index.
Change your ISAM tables to MyISAM.
If possible, create your tables with a fixed table format.
Don't create indexes you are not going to use.
Use the fact that MySQL can search on a prefix of an index: if you have an INDEX (a,b), you don't need a separate index on (a).
Instead of creating an index on a long CHAR/VARCHAR column, index just a prefix of the column to save space: CREATE TABLE table_name (hostname CHAR(255) NOT NULL, INDEX (hostname(10)))
Use the most efficient table type for each table.
Columns with identical information in different tables should be declared identically and have identical names.
MySQL extensions / optimization that gives you speed
Use the optimal table type (HEAP, MyISAM, or BDB tables).
Use optimal columns for your data.
Use fixed row size if possible.
Use the different lock types (SELECT HIGH_PRIORITY, INSERT LOW_PRIORITY)
REPLACE (REPLACE INTO table_name VALUES (...))
LOAD DATA INFILE / LOAD_FILE()
Use multi-row INSERT to insert many rows at a time.
SELECT INTO OUTFILE
LEFT JOIN, STRAIGHT JOIN
LEFT JOIN combined with IS NULL
ORDER BY can use keys in some cases.
If you only query columns that are in one index, only the index tree will be used to resolve the query.
Joins are normally faster than subselects (this is true for most SQL servers).
SELECT * from table1 WHERE a > 10 LIMIT 10,20
DELETE FROM table1 WHERE a > 10 LIMIT 10
foo IN (list of constants) is very optimized.
INSERT and SELECT can run concurrently.
User-defined functions (UDFs) that can be loaded into a running server.
Compressed read-only tables.
CREATE TEMPORARY TABLE
CREATE TABLE .. SELECT
MyISAM tables with the RAID option to split a file over many files, to get past the 2G limit on some file systems.
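Two of the items above, multi-row INSERT and CREATE TABLE .. SELECT, work in most SQL engines. A quick sketch using SQLite for portability (table names are made up; SQLite requires AS before the SELECT, while MySQL allows omitting it):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t1 (id INTEGER, val TEXT)")

# Multi-row INSERT: one statement and one round trip for three rows.
cur.execute("INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c')")

# CREATE TABLE .. SELECT: create and populate a table in one step.
cur.execute("CREATE TABLE t2 AS SELECT id FROM t1 WHERE id > 1")

print(cur.execute("SELECT COUNT(*) FROM t2").fetchone()[0])  # 2
```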
When MySQL doesn't use an index
Indexes are NOT used if MySQL can calculate that it will probably be faster to scan the whole table. For example if key_part1 is evenly distributed between 1 and 100, it's not good to use an index in the following query:
SELECT * FROM table_name WHERE key_part1 > 1 AND key_part1 < 90
If you are using HEAP tables and you don't search on all key parts with =
When you use ORDER BY on a HEAP table
If you are not using the first key part
SELECT * FROM table_name WHERE key_part2=1
If you are using LIKE that starts with a wildcard
SELECT * FROM table_name WHERE key_part1 LIKE '%jani%'
When you search on one index and do an ORDER BY on another
SELECT * from table_name WHERE key_part1 = # ORDER BY key2
Things to avoid with MySQL
Updates to a table, or INSERTs into a table with deleted rows, combined with SELECTs that take a long time.
HAVING on things you can have in a WHERE clause.
JOINs without keys, or on keys that are not unique enough.
JOINs on columns that have different column types.
Using HEAP tables when not using a full key match with =
Forgetting a WHERE clause with UPDATE or DELETE in the MySQL monitor. If you tend to do this, use the --i-am-a-dummy option to the mysql client.
Tricks to give MySQL more information to solve things better
SELECT SQL_BUFFER_RESULT ...
Will force MySQL to make a temporary result set. As soon as the temporary set is done, all locks on the tables are released. This can help when you get a problem with table locks or when it takes a long time to transfer the result to the client.
SELECT SQL_SMALL_RESULT ... GROUP BY ...
To tell the optimizer that the result set will only contain a few rows.
SELECT SQL_BIG_RESULT ... GROUP BY ...
To tell the optimizer that the result set will contain many rows.
SELECT STRAIGHT_JOIN ...
Forces the optimizer to join the tables in the order in which they are listed in the FROM clause.
SELECT ... FROM table_name [USE INDEX (index_list) | IGNORE INDEX (index_list)] table_name2
Forces MySQL to use/ignore the listed indexes.
Use short primary keys. Use numbers, not strings, when joining tables.
When using multi-part keys, the first part should be the most-used key.
When in doubt, use columns with more duplicates first to get better key compression.
If you run the client and MySQL server on the same machine, use sockets instead of TCP/IP when connecting (this can give up to a 7.5% improvement). You can do this by not specifying a hostname, or by specifying localhost, when connecting to the MySQL server.
Use --skip-locking (default on some OSes) if possible. This will turn off external locking and will give better performance.
Use application-level hashed values instead of long keys:
SELECT * FROM table_name WHERE hash=MD5(CONCAT(col_1,col_2)) AND col_1='constant' AND col_2='constant'
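A sketch of the hashed-key trick, using SQLite plus an application-side MD5 in place of MySQL's MD5(CONCAT(...)) (the table and column names are made up):

```python
import hashlib
import sqlite3

def row_hash(*cols):
    # Application-side stand-in for MD5(CONCAT(col_1, col_2)).
    return hashlib.md5("".join(cols).encode()).hexdigest()

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (hash TEXT, col_1 TEXT, col_2 TEXT)")
cur.execute("CREATE INDEX t_hash ON t (hash)")  # one short key, not two long columns
cur.execute("INSERT INTO t VALUES (?, ?, ?)",
            (row_hash("foo", "bar"), "foo", "bar"))

# Search on the short hash first; the extra column comparisons guard
# against hash collisions.
row = cur.execute("SELECT col_1, col_2 FROM t "
                  "WHERE hash = ? AND col_1 = ? AND col_2 = ?",
                  (row_hash("foo", "bar"), "foo", "bar")).fetchone()
print(row)  # ('foo', 'bar')
```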
Store BLOBs that you need to access as files directly in the file system; store only the file name in the database.
It is faster to remove all rows than to remove a large part of the rows.
If SQL is not fast enough, take a look at the lower level interfaces to access the data.