Sunday, January 29, 2012

Review of MySQL Locks

We have all seen locked tables in SHOW PROCESSLIST many times, and it is an alarming sight: in the worst case the locks eventually stall your application and push Apache's thread count to a huge number. In this post we discuss the types of locks in MySQL and how we can avoid them. Before starting, I'm happy to admit that I'm not a MySQL expert in every aspect, specifically the internals, where I have not had the time to read the code in detail and understand all the inner workings.

The first important thing is to instrument and monitor your MySQL properly:
  1. Slow Query Log
  2. General Query Log
  3. Binary Log
  4. Connectors
  5. Process list
  6. Engine status
  7. Status variables

There are two classes of locks:

Read locks, as the name suggests, are locks taken on a resource to provide consistent reads by blocking writes to that resource. Read locks are also known as shared locks, because many clients can acquire a read lock on the same resource simultaneously. The other thing to know is that while a read lock is in place, no write lock can be acquired on that resource.
Following is how MySQL grants a read lock:

If there is no write lock set on the resource, grant the read lock immediately.
If there is a write lock on the resource, put the request in the read lock queue.

A write lock is taken on a resource when it needs to be modified. There can be only one write lock on a resource at any given time, and while a write lock is held, all other read lock requests must wait.
Following is how MySQL grants a write lock:

If there is no lock (read or write) on the resource, grant the write lock immediately.
If there is any lock on the resource, put the request in the write lock queue.

Write locks have a higher priority than read locks. When a resource is unlocked and there are lock requests waiting in the queues, the lock is granted in the following manner:

Grant the lock first to the request waiting in the write lock queue.
If there is no request in the write lock queue, grant the lock to the first request in the read lock queue.
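These granting rules can be sketched as a small writer-priority lock in Python; this is only an illustration of the policy, not MySQL's actual implementation (which lives in mysys/thr_lock.c):

```python
import threading

class WritePriorityLock:
    """Sketch of MySQL's table-lock policy: writers queue ahead of
    readers, and a waiting writer blocks new read locks."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0          # active read locks
        self._writer = False       # an active write lock is held
        self._waiting_writers = 0  # size of the "write lock queue"

    def acquire_read(self):
        with self._cond:
            # A read lock is granted only when no writer holds the lock
            # and no writer is waiting (writers have priority).
            while self._writer or self._waiting_writers:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            self._waiting_writers += 1
            while self._writer or self._readers:
                self._cond.wait()
            self._waiting_writers -= 1
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

A reader arriving while a writer waits blocks until the writer is done, which is exactly the behaviour that lets one slow SELECT plus one UPDATE queue up hundreds of later SELECTs.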

Locking in MyISAM

MyISAM supports table-level locking and does not support row-level locking. Because of this, MyISAM is typically well suited to applications with a high percentage of reads compared to writes. And because MyISAM locks at the coarse table level, the memory needed for locking is typically less than, say, row-level locks require.

However, MyISAM does allow concurrent INSERTs in some situations!

If there are no holes in a MyISAM table, then INSERTs happen at the end of the table, and in that situation INSERTs and SELECTs are allowed to run concurrently on the same table, with the INSERT needing no write lock. But there is a caveat: if there are many INSERT requests, they are executed serially, which means only one INSERT at a time can run concurrently with SELECTs on the same table. Now what exactly are holes? Holes are produced when rows are deleted and/or updated in the middle of a MyISAM table.
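Concurrent inserts are controlled by the concurrent_insert system variable; here is a quick sketch of inspecting and tuning it (values as documented for the MySQL 5.x series, so check your version's manual before relying on this):

```sql
-- Check the current MyISAM concurrent-insert behaviour.
SHOW VARIABLES LIKE 'concurrent_insert';

-- 1 (the default) allows concurrent INSERTs only for tables without
-- holes; 2 allows them even for tables with holes, by inserting new
-- rows at the end of the data file.
SET GLOBAL concurrent_insert = 2;
```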

Locking in InnoDB
InnoDB, unlike MyISAM, supports both table-level and row-level locking, which allows much finer-grained control over the resource to be locked. Row-level locking lets InnoDB be extremely efficient under a heavy write load.
InnoDB also has a very nice feature, MVCC (multi-version concurrency control), which allows non-locking consistent reads. This is achieved by making different snapshots of the data available to different transactions. However, if you set the transaction isolation level to SERIALIZABLE, plain SELECTs are automatically converted to locking SELECTs.

However, AUTO_INCREMENT columns need table-level locks.


INSERTs into a table with an AUTO_INCREMENT column take a table-level lock for the duration of the INSERT statement, so that the INSERTs are safe for statement-based replication. As of MySQL 5.1, which also introduced row-based replication, this behavior is configurable via the variable innodb_autoinc_lock_mode, but you still need to use row-based replication for the INSERTs to be safe for replication. Set innodb_autoinc_lock_mode to 2, and INSERT-like statements will not take the table-level AUTO-INC lock.
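A minimal my.cnf fragment for that setup might look like the following (variable names as in the MySQL 5.1 manual; verify against your version before copying):

```ini
# "Interleaved" auto-increment lock mode: no table-level AUTO-INC lock.
# Only safe for replication with row-based binary logging.
[mysqld]
binlog_format            = ROW
innodb_autoinc_lock_mode = 2
```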

Imagine The Scenario

0.00 seconds: A SELECT query (i.e. read-only) accesses the table; it will take around 2 seconds to complete.
0.01 seconds: Another SELECT accesses the table; it completes almost instantly, as it can run in parallel with the first.
0.02 seconds: An INSERT, DELETE or UPDATE query (i.e. a write) attempts to write to that very same table before the first SELECT has completed, and has to wait.
0.03 seconds: A SELECT query comes in and waits behind the write.
0.04 seconds: Another SELECT query comes in, again waiting.
0.05 seconds: Yet another SELECT query comes in, again waiting.
1.99 seconds (~1000 queries later): Yet another SELECT query comes in, still waiting for the first SELECT to complete. And so on and so forth.

The queued queries are blocked and have to wait before they can execute: the later SELECTs are blocked by the write operation, which is in turn blocked by the long-running first SELECT. This is what causes the Table_locks_waited status value to grow.
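You can watch this happening from the server side; a Table_locks_waited value that grows relative to Table_locks_immediate is the symptom of the scenario above (these are standard MySQL commands, nothing specific to my setup):

```sql
-- How often table locks were granted immediately vs. had to wait.
SHOW GLOBAL STATUS LIKE 'Table_locks%';

-- Blocked queries show up with state "Locked" in the process list.
SHOW FULL PROCESSLIST;
```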

The following items describe some ways to avoid or reduce contention caused by table locking:

  • Try to get the SELECT statements to run faster so that they lock tables for a shorter time. You might have to create some summary tables to do this.
  • Start mysqld with --low-priority-updates. For storage engines that use only table-level locking (such as MyISAM, MEMORY, and MERGE), this gives all statements that update (modify) a table lower priority than SELECT statements. In this case, the second SELECT statement in the preceding scenario would execute before the UPDATE statement, and would not need to wait for the first SELECT to finish.
  • To specify that all updates issued in a specific connection should be done with low priority, set the low_priority_updates server system variable equal to 1.
  • To give a specific INSERT, UPDATE, or DELETE statement lower priority, use the LOW_PRIORITY attribute.
  • To give a specific SELECT statement higher priority, use the HIGH_PRIORITY attribute.
  • Start mysqld with a low value for the max_write_lock_count system variable to force MySQL to temporarily elevate the priority of all SELECT statements that are waiting for a table after a specific number of inserts to the table occur. This permits READ locks after a certain number of WRITE locks.
  • If you have problems with INSERT combined with SELECT, consider switching to MyISAM tables, which support concurrent SELECT and INSERT statements.
  • If you mix inserts and deletes on the same table, INSERT DELAYED may be of great help.
  • If you have problems with mixed SELECT and DELETE statements, the LIMIT option to DELETE may help.
  • Using SQL_BUFFER_RESULT with SELECT statements can help to make the duration of table locks shorter.
  • You could change the locking code in mysys/thr_lock.c to use a single queue. In this case, write locks and read locks would have the same priority, which might help some applications.
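A few of the statement-level hints from the list, written out (table and column names here are invented for illustration):

```sql
-- Lower the priority of individual writes:
UPDATE LOW_PRIORITY stats SET hits = hits + 1 WHERE page_id = 42;
INSERT LOW_PRIORITY INTO audit_log (msg) VALUES ('log rotated');

-- Raise the priority of a specific read:
SELECT HIGH_PRIORITY COUNT(*) FROM stats;

-- Per-connection equivalent of --low-priority-updates:
SET SESSION low_priority_updates = 1;

-- Buffer the result set so the table lock is released sooner:
SELECT SQL_BUFFER_RESULT * FROM stats WHERE hits > 100;
```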

Friday, January 27, 2012

Review of Instant Auto-complete Suggestions

Over the past couple of years, auto-complete has become popular all over the web. Facebook, YouTube, Google, Bing, MSDN, LinkedIn, Apple and other websites try to complete text phrases as soon as we start typing.
Auto-complete definitely makes for a nice user experience, but it can be a challenge to implement efficiently. In many cases, an efficient implementation requires the use of interesting algorithms and data structures. In this blog post, I will describe how it all fits together.
Basic Idea
Basically, auto-complete is a combination of HTML, JS, CSS and a data source. The source could be a JS string, a file, or data fetched over a DB connection. We usually fetch data from the source in response to JS events like onkeyup.
Engineering Challenges
However, we notice that Facebook's or Google's auto-suggestions work very fast. I ran into the speed issue for the first time on an earlier project, where we had planned to use the auto-suggester heavily to show master data in drop-downs, for example on the registration page. At that time I implemented an incremental caching feature on the browser side, but I later found that it still didn't touch Facebook and Google standards.
However, some open-source caching plugins are now available by default.
Typically, the following are the key points that make an implementation efficient:
  1. A data structure suited to retrieving big data
  2. Client-side caching in an indexed JavaScript array (for example, using the first letter as the index, so friends['a'] = ['andrey', 'albert'])
  3. Client-side storage techniques (all modern browsers support them)
  4. Warming up caches as early as possible
  5. Building an index of names -> DOM elements, doing the DOM manipulation offline, and attaching the results only for the people that match the searched term
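Point 2 can be sketched as follows (Python here for brevity; in the browser it would be a plain JS object keyed the same way):

```python
from collections import defaultdict

def build_first_letter_index(names):
    """Bucket names by their first letter so each keystroke only has to
    scan one small list instead of the whole data set."""
    index = defaultdict(list)
    for name in names:
        if name:
            index[name[0].lower()].append(name)
    return index

def suggest(index, prefix, limit=10):
    """Return up to `limit` names matching the typed prefix."""
    prefix = prefix.lower()
    bucket = index.get(prefix[0], []) if prefix else []
    return [n for n in bucket if n.lower().startswith(prefix)][:limit]
```

A call like suggest(idx, "al") only scans the "a" bucket, which is what keeps per-keystroke work small.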

Data structure to retrieve big data
A trie (or a ternary search tree) is typically used for auto-complete lookups, but a plain trie needs too much memory. A solution is PATRICIA (Practical Algorithm To Retrieve Information Coded In Alphanumeric), a path-compressed trie. Both are data structures you can read about in detail separately.
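To make that concrete, here is a minimal (uncompressed) trie sketch; PATRICIA would additionally merge single-child chains to save memory:

```python
class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def with_prefix(self, prefix):
        """All stored words starting with `prefix`, for suggestions."""
        node = self.root
        for ch in prefix:            # walk down to the prefix node
            if ch not in node.children:
                return []
            node = node.children[ch]
        out, stack = [], [(node, prefix)]
        while stack:                 # collect every word below it
            n, word = stack.pop()
            if n.is_word:
                out.append(word)
            for ch, child in n.children.items():
                stack.append((child, word + ch))
        return sorted(out)
```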
Other Challenges

Suggestions Can Vary By Region & Language

Not everyone sees the same suggestions. For example, if I am in Delhi, I will see Delhi-based results given higher priority.

Previously Searched Suggestions

There is big demand in a suggester product for tracking what users have previously searched for.

How Suggestions Are Ranked

This is a big question if you plan to put the suggester on the home page. Popularity is a factor, but some less popular searches might be shown above more popular ones.

Deduplicating & Spelling Corrections

This is basic functionality that everyone dreams of in an auto-suggester. :P


If there are terms that suddenly spike in popularity in the short term, they can appear as suggestions even if they haven't gained long-term popularity.

Tuesday, January 24, 2012

Dev team with no QA resources

Right now in my organization there are completely independent roles for QA and Dev. The QA-to-development ratio is always an open issue, especially in crucial prod releases. :P

In my office, we are thinking about Dev without QA, i.e. pushing releases to prod with no (or only partial) help from QA.

The overall idea is to avoid QA by using TDD, pair programming, peer code review, etc.

Here are a few key notes (my worries :P; they are very subjective, but based on experience):

  1. Can developers really be effective testers, given their intimate knowledge of the application?
  2. Can developers be good "bug finders" rather than just engineers?
  3. People are often blind to their own shortcomings or mistakes, and code is best validated when tested by third parties. Of course, proper monitoring and management could mitigate this... but it's a concern.
  4. Can developers think from a high-visibility perspective? Sometimes a customer relationship can be impaired by simple usability flaws, a somewhat common mistake for an immature developer.
  5. It increases the chance of rework.
  6. It leads to more developer time being spent on testing the application or module.

However, the following are a few parameters that help.

How much unit testing the developers are doing. The more they do, the less the QA dependency.

How much the developers are writing from scratch versus leveraging existing libraries. If there is a lot of pre-existing code in use, then it's a disappointing call.

How dynamic the development is. If you are writing a UI where relatively small developer tweaks cause a large change to the testable surface, then it would be a pain.

How mission-critical the feature is. It needs a lot more testing, because bugs are hard to fix in the field and very bad when they happen.

Understanding domain-driven design

What is DDD? Actually, I've always developed code in an SOA type of way. I prefer SOA as the overall architecture, and I know SOA and DDD are two conflicting styles. :P

Anyway, today I got a chance to review code based on DDD, so I have started reading about it. I am still unable to completely grasp the concept of Entity vs Value Objects. ;-)

I am interested in:

  1. An illustrative Domain Model
  2. Repositories
  3. Use of Domain/Application Services
  4. Value Objects
  5. Aggregate Roots
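For the Entity vs Value Object distinction that confused me, here is a minimal sketch (the domain names are invented for illustration): an Entity has an identity that survives attribute changes, while a Value Object is defined entirely by its attributes and is immutable.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Address:
    """Value Object: no identity. Two addresses with equal attributes
    ARE the same value, and the object is immutable (frozen)."""
    street: str
    city: str

class Customer:
    """Entity: identified by customer_id, not by its attributes."""
    def __init__(self, customer_id, name, address):
        self.customer_id = customer_id
        self.name = name
        self.address = address

    def __eq__(self, other):
        return (isinstance(other, Customer)
                and self.customer_id == other.customer_id)

    def __hash__(self):
        return hash(self.customer_id)

    def move_to(self, new_address):
        # The customer keeps its identity even when its state changes.
        self.address = new_address
```

Here two Customer objects with the same id are equal even if their names differ, while two Address values are equal only when every attribute matches.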

If you can help, that would be appreciated.

Well, I am looking at how to fit SOA and DDD together. We can create a domain model that encapsulates domain concepts, and expose entry points into that model via services.
Maybe the problem we will face is creating services like "UpdateOrder" that take an order entity and try to update it in a new session?
We should try to avoid that kind of service and instead break it down into smaller atomic commands.
Each command can be exposed as an operation, or you could have a single service operation that receives groups of commands and then delegates them to command handlers.
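A sketch of that command-based approach (all class names invented): instead of one coarse "UpdateOrder" taking a whole entity, expose small atomic commands and dispatch them to handlers.

```python
from dataclasses import dataclass

# Small, atomic commands instead of one coarse "UpdateOrder".
@dataclass(frozen=True)
class ChangeShippingAddress:
    order_id: int
    address: str

@dataclass(frozen=True)
class AddOrderLine:
    order_id: int
    sku: str
    quantity: int

class CommandBus:
    """Routes each command type to its handler; a single service
    operation can receive a group of commands and delegate them."""
    def __init__(self):
        self._handlers = {}

    def register(self, command_type, handler):
        self._handlers[command_type] = handler

    def dispatch(self, commands):
        return [self._handlers[type(c)](c) for c in commands]
```

Each handler stays small and transactional, which is easier to keep consistent than one session-spanning entity update.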
Well, you can read more about it on the web.
Finally, summarizing all the important key rules of DDD:
  • Only use one level of indentation per method
  • Don't use the else keyword
  • Wrap all primitives and strings
  • Use only one dot per line
  • Unless explicitly using a fluent interface over an OO model
  • Don't abbreviate
  • Keep all entities small (no more than 50 lines)
  • Don't use a class with more than two instance variables
  • Use first class collections
  • Don't use any getters/setters/properties
  • On a class, it's fine to do this with a data object, such as a DTO
  • Don't give a data object behavior
  • But don't build an illogical model, if a data object needs behavior, make a class
Ending my post with the following. :P Enjoy :-)
I was once asked what my personal motto is. Funnily enough, I had never really decided on one, but I guess it would be:-
“Choose a job you love and you will never have to work a day in your life”
But if you really dig, you'll find that this line describes it best:
“You aren’t remembered for doing what is expected of you”