Ravi Raj's Blog: Mysql

Showing posts with label Mysql. Show all posts

Monday, October 13, 2014

mysql prepared statements caching

PreparedStatement is cached inside the J2EE server connection pool manager. The J2EE server keeps a list of prepared statements for each database connection in the pool. When an application calls prepareStatement on a connection, the application server checks if that statement was previously prepared. If it was, the PreparedStatement object will be in the cache and this will be returned to the application. If not, the call is passed to the jdbc driver and the query/preparedstatement object is added in that connections cache.

We need a cache per connection because that's the way jdbc drivers work. Any preparedstatements returned are specific to that connection.

If we want to take advantage of this cache,We need to use parameterized queries so that they will match ones already prepared in the cache. Most application servers will allow you to tune the size of this prepared statement cache.

PHP/MYSQL

So there are good reasons to use prepared statements:

Save on query parsing
Save on data conversion and copying
Avoid SQL Injection
Save memory on handling blobs

There are also drawback of using prepared statements:

Query cache does not work
Extra server round trip required if statement used only once
Not all statements can be prepared. So you can’t use prepared API exclusively you’ll need to fall back to normal API for some statements
Newer and sometimes buggy code. I had a lot of problems with PHP prepared statements. It is getting better but still it is less mature than standard API
You can’t use placeholders in place of all identifiers. For example you can’t use them for table name. In certain version it even does not work for LIMIT boundaries
Inconvenient list handling. Unlike in for example PEAR emulated prepard statements there is no nice way to pass list of values to IN
Harder tracing. Logs were now fixed to include full statement text not only “Execute” but in SHOW INNODB STATUS you would still see statements without actual values – quite inconvenient for analyses.

For a prepared statement executed via the binary protocol, comparison with statements in the query cache is based on the text of the statement after expansion of ? parameter markers. The statement is compared only with other cached statements that were executed via the binary protocol. That is, for query cache purposes, statements issued via the binary protocol are distinct from statements issued via the text protocol.

We can't re-use a mysql statement prepared during a previous request in php.

If PHP application uses connection pooling to the database, and the database caches prepared statements, then yes, the caching will persist between pages. If the prepared statement caching is done by the client library, then this is more nebulous.

Reusing the same variable name in PHP won't invalidate the MySQL prepare "cache".

$stmt = $dbh->prepare("SELECT column_A FROM Table1 WHERE id=?");
$stmt->bindValue(1, $id, PDO::PARAM_INT);
$stmt->execute();

$stmt = $dbh->prepare("UPDATE Table2 SET column_B=? WHERE column_A=?");
$stmt->bindValue(1, $name);
$stmt->bindValue(2, $column_A);
$stmt->execute();

MYSQL persistent connection

Mysql server offer statement caching.

mysql> PREPARE stmt_name FROM "SELECT name FROM Country WHERE code = ?";
Query OK, 0 rows affected (0.09 sec)
Statement prepared

mysql> SET @test_parm = "FIN";
Query OK, 0 rows affected (0.00 sec)

mysql> EXECUTE stmt_name USING @test_parm;
+---------+
| name |
+---------+
| Finland |
+---------+
1 row in set (0.03 sec)

mysql> DEALLOCATE PREPARE stmt_name;

Query OK, 0 rows affected (0.00 sec)

When a request is served php "cleans" the instance and frees resources and other variables. This is done in several steps. Since apache keeps the process alive after a request not all steps are executed and not all memory is freed. There is e.g. EG(persistent_list) which is used by mysql_pconnect(), pg_pconnect(), ... This list isn't emptied between requests as long as the process keeps alive (could be, depending on the actual implementation, but that would defy the purpose of EG(persistent_list)). If you use persistent connections your script might get a "re-used" connection established during a previous request.
To (re-)use a prepared statement directly you need the identifier for that statement (and that connection). When using (php-)postgresql this is simply a (connection-wise) unique string you pass to pg_execute(), so your script has no problem to gain access to the statement previously prepared by another instance (using the same connection).

Using mysqli or PDO-mysql you need a resource/object as statement identifier. That's kind of a problem since neither the mysqli nor the pdo extension seem to offer a way of storing the resource in EG(persist_list) between requests and you can't recreate it either, so it's seems impossible to re-use a mysql prepared statement directly.

But as we know we should not use persistent connection , here are some good points.

When you lock a table, normally it is unlocked when the connection closes, but since persistent connections do not close, any tables you accidentally leave locked will remain locked, and the only way to unlock them is to wait for the connection to timeout or kill the process.
Normally temporary tables are dropped when the connection closes, but since persistent connections do not close, temporary tables aren't so temporary. If you do not explicitly drop temporary tables when you are done, that table will already exist for a new client reusing the same connection. The same problem occurs with setting session variables.
If PHP and MySQL are on the same server or local network, the connection time may be negligible, in which case there is no advantage to persistent connections.
Apache does not work well with persistent connections. When it receives a request from a new client, instead of using one of the available children which already has a persistent connection open, it tends to spawn a new child, which must then open a new database connection. This causes excess processes which are just sleeping, wasting resources, and causing errors when you reach your maximum connections, plus it defeats any benefit of persistent connections.

Also Try to use it smartly. For example.

$ids = array(12,34,56);

$sql = “SELECT * FROM address WHERE address_id = :address_id”;

$databaseHandle = CustomDB::getDBH();

$rows = array();

$preparedStatementHandle = $databaseHandle->prepare($sql);

foreach ($ids as $id) {

$preparedStatementHandle->execute(array(‘address_id’ => $id));

$result = $preparedStatementHandle->fetchAll();

$rows = array_merge($rows, $result);

}

It doesn't make sense when we can get gain using SELECT id FROM address WHERE address_id IN (12,34,56);

Prepared Statements are useful when they are related to session-long caching alongside a database wrapper that’s just good enough to programmatically generate most typical lookups.

Unfortunately, “good enough” caching looks complicated:

Example Queries:

SELECT * FROM address WHERE address_id = 12;

SELECT * FROM address WHERE address_id = 34;

SELECT * FROM address WHERE address_id = 56;

Example Code, assuming PDO & PHP, to produce those queries using Prepared Statements with caching:

$ids = array(12,34,56);

$sql = “SELECT * FROM address WHERE address_id = :address_id”;

$databaseHandle = CustomDB::getDBH();

Cache::setStrategy(Cache_LRU::getLabel());

$rows = array();

if (!Zend_Registry::isRegistered(self::PREP_STMT_CACHE_KEY)) {

Zend_Registry::set(self::PREP_STMT_CACHE_KEY, Cache::getInstance());

}

$preparedStatementHandle = Zend_Registry::get(self::PREP_STMT_CACHE_KEY)->get($sql);

if (!$preparedStatementHandle) {

$preparedStatementHandle = $databaseHandle->prepare($sql);

// Use the sql itself as the index/hash

Zend_Registry::get(self::PREP_STMT_CACHE_KEY)->set($sql, $preparedStatementHandle);

}

foreach ($ids as $id) {

$preparedStatementHandle->execute(array(‘address_id’ => $id));

$result = $preparedStatementHandle->fetchAll();

$rows = array_merge($rows, $result);

}

Prepared Statements fail

Prepared Statements only exist for the current session, so holding onto a handle after a session closes will lead to failures
Each Prepared Statement consumes a handle from the instance’s Prepared Statement pool, which is the “max_prepared_stmt_count”
Out Of Memory on the client side

Case 1:
A connection closes while a cache of Prepared Statement Handles exists

Solution:
Update your PDO wrapping class to have a __destruct method defined to clear the relevant cache of Prepared Statement Handles before calling the parent’s destruct method.

Case 2:
The max_prepared_stmt_count value is reached on a database

Solution:
Immediately drop all local caches of Prepared Statements and try again. If there is still an issue, activate PDO’s ATTR_EMULATE_PREPARES flag to silently convert calls of ->prepare and ->exec into standard SQL Statements.

Case 3:
Out Of Memory (OOM) on the client-side.

Solution:
Reduce the Prepared Statement Handle Cache size. The cache does not have to be large if it is well managed. Even my company’s complicated webapp’s web requests do not fill a 200-statement-long FIFO cache.

Remaining Points:

Monitor (ie: Nagios) Prepared_stmt_count vs max_prepared_stmt_count
Monitor (ie: StatsD) the Prepared Statement Handle Cache hit, miss, and purge rate.
An LRU’s extra minimal overhead is only worthwhile over a simple FIFO if your Prepared Statement Handle Cache is too small for all the queries that should be cached.
Note: Your cache should be small due to the unbounded and invisible memory consumption of Prepared Statement Handles on the database server
A best-case Prepared Statement instantiation against localhost with a simple select costs me, on average, about 300 microseconds. Pulling a handle from a cache is about 6 microseconds.
Coworkers have shown me that long lists of Named Parameters (ie: “:id1, :id2, :id3, [...]“) get more expensive with quantity whereas long lists of Ordered Parameters (ie: “?,?,?,[...]“) remain cheap even in large number. Numerically quantifying this slowdown will be a future post.
Ordered Parameters’ values are not decoded in SHOW PROCESSLIST. Named Parameters’ values are displayed, however, which makes them, to me, far preferable.

Tuesday, October 30, 2012

Things web developers must know

The idea here is that most of us should already know most of what is on this list. But there just might be one or two items you haven't really looked into before, don't fully understand, or maybe never even heard of.

Interface and User Experience

Be aware that browsers implement standards inconsistently and make sure your site works reasonably well across all major browsers. At a minimum test against a recent Gecko engine (Firefox), a WebKit engine (Safari, Chrome, and some mobile browsers), your supported IE browsers (take advantage of the Application Compatibility VPC Images), and Opera. Also consider how browsers render your site in different operating systems.
Consider how people might use the site other than from the major browsers: cell phones, screen readers and search engines, for example. — Some accessibility info: WAI and Section508, Mobile development: MobiForge.
Staging: How to deploy updates without affecting your users. Ed Lucas's answer has some comments on this.
Don't display unfriendly errors directly to the user.
Don't put users' email addresses in plain text as they will get spammed to death.
Add the attribute rel="nofollow" to user-generated links to avoid spam.
Build well-considered limits into your site - This also belongs under Security.
Learn how to do progressive enhancement.
Redirect after a POST if that POST was successful, to prevent a refresh from submitting again.
Don't forget to take accessibility into account. It's always a good idea and in certain circumstances it's a legal requirement. WAI-ARIA and WCAG 2 are good resources in this area.

Security

It's a lot to digest but the OWASP development guide covers Web Site security from top to bottom.
Know about SQL injection and how to prevent it.
Never trust user input nor anything else that comes in the request (which includes cookies and hidden form field values!).
Hash passwords using salt to prevent rainbow attacks. Use a slow hashing algorithm, such as bcrypt (time tested) or scrypt (even stronger, but newer) (1, 2), for storing passwords. (How To Safely Store A Password)
Don't try to come up with your own fancy authentication system. It's such an easy thing to get wrong in subtle and untestable ways and you wouldn't even know it until after you're hacked.
Know the rules for processing credit cards. (See this question as well)
Use SSL/HTTPS for login and any pages where sensitive data is entered (like credit card info).
How to resist session hijacking.
Avoid cross site scripting (XSS).
Avoid cross site request forgeries (XSRF).
Keep your system(s) up to date with the latest patches.
Make sure your database connection information is secured.
Keep yourself informed about the latest attack techniques and vulnerabilities affecting your platform.
Read The Google Browser Security Handbook.
Read The Web Application Hacker's Handbook.

Performance

Implement caching if necessary, understand and use HTTP caching properly as well as HTML5 Manifest.
Optimize images - don't use a 20 KB image for a repeating background.
Learn how to gzip/deflate content (deflate is better).
Combine/concatenate multiple stylesheets or multiple script files to reduce number of browser connections and improve gzip ability to compress duplications between files.
Take a look at the Yahoo Exceptional Performance site, lots of great guidelines including improving front-end performance and their YSlow tool. Google page speed is another tool for performance profiling. Both require Firebug to be installed.
Use CSS Image Sprites for small related images like toolbars (see the "minimize HTTP requests" point)
Busy web sites should consider splitting components across domains. Specifically...
Static content (i.e. images, CSS, JavaScript, and generally content that doesn't need access to cookies) should go in a separate domain that does not use cookies, because all cookies for a domain and its subdomains are sent with every request to the domain and its subdomains. One good option here is to use a Content Delivery Network (CDN).
Minimize the total number of HTTP requests required for a browser to render the page.
Utilize Google Closure Compiler for JavaScript and other minification tools.
Make sure there’s a favicon.ico file in the root of the site, i.e. /favicon.ico. Browsers will automatically request it, even if the icon isn’t mentioned in the HTML at all. If you don’t have a/favicon.ico, this will result in a lot of 404s, draining your server’s bandwidth.

SEO (Search Engine Optimization)

Use "search engine friendly" URLs, i.e. use example.com/pages/45-article-title instead ofexample.com/index.php?page=45
When using # for dynamic content change the # to #! and then on the server$_REQUEST["_escaped_fragment_"] is what googlebot uses instead of #!. In other words,./#!page=1 becomes ./?_escaped_fragments_=page=1. Also, for users that may be using FF.b4 or Chromium, history.pushState({"foo":"bar"}, "About", "./?page=1"); Is a great command. So even though the address bar has changed the page does not reload. This allows you to use ? instead of #! to keep dynamic content and also tell the server when you email the link that we are after this page, and the AJAX does not need to make another extra request.
Don't use links that say "click here". You're wasting an SEO opportunity and it makes things harder for people with screen readers.
Have an XML sitemap, preferably in the default location /sitemap.xml.
Use when you have multiple URLs that point to the same content, this issue can also be addressed from Google Webmaster Tools.
Use Google Webmaster Tools and Bing Webmaster Tools.
Install Google Analytics right at the start (or an open source analysis tool like Piwik).
Know how robots.txt and search engine spiders work.
Redirect requests (using 301 Moved Permanently) asking for www.example.com to example.com(or the other way round) to prevent splitting the google ranking between both sites.
Know that there can be badly-behaved spiders out there.
If you have non-text content look into Google's sitemap extensions for video etc. There is some good information about this in Tim Farley's answer.

Technology

Understand HTTP and things like GET, POST, sessions, cookies, and what it means to be "stateless".
Write your XHTML/HTML and CSS according to the W3C specifications and make sure theyvalidate. The goal here is to avoid browser quirks modes and as a bonus make it much easier to work with non-standard browsers like screen readers and mobile devices.
Understand how JavaScript is processed in the browser.
Understand how JavaScript, style sheets, and other resources used by your page are loaded and consider their impact on perceived performance. It may be appropriate in some cases to move scripts to the bottom of your pages.
Understand how the JavaScript sandbox works, especially if you intend to use iframes.
Be aware that JavaScript can and will be disabled, and that AJAX is therefore an extension, not a baseline. Even if most normal users leave it on now, remember that NoScript is becoming more popular, mobile devices may not work as expected, and Google won't run most of your JavaScript when indexing the site.
Learn the difference between 301 and 302 redirects (this is also an SEO issue).
Learn as much as you possibly can about your deployment platform.
Consider using a Reset Style Sheet.
Consider JavaScript frameworks (such as jQuery, MooTools, Prototype, Dojo or YUI 3), which will hide a lot of the browser differences when using JavaScript for DOM manipulation.
Taking perceived performance and JS frameworks together, consider using a service such as theGoogle Libraries API to load frameworks so that a browser can use a copy of the framework it has already cached rather than downloading a duplicate copy from your site.
Don't reinvent the wheel. Before doing ANYTHING search for a component or example on how to do it. There is a 99% chance that someone has done it and released an OSS version of the code.

Bug fixing

Understand you'll spend 20% of your time coding and 80% of it maintaining, so code accordingly.
Set up a good error reporting solution.
Have a system for people to contact you with suggestions and criticisms.
Document how the application works for future support staff and people performing maintenance.
Make frequent backups! (And make sure those backups are functional) Ed Lucas's answer has some advice. Have a restore strategy, not just a backup strategy.
Use a version control system to store your files, such as Subversion, Mecurial or Git.
Don't forget to do your Acceptance Testing. Frameworks like Selenium can help.
Make sure you have sufficient logging in place using frameworks such as log4j, log4net or log4r. If something goes wrong on your live site, you'll need a way of finding out what.
When logging make sure you're capture both handled exceptions, and unhandled exceptions. Report/analyse the log output, as it'll show you where the key issues are in your site.

Lots of stuff omitted not necessarily because they're not useful answers, but because they're either too detailed, out of scope, or go a bit too far for someone looking to get an overview of the things they should know. If you're one of those people you can read the rest of the answers to get more detailed information about the things mentioned in this list. If I get the time I'll add links to the various answers that contain the things mentioned in this list if the answers go into detail about these things. Please feel free to edit this as well, I probably missed some stuff or made some mistakes.

original source: stackoverflow.com