Monday, August 30, 2010

Good code reviews are inversely proportional to WTFs/minute in meeting

The company where I work has just started peer code reviews. We conduct them as email pass-around reviews and invite the team to a meeting to gain a better understanding of the changes.

We use source control software whose check-in rules require a code-review sign-off. Nothing big, just the name of another developer who has reviewed the code.

There are definite benefits to code review, as several studies have demonstrated. In our company, code quality has evidently improved: both the number of support calls and the number of reported bugs have decreased. The benefits of code review can include a more stable product and more maintainable code in terms of structure and coding standards, and it lets developers focus on new features rather than “fire-fighting” bugs and other production issues.

There really aren’t any drawbacks if code reviews are conducted “right”. More on the “right way” below.

Some of the hurdles to overcome while implementing code reviews are the idea that “big brother” is watching me and the idea that not having perfect code means torture and pain. Getting developers to trust each other is sometimes difficult, especially when it involves a “pecking order”, “holier than thou” attitudes, or putting your hard work under a microscope. Trust is the key to resolving these issues. A developer must trust that they will not be punished by peers or management for mistakes in code. It happens to everyone. Make a note of the issue, get it resolved and move on.

Scrum

One of the benefits of using the Scrum methodology is that a development cycle (“sprint”) is short. Because the development iterations are smaller, each code review covers a smaller piece of code, which makes it more likely to find problems than hours or days of formal reviews.

“Right Way”

What counts as doing code reviews the “right way” is completely subjective. However, I personally believe that they should be informal reviews. All of the participants in a review should avoid personally attacking the person being reviewed with statements such as “why did you do it that way?” or “what were you thinking?” These types of comments diminish the trust between peers, leading to animosity and hours of arguing over the best or right way to code a solution. Keep in mind that developers do not think or code exactly the same, and there are many solutions to a problem.

Just a little clarification on over-the-shoulder reviews: these can be conducted via remote desktop sharing (pick flavor here) or in person. However, they shouldn’t be limited to the developers only. We typically invite our entire team, which consists of 10-12 developers.

Suggestion

I would highly recommend code reviews. Create the basic rules for each role and implement them as part of your culture to achieve a better quality product. Reviews must become part of your culture so that they are a natural process integrated at all levels, as this is a paradigm shift from poor quality, missed deadlines and frustration to better quality products, less frustration, and more on-time deliverables.

In general, I look for the following in a code review:

Are all methods in a class as short as they could be?
Do they have descriptive names?
Are those methods doing only one thing (actually the thing that their name says they do)?
Did somebody try to justify crappy code with some nice comments?
How well does the code read in general?

In addition, we look at things like:

coupling
cohesion
thread safety (if threads were used)
Boundary Conditions (what is going into and out of a method is what you expect e.g. no nulls)
Code style follows company standards
Code is not duplicated or copied (this can be automated; Google for Simian)
Code complexity (Google “cyclomatic complexity”)
Will the code that has been written cause any knock-on effects in other parts of the system
If class, method and variable names are kept meaningful, a lot of comments should not be necessary
Exception handling is in place and complies with company standards
There are meaningful, relevant unit tests for the code
Logic errors
Conformance to the specification (you have one of those, right?)
Robustness/defensive programming
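
A few of the items above (short single-purpose methods, descriptive names, boundary conditions, defensive programming) can be sketched in a small PHP example. The function name and the validation rules are hypothetical and chosen only for illustration; your own company standards may call for something different.

```php
<?php
// Hypothetical function, used only to illustrate the checklist items above.
// It does one thing, and its name says what that thing is.
function averageResponseTime(array $timesInMs)
{
    // Boundary condition: an empty list would cause a division by zero.
    if (count($timesInMs) === 0) {
        throw new InvalidArgumentException('At least one response time is required.');
    }

    foreach ($timesInMs as $time) {
        // Defensive check: reject nulls, strings and negative values up front.
        if ((!is_int($time) && !is_float($time)) || $time < 0) {
            throw new InvalidArgumentException('Response times must be non-negative numbers.');
        }
    }

    return array_sum($timesInMs) / count($timesInMs);
}

echo averageResponseTime(array(100, 200, 300)), "\n";
```

A reviewer can check each guard against the checklist without needing a comment to explain what the function is for.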

Wednesday, August 25, 2010

PHP Application Affected by the Y2K38 Bug

I don’t want to be too alarmist, but try running the following PHP code on your system:

<?php
$date = '2040-02-01';
$format = 'l j F Y H:i';
$mydate1 = strtotime($date);
echo date($format, $mydate1);
?>

With luck, you’ll see “Wednesday 1 February 2040 00:00” displayed in your browser. If you’re seeing a date in the late 1960s or early 1970s, your PHP application may be at risk from the Y2K38 bug!

What’s the Y2K38 bug?

Y2K38, or the Unix Millennium Bug, affects PHP and many other languages and systems which use a signed 32-bit integer to represent dates as the number of seconds since 00:00:00 UTC on 1 January 1970. The furthest date which can be stored is 03:14:07 UTC on 19 January 2038. Beyond that, the left-most bit is set and the integer becomes a negative decimal number — or a time prior to the epoch.
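
The limit can be checked with a little arithmetic. This is a minimal sketch that formats the largest signed 32-bit value, and the value a wrapped 32-bit counter would hold one second later, as UTC dates. The variable names are mine, and real behaviour on a 32-bit PHP build may differ (for example, strtotime() can return false for an out-of-range date rather than wrapping).

```php
<?php
// Largest value a signed 32-bit integer can hold: 2^31 - 1.
$maxInt32 = 2147483647;

// gmdate() formats a timestamp in UTC, independent of the server's timezone setting.
echo gmdate('Y-m-d H:i:s', $maxInt32), "\n"; // 2038-01-19 03:14:07

// What a wrapped 32-bit counter holds one second later: -2^31.
$wrapped = -2147483648;
echo gmdate('Y-m-d H:i:s', $wrapped), "\n"; // 1901-12-13 20:45:52
```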

Yes, it’s 28 years away and I’m sure many of you think it’s ridiculous to worry about it now. However, developers thought that way about the Millennium bug in the 1970s and 80s. Also, any web application which handles long-term future events could be at risk. For example, a typical mortgage runs for 25 years. Pensions and savings plans can be far longer.

Will 64-bit save us?

Probably. If you’re using a 64-bit OS with a compiled 64-bit edition of PHP, your application shouldn’t be affected. I’d recommend you test it, though. A signed 64-bit number gives a maximum future date which is 21 times greater than the current age of the universe — 292 billion years, give or take a day or two.
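
One quick way to test is to look at the integer size of your PHP build. The helper name below is hypothetical; PHP_INT_SIZE is a standard PHP constant giving the integer width in bytes. Treat this as a rough check rather than a guarantee, since your database or other systems may still store 32-bit timestamps.

```php
<?php
// Hypothetical helper: reports whether this PHP build uses 64-bit integers.
function hasSixtyFourBitIntegers()
{
    // PHP_INT_SIZE is the integer width in bytes: 4 on 32-bit builds, 8 on 64-bit builds.
    return PHP_INT_SIZE >= 8;
}

if (hasSixtyFourBitIntegers()) {
    echo "64-bit integers: timestamps after 2038 should fit.\n";
} else {
    echo "32-bit integers: timestamps after 19 January 2038 will overflow.\n";
}
```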

You can probably sleep at night if you’re convinced your financial application will always be installed on a 64-bit system.

Are there alternative options?

Fortunately, PHP introduced a new DateTime class in version 5.2 (experimental support was available in 5.1, and be aware that some methods were introduced in 5.3):

<?php
$date = '2040-02-01';
$format = 'l j F Y H:i';
$mydate2 = new DateTime($date);
echo $mydate2->format($format);
?>

DateTime does not suffer from Y2K38 problems and will happily handle dates up to December 31, 9999. I might have paid off my mortgage by then!
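
To tie this back to the mortgage example, here is a small sketch that works out the end date of a 25-year term with DateTime. The start date is an assumed value picked only for illustration.

```php
<?php
// Assumed example values: a 25-year mortgage starting on 1 March 2020.
$start = new DateTime('2020-03-01 00:00:00', new DateTimeZone('UTC'));

// Clone so the start date is left untouched; modify() accepts relative formats.
$end = clone $start;
$end->modify('+25 years');

echo $end->format('l j F Y'), "\n"; // Wednesday 1 March 2045
```

The end date falls after January 2038, which is exactly the kind of calculation that is risky with a 32-bit timestamp.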

It may not be worth upgrading existing applications, but you should certainly consider the DateTime class when planning your next project.

Have you experienced Y2K38 problems in your application? How did you fix them?

Thursday, August 19, 2010

Different phases to start a website from scratch

If we were to design a highly scalable website from scratch, what technologies would we use?
Based on Web 2.0 popularity, LAMP seems to be high in the running. We should be looking for the best community support, the fastest development time and, most importantly, the best scaling approach.

In LAMP, the L and the A parts are not the complicated ones, and neither is the M part. It's the P part that you have to think about: PHP, Perl, or Python.
PHP is pretty much the de facto choice but lacks an application server.
Perl is pretty good but harder than PHP.
Python is pretty good but is **** about white space.

Here is my plan, in phases, for starting a website from scratch on a single server for now. Later, I'll scale out horizontally when the need arises.

Phase 1: Single Server, Dual Quad-Core 2.66, 8 GB RAM, 500 GB Disk, RAID 10
OS: Fedora 8. You could go with pretty much any Linux though. I like Fedora 8 best for servers.
Proxy Cache: Varnish - it is way faster than Squid per my own benchmarks. Squid chokes big time.
Web Server: Lighttpd - faster than Apache 2 and easier to configure for me.
Object Cache: Memcached. Very scalable.
PHP Cache: APC. Easy to configure and seems to work fine.
Language: PHP 5
Database: MySQL 5

Phase 2: Max RAM out to 64 GB, cache everything.

Phase 3: Buy load balancer + 2 more servers for front end Varnish/Memcached/Lighttpd. Use original server as MySQL database server.

Phase 4: Depending on load and usage patterns, scale out the database horizontally with an additional server. I don't expect the db to be a bottleneck for the website, as only metadata is stored there. I'll mostly be serving images stored on the file system. Possibly split the Varnish / Memcached / Lighttpd tier into separate tiers if necessary. But I'll carefully evaluate the situation at this point, scale out appropriately and use a CDN for static content if necessary.

Phase 5: Max all servers to 64 GB of RAM, cache, cache, cache everything.

Phase 6: If I get this far then I'm a multi-millionaire already so I'll replace all of the above machines with whatever the latest and greatest is at that time and keep scaling out.

The important point is that I know how to scale each layer when/if the need arises. I'll scale the individual machines when necessary and scale horizontally too.
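
Since so much of the plan comes down to "cache everything", here is a rough sketch of the usual cache-aside pattern for the image metadata. To keep it runnable without a cache server, a small array-backed class stands in for Memcached; the class, the function names and the key format are all hypothetical, and returning false on a miss is modelled on how the Memcached extension's get() behaves.

```php
<?php
// Stand-in for Memcached so the sketch runs without a cache server (assumption).
class ArrayCache
{
    private $items = array();

    public function get($key)
    {
        // Return false on a miss, like the Memcached extension's get().
        return array_key_exists($key, $this->items) ? $this->items[$key] : false;
    }

    public function set($key, $value)
    {
        $this->items[$key] = $value;
        return true;
    }
}

// Hypothetical loader standing in for a MySQL query for image metadata.
function loadImageMetadataFromDatabase($imageId)
{
    return array('id' => $imageId, 'path' => "/images/$imageId.jpg");
}

// Cache-aside: try the cache first, fall back to the database, then populate the cache.
function getImageMetadata($cache, $imageId)
{
    $key = "image_meta_$imageId";
    $cached = $cache->get($key);
    if ($cached !== false) {
        return $cached;
    }

    $metadata = loadImageMetadataFromDatabase($imageId);
    $cache->set($key, $metadata);
    return $metadata;
}

$cache = new ArrayCache();
$first = getImageMetadata($cache, 42);  // miss: loaded from the "database"
$second = getImageMetadata($cache, 42); // hit: served from the cache
```

With a real Memcached instance the calling code would stay the same shape; only the cache object changes.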

What is the difference between a proxy server and a reverse proxy server?

I will limit my discussion to web proxies; however, the idea of a proxy is not limited to web sites.

FORWARD proxy

Most discussion of web proxies refers to the type of proxy known as a "forward proxy."

The proxy event in this case is that the "forward proxy" retrieves data from another web site on behalf of the original requester.

A tale of 3 computers (part I)

For an example, I will list three computers connected to the internet.

X = your computer, or "client" computer on the internet
Y = the proxy web site, proxy.example.org
Z = the web site you want to visit, www.example.net

Normally, one would connect directly from X --> Z.

However, in some scenarios, it is better for Y to connect to Z on behalf of X, which chains as follows: X --> Y --> Z.

Reasons why X would want to use a forward proxy server:

Here is a (very) partial list of uses of a forward proxy server.

1) X is unable to access Z directly because
- a) Someone with administration authority over X's internet connection has decided to block all access to site Z.
  - Examples:
    - The Storm Worm virus is spreading by tricking people into visiting familypostcards2008.com, so the system administrator has blocked access to the site to prevent users from inadvertently infecting themselves.
    - Employees at a large company have been wasting too much time on myspace.com, so management wants access blocked during business hours.
    - A local elementary school disallows internet access to the playboy.com web site.
    - A government is unable to control the publishing of news, so it controls access to news instead, by blocking sites such as wikipedia.org. See TOR or FreeNet.
- b) The administrator of Z has blocked X.
  - Examples:
    - The administrator of Z has noticed hacking attempts coming from X, so the administrator has decided to block X's IP address (and/or netrange).
    - Z is a forum web site. X is spamming the forum. Z blocks X.
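
From X's side, using a forward proxy is an explicit choice in the client's configuration. As a minimal sketch, this is how a PHP script on X could be told to send its HTTP requests through Y using stream context options. The host names come from the example above; the port 8080 is an assumption, and the actual request is left commented out because the hosts are placeholders.

```php
<?php
// Y = the forward proxy from the example; the port 8080 is an assumption.
$proxy = 'tcp://proxy.example.org:8080';

// HTTP context options: send the request to the proxy instead of directly to Z.
$context = stream_context_create(array(
    'http' => array(
        'proxy' => $proxy,
        // Send the full URL in the request line, which many proxies expect.
        'request_fulluri' => true,
    ),
));

// X asks Y to fetch Z. Left commented out because the hosts are placeholders.
// $html = file_get_contents('http://www.example.net/', false, $context);

$options = stream_context_get_options($context);
echo $options['http']['proxy'], "\n"; // tcp://proxy.example.org:8080
```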

REVERSE proxy

A tale of 3 computers (part II)

For this example, I will list three computers connected to the internet.

X = your computer, or "client" computer on the internet
Y = the reverse proxy web site, proxy.example.com
Z = the web site you want to visit, www.example.net

Normally, one would connect directly from X --> Z.

However, in some scenarios, it is better for the administrator of Z to restrict or disallow direct access, and force visitors to go through Y first. So, as before, we have data being retrieved by Y --> Z on behalf of X, which chains as follows: X --> Y --> Z.

What is different this time compared to a "forward proxy," is that this time the user X does not know he is accessing Y. A Reverse Proxy is typically less visible than a "forward proxy", and requires no configuration or special knowledge by the client, X.

The client X probably thinks he is visiting Z directly (X --> Z), but the reality is that Y is the invisible go-between (X --> Y --> Z again).

Reasons why Z would want to set up a reverse proxy server:

1) Z wants to force all traffic to its web site to pass through Y first.
- a) Z has a large web site that millions of people want to see, but a single web server cannot handle all the traffic. So Z sets up many servers, and puts a reverse proxy on the internet that will send users to the server closest to them when they try to visit Z. This is part of how the Content Distribution Network (CDN) concept works.
  - Examples:
    - Apple Trailers uses Akamai
    - jQuery.com hosts its JavaScript files using the CloudFront CDN.
    - etc.
- b) The administrator of Z is worried about retaliation for content hosted on the server, and does not want to expose the main server directly to the public.
  - Examples:
    - Owners of spam brands such as "Canadian Pharmacy" appear to have thousands of servers, while in reality having most websites hosted on far fewer servers. Additionally, abuse complaints about the spam will only shut down the public servers, not the main server.

In the above scenarios, Z has the ability to choose Y.
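
The "send users to the server closest to them" idea can be sketched as a simple lookup that the reverse proxy Y performs before forwarding a request. The backend host names, the region codes and the function name below are all hypothetical; a real CDN or reverse proxy makes this decision with far more information (DNS, latency, load).

```php
<?php
// Hypothetical backend servers behind the reverse proxy Y, keyed by region.
$backends = array(
    'eu' => 'http://eu.backend.example.net',
    'us' => 'http://us.backend.example.net',
);

// Pick the backend for the visitor's region, falling back to a default region.
function chooseBackend(array $backends, $clientRegion, $defaultRegion = 'us')
{
    if (isset($backends[$clientRegion])) {
        return $backends[$clientRegion];
    }
    return $backends[$defaultRegion];
}

echo chooseBackend($backends, 'eu'), "\n";   // http://eu.backend.example.net
echo chooseBackend($backends, 'asia'), "\n"; // http://us.backend.example.net
```

The client X never sees this choice; it only ever talks to Y.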

Sunday, August 1, 2010

Is it time to start using HTML 5

Check out this SlideShare Presentation:

Is it time to start using HTML 5

View more presentations from Ravi Raj.