After seeing a couple of my tweets about analytics and performance, the folks at Pingdom asked me a few questions to put together a blog post about Geekzone performance: how we maintain the site, how we collect data (including real user monitoring and analytics) and what makes the site run.
You can see some interesting information about browser usage and speeds in our State of Browsers on Geekzone March 2013.
We have been using the Pingdom RUM service pretty much from the start of its beta, which began in the first week of January; the service should be out of beta soon.
As part of keeping up with the times, this last weekend I finished moving the Hyper-V VMs behind Geekzone to Windows Server 2012. Someone in our forums was curious about how we could have Geekzone running on a single VM instance with no load balancers and so on, so he asked me to post what's behind our website, how it changed over the years and what we do to keep performance up.
We currently serve around 230,000 pages a day (user requests and AJAX requests for some pages) plus other resources such as images, scripts and CSS files.
When I started Geekzone it was a domain in a shared host service called Ocoloco, provided by a small Masterton-based company called SiliconBlue. In 2003 Auckland-based ISP ICONZ bought Ocoloco and with that they became our hosting providers. Back then we had a single domain running on IIS, Classic ASP and a Microsoft Access database. We were serving 10,000 pages a month after a few months and that was BIG.
Our first project was to move from Microsoft Access to Microsoft SQL, still in the shared environment. We know Microsoft Access doesn't scale well, but back then we never thought we'd be serving more than 10,000 pages a month.
This worked out well until we got big enough that we had to sometimes call our provider and ask them to restart their SQL server two or three times a day, due to the server crashing under our load. ICONZ suggested we should really get our own server (back then virtual environments weren't a big thing).
We bought our first server from ICONZ, an Acer server with 3GB RAM. We installed Windows Server 2003 and Microsoft SQL. An entire server just for us! It worked fine for a few years until we got to the point where our requirements were really pushing the limits of that 32 bit hardware.
HP came into play and we were supplied with an HP ProLiant DL360 server (like the one in the picture above) with 10GB RAM. Loaded with Windows Server 2008 and Hyper-V we had enough to run a VM for Geekzone (IIS/SQL database), a test VM and a monitoring VM.
That's when I started getting serious about performance. While many companies solve their performance problems by installing more hardware we tried to use more of the resources we had available. The monitoring VM runs SQL Sentry and SQL Monitor for database monitoring, cache plan testing and other management tasks. I spent a lot of time optimizing indexes, working the database model and so on.
At this time I also decided to move from a single IIS worker model to multiple workers (an IIS web garden). To get to this point I had to write our session management routines using the SQL database, to allow sessions to persist through the odd server restart (we do restart servers after applying the monthly patches released by Microsoft every second Tuesday of the month) and to persist between IIS workers. I also worked with Redjungle's Phil to separate email notification delivery from the web application, as well as creating a metaweblog API for our blogging platform and a couple of .Net MVC web sites (Geekzone Mobile and Geekzone Jobs).
Another advantage of this approach is the ability to scale out - and it does work well as I found out when migrating our applications from the old Windows Server 2008 VM to the new Windows Server 2012 VM. I was able to move web applications one at a time and sessions worked across different hosts, sharing the database across a Hyper-V private network.
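To make the idea concrete, here is a minimal sketch of database-backed sessions. It is written in Python with SQLite standing in for Microsoft SQL, and it is not the code running on Geekzone; the table name, function names and the 30 minute timeout are all assumptions made up for illustration.

```python
import json
import secrets
import sqlite3
import time

SESSION_TTL_SECONDS = 30 * 60  # hypothetical idle timeout


def open_store(path=":memory:"):
    """Open the shared session database and create the table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sessions ("
        "id TEXT PRIMARY KEY, data TEXT NOT NULL, expires_at REAL NOT NULL)"
    )
    return conn


def create_session(conn, data, now=None):
    """Store session data under a new random id and return that id."""
    now = time.time() if now is None else now
    session_id = secrets.token_urlsafe(32)
    conn.execute(
        "INSERT INTO sessions (id, data, expires_at) VALUES (?, ?, ?)",
        (session_id, json.dumps(data), now + SESSION_TTL_SECONDS),
    )
    conn.commit()
    return session_id


def load_session(conn, session_id, now=None):
    """Return the stored data, or None if the id is unknown or expired."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT data, expires_at FROM sessions WHERE id = ?", (session_id,)
    ).fetchone()
    if row is None or row[1] < now:
        return None
    return json.loads(row[0])
```

Because the session lives in the database rather than in the memory of one worker process, any worker (or any host) that can reach the database and is handed the session id can load the same data, and a worker restart does not lose it.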
Around the time we started playing with performance I got to meet the folks at Aptimize, now Riverbed Aptimizer. Aptimize was a Wellington-based company until Riverbed acquired them in 2011. The software works automatically, examining all pages served from our servers and applying rules that determine how to optimize web pages for best client performance. This includes image sprite creation, script and CSS minification, URL rewrite for CDN resources, lazy loading images, loading async scripts and so on. We started using Aptimizer and it improved page speed almost instantly, so we had time to put a lot of effort into the database side of things, to get everything a step further.
Around 2009 we decided to move our server from ICONZ, mainly due to colocation and traffic costs. We know 60% of our traffic is New Zealand-based, and of those 75% is from Auckland alone, so when the time came for us to move hosting companies we examined a few companies around Auckland and decided to go with Datacom. They were really good at putting together a package for our small one man operation. And so one day we unplugged the server at ICONZ, loaded it into Nate's car and drove across Auckland to its new home. The Datacom datacenter is so huge that I am pretty sure I might not ever see the server again.
The Datacom move was really good, with improved bandwidth giving our users even faster access to our website. But we know a lot of people access Geekzone from outside New Zealand so we started using a CDN to distribute the heavy resources around the world. Initially with MaxCDN (their prices are really good) and lately with Cloudflare. There are two reasons we moved to Cloudflare: they have a POP in Sydney, which is pretty close to New Zealand, so we could move to them with low impact to our users and their Pro plans support SSL for the CDN - which was a problem for us before (we used to have different CDN rewrites for SSL and non-SSL pages, now we have only one).
We do not use Cloudflare for page optimization because that would add unnecessary round trips for the majority of our users. But using Aptimizer together with Cloudflare for CDN we can get our resources closer to users and manage the cache expiry in their browsers and in the ISP's proxies, making everything faster than ever.
Since then we increased memory on the server to 24GB to allow for better memory management as well. And while our Windows Server 2008 was working perfectly well, I decided to move to Windows Server 2012 for a few reasons but mainly because of a faster OS startup, OS support for NIC teaming, and Hyper-V Dynamic Memory. And also because this is Geekzone so why not then?
So that's it. A bit of geek history and things I've done the last few years. More to come (and if you need more information or some help with your current setup, contact me and we can have a chat).
Just finished reading a blog post that shows, once again, that people should use their ISP DNS for better performance when it comes to distributed content.
In New Zealand this is even more important because using a local CDN cache gives broadband users a huge advantage instead of fetching resources overseas through a long undersea cable.
There's a dynamic table where you can check the performance loss/gain depending on which CDN you're targeting. Here is one for Australia:
This table shows how much slower a download will be, based on where the CDN is resolved to.
A positive percentage means performance is worse, negative means performance is better. The first one is Google DNS, the second is OpenDNS.
You see now that using those DNS in Australia (and New Zealand, but unfortunately there's no data in the table for our little country) can make things really bad.
Using your ISP DNS will point to the local cache. Using other DNS will instead point to somewhere else in the world.
Just last week we found out someone is bringing big guns to a fight, as Stuff told us Neil Graham was starting an online marketplace business to compete with the one and only Trade Me.
The new web site, called Wheedle, wasn't ready for prime time when it was first mentioned online, and after a few hours of hiccups it was taken offline until its official launch date, 1st October 2012.
In the brief moments the site was up (and down) Geekzone members started reporting some of the bugs around the site (and here as well). The discussion listed everything from simple things, such as listings showing completely unrelated images, to a more disturbing problem: pages showing someone else's user names and information.
It is great to see since then the mixed up identities problem seems to have been fixed, but other things popped up.
Right now I can imagine some Trade Me folks talking around a whiteboard:
- Tech Guy: We have a problem with Wheedle.
- Non-Tech Manager: Sure, it's a worthy competitor, backed by someone with deep pockets to go for the long run.
- Tech Guy: Not that, but... They store their passwords in plain text, instead of encrypting them before storing in the database.
- Non-Tech Manager: How do you know this?
- Tech Guy: I registered there and just clicked the "Forgot my password". The email came with my password instead of a link to reset it. It tells me the password is stored in plain text.
- Non-Tech Manager: So? That's their problem. If someone finds a vulnerability and manages to download database contents from their server it's their breach of privacy, not ours.
- Tech Guy: Sure. But reports tell us a good number of people reuse the same username and password on more than one site.
- Non-Tech Manager: Are you saying if someone used their same Trade Me email or username and password to register on Wheedle then a bad guy in [insert country with lots of bad guys here] could try those on Trade Me and in some cases actually gain access to accounts?
- Tech Guy: Hmmmm, yes.
- Non-Tech Manager: Holy shit, Batman!
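For the developers reading, the usual fix for the plain-text problem the Tech Guy describes is to store a salted one-way hash rather than the password itself (or a reversibly encrypted copy of it). Here is a minimal sketch using only the Python standard library. I have no idea what stack Wheedle runs on, so the function names and the iteration count are my own assumptions, not anyone's actual implementation.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # hypothetical work factor; tune for your hardware


def hash_password(password):
    """Return (salt, digest) to store in place of the password."""
    salt = secrets.token_bytes(16)  # a fresh random salt per user
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Re-derive the digest from the login attempt and compare it."""
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    # compare_digest avoids leaking information through timing.
    return hmac.compare_digest(digest, expected_digest)
```

With this approach the site cannot email your password back to you, because it never keeps it. A "Forgot my password" request can only send a reset link.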
We can use another scenario: there is something for sale on Trade Me, and armed with a third party list of valid email addresses for the buyer a scammer could send out an email pretending to be the seller on Trade Me, saying something like "the item didn't sell, I can offer it to you very cheap" and then get the unsuspecting buyer to deposit the payment into someone else's account for laundering.
You might say no one would fall for that. Think again. People fall for simple scams all the time.
I don't know what security they have implemented server-side, but sanitizing input data on the client side is no way to go through life:
If this is done on the client side only, then anyone with interest could easily craft a local page to bypass this weak strategy and send something malicious to the server, potentially gaining access to information stored there through SQL Injections.
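What the server should do, regardless of any checks in the browser, is treat every value it receives as data and never as part of the SQL text. Here is a minimal sketch of a parameterized query, using Python and SQLite purely for illustration; the `listings` table and the `find_listings` function are hypothetical and have nothing to do with Wheedle's real schema.

```python
import sqlite3


def find_listings(conn, search_term):
    """Search listing titles without building SQL from user input."""
    # The ? placeholder sends the value separately from the SQL text,
    # so quotes inside search_term cannot change the statement.
    cursor = conn.execute(
        "SELECT title FROM listings WHERE title LIKE ?",
        (f"%{search_term}%",),
    )
    return [row[0] for row in cursor.fetchall()]


# Tiny in-memory database so the sketch can be run as is.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (title TEXT)")
conn.executemany(
    "INSERT INTO listings VALUES (?)", [("Red bike",), ("Blue car",)]
)
```

Searching for `bike` finds the bike listing, while a classic injection attempt such as `' OR '1'='1` is simply treated as an odd search string that matches nothing.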
The question that popped into my mind was "how long before Trade Me forces people logging into their site to change their passwords?". Simply put, any third party vulnerability can affect Trade Me as an unintended consequence.
What can you do?
- If you are planning to register on any other site make sure you use a different email address, user name and password.
- If you already registered on any other site then go there now and change your email address and password.
Just do one of those two things and you will be a lot safer.
And for those on Twitter who said we shouldn't be criticising newcomers: I'm happy to support a new online marketplace in New Zealand, but security should be part of the design since Day 0. I hope this is something for them to consider, and good luck in the days ahead.
Another advertising order for Geekzone, another reason to be happy. But I'm actually sad - sad for my readers and advertisers.
You probably know by now I try to get maximum performance out of the servers we use. I also work hard, using different software, services and techniques to get the site as fast as possible.
Many people use ad blockers for different reasons. Some say they find the ads slow down their PCs, others say ads may be a vector for malware. Some say ads slow down web page load times.
Assuming we are hosting the creatives (ads) with Google DFP, a single call is all that's needed to get the image and the parameters to show it on the page.
If the advertiser is using DoubleClick (a Google company agencies use to manage a campaign workflow), Google is smart enough to get the ads out exactly like it would do with hosted creatives - that is, in a single call.
Between advertisers and publishers there's almost always an agency that represents the publisher, trying to sell available inventory. These agencies get paid a commission on each sale they manage to complete. They also like to know how many impressions and clicks campaigns are getting. As a publisher using Google DFP I can easily give agencies access to real-time reports for their campaigns. But I haven't seen any agency that takes advantage of this feature.
Instead, these agencies load the tags supplied by the advertisers into their own systems. In turn they give the publishers their own tags. And we obviously need to load our own scripts to manage the delivery.
So instead of having one script that loads an ad with a single call (the Google DFP and Google DoubleClick integration), we have a script that loads the agency script that in turn loads the advertiser script that in turn loads the ads.
This adds incredible latency to the whole ad delivery system. Usually these ad agencies don't have servers closer to end users. They don't use CDNs. Things get slow. And when things get slow users navigate away. And when users navigate away they don't see the ads.
For all purposes Google DFP delivered the code and counted one impression. But by the time the browser loaded the second script and is waiting to load the third script the user might have closed the window or clicked a link to go away. So the agency doesn't count the impression. Then they complain there's a difference between my counter and their counter.
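The counter mismatch is easy to see with a little arithmetic. Here is a small sketch in Python, with completely made-up latencies (I have not measured any agency's servers): each script in the chain can only start after the previous one has finished, so the delays add up, and each party only counts an impression if its own hop completes before the user leaves.

```python
from itertools import accumulate


def hops_completed(hop_latencies_ms, user_leaves_after_ms):
    """Count how many chained script loads finish before the user leaves."""
    # Each hop starts only when the previous one ends, so finish times
    # are the running total of the individual latencies.
    finish_times = accumulate(hop_latencies_ms)
    return sum(1 for t in finish_times if t <= user_leaves_after_ms)


# Hypothetical numbers: publisher tag, agency tag, advertiser tag.
chain_ms = [80, 400, 600]
```

With these hypothetical numbers, a user who leaves after 500 ms has completed two hops: my ad server and the agency have both seen a request, but the third script never ran and the ad itself was never shown.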
Another important thing: Google DFP is smart enough to deliver more impressions of those ads that perform better. In other words, if the advertiser supplies more than one ad then Google DFP will make sure it shows more of the ads getting a higher number of clicks. If we run an agency tag we lose control and can't count the clicks, meaning all ads are delivered in a balanced manner. This means the optimization that could benefit the advertiser and attract more clicks is lost.
In the end advertisers lose the opportunity to get more clicks, our readers see pages slowing down, and agencies act as a middle man that is really trying to do more than they should, getting technical where they don't have the capability and don't actually add any value.
This is not a rant at one specific agency. Most agencies work like this. They just don't understand that a fast web means more business for everyone.
Last weekend a press release landed in my inbox, and I thought it interesting enough to make me contact the agency and get more information about the product. In summary ScaleArc iDB promised to scale your database without changes in code or database itself:
ScaleArc, the pioneer in a new category of database infrastructure that accelerates application development by simplifying the way database environments are deployed and managed, today announced general availability of iDB v2.0 for Microsoft SQL Server that brings significant new capabilities to SQL Server environments such as instant horizontal scaling, higher availability, faster database performance, increased SQL protection and real-time query analytics. iDB takes a fundamentally different approach at the SQL protocol layer by providing customers with a wide spectrum of capabilities for their database environment in a single solution, without requiring any modifications to existing applications or SQL Server databases.
Until now, moving to advanced architectures like multi-master, or achieving instant scale and better performance within SQL server environments, has been costly and extremely difficult to implement. iDB v2.0 for MS SQL supports a wide range of functions including Read/Write splitting, dynamic load balancing and horizontal scaling, query caching for up to 24x faster query responses, wire-speed SQL filtering and real-time instrumentation and analytics to enhance all deployment modes of SQL server, including SQL Server Clustering, SQL mirroring, Peer-to-Peer (P2P) Replication and log shipping.
iDB for MS SQL Feature Highlights
- Dynamic Query Load Balancing for High Availability: ScaleArc iDB implements a specialized dynamic load-balancing algorithm that allows the most efficient utilization of available database capacity, even when servers have varying capacity. iDB monitors query responses in real-time and can load balance queries to the server that will provide the fastest response to properly distribute the load. Up to 40% better performance has been observed with iDB's dynamic load balancing relative to TCP-based load balancing.
- Pattern-Based Query Caching for Increased Performance: ScaleArc iDB allows users to cache query responses with one-click. No changes are required at the database server or in the application code; the query is cached at the SQL protocol level, providing up to 24x acceleration without any modifications.
- Multi-master: iDB supports multi-master and master-slave scenarios to ensure high availability and scalability. Specific queries, irrespective of their origin, are routed to the right server with the advance query routing engine that also simplifies sharding.
- Real-time Analytics: Advanced graphical analysis tools provided by ScaleArc iDB bring comprehensive real-time awareness of all queries, helping to quickly pinpoint query patterns that are not performing optimally and allowing more precise management.
- Wire Speed SQL filtering: iDB is able to enforce query-level policies for security or compliance reasons to protect against attacks, theft and other threats. iDB can operate outside of the application where policies have not traditionally been easily enforced.
- SQL Query Surge Queue: Extreme loads can lead to unacceptable response times or even halting of operations until the load reduces, leading to "Database not Available" errors. ScaleArc iDB allows a more graceful response to peak loads. When faced with an extreme load, ScaleArc iDB can initiate a SQL Query Surge Queue and momentarily queue queries in a FIFO queue and process them once server resources become available.
Obviously I was a bit wary of their claims, so I asked a couple of questions. Here are the answers:
What happens to cached query results when the result changes? For example a record is updated - will the next query use previous results, or get new results?
The key to iDB lies in our Analytics. We provide granular real-time data on all SQL queries flowing between application servers and the database servers. As such, customers now have the intelligence they need to understand the query structure, the frequency it hits the database, the amount of server resources it takes, etc. We then give the customer the power to cache on a per query basis, but we do not set a Time-To-Live for the customer. They need to understand how often the query will be updated, and ensure they do not set a Time-To-Live that may serve stale data if an Update comes in from the Application. We allow customers to set TTL anywhere from 1 second to multiple years. When a cache rule for a query is activated with a single click of a button, we immediately measure the performance and offload impact of the cache. And since our cache on iDB is a hash map that caches the TCP output of Read queries, subsequent Read queries served from our cache are served up to 24x faster (or more).
ScaleArc also has an API that can be invoked from the application to add, invalidate and bypass the cache for specific SQL statements.
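To picture what that answer means in practice, here is a minimal sketch of a per-query cache with a TTL and explicit invalidation, written in Python. It is only my illustration of the general technique described in the answer (a hash map keyed by the query text); it is not ScaleArc's code or API, and the class and method names are invented.

```python
import time


class QueryCache:
    """Hash map from SQL text to (result, expiry time)."""

    def __init__(self, clock=time.monotonic):
        self._entries = {}
        self._clock = clock  # injectable so the expiry logic can be tested

    def get(self, sql):
        """Return the cached result, or None if missing or expired."""
        entry = self._entries.get(sql)
        if entry is None:
            return None
        result, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[sql]
            return None
        return result

    def put(self, sql, result, ttl_seconds):
        """Cache a read result for ttl_seconds."""
        self._entries[sql] = (result, self._clock() + ttl_seconds)

    def invalidate(self, sql):
        """Drop a cached result, e.g. right after the application updates."""
        self._entries.pop(sql, None)
```

The sketch also shows where my original worry comes from: nothing here notices that the underlying record changed. Until the TTL runs out or the application calls `invalidate`, `get` keeps returning the old result.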
How much more memory does it require? Or does it use the SQL DB footprint?
ScaleArc iDB is a Network appliance like deployment and does not have any agents on the Server or the Application. This would mean that iDB has its own physical/virtual machine to perform its operations. iDB can run load balancing within 4GB of memory, however for caching and logging purposes iDB can address up to 128GB of memory.
iDB is a separate instance from the database. Most customers run our software on a dedicated x86 server to make it a dedicated appliance. We also sell appliances, or iDB can be installed on a hypervisor as a Virtual Machine. iDB does not require a lot of memory to operate, but we can allocate up to 128GB of RAM for caching of READ queries. Query logs are stored on drives on the appliance.
Very interesting - an appliance for SQL TCP output caching. Ok, I have entered my name in to get a 30 day trial and see how much difference it can actually make.
UPDATE: Someone on Facebook said this was advertising. IT IS NOT. I was not asked to post about it, and did not receive any payment to post about it. If you are so inclined please read my FULL DISCLOSURE post.
Learn how the Riverbed performance platform can help you up your IT game
With the growth of virtualization, consolidation, and cloud computing have come new challenges. IT is increasingly consolidated and virtualized while workers and consumers are distributed. How best to harness these approaches and deliver the efficiency and control your organization requires, while ensuring that end users get the performance they need?
Attend the Riverbed Performance Summit to find out how Riverbed empowers enterprises like yours with the tools to analyze, accelerate, and control your IT. Stay on top of the latest technology and solutions from Riverbed and join us for a deep dive into our vision for delivering performance for the globally connected enterprise.
Sign up to connect with Riverbed technology experts and your peers to learn how you can get more out of your Riverbed investment.
At this exclusive event you'll hear firsthand from our experts on how to maximize your Riverbed investment with the latest release of cutting-edge performance platform products and solutions:
- Granite, our revolutionary new product for consolidating edge servers in the data center
- Getting the most out of the latest release of RiOS (7.0), including optimization for video, UDP, IPv6, and VDI
- Steelhead Cloud Accelerator, a new powerful solution for boosting the performance of SaaS applications
- The latest product updates, technical overviews, demos, and more
Register now and discover how to make the Riverbed performance platform work for you. Find out how you can finally consolidate your entire infrastructure, including edge applications, servers and storage to the data center, all without compromising performance.
A shame I won't be attending this event since it falls on the same week I will be in Las Vegas for the HP Discover 2012.
At the end of the day, what you want is a faster loading web site that will help your company achieve an objective.
For example, when I started working to make Geekzone a faster web site, our metrics included reduced web page load time, an increased number of repeat visitors, increased time spent on site and an increased number of page views - we don't sell a "product", we sell advertising after all, so those were the important metrics for us.
Using tools like WebPageTest allowed us to measure the time a web page takes to load in different parts of the world. Even though 40% - 45% of our traffic is New Zealand-based, we still have a large number of visitors coming from overseas (including the United States, Australia, Canada, the United Kingdom and India).
A couple of years ago our average web page load time was around 10 seconds for a visitor coming from the US. By following through with changes in our database, backend scripts, hosting provider and CDN, we managed to reduce the web page load time to around 6.5 seconds on average when measured from Dulles, VA.
With automatic web optimization software (in our case Riverbed Stingray Aptimizer) we managed to reduce the time even further to 4.5 s as you can see in the image below, captured from a WebPageTest run earlier today:
If you are in New Zealand our web page load times are even lower, on average 1.5 seconds for a complete page to be ready to be used.
In another post I will talk about each of the items we touched when improving performance on Geekzone - make sure to subscribe to my RSS feed. Of course if you run a web site and think a Web Performance Optimization project could help you improve metrics, please contact me and we can work on this.
Continuing my series of posts about Web Performance Optimization (WPO), here is another thought: use a Content Delivery Network (CDN) to speed up web pages and save money.
Even though bits travel fast, it all comes down to distance and number of bits. The closer you are to your users, the faster your web pages will load. That's where CDNs help us, web site owners. While a robust web site might have geographically distributed content servers for performance and redundancy, maintaining this infrastructure comes at a cost.
CDNs provide a balanced distribution platform that allows content providers to store resources closer to their clients, making everything a bit faster. Here at Geekzone we currently use MaxCDN, but have also played with Fastly and Amazon Cloudfront. We currently have a mixed DNS and CDN solution (which I will expand on in another post).
CDNs can be used in many different ways. The most common are Push and Pull. With Push CDNs you are responsible for loading your web resources to their servers, while Pull CDNs will automatically retrieve your web resources from a nominated origin server when a request first comes in.
Below is the stats panel for one of our CDN configurations with MaxCDN, where you can see how the content is distributed through the nodes and how much data is used up every day:
And below you can see the traffic (in number of hits) including cache hits and non-cache hits:
Coming from New Zealand, where data traffic is usually one of the highest costs in a web site operation, CDNs have the side effect of helping web site owners save on traffic. You can see that our CDN serves something between 400 MB and 1.2 GB a day, depending on traffic, with 90% cache hits. This means 90% of the requests are served from the CDN caches directly, without ever reaching our servers.
CDN configuration can be as simple as just creating new DNS records pointing a resource domain to the CDN subdomain created for your specific configuration. If your web site doesn't currently use a separate domain for serving up those resources (images, scripts, CSS, static HTML) there are solutions that can automatically rewrite those when a page is requested.
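As a rough idea of what such an automatic rewrite does, here is a small Python sketch that points root-relative static resources at a CDN hostname. The hostname `cdn.example.com`, the extension list and the function name are assumptions for illustration only. Real products do far more than this (query strings, inline CSS and so on are not handled here).

```python
import re

CDN_HOST = "cdn.example.com"  # hypothetical CDN hostname
STATIC_EXTENSIONS = ("png", "jpg", "gif", "css", "js")

# Match src="/path/file.ext" or href="/path/file.ext" for static files.
# (?!/) skips protocol-relative URLs such as //other.example/file.js.
_PATTERN = re.compile(
    r'(?P<attr>\b(?:src|href)=")(?P<path>/(?!/)[^"]+\.(?:%s))"'
    % "|".join(STATIC_EXTENSIONS)
)


def rewrite_for_cdn(html, scheme="https"):
    """Prefix root-relative static resource URLs with the CDN host."""
    return _PATTERN.sub(
        lambda m: f'{m.group("attr")}{scheme}://{CDN_HOST}{m.group("path")}"',
        html,
    )
```

Links to ordinary pages are left alone, so only the heavy static resources are fetched through the CDN.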
When using a CDN it's important to make sure your web resources are configured with an appropriate cache expiry and with public caching. If this is not possible to configure on your server, there's always a setting on the CDN that will allow you to override the settings from the origin server with new default values.
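In HTTP terms this comes down to two response headers on your static resources. The sketch below, in Python, builds them for a given lifetime. The function name is mine and the lifetime you pick is up to you; how you actually attach the headers depends on your web server.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def static_cache_headers(max_age_seconds, now=None):
    """Build headers that let browsers, proxies and CDNs cache a resource."""
    now = datetime.now(timezone.utc) if now is None else now
    expires = now + timedelta(seconds=max_age_seconds)
    return {
        # "public" allows shared caches (ISP proxies, CDN nodes) to store it.
        "Cache-Control": f"public, max-age={max_age_seconds}",
        # Expires carries the same deadline as an HTTP date in GMT.
        "Expires": format_datetime(expires, usegmt=True),
    }
```

For example, a one hour lifetime produces `Cache-Control: public, max-age=3600` plus an `Expires` date one hour ahead.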
In another post I will talk about latency - make sure to subscribe to my RSS feed. Of course if you run a web site and think a Web Performance Optimization project could help you improve metrics, please contact me and we can work on this.
Continuing my series of posts about Web Performance Optimization (WPO), here is a thought: focus on high impact web pages first. This might seem obvious when you read it, but from my experience most people don't actually put limits on a WPO project and over time the benefits are diluted.
The first thing to do is to identify possible candidates for a WPO project. In a previous project we found out one single script was hit with requests 80% of the time. We (the web site owner and myself) decided to concentrate efforts on this web page first.
Basically, we apply the Pareto Principle and concentrate our efforts on that page responsible for 80% of the total requests using only 20% of the overall time of a full WPO project, with more immediate results. We then have time to concentrate on the other 20% of pages which could take up to 80% of the project time, if needed.
Obviously if you have a page that is hit only a few times a day but still manages to bring the whole web site down, then this should be looked at too.
The tools of choice for this part of the project are web site analytics (Google Analytics is my favourite one - it's free!). Data needs to be collected for a while to help determine the exact focus of the sub project.
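Once you have per-page request counts out of your analytics tool, picking the candidates is simple. Here is a small Python sketch that returns the busiest pages until a target share of all requests is covered. The function name, the page names and the numbers in the example are invented for illustration.

```python
def pages_covering(request_counts, target_share=0.8):
    """Return the busiest pages that together reach target_share of requests."""
    total = sum(request_counts.values())
    selected, covered = [], 0
    # Walk pages from most to least requested.
    for page, count in sorted(
        request_counts.items(), key=lambda kv: kv[1], reverse=True
    ):
        if total == 0 or covered / total >= target_share:
            break
        selected.append(page)
        covered += count
    return selected


# Hypothetical counts exported from an analytics report.
counts = {"/forum.asp": 800, "/news.asp": 120, "/blog.asp": 50, "/about.asp": 30}
```

With these made-up counts, a single page already covers 80% of the requests, which is the kind of result that tells you where to start.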
Once a web page is selected then a holistic approach takes place. Waterfall diagrams (I will talk about these in another post later) can be used to determine the balance of back end and browser side load times, helping determine which side needs more urgent attention. Scripts can be used to monitor events and report back with signals that can be used to determine specific areas causing slow rendering on the client side.
I will keep posting in this series - make sure to subscribe to my RSS feed. Of course if you run a web site and think a Web Performance Optimization project could help you improve metrics, please contact me and we can work on this.