Immediately after the changes I used Google Webmaster Tools and Bing Webmaster Tools to let search engine crawlers know about this change. Pretty happy with how things are going:
Googlebot crawling the new HTTPS domain:
Search results showing the old HTTP URLs:
Search results now showing the new HTTPS URLs (the line before the big uptick is the content pages already served over HTTPS, before the whole site changed):
Up until now we only used SSL for login, registration, private messages and profile pages plus assets (images, CSS and scripts).
Now everything is covered.
I started using SSL many years ago and had wanted to have the site fully served over HTTPS for quite a while. I started by enforcing HTTPS on some content-sensitive pages and moving assets to HTTPS domains, including redirects to ensure clients used the correct scheme. Last week I deployed an update for Geekzone Mobile to make sure it worked over HTTPS, and yesterday I did the same for the full desktop version of the site.
Also included in this change is the addition of the "Secure" flag to cookies used on these domains. This ensures cookies only travel between the client browser and the server over a secure connection. If anyone requests http://www.geekzone.co.nz instead of https://www.geekzone.co.nz the server will instruct the browser to redirect to the correct location, while the browser knows not to disclose the cookies until the secure connection is established. This is essential to avoid session hijacking (unless, of course, we are talking about MITM attacks).
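For the curious, here is a minimal sketch of those two pieces in Python. It is for illustration only (Geekzone itself runs on IIS, so this is not our actual code), and the cookie name `sid` and the choice of a 301 permanent redirect are assumptions.

```python
from http.cookies import SimpleCookie
from urllib.parse import urlsplit, urlunsplit


def session_cookie_header(name: str, value: str) -> str:
    """Build a Set-Cookie value with the Secure and HttpOnly flags set."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["secure"] = True    # browser only sends it over HTTPS
    cookie[name]["httponly"] = True  # not readable from page scripts
    cookie[name]["path"] = "/"
    # OutputString() gives "name=value; ..." without the "Set-Cookie:" prefix
    return cookie[name].OutputString()


def https_redirect(url: str):
    """Return (status, location) for a plain-HTTP request, or None if already HTTPS."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        return None
    # Keep host, path and query; only the scheme changes (301 is an assumption)
    return 301, urlunsplit(("https", parts.netloc, parts.path, parts.query, parts.fragment))
```

Because the cookie carries `Secure`, the initial `http://` request that triggers the redirect goes out without it; the browser only attaches the cookie once it is talking HTTPS.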
Why go to all this trouble for a forum? Because we have lots of industry people (mainly from telcos, but from other companies too) using the site. Account numbers, PINs and passwords are sometimes sent via our private message system (which has been served over HTTPS for quite a while), so it makes sense to extend this to the whole site.
In addition to this, for the last few months I have been using ThisData to collect, analyse and understand user behaviour around the site, in real-time, to quickly determine if an account could've been compromised. Up until now we were using it in "read mode" and tracking notifications. Last week I changed the webhook/API to actually start closing sessions and blocking IP addresses if a user confirms a breach occurred.
ThisData receives millions of transaction reports (login, logout, forum post, message sent, message read, password change, new registration, avatar change, invalid password, etc) from us every month and uses machine learning to observe and assign a "risk" to each transaction. Based on this risk result our forum software can take different actions to protect our users, like the ones I described in the previous paragraph.
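To make the "risk to action" idea a bit more concrete, here is a hypothetical sketch. The `RiskEvent` fields, the 0 to 1 risk scale, the `notify_user` action and the threshold are all my assumptions for illustration; they are not ThisData's actual webhook payload or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RiskEvent:
    # Hypothetical shape of a risk notification; NOT ThisData's real payload.
    user_id: int
    ip: str
    risk: float                       # assumed scale: 0.0 (normal) to 1.0 (very risky)
    user_confirmed_breach: bool = False


def decide_actions(event: RiskEvent, notify_at: float = 0.5) -> List[str]:
    """Map a scored event to forum-side actions (threshold is illustrative)."""
    if event.user_confirmed_breach:
        # Only act destructively once the user confirms the account was breached
        return ["close_sessions", "block_ip"]
    if event.risk >= notify_at:
        return ["notify_user"]
    return []
```

Keeping the destructive actions behind the user's confirmation mirrors the approach described above: the score raises a flag, the user makes the call.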
I have also added a Geekzone ruleset to the HTTPS Everywhere project. This ensures that browsers using the HTTPS Everywhere add-on will know to use the HTTPS scheme instead of HTTP even if the source explicitly refers to the HTTP version (including references to any Geekzone resource served in non-Geekzone pages). This is important because Cloudflare also uses the same ruleset when doing automatic HTTPS upgrades for some of their millions of clients around the Internet.
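HTTPS Everywhere rules are essentially regex rewrites from an `http://` pattern to an `https://` target. The Python sketch below mimics that idea. The pattern is an illustrative guess rather than the ruleset as submitted, and Python's `\1` stands in for the `$1` backreference style the rulesets use.

```python
import re

# Illustrative rule in the spirit of <rule from="^http://(www\.)?geekzone\.co\.nz/"
# to="https://$1geekzone.co.nz/"/>; the real ruleset may differ.
RULES = [
    (re.compile(r"^http://(www\.)?geekzone\.co\.nz/"), r"https://\1geekzone.co.nz/"),
]


def upgrade(url: str) -> str:
    """Rewrite a URL to HTTPS if any rule matches; otherwise return it unchanged."""
    for pattern, replacement in RULES:
        if pattern.match(url):
            # An unmatched optional group expands to "" (Python 3.5+)
            return pattern.sub(replacement, url, count=1)
    return url
```

URLs on other domains, or ones already on HTTPS, pass through untouched.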
We also use other platforms to prevent spammers and scammers from joining the site. The odd one can sometimes get past all this protection, but our moderator team is pretty quick to act and our community is really good at reporting suspicious behaviour.
There is a lot more to be done, for sure. But it feels good when all this falls into place.
This is a very interesting read about Google Chromium cache performance.
For example: "How long do you think it takes for an average Windows Chrome user to fill up the browser cache? Well, for those users who filled up their cache, 25% of them fill it up in 4 hours. 50% of them fill it up within 20 hours. 75% of them fill it up within 48 hours. Now, that's just wall clock time...but how many hours of "active" browsing does it take to fill the cache? 25% in 1 hour, 50% in 4 hours, and 75% in 10 hours. Wow. That seems really quick to me. Remember though, every resource goes into the cache, in order to support back-forward navigation."
Now this part is frightening: "So, a quickly filled up cache is a one reason why servers perceive a lower than expected cache hit rate. While chatting with Ricardo, he drew my attention to a few other anomalies in our metrics. First, a surprisingly high number of users like to clear their cache. Around 7% of users will clear their cache (via chrome://settings) at least once per week. Furthermore, 19% of users will experience fatal cache corruption at least once per week, thus requiring nuking the whole cache. Wow, the cache gets wiped, either explicitly by the user, or due to corruption, for a large chunk of our user base. We definitely need to investigate what's up with all this cache corruption."
I just looked back at the annual State of Browsers on Geekzone March 2013 and, compared to current stats, Google Chrome has gone up to 44%, Firefox has gone down to 22% and Internet Explorer has gone down to 18%, in only seven months. That’s a huge shift towards Google Chrome.
How do you folks think this impacts someone using Chrome in terms of perceived performance? Have you ever noticed any performance change over the course of weeks when using Chrome? And with Internet Explorer 11 for Windows 7 available now (which includes performance improvements, better compatibility, SPDY support and more), how is this going to affect things?
After seeing a couple of my tweets about analytics and performance, the folks at Pingdom asked me a few questions to put together a blog post about Geekzone performance: how we maintain the site, how we collect data (including real user monitoring and analytics) and what makes the site run.
You can see some interesting information about browser usage and speeds in our State of Browsers on Geekzone March 2013.
We have been using the Pingdom RUM service pretty much since the start of the beta, which was released in the first week of January; it should be out of beta soon.
As part of keeping up with the times, this last weekend I finished moving the Hyper-V VMs behind Geekzone to Windows Server 2012. Someone in our forums was curious about how we could have Geekzone running on a single VM instance with no load balancers and so on, so he asked me to post what's behind our website, how it has changed over the years and what we do to keep performance up.
We currently serve around 230,000 pages a day (user page requests plus AJAX requests for some pages), plus other resources such as images, scripts and CSS files.
When I started Geekzone it was a domain on a shared hosting service called Ocoloco, provided by a small Masterton-based company called SiliconBlue. In 2003 Auckland-based ISP ICONZ bought Ocoloco, and with that they became our hosting provider. Back then we had a single domain running on IIS, Classic ASP and a Microsoft Access database. After a few months we were serving 10,000 pages a month, and that was BIG.
Our first project was to move from Microsoft Access to Microsoft SQL Server, still in the shared environment. We knew Microsoft Access doesn't scale well, but back then we never thought we'd be serving more than 10,000 pages a month.
This worked out well until we got big enough that we sometimes had to call our provider and ask them to restart their SQL server two or three times a day, due to the server crashing under our load. ICONZ suggested we should really get our own server (back then virtual environments weren't a big thing).
We bought our first server from ICONZ, an Acer server with 3GB RAM. We installed Windows Server 2003 and Microsoft SQL Server. An entire server just for us! It worked fine for a few years, until our requirements were really pushing the limits of that 32-bit hardware.
HP came into play and supplied us with an HP ProLiant DL360 G5 server (like the one in the picture above) with 10GB RAM. Loaded with Windows Server 2008 and Hyper-V, we had enough to run a VM for Geekzone (IIS and SQL database), a test VM and a monitoring VM.
That's when I started getting serious about performance. While many companies solve their performance problems by adding more hardware, we tried to make better use of the resources we already had. The monitoring VM runs SQL Sentry and SQL Monitor for database monitoring, cache plan testing and other management tasks. I spent a lot of time optimizing indexes, reworking the database model and so on.
At this time I also decided to move from a single IIS worker model to multiple workers (an IIS web garden). To get to this point I had to write our session management routines using the SQL database, to allow sessions to persist across the odd server restart (we restart servers after applying the monthly patches Microsoft releases on the second Tuesday of each month) and to allow sessions to be shared between IIS workers. I also worked with Redjungle's Phil to separate email notification delivery from the web application, as well as to create a MetaWeblog API for our blogging platform and a couple of .NET MVC web sites (Geekzone Mobile and Geekzone Jobs).
Another advantage of this approach is the ability to scale out, and it does work well, as I found out when migrating our applications from the old Windows Server 2008 VM to the new Windows Server 2012 VM. I was able to move web applications one at a time and sessions worked across the different hosts, sharing the database over a Hyper-V private network.
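Both the web garden and the one-host-at-a-time migration work for the same reason: no session lives in any single worker's memory. A minimal sketch of the pattern, with `sqlite3` standing in for Microsoft SQL Server, a hypothetical `sessions` table and an assumed 20-minute lifetime; our real routines are Classic ASP, not Python.

```python
import json
import sqlite3
import time


class SqlSessionStore:
    """Sessions kept in a shared database so any worker (or host) can read them.

    sqlite3 stands in for Microsoft SQL Server; table and column names are hypothetical.
    """

    def __init__(self, conn: sqlite3.Connection, ttl_seconds: int = 1200):
        self.conn = conn
        self.ttl = ttl_seconds
        conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions ("
            " session_id TEXT PRIMARY KEY, data TEXT NOT NULL, expires REAL NOT NULL)"
        )

    def save(self, session_id: str, data: dict) -> None:
        # Upsert so repeated saves from any worker simply overwrite the row
        self.conn.execute(
            "INSERT OR REPLACE INTO sessions (session_id, data, expires) VALUES (?, ?, ?)",
            (session_id, json.dumps(data), time.time() + self.ttl),
        )
        self.conn.commit()

    def load(self, session_id: str):
        row = self.conn.execute(
            "SELECT data, expires FROM sessions WHERE session_id = ?", (session_id,)
        ).fetchone()
        if row is None or row[1] < time.time():
            return None  # unknown or expired session
        return json.loads(row[0])
```

Two stores sharing one connection behave like two workers: a session saved by one is immediately visible to the other, and a server restart loses nothing that was already committed.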
Around the time we started working on performance I got to meet the folks at Aptimize, now Riverbed Aptimizer. Aptimize was a Wellington-based company until Riverbed acquired it in 2011. The software works automatically, examining all pages served from our servers and applying rules that determine how to optimize web pages for the best client performance. This includes image sprite creation, script and CSS minification, URL rewriting for CDN resources, lazy loading of images, asynchronous script loading and so on. We started using Aptimizer and it improved page speed almost instantly, which gave us time to put a lot of effort into the database side of things and take everything a step further.
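Aptimizer does this automatically and far more thoroughly, but the core of "URL rewriting for CDN resources" can be sketched in a few lines. The CDN hostname is a placeholder, and running a regex over HTML is a simplification a real optimizer would not rely on.

```python
import re

# Hypothetical CDN hostname; a real setup uses the host the CDN provider assigns.
CDN_HOST = "https://cdn.example.net"
# Root-relative src/href values that end in a static asset extension
ASSET = re.compile(r'(src|href)="(/[^"]+\.(?:png|jpe?g|gif|css|js))"', re.IGNORECASE)


def rewrite_for_cdn(html: str) -> str:
    """Point static asset URLs at the CDN, leaving links to pages untouched."""
    return ASSET.sub(lambda m: f'{m.group(1)}="{CDN_HOST}{m.group(2)}"', html)
```

Images, scripts and stylesheets move to the CDN host, while links to dynamic pages such as `/forums.asp` keep pointing at our own server.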
Around 2009 we decided to move our server from ICONZ, mainly due to colocation and traffic costs. We know 60% of our traffic is New Zealand-based, and of that 75% is from Auckland alone, so when the time came to move hosting companies we examined a few companies around Auckland and decided to go with Datacom. They were really good at putting together a package for our small one-man operation. And so one day we unplugged the server at ICONZ, loaded it into Nate's car and drove across Auckland to its new home. The Datacom datacenter is so huge that I am pretty sure I might never see the server again.
The Datacom move was really good, with improved bandwidth giving our users even faster access to our website. But we know a lot of people access Geekzone from outside New Zealand, so we started using a CDN to distribute the heavy resources around the world: initially MaxCDN (their prices are really good) and lately Cloudflare. There are two reasons we moved to Cloudflare. First, they have a POP in Sydney, which is pretty close to New Zealand, so we could move to them with low impact on our users. Second, their Pro plans support SSL for the CDN, which was a problem for us before (we used to have different CDN rewrites for SSL and non-SSL pages; now we have only one).
We do not use Cloudflare for page optimization because that would add unnecessary round trips for the majority of our users. But by using Aptimizer together with Cloudflare as a CDN we can get our resources closer to users and manage cache expiry in their browsers and in ISP proxies, making everything faster than ever.
Since then we have increased memory on the server to 24GB to allow for better memory management. And while Windows Server 2008 was working perfectly well, I decided to move to Windows Server 2012 for a few reasons, mainly faster OS startup, OS support for NIC teaming, and Hyper-V Dynamic Memory. And also because this is Geekzone, so why not?
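Managing cache expiry in browsers and ISP proxies mostly comes down to the response headers sent with static resources. A sketch, assuming a 30-day lifetime (an illustrative number, not our production setting):

```python
import time
from email.utils import formatdate
from typing import Dict, Optional


def static_asset_headers(max_age_days: int = 30, now: Optional[float] = None) -> Dict[str, str]:
    """Caching headers for a static resource (image, script, CSS)."""
    now = time.time() if now is None else now
    max_age = max_age_days * 86400  # seconds
    return {
        # "public" lets shared caches (ISP proxies, the CDN) store the response too
        "Cache-Control": f"public, max-age={max_age}",
        # Expires covers older HTTP/1.0 caches that ignore Cache-Control
        "Expires": formatdate(now + max_age, usegmt=True),
    }
```

Long lifetimes like this work best when the asset URL changes whenever the file changes (for example with a version in the file name), so nobody is stuck with a stale copy.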
So that's it. A bit of geek history and things I've done over the last few years. More to come (and if you need more information or some help with your current setup, contact me and we can have a chat).
Just finished reading a blog post that shows, once again, that people should use their ISP's DNS for better performance when it comes to distributed content.
In New Zealand this is even more important, because using a local CDN cache gives broadband users a huge advantage over fetching resources from overseas through a long undersea cable.
There's a dynamic table where you can check the performance loss/gain depending on which CDN you're targeting. Here is one for Australia:
This table shows how much slower a download will be, based on where the CDN is resolved to.
A positive percentage means performance is worse, negative means performance is better. The first one is Google DNS, the second is OpenDNS.
As you can see, using those DNS services in Australia (and New Zealand, although unfortunately there's no data in the table for our little country) can make things a lot slower.
Using your ISP's DNS will point you to the local cache. Using other DNS services will instead point you to somewhere else in the world.
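Here is a deliberately simplified model of why this happens: a DNS-based CDN picks a POP from the location of the resolver that asks, not the user behind it (this ignores the EDNS Client Subnet extension, which some resolvers and CDNs use to pass part of the client's address along). The regions, POPs and round-trip times are made up for illustration, and `slowdown_percent` uses the same sign convention as the table: positive is slower.

```python
# Hypothetical mapping from the resolver's apparent region to the POP a CDN returns.
POP_FOR_RESOLVER_REGION = {"NZ": "Auckland", "AU": "Sydney", "US-West": "Los Angeles"}
# Hypothetical round-trip times (ms) from a broadband user in Auckland.
RTT_MS_FROM_AUCKLAND = {"Auckland": 5, "Sydney": 30, "Los Angeles": 130}


def pop_for(resolver_region: str) -> str:
    """The CDN only sees the resolver, so the resolver's region decides the POP."""
    return POP_FOR_RESOLVER_REGION.get(resolver_region, "Los Angeles")


def slowdown_percent(baseline_ms: float, measured_ms: float) -> float:
    """Positive means slower than the baseline, negative means faster."""
    return (measured_ms - baseline_ms) * 100 / baseline_ms


# ISP resolver in NZ vs a public resolver answered from a US-West location
isp = RTT_MS_FROM_AUCKLAND[pop_for("NZ")]
public = RTT_MS_FROM_AUCKLAND[pop_for("US-West")]
```

With these made-up figures `slowdown_percent(isp, public)` is 2500.0, a 26-fold round-trip time for the same user on the same connection; real numbers will vary.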
Just last week we found out someone is bringing big guns to a fight, as Stuff told us Neil Graham was starting an online marketplace business to compete with the one and only Trade Me.
The new web site, called Wheedle, wasn't ready for prime time when it was first mentioned online, and after a few hours of hiccups it was taken offline until its official launch date, 1st October 2012.
In the brief moments the site was up (and down), Geekzone members started reporting some of the bugs around the site (and here as well). The discussion covered everything from simple things, such as listings showing completely unrelated images, to a more disturbing problem: pages showing someone else's user names and information.
It is great to see that the mixed-up identities problem seems to have been fixed since then, but other things have popped up.
Right now I can imagine some Trade Me folks talking around a whiteboard:
- Tech Guy: We have a problem with Wheedle.
- Non-Tech Manager: Sure, it's a worthy competitor, backed by someone with deep pockets to go for the long run.
- Tech Guy: Not that, but... They store passwords in plain text, instead of hashing them before storing them in the database.
- Non-Tech Manager: How do you know this?
- Tech Guy: I registered there and just clicked "Forgot my password". The email came with my password instead of a link to reset it. That tells me the password is stored in plain text.
- Non-Tech Manager: So? That's their problem. If someone finds a vulnerability and manages to download database contents from their server, it's their breach of privacy, not ours.
- Tech Guy: Sure. But reports tell us a good number of people reuse the same usernames and passwords on more than one site.
- Non-Tech Manager: Are you saying if someone used their same Trade Me email or username and password to register on Wheedle then a bad guy in [insert country with lots of bad guys here] could try those on Trade Me and in some cases actually gain access to accounts?
- Tech Guy: Hmmmm, yes.
- Non-Tech Manager: Holy shit, Batman!
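For contrast, here is a minimal sketch of what the Tech Guy expects: store only a salted, slow hash and email a random reset link instead of the password. It uses PBKDF2 from Python's standard library; the iteration count is an illustrative assumption, bcrypt, scrypt or Argon2 are common alternatives, and I have no idea what Wheedle actually runs.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

ITERATIONS = 200_000  # illustrative work factor; tune for your own hardware


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, key). Only these are stored; the password itself never is."""
    salt = secrets.token_bytes(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored_key)  # constant-time comparison


def new_reset_token() -> str:
    """Random single-use token for a reset link, emailed instead of the password."""
    return secrets.token_urlsafe(32)
```

A "Forgot my password" email built this way can only ever contain a link, because the site has no way to recover the original password.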
We can use another scenario: there is something for sale on Trade Me and, armed with a third-party list of valid email addresses for the buyers, a scammer could send out an email pretending to be the seller on Trade Me, saying something like "the item didn't sell, I can offer to you very cheap", and then get the unsuspecting buyer to deposit the payment into someone else's account for laundering.
You might say no one would fall for that. Think again. People fall for simple scams all the time.
I don't know what security they have implemented server-side, but sanitizing input data on the client side is no way to go through life:
If this is done on the client side only, then anyone interested could easily craft a local page to bypass this weak strategy and send something malicious to the server, potentially gaining access to information stored there through SQL injection.
The question that popped into my mind was "how long before Trade Me forces people logging into their site to change their passwords?" Simply put, any third-party vulnerability can affect Trade Me as an unintended consequence.
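Whatever the page does, validation has to be repeated on the server. The standard defence against SQL injection is to pass user input as bound parameters rather than concatenating it into the SQL text. A sketch with `sqlite3` and a hypothetical `users` table; Wheedle's actual stack is unknown to me.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, username: str):
    # The username is sent as a bound parameter, never spliced into the SQL text,
    # so input like "x' OR '1'='1" is compared as a literal string.
    return conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.execute("INSERT INTO users (username) VALUES (?)", ("alice",))
    print(find_user(conn, "alice"))          # [(1, 'alice')]
    print(find_user(conn, "x' OR '1'='1"))   # []
```

A crafted local page can skip any JavaScript checks, but it cannot make this query treat its input as SQL.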
What can you do?
- If you are planning to register on any other site, make sure you use a different email address, user name and password.
- If you have already registered on another site with the same details, go there now and change your email address and password.
Just do one of those two things and you will be a lot safer.
And to those on Twitter who said we shouldn't be criticising newcomers: I'm happy to support a new online marketplace in New Zealand, but security should be part of the design from Day 0. I hope this is something for them to consider, and good luck in the days ahead.
Another advertising order for Geekzone, another reason to be happy. But I'm actually sad - sad for my readers and advertisers.
You probably know by now I try to get maximum performance out of the servers we use. I also work hard, using different software, services and techniques to get the site as fast as possible.
Many people use ad blockers for different reasons. Some say ads slow down their PCs, others say ads may be a vector for malware. Some say ads slow down web page load times.
Assuming we are hosting the creatives (ads) with Google DFP, a single call is all that's needed to get the image and parameters to show it on the page.
If the advertiser is using DoubleClick (a Google company agencies use to manage campaign workflow), Google is smart enough to get the ads out exactly as it would with hosted creatives, that is, in a single call.
Between advertisers and publishers there is almost always an agency that represents the publisher, trying to sell available inventory. These agencies get paid a commission on each sale they manage to complete. They also like to know how many impressions and clicks campaigns are getting. As a publisher using Google DFP I can easily give agencies access to real-time reports for their campaigns. But I haven't seen any agency that takes advantage of this feature.
Instead, these agencies load the tags supplied by the advertisers into their own systems. In turn they give the publishers their own tags. And we obviously need to load our own scripts to manage the delivery.
So instead of having one script that loads an ad with a single call (the Google DFP and Google DoubleClick integration), we have a script that loads the agency script, which in turn loads the advertiser script, which in turn loads the ads.
This adds incredible latency to the whole ad delivery system. Usually these ad agencies don't have servers close to end users. They don't use CDNs. Things get slow. And when things get slow users navigate away. And when users navigate away they don't see the ads.
For all intents and purposes Google DFP delivered the code and counted one impression. But by the time the browser has loaded the second script and is waiting for the third, the user might have closed the window or clicked a link to go elsewhere. So the agency doesn't count the impression. Then they complain there's a difference between my counter and theirs.
Another important thing: Google DFP is smart enough to deliver more impressions of the ads that perform better. In other words, if the advertiser supplies more than one ad, Google DFP will make sure it shows more of the ads getting a higher number of clicks. If we run an agency tag we lose control and can't count the clicks, meaning all ads are delivered in a balanced manner. This means the optimization that could benefit the advertiser and attract more clicks is lost.
In the end advertisers lose the opportunity to get more clicks, our readers see pages slowing down, and agencies act as a middleman that is really trying to do more than it should, getting technical where it doesn't have the capability and not actually adding any value.
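A toy model of that counting gap: each script in the chain has to finish before the next one starts, DFP counts at the first hop and the agency only at the last. The hop latencies and the moments users leave are invented numbers, not measurements from Geekzone.

```python
from typing import List, Tuple


def chain_finish_times(hop_latencies_ms: List[int]) -> List[int]:
    """Cumulative time at which each script in a serial chain finishes loading."""
    t, times = 0, []
    for latency in hop_latencies_ms:
        t += latency  # the next script cannot start until this one is done
        times.append(t)
    return times


def count_impressions(leave_times_ms: List[int], hop_latencies_ms: List[int]) -> Tuple[int, int]:
    """(publisher count, agency count): first hop vs final hop reached before the user leaves."""
    finish = chain_finish_times(hop_latencies_ms)
    publisher = sum(1 for leave in leave_times_ms if leave >= finish[0])
    agency = sum(1 for leave in leave_times_ms if leave >= finish[-1])
    return publisher, agency
```

With hops of 100, 400 and 600 ms and four visitors leaving at 200, 800, 1500 and 3000 ms, the publisher counts 4 impressions and the agency 2; with a single 100 ms call both sides would count 4.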
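To show what is lost, here is a sketch comparing click-weighted delivery with even rotation. The proportional-to-CTR rule is my simplification for illustration, not how Google DFP actually optimizes delivery.

```python
from typing import List


def allocate_impressions(clicks: List[int], impressions: List[int], budget: int,
                         optimize: bool = True) -> List[int]:
    """Split `budget` impressions across creatives.

    optimize=True weights each creative by its observed click-through rate;
    optimize=False rotates evenly, as with an agency tag where clicks are invisible.
    Rounding means the parts may differ from `budget` by one or two.
    """
    if not optimize or sum(clicks) == 0:
        weights = [1.0] * len(clicks)
    else:
        weights = [c / i for c, i in zip(clicks, impressions)]  # CTR per creative
    total = sum(weights)
    return [round(budget * w / total) for w in weights]
```

With two creatives at 1% and 3% CTR, the weighted split of 1,000 impressions is 250/750 instead of 500/500, so more of the budget goes to the ad people actually click.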
This is not a rant at one specific agency. Most agencies work like this. They just don't understand that a fast web means more business for everyone.
Last weekend a press release landed in my inbox, and I found it interesting enough to contact the agency and get more information about the product. In summary, ScaleArc iDB promised to scale your database without changes to code or to the database itself:
ScaleArc, the pioneer in a new category of database infrastructure that accelerates application development by simplifying the way database environments are deployed and managed, today announced general availability of iDB v2.0 for Microsoft SQL Server that brings significant new capabilities to SQL Server environments such as instant horizontal scaling, higher availability, faster database performance, increased SQL protection and real-time query analytics. iDB takes a fundamentally different approach at the SQL protocol layer by providing customers with a wide spectrum of capabilities for their database environment in a single solution, without requiring any modifications to existing applications or SQL Server databases.
Until now, moving to advanced architectures like multi-master, or achieving instant scale and better performance within SQL server environments, has been costly and extremely difficult to implement. iDB v2.0 for MS SQL supports a wide range of functions including Read/Write splitting, dynamic load balancing and horizontal scaling, query caching for up to 24x faster query responses, wire-speed SQL filtering and real-time instrumentation and analytics to enhance all deployment modes of SQL server, including SQL Server Clustering, SQL mirroring, Peer-to-Peer (P2P) Replication and log shipping.
iDB for MS SQL Feature Highlights
- Dynamic Query Load Balancing for High Availability: ScaleArc iDB implements a specialized dynamic load-balancing algorithm that allows the most efficient utilization of available database capacity, even when servers have varying capacity. iDB monitors query responses in real-time and can load balance queries to the server that will provide the fastest response to properly distribute the load. Up to 40% better performance has been observed with iDB's dynamic load balancing relative to TCP-based load balancing.
- Pattern-Based Query Caching for Increased Performance: ScaleArc iDB allows users to cache query responses with one-click. No changes are required at the database server or in the application code; the query is cached at the SQL protocol level, providing up to 24x acceleration without any modifications.
- Multi-master: iDB supports multi-master and master-slave scenarios to ensure high availability and scalability. Specific queries, irrespective of their origin, are routed to the right server with the advance query routing engine that also simplifies sharding.
- Real-time Analytics: Advanced graphical analysis tools provided by ScaleArc iDB bring comprehensive real-time awareness of all queries, helping to quickly pinpoint query patterns that are not performing optimally and allowing more precise management.
- Wire Speed SQL filtering: iDB is able to enforce query-level policies for security or compliance reasons to protect against attacks, theft and other threats. iDB can operate outside of the application where policies have not traditionally been easily enforced.
- SQL Query Surge Queue: Extreme loads can lead to unacceptable response times or even halting of operations until the load reduces, leading to "Database not Available" errors. ScaleArc iDB allows a more graceful response to peak loads. When faced with an extreme load, ScaleArc iDB can initiate a SQL Query Surge Queue and momentarily queue queries in a FIFO queue and process them once server resources become available.
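The "send the query to whichever server will answer fastest" idea in the first feature listed can be sketched with an exponentially weighted moving average of observed response times. This is my own hypothetical illustration of the concept, not ScaleArc's algorithm, and `alpha` is an arbitrary choice.

```python
class FastestResponseBalancer:
    """Send each query to the backend with the lowest smoothed response time.

    Hypothetical illustration of response-time-based balancing, not ScaleArc's algorithm.
    """

    def __init__(self, servers, alpha=0.3):
        self.alpha = alpha                        # weight of the newest sample
        self.avg_ms = {s: None for s in servers}  # None = no sample yet

    def pick(self):
        for server, avg in self.avg_ms.items():
            if avg is None:
                return server  # try every server once before comparing
        return min(self.avg_ms, key=self.avg_ms.get)

    def record(self, server, response_ms):
        old = self.avg_ms[server]
        if old is None:
            self.avg_ms[server] = float(response_ms)
        else:
            # Exponentially weighted moving average of response times
            self.avg_ms[server] = (1 - self.alpha) * old + self.alpha * response_ms
```

Unlike plain round-robin at the TCP level, a backend that slows down under load quickly stops getting new queries, which is the effect the press release describes.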
Obviously I was a bit wary of their claims, so I asked a couple of questions. Here are the answers:
What happens to cached query results when the result changes? For example a record is updated - will the next query use previous results, or get new results?
The key to iDB lies in our Analytics. We provide granular real-time data on all SQL queries flowing between application servers and the database servers. As such, customers now have the intelligence they need to understand the query structure, the frequency it hits the database, the amount of server resources it takes, etc. We then give the customer the power to cache on a per query basis, but we do not set a Time-To-Live for the customer. They need to understand how often the query will be updated, and ensure they do not set a Time-To-Live that may serve stale data if an Update comes in from the Application. We allow customers to set TTL anywhere from 1 second to multiple years. When a cache rule for a query is activated with a single click of a button, we immediately measure the performance and offload impact of the cache. And since our cache on iDB is a hash map that caches the TCP output of Read queries, subsequent Read queries served from our cache are served up to 24x faster (or more).
ScaleArc also has an API that can be invoked from the application to add, invalidate and bypass the cache for specific SQL statements.
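The answer boils down to a TTL cache keyed on the query, plus explicit invalidate and bypass operations. A rough sketch of those semantics (the method names and the SHA-256 key are my assumptions, not ScaleArc's API) shows why the TTL choice matters: until an entry expires or is invalidated, an updated row is simply not seen.

```python
import hashlib
import time


class QueryCache:
    """Cache read-query results keyed by a hash of the SQL text and parameters.

    The TTL is chosen per query by the operator; method names are hypothetical.
    """

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.entries = {}  # key -> (expires_at, result)

    @staticmethod
    def key(sql, params=()):
        return hashlib.sha256(repr((sql, tuple(params))).encode()).hexdigest()

    def get_or_run(self, sql, params, ttl_seconds, run, bypass=False):
        k = self.key(sql, params)
        hit = self.entries.get(k)
        if not bypass and hit and hit[0] > self.clock():
            return hit[1]                      # served from cache (may be stale!)
        result = run(sql, params)              # go to the database
        self.entries[k] = (self.clock() + ttl_seconds, result)
        return result

    def invalidate(self, sql, params=()):
        self.entries.pop(self.key(sql, params), None)
```

An application that calls `invalidate` right after its own UPDATE gets fresh data on the next read; one that relies on the TTL alone accepts stale reads for up to that long.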
How much more memory does it require? Or does it use the SQL DB footprint?
ScaleArc iDB is a Network appliance like deployment and does not have any agents on the Server or the Application. This would mean that iDB has its own physical/virtual machine to perform its operations. iDB can run load balancing within 4GB of memory, however for caching and logging purposes iDB can address up to 128GB of memory.
iDB is a separate instance from the database. Most customers run our software on a dedicated x86 server to make it a dedicated appliance. We also sell appliances, or iDB can be installed on a hypervisor as a Virtual Machine. iDB does not require a lot of memory to operate, but we can allocate up to 128GB of RAM for caching of READ queries. Query logs are stored on drives on the appliance.
Very interesting: an appliance for SQL TCP output caching. OK, I have signed up for a 30-day trial to see how much difference it can actually make.
UPDATE: Someone on Facebook said this was advertising. IT IS NOT. I was not asked to post about it, and did not receive any payment to post about it. If you are so inclined please read my FULL DISCLOSURE post.
Learn how the Riverbed performance platform can help you up your IT game
With the growth of virtualization, consolidation, and cloud computing have come new challenges. IT is increasingly consolidated and virtualized while workers and consumers are distributed. How best to harness these approaches and deliver the efficiency and control your organization requires, while ensuring that end users get the performance they need?
Attend the Riverbed Performance Summit to find out how Riverbed empowers enterprises like yours with the tools to analyze, accelerate, and control your IT. Stay on top of the latest technology and solutions from Riverbed and join us for a deep dive into our vision for delivering performance for the globally connected enterprise.
Sign up to connect with Riverbed technology experts and your peers to learn how you can get more out of your Riverbed investment.
At this exclusive event you'll hear firsthand from our experts on how to maximize your Riverbed investment with the latest release of cutting-edge performance platform products and solutions:
- Granite, our revolutionary new product for consolidating edge servers in the data center
- Getting the most out of the latest release of RiOS (7.0), including optimization for video, UDP, IPv6, and VDI
- Steelhead Cloud Accelerator, a new powerful solution for boosting the performance of SaaS applications
- The latest product updates, technical overviews, demos, and more
Register now and discover how to make the Riverbed performance platform work for you. Find out how you can finally consolidate your entire infrastructure, including edge applications, servers and storage to the data center, all without compromising performance.
A shame I won't be attending this event, since it falls in the same week I will be in Las Vegas for HP Discover 2012.