At DevReach 2011 I did a session on Load Testing in the Cloud together with Anton Staykov. We put a huge amount of research into preparing this session, and I would like to share some of my experiences here. The result of having a Visual Studio ALM MVP and a Windows Azure MVP work on the same project is just outstanding!
In Visual Studio 2010 you can record a web performance test that captures all the HTTP requests sent to the web server and then replay the complete recording. You can also “encapsulate” the web performance test into a load test and run the same web test thousands of times. This generates heavy utilization on your web server, but it also puts great demands on your test infrastructure in terms of hardware resources. Web servers are usually designed to handle large amounts of traffic, so you will need at least the same or even more hardware resources to generate enough load against your web server to measure peak/stress conditions.
Visual Studio allows you to create a “rig” of one TFS Controller and multiple TFS Agents that distribute the load across separate test machines. An agent is installed on each test machine, and each agent is registered with exactly one controller.
In our scenario we decided to showcase the impact of HTTP latency over a real Internet connection. We placed the TFS Agents in the cloud (Windows Azure) and used them to generate HTTP traffic with real latency. The thing about latency is that you cannot really emulate it. You can emulate send/receive delays, and I believe that is how the Network Emulation feature in Visual Studio does it, but there is simply no way to tell the packets how fast they should travel on the wire. We wanted the real picture. Thanks go to my good friend Richard Campbell for sharing this idea and other precious tips.
An additional benefit of placing the agents in the cloud is geographical distribution: you can use different Windows Azure datacenters all over the world. This is crucial for our demo, because the best way to show real HTTP latency is to have geographically dispersed test agents.
With Windows Azure you can elastically size your installation on demand. This means we can easily fire up 100 agents to generate load against our web server, use them for a few hours and then shut them down. This is a great opportunity to scale your test infrastructure and pay only for what you use. It is worth mentioning that an unlimited number of virtual users is possible if your company has an MSDN subscription with Visual Studio Ultimate edition. This capability comes with the Visual Studio 2010 Load Test Feature Pack, and a single Ultimate license is enough to enable it. The feature pack is essentially a license key that you enter in the controller’s virtual users dialog box. Just for comparison, unlimited virtual users in other load testing solutions usually cost a six-digit amount of cash, while the price of VS Ultimate with MSDN is much more affordable.
And last, having external clients hit your web farm infrastructure is really the only way to test how well your load balancer is working, and whether it is working properly in the first place. One hint: all instances from a single deployment (that is, all instances of the role located in one datacenter) end up sharing the same external IP. This is important to know from a load balancing perspective. So you must either deploy your test agent role to multiple datacenters, use multiple separate deployments, or have a special HTTP header generated in your web test. The first two techniques do not require configuration on the load balancer side but are pretty limiting in terms of the number of unique IPs that can be used. The third technique requires the load balancer to be configured to identify this custom HTTP header and apply client affinity based on it.
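To illustrate the third technique, here is a minimal sketch of a web test request plugin that stamps every request with a custom header identifying the agent. The class name and the header name X-LoadTest-Agent are my own illustrative choices; the load balancer would need to be configured to apply affinity on whatever header name you pick.

```csharp
using System;
using Microsoft.VisualStudio.TestTools.WebTesting;

// Attach this plugin to the recorded web test so each agent machine tags
// its requests and the load balancer can apply client affinity per agent.
public class AgentAffinityHeaderPlugin : WebTestRequestPlugin
{
    public override void PreRequest(object sender, PreRequestEventArgs e)
    {
        // "X-LoadTest-Agent" is just an example header name; use whatever
        // your load balancer is configured to inspect for affinity.
        e.Request.Headers.Add(
            new WebTestRequestHeader("X-LoadTest-Agent", Environment.MachineName));
    }
}
```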
In short, here is what we do. First we record a web performance test in Visual Studio, then create a load test that contains the recorded web test. The web performance test should hit the web server under test. You can easily change the domain name recorded in the web test using the Parameterize Web Servers feature in Visual Studio: it extracts the recorded domain name into a variable that is easy to change, which makes switching between the staging and production environments pretty easy later on. After that we configure the tests to be executed remotely and specify the test controller we want to use. When we execute the load test, Visual Studio sends all the metadata about our load test scenario to the controller. The controller queries all available agents and sends them instructions on how to execute the load test; it is also responsible for weighting the number of tests executed on each agent. The agents replay the recorded web test steps and generate HTTP requests to our web server. The result of each test execution is sent back to the controller, which in turn sends the data collected from all agents back to Visual Studio. The controller is also responsible for collecting performance counters from the system under test, i.e. the web server. All data that the controller receives can be stored in a database and reviewed later.
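For reference, this is roughly what the parameterization looks like if you generate a coded web test from the recording. The class name is illustrative, and the context parameter is named WebServer1 here because that is what the wizard typically generates; the actual name depends on your recording. Switching between staging and production is then just a matter of changing that parameter value.

```csharp
using System.Collections.Generic;
using Microsoft.VisualStudio.TestTools.WebTesting;

public class HomePageWebTest : WebTest
{
    public override IEnumerator<WebTestRequest> GetRequestEnumerator()
    {
        // "WebServer1" is the context parameter created by Parameterize Web Servers;
        // point it at your staging or production URL to retarget the whole test.
        WebTestRequest request =
            new WebTestRequest(this.Context["WebServer1"].ToString() + "/");
        yield return request;
    }
}
```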
When the load test completes, all received data is aggregated and you get a nice summary of what happened during the run. Part of the summary is shown below; detailed data is also available in the different tabs.
| Metric | Value |
| --- | --- |
| Max User Load | 50 |
| Tests/Sec | 0.017 |
| Tests Failed | 0 |
| Avg. Test Time (sec) | 111 |
| Transactions/Sec | 0 |
| Avg. Transaction Time (sec) | 0 |
| Pages/Sec | 1.50 |
| Avg. Page Time (sec) | 27.7 |
| Requests/Sec | 5.87 |
| Requests Failed | 0 |
| Requests Cached Percentage | 35.5 |
| Avg. Response Time (sec) | 7.27 |
| Avg. Content Length (bytes) | 245,429 |
How we did it
We used a Worker Role in Windows Azure to install the TFS Agent bits. The thing about Azure is that you can fire up plenty of machines (a basically unlimited number) that will have the agents installed. The Worker Role serves as a “template” for every new instance. However, you need a fully automated process for installing and configuring everything that is required, because when a new instance (a new machine) of your worker role starts up it must do everything by itself; no manual configuration steps can be involved. Azure may also “heal” your machine, a process that replaces an unhealthy instance with a brand new, clean instance created from your role “template”. Installing the TFS Agent bits is pretty easy: it is a simple copy of all required binaries, plus a configuration step that is executed at a later stage and takes a few minutes to complete.
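As a rough sketch (not our exact code), the worker role’s OnStart method can drive the whole unattended setup. The script names and paths below are placeholders I made up for illustration; the point is simply that everything runs without manual intervention, so a re-imaged instance can rebuild itself from scratch.

```csharp
using System.Diagnostics;
using Microsoft.WindowsAzure.ServiceRuntime;

public class WorkerRole : RoleEntryPoint
{
    public override bool OnStart()
    {
        // The agent bits are packaged with the role (or pulled from blob storage)
        // and installed/configured by scripts, with no manual steps involved.
        RunAndWait(@"setup\InstallTestAgent.cmd");   // placeholder script name
        RunAndWait(@"setup\ConfigureTestAgent.cmd"); // placeholder script name
        return base.OnStart();
    }

    private static void RunAndWait(string fileName)
    {
        using (var process = Process.Start(new ProcessStartInfo
        {
            FileName = fileName,
            UseShellExecute = false
        }))
        {
            process.WaitForExit();
        }
    }
}
```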
In addition to the TFS Agent bits, you will want to install Visual Studio 2010 SP1. This is a time consuming process of about 20-30 minutes, depending on the Azure hardware configuration you are using. We used the SP1 web installer, as it is able to download all required binaries directly from the Microsoft Download Center. Since all ingress (incoming) traffic in Azure is free, the size of the service pack is not something we really care about. Automating the installation of the SP1 bits was the step I spent a huge amount of time on. One big gotcha is the location of the TEMP folder in Windows Azure: by default it is limited to 100 MB. I had to create a local storage resource with enough space to fit the bits downloaded by VS SP1 and point the TEMP environment variable to the location of that local storage folder. After all SP1 bits are installed, a restart is required to finalize the installation process.
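Here is the gist of that TEMP workaround, assuming a local storage resource named InstallTemp (the name and its size are our own choice, declared in the service definition): the role redirects TEMP and TMP to the local resource before launching the installer, so the downloaded bits are not constrained by the default 100 MB TEMP drive.

```csharp
using System;
using Microsoft.WindowsAzure.ServiceRuntime;

public static class TempFolderFix
{
    // Call this from OnStart before launching the SP1 installer so the child
    // process inherits TEMP/TMP pointing at the larger local storage resource.
    public static void RedirectTempToLocalStorage()
    {
        // "InstallTemp" is the local storage resource we declared in the
        // service definition, sized large enough for the SP1 download.
        string tempPath = RoleEnvironment.GetLocalResource("InstallTemp").RootPath;
        Environment.SetEnvironmentVariable("TEMP", tempPath);
        Environment.SetEnvironmentVariable("TMP", tempPath);
    }
}
```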
When the instance is up and running you must also run the TFS Agent configuration tool. This step installs the TFS Agent either as a service or as an interactive process on the machine; the latter is needed if you want to execute CodedUI tests on the agent. If you plan to run web and load tests only, install the agent in service mode. The agent also gets registered with the controller. The registration process requires that port 6901 is open on the controller machine and port 6910 is open on the agent. The configuration tool makes sure the firewall is configured properly, but you also need to make sure Azure is aware of your intent to use these ports: an internal endpoint for port 6910 (the agent communication port) must be defined on the worker role.
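Assuming the internal endpoint is declared on the worker role under a name like “TestAgent” (our own choice in the service definition), a small sanity check at runtime can confirm that Azure has actually exposed the agent port to the other role instances:

```csharp
using System.Diagnostics;
using Microsoft.WindowsAzure.ServiceRuntime;

public static class EndpointCheck
{
    // Logs the internal endpoint Azure assigned to this instance; "TestAgent"
    // is the endpoint name we chose in the service definition (port 6910).
    public static void TraceAgentEndpoint()
    {
        var endpoint = RoleEnvironment.CurrentRoleInstance.InstanceEndpoints["TestAgent"];
        Trace.TraceInformation(
            "Test agent endpoint: {0}:{1}",
            endpoint.IPEndpoint.Address,
            endpoint.IPEndpoint.Port);
    }
}
```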
So far I have only discussed how to configure TFS Agents in the cloud. In order to get the full rig running you also need a TFS Controller. My recommendation is to install the controller in the same network where the web server(s) are located. The controller collects performance counters from the web server, and this technology uses UDP as its transport protocol, which is not supported in Windows Azure. So if you place the controller in the cloud, close to the agents, you will lose perfmon data collection from the system under test, i.e. the web server. On the other hand, when the controller runs on premises, close to the web server, a lot of data transfer from the agents to the controller will be charged. Still, I prefer the on-premises controller so I can get the most data out of the run. Perfmon data from the system under test is also evaluated against built-in threshold rules within Visual Studio, which apply warning and critical threshold values. This is a really powerful way to get guidance on how well (or badly) your web server is performing. All rules are customizable, and you can build new ones or edit the existing ones.
There is one final piece of glue to stick this all together: Windows Azure Connect. Connect is Microsoft’s VPN-style server/client solution for Windows Azure. Azure Connect is currently CTP only, but it works very reliably. With Azure Connect we put the agents in the cloud and the on-premises controller into one endpoint group. Once the agent role and the controller are in the same endpoint group, they can connect to each other over IPv6. The web server itself does not have to be added to this group, since we send HTTP requests to its public domain name.
Now that we have everything scripted and automated, the whole process of firing up a new instance of our agent role and registering the agent with the controller takes roughly one hour, mostly because the SP1 install is time consuming. Still, one hour is a pretty good accomplishment. You can start hundreds of agents with a simple change to the Azure deployment configuration and be ready to generate insane amounts of requests against your web farm.
Is your web server ready to handle the holiday traffic? Contact us to find out. Within one hour we can load test your web farm and see if it is “holiday ready”.