This felt so good when I solved it (like a good bm) that I just had to follow it up with a blog post.
A new client of ours (a Financial Services company) has a T1 line that handles their voice and data. They mentioned that their Internet is slow. I spoke to the ISP, who told me this customer’s bandwidth is maxed out. They have 500k upload and it is being saturated. After checking the applications that they typically use in their business, I found that they have an ACT database that is hosted in the cloud. I spoke to the technical support people for the hosted database, and they told me that for the number of users this account has, the most bandwidth they should be using is 190k. Furthermore, they told me that if the browser connection to the database is closed, the utilization should be zero.
With this information in hand I started to do some investigation. This office is not too big and the people there are very cooperative, so they said I could do some testing during office hours.
One colleague suggested using a Calyptix UTM device for its bandwidth monitor. I do use these devices but the bandwidth monitor graphic is not really useful for this study – not enough magnification (see insert).
I had recently read about the TOMATO firmware that can run on Linksys routers. I grabbed an old Linksys router from the gadget warehouse (the newer versions don’t work with Tomato) and installed the firmware. The bandwidth monitor in the Tomato router was perfect (see insert). This Tomato display is from the actual problem network. The green stuff at the bottom is download activity which was not the problem. The blue stuff averaging 700+k is the outbound problem.
Next I asked everyone to shut down their ACT web sessions. There was no change in the activity when I did this.
Next I disconnected a single workstation from the network. I saw a dramatic drop in the outbound activity. I thought I may have found the problem workstation. HOWEVER, after I waited a short while, with that workstation still disconnected, the activity RETURNED to the high level!
I reconnected that computer and disconnected another computer. I saw similar results. Activity fell way off but then returned to the high level.
At this point I decided I would have to take some more disruptive action so I told the business owner I would return to do more testing when the office was closed.
Last night, after our trick-or-treaters waned, I went back to the office.
I removed all the workstations from the network and then connected just one. I saw the high level of outbound activity. I then put two tools to use. First I used Wireshark to capture an interval of traffic. I sorted the traffic on Destination IP address and found one external IP that accounted for the bulk of the traffic. Next I used Sysinternals TCPView. This utility let me see the applications/processes that were running along with the remote addresses they were connected to. I spotted my external IP address in the list and found that the CULPRIT was AFPW.OUTLOOK.SERVICE.EXE. This was buried several folders deep in an Act Web folder on the computer. When I killed this process – VOILA – my outbound traffic utilization dropped to zero!
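For anyone who would rather script those two steps than eyeball them, here is a rough Python sketch. It is not what I ran that night, and it rests on a few assumptions: that the Wireshark capture was exported as CSV with the default "Destination" and "Length" columns, and that the output of Windows `netstat -ano` stands in for the TCPView window. The function names and every address in the sample data are made up for illustration.

```python
import csv
import io
from collections import Counter


def top_destinations(csv_text, n=5):
    """Sum packet lengths per destination address; return the n largest."""
    totals = Counter()
    # Assumes a CSV export with "Destination" and "Length" column headers.
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["Destination"]] += int(row["Length"])
    return totals.most_common(n)


def pids_for_remote_ip(netstat_text, remote_ip):
    """Return the PIDs of TCP connections whose remote address is remote_ip."""
    pids = set()
    for line in netstat_text.splitlines():
        fields = line.split()
        # A TCP row from `netstat -ano`: TCP  local:port  remote:port  STATE  PID
        if len(fields) == 5 and fields[0] == "TCP":
            host, _, _port = fields[2].rpartition(":")
            if host == remote_ip:
                pids.add(int(fields[4]))
    return sorted(pids)


if __name__ == "__main__":
    # Made-up capture rows (documentation address ranges, not the real network).
    capture = (
        '"No.","Time","Source","Destination","Protocol","Length","Info"\n'
        '"1","0.000","192.168.1.10","203.0.113.5","TCP","1514","data"\n'
        '"2","0.001","192.168.1.10","203.0.113.5","TCP","1514","data"\n'
        '"3","0.002","192.168.1.10","198.51.100.7","TCP","66","ack"\n'
    )
    # Made-up netstat output with hypothetical PIDs.
    netstat = (
        "  Proto  Local Address       Foreign Address    State        PID\n"
        "  TCP    192.168.1.10:49712  203.0.113.5:443    ESTABLISHED  4321\n"
        "  TCP    192.168.1.10:49713  198.51.100.7:80    ESTABLISHED  1200\n"
    )
    busiest_ip, byte_count = top_destinations(capture, n=1)[0]
    print(busiest_ip, byte_count, pids_for_remote_ip(netstat, busiest_ip))
```

The first function does the same job as sorting on Destination in Wireshark, except that it totals the bytes so the heaviest destination floats to the top. The second only gets you as far as a PID. You would still need Task Manager or `tasklist` to turn that PID into a process name, which is the part TCPView does for you in one window.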
On Monday I will contact the ACT database people to find out whether this errant process is even related to what they are doing, or is maybe just left over from some earlier and now unused installation.