Ethereal-users: RE: [Ethereal-users] SQL Slammer - How to identify
Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.
From: Greg Saunders <gregs@xxxxxxxxxxxxxxx>
Date: Thu, 18 Nov 2004 11:05:53 -0500
Thanks for the replies... Here is why I was asking. (This is kind of long, to give the detail.)

*** Details ***
- We have an HP Procurve 4000M which is connected to subnet 172.20.2.0.
- This subnet has 10 Mbps and 100 Mbps devices, some running full duplex and a small number running half duplex. Most of the half duplex devices are print servers.
- The HP Procurve 4000M has 3 ports that connect to other switches: 1 to another HP Procurve 4000M and 2 to small Linksys switches that have most of the slow print servers and such tied to them.
- We are running a Windows NT domain with NT servers, 2000 servers, 2000 workstations, XP workstations, 98 workstations, a SQL 7.0 server, an Exchange 5.5 server, HP print servers, and Intel print servers.
- We generally don't have high network utilization. It is normally below 5 percent; only once in a while, when we do a network backup or large file transfers, does it go higher.

*** Symptoms ***
- We first noticed problems when our network monitoring software (basically ping sweeps, event log monitoring, and NT server monitoring) reported multiple servers going down for just a few seconds. The ping was set to give an alert if it detected 3 or more pings greater than 500ms. I also bumped it to 6 or more pings greater than 500ms to make it less sensitive. Later we found that the alerts weren't caused by high latency, but by ping requests not getting replies at all.
- We also noticed that applications like Outlook would lose connection to the Exchange server briefly, applications that communicated directly with SQL would lose connection and you would have to restart the app, and other similar problems.
- Based on a 1 minute ping sweep of about 10 servers, we are seeing about 10 to 20 drops a day.
- Based on a 30 second ping sweep of about 10 servers, we are seeing about 30 or more drops a day.
- Based on a 10 second ping sweep of about 10 servers, we are seeing about 60 drops.
- This is not isolated to one server; it is happening on all of our servers.
- Some servers seem to experience it more than others, but no real pattern is found here.
- We never see a "link loss"; it is always just a matter of not being able to communicate with the server for short time periods.
- Usually the loss of communication is about 3 seconds, but I have seen it last up to a minute.

*** Testing / Diagnosing ***
- We first started examining all of the logs and alerts that were generated by the HP Procurve switch and found nothing. There were a couple of CRC/alignment errors on one port and a couple of FCS rx errors on another port, but other than that we have no errors.
- The switch diagnostics don't show any problems, and looking at the numbers, the buffers are not used up and the memory is not used up.
- All servers have the latest MS security patches as well, especially the ones for the RPC buffer overflow problems.
- All of our units and servers are protected with TrendMicro products with the latest patterns, and they don't see anything.
- We decided to start capturing packets to see if there were any of the common worms / viruses. For capturing packets I had the switch's monitoring port monitor all ports so I could get a dump of everything. I did this for several hours, and after examining the captures I didn't see anything resembling a worm, and none of the basic filters for the most common worms revealed anything.
- In one of my tests, I was pinging server A from PC B and was not getting replies. I then went to server A and pinged out to another unit. Server A was getting replies, and as soon as I had done this PC B started getting replies from server A. When I stopped the pinging from server A, PC B quit seeing server A. After about a minute everything went back to normal.
- I worked with HP and they swapped out the HP Procurve chassis, and we ended up with the same results.
- After a bunch more tests HP was willing to send all new modules; we swapped them and we have the same problem. So now we have a whole new switch and modules.
- HP wanted to determine whether or not the pings were actually traversing the switch and making it to the servers, so we did the tests below. Listed below are the ports we were testing and how things were connected.

**********
Server TBGSMS with IP 172.20.2.6 on Port B1
Notebook TBGITSOLO2K with IP 172.20.2.12 on Port D6 (which is only monitoring port B1, and the notebook is capturing packets)
PC TBGGREGS with IP 172.20.2.4 on Port D7 (this PC is also capturing packets)

I turn packet capturing on for both 172.20.2.12 (D6) and 172.20.2.4 (D7).

I then start a ping utility that sends 1 ping every 2500ms with 32 bytes continuously from 172.20.2.4 (D7).

The PC on 172.20.2.4 (D7) is capturing every packet it sends and the replies. The notebook on 172.20.2.12 (D6) is capturing all packets inbound/outbound to 172.20.2.6 (B1) as well as its own traffic.

Then the PC on 172.20.2.4 (D7) shows that it did not receive replies to 5 ICMP ping requests in a row, which spans 20 seconds.

I then look at the capture on the monitoring port (D6) and it does not see the 5 pings that were outbound from 172.20.2.4 (D7). Basically the ping requests never traversed the switch. In the capture log on 172.20.2.12 (D6) you see all the ping requests and replies to and from 172.20.2.6 (B1) up until those 5 never show up, and then you start seeing them pick back up again.

HP wanted to verify that the pings actually got to the switch from 172.20.2.4 (D7), so we started monitoring (D7) from (D6) and tested for this. When 172.20.2.4 (D7) pinged without getting replies, I was able to see that the pings were getting to the actual D7 port via the monitoring port and they were intact.
**********

- So... I was able to prove that periodically the switch was not forwarding packets to the destination they were intended for.
- Never did we see the utilization of the network or the switch go up sufficiently to cause dropped packets. We are talking 1 to 10 percent utilization, and the majority of the time it is well below 5%.
- In my mind, the only other thing I could think of is that the MAC table in the switch is possibly getting incorrect MAC addresses set, causing the problems we are seeing. Oh... We are NOT using VLANs and NOT using spanning tree. The switch is set up basically as default.
- I am currently capturing the listing of MAC address assignments to the ports in a text file repeatedly, hoping to catch a port changing its MAC address assignment.

*** Conclusions / Help needed ***
- I am open to any suggestions on what this problem could be.
- I am open to any assistance you can provide on how to use Ethereal to catch something going on. What would you recommend looking for to help isolate this problem?

Thanks in advance, and if you got this far then I am impressed.

Greg

-----Original Message-----
From: Andrew Hood [mailto:ajhood@xxxxxxxxx]
Sent: Thursday, November 18, 2004 5:10 AM
To: Ethereal user support
Subject: Re: [Ethereal-users] SQL Slammer - How to identify

Greg Saunders wrote:
> Hey folks,
>
> How can I identify the SQL slammer if I am capturing all the packets
> on my switch through a monitoring port? What specifics should I look
> for… is there a filter or something to spot this?

I've seen Martin's reply, and would agree installing Snort would be a simpler solution than trying to get Ethereal to pick them out.

The Snort rules for CVE CAN-2002-0649 a.k.a.
Slammer a.k.a. Sapphire are:

alert udp $EXTERNAL_NET any -> $HOME_NET 1434 (msg:"MS-SQL Worm propagation attempt"; content:"|04|"; depth:1; content:"|81 F1 03 01 04 9B 81 F1 01|"; content:"sock"; content:"send"; reference:bugtraq,5310; reference:bugtraq,5311; reference:cve,2002-0649; reference:url,vil.nai.com/vil/content/v_99992.htm; classtype:misc-attack; sid:2003; rev:6;)

alert udp $HOME_NET any -> $EXTERNAL_NET 1434 (msg:"MS-SQL Worm propagation attempt OUTBOUND"; content:"|04|"; depth:1; content:"|81 F1 03 01 04 9B 81 F1|"; content:"sock"; content:"send"; reference:bugtraq,5310; reference:bugtraq,5311; reference:cve,2002-0649; reference:url,vil.nai.com/vil/content/v_99992.htm; classtype:misc-attack; sid:2004; rev:5;)

alert udp $EXTERNAL_NET any -> $HOME_NET 1434 (msg:"MS-SQL version overflow attempt"; dsize:>100; content:"|04|"; depth:1; reference:bugtraq,5310; reference:cve,2002-0649; reference:nessus,10674; classtype:misc-activity; sid:2050; rev:5;)

--
There's no point in being grown up if you can't be childish sometimes.
-- Dr. Who

_______________________________________________
Ethereal-users mailing list
Ethereal-users@xxxxxxxxxxxx
http://www.ethereal.com/mailman/listinfo/ethereal-users
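
For anyone who would rather test captured payloads by hand than install Snort, the content checks in the first rule quoted above (sid 2003) can be approximated in a few lines of Python. This is only a rough sketch under stated assumptions: the function name `looks_like_slammer` and the constant names are made up for illustration, the caller is assumed to have already extracted the UDP destination port and payload bytes from the capture, and the byte patterns are copied from the rule as quoted, not verified against the worm itself. In Ethereal, a display filter such as `udp.dstport == 1434` should narrow the capture to candidate packets first.

```python
# Hypothetical helper: approximates the content checks of the quoted
# Snort rule sid 2003. Names here are illustrative, not from Snort.
SLAMMER_PORT = 1434
# Hex bytes from the rule's content:"|81 F1 03 01 04 9B 81 F1 01|"
SLAMMER_MARKER = bytes.fromhex("81F10301049B81F101")


def looks_like_slammer(dst_port: int, payload: bytes) -> bool:
    """Return True if a UDP datagram matches the rule's content checks."""
    # The rule only applies to UDP traffic destined for port 1434.
    if dst_port != SLAMMER_PORT:
        return False
    # content:"|04|"; depth:1 means the first payload byte must be 0x04.
    if not payload.startswith(b"\x04"):
        return False
    # The remaining content options have no offset modifiers, so each
    # pattern may appear anywhere in the payload.
    return all(p in payload for p in (SLAMMER_MARKER, b"sock", b"send"))
```

A match here only means the payload contains the same byte patterns the rule looks for; it is a starting point for inspection, not a replacement for an IDS.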