7. SYN Cookie: Troubleshooting Stats
Introduction
In this article I will explain what SYN Cookie stats you can consult and their meaning. There are more complex stats than the explained in this article but they are intended for helping F5 engineers when cases become complex.
SYN Cookie stats
First at all, in order to troubleshoot SYN Cookie you need to know how you can check SYN Cookie stats easily and understand what you are reading. The easiest way to see these stats in a device running LTM module based SYN Cookie is by running below command and focusing in SYN Cookies section of the output:
# tmsh show ltm virtual <virtual> SYN Cookies Status full-hardware Hardware SYN Cookie Instances 6 Software SYN Cookie Instances 0 Current SYN Cache 2.0K SYN Cache Overflow 24 Total Software 4.3M Total Software Accepted 0 Total Software Rejected 0 Total Hardware 21.7M Total Hardware Accepted 3
Let’s go through each field to know what they means:
Status: [full-software/full-hardware|not-activated].
This value describes what type of SYN Cookie mode has been activated, software or hardware.
Once an attack has finished it is normal that LTM takes some time to deactivate SYN Cookie mode after attack traffic stops. Calculation is complex and related to specific platform and several factors, also we do not want SYN Cookie entering in an activating/deactivating loop in case TCP SYN packets per second reaching device are near to the configured SYN cache threshold. So delay of 30-60 seconds before SYN Cookie being deactivated is in normal range.
Hardware SYN Cookie Instances: How many TMMs are under Hardware SYN Cookie mode.
Software SYN Cookie Instances: How many TMMs are under Software SYN Cookie mode. It indicates if software is currently generating SYN cookies.
Current SYN Cache: Indicates how many embryonic connections are handled by BIG-IP (refreshed every 2 seconds). SYN cache is always counting embryonic connections, regardless if SYN Cookie is activated or not. So even if SYN Cookies is completely turned off, we still have embryonic flows that have not been promoted to full flows yet (SYNs which has not completed 3WHS), this means that cache counter will still be used regularly, so is normal having a value different than 0. Disabling SYN Cookie will not avoid this counter increase but thresholds will not be taken into account.
Since SYN cache is no longer a cache, as explained in prior articles, and the stat is merely a counter of embryonic flows, it does not consume memory resources. As the embryonic flows are promoted or time out, this value will decrease.
SYN Cache Overflow (per TMM): It is incremented whenever the SYN cache threshold is exceeded and SYN cookies need to be generated. It increments in one per each TMM instance and this value is a counter that only increases, so we can consult the value to know how many times SYN Cookie has been activated since last stats reset, remember, per TMM.
Total Software: Number of challenges generated by Software. This is increased regardless if client sends a response, does not send any response or the response is not correct.
Total Software Accepted: Number of TCP handshakes that were correctly handled with clients.
Total Software Rejected: Number of wrong responses to challenges. Remember that ALL rejected SYN cookies are in software, there is no hardware rejected.
Total Hardware: Number of challenges generated by Hardware.
Total Hardware Accepted: Number of TCP handshakes that were correctly handled with clients.
In case you have configured AFM based SYN Cookie then you can use two easy sources of information, the already explained LTM command above, but also you can check stats for TCP half open DoS vector, at device or virtual server context, as you would do with any other DoS vector. Let’s check the most important fields at device context.
# tmsh show security dos device-config | grep -A 40 half Statistics Type Count Status Ready Attack Detected 1 *Attacked Dst Detected 0 *Bad Actor Attack Detected 0 Aggregate Attack Detected 1 Attack Count 2 *Stats 1h Samples 0 Stats 408 *Stats Rate 408 Stats 1m 104 Stats 1h 0 Drops 1063 *Drops Rate 1063 Drops 1m 187 Drops 1h 4 *Int Drops 0 *Int Drops Rate 0 *Int Drops 1m 0 *Int Drops 1h 0
Status: This field confirm if this specific DoS vector is ready to detect TCP SYN flood attacks. You have to take into account that you could decide to configure this DoS vector with Auto-threshold, in which case AFM would be in charge of deciding the best threshold. Note that by enabling auto-threshold AFM would need some time to learn the traffic pattern of your environment, until it has enough information to create a correct threshold you will not see this field as Ready but as Learning.
If vector is configured manually you always see it as Ready.
Attack detected: It informs you if currently there is an ongoing attack and detected by AFM.
Aggregate attack detected: Since DoS stats shown above are for device context the detected attack is aggregate.
Attack Count: Gives information about how many attacks have been detected since stats were reset last time. It does not decrease.
Stats: Number of embryonic connections at this current second. Remember that since this is a snapshot, the counter could go increase or decrease. This is different to other DoS vectors where it only increases.
Stats 1m: Average of the Stats in the last minute. This is the average number of embryonic connections that AFM has seen when taking sampling every 1s.
Stats 1h: Average of the Stats 1m in the last hour.
Drops: It counts the number of wrong ACKs received. In other words this is the current snapshot of: Number of SYN Cookies – Number of validated ACKs received
Drops 1m: Average of the Drops in the last minute
Drops 1h: Average of the Drops 1m in the last hour
I have added an asterisk to some fields to point to values that has not real meaning for SYN Cookie DoS vector, or information is not really useful from my perspective, but they are included to have coherence with other DoS attack stats information.
Note: Detection logic for this vector is not based on the Stats 1m, as other DoS vectors, instead it is based on the current number of embryonic connections, that is, value seen in Stats counter.
Interpreting stats
Once you know what each important stat mean I will give some advises when you interpret SYN Cookie stats. You should know some of them already if you have read all articles in this series:
- Hardware can offload TMM for validation, so you can see a number of software generated SYN Cookies much bigger than software validated SYN Cookie since software generates cookies that are validated by hardware as well.
- Since it is possible that a software generated SYN Cookie be accepted by hardware and vice versa, hardware generated SYN Cookies be accepted by software, you could think that value for Total Software is the same than the result of adding the Total Software Accepted plus Total Software rejected. But that is NOT true since it can be possible that a SYN Cookie sent by BIG-IP does not have a response (ACK), quite typical during a SYN flood attack indeed. In this case nothing adds up because generated SYN Cookie was not accepted nor rejected.
- Remember that SYN cookie’s responses discarded by hardware will be rejected by software, so all rejects are in software. This is why there is not counter for Total Hardware rejected. This means that Total Software Rejected can be increase by Total Hardware.
- When there is an attack vectors definition that match this attack will be increased. So, for example, during a TCP SYN flood attack you will see that Stats increases for TCP half open and TCP SYN flood (maybe others like low TTL,... as well). Also Stats will increase in all contexts (device and virtual server). The first vector in an specific context whose limit is exceeded it will start to mitigate. This is important because in cases where TCP SYN flood DoS vector has a lower threshold than TCP half open DoS vector, you will notice that traffic is dropped but you did not expect this behavior. Check article 5 for more information.
Example
In this part you will learn what stats changes you should expect to see when a TCP SYN flood attack is detected and SYN Cookie is activated, so it starts to mitigate the attack. Let’s interpret below stats:
During attack
Status full-hardware Hardware SYN Cookie Instances 6 Software SYN Cookie Instances 0 Current SYN Cache 1 SYN Cache Overflow 24 Total Software 10.1K Total Software Accepted 0 Total Software Rejected 0 Total Hardware 165.1M Total Hardware Accepted 3
After attack
Status not-activated Hardware SYN Cookie Instances 0 Software SYN Cookie Instances 0 Current SYN Cache 15 SYN Cache Overflow 24 Total Software 10.1K Total Software Accepted 0 Total Software Rejected 0 Total Hardware 171.0M Total Hardware Accepted 3
- Before the attack, before SYN Cookie is activated, you will see Current SYN Cache stats starts to increase quickly since TCP SYN packets are causing an increment of embryonic connections.
- Once SYN Cache threshold has been reached in a TMM then SYN Cache Overflow will increase attending to the number of TMMs that detected the attack. In above example you see 24 because this is a 6 CPUs device and 4 TCP SYN Flood attacks were detected by SYN Cookie. Remember this counter will never decrease unless we reset stats.
- During attack SYN cookie is activated so Current SYN Cache will start to decrease until reaching 0 because SYN Cookie Agent starts to handle TCP 3WHS, in other words, TCP stack stops to receive TCP SYN packets.
- We can check how many TMMs have SYN cookie activated currently looking at Hardware/Software SYN Cookie Instances counter. Note this match with what I explained for SYN Cache Overflow value.
- Legitimate connections are counted under Total Hardware/Software Accepted. In this case, we can see that although several millions of TCP SYN packets reached the device only 3 TCP 3WHS were correctly carried out.
- As you can see this device has Hardware SYN Cookie configured and working, but you can read that software also generates SYN Cookie challenges (Total Software). As commented in article number six of these series, this is expected and can be due to collisions or due to validations of first challenges when SYN Cookie is activated and TMMs handles TCP SYN packets until it enables hardware to do it. This does not affect device performance.
- In order to change from ‘Status full-hardware’ to ‘Status not-activated’ both HSBs must exit from SYN Cookie mode.
Conclusion
Now you know how to interpret stats, so you know can deduce information about the past and the present status of your device related to SYN Cookie. In next two article I will end up this series and this troubleshooting part talking about traffic captures and meaning of logs.