April 15, 2015

Anti-DDoS solution for DNS server (Flow management software)

I'm pleased to announce the open-source release of an anti-DDoS solution for high-traffic DNS servers: Flow Management. It is based on the well-known Solaris Crossbow flow feature, available since OpenSolaris 2009.06, and on a new ORACLE Solaris feature, CR 18734919 (support for DSCP marking on flows), which appeared in Solaris 11.2 on March 18, 2015.

Until recently, Solaris IP QoS (a complex feature) was the only Solaris feature that handled DSCP policies, but it lacks an API. In contrast, Solaris Crossbow flows (a lightweight feature) have an API (although unpublished) and a simpler configuration. Each flow has one or more attributes (which classify the traffic) and may have one or more properties (which are applied to the traffic that matches the attributes). The flow property maxbw limits the bandwidth shared by all traffic in the flow. However, it is of little use on its own when an attack has multiple sources (the DDoS case) and the design requires each source to be reflected by a dynamically created flow.

The newly introduced flow property dscp makes it possible to build cascaded flows: traffic originating from different sources is marked, then classified into one flow via the flow attribute dsfield (which matches the ToS bits), and finally limited by maxbw. This creates the basis for DoS mitigation software.
Note that at least two zones are needed to organize cascaded flows because of Solaris flow design constraints. However, those zones can cohabit on the same physical server, providing an out-of-the-box, cost-effective solution where both DoS protection and the target service coexist.
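
As a reminder of how a DSCP value relates to the DS field that the dsfield attribute matches: RFC 2474 places the 6-bit DSCP in the upper bits of the former IPv4 ToS byte, and the two low-order bits are used for ECN. The sketch below is only an illustration; the function names are hypothetical and are not part of any Solaris API.

```c
#include <assert.h>
#include <stdint.h>

/* Build the DS field (former IPv4 ToS byte) from a 6-bit DSCP value.
 * The two low-order bits (ECN) are left at zero. */
uint8_t dscp_to_dsfield(uint8_t dscp)
{
    assert(dscp < 64);              /* DSCP is a 6-bit codepoint */
    return (uint8_t)(dscp << 2);
}

/* Extract the DSCP from a DS field, ignoring the ECN bits. */
uint8_t dsfield_to_dscp(uint8_t dsfield)
{
    return (uint8_t)(dsfield >> 2);
}
```

For example, DSCP 46 corresponds to a DS field of 0xb8.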

The central part of the software is the flow-mgmtd binary, which can run either as a data producer (a client that sends what should be blocked) or as a daemon (a data consumer that handles flows dynamically). All interaction is via IP multicast, so you don't need to configure connection endpoints in a complex HA scheme. The current production scheme includes 4 redundant Quagga routing zones and 6 standalone DNS server zones (each of which can be served by any available routing zone). The flow client operates within a DNS server zone, and the flow daemon operates on top of a Quagga zone.
DSCP-marked traffic arrives at the DNS server zone, where two statically configured flows aggregate the traffic by DS field. It is possible to create an additional flow with priority set to high in order to ensure that traffic originating from certain clients is prioritized according to SLA.

The project has been running in production since 15.04.2015 and helps to mitigate various DNS DoS attacks (slow drip, resource exhaustion, etc.) on a caching DNS platform that serves 100 Mbps of DNS traffic.

This article does not cover attack detection or the possible application of the project to other fields. For example, it is possible to protect web services with the existing software without major changes, because analytical data about attacks is fed in as JSON. You could imagine a simple script that periodically generates a JSON file, which is then replicated by the Flow Management Agent.

You can download the sources.

Project goals:
  • protect against DDoS attacks on DNS servers and improve quality of service,
  • prioritize traffic according to SLA levels.
Feature list:
  • based on the Solaris Crossbow flow feature and flow DSCP marking (CR 18734919),
  • tested on ORACLE Solaris 11.2 SRU8.4,
  • running the software in Solaris non-global zones is supported,
  • compatible with the Differentiated Services Field (protocol standard RFC 2474),
  • the analytical source can be file-based (currently implemented) or interactive,
  • new attacking sources are provisioned automatically thanks to the IP multicast transport,
  • an SQLite3 backend provides fast recovery with minimal impact on the system. When flow-mgmtd is restarted, it uses data from the hardcoded file /var/run/flow-mgmt.flow.db, which contains previously recorded information about expiring events and created flows. Please note that flows are created as temporary and are not written to the standard flow database path, in order to support a custom data model and for compatibility with the Solaris immutable zones feature.
How it works:
  1. receives traffic analytics from outside (DNS cache) and decides whether to apply traffic policies,
  2. transforms the analytical results into a JSON protocol,
  3. floods the JSON payload to the subscribed members of an IP multicast group (located on redundant OSPF routers),
  4. the flow agent (OSPF router side) receives the JSON commands and applies DSCP policies to forwarded traffic (by creating dynamic flows),
  5. the flow agent expires the policies after a configurable timeout (thus removing expired flows from the system),
  6. customer traffic arrives at the consumer (DNS server) and is distributed among different (statically configured) flows according to DSCP policies,
  7. high traffic spikes are suppressed before they reach the consumer (DNS server), thus saving CPU cycles on the DNS server,
  8. customer traffic classified into different sinks cannot influence each other.
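
The expiry behaviour of step 5 can be sketched as a small table of attacking sources with absolute expiry times. This is a simplified illustration and not the flow-mgmtd implementation: the structure, the function names and the table size are all hypothetical, and the actual creation and removal of flows (done through the unpublished libdlflow API) is only indicated by a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define MAX_FLOWS 64            /* hypothetical table size */

struct flow_entry {
    uint32_t src_ip;            /* attacking source, host byte order */
    time_t   expires;           /* absolute expiry time */
    int      in_use;
};

struct flow_table {
    struct flow_entry e[MAX_FLOWS];
    time_t timeout;             /* configurable lifetime in seconds */
};

/* Add a source or refresh its expiry; returns 0 on success, -1 if full. */
int flow_add(struct flow_table *t, uint32_t src_ip, time_t now)
{
    int free_slot = -1;

    assert(t != NULL);
    for (int i = 0; i < MAX_FLOWS; i++) {
        if (t->e[i].in_use && t->e[i].src_ip == src_ip) {
            t->e[i].expires = now + t->timeout;     /* refresh */
            return 0;
        }
        if (!t->e[i].in_use && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return -1;
    t->e[free_slot] = (struct flow_entry){ src_ip, now + t->timeout, 1 };
    return 0;
}

/* Drop every entry whose lifetime has passed; returns how many were removed.
 * A real agent would also delete the corresponding dynamic flow here. */
int flow_expire(struct flow_table *t, time_t now)
{
    int removed = 0;

    assert(t != NULL);
    for (int i = 0; i < MAX_FLOWS; i++) {
        if (t->e[i].in_use && t->e[i].expires <= now) {
            t->e[i].in_use = 0;
            removed++;
        }
    }
    return removed;
}
```

Repeating a command for a source that is already listed simply pushes its expiry time forward, so a source that keeps attacking stays marked.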
Advantages:
  • small CPU footprint under high load,
  • traffic is managed in an intelligent way, not simply dropped; drops start only when the flow bandwidth allocation maxbw is exceeded,
  • if it isn't running, everything continues to work as before, but unprotected.
Disadvantages:
  • ORACLE doesn't publish the libdlflow API (part of the Datalink management library), and it is still undocumented.

November 15, 2011

Changing traffic policies of DHCPv4 Filtering STREAMS Module at run-time using Solaris MDB

Overview
What do you do if you are running a mission-critical DHCP server and want to adjust some traffic policies of the dhcpmod kernel module without restarting the process?
The answer is: use Solaris MDB.

Finding out where the traffic policies reside
It is assumed that you have already heard about Solaris STREAMS, or better still have read the STREAMS Programming Guide.
The basic concept is that Solaris exports special DDI interfaces for working with queues (queue_t). We use the read queue field q_ptr, which stores a pointer to module-private data: a structure created by the routine mstrmod_open() when the queue is opened. In the case of dhcpmod, all traffic policy parameters reside in that structure as unsigned integers. You can consult the dhcpmod sources for more details.

First, look for dhcpmod in the system streams cache. Start MDB with -w as well as -k, because the kernel is modified later:

mdb -kw
> ::walk stream_head_cache | ::stream
<output stripped>
      |                       ^
      v                       |
+-----------------------+-----------------------+
| 0xffffffff9f2ce648    | 0xffffffff9f2ce550    |
| dhcpmod               | dhcpmod               |
|                       |                       |
| cnt = 0t0             | cnt = 0t0             |
| flg = 0x00000822      | flg = 0x00000832      |
+-----------------------+-----------------------+
      |                       ^
<output stripped>

0xffffffff9f2ce550 is a pointer to the read queue of filtering instance 1.

We need to print more information about it:

> ffffffff9f2ce550::print queue_t
{
q_qinfo = mstrmod_rinit
q_first = 0
q_last = 0
q_next = 0xffffffffa1546008
q_link = 0
q_ptr = 0xfffffe86a09d7400
q_count = 0
q_flag = 0x832
q_minpsz = 0
q_maxpsz = 0xffffffffffffffff
q_hiwat = 0
q_lowat = 0
q_bandp = 0
q_lock = {
_opaque = [ 0 ]
}
q_stream = 0xffffffffa1558d90
q_syncq = 0xffffffff9f2ce740
q_nband = 0
q_wait = {
_opaque = 0
}
q_sync = {
_opaque = 0
}
q_nfsrv = 0xfffffe86f89d0aa8
q_nbsrv = 0
q_draining = 0
q_struiot = 0xffff
q_syncqmsgs = 0
q_mblkcnt = 0
q_sqhead = 0
q_sqtail = 0xffffffff91e1c600
q_sqflags = 0
q_rwcnt = 0
q_sqnext = 0
q_sqprev = 0
q_sqtstamp = 0x3a55c18
q_qtstamp = 0xbaddcafebaddcafe
q_spri = 0
q_fp = 0xfffffe8683fca0b0
}

The q_flag field holds the queue state flags; the value 0x832 includes the QREADR bit (0x10), which marks the read side of the queue (the write side shows 0x822). See <sys/stream.h> for more details about STREAMS structures.
The field q_qinfo = mstrmod_rinit is our read queue's qinit structure, which confirms we are in the right place. The field q_ptr (0xfffffe86a09d7400) points to our structure.
Now we dump the first 4 paragraphs (in MDB terms, 16 bytes each) of the structure:

> 0xfffffe86a09d7400::dump -eq -w 4
fffffe86a09d7400:
9e9e82b0 ffffffff 00000000 00000000
8f81835 00000001 00000001 00000001
0000000a 00000005 00000001 00000000
000000ff 00000000 48a36f90 fffffe82

Offset 18 is the runtime-dependent instance ID of our queue (you can verify its value via kstat as discussed earlier). Offsets here are hexadecimal, MDB's default radix. If it isn't the instance you are looking for, step through ::walk stream_head_cache to the next instance until you find it.

> 0xfffffe86a09d7400+18/B
0xfffffe86a09d7400: 1
Offset 20 contains the allowed packets per minute parameter. Offset 30 holds the discard by rate policy parameter.
Note that for a 32-bit kernel the offset values are different. Specifically, the instance ID offset is 9, the allowed packets per minute offset is 10, and the discard by rate policy offset is 15.
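
To see how the 64-bit offsets line up with the dump shown earlier, the 64 dumped bytes can be treated as an array of sixteen 32-bit words and indexed by byte offset divided by four, with the offsets read as hexadecimal. The sketch below is only a reading aid; the array and function names are hypothetical, and the word printed as "8f81835" is assumed to have a leading zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The 64 bytes shown by ::dump, as 32-bit words in display order.
 * The first word of the second row is printed as "8f81835"; a leading
 * zero is assumed here. */
static const uint32_t dump_words[16] = {
    0x9e9e82b0, 0xffffffff, 0x00000000, 0x00000000,
    0x08f81835, 0x00000001, 0x00000001, 0x00000001,
    0x0000000a, 0x00000005, 0x00000001, 0x00000000,
    0x000000ff, 0x00000000, 0x48a36f90, 0xfffffe82,
};

/* Return the 32-bit word found at a byte offset into the dump. */
uint32_t word_at(size_t byte_offset)
{
    assert(byte_offset % 4 == 0 && byte_offset / 4 < 16);
    return dump_words[byte_offset / 4];
}
```

With these values, word_at(0x18) is the instance ID 1, word_at(0x20) is 0xa (10 packets per minute) and word_at(0x30) is 0xff (discard by rate policy enabled).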
Now we can accomplish our tasks.

Task 1. Turning off the discard by rate policy
Verifying the current setting of the discard by rate policy:

> 0xfffffe86a09d7400+30/B
0xfffffe86a09d7400: ff
You see ff → 255

Writing a new value:

> 0xfffffe86a09d7400+30/W 0x0
0xfffffe86a09d7400: 0xff = 0x0


Task 2. Increasing the allowed packets per minute value
Verifying the allowed packets per minute value:

> 0xfffffe86a09d7400+20/B
0xfffffe86a09d7400: a
You see a → 10


To increase the allowed packets per minute value from 10 to 20 (0x14), type:

> 0xfffffe86a09d7400+20/W 0x14
0xfffffe86a09d7400: 0xa = 0x14


Protecting DHCP server against DoS attacks by creating specialized Solaris kernel STREAMS module



Overview

This article describes how writing a kernel STREAMS module can solve various DoS issues faced by a user-space DHCP server application operating under heavy traffic. The software was tested on ORACLE Solaris 10 on the x64 and SPARC platforms. You can download source code. It is intended to be used in conjunction with the Internet Systems Consortium (ISC) DHCP server. Patches for ISC DHCP software can be downloaded.

Abstract
Hosts in IP networks must be configured before they can communicate with each other. Historically this task has been solved by manual configuration of each host or via the DHCP protocol.
The DHCP protocol automatically provides hosts with the most essential configuration (though it is not limited to that), obtained from one or more DHCP servers: IP address, IP network mask, IP address of the default gateway, caching DNS servers, etc.
The well-known scenario is the following: DHCP-aware clients send request messages to the DHCP server. After receiving a valid request, the DHCP server, which manages a pool of IP addresses and other client-related information, assigns IP configuration parameters, which the client accepts via a DHCP reply. DHCP traffic uses UDP broadcast or unicast.
A typical DHCP server implementation (e.g. ISC DHCP) is a single-threaded process that handles incoming DHCP requests and sends reply packets if needed. Since it uses UDP, it can easily be overwhelmed by a large amount of traffic. Under high input load, packets are left unprocessed and get dropped. Moreover, the presence of badly implemented DHCP clients in modern IP networks significantly increases the probability of such a situation. It is impossible to control which kinds of devices are connected to the network.
There are at least two common approaches to the high load issue:
  • More threads process more traffic
The DHCP server bundled with Solaris is a multi-threaded application, so the load of ingress traffic is distributed between its threads, reducing the negative effects of attacks if not eliminating them completely.
For single-threaded DHCP software products (such as ISC DHCP), adding multithreading support requires a significant rewrite of the DHCP server software. Moreover, processing all the traffic when a vast part of it may be junk cannot be considered a suitable option.
  • Localize attacking sources and drop their traffic before it reaches the DHCP server
      Standard IP firewalls are of little use, because the packets to be filtered all come from a small number of DHCP relays. DHCP packets passed through a DHCP relay have a common source: the IP address of the relay. In order to achieve the goal, more details about DHCP are discussed below.

Known practices against DHCP DoS
Many modern ISPs use DHCP relaying to serve many IP subnets from centralized DHCP servers. In that case DHCP relay agent software installed on low-end edge equipment takes control of relaying the traffic of its network segment. Some vendors (but not all) have already implemented a DHCP snooping feature, which takes measures against some types of attacks including DHCP DoS. This feature has the disadvantage of being error prone, because it requires all DSLAMs/FTTH switches to be configured. The configuration can be more burdensome if VLAN trunking is used, because it must be done for each VLAN. However, the worst disadvantage is that the feature cannot prevent a flood from an already authorized DHCP client, which can send a huge amount of completely legitimate (and indispensable) DHCP client traffic.
Some vendors have implemented a feature that addresses the flood issue. It is the so-called DHCP sending rate (or rate-limit) feature, which places the client port in the administratively down state for a reasonable period of time if the number of DHCP packets originating from a single port exceeds a configured limit. But this prevents that customer from functioning properly until the attack is finished. So technical support will be involved in solving the problem even if the port is unblocked automatically after some time.
The techniques discussed help to stop excess traffic at the access equipment edge, but they require special implementation effort on the hardware vendor side, which isn't suitable in multivendor environments.
Modern ISPs require a more robust, centralized, vendor-independent solution that is free of the human factor.

Module dhcpmod topology scheme
The proposal is to stop the traffic at the DHCP server host before it enters the DHCP server software. That means moving a part of the packet processing into the kernel.
To access traffic, ISC DHCP under Solaris uses the Data Link Provider Interface (DLPI). DLPI V2 enables a data link service user (DHCP) to access and use any of a variety of conforming data link service providers. ISC DHCP uses a Data Link Service (DLS) provider V2. It opens an Ethernet-capable device, creates a stream between the DLS provider and the DLS user, and then registers the desired Physical Point of Attachment (PPA).
The new software is a multi-threaded STREAMS module (named dhcpmod).
Fig. 1 presents the module insertion scheme.


For filtering DHCP traffic when DHCP is installed in Solaris zones, another technology was chosen: the Solaris Packet Filtering HOOK. It will be discussed in an upcoming article. Why can't we use it for DLPI-enabled DHCP? Because DLPI acts at a lower level than the PF HOOK interception entry points, so discarding packets for DLPI-aware clients is impossible.

Processing DHCP packets as STREAMS mblk
The STREAMS read queue delivers each packet with its IP header (20 bytes long without options), followed by the UDP header (8 bytes long). The start of the packet is pointed to by b_rptr and the end by b_wptr.
The dhcpmod module first checks the IP version and header lengths. A packet is discarded silently if it is not directed to UDP port 67, if its IP fragment offset is non-zero, or if its IP flags are non-zero and not equal to IP_DF (don't fragment). Alignment checks are included too.
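
The checks described above can be sketched in portable C over a raw byte range that stands in for the mblk data between b_rptr and b_wptr. This is an illustration under assumptions and not the dhcpmod source: the function and macro names are hypothetical, and alignment checks and checksum verification are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPPROTO_UDP_NUM  17
#define IP_DF_FLAG       0x4000   /* "don't fragment" bit */
#define DHCP_SERVER_PORT 67

/* Read a 16-bit big-endian (network order) value. */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Return 1 if [rptr, wptr) holds an unfragmented IPv4/UDP packet
 * addressed to port 67, 0 if it should be discarded. */
int dhcp_prefilter(const uint8_t *rptr, const uint8_t *wptr)
{
    assert(rptr != NULL && wptr != NULL && rptr <= wptr);
    size_t len = (size_t)(wptr - rptr);

    if (len < 20 || (rptr[0] >> 4) != 4)           /* IPv4 only */
        return 0;
    size_t ihl = (size_t)(rptr[0] & 0x0f) * 4;     /* IP header length in bytes */
    if (ihl < 20 || len < ihl + 8)                 /* room for the UDP header */
        return 0;
    if (rptr[9] != IPPROTO_UDP_NUM)
        return 0;

    uint16_t frag = be16(rptr + 6);                /* flags + fragment offset */
    if ((frag & 0x1fff) != 0)                      /* non-zero fragment offset */
        return 0;
    if ((frag & 0xe000) != 0 && (frag & 0xe000) != IP_DF_FLAG)
        return 0;                                  /* flags other than DF */

    return be16(rptr + ihl + 2) == DHCP_SERVER_PORT;   /* UDP destination port */
}
```

The IP header length is taken from the IHL field rather than assumed to be 20 bytes, so packets carrying IP options are still parsed at the right offset.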
A DHCP packet is composed of two parts:
  • BOOTP header
    which has a constant size and a predetermined set of BOOTP parameters
  • DHCP options, whose values are not limited to a fixed length. Each option starts with a code that identifies its type, followed by a length field that gives the length of the value, followed by the value itself. The options are appended to the DHCP packet after the BOOTP part. They can be present in any order.
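
Because the options are code/length/value triples in arbitrary order, finding one of them means walking the whole options area. A minimal sketch, assuming the caller has already skipped the BOOTP header and the magic cookie; the function name and macros are hypothetical and are not taken from dhcpmod:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DHO_PAD 0                  /* one-byte padding option */
#define DHO_END 255                /* end-of-options marker */
#define DHO_RELAY_AGENT_INFO 82    /* RFC 3046 */

/* Find option `code` in the options area that follows the magic cookie.
 * Returns a pointer to the option value and stores its length in *len,
 * or returns NULL if the option is absent or the area is malformed. */
const uint8_t *dhcp_find_option(const uint8_t *opts, size_t size,
                                uint8_t code, uint8_t *len)
{
    assert(opts != NULL && len != NULL);
    size_t i = 0;

    while (i < size) {
        uint8_t c = opts[i];
        if (c == DHO_END)
            break;
        if (c == DHO_PAD) {          /* single byte, no length field */
            i++;
            continue;
        }
        if (i + 1 >= size)           /* length byte missing */
            return NULL;
        uint8_t l = opts[i + 1];
        if (i + 2 + l > size)        /* value runs past the buffer */
            return NULL;
        if (c == code) {
            *len = l;
            return opts + i + 2;
        }
        i += 2 + (size_t)l;
    }
    return NULL;
}
```

Calling it with DHO_RELAY_AGENT_INFO returns the Option 82 payload, which itself consists of sub-options such as the Agent Circuit ID and Agent Remote ID.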
Workflow within kernel module dhcpmod
The read queue (RQ) is the core element. The basic logic is to extract the most interesting parameters: the source IP address (in a DHCP relay environment it helps to distinguish between IP subnets), the Client-Ethernet-Address, which represents the MAC address of the DHCP client equipment, and the DHCP Option 82 parameter (usually used for binding an IP address to a physical port on low-end equipment). The last parameter contains the so-called Agent Remote ID and Agent Circuit ID (at least one of them).
These parameters are reduced to a single combined value using the CRC32 algorithm built into the Solaris kernel.
The next step is to store the calculated combined parameter in a HASH table, together with the packet rate counters.
Fig. 2 shows the workflow briefly.


For periodic cleanup of the HASH table, a special callback routine is installed. It is scheduled twice a second via qtimeout(). The HASH table size is adjusted automatically as it grows (always to a power of two) to keep it effective.
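
The combined key and the power-of-two table size can be illustrated as follows. This sketch uses a portable bitwise CRC-32 rather than the kernel's implementation, and the way the three parameters are fed into the checksum is an assumption; all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
 * Pass 0 as `crc` for the first chunk and the previous result to continue. */
uint32_t crc32_update(uint32_t crc, const void *data, size_t len)
{
    const uint8_t *p = data;

    crc = ~crc;
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Combine relay source IP, client MAC and Option 82 payload into one key. */
uint32_t dhcp_key(uint32_t src_ip, const uint8_t mac[6],
                  const uint8_t *opt82, size_t opt82_len)
{
    uint32_t crc = crc32_update(0, &src_ip, sizeof src_ip);

    crc = crc32_update(crc, mac, 6);
    if (opt82 != NULL && opt82_len > 0)
        crc = crc32_update(crc, opt82, opt82_len);
    return crc;
}

/* Map a key to a bucket; nbuckets must be a power of two. */
size_t bucket_index(uint32_t key, size_t nbuckets)
{
    assert(nbuckets != 0 && (nbuckets & (nbuckets - 1)) == 0);
    return key & (nbuckets - 1);
}
```

Keeping the bucket count a power of two lets the bucket be selected with a single bit mask instead of a modulo operation, which matters on a per-packet code path.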

Management policies in dhcpmod
The module is configured using custom IOCTLs passed to the write queue (WR). The IOCTL argument is a pointer to a uint_t variable. All policies are set for the current queue instance.
The following traffic policies are supported:

/*
* Ioctls.
*/
#define DHCPIOC ('D' << 8)
/* dropping rate per min */
#define DHCPIOCSDROPANYPPM (DHCPIOC|1)
/* dropping policy if no DHCP Option 82 info */
#define DHCPIOCSDROPPOLIFNORA (DHCPIOC|2)
/* dropping rate per min if no DHCP Option 82 info */ 
#define DHCPIOCSDROPNORAPPM (DHCPIOC|3) 
/* dropping policy for all packets if no DHCP Option 82 info */
#define DHCPIOCSDROPPOLALLNORA (DHCPIOC|4)
/* dropping policy by rate */ 
#define DHCPIOCSDROPPOLBYPPM (DHCPIOC|5) 

DHCPIOCSDROPANYPPM      Allowed number of packets per minute for the discard by rate policy
DHCPIOCSDROPPOLIFNORA   Turns the discard by rate policy for packets without DHCP Option 82 information off/on
DHCPIOCSDROPNORAPPM     Allowed number of packets per minute for packets without DHCP Option 82 information
DHCPIOCSDROPPOLALLNORA  Turns the policy of discarding all packets without DHCP Option 82 off/on
DHCPIOCSDROPPOLBYPPM    Turns the discard by rate policy off/on
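
One way to read how these five settings interact is sketched below. It is an interpretation for illustration, not the dhcpmod decision logic: the structure, its field names and the order of the checks are assumptions, and `count` stands for the number of packets already seen for the same hash key in the current minute, including the current one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-instance policy block, one field per IOCTL. */
struct dhcp_policy {
    unsigned drop_by_ppm;     /* DHCPIOCSDROPPOLBYPPM: 0 = off */
    unsigned any_ppm;         /* DHCPIOCSDROPANYPPM */
    unsigned drop_if_nora;    /* DHCPIOCSDROPPOLIFNORA: 0 = off */
    unsigned nora_ppm;        /* DHCPIOCSDROPNORAPPM */
    unsigned drop_all_nora;   /* DHCPIOCSDROPPOLALLNORA: 0 = off */
};

/* Return 1 to pass the packet upstream, 0 to discard it. */
int dhcp_policy_pass(const struct dhcp_policy *p, int has_opt82,
                     unsigned count)
{
    assert(p != NULL);

    if (!has_opt82) {
        if (p->drop_all_nora)
            return 0;                       /* no Option 82: always drop */
        if (p->drop_if_nora && count > p->nora_ppm)
            return 0;                       /* stricter limit without Option 82 */
    }
    if (p->drop_by_ppm && count > p->any_ppm)
        return 0;                           /* general rate limit */
    return 1;
}
```

With a general limit of 10 packets per minute and a limit of 5 for packets without Option 82, the eleventh packet of a client with Option 82 and the sixth packet of a client without it would be discarded.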

Conclusion
This article presented an effort towards implementing new DHCPv4 filtering software that runs as a STREAMS module in the Solaris kernel. The software automatically prevents DoS on the DHCP server without wasting CPU cycles of the DHCP server process.
Admittedly, this solution adds some processing delay. However, a DHCP server application that uses DLPI already has some filtering work done in kernel space: after attaching to the PPA, only destination UDP port 67 is passed. So inserting the new module increases the processing delay imperceptibly.
It also opens room for further enhancements: discarding packets without certain DHCP options, etc.
Note that this solution is centralized and vendor independent, and can be treated as one part of a protection solution for a large DHCP network scheme. It was built for use in conjunction with the DHCP snooping and DHCP Option 82 features.



November 14, 2011

DHCP traffic statistics datasheet exposed by DHCPv4 Filtering STREAMS Module via KSTAT infrastructure

Overview
This software is a STREAMS module that applies a special filtering algorithm to BOOTP packets arriving on its read queue and passes only the packets that the filter accepts on to its upstream consumer, the DHCP server process. In this way the majority of DoS/DDoS attacks and floods do not affect the resources of the DHCP server software, which runs in user space.
The ORACLE Solaris Operating Environment (OE) kernel provides a set of functions and data structures, named KSTAT, for device drivers and other kernel modules to export module-specific statistics to the outside world. The current version of the DHCPv4 filter supports KSTAT for exporting statistics about the processed DHCP traffic.

KSTAT instance behavior
The DHCPv4 filter software can provide filtering services to many concurrent DHCP server processes. Each of them has a separate inbound entry accessible via the kstat utility (class net, module dhcpmod). When the instance is unloaded, the entry remains in the system, keeping the statistical data available for further analysis. The instance state is given by the field state: 0 if the filter instance is uninitialized, 1 if it is running, 2 if it is stopped, 3 if it failed to start.

Statistic counters
The dhcpmod module maintains and reports the following statistics. All statistics are unsigned and 64 bits wide unless otherwise noted.
buffer errors
Shows how many errors took place during packet processing. Should always be 0.
cache buckets
Shows how many buckets are allocated for HASH tables. Large values indicate traffic bursts.
cache errors
Shows how many errors/inconsistencies occurred in HASH tables. Should always be 0.
cache expired
Denotes cache expiration events.
cache hits
Denotes cache hit events. Should exceed cache misses by a factor of at least 10. May be lower during the first ten minutes after instance startup.
cache misses
Denotes cache miss (not found) events.
cache records
Shows actual HASH usage. Large values indicate large traffic bursts.
discarded packets
Denotes discard events since instance startup.
discarded packets per sec
Denotes the actual rate of discard events. The filter effectiveness depends on that value.
discarded rate limit packets
Denotes discard events caused by rate limiting. Continuously high values mean that a flood is present.
failure packets
Shows how many packets could not be processed by the traffic filter and were passed on to the upstream neighbor. Should always be 0.
fragmented IP packets
Number of fragmented IPv4 packets. Such traffic is discarded silently.
input packets
Number of packets received from the NIC.
input packets per sec
Rate of packets received from the NIC.
invalid BOOTP packets
Number of BOOTP packets that violate RFC 2132. Such traffic is discarded silently.
invalid IP packets
Number of broken IPv4 packets. Such traffic is discarded silently.
invalid UDP packets
Number of UDP packets that are broken or have bad checksums. Such traffic is discarded silently.
malformed packets
Total number of packets that cannot be processed normally because they are corrupted or do not conform to the standards. Such traffic is discarded silently.
no memory errors
For internal use. Signals that there is not enough memory in the system. Should always be 0.
non-def BOOTP cookie packets
Number of packets with a broken BOOTP magic cookie. Such traffic is discarded silently.
non-def BOOTP type packets
Number of packets with a broken BOOTP hardware type. Such traffic is discarded silently.
non-def dest port packets
Number of packets with an invalid destination port. Such traffic is discarded silently.
non-def src port packets
Number of packets with an invalid source port. Such traffic is discarded silently.
non-support media type packets
Number of non-IEEE Ethernet packets. Such traffic is discarded silently.
non-support msg type packets
Number of packets with an invalid DHCP message type. Such traffic is discarded silently.
overrun packets
Number of oversized packets. Such traffic is discarded silently.
packets without DHCP Option 82
Number of packets that don't include RFC 3046 DHCP Option 82. Such traffic is handled according to custom policies.
passed packets
Denotes packets that passed the filter and were forwarded unaltered since instance startup.
passed packets per sec
Denotes the rate of successful filter pass-through events.
underrun packets
Number of undersized packets. Such traffic is discarded silently.

Real-world example of accessing traffic statistics
You can use the command-line tool /usr/bin/kstat interactively to print all or selected KSTAT information about DHCP traffic from the system.


$ kstat -c net -m dhcpmod
module: dhcpmod instance: 1
name: inbound class: net
buffer errors 0
cache buckets 16384
cache errors 0
cache expired 1978313
cache hits 7263887
cache misses 2012974
cache records 8381
crtime 522.985408215
discarded packets 3230542
discarded packets per sec 67
discarded rate limit packets 3229064
failure packets 861
fragmented IP packets 0
input packets 27526307
input packets per sec 268
invalid BOOTP packets 0
invalid IP packets 4
invalid UDP packets 0
malformed packets 617
no memory errors 0
non-def BOOTP cookie packets 2
non-def BOOTP type packets 0
non-def dest port packets 0
non-def src port packets 861
non-support media type packets 5
non-support msg type packets 606
overrun packets 0
packets without DHCP Option 82 323744
passed packets 6048319
passed packets per sec 44
snaptime 160039.377522581
state 1
underrun packets 0
module: dhcpmod instance: 2
name: inbound class: net
buffer errors 0
cache buckets 4
cache errors 0
cache expired 490
cache hits 155
cache misses 499
cache records 2
crtime 522.986099883
discarded packets 0
discarded packets per sec 0
discarded rate limit packets 0
failure packets 0
fragmented IP packets 0
input packets 658
input packets per sec 0
invalid BOOTP packets 0
invalid IP packets 0
invalid UDP packets 0
malformed packets 0
no memory errors 0
non-def BOOTP cookie packets 0
non-def BOOTP type packets 0
non-def dest port packets 0
non-def src port packets 0
non-support media type packets 0
non-support msg type packets 0
overrun packets 0
packets without DHCP Option 82 0
passed packets 658
passed packets per sec 0
snaptime 160039.379437423
state 1
underrun packets 0


To view a single instance, use the -i option followed by the instance ID.

$ kstat -c net -m dhcpmod -i 1
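
The counters can also be checked against the guidance in the Statistic counters section, for example that cache hits should exceed cache misses by a factor of at least 10. A minimal sketch of such a check follows; the function name is hypothetical and the values would have to be read from kstat output by other means.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if cache hits exceed cache misses by at least `factor`.
 * Integer division avoids overflow when the counters are large. */
int cache_hits_ok(uint64_t hits, uint64_t misses, uint64_t factor)
{
    assert(factor > 0);
    if (misses == 0)
        return 1;                   /* no misses at all */
    return hits / misses >= factor;
}
```

For instance 1 in the output shown earlier (7263887 hits, 2012974 misses) the ratio is roughly 3.6, so the check with a factor of 10 does not pass.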