Introduction
This article is based on the WireGuard Golang implementation, on the Linux platform, at Git revision: 12269c2
Content
Packet Sending and Receiving
WireGuard interacts with the system’s network stack by creating a TUN device.
Data Abstraction Overview
Rate Limiter
Before performing data encryption and decryption, WireGuard needs to negotiate an ephemeral symmetric encryption key through the Noise Protocol. This negotiation process (the handshake) involves encryption/decryption and hashing operations. To prevent DoS attacks targeting the responder’s CPU, the responder rate-limits handshake requests based on real-time load: it instantiates a token bucket for each source IP and applies frequency limits to handshake packets accordingly. The default policy allows 20 handshake packets per second per source IP, with a burst capacity of up to 5 handshake packets.
Index Table
Bytes 4 through 8 of each packet header store a 32-bit unsigned integer as an index. Through this index and the index table (map[uint32]IndexTableEntry), the corresponding Peer, Handshake, and Keypair can be looked up.
The index value itself is a random value, created in:
func (table *IndexTable) NewIndexForHandshake(peer *Peer, handshake *Handshake) (uint32, error)
This function uses a two-phase locking strategy (the index table first checks whether the index is already in use under a read lock; if the index is not used, it then creates the corresponding index entry under a write lock).
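The two-phase strategy can be sketched like this. It is a minimal illustration, not the wireguard-go code: IndexTable here holds a toy Entry type rather than the real IndexTableEntry, and NewIndex stands in for NewIndexForHandshake.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"sync"
)

// Entry stands in for the real IndexTableEntry (Peer, Handshake, Keypair).
type Entry struct{ peer string }

type IndexTable struct {
	mu    sync.RWMutex
	table map[uint32]Entry
}

func randUint32() (uint32, error) {
	var b [4]byte
	if _, err := rand.Read(b[:]); err != nil {
		return 0, err
	}
	return binary.LittleEndian.Uint32(b[:]), nil
}

func (t *IndexTable) NewIndex(e Entry) (uint32, error) {
	for {
		idx, err := randUint32()
		if err != nil {
			return 0, err
		}
		// Phase 1: cheap collision check under the read lock.
		t.mu.RLock()
		_, taken := t.table[idx]
		t.mu.RUnlock()
		if taken {
			continue
		}
		// Phase 2: insert under the write lock, re-checking in case
		// another goroutine claimed the index in between.
		t.mu.Lock()
		if _, taken := t.table[idx]; taken {
			t.mu.Unlock()
			continue
		}
		t.table[idx] = e
		t.mu.Unlock()
		return idx, nil
	}
}

func main() {
	t := &IndexTable{table: map[uint32]Entry{}}
	idx, err := t.NewIndex(Entry{peer: "peer-A"})
	if err != nil {
		panic(err)
	}
	fmt.Println(t.table[idx].peer) // peer-A
}
```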
What is the purpose of the index table?
The index table is used in two scenarios.
The first is during the handshake. The initiator creates and sends the handshake packet, then waits for and consumes the response packet, completing the triple Diffie-Hellman exchange. There is an intermediate state between sending and waiting for the response, and the values of this intermediate state are stored in a Handshake instance. The Handshake, as a value in the index table (one of the struct fields), is bound to the handshake packet through the index. When consuming the response packet, the receiver ID in the response header (the responder returns the sender ID from the initial handshake packet as the receiver ID) is used to query the index table and recover the Handshake instance from when the handshake packet was created, thus completing the exchange.
The second is when receiving data packets. After the responder consumes the handshake packet and sends a response, it creates Peer and Keypair locally and updates the index table. The index value in the received data packet header allows WireGuard to find the Keypair and Peer created during the handshake through the index table. The Keypair is used to verify whether the data packet has a corresponding key pair and whether the encryption key has expired. The Peer is used to locate the data packet’s queue. The data packet is authenticated and decrypted based on the Keypair, and the decrypted data packet is added to the Peer’s queue awaiting consumption.
Routing Table
Called AllowedIPs in the code, a struct type. Compared to the system routing table, this is WireGuard’s internal routing table implementation, based on a trie (prefix tree).
When is it initialized? When configuring WireGuard via IPC, for example, using the command-line tool wg: wg set wg0 allowed-ips '10.0.0.2/32'.
What is its purpose?
It maps the relationship between addresses (address ranges) and Peers.
There are two scenarios.
The first is when the initiator sends data. A data packet is sent by the system routing table to WireGuard’s TUN device. WireGuard reads the IP packet from the file descriptor of the TUN device, queries the internal routing table based on the destination address of the IP packet, and determines the target Peer instance. Based on the information in the Peer instance, it encrypts and sends the data to the Peer’s corresponding server. This serves as internal routing.
The second is when the responder receives data. The decrypted IP packet’s source address is used to query the internal routing table (note that before decryption there is one IP packet, and after decryption the previous header is stripped away – from the data portion, another IP packet is decrypted, i.e., IP-in-IP). If the Peer instance matches the one obtained by querying the index table with the receiver ID before decryption, it confirms that this decrypted IP packet is indeed intended for this Peer. This serves as path authentication based on the source IP (WireGuard virtual IP).
Device
Called Device in the code, a struct type.
In the userspace implementation, it is created when starting the wireguard-go binary. For the kernel implementation, it is created when executing ip link add dev wg0 type wireguard.
In the WireGuard Golang implementation, Device is the abstraction of the local WireGuard device. It is composed of many custom structs, such as: TUN device, network socket device, internal routing table, memory pool, encryption queue, decryption queue, and handshake queue.
The WireGuard Device struct can be considered an abstraction of the private key – each private key corresponds to an independent device.
In practice:
- Windows: Multiple WireGuard devices with different names can run simultaneously.
- macOS: Multiple WireGuard devices with different names can run simultaneously.
- Linux: Multiple WireGuard devices with different names can run simultaneously.
- Android: Only one WireGuard device can run at a time.
- iOS: Only one WireGuard device can run at a time.
For the latter two platforms, if you need to access multiple WireGuard networks simultaneously on a device (without manually switching devices), you need to multiplex the current WireGuard device through peers. See the Peer section below.
Peer
Called Peer in the code, a struct type. WireGuard does not distinguish between client and server. When a node actively initiates a handshake, we call it the initiator; when a node passively responds to a handshake, we call it the responder. The term Peer replaces the client and server concepts.
When is it initialized? When configuring WireGuard via IPC, for example, using the command-line tool wg: wg set wg0 peer tUpr9....
In the WireGuard Golang implementation, a Peer is the local abstraction of a remote device. Each remote device’s static public key corresponds to a Peer, and each device can connect to multiple Peers. It is composed of many custom structs, such as: keypair collection, handshake packet, endpoint, receive queue, send queue, staging queue, timers, etc.
For platforms that do not support running multiple WireGuard devices simultaneously (Android, iOS), we can multiplex the local device as follows:
- Create a device locally (generate a public/private key pair).
- Add the local device’s public key to network A.
- Add the local device’s public key to network B.
- Add the remote peer’s public key from network A to the local device, and specify the remote office LAN subnet in the routing table (allowedIPs).
- Add the remote peer’s public key from network B to the local device, and specify the remote office LAN subnet in the routing table (allowedIPs).
This way, our device can simultaneously access LAN devices on both office networks A and B.
Peer Table
Called peers in the code, an embedded struct within the Device struct, used to record the Peer corresponding to each remote device. It is essentially a map, with the public key as the key and a pointer to a Peer instance as the value.
Device Queues
The following queues exist:
- Encryption queue: for packets pending encryption.
- Decryption queue: for packets pending decryption.
- Handshake queue: for handshake packets pending consumption.
Multiple goroutines concurrently attempt to consume these device queues.
Peer Queues
- Peer staging queue: for packets pending encryption that are generated when handshake is not yet complete and encryption is temporarily unavailable.
- Inbound queue: holds decrypted packets awaiting consumption. These are sent to the TUN device, where the Linux network stack delivers them to the local service or routes them to other devices.
- Outbound queue: holds encrypted packets awaiting transmission. These are sent to the server.
Packets in the staging queue are in an unencrypted state and are consumed in func (peer *Peer) SendStagedPackets(). The inbound and outbound queues are consumed serially.
Endpoint
Called endpoint in the code, it records the real source address and destination address. For example, if the public server’s address is 1.1.1.1 and the client sends a request through the ISP’s NAT device (2.2.2.2) to the public server, then the endpoint struct records the destination address as 1.1.1.1 and the source address as 2.2.2.2.
The destination address is typically configured manually by the client through wg (command-line tool). The principle is that the wg command sends specially formatted messages to WireGuard’s IPC socket via IPC (inter-process communication), and WireGuard consumes these IPC messages to configure itself. This is specific to the userspace WireGuard implementation; the kernel implementation interacts and configures directly via Netlink. The destination address is also automatically updated to the latest source IP from received data packets.
Handshake
Called Handshake in the code, a struct type.
When is it initialized? In the func (device *Device) NewPeer(pk NoisePublicKey) (*Peer, error) method.
What is its purpose? Each peer has its own Handshake instance, storing the intermediate state values needed for the key derivation operations (from asymmetric encryption to symmetric encryption) during the handshake process.
Keypair
Called Keypair in the code, a struct type. This struct contains the values needed for symmetric encryption, such as instances of the cipher.AEAD interface type (two of them, for encrypting send and receive separately), hence the name keypair.
A Keypair is created in BeginSymmetricSession, based on the values from the handshake.
Keypair Collection
Called Keypairs in the code, a struct type. WireGuard re-handshakes roughly every 2 minutes (RekeyAfterTime), replacing old symmetric encryption keys with new ones (i.e., creating new Keypairs to replace old Keypairs). A Keypair can be in one of three states: the previously used Keypair, the currently active Keypair, and the upcoming Keypair. The keypair collection records the Keypair instances for these three states.
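The three-slot rotation can be sketched as below. This is a toy model: the real wireguard-go logic also locks the collection and handles the initiator/responder roles differently, and Promote is an invented name.

```go
package main

import "fmt"

type Keypair struct{ id int }

// Keypairs holds the three states described above: previous, current, next.
type Keypairs struct {
	previous, current, next *Keypair
}

// Promote installs the pending keypair once it is confirmed: current
// slides into previous, next becomes current.
func (k *Keypairs) Promote() {
	k.previous = k.current
	k.current = k.next
	k.next = nil
}

func main() {
	k := &Keypairs{current: &Keypair{id: 1}, next: &Keypair{id: 2}}
	k.Promote()
	fmt.Println(k.previous.id, k.current.id) // 1 2
}
```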
Nonce
The nonce is a critical parameter in stream encryption and authentication. In chacha20poly1305 AEAD, the nonce size is 12 bytes. chacha20poly1305 requires the nonce to be a unique value used only once.
Data packet decryption uses ChaCha20Poly1305. Bytes 8 through 16 (a total of 8 bytes) form a counter, used to implement unique nonce values (the first four bytes are 0, and the last 8 bytes are the counter value).
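Constructing that nonce is a one-liner with encoding/binary; the sketch below assumes a helper name transportNonce, which is not in the source.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// transportNonce builds the 12-byte ChaCha20-Poly1305 nonce as described:
// 4 zero bytes followed by the 8-byte little-endian counter.
func transportNonce(counter uint64) [12]byte {
	var nonce [12]byte
	binary.LittleEndian.PutUint64(nonce[4:], counter)
	return nonce
}

func main() {
	n := transportNonce(1)
	fmt.Printf("%x\n", n[:]) // 000000000100000000000000
}
```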
Data Processing Logic
Data Receiving Goroutine
The data receiving operation is implemented through an independent goroutine. The relevant function is:
func (device *Device) RoutineReceiveIncoming(maxBatchSize int, recv conn.ReceiveFunc){ ... }
What Scenarios Start the Receiving Goroutine?
- User configuring WireGuard via the wg command
- Linux bringing up the TUN device via the ip command
For point 1, the corresponding command is: wg set wg0 listen-port 51820. Internally, WireGuard communicates through UAPI (which is actually a Unix domain socket), located at: /var/run/wireguard/wg0.sock. When the above command is executed, wg sends a message to /var/run/wireguard/wg0.sock. WireGuard listens for messages, and if the command configures the listen port (i.e., the command contains the listen-port parameter), it starts the listening goroutine on the new port.
For point 2, the corresponding command is: ip link set wg0 up. This command starts the TUN device via a Netlink message. The kernel sends the message to the WireGuard process, which captures the Netlink message and confirms it is a TUN device startup event. During the startup operation, new data-receiving goroutines are launched.
How Is Data Received?
The system call for receiving data is wrapped in func (s *StdNetBind) Open(uport uint16) ([]ReceiveFunc, uint16, error). Linux supports reading multiple packets in a single system call, so this is actually a wrapper around the recvmmsg Linux system call: it batch-reads packets, parses each packet's source address through the packet control information (unix.PKTINFO), and updates the Peer's endpoint address in real time.
Before receiving data packets, the memory space needed for batch receiving is initialized in a loop. These memory spaces are fixed-size byte arrays (each packet is stored in its own byte array). Since packet receiving and processing (each packet occupies a segment of memory, which is reclaimed after processing) generates many memory allocation and garbage collection operations, creating a performance bottleneck, the project uses sync.Pool to reuse packet memory space.
The project also uses sync.Cond to implement memory control (disabled by default, i.e., memory growth is unlimited). The principle is to wrap sync.Pool.Get() and sync.Pool.Put() operations. In the Get wrapper, the allocation count is tracked (atomic operation; the counter increments by 1 for each allocation). When the counter exceeds the maximum value, sync.Cond.Wait() blocks the process, pausing subsequent sync.Pool.Get() operations and stopping memory allocation. In the Put wrapper, sync.Pool.Put() returns the memory space, the counter is decremented by 1, and sync.Cond.Signal() resumes the process blocked by sync.Cond.Wait().
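The Get/Put wrapping can be sketched as below. This illustrates the sync.Pool plus sync.Cond pattern described above but is not the wireguard-go code: BoundedPool and its fields are invented names, and a plain mutex-guarded counter stands in for the atomic one.

```go
package main

import (
	"fmt"
	"sync"
)

// BoundedPool wraps sync.Pool so that at most max buffers are
// outstanding at once; Get blocks on a sync.Cond until Put returns one.
type BoundedPool struct {
	pool  sync.Pool
	cond  *sync.Cond
	count int // buffers currently handed out
	max   int // 0 means unlimited (the wireguard-go default)
}

func NewBoundedPool(max int, alloc func() any) *BoundedPool {
	return &BoundedPool{
		pool: sync.Pool{New: alloc},
		cond: sync.NewCond(&sync.Mutex{}),
		max:  max,
	}
}

func (p *BoundedPool) Get() any {
	if p.max > 0 {
		p.cond.L.Lock()
		for p.count >= p.max {
			p.cond.Wait() // block until a buffer is returned
		}
		p.count++
		p.cond.L.Unlock()
	}
	return p.pool.Get()
}

func (p *BoundedPool) Put(x any) {
	p.pool.Put(x)
	if p.max > 0 {
		p.cond.L.Lock()
		p.count--
		p.cond.L.Unlock()
		p.cond.Signal() // wake one blocked Get
	}
}

func main() {
	p := NewBoundedPool(2, func() any { return make([]byte, 1500) })
	a, b := p.Get(), p.Get()
	done := make(chan struct{})
	go func() {
		_ = p.Get() // blocks: two buffers are already outstanding
		close(done)
	}()
	p.Put(a) // returns a buffer, unblocking the goroutine above
	<-done
	p.Put(b)
	fmt.Println("ok")
}
```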
How Are Received Packets Processed?
Data reception is triggered by a loop that calls the recvmmsg wrapper to batch-receive packets. Received packets are classified by their first byte to determine the packet type.
There are 4 message types, corresponding to handshake packets (multiple states) and data packets.
Data Packet Processing
Since packets are received in batches, each system call fills the byte arrays in the []*[MaxMessageSize]byte typed bufsArrs variable with one or more packets. A loop is needed to iterate over each packet (packets may belong to different Peers).
Bytes 4 through 8 of each packet are called Receiver in the code – a 32-bit ID (4 bytes). This ID uses the index table to record and look up the Peer, Keypair, and Handshake corresponding to the packet. Based on this information, it can determine whether the ephemeral key has expired, whether it is a replay attack, and which Peer the packet belongs to.
To ensure forward secrecy, WireGuard periodically or after a certain amount of data creates new ephemeral keys (used for symmetric encryption) based on asymmetric encryption. Because data is encrypted with rotating ephemeral keys, even if packets are intercepted, they cannot be decrypted without the ephemeral key from that specific point in time. Each created ephemeral key (symmetric encryption) is abstracted as a session, and each session has a defined lifetime.
After receiving a data packet, the keypair information is retrieved by querying the index table using the Receiver in the packet header. The retrieved information includes a timestamp; the keypair must have been created within 3 minutes to be considered valid.
If the keypair is valid, a QueueInboundElement struct instance is created to record the packet data (packet content, Keypair, Handshake, endpoint, etc.).
type QueueInboundElement struct {
	// Memory address of the packet
	buffer *[MaxMessageSize]byte
	// Slice of valid data (all packets have the same memory size, but the actual valid data only occupies a small portion)
	packet   []byte
	counter  uint64
	keypair  *Keypair
	endpoint conn.Endpoint
}
Each receive function has a map elemsByPeer = make(map[*Peer]*QueueInboundElementsContainer, maxBatchSize) that records the relationship between Peers and their unprocessed packets (multiple). Pointers to multiple packets are stored in the elems slice of QueueInboundElementsContainer:
type QueueInboundElementsContainer struct {
	sync.Mutex
	elems []*QueueInboundElement
}
At the end, the receive function sends the QueueInboundElementsContainer to two channels: device.queue.decryption.c and peer.queue.inbound.c. These two channels are consumed in RoutineDecryption and RoutineSequentialReceiver.
In the WireGuard Golang implementation, decryption and consumption of decrypted data happen in two independent goroutines: pointers to received packets are sent to two channels, and each channel is consumed by its own goroutine. How, then, does the project ensure that decryption happens before consumption?
In practice, a mutex lock is used for blocking synchronization (ensuring decryption before consumption) between creating the packet (undecrypted state) and consuming it (decrypted state).
// Packet reception
func (device *Device) RoutineReceiveIncoming(maxBatchSize int, recv conn.ReceiveFunc) {
	for {
		count, err := recv(bufs, sizes, endpoints)
		// handle each packet in the batch
		for i, size := range sizes[:count] {
			// ...
			packet := bufsArrs[i][:size]
			msgType := binary.LittleEndian.Uint32(packet[:4])
			switch msgType {
			case MessageTransportType:
				// ...
				peer := value.peer
				elem := device.GetInboundElement()
				elem.packet = packet
				elem.buffer = bufsArrs[i]
				elem.keypair = keypair
				elem.endpoint = endpoints[i]
				elem.counter = 0
				elemsForPeer, ok := elemsByPeer[peer]
				if !ok {
					elemsForPeer = device.GetInboundElementsContainer()
					elemsForPeer.Lock()
					elemsByPeer[peer] = elemsForPeer
				}
				elemsForPeer.elems = append(elemsForPeer.elems, elem)
				// ...
				continue
			case MessageInitiationType:
				// ...
			}
		}
		for peer, elemsContainer := range elemsByPeer {
			if peer.isRunning.Load() {
				peer.queue.inbound.c <- elemsContainer
				device.queue.decryption.c <- elemsContainer
			}
			// ...
		}
	}
}
As shown above, after receiving and creating the elem, the elem is added as a member of the elemsForPeer.elems slice. When creating elemsForPeer, there is an elemsForPeer.Lock() operation. The elemsForPeer is then sent to two queue channels (decryption and consumption).
// Decryption logic
func (device *Device) RoutineDecryption(id int) {
	var nonce [chacha20poly1305.NonceSize]byte
	for elemsContainer := range device.queue.decryption.c {
		for _, elem := range elemsContainer.elems {
			// ...
			elem.packet, err = elem.keypair.receive.Open(
				content[:0],
				nonce[:],
				content,
				nil,
			)
		}
		elemsContainer.Unlock()
	}
}
When decryption is complete, there is an elemsContainer.Unlock() operation.
func (peer *Peer) RoutineSequentialReceiver(maxBatchSize int) {
	// ...
	bufs := make([][]byte, 0, maxBatchSize)
	for elemsContainer := range peer.queue.inbound.c {
		if elemsContainer == nil {
			return
		}
		elemsContainer.Lock()
		for _, elem := range elemsContainer.elems {
			if elem.packet == nil {
				// decryption failed
				continue
			}
			// ...
			switch elem.packet[0] >> 4 {
			case 4:
				// ...
			case 6:
				// ...
			}
			bufs = append(bufs, elem.buffer[:MessageTransportOffsetContent+len(elem.packet)])
		}
		if len(bufs) > 0 {
			_, err := device.tun.device.Write(bufs, MessageTransportOffsetContent)
			// ...
		}
		// ...
		bufs = bufs[:0] // reset the batch slice for the next container
	}
}
When consuming data, there is an elemsContainer.Lock() operation. This operation blocks data consumption until decryption is complete and the lock is released.
This ensures that decryption happens before consumption.
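The whole ordering trick can be distilled into a few lines. This sketch is not the wireguard-go code, but it reproduces the pattern: lock the container at creation, send it to both channels, unlock only after "decryption", so the consumer's Lock() cannot proceed until decryption is done.

```go
package main

import (
	"fmt"
	"sync"
)

// container stands in for QueueInboundElementsContainer.
type container struct {
	sync.Mutex
	payload string
}

func main() {
	c := &container{}
	c.Lock() // locked before being handed to either goroutine

	decryption := make(chan *container, 1) // plays device.queue.decryption.c
	inbound := make(chan *container, 1)    // plays peer.queue.inbound.c
	decryption <- c
	inbound <- c

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // plays RoutineDecryption
		defer wg.Done()
		e := <-decryption
		e.payload = "decrypted"
		e.Unlock() // decryption done: release the consumer
	}()
	go func() { // plays RoutineSequentialReceiver
		defer wg.Done()
		e := <-inbound
		e.Lock() // blocks until the decryptor unlocks
		fmt.Println(e.payload)
		e.Unlock()
	}()
	wg.Wait()
}
```

Note that Go's sync.Mutex is not owner-tracked, so locking in one goroutine and unlocking in another is legal; the mutex's release/acquire pair also gives the consumer a happens-before view of the decrypted payload.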
Data Packet Decryption
The data packet decryption logic is encapsulated in func (device *Device) RoutineDecryption(id int). The RoutineDecryption goroutine runs at program startup, and the number of goroutines matches the CPU count. For example, on a server with 4 cores and 8 threads, 8 RoutineDecryption goroutines are started to decrypt received data packets in parallel.
Each data packet, after being retrieved via the index and index table, obtains its own Keypair struct instance. The Keypair instance is created in func (peer *Peer) BeginSymmetricSession() error.
BeginSymmetricSession encapsulates the logic for deriving symmetric encryption keys from the asymmetric handshake state, setting up ChaCha20-Poly1305 ciphers, and rotating keys. It is where ephemeral symmetric keys are generated and key rotation begins.
Data Packet Consumption
peer.queue.inbound.c is consumed in the RoutineSequentialReceiver goroutine. RoutineSequentialReceiver is started when the Peer starts, and the Peer is started when the TUN device is brought up or when the user executes any wg set command. Therefore, the data packet consumption goroutine is started by:
- User configuring WireGuard via the wg set command
- Linux bringing up the TUN device via the ip command
RoutineSequentialReceiver processes packets serially. This implementation relies on the packet counter value and the Filter struct, using the func (f *Filter) ValidateCounter(counter, limit uint64) bool method to ensure data packet freshness and defend against replay attacks.
How is packet freshness ensured to prevent replay attacks?
The ValidateCounter function uses a sliding window and bitmap approach, implemented based on RFC 6479.
This RFC was authored by two Huawei engineers. The idea is to split each sequence number into a block index and an offset: the block index is the quotient when dividing by 64, and the offset is the remainder. For example, sequence number 127 maps to block index 1 (127 / 64 = 1), 191 to index 2, and 255 to index 3.
Each block is stored as a bitmap in a 64-bit unsigned integer. Since the offset is always in the range 0-63, setting the bit at the offset position to 1 records that the packet with that sequence number has been received. Storing the remainder in a bitmap has the advantage of confirming in O(1) time whether a packet has already been received.
The sliding window uses the last packet’s sequence number and the window size to determine whether a packet is too old (the latest packet’s sequence number minus the window size equals the minimum acceptable sequence number).
In the sliding window and bitmap implementation (preventing replay attacks), the following scenarios may occur:
- The sequence number is greater than the last received packet’s sequence number – normal packet reception.
  - If it falls in the same block index as the last packet, read or update the corresponding bitmap directly.
    - If the corresponding bit in the bitmap is 0, it is a new packet; accept it and set the bit to 1.
    - If the corresponding bit in the bitmap is already 1, a replay attack has occurred.
  - If it falls in a different block index, first initialize (zero out) the new block’s bitmap, then read and update it as in the same-block case.
- The sequence number is less than the last received packet’s sequence number.
  - If it is within the allowable window (the window size in code is (128 - 1) * 64), perform the read or update:
    - If the corresponding bit in the bitmap is 0, it is a new packet; accept it and set the bit to 1.
    - If the corresponding bit in the bitmap is already 1, a replay attack has occurred.
  - Otherwise, the packet is expired and is directly discarded.
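Putting the block/bitmap scheme together, a compact replay filter in the spirit of RFC 6479 might look like this. It is an illustrative reimplementation, not the wireguard-go Filter, though the window size matches the (128 - 1) * 64 mentioned above.

```go
package main

import "fmt"

const (
	blockBits  = 64
	numBlocks  = 128
	windowSize = (numBlocks - 1) * blockBits
)

type Filter struct {
	last   uint64 // highest counter accepted so far
	bitmap [numBlocks]uint64
}

func (f *Filter) ValidateCounter(counter uint64) bool {
	if counter > f.last {
		// Newer than anything seen: zero the blocks being skipped over.
		cur, top := f.last/blockBits, counter/blockBits
		if top-cur >= numBlocks {
			// Jumped past the whole window: clear everything.
			f.bitmap = [numBlocks]uint64{}
		} else {
			for b := cur + 1; b <= top; b++ {
				f.bitmap[b%numBlocks] = 0
			}
		}
		f.last = counter
	} else if f.last-counter >= windowSize {
		return false // too old: outside the sliding window
	}
	bit := uint64(1) << (counter % blockBits) // offset = remainder
	idx := (counter / blockBits) % numBlocks  // block = quotient
	if f.bitmap[idx]&bit != 0 {
		return false // bit already 1: replay
	}
	f.bitmap[idx] |= bit
	return true
}

func main() {
	f := &Filter{}
	fmt.Println(f.ValidateCounter(5))    // true: fresh
	fmt.Println(f.ValidateCounter(5))    // false: replay
	fmt.Println(f.ValidateCounter(3))    // true: old but inside the window
	fmt.Println(f.ValidateCounter(9000)) // true: jump ahead
	fmt.Println(f.ValidateCounter(5))    // false: now outside the window
}
```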
Packets are then written to the platform-specific TUN device implementation through the Write method of the tun.Device interface. This process simulates a virtual NIC receiving packets, similar to how a physical NIC would.
Handshake Packet Processing
Handshake packets are sent to the device.queue.handshake.c channel, which is a buffered channel with a default size of the constant QueueHandshakeSize (1024). Therefore, under high load, at most 1024 handshake packet entries can be buffered in the channel.
This channel is consumed in the goroutine func (device *Device) RoutineHandshake(id int). RoutineHandshake goroutines run at program startup, and the number of goroutines matches the CPU count. For example, on a server with 4 cores and 8 threads, 8 RoutineHandshake goroutines are started to consume all valid handshake packet information.
type QueueHandshakeElement struct {
	msgType  uint32
	packet   []byte
	endpoint conn.Endpoint
	buffer   *[MaxMessageSize]byte
}
Each handshake packet’s data is consumed as an instance of the above struct type in RoutineHandshake. There are 3 types of handshake packets:
- MessageInitiationType: the initial handshake packet sent when starting a handshake.
- MessageResponseType: the response handshake packet returned after receiving an initial handshake packet.
- MessageCookieReplyType: if the receiver (i.e., the party returning the response handshake packet) experiences high load due to a large volume of initial handshake packets, it returns a cookie reply packet (containing a cookie). By default, a handshake packet carries two MAC codes. msg.mac1 must be present and valid; msg.mac2 is generated by the sender based on the cookie. Under high load, even if msg.mac1 is valid, the packet may be dropped if there is no valid msg.mac2.
How Is High Load Determined?
Handshake packets are temporarily stored in the device’s handshake queue, which can hold 1024 handshake packets by default.
When a new handshake packet is processed, the number of pending handshake packets in the queue (i.e., the queue length) is used to determine whether the system is under high load. If the pending count is greater than or equal to 128, it is considered high load. This approach can be used to detect whether the service is under a DoS attack.
Because there are multiple RoutineHandshake goroutines, two situations may arise:
- High load occurs while the current handshake packet is being processed. The duration of the high load state is recorded (by default, the current timestamp plus 1 second is set as the high load duration). Then true is returned, indicating the current handshake packet was processed during high load.
- No high load when the current handshake packet is being processed, but another goroutine detected high load, and the current processing time falls within that goroutine’s high load duration. In this case, true should also be returned.
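The load check itself amounts to comparing the buffered channel's length against a threshold. The sketch below uses the real constant values (1024 and 1024/8 = 128) but invents the variable names.

```go
package main

import "fmt"

const (
	QueueHandshakeSize = 1024
	// High load is declared once 1/8 of the queue is pending.
	UnderLoadQueueSize = QueueHandshakeSize / 8 // 128
)

func main() {
	handshakeQueue := make(chan []byte, QueueHandshakeSize)
	for i := 0; i < 200; i++ { // simulate a burst of pending handshakes
		handshakeQueue <- []byte{0x01}
	}
	// len() on a buffered channel reports how many entries are pending.
	underLoad := len(handshakeQueue) >= UnderLoadQueueSize
	fmt.Println(underLoad) // true
}
```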
How Do Message Authentication Codes (MAC) Work?
Every handshake packet must carry msg.mac1, and under high load, msg.mac2 is also required.
msg.mac1 and msg.mac2 constitute the last 32 bytes of the encrypted packet. Assuming the total length of the encrypted packet is n bytes, msg.mac1 is msg[n-32:n-16] and msg.mac2 is msg[n-16:n].
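The field layout can be demonstrated with plain slicing. macFields is a hypothetical helper, and 148 bytes is the size of a handshake initiation message.

```go
package main

import "fmt"

// macFields extracts the two trailing MACs: for an n-byte message,
// mac1 is msg[n-32:n-16] and mac2 is msg[n-16:n].
func macFields(msg []byte) (mac1, mac2 []byte) {
	n := len(msg)
	return msg[n-32 : n-16], msg[n-16 : n]
}

func main() {
	// A dummy initiation-sized message with marker bytes where the MACs start.
	msg := make([]byte, 148)
	msg[148-32] = 0xAA // first byte of mac1
	msg[148-16] = 0xBB // first byte of mac2
	mac1, mac2 := macFields(msg)
	fmt.Println(len(mac1), len(mac2), mac1[0] == 0xAA, mac2[0] == 0xBB) // 16 16 true true
}
```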
msg.mac1 is a 128-bit hash value computed via blake2s from the packet content excluding the last 32 bytes (i.e., the generated hash length is 128 bits, or 16 bytes), and the hash computation uses a key parameter.
func() {
	// Key parameter is: st.mac1.key[:]
	mac, _ := blake2s.New128(st.mac1.key[:])
	mac.Write(msg[:smac1])
	mac.Sum(mac1[:0])
}()
How Is the Key Parameter Generated?
The key parameter is derived from a fixed string mac1---- and the responder’s public key through the blake2s hash algorithm without a key parameter:
func() {
	// Key parameter is nil
	hash, _ := blake2s.New256(nil)
	// Content is []byte(WGLabelMAC1) + pk[:]
	hash.Write([]byte(WGLabelMAC1))
	hash.Write(pk[:])
	// The hash result is written to st.mac1.key[:0]
	// Since we want to replace the value in key rather than append to the end, we pass a zero-length slice as the Sum function's parameter
	hash.Sum(st.mac1.key[:0])
}()
By default, msg.mac2 in the initial handshake packet is 0. Only after the responder returns a MessageCookieReplyType handshake packet does the initiator compute and fill in the msg.mac2 value.
When Does the Responder Send a MessageCookieReplyType Handshake Packet?
When under high load.
What Does the MessageCookieReplyType Handshake Packet Contain?
The code is as follows:
type MessageCookieReply struct {
	Type     uint32
	Receiver uint32
	Nonce    [chacha20poly1305.NonceSizeX]byte
	Cookie   [blake2s.Size128 + poly1305.TagSize]byte
}
The key information in the handshake packet is the cookie value encrypted with XChaCha20-Poly1305.
The encryption function is:
xchapoly, _ := chacha20poly1305.NewX(st.mac2.encryptionKey[:])
xchapoly.Seal(reply.Cookie[:0], reply.Nonce[:], cookie[:], msg[smac1:smac2])
The following parameters are involved:
- Encryption key: st.mac2.encryptionKey[:]
- Nonce: reply.Nonce[:]
- Plaintext data (unencrypted cookie): cookie[:]
- Additional data: msg[smac1:smac2]
What Is the Encryption Key?
It is the hash of the string cookie-- and the responder’s public key, computed via blake2s.New256 to produce a 32-byte hash value.
The creation function is:
func (st *CookieChecker) Init(pk NoisePublicKey) {
	// ...
	func() {
		hash, _ := blake2s.New256(nil)
		hash.Write([]byte(WGLabelCookie))
		hash.Write(pk[:])
		hash.Sum(st.mac2.encryptionKey[:0])
	}()
	st.mac2.secretSet = time.Time{}
}
How Is the Nonce Created?
There are two types of nonces.
- Nonce used when sending data. This value is updated when the initiator sends a data packet. By default, after a successful handshake, it atomically increments from 0 until the counter reaches the maximum allowed number of packets (at which point a re-handshake is triggered, creating a new symmetric encryption keypair).
- Nonce used when creating a cookie reply under high load. Created by the responder when generating a cookie reply; it is a random value and is sent as part of the reply packet content to the initiator. It is created in CreateReply.
What Is in the Plaintext Data (Unencrypted Cookie)?
The hash of the requesting initiator’s source address (if the user is behind NAT, this is the NAT public address).
Created in the following function:
func (st *CookieChecker) CreateReply(
	msg []byte,
	recv uint32,
	src []byte,
) (*MessageCookieReply, error) {
	st.RLock()
	// refresh cookie secret
	if time.Since(st.mac2.secretSet) > CookieRefreshTime {
		st.RUnlock()
		st.Lock()
		// If the cookie secret is too old, generate a new secret to produce a new cookie hash
		_, err := rand.Read(st.mac2.secret[:])
		if err != nil {
			st.Unlock()
			return nil, err
		}
		st.mac2.secretSet = time.Now()
		st.Unlock()
		st.RLock()
	}
	var cookie [blake2s.Size128]byte
	func() {
		mac, _ := blake2s.New128(st.mac2.secret[:])
		// src is actually the initiator's public (egress) address
		mac.Write(src)
		mac.Sum(cookie[:0])
	}()
	// ...
}
What Is the Additional Data?
It is actually msg.mac1, from the initiator’s initial handshake request.
It is not encrypted, but it affects the ciphertext output (the actual cookie value).
How Is the Encrypted Cookie Value Computed?
For XChaCha20-Poly1305, the ciphertext length equals the plaintext length + the tag length. XChaCha20 encrypts the plaintext into ciphertext (same length).
The tag (16 bytes) is a Poly1305 MAC computed over the additional data and the XChaCha20-encrypted ciphertext. XChaCha20 takes the nonce and a 32-byte key as parameters and encrypts a 32-byte all-zero byte array; the resulting 32 bytes serve as the one-time Poly1305 key. Poly1305 then computes the 16-byte tag from the additional data and the ciphertext.
So the cookie value is actually the initiator’s egress IP hashed and encrypted into ciphertext (with a 16-byte verification tag appended to the end of the ciphertext).
Why Does Cookie Encryption Use XChaCha20-Poly1305 Instead of ChaCha20-Poly1305?
The difference between the two is the nonce size: the former uses 24 bytes while the latter uses 12 bytes.
First, let’s discuss the use cases. XChaCha20-Poly1305 is used for cookie reply handshake packet encryption, while ChaCha20-Poly1305 is used for data packet encryption.
The purpose is to avoid the keystream and authentication-key compromise caused by reusing a nonce under the same key. WireGuard's encrypted data transport rotates the keypair every 2 minutes, using fresh ephemeral keys for transmission. The nonce for transport data packets is therefore a simple counter: the key is rotated long before the counter could approach the limit of the 12-byte nonce space, so nonce reuse cannot occur and does not threaten encryption security.
However, cookie reply handshake packets are encrypted based on the responder’s public key (which never changes), and the nonce is a random value. To minimize the possibility of nonce reuse, the nonce size matters significantly here. Compared to a 12-byte value, a 24-byte random value greatly reduces the probability of collision.
How Does the Initiator Consume the MessageCookieReplyType Response?
The entry point for consuming MessageCookieReplyType is:
func (device *Device) RoutineHandshake(id int) {
    // ...
}
The client process receives data packets and, based on the packet type, calls the following function:
func (st *CookieGenerator) ConsumeReply(msg *MessageCookieReply) bool {
    // ...
    // Must contain a valid msg.mac1
    if !st.mac2.hasLastMAC1 {
        return false
    }
    // ...
    var cookie [blake2s.Size128]byte
    // The cookie is encrypted using a key derived from the responder's public key hash; the initiator constructs the same hash locally
    xchapoly, _ := chacha20poly1305.NewX(st.mac2.encryptionKey[:])
    // If err is nil, decryption and verification succeeded, return true
    // If err is not nil, decryption and verification failed, return false
    _, err := xchapoly.Open(cookie[:0], msg.Nonce[:], msg.Cookie[:], st.mac2.lastMAC1[:])
    if err != nil {
        return false
    }
    // Record the latest timestamp
    st.mac2.cookieSet = time.Now()
    // Decrypted content is saved to the initiator's memory
    st.mac2.cookie = cookie
    return true
}
After consuming the cookie reply packet, the st.mac2.cookieSet timestamp in the initiator's local Peer is updated.
When Is a Handshake Packet with msg.mac2 Sent?
After receiving a MessageCookieReply handshake packet, a new handshake request is not immediately triggered. This is because the purpose of msg.mac2 is to prevent server overload (since responding to handshake packets involves CPU-intensive cryptographic operations such as key derivation).
Before establishing a handshake, WireGuard attempts to establish the handshake at specific intervals (Rekey-Timeout, which is 5s in the code), with a limited number of attempts. A handshake packet with a valid msg.mac2 will be created in the next attempt after receiving a valid cookie value.
How Is Handshake Rate Limiting Implemented via Token Bucket?
Each IP has its own token bucket. Tokens accumulate over time up to a maximum value. Each request has a cost, and if requests are too frequent and all tokens are consumed, further requests are rejected until the token bucket accumulates enough tokens again.
The request cost unit is nanoseconds (time). Assuming 20 packets are allowed per second:
const (
    packetsPerSecond = 20
    packetCost       = 1000000000 / packetsPerSecond
)
// packetCost = 1000000000 / 20 = 50000000
This means each request consumes 50000000 nanoseconds of “time” tokens – in this example, one request per 50 milliseconds.
Tokens in each IP address’s token bucket are accumulated by computing time differences. In the Allow method, this is achieved by computing the time elapsed since the last access:
now := rate.timeNow()
entry.tokens += now.Sub(entry.lastTime).Nanoseconds()
entry.lastTime = now
- Get the current time now.
- Compute the nanoseconds elapsed since the last access using now.Sub(entry.lastTime).Nanoseconds().
- Add these nanoseconds to entry.tokens.
To prevent tokens from growing indefinitely, a maximum token limit is defined by maxTokens:
if entry.tokens > maxTokens {
    entry.tokens = maxTokens
}
If the accumulated tokens exceed this maximum after time-difference accumulation, the token count is capped at maxTokens, ensuring the token bucket does not accumulate excessive tokens from prolonged inactivity.
Each request attempts to consume a certain number of tokens from the token bucket (defined by packetCost):
if entry.tokens > packetCost {
    entry.tokens -= packetCost
    return true
}
If the current token count exceeds the per-request token cost packetCost, the corresponding tokens are deducted, and the request is allowed. If tokens are insufficient, the request is rejected.
Token bucket creation and cleanup operate on a per-second basis. If more than 1 second elapses between a client's requests, that client's token bucket is considered expired and cleaned up. An independent token bucket is created for each requesting client. The constants governing each bucket's capacity are:
packetCost = 1000000000 / packetsPerSecond
maxTokens = packetCost * packetsBurstable
Assuming packetsPerSecond is 20, the cost per request is:
packetCost = 1000000000 / 20 = 50000000 // nanoseconds
If packetsBurstable is set to 5, then:
maxTokens = 50000000 * 5 = 250000000 // nanoseconds
This means that when the token bucket is fully charged, it contains enough tokens to handle 5 consecutive requests, with each request costing 50000000 nanoseconds.
This setup gives the token bucket an initial request-processing capacity to handle burst traffic. By adjusting the packetsPerSecond and packetsBurstable parameters, the token bucket behavior can be flexibly configured for specific application scenarios and performance requirements.
Each client IP’s token bucket is stored in a map:
type Ratelimiter struct {
    mu        sync.RWMutex
    timeNow   func() time.Time
    stopReset chan struct{} // send to reset, close to stop
    // table is used to store all token buckets
    table map[netip.Addr]*RatelimiterEntry
}
To promptly clean up unused token buckets (where the interval between two requests from the same client exceeds 1 second), a periodic cleanup routine is started when the first token bucket is added to the table.
The following code initializes the cleanup routine (in a stopped state):
func (rate *Ratelimiter) Init() {
    // ...
    go func() {
        ticker := time.NewTicker(time.Second)
        ticker.Stop()
        for {
            select {
            // Blocks by default
            case _, ok := <-stopReset:
                ticker.Stop()
                if !ok {
                    return
                }
                // Start a new ticker
                ticker = time.NewTicker(time.Second)
            // Blocks by default because ticker was stopped before the for loop
            case <-ticker.C:
                if rate.cleanup() {
                    // Stop the ticker after successful cleanup
                    ticker.Stop()
                }
            }
        }
    }()
}
The following code sends a signal to the stopReset channel to start the cleanup goroutine:
// If the map length changes from 0 to 1
if len(rate.table) == 1 {
    rate.stopReset <- struct{}{}
}
Are Handshake Packets Rate-Limited?
If the responder experiences high load, handshake packets are rate-limited.
The rate-limiting function call is included in func (device *Device) RoutineHandshake(id int).
How Do Timers Work in WireGuard?
WireGuard uses many timers. The main logic is in the following function:
func (peer *Peer) timersInit() {
    peer.timers.retransmitHandshake = peer.NewTimer(expiredRetransmitHandshake)
    peer.timers.sendKeepalive = peer.NewTimer(expiredSendKeepalive)
    peer.timers.newHandshake = peer.NewTimer(expiredNewHandshake)
    peer.timers.zeroKeyMaterial = peer.NewTimer(expiredZeroKeyMaterial)
    peer.timers.persistentKeepalive = peer.NewTimer(expiredPersistentKeepalive)
}
This function should be read together with the NewTimer implementation:
func (peer *Peer) NewTimer(expirationFunction func(*Peer)) *Timer {
    timer := &Timer{}
    // Uses Go's time.AfterFunc API, specifying the logic to execute upon timeout
    timer.Timer = time.AfterFunc(time.Hour, func() {
        timer.runningLock.Lock()
        defer timer.runningLock.Unlock()
        timer.modifyingLock.Lock()
        if !timer.isPending {
            timer.modifyingLock.Unlock()
            return
        }
        timer.isPending = false
        timer.modifyingLock.Unlock()
        // Execute the logic here
        expirationFunction(peer)
    })
    timer.Stop()
    return timer
}
NewTimer follows the factory pattern in software engineering. Each timer’s expiration behavior is encapsulated in the expirationFunction function value. When the timer expires, this function is called. Different parameters (function values) create timers with different expiration logic.
After each handshake packet is sent, the timer’s reset method is called to activate the timer and configure a new timeout. The code is in the following function:
func (peer *Peer) SendHandshakeInitiation(isRetry bool) error {
    // ...
    peer.timersHandshakeInitiated()
    return err
}

func (peer *Peer) timersHandshakeInitiated() {
    if peer.timersActive() {
        peer.timers.retransmitHandshake.Mod(RekeyTimeout + time.Millisecond*time.Duration(fastrandn(RekeyTimeoutJitterMaxMs)))
    }
}

func (timer *Timer) Mod(d time.Duration) {
    timer.modifyingLock.Lock()
    timer.isPending = true
    timer.Reset(d)
    timer.modifyingLock.Unlock()
}
As shown, the Mod function wraps timer.Reset(d).
Because the retransmitHandshake timer encapsulates the timeout logic function expiredRetransmitHandshake, the corresponding logic is:
func expiredRetransmitHandshake(peer *Peer) {
    if peer.timers.handshakeAttempts.Load() > MaxTimerHandshakes {
        // ...
    } else {
        peer.timers.handshakeAttempts.Add(1)
        peer.device.log.Verbosef("%s - Handshake did not complete after %d seconds, retrying (try %d)", peer, int(RekeyTimeout.Seconds()), peer.timers.handshakeAttempts.Load()+1)
        /* We clear the endpoint address src address, in case this is the cause of trouble. */
        peer.markEndpointSrcForClearing()
        // Start the next handshake, marked as a retry (counted towards the total attempt count, limited to 18 in the code)
        peer.SendHandshakeInitiation(true)
    }
}
If the timer expires, the expiredRetransmitHandshake function is called, and a new initial handshake packet with msg.mac2 is sent.
How Are Symmetric Encryption Keys Derived?
After the initiator receives a valid MessageResponseType response handshake packet, it transitions state based on the previously established asymmetric encryption variables. The logic is in func (peer *Peer) BeginSymmetricSession() error.
After the responder receives a valid MessageInitiationType initial handshake packet, it also calls func (peer *Peer) BeginSymmetricSession() error.
How is the initiator’s chainKey produced? How is the responder’s chainKey produced?
The chainKey is created using a key derivation function (KDF), which is essentially an HMAC wrapper.
There are two types of HMAC functions, distinguished by the number of input data parameters:
func HMAC1(sum *[blake2s.Size]byte, key, in0 []byte)
func HMAC2(sum *[blake2s.Size]byte, key, in0, in1 []byte)
The HMAC functions use blake2s as the hash function to compute a message authentication code (MAC) over the input data (in0, and in1 if present), with key as the key.
There are three types of KDF functions, distinguished by the number of keys to derive:
func KDF1(t0 *[blake2s.Size]byte, key, input []byte)
func KDF2(t0, t1 *[blake2s.Size]byte, key, input []byte)
func KDF3(t0, t1, t2 *[blake2s.Size]byte, key, input []byte)
The key derivation approach is: given the input key and input, compute a first message authentication code prk via HMAC1 (the first type of HMAC function mentioned above).
t0 is the result of calling HMAC1 again with prk as the key and []byte{0x1} as the data:
func KDF1(t0 *[blake2s.Size]byte, key, input []byte) {
    HMAC1(t0, key, input)
    HMAC1(t0, t0[:], []byte{0x1})
}
Compared to KDF1, here is the KDF2 code:
func KDF2(t0, t1 *[blake2s.Size]byte, key, input []byte) {
    var prk [blake2s.Size]byte
    HMAC1(&prk, key, input)
    HMAC1(t0, prk[:], []byte{0x1})
    HMAC2(t1, prk[:], t0[:], []byte{0x2})
    setZero(prk[:])
}
As shown, t1 is the message authentication code using t0 and []byte{0x2} as data, with prk[:] as the key.
Similarly, in KDF3, t2 is the message authentication code using t1 and []byte{0x3} as data, with prk[:] as the key.
func (peer *Peer) BeginSymmetricSession() error {
    // ...
    if handshake.state == handshakeResponseConsumed {
        // Initiator derives symmetric encryption keys for sending and receiving from chainKey
        KDF2(
            &sendKey,
            &recvKey,
            handshake.chainKey[:],
            nil,
        )
        isInitiator = true
    } else if handshake.state == handshakeResponseCreated {
        // Responder derives symmetric encryption keys for sending and receiving from chainKey
        KDF2(
            &recvKey,
            &sendKey,
            handshake.chainKey[:],
            nil,
        )
        isInitiator = false
    } else {
        return fmt.Errorf("invalid state for keypair derivation: %v", handshake.state)
    }
    // ...
    keypair := new(Keypair)
    // Send and receive use different AEAD keys
    keypair.send, _ = chacha20poly1305.New(sendKey[:])
    keypair.receive, _ = chacha20poly1305.New(recvKey[:])
    // ...
}
The code above shows how independent AEAD keys for sending and receiving are derived from the same chainKey. In the logic above, the initiator’s sendKey and the responder’s recvKey are the same, and the initiator’s recvKey and the responder’s sendKey are the same.
What is chainKey?
type Handshake struct {
    // ...
    chainKey [blake2s.Size]byte // chain key
    // ...
}
As shown, it is a field of the Handshake type.
Initiator’s chainKey initialization logic:
func init() {
    InitialChainKey = blake2s.Sum256([]byte(NoiseConstruction))
    mixHash(&InitialHash, &InitialChainKey, []byte(WGIdentifier))
}

func mixKey(dst, c *[blake2s.Size]byte, data []byte) {
    KDF1(dst, c[:], data)
}

func (h *Handshake) mixKey(data []byte) {
    mixKey(&h.chainKey, &h.chainKey, data)
}
func (device *Device) CreateMessageInitiation(peer *Peer) (*MessageInitiation, error) {
    // ...
    handshake.chainKey = InitialChainKey
    // ...
    // The initiator's ephemeral public key is transmitted in plaintext to the responder
    msg := MessageInitiation{
        Type:      MessageInitiationType,
        Ephemeral: handshake.localEphemeral.publicKey(),
    }
    handshake.mixKey(msg.Ephemeral[:])
    // ...
    // handshake.localEphemeral is the local ephemeral private key
    // handshake.remoteStatic is the responder's static public key
    // First of the triple DH, based on the initiator's ephemeral private key and responder's static public key
    // To obtain the same ss, the responder needs the static private key
    ss, err := handshake.localEphemeral.sharedSecret(handshake.remoteStatic)
    // Compute a new chainKey and key using ss as data
    KDF2(
        &handshake.chainKey,
        &key,
        handshake.chainKey[:],
        ss[:],
    )
    // ...
    // handshake.precomputedStaticStatic[:] is computed from the initiator's static private key and responder's static public key via DH,
    // triggered during `wg set wg0 private-key` and `wg set wg0 peer`,
    // see `SetPrivateKey` and `NewPeer` functions,
    // this involves DH computation
    KDF2(
        &handshake.chainKey,
        &key,
        handshake.chainKey[:],
        handshake.precomputedStaticStatic[:],
    )
    // ...
}
func (device *Device) ConsumeMessageResponse(msg *MessageResponse) *Peer {
    // ...
    // msg.Ephemeral[:] is the responder's ephemeral public key
    mixHash(&hash, &handshake.hash, msg.Ephemeral[:])
    mixKey(&chainKey, &handshake.chainKey, msg.Ephemeral[:])
    // Second of the triple DH, based on the initiator's ephemeral private key and responder's ephemeral public key
    ss, err := handshake.localEphemeral.sharedSecret(msg.Ephemeral)
    mixKey(&chainKey, &chainKey, ss[:])
    // Third of the triple DH, based on the initiator's static private key and responder's ephemeral public key
    // This is the last change to chainKey
    ss, err = device.staticIdentity.privateKey.sharedSecret(msg.Ephemeral)
    mixKey(&chainKey, &chainKey, ss[:])
    // ...
}
Responder’s chainKey initialization logic:
func init() {
    InitialChainKey = blake2s.Sum256([]byte(NoiseConstruction))
    mixHash(&InitialHash, &InitialChainKey, []byte(WGIdentifier))
}

func mixKey(dst, c *[blake2s.Size]byte, data []byte) {
    KDF1(dst, c[:], data)
}

func (h *Handshake) mixKey(data []byte) {
    mixKey(&h.chainKey, &h.chainKey, data)
}
func (device *Device) ConsumeMessageInitiation(msg *MessageInitiation) *Peer {
    // ...
    mixHash(&hash, &InitialHash, device.staticIdentity.publicKey[:])
    mixHash(&hash, &hash, msg.Ephemeral[:])
    // msg.Ephemeral[:] is the initiator's ephemeral public key
    mixKey(&chainKey, &InitialChainKey, msg.Ephemeral[:])
    // ...
    // Triple DH, first round
    ss, err := device.staticIdentity.privateKey.sharedSecret(msg.Ephemeral)
    if err != nil {
        return nil
    }
    KDF2(&chainKey, &key, chainKey[:], ss[:])
    // ...
    KDF2(
        &chainKey,
        &key,
        chainKey[:],
        handshake.precomputedStaticStatic[:],
    )
    // ...
}
func (device *Device) CreateMessageResponse(peer *Peer) (*MessageResponse, error) {
    // ...
    handshake.localEphemeral, err = newPrivateKey()
    msg.Ephemeral = handshake.localEphemeral.publicKey()
    handshake.mixHash(msg.Ephemeral[:])
    handshake.mixKey(msg.Ephemeral[:])
    // Triple DH, second round
    ss, err := handshake.localEphemeral.sharedSecret(handshake.remoteEphemeral)
    handshake.mixKey(ss[:])
    // Triple DH, third round
    ss, err = handshake.localEphemeral.sharedSecret(handshake.remoteStatic)
    handshake.mixKey(ss[:])
    // ...
}
Why is chainKey the same on both sides?
During key derivation based on the triple DH operations, due to the properties of DH, both sides are computing HMAC authentication codes based on the same input data. Therefore, the final chainKey is the same.
How Is Data Sent?
The outbound data packet logic mainly consists of:
- Serial data consumption
- Parallel data encryption
- Actual sending based on the specific Peer (with the concrete destination address)
Scenarios for actively sending outbound data packets:
- Initial connection establishment: reading packet content from the TUN device and sending the request to the remote Peer
- When keepalive is enabled: actively sending keepalive packets at specified intervals
The key functions involved:
func (peer *Peer) StagePackets(elems *QueueOutboundElementsContainer) {
    // ...
}
This function ensures the freshness of packets in the staging queue. Packets should not be transmitted until a secure encrypted communication channel is established through a successful handshake. By default, 128 packets are cached. If the staging queue is full, old packets are discarded to make room for new ones.
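The drop-oldest staging behavior can be sketched with a small buffered channel, the same mechanism wireguard-go's staging queue is built on. Capacity is shrunk from the default 128 to 3 so the eviction is visible; stagedQueue and stage are illustrative names.

```go
package main

import "fmt"

type stagedQueue struct {
	c chan int // stands in for the channel of packet containers
}

// stage enqueues a packet; if the queue is full, the oldest staged packet is
// discarded to make room for the newest, keeping staged traffic fresh.
func (q *stagedQueue) stage(pkt int) {
	for {
		select {
		case q.c <- pkt:
			return
		default:
			// Queue full: drop the oldest packet, then retry the send.
			select {
			case <-q.c:
			default:
			}
		}
	}
}

func main() {
	q := &stagedQueue{c: make(chan int, 3)}
	for pkt := 1; pkt <= 5; pkt++ {
		q.stage(pkt)
	}
	// Packets 1 and 2 were evicted; the freshest three (3, 4, 5) remain.
	for len(q.c) > 0 {
		fmt.Println(<-q.c)
	}
}
```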
func (peer *Peer) SendStagedPackets() {
    // ...
}
Outbound data packets are encrypted in parallel and sent serially through this function.
func (device *Device) RoutineReadFromTUN() {
    // ...
}
Packets from the Linux network stack are sent to the TUN device. This function monitors the TUN device, captures system packets (then stages, sorts, encrypts, and finally sends them to the remote Peer).
How Are Handshake Packets Created?
The data needed for the handshake is managed as a Handshake struct in the code.
The initiator creating a handshake packet means initializing a Handshake instance and, based on the Handshake instance’s information, initializing the handshake packet struct MessageInitiation.
This logic is in func (device *Device) CreateMessageInitiation(peer *Peer) (*MessageInitiation, error).
Why are the initiator’s static public key and timestamp in the initial handshake packet encrypted with keys derived from different DH shared secrets?
- How is the initiator’s static public key encrypted? With a key derived from the Curve25519 shared secret of the initiator’s ephemeral private key and the responder’s static public key.
- How is the initiator’s timestamp encrypted? With a key derived from the Curve25519 shared secret of the initiator’s static private key and the responder’s static public key.
- What encryption algorithm is used? ChaCha20-Poly1305.
In point 1, the initiator’s ephemeral public key is transmitted in plaintext to the responder. The responder can only decrypt successfully with the correct responder static private key. Therefore, from the initiator’s perspective, this authenticates the responder’s identity (only the correct responder can decrypt and obtain the initiator’s static public key).
In point 2, the encryption is based on the initiator’s static private key. From the responder’s perspective, this authenticates the initiator’s identity, and from the initiator’s perspective, it authenticates the responder’s identity. Point 2 effectively requires that the timestamp comes from the correct initiator and is only visible to the correct responder. In the responder’s logic, the timestamp is used to determine whether it is a replay attack (comparing successive timestamps from the initiator) and whether it is a handshake flood attack (packet frequency exceeding 50 per second).
When are ephemeral key pairs updated?
Ephemeral key pairs have a data packet lifetime limit (3 minutes) and a maximum packet count limit.
Does WireGuard depend on time accuracy?
No.
Replay attack detection is based on the initiator’s timestamps (the interval between two consecutive request timestamps, both from the initiator).
Handshake flood attack detection is based on the responder’s timestamps (the interval between two consecutive handshake packet consumption timestamps, both from the responder locally).
How is msg.mac1 computed in the handshake packet?
msg.mac1 is a hash of the initiator’s handshake packet binary data computed via blake2s (with an encryption parameter derived from the responder’s public key).
Why can DoS attacks against the initiator occur?
The curve25519 shared secret computation is a CPU-intensive operation. The initiator performs 2 curve25519 shared secret computations when creating an initial handshake packet. If the initiator does not verify the legitimacy of MessageCookieReplyType and creates a new initial handshake packet upon receipt, a large number of curve25519 computations will occur locally, exhausting CPU resources and making the DoS attack effective.
How to prevent DoS attacks against the initiator?
This logic is in the initiator’s consumption of the response handshake packet:
func (st *CookieGenerator) ConsumeReply(msg *MessageCookieReply) bool {
    // ...
    xchapoly, _ := chacha20poly1305.NewX(st.mac2.encryptionKey[:])
    // Ensure st.mac2.lastMAC1[:] has not been tampered with via AEAD
    // msg.mac1 is the hash of the initial handshake packet, ensuring a one-to-one correspondence between response and initial handshake packets
    _, err := xchapoly.Open(cookie[:0], msg.Nonce[:], msg.Cookie[:], st.mac2.lastMAC1[:])
    if err != nil {
        return false
    }
    st.mac2.cookieSet = time.Now()
    st.mac2.cookie = cookie
    return true
}
When the responder sends a MessageCookieReplyType handshake packet, the initiator verifies it as follows. It derives a subkey from the 32-byte AEAD key and the first 16 bytes of the 24-byte nonce, uses that to produce the Poly1305 one-time key, computes the 16-byte tag over the ciphertext and msg.mac1, and compares it with the final 16 bytes of the ciphertext ([n-16, n]). If they match, the response packet has not been forged. Because computing the tag requires the 32-byte AEAD key (which is derived from the responder’s public key), an attacker attempting to forge a MessageCookieReplyType response handshake packet cannot produce a valid tag without it. This ensures:
- Only an initiator who knows the responder’s public key can initiate a handshake request (achieving responder stealth)
- Only MessageCookieReplyType response handshake packets created by the responder are accepted by the initiator (preventing DoS attacks against the initiator)
How Is Connection Kept Alive?
If the WireGuard initiator has keepalive enabled, it actively sends an empty data packet to the responder upon first startup (when the keepalive value changes from 0 to non-zero). Assuming the initiator is behind NAT, the NAT device establishes a record mapping the responder’s IP and port to the initiator’s IP and port.
Since the internal timer sends empty data packets at the interval specified by keepalive, even if the responder experiences a network failure, the NAT device’s record will remain valid through regular refreshes.
This means that even if a network outage causes packet loss for 1 hour due to MTU or other potential issues, the WireGuard connection from the public network to the private network will automatically recover once the network is restored.
If it is the initial connection (or the link is down), because the ephemeral key was just created (or expires after 180s), the handshake process is triggered. The handshake retries every 5s up to a maximum of 18 attempts (then gives up retrying and no more initial handshake packets are sent). Then the next keepalive packet is created, triggering the handshake process again. This is what happens in practice.
If the initiator is behind NAT and the responder is a public server (with a fixed public IP), and both sides have keepalive enabled, what packets are generated?
- Both sides begin sending keepalive packets when the Peer public key is set.
- The act of sending keepalive on both sides triggers the handshake process.
- Both sides create initial handshake packets and attempt to send them.
Scenario 1: Suppose the responder attempts to send first. Due to the lack of the initiator’s endpoint address, an error occurs leading to handshake retries. After 18 retries, another keepalive packet is attempted, triggering the handshake process again, repeating the 18-retry cycle indefinitely.
func (peer *Peer) SendBuffers(buffers [][]byte) error {
    // ...
    peer.endpoint.Lock()
    endpoint := peer.endpoint.val
    if endpoint == nil {
        peer.endpoint.Unlock()
        return errors.New("no known endpoint for peer")
    }
    // ...
}
Scenario 2: The initiator starts the handshake process first. Since the responder is a public server, the process completes smoothly.
Then the initiator starts sending keepalive packets to the responder. Thanks to the earlier handshake, the responder already knows the initiator’s NAT public IP and port, so both sides can initiate communication.
Both sides send keepalive packets to each other at specified intervals.
If the initiator’s (behind NAT) public IP changes (a common occurrence with residential broadband), the initiator will temporarily be unable to receive the responder’s keepalive packets, and the responder itself cannot detect this. This situation recovers within a controllable timeframe because:
- The ephemeral key pair expires, triggering the handshake process (at most 165s). The initiator then re-initiates the handshake.
- Since the initiator also has keepalive enabled, when the IP changes, the keepalive packet carrying the correct source IP updates the responder’s endpoint address information. Thus, bidirectional communication is quickly restored.
Two additional points:
- Sent keepalive packets do not receive any response, even if lost.
- Thanks to the keepalive mechanism, WireGuard also has NAT traversal capability.
Is Packet Order Guaranteed?
WireGuard itself does not guarantee in-order packet delivery; this must be handled by upper-layer protocols.
How Is Nonce Uniqueness Ensured?
The nonce is a counter value that atomically increments each time a data packet is sent. The nonce value is used to ensure data freshness (preventing duplicate and stale packets).
See the code below:
func (peer *Peer) SendStagedPackets() {
top:
    // ...
    for {
        var elemsContainerOOO *QueueOutboundElementsContainer
        select {
        case elemsContainer := <-peer.queue.staged:
            i := 0
            for _, elem := range elemsContainer.elems {
                elem.peer = peer
                elem.nonce = keypair.sendNonce.Add(1) - 1
                if elem.nonce >= RejectAfterMessages {
                    // ...
                } else {
                    elemsContainer.elems[i] = elem
                    i++
                }
                elem.keypair = keypair
            }
            elemsContainer.Lock()
            elemsContainer.elems = elemsContainer.elems[:i]
            // ...
            // add to parallel and sequential queue
            if peer.isRunning.Load() {
                peer.queue.outbound.c <- elemsContainer
                peer.device.queue.encryption.c <- elemsContainer
            } else {
                // ...
            }
            // ...
        default:
            return
        }
    }
}
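The Add(1) - 1 idiom above can be isolated into a small sketch: the atomic increment reserves the current counter value for exactly one packet, so concurrent senders never reuse a nonce under the same keypair. rejectAfterMessages is shrunk to 5 (far below WireGuard's real limit) so exhaustion is visible; keypair and nextNonce are illustrative names.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

const rejectAfterMessages = 5 // illustrative cap, not WireGuard's real value

type keypair struct {
	sendNonce atomic.Uint64
}

// nextNonce atomically reserves a unique nonce, or reports that the keypair
// is exhausted and must be rotated before further packets are sent.
func (kp *keypair) nextNonce() (uint64, bool) {
	n := kp.sendNonce.Add(1) - 1 // reserve the pre-increment value
	if n >= rejectAfterMessages {
		return 0, false
	}
	return n, true
}

func main() {
	kp := &keypair{}
	for i := 0; i < 7; i++ {
		n, ok := kp.nextNonce()
		fmt.Println(n, ok)
	}
	// Nonces 0..4 are each handed out exactly once; afterwards the keypair is
	// refused and a re-handshake would be triggered.
}
```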
Keypair Rotation Process
When are keypairs created?
- When the initiator receives the handshake response packet and begins key derivation.
- When the responder receives the handshake packet and sends the response.
When are keypairs updated?
In other words, when does re-handshaking occur?
Case 1: After the initiator receives any data packet, it checks whether the packet’s keypair is about to expire:
time.Since(keypair.created) > (RejectAfterTime-KeepaliveTimeout-RekeyTimeout)
If it is about to expire (180 - 10 - 5 = 165s), the initiator initiates a new handshake request.
Case 2: After the initiator sends any data packet, it checks whether the packet counter has exceeded the maximum allowed value. If so, re-handshake.
Case 3: After the initiator sends any data packet, it checks whether the packet’s keypair has passed RekeyAfterTime (120s):
(keypair.isInitiator && time.Since(keypair.created) > RekeyAfterTime)
If so, re-handshake.
Case 4: Retry after handshake failure. After the initiator sends an initial handshake packet, it waits for RekeyTimeout (5s) via a timer. If a handshake response is received, the timer stops; otherwise, the handshake packet is resent, up to 18 retries.
Case 5: No response after sending a data packet. If nothing comes back within KeepaliveTimeout + RekeyTimeout (10s + 5s) after a data packet is sent, re-handshaking begins. By default, when the responder has no data to return, it sends an empty data packet (keepalive) 10s after receiving a data packet. The sender therefore expects to receive a data packet or keepalive within 15s.
Case 6: Before sending any data packet, if the current keypair has exceeded the maximum allowed message count, re-handshake.
Case 7: Before sending any data packet, if the current keypair has exceeded the allowed usage time RejectAfterTime (180s), re-handshake. If the receiver’s local keypair is not updated within RejectAfterTime (180s), subsequent data packets are discarded.
How is the update performed?
Keypair rotation states for the initiator. A few points to know before proceeding:
- Keypair rotation rotates the ephemeral symmetric encryption keys – replacing old ephemeral keys with new ones.
- Keypair refers to the two ephemeral symmetric encryption keys used for sending and receiving, not any asymmetric encryption keys.
- The purpose of keypair rotation is to achieve forward secrecy.
- There may be situations where both old and new keypairs are valid simultaneously. Because before the old keypair expires, WireGuard begins the handshake process 15s early to avoid packet loss due to key expiration.
Keypair rotation is tracked through the Peer struct’s keypairs field (the keypair collection):
type Peer struct {
    // ...
    keypairs Keypairs
    // ...
}

type Keypairs struct {
    sync.RWMutex
    current  *Keypair
    previous *Keypair
    next     atomic.Pointer[Keypair]
}
The initiator creates an initial handshake packet, at which point a new index table index ID and Keypair instance are created. The index table is updated (replacing the old index ID with the new one). During symmetric key derivation from asymmetric encryption, the index table is updated (replacing the old Keypair with the new one), and the Keypair instance pointer is assigned to the initiator’s Peer.keypairs current field.
The responder receives the handshake packet and creates a handshake response. Similar to the initiator, a new index table index ID and Keypair instance are created. The index table is updated (replacing the old index ID with the new one). During symmetric key derivation from asymmetric encryption, the index table is updated (replacing the old Keypair with the new one), and the Keypair instance is stored in the responder’s Peer.keypairs next field, with previous set to nil.
The initiator consumes the handshake response packet. Handshake succeeds, and an empty data packet (keepalive) is returned.
Here is the first question: are the initiator’s data packet sending and handshake packet sending serial or parallel operations?
Scenario 1: The initiator receives a data packet to send to the responder, but since the handshake is not complete and the local ephemeral symmetric encryption key is missing, it enters the handshake process. Therefore, serial.
Scenario 2: A previous handshake was successful and data transfer has begun. The initiator, while receiving a data packet, discovers that the ephemeral symmetric encryption key is about to expire (has existed for more than 165s), so it enters the handshake process at the end of the receive logic. Therefore, serial.
Scenario 3: The initiator sends a data packet to the responder and discovers that the ephemeral symmetric encryption key has expired (has existed for more than 180s). After sending the data packet, the handshake process begins. Therefore, serial.
As shown, in all scenarios, the initiator’s data packet sending and handshake packet sending are serial. Returning to the previously discussed topic (keypair rotation mechanism):
When the responder receives a keepalive packet, the responder's Peer.keypairs next contains the latest Keypair. That Keypair pointer is promoted to the Peer.keypairs current field, and Peer.keypairs next is set to nil.
When the ephemeral symmetric key is about to expire or has expired, the initiator begins a new handshake process.
Several scenarios:
Scenario 1: The initiator is still the same as before.
At this point, Peer.keypairs current holds the Keypair from the previous handshake. The value in current is saved to previous, and the pointer to the newly created Keypair is stored in current.
The responder receives the initial handshake packet, creates a new Keypair, and updates the Peer.keypairs next value with the new Keypair pointer.
When the responder receives the initiator's keepalive (the first packet encrypted with the new Keypair), the Peer.keypairs current value is assigned to previous, the next value is assigned to current, and next is set to nil.
Scenario 2: The initiator is the previous responder (B), and the responder becomes the initiator (A).
This transition occurs when:
- The responder sends a large number of data packets exceeding the current Keypair’s maximum packet count, triggering a handshake.
- The responder sends a data packet but, within 15s, receives neither a keepalive (a peer that has nothing of its own to send replies with a keepalive after 10s) nor any other valid data packet from the initiator, triggering the handshake process.
B initiates the handshake process. A receives the initial handshake packet, creates a new Keypair, updates the Peer.keypairs next value with the new Keypair pointer, and sets previous to nil.
B receives the handshake response packet, creates a new Keypair, and updates the Peer.keypairs current value with the new Keypair pointer.
B sends a keepalive packet. A receives this first data packet, promotes the Keypair pointer from Peer.keypairs next to current, and sets next to nil.
Special case discussion: when is the initiator’s next not nil?
When the responder receives an initial handshake packet but has not yet received the initiator’s keepalive packet, and the responder also initiates a handshake request, Peer.keypairs next is not nil.
In this case, the responder-initiated handshake process takes priority.
The initiator, after sending the initial handshake packet and receiving the response, then receives an initial handshake packet from the responder.
In this case, the initiator also prioritizes the new handshake process.
Timers and State
Timers are used for managing internal protocol state. WireGuard is a protocol with internal state (but strives for external statelessness).
Timers are initialized in the following function:
func (peer *Peer) timersInit() {
	peer.timers.retransmitHandshake = peer.NewTimer(expiredRetransmitHandshake)
	peer.timers.sendKeepalive = peer.NewTimer(expiredSendKeepalive)
	peer.timers.newHandshake = peer.NewTimer(expiredNewHandshake)
	peer.timers.zeroKeyMaterial = peer.NewTimer(expiredZeroKeyMaterial)
	peer.timers.persistentKeepalive = peer.NewTimer(expiredPersistentKeepalive)
}
What is the retransmitHandshake timer for?
After a handshake initiation packet is sent, it counts down RekeyTimeout (5s). If the response to that handshake packet arrives in time, the timer is stopped; if it fires, the handshake initiation is retransmitted.
What is the sendKeepalive timer for?
If we are only receiving data but not sending any response for an extended period, intermediate network devices or proxies may consider the connection timed out (read timeout or write timeout). By sending a keepalive packet KeepaliveTimeout seconds (10s) after receiving a data packet, the link state of intermediate network devices is maintained. This is especially important when the receiver is behind a NAT device, because for devices behind NAT, only by actively sending packets will the intermediate NAT device maintain the mapping between the internal IP/port and the public IP/port.
What is the newHandshake timer for?
This is the sender-side logic complementing the above scenario. The sender continuously sends data. If no response data packet or keepalive is received within 15s after sending a data packet (the receiver, even if only receiving without sending, will send a keepalive after 10s; therefore, under normal link conditions, the sender will definitely receive a packet within 15s), the sender re-initiates the handshake.
What is the zeroKeyMaterial timer for?
This timer starts at the end of each key derivation. Key derivation creates a specific Peer's Keypair and records the relationship between Peer, Keypair, Handshake, and packet ID in the index table. This timer's purpose is to clean up all derived ephemeral key material on the local device after RejectAfterTime * 3 (180s * 3 = 540s = 9 min). Since under active traffic a fresh handshake derives new ephemeral keys every couple of minutes (RekeyAfterTime = 120s), this timer only fires during prolonged network disconnections.
For scenarios with keepalive enabled, since the local device periodically sends keepalive packets, and the outgoing keepalive triggers the handshake process (which triggers key derivation), this timer will never fire.
What is the persistentKeepalive timer for?
When keepalive is enabled, WireGuard actively sends an empty data packet when the Peer public key is set.
This timer’s purpose is to periodically send empty data packets (keepalive) at the interval specified when keepalive was enabled, when no data packets have been sent, to maintain the connection state of the link (e.g., NAT devices).
What locks are in the timer?
// A Timer manages time-based aspects of the WireGuard protocol.
// Timer roughly copies the interface of the Linux kernel's struct timer_list.
type Timer struct {
	*time.Timer
	modifyingLock sync.RWMutex
	runningLock   sync.Mutex
	isPending     bool
}
There are two locks: a modifying lock (read-write lock) and a running lock (mutex).
The modifying lock is a read-write lock that allows concurrent reads of the isPending variable to determine whether the timer is active (started but not yet fired). When conditions are met, the lock enables atomic modification of the isPending value and the timer’s state.
func (timer *Timer) IsPending() bool {
	// Allow concurrent reads
	timer.modifyingLock.RLock()
	defer timer.modifyingLock.RUnlock()
	return timer.isPending
}
func (peer *Peer) NewTimer(expirationFunction func(*Peer)) *Timer {
	timer := &Timer{}
	timer.Timer = time.AfterFunc(time.Hour, func() {
		timer.runningLock.Lock()
		defer timer.runningLock.Unlock()
		// Acquire write lock
		timer.modifyingLock.Lock()
		// When the timer fires, the state variable needs to be reset from true to false
		// If already false, do not modify, just release the lock. See the running lock section below (synchronous deletion)
		if !timer.isPending {
			timer.modifyingLock.Unlock()
			return
		}
		// Otherwise, modify the variable and release the lock
		timer.isPending = false
		timer.modifyingLock.Unlock()
		expirationFunction(peer)
	})
	timer.Stop()
	return timer
}
func (timer *Timer) Mod(d time.Duration) {
	// When conditions are met, call the timer's Mod method to change the state variable from false to true
	// Then reset the timer to start a new countdown
	timer.modifyingLock.Lock()
	timer.isPending = true
	timer.Reset(d)
	timer.modifyingLock.Unlock()
}

func (timer *Timer) Del() {
	// When conditions are met, call the timer's Del method to change the state variable from true to false
	// Then stop the timer (timer was started but not yet fired)
	timer.modifyingLock.Lock()
	timer.isPending = false
	timer.Stop()
	timer.modifyingLock.Unlock()
}
The running lock is a mutex that is acquired in the following scenarios:
- When the timer fires (countdown ends), the lock is acquired when the expiration function executes, and released after the logic completes.
- When a Peer is deleted or replaced, or when the TUN device is shut down (which deletes all Peers before closing), the old Peer’s timers are synchronously deleted (the running lock is used to implement synchronous deletion).
This is the code for point 1:
func (peer *Peer) NewTimer(expirationFunction func(*Peer)) *Timer {
	timer := &Timer{}
	timer.Timer = time.AfterFunc(time.Hour, func() {
		// Acquire lock when fired
		timer.runningLock.Lock()
		// Release lock on exit
		defer timer.runningLock.Unlock()
		timer.modifyingLock.Lock()
		if !timer.isPending {
			timer.modifyingLock.Unlock()
			return
		}
		timer.isPending = false
		timer.modifyingLock.Unlock()
		expirationFunction(peer)
	})
	timer.Stop()
	return timer
}
This is the code for point 2:
func (timer *Timer) Del() {
	timer.modifyingLock.Lock()
	timer.isPending = false
	timer.Stop()
	timer.modifyingLock.Unlock()
}

func (timer *Timer) DelSync() {
	// Do not wait for the running lock; immediately attempt atomic modification to stop the timer.
	// A started but not-yet-fired timer is stopped right away.
	timer.Del()
	// Synchronous deletion guarantees:
	// 1) If the timer fired before being stopped, the expiration logic runs to completion before DelSync returns.
	// 2) Once the runningLock has been acquired (during this DelSync call), any subsequent firing will skip the expiration logic.
	//
	// The logic here: Del sets isPending to false. When the runningLock is released and the expiration callback runs, it checks isPending before calling the expiration function; if false, it skips. See the next code block.
	timer.runningLock.Lock()
	timer.Del()
	timer.runningLock.Unlock()
}
func (peer *Peer) NewTimer(expirationFunction func(*Peer)) *Timer {
	timer := &Timer{}
	timer.Timer = time.AfterFunc(time.Hour, func() {
		timer.runningLock.Lock()
		defer timer.runningLock.Unlock()
		timer.modifyingLock.Lock()
		// This ensures the subsequent expirationFunction will not be triggered: the Del method set isPending to false, so the lock is released and the callback returns immediately.
		if !timer.isPending {
			timer.modifyingLock.Unlock()
			return
		}
		timer.isPending = false
		timer.modifyingLock.Unlock()
		expirationFunction(peer)
	})
	timer.Stop()
	return timer
}
Endpoint Update Scenarios
When does the initiator update the responder’s endpoint address?
Case 1: Via configuration file, specifying the responder’s endpoint address:
[Interface]
PrivateKey = WMWozvjIgpA2h75juoku2btWxbJ54i4Yt0A0RhpW7V8=
ListenPort = 51820
Address = 10.70.0.4/16
MTU = 1280
[Peer]
PublicKey = CmmeC0yqofMMZhEGHuK5dd2Mxyxe7tA8wSniDWiI5V0=
PresharedKey = KKQGN01IkZ1kJD2fAxtDZ6k5VFAI2fMca2q+SV7OrGE=
# Specifying the responder's public address and port
Endpoint = 39.101.166.124:51820
AllowedIPs = 10.70.0.0/16
PersistentKeepalive = 15
Case 2: Via command line:
wg set demo peer CmmeC0yqofMMZhEGHuK5dd2Mxyxe7tA8wSniDWiI5V0= endpoint 39.101.166.124:51820
Case 3: After successfully receiving a handshake response packet, the packet's source address (IP and port) is set as the endpoint address (the destination address for outbound data packets).
Case 4: After successfully receiving a valid data packet, the packet's source address (IP and port) is set as the endpoint address (the destination address for outbound data packets).
When does the responder update the initiator’s endpoint address?
- Upon successfully receiving the initial handshake packet. The source IP is set as the endpoint address.
- After the handshake completes, upon receiving the first data packet. The source IP is set as the endpoint address.
- After processing the last valid data packet in a batch of response packets received via a system call. The source IP is set as the endpoint address.
Special Scenarios
Both sides have successfully completed a handshake (less than 180s since the last successful handshake). Now the initiator re-creates the WireGuard device (public and private keys unchanged, device restarted). The initiator’s ephemeral symmetric encryption keys are cleared. If the responder sends a data packet to the initiator at this point, what happens?
The responder encrypts the data with the old ephemeral symmetric encryption keys and sends it. After each data packet is sent, the newHandshake timer is armed:
/* Should be called after an authenticated data packet is sent. */
func (peer *Peer) timersDataSent() {
	if peer.timersActive() && !peer.timers.newHandshake.IsPending() {
		peer.timers.newHandshake.Mod(KeepaliveTimeout + RekeyTimeout + time.Millisecond*time.Duration(fastrandn(RekeyTimeoutJitterMaxMs)))
	}
}

// Function executed when the timer expires
func expiredNewHandshake(peer *Peer) {
	peer.device.log.Verbosef("%s - Retrying handshake because we stopped hearing back after %d seconds", peer, int((KeepaliveTimeout + RekeyTimeout).Seconds()))
	/* We clear the endpoint address src address, in case this is the cause of trouble. */
	peer.markEndpointSrcForClearing()
	// Begin a new handshake process here
	peer.SendHandshakeInitiation(false)
}
If no response is received within 15s, the handshake process restarts.
PS: That data packet is lost; recovery relies on upper-layer protocols (e.g., TCP) to detect the loss and retransmit.
During the handshake process, why does the initiator send a keepalive packet to the responder after consuming the response handshake packet?
After the responder sends the response handshake packet, it expects a keepalive from the initiator (which can be considered a type of data packet – an empty data packet). This tells the responder that the handshake is complete. A completed handshake updates the peer.timers.retransmitHandshake timer and resets atomic variables related to handshake retries, preparing for the next handshake.
In the above scenario, data packet loss occurred. How does WireGuard avoid this kind of loss?
The loss described above is due to using invalidated ephemeral symmetric encryption keys. When WireGuard receives any data packet, it checks whether the ephemeral symmetric encryption key is about to expire (only 15s of validity remaining). If so, it proactively initiates the handshake process to negotiate new ephemeral symmetric encryption keys. By handshaking early, it avoids packet loss caused by expired keys.
UAPI Socket Management
The UAPI socket is used to implement the wg command’s management of the WireGuard process. The wg command connects to the UAPI socket and sends control events (WireGuard configuration management protocol). Control events use two newline characters \n\n as the terminator for each operation. For example: listen_port=51820\n\nendpoint=1.1.1.1\n\n represents two control events: configuring listen_port and configuring endpoint.
UAPI communicates via a Unix domain socket, which has a local socket file. WireGuard watches this socket file with inotify to detect when it is deleted.
The file descriptor corresponding to this Unix domain socket is used for communication in the configuration management protocol.
References
-
A replay attack is a network attack where an attacker intercepts legitimate data transmissions and re-sends (replays) them to deceive the receiving system. The goal is typically to illegally gain access or cause improper system behavior. The key aspect of replay attacks is that the attacker does not need to decrypt or understand the actual content of the data – simply re-sending existing packets may accomplish the attack. Defense measures typically include using timestamps, sequence numbers, or one-time tokens to ensure data freshness, enabling identification and rejection of replayed packets. ↩︎
-
WireGuard automatically updates the response packet destination address to the source address of the last valid network packet. See: Endpoint Update Scenarios ↩︎
-
About forward secrecy. Suppose an attacker records all our encrypted traffic using specialized equipment and, at some future point, obtains our asymmetric keys (public and private keys). Because the ephemeral session keys are renegotiated every couple of minutes (via fresh ephemeral Diffie-Hellman exchanges, not recoverable from the static keys alone) and then discarded, the attacker cannot decrypt the previously recorded traffic. This is forward secrecy. ↩︎