Notes from debugging abnormal memory usage: a memory ballast that got backed by physical memory

Howard Cheung, filed under Development

2023-03-12

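I won't re-explain the concept of a memory ballast here; most readers who use Golang will already know it. If you don't, read the article that introduced the idea, which describes it in detail. Ballasts have been widely adopted over the past few years, and the common understanding is that a ballast is a simple, practical way to lower GC frequency. I had never come across a negative report about it, until this incident.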

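There is also a thread about this problem on the golang-nuts mailing list, but readers who are less familiar with Golang internals, or who find an English thread hard going, may struggle to follow it. This post gives a plain overview of how the problem comes about. Corrections are welcome.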

Background

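Recently we noticed that a small fraction of instances consistently had a higher memory footprint (RSS) than the rest. Repeated comparison against healthy instances turned up no obvious difference in their environments.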

We later found that on these instances the entire ballast was backed by physical memory, and had been ever since the ballast was created at startup.

Cause

OS (skip if familiar)

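As is well known, modern operating systems, Unix-like ones in particular, rely heavily on virtual memory. A user process requests and accesses memory within its virtual address space; only when the process actually touches a page does a page fault cause the OS to map in the corresponding physical page.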

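For example, when the Go runtime requests a contiguous 1 GB block, it receives a 1 GB range of virtual addresses, but until that range is accessed the OS does not back it with physical pages, so it does not consume 1 GB of physical memory. This is the theoretical basis of the ballast.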

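A ballast is usually implemented by allocating a large slice and passing it to runtime.KeepAlive (so that Go does not unhelpfully optimize it away), then keeping it alive without ever accessing it. The slice therefore takes no physical memory, yet it counts toward the heap size, which lowers how often GC is triggered.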

In practice, however, the ballast did end up occupying physical memory. The most obvious explanation is that the Go runtime accessed the large slice while creating it.

Go runtime

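In Go's memory allocator, allocations larger than 32 KB count as large objects and are served from mheap. The corresponding chapter of Go 语言原本 mentions a "zeroing" step. If that zeroing happens while the ballast's memory is being allocated, every page of it gets written, and the apparent result is a ballast that consumes physical memory. Go 语言原本 does not explain how the runtime decides whether zeroing is needed.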

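About zeroing: in the mailing-list thread mentioned at the start, Michael Knyszek, a developer on the Go team and the proposer of the Go 1.19 GC feature discussed below, gave this reply (translated):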

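The runtime has a simple heuristic to avoid zeroing, but it is far from perfect. As a result, a ballast is inherently always a little risky. This is especially true on some platforms, such as Windows, where there is no way to avoid the memory being marked as committed (Windows is free to use demand paging for memory in the range, so overall system memory pressure may increase, but you cannot avoid it being counted as committed for a particular process).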

The actual decision logic (GitHub link):


// allocNeedsZero checks if the region of address space [base, base+npage*pageSize),
// assumed to be allocated, needs to be zeroed, updating heap arena metadata for
// future allocations.
//
// This must be called each time pages are allocated from the heap, even if the page
// allocator can otherwise prove the memory it's allocating is already zero because
// they're fresh from the operating system. It updates heapArena metadata that is
// critical for future page allocations.
//
// There are no locking constraints on this method.
func (h *mheap) allocNeedsZero(base, npage uintptr) (needZero bool) {
	for npage > 0 {
		ai := arenaIndex(base)
		ha := h.arenas[ai.l1()][ai.l2()]

		zeroedBase := atomic.Loaduintptr(&ha.zeroedBase)
		arenaBase := base % heapArenaBytes
		if arenaBase < zeroedBase {
			// We extended into the non-zeroed part of the
			// arena, so this region needs to be zeroed before use.
			//
			// zeroedBase is monotonically increasing, so if we see this now then
			// we can be sure we need to zero this memory region.
			//
			// We still need to update zeroedBase for this arena, and
			// potentially more arenas.
			needZero = true
		}
		// We may observe arenaBase > zeroedBase if we're racing with one or more
		// allocations which are acquiring memory directly before us in the address
		// space. But, because we know no one else is acquiring *this* memory, it's
		// still safe to not zero.

		// Compute how far into the arena we extend into, capped
		// at heapArenaBytes.
		arenaLimit := arenaBase + npage*pageSize
		if arenaLimit > heapArenaBytes {
			arenaLimit = heapArenaBytes
		}
		// Increase ha.zeroedBase so it's >= arenaLimit.
		// We may be racing with other updates.
		for arenaLimit > zeroedBase {
			if atomic.Casuintptr(&ha.zeroedBase, zeroedBase, arenaLimit) {
				break
			}
			zeroedBase = atomic.Loaduintptr(&ha.zeroedBase)
			// Double check basic conditions of zeroedBase.
			if zeroedBase <= arenaLimit && zeroedBase > arenaBase {
				// The zeroedBase moved into the space we were trying to
				// claim. That's very bad, and indicates someone allocated
				// the same region we did.
				throw("potentially overlapping in-use allocations detected")
			}
		}

		// Move base forward and subtract from npage to move into
		// the next arena, or finish.
		base += arenaLimit - arenaBase
		npage -= (arenaLimit - arenaBase) / pageSize
	}
	return
}

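Note: the atomic operation Casuintptr(ptr, old, new) is a compare-and-swap: if *ptr == old, it stores new into *ptr and returns true; otherwise it does nothing and returns false.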

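The function walks every arena touched by this allocation (an arena is a large, fixed-size chunk of heap address space managed by the Go allocator; see Go 语言原本 for details). For each one it checks zeroedBase, decides whether zeroing is needed, and then raises zeroedBase to cover the new allocation. zeroedBase can be read as the boundary of the memory in the arena that has already been handed out at least once and may therefore need zeroing; the larger it is, the less known-zero memory remains. Note that if even one arena satisfies arenaBase < zeroedBase, the function returns true for the allocation as a whole.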

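It follows that when memory in an arena that was previously allocated and then freed is handed to the ballast, the allocation is judged to need zeroing, which produces the problem described at the start. Because a ballast is usually created early in startup, when very little has been allocated before it, this is a low-probability event, but it does happen.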
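To make the heuristic easier to follow, here is a toy model of it for a single arena. It is a simplified sketch, not the runtime's code: the arena type and its fields are hypothetical, there are no atomics, and offsets are relative to the start of the one arena.

```go
package main

import "fmt"

// arena is a toy stand-in for one heap arena (hypothetical type).
// Memory below zeroedBase has been handed out at least once and may be
// dirty; memory at or above it is still fresh from the OS.
type arena struct {
	size       uintptr
	zeroedBase uintptr
}

// allocNeedsZero reports whether the range [off, off+n) must be zeroed,
// then raises zeroedBase so that it covers the range (capped at the arena
// size), mirroring the shape of the runtime function for one arena.
func (a *arena) allocNeedsZero(off, n uintptr) bool {
	needZero := off < a.zeroedBase
	limit := off + n
	if limit > a.size {
		limit = a.size
	}
	if limit > a.zeroedBase {
		a.zeroedBase = limit
	}
	return needZero
}

func main() {
	a := &arena{size: 64 << 20}

	// Two early allocations on fresh memory: no zeroing needed.
	fmt.Println(a.allocNeedsZero(0, 8192))     // false
	fmt.Println(a.allocNeedsZero(8192, 1<<20)) // false

	// Suppose the first 8 KiB were freed and a large allocation (the
	// ballast) is placed starting at offset 0. It starts below zeroedBase,
	// so the whole allocation is reported as needing zeroing.
	fmt.Println(a.allocNeedsZero(0, 32<<20)) // true
}
```

The last call shows the failure mode: a small amount of reused memory at the front of the range is enough for the entire ballast to be zeroed, and therefore touched.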

Recommendations

For readers who still use a ballast, consider replacing it with one of the following to avoid this problem.

memory limit

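This is new in Go 1.19: a soft memory limit, a fixed number of bytes that the runtime tries to keep its total memory under by adjusting when GC runs. There are two ways to set it: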

The environment variable GOMEMLIMIT. Set it to a number of bytes, or to a number with a unit suffix such as 1MiB or 1GiB.
debug.SetMemoryLimit(limit int64), also in bytes.

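This feature is meant to cover the use case that ballasts serve. Once it is set, the runtime uses several mechanisms, including adjusting how often GC runs and how eagerly memory is returned to the operating system, to try to keep memory usage under the limit. What it measures against the limit is all memory managed by the Go runtime, equivalent to Sys - HeapReleased in runtime.MemStats. In theory its effect is similar to, and better than, a ballast.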

When using it to bound memory, you can turn off proportional GC (GOGC=off) or raise its percentage.

Like a ballast, though, it is not a hard limit, so do not set it to the maximum memory the environment allows; leave some headroom.

gc tuner

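For older Go versions there is another approach, proposed by Uber: dynamically adjust the percentage that triggers GC. Two open-source implementations exist: cch123/gogctuner and bytedance/gopkg/util/gctuner.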
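The core idea of dynamic tuning can be sketched in a few lines. This is my own simplified illustration, not the API or the exact logic of either library: the function gcPercentFor, the clamping bounds, and the 1 GiB threshold are all assumptions. It relies on the fact that GC is triggered roughly when the heap reaches live * (1 + GOGC/100).

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// Clamping bounds for the computed percentage (illustrative values).
const (
	minGCPercent = 50
	maxGCPercent = 500
)

// gcPercentFor (hypothetical helper) returns a GOGC value that makes the
// next collection start roughly when the heap reaches threshold bytes,
// given the current live heap size.
func gcPercentFor(threshold, live uint64) int {
	if live == 0 {
		return maxGCPercent
	}
	if threshold <= live {
		return minGCPercent
	}
	p := float64(threshold-live) / float64(live) * 100
	if p < minGCPercent {
		return minGCPercent
	}
	if p > maxGCPercent {
		return maxGCPercent
	}
	return int(p)
}

func main() {
	const threshold = 1 << 30 // desired heap size at GC time, illustrative

	// HeapAlloc is used here as an approximation of the live heap.
	var m runtime.MemStats
	runtime.ReadMemStats(&m)

	p := gcPercentFor(threshold, m.HeapAlloc)
	prev := debug.SetGCPercent(p)
	fmt.Printf("GOGC changed from %d to %d\n", prev, p)
}
```

A real tuner would recompute the percentage after every GC cycle rather than once at startup, since the live heap changes over time.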

Keep using a ballast

If you want to keep using a ballast, I think the following two points may help lower the probability of hitting this problem:

Create the ballast as early as possible.
Disable GC before creating the ballast.