Investigating a memory usage anomaly: the memory ballast was backed by physical memory

Howard Cheung, filed under Development

2023-03-12

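I won't rehash the concept of a memory ballast here; I expect most readers who use Golang already know it. If you really don't, read the article that introduced the concept, which describes it in detail. Over the past few years ballast has been widely adopted, and the common understanding is that it is a simple, practical way to reduce GC frequency. I had never seen a negative report about it either, until this incident.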

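There is also a discussion of this problem on the golang-nuts mailing list, but readers who are not very familiar with Go, or not comfortable with English, may find it hard to follow. So this post gives a relatively simple, accessible overview. Corrections are welcome.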

Background

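Recently I noticed that a small fraction of instances consistently used more memory (RSS) than the rest. Repeated investigation turned up no obvious difference between their environments and those of the normal instances.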

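It later turned out that on these instances the entire ballast was backed by physical memory, and that this had been the case since startup, from the moment the ballast was created.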

Cause

OS (skip if you are familiar with this)

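As is well known, modern operating systems, Unix-like ones in particular, rely heavily on virtual memory. A user process requests and accesses memory within its virtual address space; only when the process actually touches a page does a page fault cause the OS to bring in the corresponding physical page.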

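For example, when the Go runtime requests a contiguous 1 GB block, it gets a 1 GB range in the virtual address space. Until that range is accessed, the OS does not back it with physical pages, so it does not consume 1 GB of physical memory. This is the theoretical basis of ballast.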

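A ballast is typically implemented by allocating a large slice and keeping it alive (so that Go does not helpfully optimize it away), then holding on to it without ever accessing it. As a result it occupies no physical memory, yet it still counts toward the heap size, which lowers the frequency at which GC is triggered.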
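The pattern described above can be sketched as follows. The helper name `newBallast` and the 64 MiB size are assumptions made for illustration (real ballasts are often much larger); only `make` and `runtime.KeepAlive` are standard.

```go
package main

import (
	"fmt"
	"runtime"
)

// newBallast allocates a byte slice of the given size and never writes to it.
// The name is hypothetical; it is not part of any library.
func newBallast(size int) []byte {
	return make([]byte, size)
}

func main() {
	// 64 MiB keeps the example light; the size is an arbitrary choice.
	ballast := newBallast(64 << 20)

	// ... the rest of the program would run here ...
	fmt.Println("ballast length:", len(ballast))

	// KeepAlive marks the slice as reachable up to this point, so the
	// garbage collector cannot free it any earlier.
	runtime.KeepAlive(ballast)
}
```

The slice is only ever measured with `len`, never read or written, which is what lets the OS leave its pages unbacked in the normal case.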

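In practice, however, the ballast did end up occupying physical memory. The most obvious explanation is that the Go runtime touched the memory while creating the large ballast slice.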
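One way to check whether a process is affected is to compare its resident set size before and after the ballast is created. The sketch below reads `VmRSS` from `/proc/self/status`, so it assumes Linux; `parseVmRSS` is a hypothetical helper written for this example.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseVmRSS extracts the VmRSS value (in kB) from the text of a
// /proc/<pid>/status file. It returns false if the field is missing
// or cannot be parsed.
func parseVmRSS(status string) (uint64, bool) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Expected shape of the line: "VmRSS:   12345 kB"
		if len(fields) >= 2 && fields[0] == "VmRSS:" {
			kb, err := strconv.ParseUint(fields[1], 10, 64)
			if err != nil {
				return 0, false
			}
			return kb, true
		}
	}
	return 0, false
}

func main() {
	data, err := os.ReadFile("/proc/self/status") // Linux only
	if err != nil {
		fmt.Println("cannot read /proc/self/status:", err)
		return
	}
	if kb, ok := parseVmRSS(string(data)); ok {
		fmt.Printf("VmRSS: %d kB\n", kb)
	}
}
```

If RSS jumps by roughly the ballast size right after allocation, the ballast has been backed by physical pages.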

Go runtime

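In Go's memory allocator, objects larger than 32 KB count as large allocations and are served directly from mheap. The corresponding chapter of the book Go 語言原本 mentions a "zeroing" step. If that zeroing happens while the ballast's memory is being allocated, the apparent result is a ballast that eats physical memory. Go 語言原本 does not explain how the runtime decides whether zeroing is needed.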

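On zeroing, Michael Knyszek, a Go team developer who also proposed the go1.19 GC feature discussed below, replied in the mailing list thread mentioned at the start (rendered here from the translated text):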

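The runtime has a simple heuristic to avoid zeroing, but it is far from perfect. As a result, a ballast is inherently always a little risky. This is especially true on some platforms, such as Windows, where there is no way to avoid the memory being marked as committed (Windows is free to use demand paging for memory in the range, so overall system memory pressure may not increase, but you cannot avoid it being counted as committed for the given process).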

The exact decision logic (GitHub link):

// allocNeedsZero checks if the region of address space [base, base+npage*pageSize),
// assumed to be allocated, needs to be zeroed, updating heap arena metadata for
// future allocations.
//
// This must be called each time pages are allocated from the heap, even if the page
// allocator can otherwise prove the memory it's allocating is already zero because
// they're fresh from the operating system. It updates heapArena metadata that is
// critical for future page allocations.
//
// There are no locking constraints on this method.
func (h *mheap) allocNeedsZero(base, npage uintptr) (needZero bool) {
	for npage > 0 {
		ai := arenaIndex(base)
		ha := h.arenas[ai.l1()][ai.l2()]

		zeroedBase := atomic.Loaduintptr(&ha.zeroedBase)
		arenaBase := base % heapArenaBytes
		if arenaBase < zeroedBase {
			// We extended into the non-zeroed part of the
			// arena, so this region needs to be zeroed before use.
			//
			// zeroedBase is monotonically increasing, so if we see this now then
			// we can be sure we need to zero this memory region.
			//
			// We still need to update zeroedBase for this arena, and
			// potentially more arenas.
			needZero = true
		}
		// We may observe arenaBase > zeroedBase if we're racing with one or more
		// allocations which are acquiring memory directly before us in the address
		// space. But, because we know no one else is acquiring *this* memory, it's
		// still safe to not zero.

		// Compute how far into the arena we extend into, capped
		// at heapArenaBytes.
		arenaLimit := arenaBase + npage*pageSize
		if arenaLimit > heapArenaBytes {
			arenaLimit = heapArenaBytes
		}
		// Increase ha.zeroedBase so it's >= arenaLimit.
		// We may be racing with other updates.
		for arenaLimit > zeroedBase {
			if atomic.Casuintptr(&ha.zeroedBase, zeroedBase, arenaLimit) {
				break
			}
			zeroedBase = atomic.Loaduintptr(&ha.zeroedBase)
			// Double check basic conditions of zeroedBase.
			if zeroedBase <= arenaLimit && zeroedBase > arenaBase {
				// The zeroedBase moved into the space we were trying to
				// claim. That's very bad, and indicates someone allocated
				// the same region we did.
				throw("potentially overlapping in-use allocations detected")
			}
		}

		// Move base forward and subtract from npage to move into
		// the next arena, or finish.
		base += arenaLimit - arenaBase
		npage -= (arenaLimit - arenaBase) / pageSize
	}
	return
}

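Note: the atomic operation Casuintptr(ptr, old, new) works as follows: if *ptr == old, it stores new into *ptr and returns true; otherwise it does nothing and returns false.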
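`Casuintptr` is internal to the runtime, but the public `sync/atomic.CompareAndSwapUintptr` has the same compare-and-swap semantics. The sketch below shows those semantics and the retry loop used to raise a value monotonically; `raiseTo` is a hypothetical helper, not runtime code.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// raiseTo raises *addr to at least limit with a CAS retry loop. It mirrors
// the shape of the zeroedBase update, but it is only an illustration.
func raiseTo(addr *uintptr, limit uintptr) {
	for {
		cur := atomic.LoadUintptr(addr)
		if cur >= limit {
			return // already high enough, nothing to do
		}
		if atomic.CompareAndSwapUintptr(addr, cur, limit) {
			return // we won the race and stored limit
		}
		// Another goroutine changed *addr in between; reload and retry.
	}
}

func main() {
	var v uintptr = 4096

	// Succeeds: the current value equals old, so new is stored.
	ok := atomic.CompareAndSwapUintptr(&v, 4096, 8192)
	fmt.Println(ok, v)

	// Fails: the current value is 8192, not 4096, so nothing changes.
	ok = atomic.CompareAndSwapUintptr(&v, 4096, 16384)
	fmt.Println(ok, v)

	raiseTo(&v, 16384)
	fmt.Println(v)
}
```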

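The function walks every arena that the allocation touches (an arena is a large unit of heap address space in Go's allocator; see Go 語言原本 for details). For each one it checks zeroedBase (the larger the value, the less memory in that arena can skip zeroing), decides whether zeroing is needed, and raises zeroedBase as required. In other words, zeroedBase can be read as the boundary below which the arena's memory has already been handed out before and therefore needs zeroing when reused. Note that if even one arena satisfies arenaBase < zeroedBase, the function returns true for the allocation as a whole.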

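From this we can see that when memory in an arena that has already been allocated once is handed out again for the ballast, the problem described at the start can easily occur. Because the ballast is usually created early in startup, when little memory has been allocated before it, this is a low-probability event, but it does happen.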
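The heuristic can be replayed with a small single-threaded model. This is a simplified sketch, not the runtime code: `toyHeap` is hypothetical, a plain map stands in for the atomic per-arena field, and the 8 KiB page and 64 MiB arena sizes are assumptions matching linux/amd64.

```go
package main

import "fmt"

const (
	pageSize   = 8 << 10  // assumed page size: 8 KiB
	arenaBytes = 64 << 20 // assumed arena size: 64 MiB
)

// toyHeap keeps one zeroedBase offset per arena, keyed by arena index.
type toyHeap struct {
	zeroedBase map[uintptr]uintptr
}

// allocNeedsZero follows the same steps as the runtime function, minus
// the atomics: report whether any touched arena was used before, and
// raise each arena's zeroedBase to cover the new allocation.
func (h *toyHeap) allocNeedsZero(base, npage uintptr) (needZero bool) {
	for npage > 0 {
		ai := base / arenaBytes
		arenaBase := base % arenaBytes
		if arenaBase < h.zeroedBase[ai] {
			needZero = true // starts inside previously used memory
		}
		arenaLimit := arenaBase + npage*pageSize
		if arenaLimit > arenaBytes {
			arenaLimit = arenaBytes
		}
		if arenaLimit > h.zeroedBase[ai] {
			h.zeroedBase[ai] = arenaLimit
		}
		base += arenaLimit - arenaBase
		npage -= (arenaLimit - arenaBase) / pageSize
	}
	return
}

func main() {
	const ballastPages = (128 << 20) / pageSize // a 128 MiB ballast

	// Case 1: 128 KiB is allocated at offset 0, freed, and the ballast
	// then reuses offset 0. The whole ballast gets zeroed.
	reused := &toyHeap{zeroedBase: map[uintptr]uintptr{}}
	reused.allocNeedsZero(0, 16)
	fmt.Println("reused region:", reused.allocNeedsZero(0, ballastPages))

	// Case 2: the ballast is placed right after the earlier allocation,
	// on memory that has never been handed out. No zeroing is needed.
	fresh := &toyHeap{zeroedBase: map[uintptr]uintptr{}}
	fresh.allocNeedsZero(0, 16)
	fmt.Println("fresh region:", fresh.allocNeedsZero(16*pageSize, ballastPages))
}
```

In the first case only 128 KiB of the 128 MiB range had been used before, yet the function reports that the entire allocation needs zeroing, which is what turns the ballast into resident memory.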

Recommendations

For readers who are still using ballast, consider the following options to guard against this problem.

memory limit

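This is new in Go 1.19. It lets you set a fixed soft memory limit, which in effect gives the GC a fixed target to trigger against. There are two ways to set it: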

The environment variable GOMEMLIMIT=<number>, in bytes (unit suffixes such as MiB and GiB are also accepted)
debug.SetMemoryLimit(limit int64), also in bytes

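This feature was designed with replacing ballast in mind. Once it is set, the runtime uses several mechanisms, including adjusting how often GC runs and how eagerly memory is returned to the OS, to try to keep memory usage under the limit. What it measures against the limit is all memory managed by the Go runtime, equivalent to Sys - HeapReleased in runtime.MemStats. In theory its effect is similar to, and better than, that of ballast.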
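The quantity compared against the limit can be read from `runtime.MemStats`. A minimal sketch, in which `runtimeManagedBytes` is a hypothetical helper name:

```go
package main

import (
	"fmt"
	"runtime"
)

// runtimeManagedBytes returns Sys - HeapReleased, i.e. the memory obtained
// from the OS by the Go runtime minus the heap memory already returned.
func runtimeManagedBytes() uint64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.Sys - m.HeapReleased
}

func main() {
	fmt.Printf("runtime-managed memory: %d bytes\n", runtimeManagedBytes())
}
```

Note that `runtime.ReadMemStats` briefly stops the world, so it is better suited to occasional diagnostics than to a hot path.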

When using it to bound memory, you can turn off proportional GC (GOGC=off) or raise its percentage.
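Setting the limit from code, together with turning off proportional GC, might look like the sketch below. It requires Go 1.19 or later. The helper names and the 1 GiB value are assumptions for illustration; `debug.SetMemoryLimit` and `debug.SetGCPercent` are the standard library calls.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// applyMemoryLimit sets a soft memory limit in bytes and disables
// proportional GC. It returns the previous settings so that a caller
// can restore them.
func applyMemoryLimit(limit int64) (prevLimit int64, prevGCPercent int) {
	prevLimit = debug.SetMemoryLimit(limit)
	prevGCPercent = debug.SetGCPercent(-1) // same effect as GOGC=off
	return
}

// currentMemoryLimit reads the limit without changing it: a negative
// argument leaves the setting untouched.
func currentMemoryLimit() int64 {
	return debug.SetMemoryLimit(-1)
}

func main() {
	const limit = 1 << 30 // 1 GiB, an arbitrary example value
	applyMemoryLimit(limit)
	fmt.Println("memory limit in bytes:", currentMemoryLimit())
}
```

The equivalent without code changes is to start the process with `GOMEMLIMIT=1GiB GOGC=off`.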

However, like ballast, it is not a hard limit, so do not set it to the maximum memory the environment allows.

gc tuner

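For older Go versions, another option is the approach proposed by Uber: dynamically adjusting the GC trigger percentage. There are two open-source implementations: cch123/gogctuner and bytedance/gopkg/util/gctuner.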
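The idea of adjusting the GC percentage dynamically can be sketched roughly as below. This is not the API or the exact formula of either library: `nextGCPercent`, `tuneOnce`, the clamp bounds, and the 512 MiB target are all assumptions made for illustration. A real tuner would run this once per GC cycle.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

const (
	minGCPercent = 25  // assumed lower clamp
	maxGCPercent = 500 // assumed upper clamp
)

// nextGCPercent picks a GC percentage so that the next collection starts
// when the heap is about to reach target, given the current live heap.
func nextGCPercent(liveHeap, target uint64) int {
	if liveHeap == 0 {
		return maxGCPercent
	}
	if target <= liveHeap {
		return minGCPercent // already at or over the target
	}
	headroom := target - liveHeap
	if headroom/liveHeap >= maxGCPercent/100 {
		return maxGCPercent // also avoids overflow in the multiplication below
	}
	p := headroom * 100 / liveHeap
	if p < minGCPercent {
		return minGCPercent
	}
	return int(p)
}

// tuneOnce applies one adjustment based on the current heap size, using
// HeapAlloc as an approximation of the live heap.
func tuneOnce(target uint64) int {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	p := nextGCPercent(m.HeapAlloc, target)
	debug.SetGCPercent(p)
	return p
}

func main() {
	fmt.Println("GC percent set to:", tuneOnce(512<<20))
}
```

With a small live heap the percentage is large and GC runs rarely; as the live heap approaches the target the percentage shrinks and GC runs more often.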

Still using ballast

If you want to keep using ballast, the following two points may help reduce the chance of hitting this problem:

Create the ballast as early as possible
Turn off GC before creating the ballast
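Both points can be combined in a small startup helper. This is a sketch of the ordering only, with a hypothetical `createBallast` function and an arbitrary 64 MiB size; it reduces the chance of the ballast landing on reused memory but cannot rule it out.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// ballast is a package-level variable, so it stays reachable for the
// lifetime of the process.
var ballast []byte

// createBallast disables GC, allocates the ballast, then restores the
// previous GC percentage.
func createBallast(size int) {
	prev := debug.SetGCPercent(-1) // turn GC off during the allocation
	ballast = make([]byte, size)
	debug.SetGCPercent(prev) // restore the earlier setting
}

func main() {
	// Call this first, before other initialization allocates memory.
	createBallast(64 << 20)
	fmt.Println("ballast bytes:", len(ballast))
}
```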