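Object Model and Memory Management

In Python, "everything is an object" is not a slogan; it is backed by a concrete low-level implementation. Understanding the object model is the foundation for understanding how Python behaves.

Everything Is a PyObject

In CPython, every Python object begins, at the C level, with a PyObject header: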

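// Simplified from the CPython source
typedef struct _object {
    Py_ssize_t ob_refcnt;    // reference count
    PyTypeObject *ob_type;   // pointer to the type object
} PyObject;

This means:

Every object knows its own type (ob_type)
Every object carries a reference count (ob_refcnt)
type(x) essentially reads x->ob_type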

python

x = 42
print(type(x))          # <class 'int'>
print(type(x) is int)   # True
print(isinstance(x, object))  # True: int inherits from object
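
As a quick check of the points above, the following sketch inspects a handful of unrelated values. The sample values are arbitrary choices for illustration; the only claim is that each one has a type and is an instance of object.

```python
import math

# Arbitrary sample values: a number, a string, a function, a module, a class, None.
samples = [42, "text", len, math, int, None]

for value in samples:
    # type(value) reports the type recorded in the object's header.
    print(f"{type(value).__name__:<28} is object: {isinstance(value, object)}")

all_objects = all(isinstance(value, object) for value in samples)
same_type_view = all(type(value) is value.__class__ for value in samples)
```

Functions, modules and classes pass the same checks as numbers and strings, which is what "everything is an object" means in practice.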

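Types Are Objects Too

Python's type system is self-bootstrapping:

int        is an instance of type
str        is an instance of type
type       is an instance of type (itself!)
object     is an instance of type
int        inherits from object
type       inherits from object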

python

print(type(int))     # <class 'type'>
print(type(type))    # <class 'type'>
print(type(object))  # <class 'type'>

print(int.__bases__)     # (<class 'object'>,)
print(object.__bases__)  # ()
print(type.__bases__)    # (<class 'object'>,)
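
The relationships listed above mix two different relations, "is an instance of" and "inherits from". The sketch below keeps them apart and then builds a class by calling type directly; the class name Point and its attribute dims are hypothetical names used only for illustration.

```python
# Instance-of and subclass-of are two separate relations.
print(isinstance(object, type))   # True: object is an instance of type
print(isinstance(type, object))   # True: type is an instance of type, and type subclasses object
print(issubclass(type, object))   # True: type inherits from object
print(issubclass(object, type))   # False: object does not inherit from type

# A class statement is roughly a call to type(name, bases, namespace).
Point = type("Point", (object,), {"dims": 2})
p = Point()
print(type(p) is Point, type(Point) is type)  # True True
```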

Reference Counting

Basic Principle

python

import sys

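# Use a fresh object: a string literal is a shared constant, so its count is misleading.
a = []
print(sys.getrefcount(a))  # 2 (the variable a + the function argument)

b = a
print(sys.getrefcount(a))  # 3

c = [a, a]
print(sys.getrefcount(a))  # 5 (a + b + c[0] + c[1] + the getrefcount argument)

del b
del c
print(sys.getrefcount(a))  # 2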
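
To see what happens when the count reaches zero, here is a minimal sketch that records the moment of destruction. Tracked and events are hypothetical names, and the immediate release shown here is CPython behavior, not a guarantee of the language.

```python
events = []

class Tracked:
    """Hypothetical class that records when it is destroyed."""
    def __del__(self):
        events.append("freed")

t = Tracked()
alias = t          # two references to the same object
del t              # one reference remains, so nothing happens yet
still_alive = (events == [])
del alias          # last reference gone: CPython frees the object immediately
freed_now = (events == ["freed"])
print(still_alive, freed_now)
```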

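Pros and Cons of Reference Counting

Pros:

Memory is released as soon as an object's reference count drops to zero (deterministic destruction)
Code that lets a file close simply by dropping its last reference relies on this behavior; the with statement gives the same guarantee explicitly and on every Python implementation

Cons:

Reference cycles can never reach a count of zero, so reference counting alone cannot reclaim them
Under multithreading, reference-count updates must be protected by the GIL (one source of performance bottlenecks)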
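
The reference-cycle limitation can be observed directly. In this sketch, Node and make_cycle are hypothetical names; a weak reference serves as a probe that reports whether the object still exists, and the cyclic collector is switched off temporarily so that only reference counting is at work.

```python
import gc
import weakref

class Node:
    """Hypothetical node type that can point at a partner node."""
    def __init__(self):
        self.partner = None

def make_cycle():
    a, b = Node(), Node()
    a.partner, b.partner = b, a   # a -> b -> a: a reference cycle
    return weakref.ref(a)         # a probe that does not keep a alive

was_enabled = gc.isenabled()
gc.disable()                      # keep the cyclic collector from running on its own
probe = make_cycle()
alive_before = probe() is not None   # the cycle keeps both nodes alive
gc.collect()                         # an explicit collection still works while disabled
alive_after = probe() is not None    # the cycle has been found and freed
if was_enabled:
    gc.enable()

print(alive_before, alive_after)     # True False
```

Both nodes become unreachable as soon as make_cycle returns, yet they survive until a cycle-detecting collection runs.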

Garbage Collection: Generational Collection

python

import gc

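# Inspect the GC configuration
print(gc.get_threshold())  # e.g. (700, 10, 10): one threshold per generation; defaults vary by CPython version

# Inspect the current per-generation counters
print(gc.get_count())

# Trigger a full collection manually
collected = gc.collect()
print(f"Collected {collected} unreachable objects")

# Disable the GC (performance-critical code that creates no reference cycles)
gc.disable()
# ... performance-sensitive code ...
gc.enable()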
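
A bare disable/enable pair leaves the collector off if the code in between raises, and it re-enables the collector even if it was already off. One possible way to address both is a small context manager; gc_paused is a hypothetical helper, not part of the gc module.

```python
import gc
from contextlib import contextmanager

@contextmanager
def gc_paused():
    """Hypothetical helper: pause cyclic GC and restore the previous state."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()

state_before = gc.isenabled()
with gc_paused():
    inside = gc.isenabled()               # False while the block runs
    data = [[i] for i in range(10_000)]   # stand-in for performance-sensitive work

restored = gc.isenabled()                 # same as state_before
```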

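The three-generation model:

Generation 0 (young): collected when container allocations minus deallocations exceed 700
Generation 1 (middle): collected after generation 0 has been collected 10 times
Generation 2 (old): collected after generation 1 has been collected 10 times

An object that survives a collection is promoted to the next generation, so long-lived objects are scanned less often. The numbers above are the classic defaults; exact thresholds and scheduling details vary between CPython versions.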
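
The collector only needs to watch objects that can hold references to other objects, because only those can form cycles. The sketch below uses gc.is_tracked to show the distinction and prints the counters next to the thresholds they are compared against; the sample values are arbitrary.

```python
import gc

# Only objects that can refer to other objects need cycle detection.
print(gc.is_tracked(42))        # False: an int cannot be part of a cycle
print(gc.is_tracked("text"))    # False
print(gc.is_tracked([1, 2]))    # True: a list can refer back to itself

# get_count() reports the current counters that are compared against the thresholds.
counts = gc.get_count()
thresholds = gc.get_threshold()
print(counts, thresholds)
```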

Small-Integer Cache

CPython pre-creates the integer objects from -5 to 256 and reuses them instead of allocating new ones:

python

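a = 100
b = 100
print(a is b)   # True: the same cached object

a = 1000
b = 1000
# False in the interactive interpreter (outside the cache range);
# may be True in a script, where the compiler merges equal constants
print(a is b)

# String interning
s1 = "hello"
s2 = "hello"
print(s1 is s2)  # True: identifier-like string literals are interned

s1 = "hello world"
s2 = "hello world"
print(s1 is s2)  # May be False (strings containing spaces are not necessarily interned)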
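
Literals in a single file can be merged by the compiler, which hides the effect of the cache. The following sketch builds the values at run time instead, so the results reflect the cache and interning themselves; the identity results in the comments describe CPython and are not language guarantees.

```python
import sys

# Build the values at run time so the compiler cannot merge equal constants.
small_a, small_b = int("100"), int("100")
big_a, big_b = int("1000"), int("1000")
print(small_a is small_b)   # True on CPython: both come from the small-integer cache
print(big_a is big_b)       # False on CPython: two separate int objects
print(big_a == big_b)       # True everywhere: the values are equal

parts = ["hello", " ", "world"]
s1, s2 = "".join(parts), "".join(parts)
print(s1 == s2)                           # True
print(sys.intern(s1) is sys.intern(s2))   # True: interning returns one shared object
```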

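Important

Always compare values with ==; use is only to test identity (whether two names refer to the same object), as in x is None. Whether two equal integers or strings happen to be the same object is a CPython implementation detail and must not be relied on.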
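
Identity tests are the right tool when the question really is "is this that exact object?". A common case is a sentinel default, sketched below; _MISSING and describe are hypothetical names.

```python
_MISSING = object()   # hypothetical sentinel: a unique object no caller can pass by accident

def describe(value=_MISSING):
    # Identity is the right question here: "is this the sentinel itself?"
    if value is _MISSING:
        return "no argument"
    if value is None:
        return "explicit None"
    return f"value {value!r}"

print(describe())        # no argument
print(describe(None))    # explicit None
print(describe(0))       # value 0
```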

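Optimizing Memory with `__slots__`

By default every instance carries a __dict__ (a dictionary) for its attributes, which adds significant per-instance memory overhead: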

python

import sys

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class PointSlots:
    __slots__ = ('x', 'y')
    def __init__(self, x, y):
        self.x = x
        self.y = y

p1 = Point(1, 2)
p2 = PointSlots(1, 2)

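# Exact sizes vary by CPython version and platform.
print(sys.getsizeof(p1))           # the instance itself (excludes its __dict__)
print(sys.getsizeof(p1.__dict__))  # the per-instance attribute dict
print(sys.getsizeof(p2))           # the slotted instance (it has no __dict__)

# The difference becomes significant with many instances
import tracemalloc
tracemalloc.start()

baseline, _ = tracemalloc.get_traced_memory()
points_normal = [Point(i, i) for i in range(100_000)]
after_normal, _ = tracemalloc.get_traced_memory()

points_slots = [PointSlots(i, i) for i in range(100_000)]
after_slots, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Memory attributed to each list of instances
print(f"Point:      {after_normal - baseline:,} bytes")
print(f"PointSlots: {after_slots - after_normal:,} bytes")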
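
The memory saving comes with a behavioral change: a slotted instance has no __dict__, so it cannot grow attributes that are not listed in __slots__. A minimal sketch, with Plain and Slotted as hypothetical class names:

```python
class Plain:
    def __init__(self, x):
        self.x = x

class Slotted:
    __slots__ = ("x",)
    def __init__(self, x):
        self.x = x

plain, slotted = Plain(1), Slotted(1)
plain.extra = "ok"                 # allowed: stored in plain.__dict__

has_dict = hasattr(slotted, "__dict__")   # False: no per-instance dict
try:
    slotted.extra = "nope"         # not listed in __slots__
    rejected = False
except AttributeError:
    rejected = True

print(has_dict, rejected)          # False True
```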

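Weak References

A weak reference does not increase the reference count, which makes it a common building block for caches: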

python

import weakref

class Resource:
    def __del__(self):
        print("Resource reclaimed")

obj = Resource()
weak = weakref.ref(obj)

print(weak())   # <__main__.Resource object at 0x...>
del obj         # prints "Resource reclaimed" (immediately on CPython)
print(weak())   # None: the object has been reclaimed
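
When cleanup logic is needed, weakref.finalize is an alternative to writing a __del__ method: it registers a callback without keeping the object alive. The sketch below is one way to use it; this Resource class and the log list are hypothetical, and the immediate callback is CPython behavior.

```python
import weakref

class Resource:
    """Hypothetical resource type; it needs no __del__ for this to work."""

log = []
obj = Resource()
# finalize() registers a callback without keeping obj alive.
finalizer = weakref.finalize(obj, log.append, "resource released")

print(finalizer.alive)   # True
del obj                  # on CPython the callback runs as soon as the count hits zero
print(log)               # ['resource released']
print(finalizer.alive)   # False
```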

A practical use:

python

import weakref

# Weak-value dictionary cache (an entry is removed automatically once its value is reclaimed)
cache = weakref.WeakValueDictionary()

class BigData:
    def __init__(self, data):
        self.data = data

data = BigData([1, 2, 3])
cache['key'] = data

print('key' in cache)  # True
del data
print('key' in cache)  # False: the entry was cleaned up automatically

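Object Protocols (Magic Methods)

Python's operators and built-in functions are implemented through protocols, that is, through specially named "magic" methods: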

python

class Vector:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        return f"Vector({self.x}, {self.y})"

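    def __add__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented  # let Python try other.__radd__ or raise TypeError
        return Vector(self.x + other.x, self.y + other.y)

    def __len__(self):
        return 2

    def __getitem__(self, index):
        return (self.x, self.y)[index]

    def __eq__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented  # comparing with a non-Vector gives False, not AttributeError
        return self.x == other.x and self.y == other.y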

    def __hash__(self):
        return hash((self.x, self.y))

v1 = Vector(1, 2)
v2 = Vector(3, 4)
print(v1 + v2)    # Vector(4, 6)
print(len(v1))    # 2
print(v1[0])      # 1
print(v1 == Vector(1, 2))  # True
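
Protocol methods also enable behavior that is not spelled out in the class. In the sketch below, Vec2 is a hypothetical class similar in spirit to the one above: defining only __getitem__ is enough for iteration and unpacking, and __eq__ together with __hash__ makes instances usable in sets.

```python
class Vec2:
    """Hypothetical two-component vector used only for this illustration."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __getitem__(self, index):
        return (self.x, self.y)[index]   # raises IndexError past the end

    def __eq__(self, other):
        if not isinstance(other, Vec2):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash((self.x, self.y))

v = Vec2(1, 2)
x, y = v                      # unpacking uses the __getitem__ fallback for iteration
print(list(v))                # [1, 2]
print(v == "text")            # False: NotImplemented lets Python fall back safely
print(len({Vec2(1, 2), Vec2(1, 2)}))   # 1: equal hash and equality collapse duplicates
```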

Memory Views with `memoryview`

A memoryview gives zero-copy access to any object that supports the buffer protocol:

python

data = bytearray(b"Hello, World!")
view = memoryview(data)

# Slicing does not copy the data
sub = view[7:12]
print(bytes(sub))  # b'World'

# Writes through the view modify the original data
view[0] = ord('h')
print(data)  # bytearray(b'hello, World!')

Performance tip

When processing large binary data, prefer memoryview to avoid unnecessary memory copies.
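
To make the difference between copying and sharing concrete, the sketch below contrasts a bytes slice with a memoryview slice over the same buffer. It also shows one consequence of sharing: while views exist, the underlying bytearray cannot be resized. The variable names are arbitrary.

```python
data = bytearray(b"Hello, World!")
view = memoryview(data)

copied = bytes(data)[7:12]    # slicing bytes builds a new object
shared = view[7:12]           # slicing a memoryview shares the buffer

shared[:] = b"Earth"          # same-length write goes straight into data
print(data)                   # bytearray(b'Hello, Earth!')
print(copied)                 # b'World': the copy is unaffected

# While views are exported, the bytearray cannot be resized.
try:
    data.extend(b"!!")
    resized = True
except BufferError:
    resized = False

shared.release()
view.release()
data.extend(b"!!")            # allowed again once the views are released
print(resized, data)          # False bytearray(b'Hello, Earth!!!')
```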
