

Performance and token cost

Psyche is not trying to be more complex for its own sake: it pushes complexity into the local state machine and keeps the host-visible surface narrow. What deserves scrutiny is behavior quality, not latency.

Package: `psyche-ai`. Source repo: `oasyce_psyche`. Website: `psyche.oasyce.com`.
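A minimal usage sketch, assuming an npm distribution and a `Psyche` class export; only the `processInput()` method is confirmed by the benchmarks below, so treat the constructor and return shape as hypothetical.

```ts
// Hypothetical usage sketch -- only processInput() is confirmed by this page;
// the Psyche class name, options, and return shape are assumptions.
// (Assumed install: npm install psyche-ai)
import { Psyche } from "psyche-ai";

const psyche = new Psyche();

// processInput() runs the local state machine: no extra model calls.
const control = psyche.processInput("hey, you went quiet yesterday");

// The host forwards the compact control surface into its own LLM call.
console.log(control);
```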
- 0 extra model calls: all emotion and relation logic stays local.
- 0.191ms hot path p50: from a local quick benchmark.
- 1.05ms hot path p95: measured at the `processInput()` level.
- 15-180 compact tokens: typical neutral-to-active injection envelope.
- Why this is not heavier: compare against persona prompts and memory layers.
- Install modes: runtime and upgrade differences by install mode.
- Watch the live demo: see what these costs actually buy you.
- Local only: no hidden "ask another model to infer emotion" path.
- ABI first: hosts consume a structural control surface rather than long narrative prompts (see the sketch after this list).
- Cost grows slowly: long threads thicken memory and metacognition gradually, not exponentially.
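To make "ABI first" concrete, here is a sketch of what a structural control surface could look like as a TypeScript type, contrasted with a prose persona prompt. Every field name is hypothetical; the page only states that the surface is structural rather than narrative.

```ts
// Hypothetical shape of a host-visible control surface. Field names are
// illustrative; the docs only promise "structural, not narrative prose".
interface PsycheControlSurface {
  arousal: number;          // scalar state, e.g. in -1..1
  valence: number;
  relationCarry: string[];  // partner-specific residue tags
  regulation?: "soften" | "withdraw" | "repair";
  compactBlock: string;     // the 15-180 token injection described below
}

// Contrast: a persona-prompt approach would instead ship paragraphs of
// free-form text the model must re-interpret on every turn.
```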
| Metric | Current value | Meaning |
| --- | --- | --- |
| `processInput()` p50 | `0.191ms` | The hot path is sub-millisecond and not a noticeable source of latency. |
| `processInput()` p95 | `1.05ms` | Even the tail stays light; model and network calls still dominate latency. |
| Extra LLM calls | `0` | State updates, relation logic, and metacognitive regulation all execute locally. |
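For reproducing numbers like these, a quick-benchmark harness can be as small as the sketch below. `processInput()` is the method named above; the stub implementation, sample count, and percentile math are ours.

```ts
// Quick-benchmark sketch for the hot path. Replace the stub with a real
// psyche-ai instance; only processInput() is named by the docs.
import { performance } from "node:perf_hooks";

const psyche = {
  processInput: (text: string) => ({ compact: `len:${text.length}` }), // stub
};

const samples: number[] = [];
for (let i = 0; i < 10_000; i++) {
  const t0 = performance.now();
  psyche.processInput(`turn ${i}: how are we doing?`);
  samples.push(performance.now() - t0);
}

// Sort the measured latencies and read off the p50/p95 quantiles.
samples.sort((a, b) => a - b);
const pct = (p: number) => samples[Math.floor((samples.length - 1) * p)];
console.log(`p50 ${pct(0.5).toFixed(3)}ms, p95 ${pct(0.95).toFixed(3)}ms`);
```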
| Mode | Typical cost | When it appears |
| --- | --- | --- |
| Neutral compact | `~15 tokens` | State is near baseline; Psyche emits only the smallest control surface. |
| Active compact | `~100-180 tokens` | There is meaningful residue, relation carry, or a regulation action. |
| Full protocol | `~550 tokens` | Mostly for debugging and research; not the recommended default. |
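A host could select between these modes in a few lines. The sketch below assumes a hypothetical scalar `activity` score and a debug flag; the `0.2` threshold is illustrative, not from the docs.

```ts
// Hypothetical mode selection. The three modes and their rough token costs
// come from the table above; the activity score and threshold are assumptions.
type InjectionMode = "neutral-compact" | "active-compact" | "full-protocol";

function pickInjectionMode(activity: number, debug = false): InjectionMode {
  if (debug) return "full-protocol"; // ~550 tokens, debugging and research
  return activity < 0.2
    ? "neutral-compact"              // ~15 tokens, near-baseline state
    : "active-compact";              // ~100-180 tokens, active state
}
```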

Algorithms do the computation

The LLM does not need to understand the self-state system; it simply obeys behavioral constraints that have already been compressed for it.
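In practice, "obeying compressed constraints" can be as simple as the host appending the compact block to its system prompt verbatim, never asking the model to reason about the state system itself. The tag and the block content in this sketch are invented.

```ts
// The host treats the compact block as an opaque set of behavioral
// constraints and concatenates it into the system prompt unchanged.
function buildSystemPrompt(base: string, compactBlock: string): string {
  return `${base}\n\n[psyche]\n${compactBlock}`;
}

// Example with an invented ~15-token neutral compact block.
const prompt = buildSystemPrompt(
  "You are a helpful assistant.",
  "tone:flat-warm; pace:slower; avoid:overpromising; repair:pending",
);
```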

Task replies stay useful

Dual reply profiles stop "feels alive" from fighting "still gets work done."
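A minimal sketch of how dual profiles could keep these two goals separate: expressive constraints are injected only into the social profile, while the task profile stays neutral. The profile names and routing logic are assumptions, not the documented mechanism.

```ts
// Hypothetical dual-profile routing: the task profile never receives the
// expressive state, so work output stays unaffected; the social profile
// gets the full compact block.
type ReplyProfile = "task" | "social";

function constraintsFor(profile: ReplyProfile, compactBlock: string): string {
  return profile === "task"
    ? "style:neutral; keep answers precise" // expressive state not injected
    : compactBlock;                         // full emotional constraints
}
```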

The cost buys a visible difference

What these costs buy is not decorative language but cross-turn residue, partner-specific meaning, and repair hysteresis.