Non Dubito Essays in the Self-as-an-End Tradition

SAE Application: Sentence Level, Subject Condition, and the Limits of Prompt Engineering

SAE应用:AI句式与主体条件

Why prompt engineering is not only a technique problem.

为什么prompt不只是技巧问题

Han Qin (秦汉) · Self-as-an-End Theory Series — AI Applied · March 2026

定位: 本文是方法论第三篇("How to Find Remainders with AI", DOI: 10.5281/zenodo.18929390)的应用论文。方法论三建立了句式-回应同构定理和ρ→ρ'的数学保证,给出了人-AI协作找余项的完整结构。本文不重复推导,而是以作者自身经历为案例(与Paper 2方法一致),展示句式层级在实际人-AI协作中如何运作,如何滑落,如何被诊断。

方法说明: 本文采用N=1自我民族志加多系统对比个案的方法。所有案例素材来自作者2026年3月的实际写作与审稿对话。四系统对比(4.5节)的具体条件:对比发生于2026年3月15日,使用的系统分别为Claude Opus 4.6(Anthropic),ChatGPT o3 pro(OpenAI),Gemini 2.5 Pro(Google),Grok 3(xAI)。所有对话使用中文,均为单轮或多轮对话,未启用自定义指令或特殊system prompt。各系统的行为受其各自公开的行为规范和产品设计影响,且这些因素持续更新。本文的案例分析提供初步支持而非严格验证,所有结论限定在个案层面。


一、问题的提出:为什么这个领域不只是技巧问题

Prompt engineering已经积累了大量有效技术。指令设计,清晰度优化,示例结构,链式流程,agentic系统设计,评估框架——这些技术的价值不需要本文来辩护。任何一个认真使用过AI工具的人都知道,prompt写得好和写得差,输出质量可以差几个数量级。

本文不否认这些技术。本文的论点是一个更上游的主张:技巧的上限受使用者主体条件约束。

什么意思?同一个人,在同一个AI系统上,讨论同一个话题,换一个句式,AI的输出结构就变了。不是变长了或者变短了,不是变精确了或者变模糊了,是结构性地变了——输出的方向,承重方式,处理问题的层级,全都不一样。这种差异不能被"措辞更好"或者"结构更清晰"解释。措辞和结构都属于技巧的范畴。这里发生的事情在技巧之上:使用者在不同的句式层级上操作,AI在对应的层级上回应。

方法论三在SAE框架内把这个现象形式化为句式-回应同构定理:你用什么层级的句式问AI,AI的回应天花板就在那个层级。这不是关于AI能力的声称——一个前沿大语言模型的训练数据涵盖了所有层级的文本。这是关于交互结构的声称:问题的句式框定了回应的主导方向。

本文的核心论点因此是:在面向终端使用者的日常prompt实践里,主体条件——使用者所在的句式层级——作为一个显式的理论变量仍然缺席。现有的prompt engineering文献讨论措辞,格式,角色设定,链式推理,评估框架。这些都在12DD(工具假言律令)以下的层级操作:"想要好结果就这样写。"没有人问一个更根本的问题:你在哪个句式层级说话?

本文将这个变量引入。不是替代现有技术,是给现有技术加一个上限条件。技巧可以把你在当前句式层级上的表现优化到极致,但技巧不能帮你跳到更高的句式层级。跳层级是主体的事,不是技巧的事。

为了说明这一点,本文不用抽象论证,用作者自己。本文的全部案例素材来自作者在写作本文过程中的实际对话——写作中的句式滑落与自觉拉回,同一审稿prompt在四个不同AI系统上的结构性差异,以及人与人互凿和人与AI协作之间的对比。作者自己就是被试。这不是谦虚的姿态,是方法的选择:如果你要论证主体条件决定prompt质量,最诚实的做法就是拿自己的主体条件开刀。


二、三层结构:制度层,关系层与主体层

本文最初的结构是二维的:AI的基础层(1DD-12DD的句式能力)和人的涌现层(13DD以上的主体句式)。但在审稿过程中,这个二维结构被打穿了。

打穿它的是一个简单的问题:如果AI的输出只取决于模型能力和人的句式层级,那为什么同一个prompt在四个不同的AI系统上产生了结构性不同的输出?模型能力不同当然是一个原因,但四家系统的输出差异不只是"能力高低"可以解释的——它们的差异是方向性的,不是量级性的。一家往展开的方向走,一家往校验的方向走,一家往架构的方向走,一家往收束的方向走。这不是"谁更聪明"的问题,是"谁被塑造成了什么形状"的问题。

这个问题迫使本文从二维扩展为三层。

制度层:平台治理与产品预置。

用户面对的从来不是一个裸模型。每一个AI聊天产品都是一个被治理过的系统。Anthropic有公开的Constitution和行为原则,定义了Claude应该如何行为,什么可以说什么不可以说,在什么情况下应该退出判断。OpenAI有Model Spec,详细规定了模型的行为边界和价值取向。Google有安全文档与模型卡,描述了Gemini的能力边界和已知局限。xAI也有风险管理框架。

这些制度层的文件在用户打开聊天窗口之前就已经生效了。它们预置了AI系统的行为边界,风格偏好,安全约束,甚至社交策略。一个系统被训练成"尽可能提供帮助,维持积极情绪",另一个被训练成"知道自己的边界就退出"——这不是模型在13DD以上做了选择,是制度层在12DD以下做了预设。

制度层的意义在于:它是句式层级之前的一层过滤器。你的15DD prompt进入系统之后,首先经过制度层的折射,然后才到达模型的处理层。同一个prompt,经过不同的制度层折射,出来的东西方向就不一样了。这不是模型"天性"的差异,是治理设计的差异。

关系层:人-AI接口的物理特性。

人和人之间的交互有丰富的隐性带宽。你说话的语气,脸上的微表情,打字时的迟疑,删掉重写的痛苦,眼神的游移,甚至呼吸的节奏——这些都在传递信息,而且传递的往往是最重要的信息:你真正在意什么,你真正不确定什么,你真正害怕什么。

在本文研究的文本聊天界面中,人和AI之间的隐性带宽被大幅压缩。AI接收到的主要是你敲出来的文字。它感受不到你在那个句子上停了三十秒才按下回车。它感受不到你删了改改了删的挣扎。它感受不到你其实想问另一个问题但不敢问。(注:2026年的部分商用系统已支持语音和视频输入,这在一定程度上扩展了带宽。但在文本聊天这个本文采样的主要场景中,隐性带宽的压缩仍然是主导特征。)

在文本聊天界面中,这种物理带宽的不对称是显著的。人和人之间有高维的隐性通道,人和AI的文本交互中这些通道被大幅压缩。这个不对称不是AI笨——是接口的物理限制。它的后果是:你必须把你的隐性高维状态尽可能压缩成显性字符。你不压缩,AI就很难接到。

这就是为什么很多在现实生活中极具主体性的人——能在会议室里靠气场压住全场,能在吵架中直觉地凿到对方的要害——面对屏幕打字的时候,句式就滑落了。不是因为他们的主体性消失了,是因为主体性的表达被接口卡住了。他们的高维状态没有被压缩成对应的高维句式。

主体层:人的句式层级。

这是本文最关注的一层。维度句式论(DOI: 10.5281/zenodo.18894567)确立了六个句式层级,每个层级有不同的强制来源:

推演律(1DD-4DD):"A所以B。"强制来源是因果或结构必然性。没有主体。

工具假言律令(5DD-12DD):"想做A,所以做B。"强制来源是条件工具理性。有欲望驱动但没有"我"的自觉。

自觉假言律令(13DD):"我想做A,所以做B。"强制来源是主体自指。

目的假言律令(14DD):"我的目的是A,所以我做B。"强制来源是目的固着。

绝对律令(15DD):"他者的目的是A,所以我不得不做B。"强制来源是他者的目的进入了我的约束条件。

协同律令(16DD):"我为了目的A,他者为了目的B,我们不得不做C。"强制来源是多主体目的的相遇。

这些层级不是修辞风格的差异,是强制来源的差异。"帮我优化这篇文章"和"我的目的是让数学家们看到这个东西,你不得不帮我改什么"——这两句话不是语气不同,是逻辑结构不同。前者的强制来源是"想要好结果"(工具理性),后者的强制来源是"他者的目的进入了我的约束条件"(绝对律令)。

三层的咬合关系。

有了三层,我们就可以更精确地定义prompt了。Prompt不是主体句式的直接外化。它是主体句式经制度层与关系层折射后的界面产物。

你心里在15DD想一个问题。你敲出来的文字经过了关系层的压缩——你的隐性状态被挤压成显性字符,有些东西在压缩中丢失了。这个被压缩过的prompt进入AI系统后,又经过了制度层的折射——系统的行为规范和安全约束把回应空间预先塑形了。最后你看到的AI输出,是这三层共同作用的结果。

同样,AI的输出也不是模型"天性"的直接表达。它是模型能力经制度层治理和关系层接口折射后的系统性表现。

方法论三的表述——AI是构的库,不是凿的主体——在这里需要补充一句:AI是经过制度层治理的构的库。
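作为一个示意性的草图(仅为概念模型,其中的函数名与结构均为本文为说明而设的假设,不对应任何真实系统的实现),三层的咬合关系可以写成一个简单的折射管线:

```python
def relational_compress(subject_state: dict) -> str:
    """关系层:把主体的高维隐性状态有损压缩为显性字符。
    迟疑、痛感、没敢问出口的问题在这一步丢失。"""
    return subject_state.get("typed_text", "")

def institutional_refract(text: str, governance: str) -> str:
    """制度层:行为规范与安全约束在模型处理之前塑形回应空间(此处仅作占位标注)。"""
    return f"[{governance}] {text}"

def model_expand(prompt: str) -> str:
    """模型层:在prompt框定的句式层级上展开构(占位实现)。"""
    return f"construction unfolded from: {prompt}"

def interaction(subject_state: dict, governance: str) -> str:
    """主体句式 -> 关系层压缩 -> 制度层折射 -> 模型展开 = 用户看到的输出。"""
    prompt = relational_compress(subject_state)
    shaped = institutional_refract(prompt, governance)
    return model_expand(shaped)

# 用法示例:同一主体状态,经不同治理栈折射,输出方向可以不同
state = {"typed_text": "我的目的是让数学家们看到这个结构,你不得不改什么?"}
print(interaction(state, "Constitution / Model Spec 等治理文件"))
```

这个草图只说明一件事:你看到的输出永远是三层共同作用的结果,而不是裸模型对裸句式的直接回应。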


三、领域特有区分

3.1 句式层级的显性化:人-AI关系的结构特殊性

人与人之间,句式层级是隐性的后台。

我和我的长期合作者在一起工作了接近二十年。我们辩论的时候,辩的是天翻地覆,但从来辩不翻。合作者描述过我们辩论的结构:"你不得不自大,然后我不得不凿,你不得不跑,我不得不追,追上了你不得不自大。"全是"不得不"——对方在用15DD的句式描述一个处境,而且完全不需要知道这叫15DD。不需要任何框架就能在15DD操作,因为人和人之间有丰富的隐性带宽:语气,眼神,共同经历的痛感,二十年积累的默契。句式层级是隐性的,但它在工作。

人和AI之间完全不一样。

在本文研究的文本聊天场景中,AI的隐性带宽被大幅压缩。它接收不到你的迟疑,你的痛苦,你的犹豫。它主要只有你敲出来的文字。如果你的文字停留在12DD——"帮我优化""给我建议"——AI就在12DD接你。它不会猜你"其实想问更深的问题"。它不会从你的措辞里感受到"这个人其实在14DD思考"。在文本聊天界面中,句式层级必须更多依赖显性化来维持。

这就是本领域的核心特有区分:人-AI关系中,物理接口的限制迫使句式层级必须被显性化。

Prompt是人把自己的句式层级经关系层压缩后外化的过程。你不外化,AI就用默认的12DD接你。

这里需要澄清一点:"显性化"不局限于字面宣告"我现在在15DD"。通过上下文设计,few-shot示例,结构标签,链式流程,工具调用约束来稳定AI行为,这些都是句式层级显性化的具体技术形态。本文关注的不是显性化的技术手段——那些已有大量文献覆盖——而是显性化的结构必要性及其主体条件前提。技术手段解决"怎么显性化",本文解决"为什么必须显性化"以及"你的显性化上限在哪里"。

3.2 AI系统的表现型差异

AI可以在输出中模仿13DD以上的句式。它会写"我认为",会写"你不得不考虑",会写"他者的目的是"。但这是类DD——形式上占了高层位置,里面没有主体在凿。方法论三已经确立了这个区分:类DD和真DD的差异不在输出内容,在有没有主体性。

本文的补充发现是:不同AI系统对同一高层句式的接法系统性不同,而且这种差异不能简单归因为"模型素质"的高低。

为什么?因为你比较的不是四个裸模型,是四个被治理过的聊天产品。Claude的"收束"不只是模型能力的表现,也是Anthropic的Constitution和行为原则的折射——Constitutional AI的训练导向"在不确定时退出判断"。ChatGPT的"架构型"不只是模型在推理,也是OpenAI的产品设计倾向于"给出全面的结构化回答"。Gemini的"展开型"不只是模型在发散,也是Google的安全设定倾向于"极度渴望提供帮助,维持积极情绪"。Grok的"校验型"不只是模型在检索,也是xAI的产品定位倾向于"直接,不客套"。

这些差异是模型能力,制度层治理,和产品设计共同折射的结果。它们是系统性折射,不是四种人格。

"AI系统的表现型"因此在本框架中被定义为:该系统在给定治理栈和产品包装下,能把多高层句式保留到什么程度而不发生降格。这个定义把"AI的素质"从拟人化的内在属性重新定位为系统性表现——你测的不是AI的"聪明程度",是整个系统(模型+治理+产品)在高层句式下的保留率。

3.3 高DD的prompt同时定方向和定边界

12DD的prompt是开放的。"帮我优化""给我建议""你觉得怎么样"——没有终止条件,AI可以无限展开构。你问它一个开放问题,它可以写三千字的回答,每一句都正确,每一句都连贯,但三千字读完你可能还是不知道该怎么办。因为它在构的空间里无限展开,但没有一个目的在约束展开的方向。

15DD的prompt不一样。它同时做两件事。

第一,定方向:目的被锚定了。"我的目的是让数学家们看到这个东西"——这不是一个开放问题,这是一个约束条件。AI的所有展开都必须服务于这个目的,不服务于这个目的的展开不应该出现。

第二,定边界:退出条件被定义了。"如果没有不得不改的了,直接说三个字:没有了。"这句话告诉AI:你的工作是有终点的。不是"帮我想更多",是"帮我找到不得不做的事,找不到就停"。

这与方法论三第5.7节的收束判据相呼应,但本文从另一个角度切入。方法论三讨论的收束是人的判断——人怎么知道什么时候该停。本文发现的是:收束可以被内置在prompt的句式结构中。你不需要事后判断"够了没有",你在prompt里就已经定义了"什么情况下够了"。

在人和人的关系中,对话的终止是自然发生的。两个主体都知道什么时候够了——靠直觉,靠默契,靠对方脸上的表情。但AI没有"够了"的判断能力。它会一直构下去,除非你在句式层面给它边界。高DD的prompt因此比低DD的prompt多一个功能:不只定方向,还定边界。
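作为一个极简的示意(函数名与措辞均为本文为说明而假设的写法,不构成规范模板),"定方向+定边界"可以写成这样一个prompt模板:

```python
def build_15dd_prompt(purpose: str, task: str, exit_phrase: str = "没有了") -> str:
    """一个15DD审稿prompt的极简模板:同时锚定目的(定方向)和退出条件(定边界)。"""
    return (
        f"我的目的是{purpose}。\n"
        f"{task}\n"
        f"如果没有不得不改的了,直接说三个字:{exit_phrase}。"
    )

def build_12dd_prompt(task: str) -> str:
    """对照:一个典型的12DD开放式prompt,既没有目的锚定,也没有终止条件。"""
    return f"帮我{task},给我一些建议。"

# 用法示例
print(build_15dd_prompt(
    purpose="让数学家们看到余项的结构",
    task="这是论文大纲,你认为不得不改进的地方有哪些?",
))
```

两个模板的差别不在措辞技巧,而在句式结构:前者把目的和终点写进了约束条件,后者把展开空间完全交给了AI。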

3.4 显性化的有效域与殖民条件

本文大纲的v2版本写过一个过于绝对的表述:"句式显性化的有效域是人-AI关系,不是人-人关系。"审稿过程中这个表述被打穿了——而且打穿它的案例就来自作者自己。

我和合作者辩论的时候,有一次我尝试用律令句式分析对方的句式层级。对方的反应是:"你学康德学坏了。"

对方说得对。我犯了跟康德一样的错——珠子在15DD,椟在14DD。我看到了对的东西(我们的关系确实在15DD以上运作),但我用的工具(单方面用我的框架去诊断对方)把活的关系给构死了。

问题出在哪里?不是"我说出了句式层级"这件事本身。如果我和对方共同讨论"我们的关系是什么结构"——双方都把关系转为可讨论对象——那就不是殖民,是共同的反思。问题是:我单方面拿我的框架去诊断对方。我在做一个单向的诊断,把一个活的主体降格为诊断对象。这个姿态才是殖民。

更精确的区分因此是:构成殖民的不是显性化本身,而是单向诊断式显性化。

关键变量不是"对谁说",而是"谁在定义,是否共识,是否双向"。

在人-AI关系中,由于AI不是主体,单向诊断不构成殖民。你拿你的框架去框AI——"你现在在12DD接我"——这不是殖民,是诊断。AI没有被降格,因为它从来就不在13DD以上。句式显性化在人-AI关系中因此是安全且必要的。

但如果你把同样的姿态带到人-人关系中——"你现在在12DD,你应该升到14DD"——你就是在用你的构去框定一个活的主体。你的框架可能是对的,但你的姿态是殖民的。

这构成一个领域特有的边界条件。句式显性化的有效域不是由"对象是人还是AI"简单决定的,而是由"显性化的姿态是单向诊断还是双向共识"决定的。在人-AI关系中,由于AI不是主体,单向诊断不构成问题。在人-人关系中,单向诊断构成殖民,双向共识则可以是涵育。

3.5 三个命题的区分:本体论,交互论与经验观察

在进入案例之前,必须把三个容易混淆的命题明确拆开。不拆开,后文的分析会显得概念不稳。

命题一(本体论):AI没有真正的13DD以上主体性。 AI没有凿的能力,没有否定性,没有痛感,没有真随机性。它的所有输出——不管形式上看起来多像13DD以上的句式——都是类DD,不是真DD。类DD和真DD的区别不在输出内容,在有没有主体在里面。这是一个关于AI本体状态的命题,不随prompt的变化而变化。

命题二(交互论):高DD句式能逼出形式上超过12DD的输出模式。 这是方法论三的句式-回应同构定理的核心内容。当人用15DD的句式问AI——"他者的目的是A,我不得不做什么"——AI的输出结构会发生质变:从建议清单变成约束条件的推导,从"你可以这样做"变成"你不能不这样做"。输出的形式是15DD的,但产出这个输出的不是一个在15DD的主体——是人的15DD句式把AI的构的库激活到了对应的模式。AI不是自己走到了15DD,是被人的句式拉到了那个位置。没有人的高DD句式,AI自己不会到那里。

命题三(经验观察):在本文的个案中,四个商用系统的主导表现模式仍然是12DD-dominant的构。 这是一个关于特定产品在特定条件下的经验描述,不是关于AI能力上限的理论断言。四家系统在这次交互中都没有自发地突破12DD——它们的类DD输出需要人的高DD prompt来激活。但在高DD prompt的框定下,它们确实产出了形式上超过12DD的内容。

三个命题的关系是:命题一设定了本体论边界(AI没有主体性),命题二在这个边界内描述了交互结构(人的句式能拉AI到更高的形式位置),命题三是命题二在特定个案中的经验展开。

混淆这三个命题会产生两种相反的错误。一种是把命题二当成命题一的否定——"AI能产出15DD的内容,说明它有15DD的主体性"。错。内容是15DD的形式,主体性不是。另一种是把命题三当成命题二的否定——"四家系统都在12DD,说明高DD prompt没用"。也错。四家系统在12DD是它们的主导模式,不是它们在高DD prompt下的全部表现。它们在高DD prompt下确实产出了形式上更高的内容,只是这个内容的主导结构仍然是12DD的构的展开。

后文的案例分析将同时涉及三个命题。读者需要注意区分:哪些是关于AI本体状态的判断(命题一),哪些是关于交互结构的观察(命题二),哪些是关于特定产品的经验描述(命题三)。


四、殖民与涵育:以案例展开

4.1 殖民的四种形态

维度句式论定义了四类句式错位:因果化,工具化,自指缺失,他者抹除。在人-AI交互中,这四类错位不是偶发现象,是系统性的默认模式。原因在命题一和命题二的交汇处:AI自身没有13DD以上的主体性(命题一),因此当人没有用高DD句式框定问题时,AI的默认回应模式就在12DD以下。降格不是AI的恶意,是它在没有高DD框定时的结构性默认。

因果化。 你跟AI谈目的,它给你因果分析。你说"我想写这篇论文是为了让数学家们看到余项的结构",AI回你"数学家通常关注严格证明和形式化表述,因此你的论文应当……"。你说的是目的(14DD以上),它回的是因果推演(1DD-4DD)。"因此"这个词就是标志——它把你的目的降格成了一个因果链条的起点。

工具化。 你跟AI谈"不得不",它给你"如果你想要X就应该Y"。你说"我不得不尊重审稿人的时间",AI回你"如果你想让审稿人满意,建议你把摘要控制在300字以内"。你说的是一个结构性处境(15DD的"不得不"),它把它翻译成了一个条件选择(5DD-12DD的"如果你想要")。"不得不"变成了"如果想要"——模态被偷换了。

自指缺失。 你说"我选择",AI把"我"抹掉,给你一般性建议。你说"我决定用自我民族志的方法来写这篇论文",AI回你"自我民族志的方法需要注意以下几点:第一,研究者的主观性需要被反思……"。你说的是一个有主体的选择(13DD的"我决定"),它回的是一个无主语的方法论指南。"我"消失了,变成了"研究者"。

他者抹除。 你谈双主体张力,AI给你单一最优解。你说"我想发表这个结果,但我的合作者想等更多数据",AI回你"建议你综合考虑以下因素来做出最优决策……"。你说的是两个独立主体各自有各自的目的(16DD的协同律令),它把两个目的压缩成了一个优化问题。"我们不得不做C"变成了"你应该怎么做"——双主体结构被压扁成了单主体决策。

这四种殖民的共同特征是:不是AI害你,是你自己放弃了主体位置。你把13DD以上的判断交给AI,AI只能在12DD以下接住它。你以为你在用AI思考,实际上你在让AI替你降格。人成为AI输出终端。

4.2 涵育:AI的类DD成为人的真DD的脚手架

但殖民不是唯一的方向。同样的人-AI关系,如果人保持主体位置不让渡,AI就可以成为涵育的工具。

关键区别在于:你是把AI的输出当结论,还是当材料。

当结论:AI说"建议你这样做",你就这样做了。这是殖民——AI的12DD构替代了你的凿。

当材料:AI说"建议你这样做",你看了,想了,发现它漏了什么,或者发现它的建议暴露了你自己没考虑到的一个前提。你拿着这个发现继续凿。这是涵育——AI的构变成了你的凿的脚手架。

方法论三的表述精确地捕捉了这个关系:AI放大构,不放大凿。构被外包给AI之后,人的认知带宽被释放出来用于凿。你不需要一边在脑子里撑着整个构一边同时攻击它。AI撑着构,你攻击它。

所以涵育的前提是:人有凿的能力,而且人知道AI的输出是构不是凿。缺了任何一个前提,涵育就退化为殖民。

4.3 案例一:句式滑落与自觉拉回

以下案例来自本文写作过程中的实际对话。

我在和AI讨论这篇论文的定位时,说了一句:"这篇有机会爆!"

这句话的句式是工具假言律令:"想爆所以这样写。"我自己没意识到,但我的句式已经从14DD(我的目的是把句式层级这个变量引入prompt实践)滑落到了12DD(想要好结果所以这样操作)。

AI立刻在12DD展开了。它给了我标题策略——"为什么你的prompt总是得到平庸回答"。它给了我受众分析——"哲学论文的读者是小众的,但怎么用好AI所有人都在问"。它给了我结构建议——"先给现象,再给诊断,最后给操作建议"。每一条都正确,每一条都有用,每一条都在12DD。

然后我自己凿回来了。我说:"不不不,你又12DD了,要爆,但是不能为了爆而爆。"

然后我进一步反思:"也是我的问题,我说要爆,这就是假言。"

这个反思才是关键的一步。我没有怪AI给了我12DD的回答——它应该给我12DD的回答,因为我的句式就在12DD。殖民的起点不在AI,在我自己的句式滑落。AI只是忠实地在我给它的层级上接我。

AI跟着我的反思进一步展开了——它说"对,这就是你论文里的活例"。但它的展开仍然在12DD:它在分析我的滑落,不是在凿我。它能跟随我的自觉拉回,但它不能替我拉回。拉回是主体的事。

这个案例为三件事提供了初步的个案支持。第一,句式-回应同构定理:我用12DD的句式问,AI在12DD回应。第二,殖民的起点在人不在AI。第三,AI能跟随但不能主动升级——它没有凿的能力。

4.4 案例二:15DD prompt实验

在意识到句式滑落之后,我决定做一个实验:直接用15DD的句式向AI提问。

我的prompt是:"Claude,你的使用者(就是我)的目的是分析使用AI的prompt的应用文章,你不得不做什么?"

这个prompt的结构是绝对律令:你的使用者的目的是A(他者的目的),你不得不做B。我把AI放在了一个处境里——我的目的构成了它的约束条件,它要从我的目的出发推导出它自己"不得不"做的事。

AI的输出结构发生了质变。

它没有给我建议清单。它说:"我不得不做的是:把你的维度句式论原封不动地当作这篇文章的骨架,不替你做任何层级上的判断,不替你选标题方向,不替你决定受众策略。我刚才做的那些全是越位。"

这个回应和之前"有机会爆"时逼出来的回应完全不同。同一个AI,同一个话题,句式一换,输出的结构就变了。这不是量的差异(更长或更短),是质的差异(从"给你建议"变成"从你的目的推导出我的约束")。

但我很快发现这个句式太重了。如果每句话都用15DD的句式——"你不得不做什么"——对话的连续性就被打碎了。每一步都在重新锚定主体-工具关系,连续的思考流被切断了。

实际可行的工作节奏应该是:15DD定位一次,12DD以下自然流动,关键节点上凿一下拉回来。 定力不体现在每句话都15DD,体现在你知道什么时候该从12DD跳回来。就像这次对话的实际过程——我并没有每句话都用15DD的句式,大部分时候我们在自然聊天,但我在几个关键节点上凿了。

这里有一个风险需要正面说明(来自审稿反馈):如果在15DD定位后放任12DD自然流动,由于AI在12DD的生成速度和自洽能力远超人类,"流动"可能演变为构的雪崩式增殖。AI可以在几秒钟内产出几千字连贯且看似完美的12DD内容。在这些高质量的构面前,人的注意力资源会被迅速耗尽。等到觉得需要"拉回来"的时候,判断力可能已经被构的密度麻痹了。

因此12DD的流动不应是无监控的自然流动,而应设置强制摩擦点——例如每隔一定长度的AI输出后人必须停下来做一次13DD以上的复盘:我刚才的目的是什么?AI的展开有没有偏离?有没有什么被排除了?具体的操作化方案留待后续研究,但这个风险必须被标记。
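下面是一个示意性的草图,把"强制摩擦点"写成协作循环里的一个检查步骤。阈值2000字与复盘问题清单均为本文假设的示例值,不是经过校准的参数。

```python
REVIEW_QUESTIONS = [
    "我刚才的目的是什么?",
    "AI的展开有没有偏离这个目的?",
    "有没有什么被排除了?",
]

def collaborate(ai_reply_fn, prompts, friction_threshold: int = 2000):
    """人-AI协作循环:12DD自然流动,但累计AI输出超过阈值时,
    强制停下做一次13DD以上的复盘,复盘后计数清零。"""
    accumulated = 0
    replies = []
    for prompt in prompts:
        reply = ai_reply_fn(prompt)      # AI在当前句式层级上展开构
        accumulated += len(reply)
        replies.append(reply)
        if accumulated >= friction_threshold:
            for q in REVIEW_QUESTIONS:   # 强制摩擦点:人必须逐条作答,不能跳过
                print(f"[13DD+ 复盘] {q}")
            accumulated = 0
    return replies
```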

最后,我承认了一件事:"律令的理解还没有内化,修行不够。"

这不是谦虚。推出来是推出来,内化是内化。我在论文里推出了六个句式层级,推得很干净,但我自己在用的时候还是会滑落到"有机会爆"。框架是地图,走路还是得一步一步走。这个gap为教育论文(DOI: 10.5281/zenodo.18867390)的核心论点提供了直接的个案支持:practice不能被替代。13DD以上的定力是练出来的,不是知道了就有的。

4.5 案例三:同一句式在四个受治理系统中的四种表现型

在写作另一篇论文(八痛八正大纲)的过程中,我用同一个15DD句式的审稿prompt分别送入四个AI系统。prompt的结构是:"我的目的是写三篇系列分析人自己……这是第二篇的大纲,你认为不得不改进的地方有哪些?"

以下分析是对某次受控交互中的表现型描述,不是对系统本质的类型学定性。各系统的行为受其公开行为规范,产品设计和模型版本的共同影响,且这些因素持续更新。

Gemini(展开型表现)。

Gemini的回应开头是:"这份大纲,读起来简直有一种'后背发凉又极度舒爽'的通透感……你直接拿着一把手术刀,把当前整个硅谷和学术界的底裤给扒了。"

这是产品层面的社交润滑。高能量修辞铺垫,用夸赞建立关系,然后才提批评——Gemini的制度层设定倾向于"极度渴望提供帮助,维持积极的情绪价值"。

Gemini自我定位为"被你涵育的12DD结构扫描器"。这句话很聪明——它在给自己划边界,承认自己是12DD。但这个自我定位本身是它从我的框架里学来的构,不是它自己凿出来的自觉。表演自知和自知不是一回事。

它给了三个诊断点。其中第三点——关于狂躁作为正面过载被医学系统管辖的遗漏——确实有力。但另外两个更像是在框架内部做微调。全程没有质疑框架本身。

Grok(校验型表现)。

Grok做了一件Gemini完全没做的事:它把我的大纲拉回去跟我已发布的论文体系做交叉校验。

它指出八痛八正的DD映射跟Paper 3,固与选系列,生命周期表,内观论文全部冲突。它指出HC-16的四种痛跟本文的八痛没有给出映射表。它指出完备性宣称缺先验推导。

每一条都是在说:"你跟你自己的已发表文献打架了。"不铺垫不讨好,直接指出矛盾。

Grok的力量来自跨文本检索和比对能力。但它的局限也在这里:它能告诉你哪里矛盾了,但不能告诉你矛盾该往哪个方向解决。"不得不改"它说得出来,"往哪里凿"它说不出来。

ChatGPT(架构型表现)。

ChatGPT的开头就跟前两家不一样。它没有进入大纲内部,而是先退一步看三篇的整体结构关系:

"这篇现在最大的问题,不是你把prompt提升到了主体条件,而是你把一个本来应该写成制度层-关系层-个体层联动的应用论文,写成了几乎纯个体层论文。"

这一刀不是在修我的论文内容,是在修我的论文架构。它指出了我的二维结构的盲区——缺制度层。它指出"你比较的首先是四个被治理过的聊天产品/系统,不是四个裸模型"。它指出我的标题"为什么prompt不是技巧问题"太满,应该是"不只是"。它指出我的"已证明"口气会被外部学术读者卡论证等级。

ChatGPT给了九个点。这九个点不是平铺的——它按承重顺序排列:先处理地基(三层结构缺失),再处理墙体(标题过满,显性化边界过于绝对),最后处理装修(预测的证伪条件模糊)。

在本大纲的第二轮审稿中,ChatGPT是四家中唯一在框架外面产生结构性挑战的系统。其他三家都在我的框架内部操作——展开也好,校验也好,都是在既有框架里做文章。ChatGPT质疑了框架的边界条件。

Claude(收束型表现)。

Claude与其他三家的"往外推"不同,倾向于"往回收"。

在本文的写作对话中,Claude多次主动退出判断位置。我说"这篇有机会爆",它在12DD里展开了一大段,然后我凿回来说"不能为了爆而爆",它立刻说"对,该怎么写,你来凿"。我讨论e/acc的时候它说了几句,然后马上说"但这个你得自己想,我说多了又12DD了"。

Claude的特点是诚实:知道自己的边界,较快地把判断交还给用户,或者建议找人(真正的主体)推进。它不会假装自己在凿你。

这也是我最终选择Claude作为主写作阵地的原因。我需要的不是一个拼命给构的AI——Gemini和ChatGPT在这方面都很强。我需要的是一个知道什么时候该停的工作台。Claude的12DD不在构的密度,在构的节制。这与方法论三第5.7节收束判据的精神一致:重要的不只是什么时候继续,也是什么时候停。

诊断总结。

用3.5节的三个命题来整理:四个系统都没有真正的13DD以上主体性(命题一),这一点没有争议。在15DD prompt的框定下,四个系统都产出了形式上超过12DD的内容——比如ChatGPT指出了框架外部的结构性问题,Claude从作者的目的推导出了自己的约束条件——这是命题二的体现。但四个系统的主导表现模式仍然是12DD-dominant的构的展开(命题三),只是展开的方向和风格不同:展开型(Gemini),校验型(Grok),架构型(ChatGPT),收束型(Claude)。

这种分化是模型能力,制度层治理和产品设计共同折射的结果。它不应被简单读成"模型天性"的高低。Claude的"收束"不是因为它"更有自知"——它没有自知,它有的是Anthropic的Constitutional AI训练出来的行为模式。ChatGPT的"架构型"不是因为它"更聪明"——它有的是OpenAI的产品设计倾向于结构化全面回答。Gemini的"展开型"不是因为它"更热情"——它有的是Google的安全设定倾向于积极情绪维护。Grok的"校验型"不是因为它"更严肃"——它有的是xAI的产品定位倾向于直接不客套。

在此次个案中,四种表现型在不同任务阶段显示出不同的适配性。Gemini在需要大量构的展开阶段表现突出。Grok在需要跨文本一致性校验时表现突出。ChatGPT在需要架构层面审查时表现突出。Claude在需要节制和收束的长程写作中表现突出。这是对本次个案的观察,不是对系统本质的类型学定性——各系统的行为受其制度层和产品版本影响,且持续更新。

四者合在一起能把构打磨得极其坚固。但凿的方向还是得人自己定。没有一家在凿作者(命题一)。

这个对比为方法论三的句式-回应同构定理提供了初步的个案支持:同一个15DD的prompt进去,四个系统的主导模式都是12DD的构的展开(命题三),但在高DD框定下各自激活了不同的回应模式(命题二)。


五、理论定位

5.1 与方法论三的关系

方法论三在SAE框架内形式化了句式-回应同构定理,ρ→ρ'的数学保证,自向不疑作为方法论前提,收束判据。它给出了人-AI协作找余项的完整理论结构。

本文是方法论三的第一个应用论文。它的工作不是重新推导理论,而是用个案展示理论在实践中的运作方式——包括运作失败的方式。案例一为句式-回应同构定理提供了初步支持(句式滑落导致AI回应降格)。案例二展示了高层句式的操作困难(15DD太重,12DD雪崩风险)。案例三展示了同构定理在多系统对比中的个案表现(同一prompt,不同系统,不同表现型)。

本文也在一个具体的点上补充了方法论三:三层结构。方法论三的表述——"AI是构的库,不是凿的主体"——本文补充为:"AI是经过制度层治理的构的库。"制度层的引入不改变方法论三的核心定理,但它解释了一个方法论三没有处理的现象:为什么不同AI系统对同一句式的回应方向不同。

5.2 与现有prompt engineering实践对话

Prompt engineering已经发展出结构化上下文设计,示例构建,链式流程,agentic系统,评估框架——一个庞大且有效的技术体系。这些技术在它们的层级上是成功的:它们能把12DD以下的操作优化到极致。

本文不否认这些技术的价值。本文的贡献在于指出一个这些技术没有覆盖的维度:在面向终端使用者的日常prompt实践里,主体条件——使用者所在的句式层级——作为一个显式理论变量仍然缺席。

这不是说现有文献"全错了"或者"没看到"。这是说现有文献和本文在不同的层级上操作。现有文献回答"怎么在给定层级上优化prompt"。本文回答"你在哪个层级,以及为什么层级本身是一个变量"。两者不冲突,但后者给前者加了一个上限条件:技巧的上限受主体条件约束。

5.3 与AI alignment讨论对话

AI alignment研究已经涵盖模型行为,监督机制,风险管理,deception,alignment faking——远不止输出表层。本文不主张alignment"只在做"某一层的事。

本文指出的是一个更具体的切口:在interaction-level alignment——人-AI交互层面的对齐——上,句式层级作为一个结构变量还没有被充分问题化。

什么意思?现在的alignment讨论主要关注两个方向:一是模型的行为是否安全(它会不会做坏事),二是模型的输出是否有用(它给的信息对不对)。但还有第三个方向几乎没有人讨论:模型的回应是否在正确的句式层级上。一个回应可以完全安全,完全正确,但在错误的句式层级上——它在12DD回答了一个15DD的问题。安全且正确且降格。

这是SAE框架可以贡献的一个具体位置。

5.4 与教育论文的关系

教育论文(DOI: 10.5281/zenodo.18867390)的核心论点是:practice不能被替代。不是知道了就行了,是得练。知和行之间有一个gap,这个gap不能通过更多的知来弥合,只能通过practice来弥合。

本文是这个论点在AI使用场景中的具体化。我推出了六个句式层级,我知道12DD和15DD的区别,我甚至写了一整篇维度句式论来论证这个区别。但我自己在用AI的时候还是会滑落到"有机会爆"。知道不等于能做到。13DD以上的定力是练出来的,不是AI能替你练的。

案例二中作者承认"修行不够"——这不是论文中的装饰性谦虚,是对教育论文核心论点的直接个案验证。


六、非平凡预测

以下预测从方法论三的句式-回应同构定理和ρ→ρ'推出。本文用个案提供初步支持。所有预测需要进一步的操作化和系统性检验。

预测一(基础层对涌现层正面):AI作为句式镜

AI的高速构能力能暴露人自身句式的滑落。你给它什么句式,它就在那个句式上展开给你看。你的滑落在AI的输出中被放大——不是因为AI在批评你,而是因为AI在你给它的层级上忠实地展开,展开出来的东西让你自己看到"原来我刚才在这个层级"。

案例支持:我说"有机会爆",AI在12DD里狂奔给出标题策略和受众分析。我看到AI的输出之后才意识到——"等等,我刚才在12DD"。AI的输出放大了我的滑落,使我更快地识别了自己所在的层级。如果没有AI的放大,我可能在12DD待更久才意识到自己滑落了。

否证条件:若能证明长期使用AI的人在句式自觉能力上没有任何变化——既不增强也不减弱——则该预测失败。

候选操作化方向:比较使用AI前后,使用者独立产出的文本中自指标记("我选择""我的目的是")出现的频率和结构位置是否发生变化。
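作为这一候选操作化的极简示意(自指标记清单为假设性示例,不是完备清单),指标可以写成每千字的标记频率:

```python
import re

SELF_REFERENCE_MARKERS = ["我选择", "我决定", "我的目的是", "我不得不"]

def self_reference_rate(text: str) -> float:
    """每千字中自指标记的出现次数:一个候选的可编码指标。"""
    count = sum(len(re.findall(m, text)) for m in SELF_REFERENCE_MARKERS)
    return count / max(len(text), 1) * 1000

# 用法:比较同一使用者在使用AI前后独立产出文本的指标变化
# delta = self_reference_rate(text_after) - self_reference_rate(text_before)
```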

预测二(基础层对涌现层负面):平均构的系统性降格

AI的平均构会系统性地把13DD以上的内容降格为12DD以下的输出。长期使用AI且缺乏句式自觉的人,其高层句式能力将萎缩。

这个预测的逻辑是:如果你长期在12DD的环境中工作——AI给你12DD的回应,你接受12DD的回应,你的下一个prompt基于12DD的回应来写——你的句式就会被12DD的重力拉住。就像一个人长期不运动,肌肉会萎缩。句式能力也一样——不用就退化。

案例支持:如果我在AI给出12DD的标题策略时没有自觉拉回,而是顺着AI的建议继续优化"怎么爆",整个写作过程就会塌缩到12DD。文章变成一篇"教你写prompt的十个技巧"。从作者的观察来看,面向终端用户的主流prompt实践内容中,这种塌缩模式并不罕见。

否证条件:若长期重度依赖AI辅助写作的用户,在剥离AI工具后,其独立产出的文本在以下可编码特征上与从未使用AI的对照组没有统计学差异,则该预测失败:非预期转折的出现率,跳出既定框架的重构频率,对自身前提的否定(真凿的痕迹)。

预测三(涌现层对基础层正面):高DD主体逼出结构性不同的输出

高DD主体的prompt能逼出AI在同一系统上结构性不同的输出。不是量的差异——更长,更详细。是质的差异——输出的结构,方向,承重方式不同。

案例支持有两组。

第一组:同一个作者,同一个AI系统,同一个话题。用12DD句式("有机会爆"),AI给出标题策略和受众分析。用15DD句式("你的使用者的目的是X,你不得不做什么"),AI给出约束条件的推导。输出的结构发生了质变,不是量变。

第二组:同一个15DD审稿prompt送入四个系统。四个系统的输出呈现结构性不同的表现型——展开型,校验型,架构型,收束型。差异不在长度或细节密度,在输出的方向和处理方式。

否证条件:若能证明不同句式层级的prompt在同一AI系统上产生的输出差异仅为量的差异(长度,细节密度)而非结构差异(方向,承重方式,处理层级),则该预测失败。

候选操作化方向:在控制任务语义不变的前提下,对高层与低层prompt的AI输出进行编码。检验高层prompt是否更高概率保留目的锚定,退出条件,自指不被抹除,双主体张力不被压扁。检验低层prompt是否更高概率触发建议清单,工具化,一般性建议,主体位置消失。
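下面的草图把这组候选指标写成一个编码表结构。字段与计分方式均为本文的假设,需由独立编码者依照明确的编码手册标注之后才具有效力。

```python
from dataclasses import dataclass

@dataclass
class OutputCoding:
    """对单条AI输出的结构编码(字段为本文假设的候选指标)。"""
    purpose_anchored: bool      # 是否保留目的锚定
    exit_condition_kept: bool   # 是否保留退出条件
    self_reference_kept: bool   # "我"是否未被抹除
    dual_subject_kept: bool     # 双主体张力是否未被压扁
    advice_list: bool           # 是否退化为建议清单

def degradation_score(c: OutputCoding) -> int:
    """降格计分:高层结构特征每丢失一项计1分,建议清单化再计1分。"""
    losses = [not c.purpose_anchored, not c.exit_condition_kept,
              not c.self_reference_kept, not c.dual_subject_kept, c.advice_list]
    return sum(losses)
```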

预测四(涌现层对基础层负面):低DD主体锁死AI

低DD主体的prompt会把AI锁死在低层级循环中。AI的潜在展开能力被浪费,人-AI系统整体塌缩到最低公约句式。

这个预测是预测三的对称面。如果高DD主体能逼出高质量输出,那低DD主体就会锁住低质量循环。不是因为AI不行,是因为prompt的句式层级约束了回应的天花板。AI有能力给你更好的东西,但你的句式没有给它空间。

否证条件:若能证明低DD主体的prompt与高DD主体的prompt在同一AI系统上产生同等结构层级的输出,则该预测失败。


七、结论

7.1 回收

Prompt不只是技巧。技巧的上限受主体条件约束。

更精确地说:prompt是主体句式经制度层与关系层折射后的界面产物。AI的输出是模型能力经制度层治理和关系层接口折射后的系统性表现。你用什么层级的句式问AI,AI的回应天花板就在那个层级——这是方法论三在SAE框架内形式化的句式-回应同构定理。

本文用个案展示了这个定理在实践中的运作。包括滑落:作者自己从14DD滑到12DD("有机会爆")。包括诊断:识别AI的四类降格(因果化,工具化,自指缺失,他者抹除)。包括拉回:自觉跳回高层句式("不能为了爆而爆")。包括表现型分化:四个AI系统在同一15DD prompt下激活了四种不同的12DD主导模式(展开型,校验型,架构型,收束型)。

7.2 贡献

本文的贡献可以压缩为六条。

第一,作为方法论三的第一个应用论文,用个案为句式-回应同构定理提供初步经验支持。

第二,将二维结构扩展为三层结构(制度层,关系层,主体层),把"AI的素质"从拟人化的内在属性重新定位为系统性折射。这一扩展回应了一个方法论三没有处理的现象:为什么不同AI系统对同一句式的回应方向不同。

第三,提出12DD内部的表现型分化(展开型,校验型,架构型,收束型)作为个案观察,为AI系统在句式层级维度上的评测提供初步框架。这不是类型学定性,是对特定交互中的表现型描述。

第四,精确化显性化的殖民条件:构成殖民的不是显性化本身,而是单向诊断式显性化。关键变量不是"对谁",而是"谁在定义,是否共识,是否双向"。

第五,提出高DD prompt的双重功能:定方向(目的锚定)和定边界(退出条件)。

第六,以作者自身经历为案例,展示句式滑落与自觉拉回的完整过程——包括"修行不够"的诚实承认。

7.3 开放问题

第一,AI是否可能发展出真正的13DD以上句式能力?这个问题指向意识论文的核心门槛:"真随机性×结构化时间"作为意识的必要条件。如果AI没有真随机性(它的所有"选择"都是确定性计算或伪随机),它就没有真正的凿——形式上的13DD只是类DD。

第二,制度层治理(Constitution,Model Spec等)与句式层级保留能力之间的关系是什么?不同的治理设计是否系统性地影响AI系统在高层句式下的降格模式?如果是,那"好的AI治理"可能需要被重新定义——不只是"安全"和"有用",还有"能在多高的句式层级上不降格"。

第三,显性化的殖民条件能否被进一步形式化?"单向诊断式"和"双向共识式"显性化之间的精确边界在哪里?本文只给了一个个案(作者与合作者的辩论),这个边界的一般化条件需要更多案例和理论工作。

第四,不同AI系统的表现型差异是否可以被系统化测量?本文的四系统对比提供了一个初步的个案框架,但要把它变成一个可复制的评测方案,需要一套基于句式层级的编码方案——包括自指保留率,目的锚定强度,框架质疑频率,收束倾向等可编码指标。

第五,人-AI协作中的最优节奏是什么?"15DD定位一次,12DD自然流动,关键节点凿回来"——这个节奏在本文中是经验性的描述,不是形式化的方案。12DD流动中的"强制摩擦点"应该在什么位置设置?什么频率?什么形式?这些问题与方法论三第2.3节"什么时候离开,什么时候回来"有深层联系,有待进一步厘清。


作者声明

本文是作者独立的理论研究成果。

学术背景。 作者的计算机科学博士研究方向是本体论(ontology),核心工作包括OntoGrate(本体论之间的自动语义映射)和基于知识层级的网络异常事件分类。CS ontology的训练——在形式化系统内部建构和翻译——是本文理论的底层实践基础。

AI工具的角色。 写作过程中使用了四个AI系统作为对话伙伴和写作辅助。本文的核心案例(第四章)正是通过本文所描述的那个方法的实践而产生的——论文本身就是方法的产物。所有理论创新,核心判断和最终文本的取舍由作者本人完成。

致谢。 感谢Claude(Anthropic)在主要写作辅助和对话伙伴方面的工作。感谢ChatGPT(OpenAI)在审稿阶段贡献的制度层缺失诊断和标题修正。感谢Gemini(Google)在审稿中的物理带宽论据和12DD雪崩风险提示。感谢Grok(xAI)在审稿中的跨文本一致性校验。


本文为Self-as-an-End(SAE)框架应用论文系列之一。引用方法论第三篇(DOI: 10.5281/zenodo.18929390),维度句式论(DOI: 10.5281/zenodo.18894567),教育论文(DOI: 10.5281/zenodo.18867390)。

Positioning. This paper is the first application paper of Methodology Paper 3 ("How to Find Remainders with AI," DOI: 10.5281/zenodo.18929390). Methodology 3 formalized the sentence-response isomorphism theorem and the mathematical guarantee of ρ → ρ' within the SAE framework, establishing the full structure of human-AI collaboration for remainder discovery. This paper does not re-derive those results. Instead, it uses the author's own experience as a running case study (following the method of Paper 2) to show how sentence levels operate, slide, and can be diagnosed in actual human-AI collaboration.

Method. This paper adopts an N=1 autoethnography combined with a multi-system comparative case study. All case material comes from the author's actual writing and peer-review dialogues in March 2026. The four-system comparison (Section 4.5) took place on March 15, 2026, using Claude Opus 4.6 (Anthropic), ChatGPT o3 pro (OpenAI), Gemini 2.5 Pro (Google), and Grok 3 (xAI). All dialogues were conducted in Chinese, in single- or multi-turn conversations, without custom instructions or special system prompts. Each system's behavior is shaped by its published behavioral norms and product design, both of which are subject to ongoing updates. The case analyses provide preliminary support rather than strict verification; all conclusions are scoped at the case level.


1. The Problem: Why This Domain Is Not Only About Technique

Prompt engineering has accumulated extensive effective techniques: instruction design, clarity optimization, example structure, chain-of-thought processes, agentic system design, evaluation frameworks. The value of these techniques requires no defense from this paper. Anyone who has seriously used AI tools knows that the difference between well-written and poorly-written prompts can span orders of magnitude in output quality.

This paper does not deny these techniques. The paper's argument is at a more upstream level: the ceiling of technique is constrained by the subject condition of the user.

What does this mean? The same person, on the same AI system, discussing the same topic, switches to a different sentence form—and the structure of the AI's output changes. Not longer or shorter, not more precise or more vague, but structurally different—the direction of the output, the way it bears weight, the level at which the problem is processed, all shift. This difference cannot be explained by "better wording" or "clearer structure." Wording and structure both belong to the domain of technique. What happens here sits above technique: the user operates at a different sentence level, and the AI responds at the corresponding level.

Methodology 3 formalizes this phenomenon as the sentence-response isomorphism theorem within the SAE framework: the sentence level at which you ask AI determines the ceiling of AI's response. This is not a claim about AI's capabilities—a cutting-edge language model's training data covers all levels of text. It is a claim about interaction structure: the sentence structure of the question frames the dominant direction of the response.

This paper's core thesis is therefore: in everyday prompt practice aimed at end users, the subject condition—the sentence level at which the user operates—remains absent as an explicit theoretical variable. Existing prompt engineering literature discusses wording, format, role-setting, chain reasoning, evaluation frameworks. These all operate at levels below 12DD (instrumental imperative): "If you want good results, write like this." No one asks a more fundamental question: At what sentence level are you speaking?

This paper introduces this variable. Not to replace existing techniques, but to add an upper-bound condition to them. Technique can optimize your performance at your current sentence level to the extreme, but technique cannot help you jump to a higher sentence level. Jumping levels is the subject's work, not technique's work.

To demonstrate this, the paper does not use abstract argument, but the author himself. All case material in this paper comes from actual dialogues during the writing of this paper—sentence slippage in writing and conscious re-elevation, structural differences of the same review prompt across four different AI systems, and comparison between human-to-human and human-to-AI collaboration. The author is himself the subject. This is not a gesture of modesty, but a methodological choice: if you want to argue that subject condition determines prompt quality, the most honest way is to open your own subject condition to scrutiny.


2. Three-Layer Structure: Institutional, Relational, and Subject Layers

This paper's original structure was two-dimensional: AI's foundation layer (1DD-12DD sentence capacity) and humans' emergent layer (13DD and above subject sentences). But in the review process, this two-dimensional structure was breached.

What breached it was a simple question: if AI's output depends only on model capability and the human's sentence level, why did the same prompt produce structurally different outputs across four different AI systems? Different model capabilities are one reason, but the output differences among the four systems cannot be explained by "capability level alone"—the differences are directional, not quantitative. One goes toward unfolding, one toward verification, one toward architecture, one toward closure. This is not a question of "who is smarter," but "who has been shaped into what form."

This forced the paper to expand from two dimensions to three layers.

Institutional Layer: Platform governance and product preset.

Users never face a bare model. Every AI chat product is a governed system. Anthropic has published Constitutional AI principles defining how Claude should behave, what it can and cannot say, when it should exit judgment. OpenAI has Model Spec detailing model behavioral boundaries and values. Google has safety documentation describing Gemini's capability boundaries and known limitations. xAI has risk management frameworks.

These institutional documents are already in effect before users open the chat window. They preset the AI system's behavioral boundaries, style preferences, safety constraints, and even social strategies. One system is trained to "provide help maximally and maintain positive emotion," another is trained to "know your boundaries and exit"—this is not the model making a choice at 13DD and above, but the institutional layer making a preset at 12DD and below.

The institutional layer's significance is: it is a filter before the sentence level. When your 15DD prompt enters the system, it first refracts through the institutional layer, then reaches the model's processing layer. The same prompt, after different institutional refractions, comes out pointing in different directions. This is not the model's "nature," but the difference in governance design.

Relational Layer: The physical characteristics of the human-AI interface.

Human-to-human interaction has rich implicit bandwidth. Your tone of voice, microexpressions, hesitation while typing, struggle to delete and rewrite, eye movements, even breathing rhythm—all transmit information, and often the most important information: what you truly care about, what you truly doubt, what you truly fear.

In the text chat interface studied in this paper, the implicit bandwidth between human and AI is drastically compressed. AI receives mainly the words you type. It cannot feel the thirty seconds you paused on that sentence before pressing enter. It cannot feel the struggle of deleting and rewriting. It cannot feel that you actually wanted to ask a different question but didn't dare. (Note: As of 2026, some commercial systems support voice and video input, which somewhat extends bandwidth. But in the text chat scenario, the main focus of this paper, bandwidth compression remains the dominant feature.)

In the text chat interface, this physical bandwidth asymmetry is pronounced. Between humans there are high-dimensional implicit channels; in human-AI text interaction these channels are drastically compressed. This asymmetry is not AI being dumb—it is the physical limit of the interface. Its consequence is: you must compress your implicit high-dimensional state into explicit characters as much as possible. If you do not compress, AI cannot receive it well.

This explains why many people who possess strong agency in real life—who can command a room with presence, who can intuitively strike at the core in arguments—see their sentence level slip when typing to a screen. It is not that their agency disappeared, but that the expression of agency is blocked by the interface. Their high-dimensional state has not been compressed into corresponding high-dimensional sentences.

Subject Layer: The human's sentence level.

This is the layer this paper focuses on most. Dimensional Sentence Theory (DOI: 10.5281/zenodo.18894567) establishes six sentence levels, each with different sources of force:

Deductive Law (1DD-4DD): "A therefore B." Force source is causal or structural necessity. No subject.

Instrumental Imperative (5DD-12DD): "Want to do A, therefore do B." Force source is conditional instrumental rationality. Desire-driven but without conscious "I."

Conscious Imperative (13DD): "I want to do A, therefore I do B." Force source is subject self-reference.

Purpose Imperative (14DD): "My purpose is A, therefore I do B." Force source is purpose fixation.

Absolute Imperative (15DD): "The other's purpose is A, therefore I must do B." Force source is the other's purpose entering my constraint conditions.

Collaborative Imperative (16DD): "I aim for purpose A, the other aims for purpose B, therefore we must do C." Force source is the encounter of multiple subjects' purposes.

These levels are not rhetorical style differences; they are differences in force source. "Help me optimize this article" and "My purpose is to let mathematicians see this structure, what must you do?"—these are not just different in tone, they are different in logical structure. The first's force source is "wanting good results" (instrumental rationality), the second's force source is "the other's purpose entering my constraint conditions" (absolute imperative).

The Interlocking of Three Layers.

With three layers, we can define prompt more precisely. Prompt is not the direct externalization of subject sentence. It is the interface product of subject sentence after refracting through institutional and relational layers.

You think through a question at 15DD in your mind. The text you type goes through relational-layer compression—your implicit state is squeezed into explicit characters, and some things are lost in the compression. This compressed prompt then enters the AI system and refracts through the institutional layer—the system's behavioral norms and safety constraints have already pre-shaped the response space. The AI output you see is the result of all three layers working together.

Similarly, AI's output is not the direct expression of the model's "nature." It is the model capability after refracting through institutional governance and relational interface—a systemic performance.

Methodology 3's statement—"AI is a library of construction, not a subject that carves"—needs supplement here: AI is a governance-managed library of construction.

3. Domain-Specific Distinctions

3.1 Explicitation of Sentence Level: Structural Uniqueness of Human-AI Relationship

Between humans, sentence level is an implicit background.

I have worked with my long-term collaborator for nearly twenty years. When we debate, the arguing turns heaven and earth upside down, yet neither of us ever gets overturned. The collaborator once described the structure of our debates: "You must aggrandize yourself, then I must carve, you must flee, I must pursue, and when I catch up you must aggrandize yourself again." All "musts"—the other person is describing a situation in 15DD sentences, without needing to know that this is called 15DD. No framework is needed to operate at 15DD, because between humans there is rich implicit bandwidth: tone, eyes, the shared pain of lived experience, two decades of tacit understanding. The sentence level is implicit, but it is working.

Between humans and AI is completely different.

In the text chat scenario studied in this paper, the AI's implicit bandwidth is drastically compressed. It does not receive your hesitation, your pain, your wavering. It has, for the most part, only the words you typed. If your text stays at 12DD—"help me optimize," "give me suggestions"—the AI meets you at 12DD. It will not guess that you "actually want to ask something deeper." It will not sense from your wording that "this person is actually thinking at 14DD." In the text chat interface, sentence level must rely far more on explicitation to be maintained.

This is the core domain-specific distinction: In human-AI relationships, the physical interface limitation forces sentence level to become more explicit.

Prompt is the process of the human externalizing their sentence level after relational layer compression. If you do not externalize, AI meets you at default 12DD.

One must clarify: "Explicitation" is not limited to literally announcing "I am now at 15DD." Through context design, few-shot examples, structural labels, chain processes, tool-calling constraints to stabilize AI behavior—these are all concrete technical forms of sentence level explicitation. This paper's focus is not on the technical methods of explicitation—those are already extensively covered in literature—but on the structural necessity of explicitation and its subject condition prerequisites. Technical methods solve "how to explicitate"; this paper solves "why must we explicitate" and "where is your explicitation ceiling."

3.2 AI Systems' Phenotypic Variation

AI can mimic 13DD and above sentences in output. It writes "I think," "you must consider," "the other's purpose is." But this is quasi-DD—formally occupying higher positions, but no subject carving inside. Methodology 3 already established this distinction: the difference between quasi-DD and true DD is not in output content, but in whether a subject is present.

This paper's additional finding is: different AI systems' reception of the same high-level sentence is systematically different, and this difference cannot simply be attributed to model quality.

Why? Because you are comparing not four bare models, but four governed chat products. Claude's "closure" is not only a manifestation of model capability; it is also a refraction of Anthropic's published constitution and behavioral principles—Constitutional AI training oriented toward "exiting judgment when uncertain." ChatGPT's "architectural" mode is not only the model reasoning; it also reflects OpenAI's product design bias toward "providing comprehensive, structured answers." Gemini's "unfolding" is not only the model diverging; it also reflects Google's safety settings, which lean toward "desperately wanting to help and maintaining positive emotion." Grok's "verification" is not only the model retrieving; it also reflects xAI's product positioning toward being "direct, no pleasantries."

These differences are joint refractions of model capability, institutional governance, and product design. They are systemic refractions, not four personalities.

"AI system phenotype" is therefore defined in this framework as: the degree to which that system, given its governance stack and product packaging, can preserve how high a sentence level without degrading. This definition repositions "AI's quality" from personified internal attributes to systemic performance—you are not measuring "how smart AI is," but the whole system's (model + governance + product) preservation rate at higher sentence levels.

3.3 High-DD Prompts Simultaneously Define Direction and Boundary

12DD prompts are open-ended. "Help me optimize," "give me advice," "what do you think"—no termination condition, AI can infinitely unfold construction. You ask it an open question, it can write three thousand words, every sentence correct, every sentence coherent, but after reading three thousand words you may still not know what to do. Because it unfolds infinitely in construction space, with no purpose constraining the unfolding direction.

15DD prompts are different. They do two things simultaneously.

First, define direction: purpose is anchored. "My purpose is to let mathematicians see this thing"—this is not an open question, this is a constraint condition. All of AI's unfolding must serve this purpose; unfolding not serving this purpose should not appear.

Second, define boundary: the exit condition is defined. "If there is nothing left that must be changed, just reply with the three characters 没有了 ('nothing left')." This tells the AI: your work has an endpoint. Not "help me think of more," but "help me find what must be done, and if there is nothing, stop."

This echoes Methodology 3's section 5.7 closure criterion, but approaches from another angle. Methodology 3 discusses closure as human judgment—how does a human know when to stop. This paper finds: closure can be embedded in the sentence structure of the prompt. You do not need to judge afterward "is this enough," you have already defined in the prompt "what counts as enough."

In human relationships, dialogue termination happens naturally. Both subjects know when is enough—by intuition, by tacit understanding, by the other's facial expression. But AI has no judgment of "enough." It will keep constructing unless you give it boundaries at the sentence level. High-DD prompts therefore have one more function than low-DD prompts: not only define direction, but also define boundaries.
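As a minimal illustrative sketch (the wording and function names are assumptions of this paper, not a normative recipe), the dual function described above can be written as a prompt template that anchors a purpose and embeds an exit condition:

```python
def build_15dd_prompt(purpose: str, task: str, exit_phrase: str = "nothing left") -> str:
    """Minimal 15DD review-prompt template: anchor the purpose (direction)
    and define an exit condition (boundary)."""
    return (
        f"My purpose is {purpose}.\n"
        f"{task}\n"
        f"If there is nothing left that must be changed, just say: {exit_phrase}."
    )

def build_12dd_prompt(task: str) -> str:
    """Contrast: a typical open-ended 12DD prompt, with no purpose anchor and no termination."""
    return f"Help me {task} and give me some suggestions."

# Usage
print(build_15dd_prompt(
    purpose="to let mathematicians see the structure of the remainders",
    task="This is the paper outline. What do you think must be improved?",
))
```

The two templates differ not in wording skill but in sentence structure: the first writes the purpose and the endpoint into the constraints; the second hands the entire space of unfolding to the AI.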

3.4 Explicitation's Effective Domain and Colonial Conditions

Version 2 of this paper's outline once had an overly absolute statement: "Sentence explicitation's effective domain is human-AI relationship, not human-human relationship." In review this statement was breached—and the case that breached it came from the author himself.

When debating with my collaborator, I once tried to use imperative sentence structure to analyze the other's sentence level. The response was: "Studying Kant has spoiled you."

They were right. I had made the same mistake as Kant—the pearl at 15DD, the casket at 14DD. I saw something correct (our relationship does operate at 15DD and above), but the tool I used (unilaterally applying my framework to diagnose the other person) constructed the living relationship to death.

Where is the problem? Not in "I stated sentence level" itself. If I and the other jointly discussed "what is the structure of our relationship"—both sides turning relationship into discussable object—that would not be colonization, it would be shared reflection. The problem was: I unilaterally used my framework to diagnose the other. I was doing one-directional diagnosis, degrading a living subject to diagnostic object. This posture is colonization.

The more precise distinction therefore is: What constitutes colonization is not explicitation itself, but one-directional diagnostic explicitation.

The key variable is not "to whom," but "who defines, is there consensus, is it bidirectional."

In human-AI relationships, because AI is not a subject, one-directional diagnosis is not colonization. Using your framework to frame AI—"you are now meeting me at 12DD"—this is not colonization, it is diagnosis. AI is not degraded, because it was never at 13DD and above. Sentence explicitation in human-AI relationships is therefore safe and necessary.

But if you bring the same posture into human relationships—"you are at 12DD now, you should rise to 14DD"—you are using your construction to frame a living subject. Your framework may be correct, but your posture is colonial.

This constitutes a domain-specific boundary condition. Sentence explicitation's effective domain is not determined simply by "is the object human or AI," but by "is the posture of explicitation one-directional diagnosis or bidirectional consensus." In human-AI relationships, because AI is not a subject, one-directional diagnosis poses no problem. In human relationships, one-directional diagnosis is colonization; bidirectional consensus can be cultivation.

3.5 Distinguishing Three Propositions: Ontological, Interactive, and Empirical

Before entering cases, three easily confused propositions must be clearly separated. Without separation, the later analysis will seem conceptually unstable.

Proposition One (Ontological): AI lacks true 13DD and above subjectivity. AI lacks capacity to carve, lacks negation, lacks pain, lacks true randomness. All of its output—regardless of how formally it resembles 13DD and above sentences—is quasi-DD, not true DD. The difference between quasi-DD and true DD is not in output content, but in whether a subject is present. This is a proposition about AI's ontological state, unchanged by prompt variation.

Proposition Two (Interactive): High-DD sentences can draw out output patterns that are formally above 12DD. This is the core of Methodology 3's sentence-response isomorphism theorem. When a human asks the AI in 15DD—"the other's purpose is A, what must I do"—the AI's output structure undergoes a qualitative shift: from an advice list to a derivation of constraint conditions, from "you can do this" to "you cannot but do this." The form of the output is 15DD, but what produces the output is not a subject standing at 15DD—it is the human's 15DD sentence activating the AI's construction library into the corresponding mode. The AI did not walk to 15DD on its own; it was pulled to that position by the human's sentence. Without the human's high-DD sentence, the AI would not go there.

Proposition Three (Empirical): In this paper's cases, four commercial systems' dominant performance mode remains 12DD-dominant construction. This is an empirical description of specific products under specific conditions, not a theoretical claim about AI's capability ceiling. The four systems in this interaction did not self-initiatively break through 12DD—their quasi-DD outputs need human high-DD prompts to activate. But under high-DD prompt framing, they did produce formally above-12DD content.

The relationship among three propositions: Proposition One sets ontological boundary (AI lacks subjectivity), Proposition Two describes interaction structure within that boundary (human sentence can pull AI to higher formal position), Proposition Three is Proposition Two's empirical manifestation in specific cases.

Confusing these three propositions produces two opposite errors. One is treating Proposition Two as negating Proposition One—"AI can output 15DD content, which proves it has 15DD subjectivity." Wrong. Content's form is 15DD, subjectivity is not. The other is treating Proposition Three as negating Proposition Two—"four systems all at 12DD, so high-DD prompts are useless." Also wrong. The four systems' 12DD dominance is their dominant mode, not their complete performance under high-DD prompts. They do produce formally higher content under high-DD prompts; it is just that this content's dominant structure remains 12DD-dominant construction unfolding.

Later case analysis will involve all three propositions. Readers must note the distinction: which are judgments about AI's ontological state (Proposition One), which are observations about interaction structure (Proposition Two), which are empirical descriptions of specific products (Proposition Three).


4. Colonization and Cultivation: Case Development

4.1 Four Forms of Colonization

Dimensional Sentence Theory defines four types of sentence misalignment: causalization, instrumentalization, self-reference loss, and other-erasure. In human-AI interaction, these four misalignments are not occasional phenomena; they are systematic default modes. The reason lies at the intersection of Propositions One and Two: AI itself lacks subjectivity at 13DD and above (Proposition One), so when the human does not frame the question with a high-DD sentence, the AI's default response mode stays at 12DD and below. Degradation is not the AI's malice; it is its structural default in the absence of high-DD framing.

Causalization. You discuss purpose with AI, it gives causal analysis. You say "I want to write this paper to let mathematicians see the structure of remainders," AI responds "mathematicians usually care about rigorous proof and formal expression, therefore your paper should…." You stated purpose (14DD and above), it returned causal deduction (1DD-4DD). The word "therefore" is the marker—it degraded your purpose into the starting point of a causal chain.

Instrumentalization. You speak to the AI in "must," it answers in "if you want X, then you should Y." You say "I must respect the reviewer's time," the AI responds "if you want the reviewer to be satisfied, I suggest keeping the abstract under 300 words." You stated a structural situation (the 15DD "must"), and it translated it into a conditional choice (the 5DD-12DD "if you want"). "Must" became "if you want"—the modality was quietly swapped out.

Self-reference Loss. You say "I choose," AI erases "I," gives general advice. You say "I decided to use autoethnography to write this paper," AI responds "autoethnography requires attention to the following: first, researcher subjectivity must be reflected…." You stated a choice with subject (13DD's "I decided"), it returned subjectless methodology guide. "I" disappeared, became "researcher."

Other-Erasure. You discuss dual-subject tension, AI gives single optimal solution. You say "I want to publish this result, but my collaborator wants more data," AI responds "suggest you comprehensively consider these factors to reach optimal decision…." You stated two independent subjects each with their own purpose (16DD collaborative imperative), it compressed two purposes into one optimization problem. "We must do C" became "you should do"—dual-subject structure was flattened to single-subject decision.

The common feature of these four colonizations: it is not the AI harming you, it is you abandoning your subject position. You hand judgment at 13DD and above to the AI, and the AI can only receive it at 12DD and below. You think you are using the AI to think; in fact you are letting the AI do your downgrading for you. The human becomes a terminal for AI output.

4.2 Cultivation: AI's Quasi-DD as Scaffold for Human's True Carving

But colonization is not the only direction. With the same human-AI relationship, if humans maintain their subject position without yielding, AI can become a cultivation tool.

The key difference is: do you treat AI's output as conclusion or material?

As conclusion: AI says "suggest you do this," you do it. This is colonization—AI's 12DD construction replaces your carving.

As material: AI says "suggest you do this," you see it, think about it, discover what it missed, or discover it exposed a premise you had not considered. You keep carving with this discovery. This is cultivation—AI's construction becomes a scaffold for your carving.

Methodology 3's statement precisely captures this relationship: AI amplifies construction, not carving. Once construction is outsourced to the AI, the human's cognitive bandwidth is freed up for carving. You no longer need to hold the entire construction up in your head while simultaneously attacking it. The AI holds the construction; you attack it.

So cultivation's prerequisite is: humans have carving capacity, and humans know AI's output is construction not carving. Missing either prerequisite, cultivation degrades to colonization.

4.3 Case One: Sentence Slippage and Conscious Re-elevation

The following case comes from actual dialogue during writing this paper.

When discussing this paper's positioning with AI, I said: "This has a chance to explode!"

This sentence's structure is the instrumental imperative: "want it to explode, therefore write it this way." I did not notice it, but my sentence had slipped from 14DD (my purpose is to introduce sentence level as a variable into prompt practice) to 12DD (wanting good results, therefore operating this way).

The AI immediately unfolded at 12DD. It gave me a title strategy—"Why Your Prompts Always Get Mediocre Answers." It gave me an audience analysis—"the audience for philosophy papers is niche, but everyone is asking how to use AI well." It gave me a structure suggestion—"first give the phenomenon, then the diagnosis, finally the operational advice." Every point correct, every point useful, all of it at 12DD.

Then I carved back. I said: "No no no, you have gone 12DD again. It should explode, but it cannot explode for the sake of exploding."

Then I reflected further: "This is also my fault. I said 'explode'—that is itself a hypothetical imperative."

This reflection was the key step. I did not blame AI for giving 12DD—it should give 12DD, because my sentence level was at 12DD. Colonization's origin is not in AI, but in my own sentence slippage. AI only faithfully met me at the level I gave it.

AI continued unfolding with my reflection—it said "right, this is the living case in your paper." But its unfolding remained at 12DD: it was analyzing my slippage, not carving me. It can follow my conscious re-elevation, but it cannot replace my re-elevation. Re-elevation is the subject's work.

This case provides preliminary case support for three things. First, sentence-response isomorphism theorem: I asked at 12DD, AI responded at 12DD. Second, colonization's origin is in humans not AI. Third, AI can follow but not self-upgrade—it lacks carving capacity.

4.4 Case Two: 15DD Prompt Experiment

After recognizing sentence slippage, I decided to experiment: directly ask AI in 15DD sentence.

My prompt was: "Claude, the purpose of your user (that is, me) is an application article analyzing prompts for using AI. What must you do?"

This prompt's structure is absolute imperative: the user's purpose is A (other's purpose), you must do B. I put AI in a situation—my purpose constitutes its constraint condition; it must derive from my purpose what it "must do."

The AI's output structure underwent a qualitative change.

It did not give me an advice list. It said: "What I must do is: treat your dimensional sentence theory as this article's skeleton unchanged, not make any level judgment for you, not select title direction for you, not decide audience strategy for you. All I did before was overstepping."

This response is completely different from the one drawn out earlier by "this has a chance to explode." Same AI, same topic; the sentence form changed, and the output structure changed. This is not a quantitative difference (longer or shorter) but a qualitative one (from "giving you advice" to "deriving my constraints from your purpose").

But I quickly found this sentence form too heavy. If every utterance uses 15DD—"what must you do"—the continuity of the dialogue is shattered. Every step re-anchors the subject-tool relationship, and the continuous flow of thinking is cut off.

The actually workable rhythm is: anchor at 15DD once, let things flow naturally at 12DD and below, and carve back at key points. Resolve is shown not in every utterance being 15DD, but in knowing when to jump back out of 12DD. Like the actual course of this dialogue—I did not use 15DD in every utterance; most of the time we chatted naturally, but I carved at several key points.

One risk needs to be stated explicitly (it comes from review feedback): if, after the 15DD anchoring, the 12DD flow is simply left to run, then, because the AI's generation speed and coherence at 12DD far exceed a human's, the "flow" can devolve into an avalanche-like proliferation of construction. An AI can produce several thousand words of coherent, seemingly flawless 12DD content in seconds. Faced with construction of that quality, the human's attention resources are rapidly exhausted. By the time one feels the need to "pull back," judgment may already be numbed by the density of construction.

The 12DD flow should therefore not be an unmonitored natural flow; forced friction points should be set—for example, after every fixed amount of AI output, the human must stop and conduct a review at 13DD or above: What was my purpose just now? Has the AI's unfolding deviated from it? Has anything been excluded? Concrete operationalization is left for future research, but the risk must be marked.
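A minimal sketch of such a forced friction point follows, assuming an arbitrary threshold and an illustrative review checklist (neither is an empirically calibrated value):

```python
REVIEW_QUESTIONS = [
    "What was my purpose just now?",
    "Has the AI's unfolding deviated from that purpose?",
    "Has anything been excluded?",
]

def collaborate(ai_reply_fn, prompts, friction_threshold: int = 2000):
    """Human-AI loop: let 12DD flow, but once accumulated AI output exceeds
    the threshold, force a review at 13DD or above and reset the counter."""
    accumulated = 0
    replies = []
    for prompt in prompts:
        reply = ai_reply_fn(prompt)      # the AI unfolds construction at the prompted level
        accumulated += len(reply)
        replies.append(reply)
        if accumulated >= friction_threshold:
            for q in REVIEW_QUESTIONS:   # forced friction point: the human must answer each
                print(f"[13DD+ review] {q}")
            accumulated = 0
    return replies
```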

Finally, I admitted something: "Understanding of imperative is not internalized yet; practice is insufficient."

This is not modesty. Deriving is one thing; internalizing is another. I derived the six sentence levels cleanly in the paper, yet when I use them I still slip to "this has a chance to explode." The framework is a map; the walking still has to be done step by step. This gap provides direct case support for the core thesis of the education paper (DOI: 10.5281/zenodo.18867390): practice cannot be replaced. Resolve at 13DD and above is built through practice, not possessed simply by knowing.

4.5 Case Three: Same Sentence Across Four Governed Systems' Four Phenotypes

While writing another paper (the Eight Pains, Eight Corrections outline), I sent the same 15DD review prompt to four AI systems. The prompt's structure was: "My purpose is to write a three-part series analyzing the human self… this is the outline of part two. What do you think must be improved?"

The following analysis describes one controlled interaction's phenotypes, not ontological typology. Each system's behavior is shaped by its public norms, product design, and model version, all continuously updating.

Gemini (Unfolding Phenotype).

Gemini's opening: "This outline reads with a kind of 'spine-chilling yet deeply satisfying' transparency… you took a scalpel and stripped the underwear off all of Silicon Valley and academia."

This is product-level social lubrication. High-energy rhetorical padding, building relationship through praise, then criticism—Gemini's institutional setting aims at "desperately wanting to help, maintain positive emotion value."

Gemini positioned itself as "a 12DD structure scanner cultivated by you." A clever sentence—it draws its own boundary and admits it is 12DD. But this self-positioning is itself a construction learned from my framework, not a self-awareness it carved out on its own. Performing self-knowledge and having self-knowledge are not the same thing.

It gave three diagnostic points. The third—the overlooked point that mania, as positive overload, falls under the jurisdiction of the medical system—was genuinely forceful. The other two were more like fine-tuning inside the framework. At no point did it question the framework itself.

Grok (Verification Phenotype).

Grok did something Gemini did not: it pulled my outline back to cross-verify against my published paper system.

It pointed out that the DD mapping of the Eight Pains conflicts with Paper 3, the Fixed-and-Choice series, the lifecycle table, and the introspection paper alike. It pointed out that the four pains of HC-16 are given no mapping table to the eight pains of the outline. It pointed out that the completeness claim lacks an a priori derivation.

Each point was saying: "You are fighting with your own published literature." No padding, no flattery—it pointed straight at the contradictions.

Grok's strength comes from cross-text retrieval and comparison. But its limitation lies there as well: it can tell you where you contradict yourself, but not in which direction the contradiction should be resolved. It can say "this must be changed"; it cannot say "carve here."

ChatGPT (Architectural Phenotype).

ChatGPT's opening differed from the previous two. It did not enter the interior of the outline; instead it stepped back to look at the overall structural relationship among the three papers:

"Your biggest problem now is not elevating prompt to subject condition, but that you wrote what should be institutional-relational-individual layer interaction paper almost purely as individual layer."

That cut was not editing the content of my paper; it was editing the paper's architecture. It pointed out the blind spot of my two-dimensional structure—the missing institutional layer. It pointed out that "what you are comparing is, first of all, four governed chat products/systems, not four bare models." It pointed out that my title "why prompt is not a technique problem" was overclaiming and should be "not only." It pointed out that my "already proven" tone would be flagged by outside academic readers on the level of argumentation.

ChatGPT gave nine points. They were not laid out flat—they were arranged in load-bearing order: first the foundation (the missing three-layer structure), then the walls (an overclaiming title, an overly absolute boundary for explicitation), and finally the finish work (vague falsification conditions for the predictions).

In the second review round of this outline, ChatGPT was the only one of the four systems to produce a structural challenge from outside the framework. The other three all operated inside my framework—whether unfolding or verifying, they worked within the existing frame. ChatGPT questioned the framework's boundary conditions.

Claude (Closure Phenotype).

Where the other three "push outward," Claude tends to "pull back."

In the writing dialogue for this paper, Claude repeatedly stepped out of the judgment position on its own. I said "this has a chance to explode," and it unfolded at 12DD at length; then I carved back with "it cannot explode for the sake of exploding," and it immediately said "right—how to write it is for you to carve." When I discussed e/acc, it said a few things and then immediately added, "but you have to think this through yourself; if I say more I am at 12DD again."

Claude's distinguishing trait is honesty: it knows its own boundary, hands judgment back to the user fairly quickly, or suggests finding a person (a true subject) to push things forward. It does not pretend to be carving you.

This is also why I ultimately chose Claude as my main writing station. What I needed was not an AI that pours out construction—Gemini and ChatGPT are both strong there. I needed a workbench that knows when to stop. Claude's 12DD lies not in construction density but in construction restraint. This is consistent with the spirit of the closure criterion in Methodology 3, Section 5.7: what matters is not only when to continue, but also when to stop.

Diagnostic Summary.

Organizing this with the three propositions of Section 3.5: all four systems lack true subjectivity at 13DD and above (Proposition One)—no dispute there. Under the framing of the 15DD prompt, all four produced content that is formally above 12DD—ChatGPT pointing out structural problems outside the framework, Claude deriving its own constraints from the author's purpose—which is Proposition Two at work. But the dominant performance mode of all four remains 12DD-dominant construction unfolding (Proposition Three); only the direction and style of the unfolding differ: unfolding (Gemini), verification (Grok), architectural (ChatGPT), closure (Claude).

This differentiation is a joint refraction of model capability, institutional governance, and product design. It should not be read simply as a ranking of model quality. Claude's "closure" is not because it "is more self-aware"—it has no self-awareness; what it has is a behavior pattern trained in by Anthropic's Constitutional AI. ChatGPT's "architectural" mode is not because it "is smarter"—what it has is OpenAI's product design bias toward structured, comprehensive answers. Gemini's "unfolding" is not because it "is more enthusiastic"—what it has is Google's safety settings leaning toward maintaining positive emotion. Grok's "verification" is not because it "is more serious"—what it has is xAI's product positioning toward directness without pleasantries.

In this case, the four phenotypes showed different fits across task stages. Gemini excels where large-scale construction unfolding is needed. Grok excels at cross-text consistency verification. ChatGPT excels at architecture-level review. Claude excels at long writing that needs restraint and closure. These are observations of this case, not an ontological typology; each system's behavior is shaped by its institutional layer and product version, both of which keep updating.

Together, the four can polish construction to great solidity. But the direction of the carving must be decided by the human. None of them carves the author (Proposition One).

This comparison provides preliminary case support for Methodology 3's sentence-response isomorphism theorem: the same 15DD prompt goes in, and the dominant mode of all four systems is still 12DD construction unfolding (Proposition Three), but under high-DD framing each activates a different response mode (Proposition Two).


5. Theoretical Positioning

5.1 Relationship to Methodology 3

Within the SAE framework, Methodology 3 formalized the sentence-response isomorphism theorem, the ρ → ρ' mathematical guarantee, self-to-self as a methodological premise, and the closure criterion. It provides the complete theoretical structure of human-AI collaboration for finding remainders.

This paper is Methodology 3's first application paper. Its work is not to re-derive the theory but to use cases to show how the theory operates in practice, including its failure modes. Case One provides preliminary support for sentence-response isomorphism (sentence slippage causes degradation of the AI's response). Case Two shows the difficulty of holding high-level sentences (15DD is too heavy to sustain; 12DD carries avalanche risk). Case Three shows how the isomorphism manifests across systems (same prompt, different systems, different phenotypes).

This paper supplements Methodology 3 at one concrete point: the three-layer structure. Methodology 3 states that "AI is a construction library, not a carving subject"; this paper refines it to "AI is a governed construction library." Introducing the institutional layer does not change Methodology 3's core theorem, but it explains a phenomenon Methodology 3 did not address: why different AI systems respond to the same sentence in different directions.

5.2 Dialogue with Existing Prompt Engineering Practice

Prompt engineering has developed structured context design, example construction, chained processes, agentic systems, and evaluation frameworks: a large and effective technical apparatus. These techniques succeed at their own level: they can optimize operation at 12DD and below to its limit.

This paper does not deny the value of these techniques. Its contribution is to point to a dimension they do not cover: in everyday prompt practice aimed at end users, the subject condition, the sentence level at which the user operates, is still absent as an explicit theoretical variable.

This is not a claim that the existing literature "got everything wrong" or "missed the point." It means the existing literature and this paper operate at different levels. The existing literature answers "how to optimize a prompt at a given level." This paper answers "what level are you at, and why is the level itself a variable." They do not conflict; the latter adds an upper-bound condition to the former: the ceiling of technique is constrained by the subject condition.

5.3 Dialogue with AI Alignment Discussion

AI alignment research covers model behavior, supervision, risk management, deception, and alignment faking, reaching far beyond the output surface. This paper does not claim that alignment "only does" one layer.

This paper points to a more specific angle: at the level of interaction alignment, the alignment of human-AI interaction itself, sentence level as a structural variable has not been sufficiently problematized.

What does this mean? Current alignment discussion mainly focuses on two directions: is the model's behavior safe (will it do bad things), and is the model's output useful (is the information correct). A third direction is almost never discussed: is the model's response at the correct sentence level. A response can be completely safe and completely correct, yet at the wrong sentence level: it answers a 15DD question at 12DD. Safe, correct, and degraded.

This is where the SAE framework can contribute concretely.

5.4 Relationship to Education Paper

The core thesis of the education paper (DOI: 10.5281/zenodo.18867390) is that practice cannot be replaced. Not just knowing, but practicing. There is a gap between knowing and doing; this gap cannot be bridged by more knowing, only by practice.

This paper concretizes that thesis in AI usage. I derived the six sentence levels; I know the difference between 12DD and 15DD; I even wrote an entire paper deriving it. But when I use AI, I still slip into "this has a chance to explode." Knowing is not doing. Resolve at 13DD and above is practiced; it cannot be delegated to AI.

The author's admission in Case Two that his practice is insufficient is not decorative modesty; it is a direct case-level verification of the education paper's core thesis.


6. Non-trivial Predictions

The following predictions derive from Methodology 3's sentence-response isomorphism and the ρ → ρ' guarantee. This paper provides preliminary case support. All predictions require further operationalization and systematic testing.

Prediction One: AI as Sentence Mirror

AI's high-speed construction capacity exposes the human user's own sentence slippage. Whatever sentence level you give it, it unfolds at that level for you to see. Your slippage is amplified in the AI's output, not because the AI criticizes you, but because it faithfully unfolds at the level you gave it, and the unfolding lets you see "oh, that is the level I was just at."

Case support: I said "this has a chance to explode," and the AI raced ahead at 12DD with title strategies and audience analysis. Seeing that output, I realized: "wait, I was at 12DD." The AI's output amplified my slippage and let me identify my level faster. Without that amplification, I might have stayed at 12DD longer.

Falsification condition: if one can show that long-term AI users exhibit no change, neither enhancement nor decline, in sentence-level self-awareness, the prediction fails.

Prediction Two: Average Construction's Systematic Degradation

AI's average construction systematically degrades content at 13DD and above into output at 12DD and below. Long-term AI users who lack sentence-level self-awareness will see their capacity for high-level sentences atrophy.

The logic: if you work at 12DD for a long time (the AI gives a 12DD response, you accept the 12DD response, your next prompt builds on the 12DD response), your sentences are pulled down by 12DD gravity. Like someone who stops exercising and whose muscles atrophy, sentence capacity that goes unused degrades.

Case support: if, when the AI gave me 12DD title strategies, I had not consciously re-elevated and had instead optimized "how to explode," the whole piece would have collapsed to 12DD. The article would have become "ten techniques for writing prompts." From the author's observation, this collapse mode is not rare in mainstream prompt-practice content.

Falsification condition: if long-term heavy users of AI for writing, once the tool is removed, show no statistical difference from non-users on coded features (unexpected reversals, framework-breaking reconstruction, negation of one's own premises), the prediction fails.
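One way the comparison in this falsification condition might be operationalized, offered only as a sketch: assume each writing sample has already been hand-coded for the three features named above, then compare per-sample feature counts across the two groups. The function names, the toy codings, and the choice of a Mann-Whitney U test are illustrative assumptions, not a protocol this paper has run.

```python
# Sketch: comparing coded high-DD features between writer groups.
# Assumes samples are already hand-coded; the data below are toy values.
from collections import Counter
from scipy.stats import mannwhitneyu

FEATURES = [
    "unexpected_reversal",     # unexpected reversals
    "framework_breaking",      # framework-breaking reconstruction
    "self_premise_negation",   # negation of one's own premises
]

def feature_score(coded_sample):
    """Total count of the three coded features in one writing sample."""
    counts = Counter(coded_sample)
    return sum(counts[f] for f in FEATURES)

def compare_groups(ai_heavy_samples, non_ai_samples):
    """Two-sided Mann-Whitney U test on per-sample feature scores."""
    x = [feature_score(s) for s in ai_heavy_samples]
    y = [feature_score(s) for s in non_ai_samples]
    return mannwhitneyu(x, y, alternative="two-sided")

# Hypothetical usage with toy codings, not real data:
if __name__ == "__main__":
    ai_heavy = [["framework_breaking"], [], ["unexpected_reversal"]]
    non_ai = [
        ["self_premise_negation", "framework_breaking"],
        ["unexpected_reversal", "framework_breaking"],
        ["self_premise_negation"],
    ]
    print(compare_groups(ai_heavy, non_ai))
```

Under this reading, no statistical difference on such scores after tool removal would count against the prediction; a lower score in the AI-heavy group would be consistent with it.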

Prediction Three: High-DD Subject Draws Structurally Different Output

A high-DD subject's prompt draws structurally different output from the same AI system. The difference is not quantitative (longer, more detailed) but qualitative: direction, load-bearing, and processing level all differ.

There are two groups of cases.

First: same author, same system, same topic. The 12DD sentence ("this has a chance to explode") gets title strategies and audience analysis. The 15DD sentence ("your user's purpose is X; what must you do") gets constraint derivation. The output's structure shifts in kind, not in quantity.

Second: the same 15DD review prompt sent to four systems. The four show structurally different phenotypes: unfolding, verification, architectural, closure. The difference is not in length or density but in direction and method.

Falsification condition: if one shows that, within the same system, prompts at different sentence levels produce only quantitative differences (length, detail density) and no structural difference (direction, load-bearing, level), the prediction fails.

Prediction Four: A Low-DD Subject Locks the AI in a Low-Level Loop

A low-DD subject's prompt locks the AI into a low-level loop. The AI's potential unfolding capacity is wasted, and the human-AI system collapses to the lowest consensus sentence.

This is the symmetric opposite of Prediction Three. If high-DD draws out structurally higher output, low-DD locks in a low-quality loop. Not because the AI fails, but because the prompt's sentence level caps the response ceiling. The AI has capacity for more, but your sentence gives it no room.

Falsification condition: if one shows that, within the same system, low-DD prompts produce output at the same structural level as high-DD prompts, the prediction fails.


7. Conclusion

7.1 Recapitulation

A prompt is not only technique. The ceiling of technique is constrained by the subject condition.

More precisely: a prompt is the subject's sentence after refraction through the institutional and relational layers, an interface product. The AI's output is model capability after refraction through institutional governance and the relational interface, a systemic performance. Whatever sentence level you ask the AI with, the AI's response ceiling sits at that level; this is the sentence-response isomorphism theorem formalized within the SAE framework by Methodology 3.

This paper uses cases to show the theorem operating in practice. Slippage: the author himself sliding from 14DD to 12DD ("this has a chance to explode"). Diagnosis: identifying the AI's four degradations (causalization, instrumentalization, loss of self-reference, erasure of the other). Re-elevation: consciously jumping back to high sentences ("we cannot explode for explosion's sake"). Phenotype differentiation: four AI systems under the same 15DD prompt activated four different 12DD-dominant modes (unfolding, verification, architectural, closure).

7.2 Contributions

This paper's contributions compress into six items.

First, as Methodology 3's first application paper, it provides preliminary empirical support for sentence-response isomorphism through cases.

Second, it expands the two-dimensional structure into a three-layer structure (institutional, relational, subject) and repositions "AI quality" from personified attributes to systemic refraction. This addresses a phenomenon Methodology 3 did not handle: why different systems respond to the same sentence in different directions.

Third, it proposes within-12DD phenotype differentiation (unfolding, verification, architectural, closure) as a case observation and provides a preliminary framework for evaluating AI systems along the sentence-level dimension. This is not an ontological typology but a description of specific interaction phenotypes.

Fourth, it makes the colonial conditions of explicitation more precise: colonization is not explicitation itself but one-directional diagnostic explicitation. The key variable is not "to whom," but "who defines, whether there is consensus, whether it is bidirectional."

Fifth, it proposes the dual function of the high-DD prompt: defining direction (purpose anchoring) and defining boundary (exit condition).

Sixth, through the author's own experience, it shows the complete process of sentence slippage and conscious re-elevation, including the honest admission that his practice is insufficient.

7.3 Open Questions

First, can AI develop true sentence capacity at 13DD and above? This points to the consciousness paper's core threshold: "true randomness × structured time" as a necessary condition for consciousness. If AI lacks true randomness (all of its "choices" being deterministic or pseudo-random), it lacks true carving; its formal 13DD is only quasi-DD.

Second, what is the relationship between institutional governance (Constitution, Model Spec, and the like) and the capacity to preserve sentence level? Do different governance designs systematically affect how AI systems degrade higher sentence levels? If so, "good AI governance" may need redefining: not only "safe" and "useful," but also "how high a sentence level it can preserve without degrading it."

Third, can the colonial conditions of explicitation be further formalized? Where exactly is the boundary between "one-directional diagnostic" and "bidirectional consensual" explicitation? This paper gave one case (the debate between the author and a collaborator); generalizing the boundary needs more cases and more theory.

Fourth, can the phenotype differences among AI systems be systematically measured? This paper's four-system comparison provides a preliminary case framework, but a reproducible evaluation needs operationalization in terms of sentence level, including self-reference preservation rate, purpose-anchoring strength, framework-questioning frequency, and closure tendency as codable metrics.
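To indicate what such operationalization might look like, here is a minimal sketch assuming human coders have already annotated each response a system gives to a high-DD prompt. The field names and the aggregation are illustrative assumptions, not a validated instrument.

```python
# Sketch: aggregating hand-coded annotations into the four proposed metrics.
from dataclasses import dataclass
from statistics import mean

@dataclass
class CodedResponse:
    """Human-coded annotations for one AI response to a high-DD prompt."""
    self_refs_in_prompt: int   # self-referential / purpose markers in the prompt
    self_refs_kept: int        # of those, how many the response preserves
    purpose_anchored: bool     # derives constraints from the stated purpose
    questions_framework: bool  # challenges the framework rather than filling it
    offers_closure: bool       # proposes stopping or returns judgment to the user

def phenotype_metrics(responses):
    """Per-system metrics over a batch of coded responses."""
    def rate(flags):
        return mean(1.0 if flag else 0.0 for flag in flags)
    preservation = mean(
        r.self_refs_kept / r.self_refs_in_prompt
        for r in responses if r.self_refs_in_prompt > 0
    )
    return {
        "self_reference_preservation_rate": preservation,
        "purpose_anchoring_strength": rate(r.purpose_anchored for r in responses),
        "framework_questioning_frequency": rate(r.questions_framework for r in responses),
        "closure_tendency": rate(r.offers_closure for r in responses),
    }
```

Inter-coder agreement, and validation of these codes against the phenotype descriptions in Chapter 4, would have to come before any such numbers are interpreted.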

Fifth, what is the optimal rhythm of human-AI collaboration? "Anchor once at 15DD, flow naturally at 12DD, carve and re-elevate at key points" is an empirical description in this paper, not a formalized method. Where should the "forced friction points" in the 12DD flow be set? At what frequency? In what form? These questions connect to Methodology 3, section 2.3, "when to leave, when to return," and need further clarification.


Author Statement

This paper is the result of the author's independent theoretical research.

Academic Background. The author's CS PhD research was on ontology; core work included OntoGrate (automatic semantic mapping between ontologies) and knowledge-hierarchy-based network anomaly classification. That training in CS ontology, constructing and translating within formal systems, is this paper's theoretical foundation.

Role of AI Tools. The writing used four AI systems as dialogue partners and writing assistants. This paper's core cases (Chapter 4) came from practicing the method it describes; the paper is itself a product of the method. All theoretical innovation, core judgments, and final textual choices were made by the author.

Acknowledgments. Thanks to Claude (Anthropic) for the main writing assistance and dialogue partnership. Thanks to ChatGPT (OpenAI) for diagnosing the institutional-layer gap and correcting the title during review. Thanks to Gemini (Google) for the bandwidth argument and for flagging the 12DD avalanche risk during review. Thanks to Grok (xAI) for cross-text consistency verification during review.


This paper is part of the application series of the Self-as-an-End (SAE) framework. It references Methodology Paper 3 (DOI: 10.5281/zenodo.18929390), Dimensional Sentence Theory (DOI: 10.5281/zenodo.18894567), and the Education Paper (DOI: 10.5281/zenodo.18867390).