Non Dubito Essays in the Self-as-an-End Tradition
凿与构 · 时间性艺术的通用结构 · 第二篇
Chisel and Construct · The Universal Structure of Temporal Arts · Essay II

戏曲与歌剧

Opera and Chinese Opera

「梅兰芳与卡拉斯的同一个操作」

"The Same Operation of Mei Lanfang and Maria Callas"

秦汉 Han Qin | 2026

第一篇我们在纯听觉通道内建立了"生定展固"模型。音乐的凿构循环发生在一条线上——你的耳朵接收声音,你的大脑建立预测模型,模型被打破,然后带痕闭合。

但如果你去看一场京剧,或者看一场歌剧,你会发现一件事:你同时在处理好几条线。唱腔是一条,身体动作是一条,舞台视觉是一条,如果有叙事的话,故事情节还是一条。每一条线都可以独立运行自己的凿构循环。

这就引出了本篇的核心问题:当多条凿构循环同时运行时,发生了什么新的事情?

答案是:跨通道凿。

一、什么是跨通道凿

纯音乐的凿构循环是单通道的。"展"发生在听觉内部——旋律打破了你对旋律的期待,节奏打破了你对节奏的期待。

戏曲和歌剧引入了一个新机制:一个通道在"定"的同时,另一个通道在"展"。

什么意思?想象一下:你在听一段唱腔,旋律线非常规整,板式完全可预测——你的听觉预测模型是稳定的,你处于"定"的状态。但同时,舞台上的身体动作做了一个你没预料到的事情——一个突然的转身,一个迟疑的手势,一个不合常理的停顿。你的视觉预测模型被打破了。

你的听觉在"定",你的视觉在"展"。两条凿构循环之间出现了相位差。

这个相位差本身就制造余项——而且是纯音乐制造不了的余项。因为这个余项不在任何单一通道内,它在通道之间。你的认知系统同时处理两条线,两条线的不同步产生了一个剩余:你感觉到了什么,但你不知道它属于你听到的还是你看到的。

这就是跨通道凿。它是戏曲和歌剧特有的结构资源,纯音乐没有。

二、京剧:板式提供锚点,身体制造凿

京剧的编码系统高度程式化。板式(西皮、二黄、各种导板、摇板、散板)规定了唱腔的节奏框架和旋律走向。行当(生旦净丑)规定了声音的质地和表演的规范。一个熟悉京剧的观众,在唱腔层面上的预测模型是极其稳定的——他几乎可以提前半句预测下一个字的落音位置。

这看起来像是纯粹的"定"。如果京剧只有唱腔,它就是一个程式化极强、凿空间极小的形式。

但京剧不只有唱腔。它有身段。

唱腔在做什么?南梆子的板式高度规整,旋律线优美但可预测。你的听觉处于"定"——稳定,舒适,你知道下一句会怎样收。

舞剑在做什么?每一个剑花的轨迹、速度、角度都在你的视觉预测之外。你不知道下一个动作是什么。剑的运动是流动的、即兴感的(虽然其实也是经过严格训练的),它的轨迹超出了你的视觉模型。

两条线同时运行。听觉在"定",视觉在"展"。你的认知系统在两个不同的状态之间被拉扯,这个拉扯本身就是余项。

而且这个余项有一个特殊性质:它不可能在任何单一通道内被完全消化。你不能只听录音来获得它(录音里没有身体动作),你也不能只看无声的舞剑录像来获得它(没有唱腔提供的锚点,视觉就失去了参照系)。它只存在于两个通道的交汇处。

这就是为什么京剧的现场和京剧的录音是两种完全不同的体验。录音把跨通道凿砍掉了,只留下单通道的唱腔。

另一个例子。梅兰芳在这出戏里做的事情更微妙。唱腔本身就有凿(四平调的一些非常规处理),但核心的跨通道凿发生在身段与唱腔的反向运动上:唱腔在往"收"的方向走的时候,身体动作在往"放"的方向走。声音在闭合,身体在打开。两个通道传递的信号是矛盾的。

这种矛盾不是错误,是设计。它制造了一种你用语言很难描述的余项——你知道你感受到了什么,但你说不清楚它到底是"听到的"还是"看到的",因为它既不完全属于任何一个通道。

三、歌剧:同一个操作,不同的编码

现在把京剧放在一边,看歌剧。编码系统完全不同——西方调性和声、管弦乐配器、意大利语或德语的歌词、欧洲舞台传统。但跨通道凿的操作是同一个。

这是跨通道凿在歌剧中的极致案例。

两个声部——特里斯坦和伊索尔德——做的事情是:一个在完成乐句(固),另一个在开启新乐句(生)。旋律线交织在一起,但它们的凿构循环是错位的。当你跟着伊索尔德的线走,你感到闭合即将到来;但同时特里斯坦的线把你拉向一个新的开始。你永远处于半完成状态。

这已经是听觉内部的跨通道凿——两条旋律线之间的相位差。但瓦格纳还叠加了另一层:歌词叙事与音乐情感的错位。特里斯坦唱的文字内容是关于黑夜和死亡的,但音乐的走向是上升的、趋向光明的。叙事通道在说一件事,音乐通道在说另一件事。

余项在通道之间不断产生,但永远不被解决。整个第二幕就是一个巨大的"展",没有"固"。闭合被无限推迟——直到第三幕的Liebestod(爱之死),所有积累的余项在最后的和声解决中一次性闭合。那个闭合之所以有那么大的重量,就是因为前面两个多小时的跨通道凿积累了无法计量的余项。

比瓦格纳简单得多,但跨通道凿依然在运行。

旋律线极其明确,大线条,可预测——这是听觉层面的"定"。卡拉夫在唱一首"我一定会赢"的咏叹调。但乐队织体在底下做着另一件事:和声的走向暗示着不确定性,配器的色彩在明暗之间游移。旋律说"确定",乐队说"不一定"。

再加一层:如果你懂歌词,你知道卡拉夫在赌命。文本叙事的紧张感和旋律的"豪迈感"之间有一个缝隙。这个缝隙就是跨通道凿制造的余项。

四、同构对照:虞姬舞剑与特里斯坦二重唱

这两个段落的编码系统几乎没有任何共同点。一个是京剧的南梆子加身段,一个是德国浪漫主义歌剧的半音和声加双人声部。语言不同,音乐体系不同,表演传统不同,观众的文化背景不同。

但在跨通道凿的层面上,它们做着同一个操作:

一个通道提供锚点(唱腔/其中一条旋律线在"定"),另一个通道在那个锚点之上制造打破(身体动作/另一条旋律线在"展")。两条凿构循环的相位差产生了不可在单一通道内消化的余项。

虞姬舞剑的余项来自听觉(定)与视觉(展)的相位差。特里斯坦二重唱的余项来自两条旋律线之间的相位差,以及叙事与音乐之间的相位差。通道组合不同,但"相位差制造余项"这个操作是同一个。

这就是同构:不同传统,同一凿构操作。

五、异构同效:昆曲游园惊梦与莫扎特唐·乔瓦尼

如果说虞姬舞剑和特里斯坦是"同一个操作的不同编码",那昆曲《牡丹亭·游园惊梦》和莫扎特《唐·乔瓦尼》最后一幕是另一个层面的证明:"完全不同的跨通道凿方式,产生同一种余项效果"。

昆曲是中国戏曲中程式化程度最高的形式之一。曲牌体规定了每一句唱词的格律、旋律框架甚至字词的声调走向。身段的规范精细到手指的角度。在这个极度程式化的construct里,"展"的空间看起来几乎为零。

但杜丽娘游园这一折恰恰在程式化的极致中制造了一种独特的跨通道凿:唱词在说"春天真美"(construct,确认性的),但昆曲特有的缓慢节奏和水磨腔的处理方式把每一个字都拉成了一个漫长的时间体验。时间本身成了一个通道。你在文字层面得到了确认(这是春天,这是美),但在时间体验层面得到了打破——这个"春天真美"被拉得太长了,长到你开始在那个延展的时间里感觉到别的东西。杜丽娘对春天的感受下面藏着的那个孤独和欲望,不是唱词说出来的,是时间的拉伸暴露出来的。

余项不在文字里,不在旋律里,在时间和文字的相位差里。

完全不同的跨通道凿方式。石像(骑士长的鬼魂)来赴宴的场景。莫扎特做的事情是:叙事通道在construct(鬼魂来审判唐·乔瓦尼,这是一个道德故事的闭合),但音乐通道在chisel。

音乐的处理超出了叙事的逻辑。石像出现时的和声(d小调,长号的阴暗音色)不只是"配合"剧情,它制造了一种超出叙事框架的恐惧。叙事说"坏人受到惩罚",音乐说"这里有一种你无法用道德叙事框定的力量"。你在叙事层面得到了闭合(坏人倒了),但在音乐层面感受到了打开——那个和声的黑暗不是"正义得到伸张"可以解释的。

余项在叙事闭合和音乐打开的缝隙里。这就是为什么这一幕不只是一个道德故事的结尾,而是音乐史上最令人不安的场景之一。

两个作品的跨通道凿方式完全不同——昆曲用时间拉伸暴露文字下面的东西,莫扎特用音乐的黑暗超出叙事的框定。但余项效果是等价的:你得到了一个无法在任何单一通道内消化的剩余。这个剩余让两个作品穿越了几百年。

异构同效:不同的凿,同一种不可穷尽。

5a、与Wagner的Gesamtkunstwerk对话

跨通道凿这个概念有一个天然的对话对象:Wagner的Gesamtkunstwerk(总体艺术品)。

1849年,Wagner在《未来的艺术品》里提出:自古希腊以来,音乐、诗歌、戏剧、舞蹈被割裂了,歌剧的使命是把它们重新统一成一个整体。各艺术门类应该消融边界,汇入同一条河,服务于一个共同的目标。

表面上,这跟本篇讨论的"多通道"很像。但底层逻辑完全相反。

Wagner追求的是通道的统一(unity)——所有通道说同一件事,传递同一个信息,融为一个无缝的整体。本篇的分析指向的是通道的相位差(phase difference)——跨通道凿的力量恰恰在于通道之间不同步。虞姬舞剑之所以有力,不是因为唱腔和身体"融为一体",是因为唱腔在定的时候身体在展。如果所有通道完美同步、传递同一个信息,那就是"多通道的定",不是跨通道凿。多通道的定只会让construct更厚,但不产生新的余项。

而且有一个讽刺:Wagner自己最好的作品恰恰违反了他自己的理论。特里斯坦第二幕之所以伟大,不是因为它实现了Gesamtkunstwerk的统一理想——文字说死亡,音乐说上升;叙事在闭合,和声在打开。通道之间的矛盾才是那个场景的力量来源。Wagner的理论说要融合,Wagner的实践做的是凿。

这其实是Self-as-an-End框架一以贯之的立场:不追求和谐统一,追求在construct内部制造真实的否定,然后看什么活下来。余项不是在融合中产生的,是在矛盾中暴露的。通道之间的相位差就是一种矛盾——听觉告诉你一件事,视觉告诉你另一件事,你的认知系统在两者之间被撕开,撕开的缝隙就是余项。

附带提一句Brecht的间离效果(Verfremdungseffekt)。他跟Wagner恰好站在对面:Wagner要让观众沉浸在统一的幻觉中,Brecht要打破幻觉,让观众意识到自己在看戏。用本文的语言说,Brecht做的是元层面的凿——打破的不是某个通道内部的预测模型,而是"这些通道应该融为一体"这个更高层级的预测模型。他暴露了通道本身。这是另一种跨通道凿,只不过凿的对象不是通道内容之间的相位差,而是"通道存在"这个事实本身。

三个位置由此清晰:Wagner要统一通道,Brecht要暴露通道,本文要利用通道之间的相位差。三者都在处理多通道的问题,但操作方向完全不同。

六、程式化与凿的张力:大师和匠人的结构区别

戏曲和歌剧有一个纯音乐没有的特征:高度程式化。

京剧的板式、行当、身段都有严格规范。歌剧有咏叙调和咏叹调的分工、声部类型的角色对应、乐队的convention。昆曲的曲牌体把每一个音几乎都规定好了。能剧的程式化更极端——面具、步法、扇子的角度都是固定的。

程式化是一种极强的construct。它让观众的预测模型在开场之前就已经建立了——你知道西皮原板的节奏会怎样走,你知道咏叹调到最后会有一个高音。

这看起来像是"展"的敌人。程式化越强,可凿的空间越小。但事实恰恰相反。

程式化越强,在其内部制造的微观凿就越有力。因为观众的预测模型极其精确,任何微小的偏离都会被立刻感知到。在一个随便什么都能发生的框架里,偏离一点没人注意。在一个每一个音都被规定好的框架里,偏离半个音就是地震。

这就是匠人和大师的结构区别。

匠人完成程式的闭合。他精确地执行了所有规定动作,生-定-固,干净完整,无可挑剔。观众鼓掌,因为技术完美。

大师在闭合内部制造最小但真实的偏离。梅兰芳的一个眼神——在程式规定你应该看向某个方向的时候,他的目光迟疑了零点几秒。卡拉丝的一个气息位置——在所有人都会在同一个地方换气的时候,她把换气的位置挪了半拍,让一个乐句的呼吸结构微妙地改变了。

这些偏离小到你几乎意识不到。但你的预测模型感知到了。你不知道发生了什么,但你觉得"跟别人不一样"。这个"不一样"就是大师在程式化的construct内部制造的展。它小到不破坏程式,但真实到产生了余项。

匠人的表演你看一遍就够了——因为他完美地确认了你的预测,没有余项。大师的表演你看十遍还有东西——因为那些微观的凿制造了不可穷尽的余项。

这是一个可检验的结构判断,不是品味判断。你不需要"懂"京剧或歌剧就能感受到这个差异。你只需要在看两个不同演员的同一出戏时注意:一个让你觉得"很好",另一个让你觉得"说不出哪里不一样但就是不一样"。后者就是在程式化construct内部做了微观凿的那个人。

七、反例:当"展"被系统性删除

前面的反例都是个体层面的——某个演员技术完美但不动人。现在看一个更大尺度的反例:当一个国家的权力机器系统性地删除"展",会发生什么?

二十世纪有两个独立的大规模实验。

纳粹德国,1933-1945。 1938年的"堕落音乐"(Entartete Musik)展览把勋伯格的无调性、爵士乐、一切"不协和的、混乱的、知识分子气的"音乐定义为种族污染。只允许贝多芬到布鲁克纳这条线的德国浪漫主义传统。纳粹对音乐的要求几乎全部是否定性的:不能不协和,不能无调性,不能十二音,不能"混乱",不能受犹太或爵士影响。留下来的是什么?只有"生-定-固"。凿被等同于种族退化。

苏联,1930s-1950s。 社会主义现实主义要求音乐具备"人民性"(narodnost)——实际上就是保守的调性和声,人人都能理解。"形式主义"是最高罪名。肖斯塔科维奇的歌剧《姆钦斯克县的麦克白夫人》1936年被斯大林亲自封杀,因为它的和声太黑暗太讽刺——用本文的语言说,因为它做了真正的凿。此后肖斯塔科维奇再也没写过严肃的歌剧或芭蕾。凿被等同于资产阶级腐化。

两个政权的意识形态完全对立——一个是极右种族主义,一个是极左共产主义。但在音乐政策上做了同一个结构操作:删除"展",只保留"生-定-固"。

这不是巧合。威权体制本能地把"展"视为威胁,因为"展"打破预测模型,而威权需要一切都可预测可控制。凿是不可控的——你不知道余项会暴露出什么。一个不协和音可能只是一个不协和音,也可能是对现存秩序的质疑。威权无法承受这个不确定性,所以它的选择是:把凿本身定义为罪。

结果呢?两个体制下的官方艺术都高度程式化、技术精良、而且极其无聊。它们完美地完成了"生-定-固",但没有人自愿反复聆听。它们穿不过周期,因为余项在体制层面就被预防了。

而讽刺的是,两个体制下最好的作品恰恰是那些在审查框架内偷偷做了微观凿的。肖斯塔科维奇的交响曲表面符合社会主义现实主义的要求,但在配器、和声的暗处藏着讽刺和颠覆——那些你第一遍听不出来但第十遍开始感觉到的东西。它穿过了周期。官方批准的那些作品没有。

这是"生定展固"模型在制度层面的验证:你可以用国家权力删除"展",但你删不掉人类认知系统对凿的需求。删凿的结果不是更好的艺术,是更无聊的艺术。

八、为什么现场和录制是两种体验

跨通道凿还解释了另一个普遍现象:为什么戏曲和歌剧的现场和录制版本是完全不同的体验。

一张京剧唱片——哪怕是梅兰芳的录音——砍掉了视觉通道,也就砍掉了跨通道凿。你只剩下单通道的听觉凿构循环。它仍然可以是伟大的音乐,但它不再是京剧的完整体验。那些发生在唱腔(定)和身体动作(展)之间的余项消失了。

歌剧的情况类似。一张帕瓦罗蒂的录音可以让你感受到声音本身的凿构循环,但你听不到他在舞台上的身体存在与音乐之间的相位差。卡拉丝现场的传奇性很大程度上来自她的身体表演——那些在程式化的歌剧身段里做的微观偏离——这些在录音里完全丢失了。

这不是说录音不好。录音是单通道的完整体验。但它跟现场不是"同一个东西的不同版本",它是一个通道更少的不同形态。就像你不能说黑白照片和彩色照片是"同一个东西"——它们记录了同一个对象,但可用的通道不同,能制造的余项也不同。

这也解释了为什么在视频时代,戏曲和歌剧的传播反而遇到了一个悖论:视频看起来保留了视觉通道,但屏幕上的身体动作和现场的身体动作在观众的认知系统中的权重完全不同。现场的身体是三维的、有重量感的、占据真实空间的——你的知觉系统对它的预测模型跟对屏幕上的二维影像的预测模型不一样。所以视频版本的跨通道凿是衰减的,不是完整的。

这不是怀旧,不是"现场就是比录音好"的空洞感叹。这是一个结构事实:通道越少,可制造的跨通道余项就越少。

九、从单通道到跨通道:第一篇到第二篇的进展

回顾一下我们到目前为止建立的东西。

第一篇证明了:在纯听觉通道内,"生定展固"是所有音乐的通用结构。经久不衰的音乐 = 四步完整 + 余项真实且不可穷尽。

第二篇(本篇)证明了:当多个感官通道同时参与时,凿构循环不仅可以在每个通道内部独立运行,还可以跨通道运行。跨通道凿制造的余项不存在于任何单一通道内,只存在于通道之间的相位差中。这是戏曲和歌剧特有的结构资源。

程式化不是凿的敌人,而是凿的最好地基——预测模型越精确,微观偏离的效果越大。大师和匠人的区别不在技术,在于是否在程式化的construct内部制造了真实的展。

这就引出了下一个问题:如果凿构循环可以跨通道运行,那它是不是依赖于某个特定的通道组合?如果我们把听觉通道完全去掉,只留身体动作,凿构循环还能不能成立?

下一篇,我们进入芭蕾和舞蹈——身体作为凿的主通道。当音乐降为背景甚至完全消失时,广播体操和皮娜·鲍什的差别在哪里?

Han Qin | 2026

Essay I established the Arise-Settle-Unfold-Fix model within a single auditory channel. Music's chisel-construct cycle runs on one line — the ear receives sound, the brain builds a predictive model, the model is broken, and re-closure carries the trace.

But if you attend a performance of Peking opera, or a performance of Western opera, you notice something: you are processing several lines simultaneously. The singing is one line; bodily movement is another; stage visuals are another; if narrative is present, the story is yet another. Each line can independently run its own chisel-construct cycle.

This leads to the core question of this essay: when multiple chisel-construct cycles run simultaneously, what structurally new phenomenon occurs?

The answer is: cross-channel chisel.

I. What Is Cross-Channel Chisel

The chisel-construct cycle of pure music is single-channel. Unfold occurs within audition — melody breaks your melodic expectation, rhythm breaks your rhythmic expectation.

Opera and Chinese opera introduce a new mechanism: one channel is in Settle while another channel is simultaneously in Unfold.

Consider: you are listening to a vocal passage whose melodic line is entirely regular, its rhythmic-modal framework fully predictable — your auditory predictive model is stable; you are in Settle. But simultaneously, the bodily movement on stage does something you did not anticipate — a sudden turn, a hesitant gesture, an illogical pause. Your visual predictive model is broken.

Your audition is in Settle; your vision is in Unfold. A phase difference has appeared between two chisel-construct cycles.

This phase difference itself produces remainder, and a kind of remainder that pure music cannot produce, because it exists in no single channel; it exists between channels. Your cognitive system processes two lines at once, and their desynchronization leaves a residue: you sense something, but you do not know whether it belongs to what you heard or to what you saw.

This is cross-channel chisel. It is a structural resource specific to opera and Chinese opera that pure music does not possess.
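
As a minimal sketch of the definition above, assume each channel can be labeled beat by beat with one of the four stages. The beat-by-beat labels, the `Stage` enum, and the function `cross_channel_moments` are hypothetical illustrations, not part of the essay's framework; the code only locates the beats where one channel is in Settle while the other is in Unfold.

```python
from enum import Enum


class Stage(Enum):
    ARISE = "Arise"
    SETTLE = "Settle"
    UNFOLD = "Unfold"
    FIX = "Fix"


def cross_channel_moments(anchor, breaker):
    """Return the beat indices where `anchor` is in Settle while `breaker` is in Unfold."""
    return [
        t
        for t, (a, b) in enumerate(zip(anchor, breaker))
        if a is Stage.SETTLE and b is Stage.UNFOLD
    ]


# Hypothetical labels, one per beat (assumed for illustration only).
audition = [Stage.ARISE, Stage.SETTLE, Stage.SETTLE, Stage.SETTLE, Stage.FIX]
vision = [Stage.ARISE, Stage.SETTLE, Stage.UNFOLD, Stage.UNFOLD, Stage.FIX]

moments = cross_channel_moments(audition, vision)
print(moments)  # [2, 3]
```

When both channels carry identical labels, the function returns an empty list: the channels move together and no phase difference appears between them.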

II. Peking Opera: Modal Framework Provides Anchor, Body Produces Chisel

Peking opera's encoding system is highly formalized. Modal-rhythmic frameworks (xipi, erhuang, various introductory and free-rhythm modes) prescribe the rhythmic structure and melodic direction of vocal passages. Role types (sheng, dan, jing, chou) prescribe vocal quality and performance norms. A viewer familiar with Peking opera has an extremely stable predictive model at the vocal level — they can predict the landing pitch of the next syllable almost half a phrase in advance.

This appears to be pure Settle. If Peking opera consisted only of singing, it would be a form with extremely strong formalization and minimal space for chiseling.

But Peking opera has more than singing. It has body work (身段).

Take Consort Yu's sword dance, the passage compared with the Tristan duet in Section IV. What is the singing doing? The nanbanzi modal framework is highly regular, the melodic line beautiful but predictable. Your audition is in Settle: stable, comfortable; you know how the next phrase will cadence.

What is the sword dance doing? Each arc of the blade — its trajectory, speed, angle — falls outside your visual prediction. You do not know what the next movement will be. The sword's motion is fluid, with an improvisatory quality (though in fact rigorously trained); its trajectory exceeds your visual model.

Two lines run simultaneously. Audition in Settle, vision in Unfold. Your cognitive system is pulled between two different states, and this pulling is itself remainder.

Moreover, this remainder has a special property: it cannot be fully absorbed in any single channel. You cannot obtain it from a recording alone (the recording lacks bodily movement), nor can you obtain it from a silent video of the sword dance alone (without the vocal anchor, the visual loses its reference frame). It exists only at the intersection of the two channels.

This is why Peking opera heard live and Peking opera heard on an audio recording are two entirely different experiences. The recording amputates cross-channel chisel, leaving only the single-channel vocal cycle.

Another example. What Mei Lanfang does in this piece is subtler. The singing itself contains chiseling (certain non-standard treatments in the sipingdiao mode), but the core cross-channel chisel occurs in the counter-motion between body work and singing: when the vocal line moves toward "closing," the bodily movement moves toward "opening." Sound is converging; the body is diverging. The two channels transmit contradictory signals.

This contradiction is not error; it is design. It produces a form of remainder that is difficult to describe in language — you know you sensed something, but you cannot determine whether it was "heard" or "seen," because it belongs entirely to neither channel.

III. Western Opera: The Same Operation, Different Encoding

Set Peking opera aside and consider Western opera. The encoding system is entirely different — Western tonal harmony, orchestral instrumentation, Italian or German text, European stage tradition. But cross-channel chisel operates identically.

The Act II duet of Wagner's Tristan und Isolde is the extreme case of cross-channel chisel in opera.

The two vocal parts — Tristan and Isolde — do this: one completes a phrase (Fix) while the other opens a new phrase (Arise). The melodic lines interweave, but their chisel-construct cycles are offset. When you follow Isolde's line, you feel closure approaching; simultaneously Tristan's line pulls you toward a new beginning. You are permanently in a half-completed state.

This is already cross-channel chisel within audition — a phase difference between two melodic lines. But Wagner layers another level: a disjunction between textual narrative and musical emotion. Tristan's sung text concerns night and death, but the music's trajectory is ascending, tending toward light. The narrative channel says one thing; the music channel says another.

Remainder is continuously produced between channels but never resolved. The entire second act is a vast Unfold without Fix. Closure is indefinitely deferred — until the Liebestod (Love-Death) of Act III, when all accumulated remainder resolves in the final harmonic cadence. That cadence carries such weight precisely because the preceding two-plus hours of cross-channel chisel accumulated an immeasurable quantity of remainder.

Calaf's aria in Puccini's Turandot is far simpler than Wagner, but cross-channel chisel is still operating.

The melodic line is extremely clear, broad, predictable: this is Settle at the auditory level. Calaf is singing an "I will surely triumph" aria. But the orchestral texture beneath does something else: harmonic movement suggests uncertainty; orchestral color oscillates between brightness and shadow. The melody says "certainty"; the orchestra says "not certain."

Add another layer: if you understand the text, you know Calaf is wagering his life. Between the tension of the textual narrative and the "triumphant feeling" of the melody there is a gap. That gap is remainder produced by cross-channel chisel.

IV. Isomorphic Comparison: Consort Yu's Sword Dance and the Tristan Duet

The encoding systems of these two passages share almost no common features. One is Peking opera's nanbanzi mode plus body work; the other is German Romantic opera's chromatic harmony plus dual vocal parts. Language differs, musical system differs, performance tradition differs, audience cultural background differs.

But at the level of cross-channel chisel, they perform the same operation:

One channel provides an anchor (the singing, or one melodic line, in Settle); another channel produces a break on top of that anchor (the bodily movement, or the other melodic line, in Unfold). The phase difference between the two chisel-construct cycles produces remainder that cannot be absorbed within any single channel.

The remainder in the sword dance comes from the phase difference between audition (Settle) and vision (Unfold). The remainder in the Tristan duet comes from the phase difference between two melodic lines, and between narrative and music. The channel combinations differ, but the operation "phase difference produces remainder" is the same.

Isomorphic: different traditions, same chisel-construct operation.

V. Heteromorphic Equivalence: Kunqu's "Dream in the Garden" and Mozart's Don Giovanni

If the sword dance and Tristan are "the same operation in different encodings," then Kunqu opera's The Peony Pavilion: Dream in the Garden and the final scene of Mozart's Don Giovanni are proof at another level: "entirely different cross-channel chisel methods producing the same remainder effect."

Kunqu is among the most highly formalized forms in Chinese opera. The qupai (曲牌, fixed-tune) system prescribes the prosody, melodic framework, and even tonal contour of every sung syllable. Body-work norms are specified down to the angle of the fingers. Within this extremely formalized construct, the space for Unfold appears to be nearly zero.

Yet the "Garden Stroll" scene produces a unique form of cross-channel chisel precisely at the extreme of formalization: the lyrics say "spring is beautiful" (construct, confirmatory), but Kunqu's characteristic slow tempo and the shuimoqiang (水磨腔, water-polished singing) treatment stretch every syllable into an extended temporal experience. Time itself becomes a channel. At the textual level you receive confirmation (this is spring, this is beauty), but at the level of temporal experience the prediction is broken: this "spring is beautiful" is stretched too long, so long that within the stretched time you begin to feel something else. The loneliness and desire beneath Du Liniang's perception of spring are not stated by the lyrics; they are exposed by the stretching of time.

Remainder exists not in the text, not in the melody, but in the phase difference between time and text.

Don Giovanni uses an entirely different cross-channel chisel method. The scene is the one in which the stone statue (the Commendatore's ghost) comes to dine. What Mozart does is this: the narrative channel is in construct (the ghost comes to judge Don Giovanni, the closure of a moral story), but the music channel is in chisel.

The musical treatment exceeds the logic of the narrative. The harmony at the statue's appearance (D minor, the dark timbre of trombones) does not merely "accompany" the plot; it produces a terror that exceeds the narrative framework. The narrative says "the wicked man is punished"; the music says "there is a force here that your moral narrative cannot frame." At the narrative level you receive closure (the villain falls); at the musical level you feel opening — the harmonic darkness is not something "justice is served" can explain.

Remainder exists in the gap between narrative closure and musical opening. This is why this scene is not merely the conclusion of a moral story but one of the most unsettling scenes in the history of music.

The two works' cross-channel chisel methods are entirely different: Kunqu uses temporal stretching to expose what lies beneath the text; Mozart uses musical darkness to exceed the narrative frame. But the remainder effect is equivalent: you receive a residue that cannot be absorbed within any single channel. This residue has carried both works across the centuries.

Heteromorphic equivalence: different chiseling, same inexhaustibility.

5a. Dialogue with Wagner's Gesamtkunstwerk

Cross-channel chisel has a natural interlocutor: Wagner's concept of Gesamtkunstwerk (total work of art).

In 1849, Wagner argued in The Artwork of the Future that since ancient Greece, music, poetry, drama, and dance had been severed from one another, and that opera's mission was to reunify them into a whole. All art forms should dissolve their boundaries and flow into a single river, serving a common purpose.

On the surface, this resembles the "multi-channel" discussion of this essay. But the underlying logic is precisely the opposite.

Wagner pursues the unity of channels — all channels saying the same thing, transmitting the same message, fusing into a seamless whole. The present analysis points toward the phase difference between channels — the power of cross-channel chisel lies precisely in the desynchronization of channels. The sword dance is powerful not because singing and body "fuse into one," but because singing is in Settle while the body is in Unfold. If all channels were perfectly synchronized, transmitting the same information, that would be "multi-channel Settle," not cross-channel chisel. Multi-channel Settle only thickens the construct without producing new remainder.

And there is an irony: Wagner's own finest work violates his own theory. The second act of Tristan is great not because it achieves the Gesamtkunstwerk ideal of unity but because it fails to: text speaks of death while music speaks of ascent; narrative closes while harmony opens. The contradiction between channels is the source of that scene's power. Wagner's theory says to fuse; Wagner's practice chisels.

This is, in fact, the consistent stance of the Self-as-an-End framework: not to pursue harmonious unity, but to produce genuine negation within the construct and observe what survives. Remainder is not produced in fusion; it is exposed in contradiction. The phase difference between channels is a form of contradiction — audition tells you one thing, vision tells you another, your cognitive system is torn between the two, and the tear is remainder.

A brief note on Brecht's Verfremdungseffekt (alienation effect). He stands opposite Wagner: Wagner wants the audience immersed in a unified illusion; Brecht wants to shatter the illusion and make the audience aware they are watching a performance. In the language of this essay, Brecht performs chisel at the meta-level — what he breaks is not the predictive model within any single channel, but the higher-order predictive model that "these channels should fuse into one." He exposes the channels themselves. This is another form of cross-channel chisel, only the object of chiseling is not the phase difference between channel contents, but the fact of "channel existence" itself.

Three positions are thus clear: Wagner unifies channels, Brecht exposes channels, this essay exploits the phase difference between channels. All three address the problem of multiple channels, but in entirely different operational directions.

VI. The Tension Between Formalization and Chisel: The Structural Difference Between Master and Artisan

Opera and Chinese opera share a feature that pure music does not: a high degree of formalization.

Peking opera's modal-rhythmic frameworks, role types, and body-work norms are all strictly codified. Western opera has the division between recitative and aria, voice-type casting, and orchestral convention. Kunqu's fixed-tune system prescribes nearly every note. Noh theater's formalization is even more extreme — masks, foot patterns, fan angles are all fixed.

Formalization is an extremely strong construct. It allows the audience's predictive model to be established before the performance even begins — you know how xipi yuanban's rhythm will proceed; you know the aria will end with a high note.

This appears to be the enemy of Unfold. The stronger the formalization, the smaller the space for chiseling. But the truth is precisely the reverse.

The stronger the formalization, the more powerful any micro-chisel within it becomes. Because the audience's predictive model is extremely precise, any minute deviation is immediately perceived. In a framework where anything might happen, a small deviation goes unnoticed. In a framework where every note is prescribed, a deviation of a semitone is an earthquake.

This is the structural difference between artisan and master.

The artisan completes the formalized closure. He executes every prescribed movement precisely: Arise-Settle-Fix, clean, complete, impeccable. The audience applauds because the technique is perfect.

The master produces a minimal but real deviation within that closure. Mei Lanfang's single glance — at the moment formalization prescribes that you should look in a certain direction, his gaze hesitates for a fraction of a second. Callas's single breath placement — where everyone else would breathe at the same point, she shifts the breath by half a beat, subtly altering the respiratory structure of a phrase.

These deviations are so small you are barely conscious of them. But your predictive model registers them. You do not know what happened, but you feel "this is different from others." That "difference" is the Unfold the master has produced within the formalized construct. It is small enough not to destroy the formalization, but real enough to produce remainder.

The artisan's performance: watch once and that suffices — because it perfectly confirmed your prediction, with no remainder. The master's performance: watch ten times and there is still something — because those micro-chiselings produced inexhaustible remainder.

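This is a verifiable structural diagnosis, not a judgment of taste. You do not need to "understand" Peking opera or Western opera to feel the difference. You need only notice, when watching two different performers in the same piece, that one makes you think "very good" and the other makes you think "I cannot say what is different, but something is." The latter is the one who performed micro-chisel within the formalized construct.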

VII. Counter-Example: When "Unfold" Is Systematically Deleted

The preceding counter-examples were at the individual level: a technically perfect but unmoving performer. Now consider a larger-scale counter-example: what happens when a state's power apparatus systematically deletes Unfold?

The twentieth century offers two independent large-scale experiments.

Nazi Germany, 1933–1945. The 1938 "Degenerate Music" (Entartete Musik) exhibition defined Schoenberg's atonality, jazz, and all music deemed "dissonant, chaotic, intellectual" as racial contamination. Only the German Romantic tradition from Beethoven to Bruckner was permitted. Nazi requirements for music were expressed almost entirely in negatives: music must not be dissonant, must not be atonal, must not be twelve-tone, must not be "chaotic," must not show Jewish or jazz influence. What remained? Only Arise-Settle-Fix. Chisel was equated with racial degeneration.

The Soviet Union, 1930s–1950s. Socialist Realism required music to exhibit "narodnost" (closeness to the people), which in practice meant conservative tonal harmony comprehensible to all. "Formalism" was the highest charge. Shostakovich's opera Lady Macbeth of the Mtsensk District was denounced in Pravda in 1936, after Stalin attended a performance, and was withdrawn from the stage; its harmony was too dark, too satirical. In the language of this essay, it performed genuine chiseling. Shostakovich never again completed a serious opera or ballet. Chisel was equated with bourgeois corruption.

The two regimes' ideologies were diametrically opposed — one far-right racial nationalism, the other far-left communism. But in music policy they performed the same structural operation: delete Unfold, retain only Arise-Settle-Fix.

This is not coincidence. Authoritarian systems instinctively regard Unfold as a threat, because Unfold breaks the predictive model, and authoritarianism requires everything to be predictable and controllable. Chisel is uncontrollable — you do not know what remainder will expose. A dissonant chord might be merely a dissonant chord, or it might be a questioning of the existing order. Authoritarianism cannot tolerate that uncertainty, so its choice is: define chisel itself as a crime.

The result? Official art under both regimes was highly formalized, technically accomplished, and profoundly tedious. It completed Arise-Settle-Fix perfectly, but no one voluntarily returns to it. It cannot cross cycles because remainder was prevented at the institutional level.

And the irony: the best works under both regimes were precisely those that performed micro-chisel within the censorship framework. Shostakovich's symphonies superficially satisfied Socialist Realist requirements, but in orchestration and harmonic shadow harbored irony and subversion — things you cannot hear on the first listen but begin to sense on the tenth. They crossed cycles. The officially approved works did not.

This is the validation of the Arise-Settle-Unfold-Fix model at the institutional level: you can use state power to delete Unfold, but you cannot delete the human cognitive system's need for chisel. The result of deleting chisel is not better art — it is more boring art.

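VIII. Why Live Performance and Recording Are Two Experiences

Cross-channel chisel also explains another widespread phenomenon: why the live and recorded versions of opera and Chinese opera are entirely different experiences.

A Peking opera record, even one by Mei Lanfang, cuts away the visual channel, and with it cross-channel chisel. What remains is the single-channel auditory chisel-construct cycle. It can still be great music, but it is no longer the complete experience of Peking opera. The remainder that arose between the singing (Settle) and the bodily movement (Unfold) is gone.

Western opera is similar. A Pavarotti recording lets you feel the chisel-construct cycle of the voice itself, but you cannot hear the phase difference between his bodily presence on stage and the music. Much of Callas's live legend came from her bodily performance, the micro-deviations she made within opera's formalized stage movement, and these are entirely lost in audio recordings.

This is not to say that recordings are bad. A recording is a complete single-channel experience. But it is not "a different version of the same thing" as the live performance; it is a different form with fewer channels. In the same way, you cannot call a black-and-white photograph and a color photograph "the same thing": they record the same object, but the available channels differ, and so does the remainder they can produce.

This also explains a paradox that opera and Chinese opera face in the video era: video appears to preserve the visual channel, but bodily movement on a screen and bodily movement on a live stage carry entirely different weight in the viewer's cognitive system. The live body is three-dimensional, has weight, and occupies real space; your perceptual system's predictive model for it differs from its predictive model for a two-dimensional image on a screen. So cross-channel chisel in a video version is attenuated, not complete.

This is not nostalgia, nor the empty sigh that "live is simply better than recorded." It is a structural fact: the fewer the channels, the less cross-channel remainder can be produced.

IX. From Single Channel to Cross-Channel: The Progression from Essay I to Essay II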

Let us review what has been established so far.

Essay I proved: within the single auditory channel, Arise-Settle-Unfold-Fix is the universal structure of all music. Enduring music = four steps complete + remainder real and inexhaustible.

Essay II (this essay) has proved: when multiple sensory channels participate simultaneously, the chisel-construct cycle can not only run independently within each channel but also run across channels. Cross-channel chisel produces remainder that exists in no single channel but only in the phase difference between channels. This is a structural resource specific to opera and Chinese opera.

Formalization is not the enemy of chisel but its finest foundation — the more precise the predictive model, the greater the effect of micro-deviation. The structural difference between master and artisan lies not in technique but in whether real Unfold has been produced within the formalized construct.

This raises the next question: if the chisel-construct cycle can run across channels, does it depend on a particular channel combination? If we remove the auditory channel entirely, leaving only bodily movement, can the chisel-construct cycle still hold?

In the next essay, we enter ballet and dance: the body as the primary channel of chisel. When music recedes to the background or vanishes entirely, what is the difference between broadcast calisthenics and Pina Bausch?