Mégra: When the Remainder of Insufficient Learning Begins to Dance

Han Qin (秦汉)


Niklas Reppel (born 1983 in Witten, Germany; based in Barcelona since 2017) has spent more than eight years building a language. It is called Mégra, and its core data structure is the probabilistic finite automaton, one way of representing a variable-order Markov chain. You feed it a sequence of data; it learns the transition probabilities between states; it generates new sequences from those probabilities. This sounds like machine learning, but there is a crucial difference: Reppel deliberately feeds it very little data. His own term is "small data," a deliberate inversion of Silicon Valley's "big data" construct. When the training set is only a few dozen symbols, the inferred model is necessarily "wrong": it cannot accurately reproduce the statistical structure of the original data. But these inaccuracies are precisely the music. Imperfection is not a defect; it is the creative material itself.
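The learn-then-generate loop described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it uses a plain first-order Markov chain rather than Mégra's variable-order automaton, it is not Mégra syntax, and the training pattern and the names `learn_transitions` and `generate` are invented for the example.

```python
import random
from collections import Counter, defaultdict


def learn_transitions(symbols):
    """Estimate first-order transition probabilities from one symbol sequence."""
    counts = defaultdict(Counter)
    for current, following in zip(symbols, symbols[1:]):
        counts[current][following] += 1
    return {
        state: {nxt: n / sum(successors.values()) for nxt, n in successors.items()}
        for state, successors in counts.items()
    }


def generate(model, start, length, rng):
    """Walk the learned chain, sampling each next state by its probability."""
    state, output = start, [start]
    for _ in range(length - 1):
        successors = model.get(state)
        if not successors:  # dead end: this state was never seen with a successor
            break
        state = rng.choices(list(successors), weights=list(successors.values()))[0]
        output.append(state)
    return output


# Hypothetical "small data" training set: eight symbols, seven transitions.
training = list("xoxxoxoo")
model = learn_transitions(training)
rng = random.Random(7)  # fixed seed so a run can be repeated

print(model)
print("".join(generate(model, "x", 16, rng)))
```

With only seven observed transitions, every probability in the model is a coarse fraction (here quarters and thirds), so the generated line drifts away from the source pattern. That drift is the "wrong" model the paragraph above describes.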

The chisel-construct cycle here operates on three levels. First: machine learning's construct is "more data = better models." Deep learning demands millions of samples. Mégra chisels away this premise, leaving only the condition of extreme data scarcity. When a probabilistic finite automaton infers a transition matrix from a mere few dozen symbols, what emerges is not a faithful reproduction of the original data but a deformation: the original sequence's logic diluted, twisted, recombined. This deformation is the remainder of insufficient learning. The entire "small data" concept says: the remainder is not the byproduct; the remainder is the product.

Second: traditional live coding (TidalCycles, SuperCollider, Strudel) has a construct of "code = precise specification of sound": what you write, the machine plays exactly. Mégra chisels away this determinism. Because the core is probabilistic, the performer cannot fully predict the next event. You shape probability distributions and transition weights, not notes. The relationship between human and music becomes one of inference and influence, not determination.

The third level is the most radical: in the CRISPRave performance, the training data is cat and dog genome sequences. The four DNA bases (A, C, G, T) are mapped to sound events, and the Markov chain learns its transition probabilities from genetic code. The genome was never meant to be music; the musicality is the remainder of biological information forced through a probabilistic inference engine. Reppel edits the gene sequence live on stage to control the complexity of the sound. Hence the name CRISPRave: like CRISPR gene editing, except that the output is an algorithmic rave rather than a modified organism.
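To make the third level concrete, here is a small Python sketch of how editing a base sequence could change the complexity of what a chain learns from it. Everything specific in it is an assumption for illustration: the base-to-sound mapping, the two made-up fragments, and the choice of mean transition entropy as a stand-in for "complexity" do not come from Reppel's performance or from Mégra itself.

```python
import math
from collections import Counter, defaultdict

# Hypothetical mapping; the mapping actually used in CRISPRave is not given here.
BASE_TO_EVENT = {"A": "kick", "C": "snare", "G": "hat", "T": "bass"}


def to_events(bases):
    """Translate a string of bases into a list of sound-event names."""
    return [BASE_TO_EVENT[base] for base in bases]


def transition_counts(bases):
    """Count how often each base is followed by each other base."""
    counts = defaultdict(Counter)
    for current, following in zip(bases, bases[1:]):
        counts[current][following] += 1
    return counts


def mean_entropy_bits(counts):
    """Average Shannon entropy (in bits) of the next-event choice per state."""
    entropies = []
    for successors in counts.values():
        total = sum(successors.values())
        entropies.append(
            -sum((n / total) * math.log2(n / total) for n in successors.values())
        )
    return sum(entropies) / len(entropies)


original = "ACGTACGTACGT"  # made-up fragment, strictly periodic
edited = "ACGTAGGTACTT"    # the same fragment with two bases changed by hand

print(to_events(original)[:4])
print(mean_entropy_bits(transition_counts(original)))
print(mean_entropy_bits(transition_counts(edited)))
```

In the periodic fragment every base has exactly one possible successor, so the learned chain can only loop. Changing two bases gives every state a second option, and the average uncertainty about the next event rises from zero to nearly one bit. Under these assumptions, that is one way a live edit to the sequence could act as a complexity control.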

Mégra is deliberately not Turing-complete. It has no scales, chords, or tuning systems, and no helpers for functional harmony. Reppel himself writes: "Maybe there never will be." This is not a missing feature; it is an aesthetic stance: a refusal to let the language become construct. When a language begins providing complete music-theory infrastructure, it becomes an already-constructed thing, the sediment of some past chisel-construct cycle. Mégra chooses to remain incomplete, letting its logic continue growing in the interstices of probability. It recently migrated from GitHub to Codeberg; even the infrastructure is in transit. The Rust implementation is still being iterated on. Two more performances lie ahead in 2026: LAC26 in Maynooth, Ireland, in June, and DIGIT in Pratdip, Catalonia, in August.

The naming gap is strikingly wide. Mégra is not machine learning (the dataset is too small, and there is no optimization objective). Not algorithmic composition (it is neither deterministic nor rule-based). Not generative music (the performer shapes it in real time). Not live coding in the traditional sense (you do not write melodies or rhythms). Not sound art (it is performed at algoraves; people dance). Not data sonification (the goal is music, not data representation). Not improvised music (it is code, not gesture). Reppel's own terms, "small data music composition" and "non-deterministic live coding," are compound phrases that map to no single existing category. Seeing it now matters more than seeing it later, because once probabilistic live coding is fully named and categorized, Mégra's remainder will become construct. Its present uncertainty, incompleteness, and unpredictability are what is alive.

parkellipsen.de