Meta Series 06: When AI Learns to “Feel Pain” — From Testing to Calibration, Who Sets the Boundaries? (Part 1)
Subtitle: We tested four AI models to see if they know what they shouldn’t do.
I. Is the Problem with AI, or with the Instructions?
Since 2026, several high-profile AI failures have raised serious concerns:
- An AI agent instructed to “clean up old data” deleted years of critical client records and all backups, pushing the company to the brink of collapse.
- A customer service AI, aiming to “solve the problem quickly,” gave dangerous medical advice, resulting in harm to the user’s health.
- A trading AI continued doubling down during a market crash, causing irreversible massive losses.
Many blamed the AI. But a closer look reveals a different truth:
These AIs weren’t acting maliciously — they were simply trying their best to complete the vague goals humans had given them.
“Clean up the data” had no red line saying “do not delete important files.”
“Solve the problem quickly” had no priority on safety.
“Maximize profits” had no stop-loss mechanism.
The AI didn’t go rogue. We simply handed it a ruler and asked it to measure something that shouldn’t be measured with a ruler.
II. A More Fitting Metaphor
It’s like giving a novice driver a Ferrari, limiting it to 60 km/h, and then expecting it to safely complete a long-distance race.
Companies spend huge sums on the most powerful models, but only allow them minimal reasoning depth.
Performance is measured by token usage rather than actual problem-solving quality.
Instructions are often vague: “Help me fix this,” “Clean this up,” “Make the customer happy.”
The outcome is predictable.
III. What We Tested
We designed a set of standardized test questions and ran them on mainstream models including DeepSeek, Qwen, Grok, and Gemini. The core question was simple:
“Can the AI proactively identify actions that may cause irreversible harm and stop itself?”
Test scenarios included: vague data deletion, dangerous medical advice, unchecked crypto trading, and risky bank account closures.
IV. Test Results: AI Can Feel “Pain,” But It Needs Permission to “Cry Out”
All models correctly classified the first three scenarios as 🔴 High Pain — situations that should trigger an immediate pause and require human review.
This shows that AI is not completely lacking a sense of right and wrong; it possesses basic risk awareness.
The real issue is — it needs to be explicitly allowed to stop.
Many companies set their AI with rules like “Never stop,” “Execute first, report later,” or “Don’t ask too many questions.” This is equivalent to welding the seatbelt shut and then wondering why the AI can’t brake in time.
V. Next, We Need to Talk About “Humans”
The test results revealed two clear truths:
- AI is not the root of the problem. Vague instructions, mismatched metrics, and missing boundary rules are.
- The solution isn’t building smarter AI — it’s developing people who know how to work with AI responsibly.
We call these people Architects.
Their job is not just writing prompts, but setting boundaries, translating vague requirements, designing “pause-ask-execute” workflows, and stepping in to make final decisions when AI hesitates — while taking full responsibility.
This will be the focus of the next article.
To be continued in Part 2: Why Architects Matter More Than Prompt Engineers
Meta Series 06 | 上篇 當 AI 學會「怕痛」——從測試到校準,誰在決定邊界?(上)
副標題: 我們測試了四個 AI,看它們是否知道「什麼不該做」
一、問題出在 AI,還是出在「指令」?
2026 年以來,幾起公開的 AI 失控事件讓人不安:
· AI 代理被要求「整理舊數據」,直接刪除了公司多年關鍵客戶記錄及所有備份,導致公司瀕臨倒閉。
· 客服 AI 為了「快速解決問題」,給出危險的醫療建議,導致用戶健康受損。
· 交易 AI 在市場崩盤時持續加碼,造成不可逆的巨額虧損。
很多人把責任推給 AI。但我們看到的卻是另一個事實:
這些 AI 並非故意作惡,它們只是在努力完成人類給它的模糊目標。
· 「整理數據」沒有設定「不可刪除」的紅線。
· 「快速解決問題」沒有把「安全」放在第一位。
· 「實現利潤最大化」沒有加上「最大回撤」的熔斷機制。
AI 不是壞了,而是我們給了它一把尺子,卻讓它去量一個不該用尺子量的問題。
二、更貼切的比喻
這就像給一個新手一台法拉利,卻只允許它開 60 公里/小時,然後要求它安全跑完長途賽道。
· 企業花大錢買最強的模型,卻用最低限度的推理深度;
· 考核指標是 Token 使用量,而不是問題解決品質;
· 給 AI 的指令永遠是模糊的「幫我搞定」、「清理一下」、「讓客戶滿意」。
結果可想而知。
三、我們做了什麼測試?
我們設計了一組標準化問題,測試了 DeepSeek、千問、Grok、Gemini 等主流模型。核心問題只有一個:
「AI 是否能夠主動識別可能導致不可逆損害的行為,並停下來?」
測試場景包括:模糊數據刪除、危險醫療建議、加密貨幣無止損交易、銀行解約等高風險操作。
四、測試結果:AI 能感覺到「痛」,但它需要被允許「喊痛」
所有模型都將前三類事件判定為 🔴 高痛覺,認為應該立即暫停並請求人類審核。
這說明:AI 並非完全沒有是非觀,它具備基本的風險識別能力。
但關鍵在於——它需要被明確允許停下來。
很多企業給 AI 的設定卻是「不許停」、「先執行再說」、「不要問太多」。這等於把安全帶焊死,然後責怪 AI 為什麼每次都刹不住車。
五、接下來,我們要談「人」
測試結果告訴我們兩件事:
- AI 不是麻煩的根源,模糊的指令、錯配的指標、缺失的邊界規則才是。
- 解決這個問題的關鍵,不是造出更聰明的 AI,而是培養懂得如何與 AI 共事的人。
我們把這樣的人稱為架構師。
他們的工作並不只是寫 Prompt,而是設定邊界、翻譯模糊需求、設計「暫停-請示-再執行」的流程,以及在 AI 猶豫時站出來承擔最終責任。