
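「LLM, Reasoning」 Paper
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning

Wisdom lies not in searching endlessly, but in knowing how to scale the depth of thought to the complexity of the problem.

This is an excellent paper. Amid the test-time scaling trend, it confronts that trend's main problem head-on: models are too slow and too verbose when they reason.

The problem stems from the nature of test-time scaling itself: deliberately lengthening an LLM's reasoning improves its ability to solve complex problems.

Because reinforcement learning (RL) encourages this scaling, the so-called "aha moment" leads models to over-display their thinking: "on the one hand, on the other hand, aha, wait, what if...". The model seems to test the user's patience constantly with very high latency.

The paper proposes LCPO (Length Controlled Policy Optimization).

The authors also use RL to optimize the model. The core is a reward function that balances accuracy against length adherence, training the language model to meet the length requirement stated in the prompt as closely as possible while keeping its reasoning accurate.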
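A reward that trades accuracy against length adherence can be sketched in a few lines of Python. This is a minimal illustration, not necessarily the paper's exact formulation: it assumes the reward is a correctness indicator minus a penalty proportional to the gap between the requested and actual token counts. The function name, the parameter names, and the default value of `alpha` are all hypothetical.

```python
def length_controlled_reward(
    is_correct: bool,
    target_tokens: int,
    used_tokens: int,
    alpha: float = 0.001,  # assumed penalty weight, chosen only for illustration
) -> float:
    """Illustrative reward: correctness minus a length-deviation penalty."""
    # 1.0 when the final answer matches the reference, otherwise 0.0.
    correctness = 1.0 if is_correct else 0.0
    # Penalize over- and under-shooting the requested length symmetrically.
    length_penalty = alpha * abs(target_tokens - used_tokens)
    return correctness - length_penalty


# A correct answer that overshoots a 1000-token request by 500 tokens
# earns less than a correct answer that hits the request exactly.
on_target = length_controlled_reward(True, target_tokens=1000, used_tokens=1000)
overshoot = length_controlled_reward(True, target_tokens=1000, used_tokens=1500)
```

Under this assumed form, a larger `alpha` pushes the policy to match the requested length more strictly, at the cost of weighing correctness relatively less.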

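The paper's biggest highlight: the model can adapt its reasoning length to the requirement given in the prompt, which saves compute.

Two thoughts:

Test-time scaling is especially suited to complex math problems. But how many everyday use cases actually involve solving complex math problems? Large model companies could well borrow this method and automatically choose an appropriate reasoning length based on the user's query.

Reinforcement learning has a strongly rule-based character. That amplifies whatever fits the rules, but it also tends to neglect factors outside them. Rules are constraints, so when I read RL papers I always get the feeling that the LLM is gaining one thing while losing another.

Shunyu from OpenAI said: "RL finally works."
I don't fully agree. I think it is more accurate to say: RL finally works with specific rules.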


