Muon - News Search

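Reported by Synced (机器之心). Editors: Chen Chen, Jiaqi. Half the compute for twice the results: Moonshot AI has open-sourced the Muon optimizer, which leads across the board at the same budget. Moonshot AI and DeepSeek have "collided" again. Last time it was papers, when the two companies released improved attention mechanisms almost back to back; see "Colliding with DeepSeek NSA: MoBA, a new attention architecture co-authored by Kimi's Yang Zhilin, is released with public code" and "Just now! DeepSeek ...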

20 days ago

Muon optimizer open-sourced: compute requirement down 48%, DeepSeek helps usher in a new era of AI training

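The Moonshot AI team recently announced that its open-source, improved version of the Muon optimizer cuts compute requirements by 48% compared with the conventional AdamW optimizer. The work builds on Muon, a training optimization algorithm proposed by OpenAI technical staff, which the team studied and refined in depth, with encouraging results. In experiments on Llama-architecture models of up to 1.5B parameters, the team found that Muon performed well while needing only 52% of the compute AdamW requires. The result validates Muon's scalability and lays the groundwork for training at larger scale.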

19 days ago

Kimi team releases Moonlight model: up to 16 billion parameters, significant performance gains, open-source Muon ...

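The release of the Moonlight model is a shot in the arm for the AI field. The model was trained on as many as 5.7 trillion tokens and achieved significantly better performance while using fewer floating-point operations (FLOPs). The result not only improves the Pareto frontier but also suggests new approaches for training large language models in the future. The Moonshot AI team says that adding weight decay and carefully adjusting the per-parameter update scale make the Muon optimizer more efficient in large-scale training.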

Tencent News, 19 days ago

Moonshot AI's Kimi launches Moonlight: a 3 billion / 16 billion parameter mixture-of-experts model

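IT Home (IT之家), February 24: Moonshot AI's Kimi yesterday released a new technical report, "Muon is Scalable for LLM Training," and announced "Moonlight," a 3 billion / 16 billion parameter mixture-of-experts (MoE) model trained with Muon. Trained on 5.7 trillion tokens, it achieves better performance with fewer floating-point operations (FLOPs), improving the Pareto frontier. Moonshot AI said the team found ...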

Via MSN, 19 days ago

Moonshot AI open-sources an improved Muon optimizer: compute requirement 48% lower than AdamW, also applicable to DeepSeek

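By Kresy (克雷西), reporting from Aofeisi. QbitAI (量子位) | WeChat official account QbitAI. With compute requirements 48% lower than AdamW, Muon, the training optimization algorithm proposed by OpenAI technical staff, has been pushed a step further by the Moonshot AI team! The team found ...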

Tencent News, 19 days ago

Moonshot AI open-sources Moonlight: a 3 billion / 16 billion parameter mixture-of-experts model

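Company news. Economic Observer Online (经济观察网), February 24: Moonshot AI's Kimi released a new technical report, "Muon is Scalable for LLM Training," and announced "Moonlight," a 3 billion / 16 billion parameter mixture-of-experts (MoE) model trained with Muon. Trained on 5.7 trillion tokens, it achieves better performance with fewer floating-point operations (FLOPs), improving the Pareto frontier. (Editor ...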

Physics World, 25 days ago

The muon’s magnetic moment exposes a huge hole in the Standard Model – unless it doesn’t

A tense particle-physics showdown will reach new heights in 2025. Over the past 25 years researchers have seen a persistent and growing discrepancy between the theoretical predictions and experimental ...

Business Insider, 1 year ago

A weirdly wobbly 'muon' particle might revolutionize physics by revealing a 5th force of ...

The magnificent muon and its unusual wobble. In 2021, physicists using the Muon g-2 experiment at Fermilab noticed a certain type of subatomic particle, called a muon, was wobbling more than expected.

Hackaday, 16 days ago

Building A DIY Muon Tomography Device For About $100

Muon tomography, or muography, is the practice of using muons generated by cosmic rays interacting with Earth’s atmosphere to image structures on Earth’s surface, akin to producing an X-ray.

Hackaday, 7 years ago

Make A Cheap Muon Detector Using Cosmicwatch

A little over a year ago we’d written about a sub $100 muon detector that MIT doctoral candidate [Spencer Axani] and a few others had put together. At the time there was little more than a paper ...

PingWest (品玩), 19 days ago

Moonshot AI's Kimi open-sources an MoE model

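According to the report, Kimi extensively reworked the Muon optimizer and applied it to real training runs, demonstrating that Muon is effective at larger training scales: it delivers twice the training efficiency of AdamW with comparable model performance. The model used in the paper is reportedly Moonlight-16B-A3B, with 15.29B total parameters and 2.24B activated parameters; it achieved these results using the Muon optimizer on 5.7T tokens of training data.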

Via MSN, 19 days ago

Moonshot AI's Kimi launches Moonlight: a 3 billion / 16 billion parameter mixture-of-experts model

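IT Home (IT之家), February 24: Moonshot AI's Kimi yesterday released a new technical report, "Muon is Scalable for LLM Training," and announced "Moonlight," a model trained with Muon with 3 billion / 16 billion ...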
