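OpenAI o1 Models

Overview

Where LLMs need improvement

The three basic capabilities of a large model:

Language understanding and expression: solved by GPT-3.
World knowledge storage: largely solved by GPT-4.
Logical reasoning: the weakest area. The o1 models make notable progress here.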

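How it works

The idea behind o1's stronger logical reasoning is that, on receiving a question, the model automatically generates a chain of thought (CoT) and then generates its answer from it. This spares people from writing CoT-based prompts by hand.

To give o1 the ability to generate CoT automatically, the model is improved with reinforcement learning so that it learns to find correct reasoning paths on its own.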

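A new paradigm?

OpenAI's finding: as the compute spent on reinforcement-learning training and on inference grows, the model's reasoning quality, such as the probability of finding a correct CoT path, grows with it. In other words, RL appears to scale: keep adding compute and the model's logical reasoning keeps getting stronger. This may be a new paradigm.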

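Target scenarios

Because its logical reasoning has been strengthened, o1 suits programming, math, physics, and similar scenarios. Writing, on the other hand, may actually be a bit worse. It resembles a science student who is lopsidedly strong in technical subjects.

My impression is that o1 has not opened up a clear gap over Sonnet 3.5 in programming.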

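Model lineup

o1: not yet released; the most capable.
o1-preview: available in ChatGPT (paid subscribers) and through the API (only for heavy spenders who have paid $1,000 or more).
o1-mini: has far less world knowledge than o1-preview, so it suits programming, math, and other scenarios that need little world knowledge; it also responds faster.

OpenAI will probably keep releasing models in this line: stronger at logical reasoning and more lopsided.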

Not yet mature

The o1 series is not yet fully productized: it has no vision support, no streaming output, no function calling, and so on. That is why it is called a preview.

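More expensive

o1-preview costs 3 times as much as GPT-4o for input tokens and 4 times as much for output tokens.

In addition, the hidden CoT produced during reasoning also consumes tokens, which raises the cost further.

Model	Input price (per 1M tokens)	Output price (per 1M tokens)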
o1-preview	$15	$60
o1-mini	$3	$12
GPT-4o	$5	$15

Core principle

We trained these models to spend more time thinking through problems before they respond, much like a person would. Through training, they learn to refine their thinking process, try different strategies, and recognize their mistakes.

OpenAI o1 thinks before it answers and can produce a long internal chain-of-thought before responding to the user.

Put differently, when the model receives a question it first generates an internal chain of thought and then produces its answer from that CoT. Reinforcement learning (RL) is what trains the LLM to generate this CoT on its own.

Target users and scenarios

They can reason through complex tasks and solve harder problems than previous models in science, coding, and math.

These enhanced reasoning capabilities may be particularly useful if you're tackling complex problems in science, coding, math, and similar fields. For example, o1 can be used by healthcare researchers to annotate cell sequencing data, by physicists to generate complicated mathematical formulas needed for quantum optics, and by developers in all fields to build and execute multi-step workflows.

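Performance: improved logical reasoning

The next model update performs similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology. We also found that it excels in math and coding. In a qualifying exam for the International Mathematics Olympiad (IMO), GPT-4o correctly solved only 13% of problems, while the reasoning model scored 83%. Their coding abilities were evaluated in contests and reached the 89th percentile in Codeforces competitions. You can read more about this in our technical research post.

Improvement by area

Coding: improved by more than 300%
Math/chemistry/biology/physics: improved by about 40%

Chinese-language ability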

Access limits

ChatGPT Plus users can access the o1 models in ChatGPT.
o1-preview is limited to 30 messages per week and o1-mini to 50 messages per week.
Only Tier 5 users can access the API, with a rate limit of 20 RPM.
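
A 20 RPM cap means a client has to pace its own calls. The following is a minimal sketch of a client-side sliding-window limiter; the class `SlidingWindowLimiter` and its method names are hypothetical, and timestamps are passed in explicitly (in milliseconds) so the logic can be checked without waiting.

```javascript
// Hypothetical client-side limiter: at most `limit` calls per `windowMs`.
class SlidingWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.timestamps = []; // send times of calls still inside the window
  }

  // Returns 0 if a call may be sent at `now`, else the milliseconds to wait.
  msUntilAllowed(now) {
    this.timestamps = this.timestamps.filter((t) => now - t < this.windowMs);
    if (this.timestamps.length < this.limit) return 0;
    return this.timestamps[0] + this.windowMs - now;
  }

  // Record that a call was sent at `now`.
  record(now) {
    this.timestamps.push(now);
  }
}

const limiter = new SlidingWindowLimiter(20, 60_000); // 20 requests per minute
for (let i = 0; i < 20; i++) limiter.record(i * 100); // 20 calls in the first 2 seconds
const waitMs = limiter.msUntilAllowed(2_000); // blocked until the first call leaves the window
console.log(waitMs);
```

A real client would call `msUntilAllowed(Date.now())` before each request, sleep for the returned time, and then call `record`.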

[Other tiers]

Tier	Qualification
Tier 1	$5 paid
Tier 2	$50 paid and 7+ days since first successful payment
Tier 3	$100 paid and 7+ days since first successful payment
Tier 4	$250 paid and 14+ days since first successful payment
Tier 5	$1,000 paid and 30+ days since first successful payment
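
The qualification table can be read as a simple lookup. As a rough sketch, assuming the thresholds above are the only criteria, the hypothetical helpers `usageTier` and `canUseO1Api` below return the highest tier an account reaches and whether that tier is high enough for the o1 API:

```javascript
// Tier qualifications copied from the table above, highest tier first.
const TIERS = [
  { tier: 5, minPaidUsd: 1000, minDays: 30 },
  { tier: 4, minPaidUsd: 250, minDays: 14 },
  { tier: 3, minPaidUsd: 100, minDays: 7 },
  { tier: 2, minPaidUsd: 50, minDays: 7 },
  { tier: 1, minPaidUsd: 5, minDays: 0 },
];

// Hypothetical helper: highest tier an account qualifies for (0 = none).
function usageTier(paidUsd, daysSinceFirstPayment) {
  const match = TIERS.find(
    (t) => paidUsd >= t.minPaidUsd && daysSinceFirstPayment >= t.minDays
  );
  return match ? match.tier : 0;
}

// Per the access limits above, only Tier 5 accounts can call the o1 API.
function canUseO1Api(paidUsd, daysSinceFirstPayment) {
  return usageTier(paidUsd, daysSinceFirstPayment) === 5;
}

// Paid enough for Tier 5, but the first payment was only 10 days ago.
console.log(usageTier(1200, 10));
console.log(canUseO1Api(1200, 45));
```

The first call shows why spending alone is not enough: the waiting period also has to pass before an account moves up.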

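Pricing

o1-preview costs 3 times as much as GPT-4o for input tokens and 4 times as much for output tokens. The hidden CoT also consumes tokens.

Model	Input price (per 1M tokens)	Output price (per 1M tokens)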
o1-preview	$15	$60
o1-mini	$3	$12
GPT-4o	$5	$15
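
To see how the hidden CoT affects a bill, the table can be turned into a quick estimate. This is only a sketch: `PRICES` copies the table, `estimateCostUsd` is a hypothetical helper, the token counts are made up, and it assumes hidden reasoning tokens are charged at the output rate, which these notes imply but do not spell out.

```javascript
// Prices in USD per 1M tokens, copied from the table above.
const PRICES = {
  "o1-preview": { input: 15, output: 60 },
  "o1-mini": { input: 3, output: 12 },
  "gpt-4o": { input: 5, output: 15 },
};

// Hypothetical helper: estimate the cost of one request in USD.
// Assumption: hidden reasoning (CoT) tokens are charged like output tokens.
function estimateCostUsd(model, inputTokens, visibleOutputTokens, reasoningTokens = 0) {
  const price = PRICES[model];
  if (!price) throw new Error(`unknown model: ${model}`);
  const billedOutput = visibleOutputTokens + reasoningTokens;
  return (inputTokens * price.input + billedOutput * price.output) / 1_000_000;
}

// Same request (1,000 input tokens, 500 visible output tokens); the o1-preview
// call additionally spends an illustrative 2,000 reasoning tokens.
const gpt4o = estimateCostUsd("gpt-4o", 1000, 500);
const o1 = estimateCostUsd("o1-preview", 1000, 500, 2000);
console.log(gpt4o.toFixed(4), o1.toFixed(4));
```

With these made-up numbers the reasoning tokens, not the list-price ratio, account for most of the difference between the two estimates.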

Model shortcomings

It cannot browse the web for information or accept file and image uploads.

API limitations:

Modalities: text only, images are not supported.
Message types: user and assistant messages only, system messages are not supported.
Streaming: not supported.
Tools: tools, function calling, and response format parameters are not supported.
Logprobs: not supported.
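
One way to cope with these limits is to adapt an existing request before sending it. The sketch below assumes a chat-style payload with `model`, `messages`, and optional parameters; the helper `adaptForO1` and the `UNSUPPORTED` list are hypothetical, and no network call is made. It drops the unsupported parameters and folds any system message into the first user message.

```javascript
// Parameters the list above marks as unsupported by the o1 preview models.
const UNSUPPORTED = ["stream", "tools", "functions", "response_format", "logprobs"];

// Hypothetical helper: adapt a chat request to the limits listed above.
function adaptForO1(request) {
  const adapted = {};
  for (const [key, value] of Object.entries(request)) {
    if (!UNSUPPORTED.includes(key)) adapted[key] = value;
  }
  // System messages are not supported, so collect their text...
  const systemText = request.messages
    .filter((m) => m.role === "system")
    .map((m) => m.content)
    .join("\n");
  // ...and keep copies of the remaining user/assistant messages.
  const rest = request.messages
    .filter((m) => m.role !== "system")
    .map((m) => ({ ...m }));
  if (systemText) {
    const firstUser = rest.find((m) => m.role === "user");
    if (firstUser) firstUser.content = `${systemText}\n\n${firstUser.content}`;
    else rest.unshift({ role: "user", content: systemText });
  }
  adapted.messages = rest;
  return adapted;
}

const adapted = adaptForO1({
  model: "o1-preview",
  stream: true,
  messages: [
    { role: "system", content: "Answer briefly." },
    { role: "user", content: "How many r's are in strawberry?" },
  ],
});
console.log(JSON.stringify(adapted, null, 2));
```

Folding the system text into the user turn is a design choice, not a documented equivalence; the model may weigh it differently from a true system message.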

Examples

Example 1: Logic Puzzle

Problem

A princess is as old as the prince will be when the princess is twice the age that the prince was when the princess' age was half the sum of their present ages. Provide all solutions to this question.

Explanation

Reference answer

o1-preview's response

Internal prompt

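Solving the age puzzle

I am working on a complex problem about ages. I define the points in time and the ages as variables, analyze the age gaps, and try to find a solution that satisfies the conditions.

Calculating the time difference

I start by computing the sum of the prince's and princess's ages, P + Q. Then I determine half of it, the princess's age Q', and compute the time from then until now, which is (Q - P)/2 years. This gives me a better sense of how time flows.

Analyzing the time difference

I try to simplify the formula for the time difference and find a contradiction when computing the princess's and the prince's ages. Looking back, the princess's age was half the sum of their present ages.

CHUNK: Suppose the princess's present age is Q. Then at time t1, the princess's age is Q - t1.

According to the problem, at time t1: Q - t1 = (P + Q)/2

Solving the equation: t1 = Q - (P + Q)/2 = [2Q - P - Q]/2 = (Q - P)/2

Similarly, the princess's age is P - t1 = P - [(Q - P)/2] = [2P - Q + P]/2 = (3P - Q)/2

Adjusting the calculation details

While solving for the prince's age, I adjust how the time t1 is computed. After simplifying the formula, I find an error in the prince's age calculation and am checking each step for accuracy.

Reviewing the equations
I am trying to rewrite and simplify the equations but have run into some trouble. Let me see where it went wrong; it may be a small arithmetic slip. With a skeptical eye, I re-examine every step to make sure it is accurate.

Calculating the prince's age

I find that the prince's age at time t1 is (3P - Q)/2. Next, I examine how the princess's future age relates to the prince's age at t1.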

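At the future time t2, the princess's age is double the prince's age at that time. First compute the prince's current age P1, then use the future age to determine t2.

Analyzing how the ages grow

The princess's age will be Q + t2 and the prince's will be P + t2, which gives t2 = 3P - 2Q.

Exploring the age relationship

The princess's age equals twice the prince's future age; the condition involves the change in the princess's age and the history of the prince's age.

Deeper analysis

Starting from the change in the princess's age, I lay out the history of the prince's age, which shows the complex relationship between them.

Correcting the equations and computing

I am correcting the age equations and recomputing the age difference. In the end, the princess's age is three quarters of the prince's age. This process makes me think about the relationship between age and time.

Noting:

Prince's age = P + t2 = (3Q/4) + (Q/4) = (4Q/4) = Q

In terms of t2: princess = (5Q/4), prince = Q

Currently thinking: the princess's age matches the prince's future age, the condition being that the princess's age is half of the sum of half the prince's age.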

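Looking for the age relationship

I am analyzing the questions raised by the prince's age. Assuming the prince's age is P = 0.75Q, I obtain the prince's present age and examine its value carefully.

Determining the age relationship

I analyzed the relationship between the princess's and the prince's ages and found them to be 6 and 8 respectively. One year ago the princess was 7, which matches half the sum of their present ages.

Determining the mathematical relationship

The prince is 6 years old and the princess is 8. The princess's future age is 10, at which point the prince is 8.

Half the sum of the princess's and the prince's present ages comes to 3.5 years.

Solving the age problem

While trying to find the age difference between the princess and the prince, I find that the princess is 1 year older than the prince. The prince's age will grow over the next 2 years, but that does not immediately help with the answer.

Trying different numbers

I am exploring other integer solutions. Choosing Q = 16 gives P = 12. Next, t1 = 2 years, when the princess was 14. Checking these conditions yields the princess's and the prince's future ages.

Analyzing the age difference

I am exploring the age difference between the prince and the princess. The prince is currently 12 and will be 20 in 8 years. When the princess is 20, the prince is exactly 16. This idea is exciting.

Searching for the answer

Q must be a multiple of 4, and P can be written as 3k. This gives us a family of possible age solutions.

Calculating the present age difference

At present, the prince and the princess are 3k and 4k years old. I am trying to work out how old the prince will be when the princess is twice the prince's age.

Analyzing the age difference

I am thinking about the age difference between the princess and the prince. The princess is now 5k and the prince 3k; twice the princess's age will be half of the prince's age at that time and the sum of their present ages.

Exploring the age relationship

I am analyzing the relationship between the prince's and the princess's ages and infer that when the princess is 5k, the prince is 5k/2. Thanks for reading!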

Final answer

Conclusion:

The only integer solution satisfying all conditions is:

Prince's age: 6 years
Princess's age: 8 years
Answer: The prince is 6 years old, and the princess is 8 years old.
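
The derivation in the internal prompt reduces to 4P = 3Q, so every pair P = 3k, Q = 4k fits the riddle, and (6, 8) is the case k = 2. A brute-force sketch can confirm this; the helper name `satisfiesRiddle` and the search bound of 40 are illustrative choices, not part of the original exchange.

```javascript
// Check the riddle's condition for present ages `prince` (P) and `princess` (Q).
function satisfiesRiddle(prince, princess) {
  // t1 years ago the princess was half the sum of their present ages.
  const t1 = princess - (prince + princess) / 2;
  const princeThen = prince - t1;
  // In t2 years the princess will be twice that age.
  const t2 = 2 * princeThen - princess;
  // The princess's present age must equal the prince's age at that time.
  return princess === prince + t2;
}

// Enumerate all integer age pairs up to 40 and keep those that fit.
const solutions = [];
for (let princess = 1; princess <= 40; princess++) {
  for (let prince = 1; prince <= 40; prince++) {
    if (satisfiesRiddle(prince, princess)) solutions.push([prince, princess]);
  }
}
console.log(solutions);
```

Every pair the search finds has the ratio 3:4, so a complete answer to "provide all solutions" is the whole family rather than a single pair.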

Example 2: Programming

Requirements

make a game called "squirrel finder" using HTML5 and JavaScript.
- You are a koala icon that can move quickly using the arrow keys.
- A strawberry randomly spawns every second and bounces around.
- If a strawberry touches you, you die.
- After three seconds, a squirrel icon spawns and bounces around. If you touch it, you win.
- Make sure all icons are 40 pixels by 40 pixels.
- Use retro colors and a dark background.
- Put "openai" in the game screen and a timer in the corner.
- Display the instructions before the game starts.
- Automatically restart the game when you win or lose.
- Give the code in a single file.

Code generated by o1; it runs correctly

<!DOCTYPE html>
<html>
<head>
    <title>Squirrel Finder</title>
    <style>
        body {
            margin: 0;
            overflow: hidden;
            background-color: #000;
            color: #fff;
            font-family: 'Courier New', monospace;
        }
        #gameCanvas {
            display: block;
            margin: 0 auto;
            background-color: #000;
        }
        #instructions {
            position: absolute;
            top: 20%;
            width: 100%;
            text-align: center;
            font-size: 24px;
        }
        #openai {
            position: absolute;
            bottom: 10px;
            left: 10px;
            font-size: 14px;
            color: #fff;
        }
        #timer {
            position: absolute;
            top: 10px;
            right: 10px;
            font-size: 18px;
            color: #fff;
        }
    </style>
</head>
<body>
    <canvas id="gameCanvas" width="800" height="600"></canvas>
    <div id="instructions">
        <h1>Squirrel Finder</h1>
        <p>You are the Koala. Use arrow keys to move.</p>
        <p>Avoid the strawberries!</p>
        <p>Find the squirrel to win!</p>
        <p>Press any key to start.</p>
    </div>
    <div id="openai">openai</div>
    <div id="timer">0</div>
    <script>
        (function(){
            const canvas = document.getElementById('gameCanvas');
            const ctx = canvas.getContext('2d');

            let instructions = document.getElementById('instructions');
            let timerDisplay = document.getElementById('timer');

            let keys = {};

            let player = {
                x: canvas.width / 2 - 20,
                y: canvas.height / 2 - 20,
                width: 40,
                height: 40,
                speed: 7,
                color: '#00ff00' // Green
            };

            let strawberries = [];
            let squirrel = null;

            let gameTime = 0;
            let gameStarted = false;
            let gameOver = false;
            let gameWin = false;

            function startGame() {
                instructions.style.display = 'none';
                gameStarted = true;
                gameTime = 0;
                gameOver = false;
                gameWin = false;
                player.x = canvas.width / 2 - 20;
                player.y = canvas.height / 2 - 20;
                strawberries = [];
                squirrel = null;
                lastStrawberrySpawn = 0;
                lastTime = performance.now();
                requestAnimationFrame(gameLoop);
            }

            document.addEventListener('keydown', function(e) {
                if (!gameStarted) {
                    startGame();
                }
                keys[e.key] = true;
            });

            document.addEventListener('keyup', function(e) {
                keys[e.key] = false;
            });

            let lastTime = performance.now();
            let lastStrawberrySpawn = 0;

            function gameLoop(timestamp) {
                let deltaTime = (timestamp - lastTime) / 1000;
                lastTime = timestamp;

                update(deltaTime);
                render();

                if (!gameOver && !gameWin) {
                    requestAnimationFrame(gameLoop);
                } else {
                    setTimeout(startGame, 2000); // Restart game after 2 seconds
                }
            }

            function update(deltaTime) {
                if (!gameStarted) return;

                gameTime += deltaTime;
                timerDisplay.textContent = Math.floor(gameTime);

                // Move player
                if (keys['ArrowUp']) player.y -= player.speed;
                if (keys['ArrowDown']) player.y += player.speed;
                if (keys['ArrowLeft']) player.x -= player.speed;
                if (keys['ArrowRight']) player.x += player.speed;

                // Keep player on screen
                if (player.x < 0) player.x = 0;
                if (player.x + player.width > canvas.width) player.x = canvas.width - player.width;
                if (player.y < 0) player.y = 0;
                if (player.y + player.height > canvas.height) player.y = canvas.height - player.height;

                // Spawn strawberries
                if (gameTime - lastStrawberrySpawn > 1) {
                    lastStrawberrySpawn = gameTime;
                    spawnStrawberry();
                }

                // Spawn squirrel after 3 seconds
                if (gameTime >= 3 && squirrel === null) {
                    spawnSquirrel();
                }

                // Update strawberries
                strawberries.forEach(function(s) {
                    s.x += s.vx;
                    s.y += s.vy;

                    // Bounce off walls
                    if (s.x < 0 || s.x + s.width > canvas.width) s.vx *= -1;
                    if (s.y < 0 || s.y + s.height > canvas.height) s.vy *= -1;

                    // Check collision with player
                    if (collision(player, s)) {
                        gameOver = true;
                    }
                });

                // Update squirrel
                if (squirrel) {
                    squirrel.x += squirrel.vx;
                    squirrel.y += squirrel.vy;

                    // Bounce off walls
                    if (squirrel.x < 0 || squirrel.x + squirrel.width > canvas.width) squirrel.vx *= -1;
                    if (squirrel.y < 0 || squirrel.y + squirrel.height > canvas.height) squirrel.vy *= -1;

                    // Check collision with player
                    if (collision(player, squirrel)) {
                        gameWin = true;
                    }
                }
            }

            function render() {
                // Clear canvas
                ctx.fillStyle = '#000';
                ctx.fillRect(0, 0, canvas.width, canvas.height);

                // Draw player
                ctx.fillStyle = player.color;
                ctx.fillRect(player.x, player.y, player.width, player.height);

                // Draw strawberries
                strawberries.forEach(function(s) {
                    ctx.fillStyle = '#ff0000'; // Red
                    ctx.fillRect(s.x, s.y, s.width, s.height);
                });

                // Draw squirrel
                if (squirrel) {
                    ctx.fillStyle = '#ffff00'; // Yellow
                    ctx.fillRect(squirrel.x, squirrel.y, squirrel.width, squirrel.height);
                }

                // Display game over or win message
                if (gameOver) {
                    ctx.fillStyle = '#ff0000';
                    ctx.font = '48px Courier New';
                    ctx.fillText('You Died!', canvas.width / 2 - 100, canvas.height / 2);
                }

                if (gameWin) {
                    ctx.fillStyle = '#00ff00';
                    ctx.font = '48px Courier New';
                    ctx.fillText('You Win!', canvas.width / 2 - 100, canvas.height / 2);
                }
            }

            function spawnStrawberry() {
                let s = {
                    x: Math.random() * (canvas.width - 40),
                    y: Math.random() * (canvas.height - 40),
                    width: 40,
                    height: 40,
                    vx: (Math.random() * 4 + 1) * (Math.random() < 0.5 ? -1 : 1),
                    vy: (Math.random() * 4 + 1) * (Math.random() < 0.5 ? -1 : 1)
                };
                strawberries.push(s);
            }

            function spawnSquirrel() {
                squirrel = {
                    x: Math.random() * (canvas.width - 40),
                    y: Math.random() * (canvas.height - 40),
                    width: 40,
                    height: 40,
                    vx: (Math.random() * 3 + 2) * (Math.random() < 0.5 ? -1 : 1),
                    vy: (Math.random() * 3 + 2) * (Math.random() < 0.5 ? -1 : 1)
                };
            }

            function collision(a, b) {
                return a.x < b.x + b.width && a.x + a.width > b.x &&
                       a.y < b.y + b.height && a.y + a.height > b.y;
            }

        })();
    </script>
</body>
</html>

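Sonnet 3.5: the generated code also works correctly.

The interaction in Claude AI is noticeably better, because the result can be previewed right away.

Comparing the two programs, Sonnet 3.5's result is the better one.

Both implementations meet the core requirements, but Code 2 (Sonnet 3.5) has a slight edge overall.

It has a more organized, object-oriented structure, which makes it easier to maintain and extend.
It uses subtler colors that better suit a retro style.
It uses classes to represent game objects, which promotes code reuse.

However, Code 1 has some performance advantages because it uses requestAnimationFrame.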
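
To make the point about classes concrete, here is a small sketch of what a class-based game object could look like. It is not the code either model produced: the class name `Bouncer` and its methods are hypothetical, and the movement and collision rules are taken from the o1 listing above.

```javascript
// Hypothetical class-based game object; not the code either model produced.
class Bouncer {
  constructor(x, y, vx, vy, size = 40) {
    Object.assign(this, { x, y, vx, vy, width: size, height: size });
  }

  // Advance one frame and reverse direction at the edges of the play field.
  update(fieldWidth, fieldHeight) {
    this.x += this.vx;
    this.y += this.vy;
    if (this.x < 0 || this.x + this.width > fieldWidth) this.vx *= -1;
    if (this.y < 0 || this.y + this.height > fieldHeight) this.vy *= -1;
  }

  // Axis-aligned bounding-box overlap test against any { x, y, width, height }.
  hits(other) {
    return this.x < other.x + other.width && this.x + this.width > other.x &&
           this.y < other.y + other.height && this.y + this.height > other.y;
  }
}

// Strawberries and the squirrel share this behavior, so one class covers both.
const strawberry = new Bouncer(770, 100, 5, 0);
strawberry.update(800, 600); // x becomes 775; the right edge is crossed, so vx flips
const koala = { x: 760, y: 90, width: 40, height: 40 };
console.log(strawberry.vx, strawberry.hits(koala));
```

Compared with the plain object literals in the o1 listing, the movement and collision logic lives in one place instead of being repeated for each kind of object.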

Example 3: Reasoning

Problem: How many r's in are in strawberry

GPT-4o: incorrect

There is 1 letter "r" in "strawberry".

Sonnet 3.5: incorrect

There are 2 r's in "strawberry".
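
Both answers above are wrong: "strawberry" contains three r's. A direct character count settles it; the helper name `countLetter` is just an illustrative choice.

```javascript
// Count how many times `letter` occurs in `word` (case-insensitive).
function countLetter(word, letter) {
  const target = letter.toLowerCase();
  return [...word.toLowerCase()].filter((ch) => ch === target).length;
}

console.log(countLetter("strawberry", "r")); // 3
```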

Example 4: Translating a corrupted sentence

To be added

References

Introducing OpenAI o1-preview

OpenAI post

SuperCLUE test results
