
What is generative AI?

Generative AI is artificial intelligence that creates new content from data: it learns from existing data and produces new images, text, music, and other kinds of content.

The definition alone may not mean much, but you probably know OpenAI's ChatGPT. ChatGPT is a generative AI focused on text generation; it has been trained on a vast amount of data and can hold a natural conversation with the user. One caveat: ChatGPT's answers are not 100% correct, so you need to be able to spot hallucinations (fabricated information). In other words, use it with healthy skepticism rather than blind trust.

These days not only large companies but also startups are shipping services with generative AI built in. I was especially impressed by a 3D product manufacturer that recently launched a generative AI service based on its own technology and manuals: it reduces how often users need technical support, and it raises trust by showing answers together with the relevant manual information. Now, running into a chatbot that only gives canned answers feels underwhelming.

In this post, I will use LLamaSharp, a C# library, to ask a large language model (LLM) questions and get answers. (I would actually like to train my own LLM on specific data, but sadly I lack the deep knowledge and the high-performance hardware to attempt it.)

Introducing LLamaSharp

LLamaSharp is a cross-platform C#/.NET library, built on llama.cpp, for running Meta's LLaMA and similar models on a local machine. You can choose the model and whether to run on the CPU or the GPU, and compare how each combination performs.

GitHub - SciSharp/LLamaSharp: A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.

 


In the demo video below, the user asks a question (green) and receives an answer (white). The request here is for code generation, and the model handles it well.

LLaMa demo

์•„๋ž˜๋Š” ์‹œ๊ฐ์  ์ •๋ณด๋ฅผ ์ œ๊ณตํ•˜๋Š” LLaVa์˜ ๋™์ž‘ ๋ฐ๋ชจ ์˜์ƒ์ž…๋‹ˆ๋‹ค.

LLaVa demo

Step by step

1. Install the CUDA 12 SDK

If you want LLamaSharp to use CUDA, you need the CUDA 11 or CUDA 12 SDK. I installed CUDA Toolkit 12.4.

CUDA Toolkit 12.4 installed

2. Create the project

I created a console project (.NET 8.0) in Visual Studio 2022.

Project creation screen in Visual Studio 2022 (.NET 8.0)

3. Install the NuGet packages

In the NuGet package manager, search for LLamaSharp and install the following:

  • LLamaSharp
  • LLamaSharp.Backend.Cpu (when running on the CPU)
  • LLamaSharp.Backend.Cuda11 or LLamaSharp.Backend.Cuda12 (when running on the GPU; pick the one that matches your CUDA version)
  • Spectre.Console

NuGet search results for LLamaSharp
NuGet search results for Spectre

4. Prepare a model file

LLM model files are commonly distributed in PyTorch format (.pth) or Hugging Face format (.bin). LLamaSharp uses GGUF files, which can be converted from either of those formats. The following describes how to get a GGUF file.

4-1. Get it from Hugging Face

Hugging Face is a site that hosts a wide variety of openly shared AI models. Search the site for 'model name + GGUF' to find a file to download.

Hugging Face – The AI community building the future.

 


Hugging Face page for Bllossom 8B

4-2. Convert it yourself

You can also convert a PyTorch or Hugging Face model file to GGUF yourself. The guide at the link below describes how to do the conversion with a Python script.

GitHub - ggerganov/llama.cpp: LLM inference in C/C++

 


That said, if you are as unfamiliar with this conversion process as I am, you are unfortunately limited to the GGUF files already available on Hugging Face.
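Whichever way you get the file, it can be worth confirming that it really is GGUF before handing it to the library. A GGUF file begins with the four ASCII bytes "GGUF", so a quick header check is enough to catch a .pth or .bin file that was downloaded by mistake. The sketch below is only an illustration: the class name GgufCheck and the method LooksLikeGguf are hypothetical helpers of mine, not part of LLamaSharp, and the check says nothing about whether the rest of the file is valid.

```csharp
using System;
using System.IO;

public static class GgufCheck
{
    // GGUF files start with the ASCII bytes 'G', 'G', 'U', 'F'.
    private static readonly byte[] Magic = { 0x47, 0x47, 0x55, 0x46 };

    // Hypothetical helper: returns true only if the file exists and
    // its first four bytes match the GGUF magic number.
    public static bool LooksLikeGguf(string path)
    {
        if (!File.Exists(path))
        {
            return false;
        }

        using var stream = File.OpenRead(path);
        var header = new byte[Magic.Length];
        int read = stream.Read(header, 0, header.Length);

        return read == Magic.Length && header.AsSpan().SequenceEqual(Magic);
    }
}
```

In the main program this could be called right after the model path is known, for example by printing an error and returning early when GgufCheck.LooksLikeGguf(modelPath) is false.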

5. Write the code

I prepared the main code below based on the basic example code that ships with LLamaSharp.

using LLama;
using LLama.Common;
using LLama.Examples;

public class ChatSessionWithRoleName
{
    public static async Task Main()
    {
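        // UserSettings comes from the LLama.Examples project; replace this
        // call with the path to your own .gguf file if you are not using it.
        string modelPath = UserSettings.GetModelPath();

        var parameters = new ModelParams(modelPath)
        {
            ContextSize = 1024,   // context window, in tokens
            GpuLayerCount = 10,   // number of model layers offloaded to the GPU
            MainGpu = 0           // index of the GPU to use
        };
        // Load the weights, create an inference context, and wrap it in an executor.
        using var model = LLamaWeights.LoadFromFile(parameters);
        using var context = model.CreateContext(parameters);
        var executor = new InteractiveExecutor(context);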

        ChatSession session = new(executor);

        InferenceParams inferenceParams = new InferenceParams()
        {
            AntiPrompts = new List<string> { "User:" }
        };

        Console.ForegroundColor = ConsoleColor.Yellow;
        Console.WriteLine("The chat session has started.");

        // read the user's first message (typed in green)
        Console.ForegroundColor = ConsoleColor.Green;
        string userInput = Console.ReadLine() ?? "";

        while (userInput != "exit")
        {
            await foreach (
                var text
                in session.ChatAsync(
                    new ChatHistory.Message(AuthorRole.User, userInput),
                    inferenceParams))
            {
                Console.ForegroundColor = ConsoleColor.White;
                Console.Write(text);
            }

            Console.ForegroundColor = ConsoleColor.Green;
            userInput = Console.ReadLine() ?? "";

            Console.ForegroundColor = ConsoleColor.White;
        }
    }
}

Parameters such as ContextSize and GpuLayerCount can be tuned, but I do not know them well enough, so I kept the basic values shown above. For the full code, see the attached project file below.

ExamLLaMaSharp.zip

Running it

I downloaded the llama-3-Korean-Bllossom-8B-Q4_K_M.gguf model from Hugging Face and tried chatting with it. I asked it to explain the rules of soccer; it gave plausible answers at first, but at some point it started producing strange output.

Screenshot of the chat session

I suspect this is less a problem with the model than with the parameters I used.
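One setting that may be worth checking first: the InferenceParams in step 5 sets only AntiPrompts, so nothing limits how long a single answer can run, and the "User:" anti-prompt only stops generation if the model actually emits that text. A minimal sketch of adding a length cap is shown below. It assumes a LLamaSharp version whose InferenceParams exposes MaxTokens alongside AntiPrompts; the class ChatDefaults, the method CreateInferenceParams, and the value 256 are illustrative choices of mine, not values from the library's examples, and I have not verified that this alone fixes the output shown above.

```csharp
using System.Collections.Generic;
using LLama.Common;

public static class ChatDefaults
{
    // Hypothetical helper: builds inference settings with a hard cap
    // on the number of tokens generated per answer.
    public static InferenceParams CreateInferenceParams()
    {
        return new InferenceParams
        {
            // Stop after at most 256 generated tokens (illustrative value),
            // even if no anti-prompt is ever produced.
            MaxTokens = 256,

            // Same anti-prompt as the main code: stop when the model
            // starts writing the next user turn.
            AntiPrompts = new List<string> { "User:" }
        };
    }
}
```

In the main code, the existing initializer could then be replaced with InferenceParams inferenceParams = ChatDefaults.CreateInferenceParams(); and the cap raised or lowered depending on how long the answers need to be.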

Wrapping up

I wanted to see what running an LLM on a local PC is like. I downloaded and tested several models, but because of limited hardware and my unfamiliarity with tuning, I did not get the results I wanted. For some reason, the answers would not stop and just kept producing strange content.

There is something I would personally like to build, but from finding a suitable model to tuning parameters and preparing hardware, there seems to be a lot to think about.

Next time, I plan to look at whether a service such as the OpenAI API offers an easier and faster way in.
