[NLP] Retrieval-Augmented Generation for Large Language Models: A Survey

2025. 12. 11. 23:47 · 👨‍💻 About AI / Paper Review

1. Introduction

 

Large language models (LLMs) show remarkable capabilities, but they still carry critical weaknesses. In particular, they do not know recent information absent from their training data, and in applied settings they cannot consult internal company data. This knowledge cut-off problem in turn induces hallucination, where the LLM states non-facts as though they were facts. The opacity of the reasoning behind a model's answers is another frequently noted drawback.

 

RAG (Retrieval-Augmented Generation) emerged to address exactly these problems. Instead of relying only on its internal parametric knowledge, a RAG system has the LLM fetch relevant knowledge from an external database and generate its answer grounded in that material. This raises answer accuracy and reliability and makes continuous knowledge updates possible. The paper presents a technology tree that shows at a glance how RAG research has developed.

 

Fig. 1. Technology tree of RAG research

 

 

As the figure above (Fig. 1) shows, the trajectory of RAG research has evolved through three broad stages.

  • Pre-training: early on, with the advent of the Transformer architecture, research centered on injecting additional knowledge into pre-trained models (PTMs).
  • Inference: after ChatGPT, research exploiting LLMs' strong in-context learning (ICL) ability to supply better information at inference time grew explosively.
  • Fine-tuning: most recently, work has moved beyond simply consuming retrieved information toward combining the RAG process itself with LLM fine-tuning to maximize performance.

 

2. Overview of RAG

 

 

1. The RAG process

 

A typical RAG pipeline retrieves relevant document chunks from an external database for the user's question, combines them with the question, and has the LLM generate the answer. This lets the LLM reflect up-to-date information absent from its training data, reduce hallucination, and raise the reliability of its answers.

 
Fig. 2. A representative instance of the RAG process applied to question answering
 

 

2. Evolution of the RAG paradigm

 

A. Naive RAG (Retrieve-Read Framework)

 

Naive RAG is the earliest research paradigm, widely adopted right after ChatGPT appeared, and it follows the traditional "Retrieve-Read" framework.

  • Process:
    1. Indexing: extract text from documents in various formats (PDF, HTML, etc.), split it into chunks, convert each chunk into a vector with an embedding model, and store the vectors in a vector database.
    2. Retrieval: vectorize the user's question, compute its similarity against the chunks in the database, and fetch the top-K most relevant documents.
    3. Generation: build a prompt combining the retrieved documents with the question, and feed it to the LLM to generate the final answer.
  • Limitations: the paper points out Naive RAG's weaknesses concretely along three axes.
    • Retrieval challenges: low precision and recall, so irrelevant documents get fetched or important information is missed.
    • Generation difficulties: hallucination persists when the model answers with fabricated content not grounded in the retrieved information, and bias or toxicity can also surface.
    • Augmentation hurdles: when retrieved passages conflict or overlap, the system fails to integrate them smoothly, yielding disjointed or needlessly repetitive answers.
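The three-step process above can be sketched as a minimal, dependency-free Python pipeline. The bag-of-words "embedding", the toy corpus, and the prompt template are illustrative stand-ins, not components from the paper:

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; a real system uses a trained embedding model.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. Indexing: chunk the documents and store their vectors.
chunks = [
    "RAG retrieves external documents to ground LLM answers.",
    "Transformers use self-attention over token sequences.",
    "Vector databases store embeddings for similarity search.",
]
index = [(c, embed(c)) for c in chunks]

def retrieve(query, k=2):
    # 2. Retrieval: rank stored chunks by similarity to the query vector.
    qv = embed(query)
    ranked = sorted(index, key=lambda item: cosine(qv, item[1]), reverse=True)
    return [c for c, _ in ranked[:k]]

def build_prompt(query, docs):
    # 3. Generation: the prompt fed to the LLM combines context and question.
    context = "\n".join(f"- {d}" for d in docs)
    return f"Answer using the context.\nContext:\n{context}\nQuestion: {query}"

question = "How does RAG ground answers in external documents?"
prompt = build_prompt(question, retrieve(question))
```

In a real system the only changes are swapping `embed` for a trained model, the list for a vector database, and passing `prompt` to an LLM.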

B. Advanced RAG: Optimization for Quality

 

 

Advanced RAG introduces strategies to optimize the pre-retrieval and post-retrieval stages, addressing Naive RAG's retrieval-quality and generation-quality problems.

  • Pre-retrieval process:
    • Indexing optimization: raise data quality with sliding-window chunking or finer-grained segmentation, and attach metadata to improve the index structure.
    • Query optimization: rather than using the user's question verbatim, polish it into a retrieval-friendly form through query rewriting, transformation, and expansion.
  • Post-retrieval process:
    • Rerank: reorder the retrieved documents so the most relevant information sits at the edges (front or back) of the prompt. This counters 'Lost in the middle', where LLMs forget information buried in the middle of a long context.
    • Context compression: stuffing everything in overloads the model, so irrelevant content is removed and only compressed key information is passed to the LLM.
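The "edges" placement that reranking aims for can be sketched as a small ordering function. This is a sketch of the placement idea only; real rerankers first score documents with a trained model:

```python
def order_for_edges(docs_with_scores):
    # Counter 'Lost in the middle': odd ranks fill the front of the context,
    # even ranks fill the back (reversed), so the least relevant documents
    # end up buried in the middle where the LLM attends least.
    ranked = sorted(docs_with_scores, key=lambda d: d[1], reverse=True)
    front, back = [], []
    for i, (doc, _) in enumerate(ranked):
        (front if i % 2 == 0 else back).append(doc)
    return front + back[::-1]

scored = [("d1", 0.9), ("d2", 0.7), ("d3", 0.5), ("d4", 0.3), ("d5", 0.1)]
order = order_for_edges(scored)  # ['d1', 'd3', 'd5', 'd4', 'd2']
```

The two highest-scored documents land at the first and last positions; the lowest lands in the middle.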

Fig. 3. Comparison between the three paradigms of RAG

 

 

C. Modular RAG: Flexibility and Versatility

 

 

Modular RAG breaks away from the earlier linear structure; it is the most advanced form, assembling diverse functional modules like building blocks.

  • New Modules: modules performing specialized functions beyond plain retrieval have been added.
    • Search Module: performs retrieval directly via search engines or code, tailored to specific scenarios (databases, knowledge graphs, etc.).
    • RAG-Fusion: expands the user query into multiple perspectives (multi-query), retrieves for each, then fuses the results to secure answer diversity.
    • Memory Module: exploits the LLM's own memory to guide retrieval, building an unbounded memory pool whose knowledge is continuously updated.
    • Routing: selects the optimal path per question, e.g., whether summarization or a specific database search is needed.
    • Predict: has the LLM generate context directly instead of retrieving, reducing redundancy and noise.
  • New Patterns: the flow can be flexible rather than a fixed Retrieve -> Read order.
    • Rewrite-Retrieve-Read: rewrite the query before retrieving.
    • Generate-Read: lead with the LLM's generation ability instead of retrieval.
    • Hybrid Retrieval: mix keyword search with semantic search.
    • DSP (Demonstrate-Search-Predict): a framework that demonstrates examples, searches, and predicts to strengthen in-context learning.
    • Iterative & Adaptive: alternate retrieval and reading as in ITER-RETGEN, or, as in Self-RAG, let the model itself judge the moment retrieval is needed (adaptive), maximizing efficiency.
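RAG-Fusion's merging step is commonly implemented with reciprocal rank fusion (RRF). A minimal sketch, assuming three ranked lists already produced by three hypothetical query variants of one user question:

```python
def reciprocal_rank_fusion(rankings, k=60):
    # rankings: one ranked list of doc ids per expanded query.
    # RRF score: sum over queries of 1 / (k + rank); k=60 is the usual constant,
    # damping the influence of any single query's top hit.
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Three hypothetical per-query rankings over the same corpus.
fused = reciprocal_rank_fusion([
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a", "doc_d"],
    ["doc_b", "doc_c", "doc_a"],
])
```

Because `doc_b` ranks high consistently across queries, it fuses to the top even though `doc_a` wins one individual ranking.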

 

It is interesting that, where early RAG was a straight-line pipeline, Modular RAG has transformed into an ecosystem assembled like Lego blocks. This suggests RAG is evolving beyond a single technique into a full architecture.

 

 

3. RAG vs Fine-tuning

"Should we use RAG, or should we fine-tune?" is a constant dilemma in practice. By analogy:

  • RAG: like handing the model a tailored textbook and letting it look things up, so it excels at precise information retrieval. It also holds a major efficiency advantage, avoiding the compute, data, and time costs of parameter tuning.
  • Fine-tuning: like a student internalizing knowledge before an exam, so it is better for learning a particular format or style. It trades away some of the base LLM's generality in exchange for deeper specialization on a given task.

In conclusion, the two techniques are complementary rather than mutually exclusive, and for optimal performance it is reasonable to consider using them together.

 

 

 

3. Retrieval

 

 

RAG์˜ ์„ฑ๋Šฅ์€ ๊ฒฐ๊ตญ "์–ผ๋งˆ๋‚˜ ๊ด€๋ จ ์žˆ๋Š” ๋ฌธ์„œ๋ฅผ ์ •ํ™•ํ•˜๊ฒŒ ์ฐพ์•„์˜ค๋А๋ƒ"์— ๋‹ฌ๋ ค์žˆ๋‹ค. ์ด ์„น์…˜์—์„œ๋Š” ๊ฒ€์ƒ‰ ํ’ˆ์งˆ์„ ๋†’์ด๊ธฐ ์œ„ํ•œ ๋‹ค์–‘ํ•œ ์‹œ๋„๋“ค์„ ๋‹ค๋ฃฌ๋‹ค. ๊ฒ€์ƒ‰ ์†Œ์Šค์˜ ํ™•์žฅ๋ถ€ํ„ฐ ์ธ๋ฑ์‹ฑ, ์ฟผ๋ฆฌ ์ตœ์ ํ™”, ๊ทธ๋ฆฌ๊ณ  ์ž„๋ฒ ๋”ฉ ๋ชจ๋ธ์˜ ํŠœ๋‹๊นŒ์ง€, '์›ํ•˜๋Š” ์ •๋ณด๋ฅผ ์ •ํ™•ํžˆ ์ฐพ์•„์˜ค๊ธฐ ์œ„ํ•œ' ๋ชจ๋“  ๊ธฐ์ˆ ์  ์‹œ๋„๋“ค์ด ์ด์–ด์ ธ์™”๋‹ค.

 

1. Retrieval Source: the form and granularity of the data being searched have an enormous impact on RAG performance.

  • Semi-structured data: in data mixing text and tables, such as PDFs, tables break apart during text splitting. Proposed remedies convert tables to text, or borrow the LLM's coding ability and handle them via Text-to-SQL.
  • Structured data: knowledge graphs (KGs) are highly useful because they supply verified information. Work such as KnowledGPT and G-Retriever extracts precise factual relations from KGs to curb LLM hallucination.
  • LLMs-generated content: conversely, some work uses knowledge the LLM itself generates, rather than external data, as the retrieval source. GenRead replaces the retriever with an LLM generator to produce context, and Selfmem stores generated answers back into a memory pool to strengthen itself.
  • Expanding data structures: unstructured text such as Wikipedia dominated at first, but the range of sources keeps widening.
  • Retrieval granularity: deciding how finely to slice data for retrieval matters greatly. Units range over tokens, phrases, sentences, and documents, and recently the 'proposition' has drawn attention. DenseX, which proposed proposition-level chunks, splits text into propositions, minimal units each carrying a distinct fact, to raise retrieval precision.

2. ์ธ๋ฑ์‹ฑ ์ตœ์ ํ™” (Indexing Optimization) ๋ฌธ์„œ๋ฅผ ๊ฒ€์ƒ‰ ๊ฐ€๋Šฅํ•œ ํ˜•ํƒœ๋กœ ์ €์žฅํ•˜๋Š” ์ธ๋ฑ์‹ฑ ๋‹จ๊ณ„๋Š” ๊ฒ€์ƒ‰ ํ’ˆ์งˆ์„ ์ขŒ์šฐํ•˜๋Š” '๊ธฐ์ดˆ ๊ณต์‚ฌ'์ž…๋‹ˆ๋‹ค.

  • Chunking strategy: naively cutting documents at a fixed size (100 or 500 characters, say) risks severing context. Sliding windows, or embedding-based semantic chunkers, compensate for this. Small2Big, meanwhile, retrieves at a small unit (the sentence) but hands the LLM the larger context containing that sentence (big), capturing precision and contextual understanding at once. In other words, it decouples the data used for retrieval from the data used for reasoning.
  • Metadata attachments: tag chunks with filename, author, timestamps, and so on for filtering. The Reverse HyDE technique is especially interesting: an LLM generates, from each document, hypothetical questions the document could answer and stores them as metadata. When a user later asks a question, it matches against these hypothetical questions, raising the retrieval hit rate.
  • Structural index: store documents in a parent-child hierarchy, or use a knowledge graph (KG) to preserve connections between documents. This helps the LLM grasp the structural context of information rather than isolated fragments.
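The Small2Big idea (retrieve small, read big) reduces to keeping an index from each small unit back to its larger parent window. A toy sketch with sentences as the small unit; the sentence list and window size are illustrative:

```python
# Small units are indexed for retrieval; the parent window is what the LLM reads.
sentences = [
    "RAG has three steps.",
    "Indexing stores chunk embeddings.",
    "Retrieval finds similar chunks.",
    "Generation writes the final answer.",
]

def small2big(hit_idx, window=1):
    # The sentence at hit_idx matched the query; hand the LLM its
    # neighbors too, so precision of matching and breadth of context coexist.
    lo = max(0, hit_idx - window)
    hi = min(len(sentences), hit_idx + window + 1)
    return " ".join(sentences[lo:hi])

# Suppose retrieval matched sentence 2, "Retrieval finds similar chunks."
context = small2big(2)
```

The query only had to match one short sentence, yet the LLM receives the surrounding explanation as well.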

3. Query Optimization: Naive RAG's biggest pitfall is using the user's imperfect question verbatim for retrieval.

  • Query expansion: used when a single question lacks the detailed context and related information needed.
    • Multi-Query: use an LLM to expand one question into several questions from different angles and search them in parallel.
    • Sub-Query: decompose a complex question, via least-to-most prompting, into solvable sub-questions and retrieve step by step.
    • Chain-of-Verification (CoVe): validates the expanded queries with the LLM to reduce hallucination.
  • Query transformation: digs into the essence of the question.
    • Query rewrite: prompt the LLM to rewrite the question into a retrieval-friendly form; models such as RRR and BEQUE take this approach.
    • HyDE (Hypothetical Document Embeddings): have the LLM first draft a hypothetical answer to the question, then retrieve real documents similar to that draft. A clever idea exploiting the fact that answer-to-document distance is far smaller than question-to-document distance.
    • Step-back prompting: abstract the concrete question into a higher-level one, steering retrieval toward more comprehensive background knowledge.
  • Query routing: acts as a traffic controller, directing each question to a different data source or pipeline depending on its nature.

4. Embedding: retrieval is ultimately a contest of vector similarity.

  • Hybrid retrieval: mix sparse models strong at keyword matching (e.g., BM25) with dense models that capture semantic context (e.g., BERT). Sparse models handle rare words and technical terms absent from training data, complementing the dense model's weakness.
  • Fine-tuning embedding models: in specialized domains such as medicine or law, general-purpose embedding models are likely to degrade sharply. Work such as PROMPTAGATOR and LLM-Embedder uses LLMs to generate or label training data, enabling domain-specific embedding models even from little data. REPLUG in particular proposed training the retriever with the LLM as supervisor.
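One common way to combine the two score types is to normalize each retriever's scores and take a weighted sum. The alpha weight and the example scores below are arbitrary, chosen only to illustrate the mechanics:

```python
def normalize(scores):
    # Min-max scale so sparse (e.g. BM25) and dense (e.g. cosine) scores
    # become comparable before mixing; their raw ranges differ wildly.
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {d: (s - lo) / span for d, s in scores.items()}

def hybrid_scores(sparse, dense, alpha=0.5):
    # alpha weights the keyword (sparse) side against the semantic (dense) side.
    sp, de = normalize(sparse), normalize(dense)
    return {d: alpha * sp.get(d, 0.0) + (1 - alpha) * de.get(d, 0.0)
            for d in set(sp) | set(de)}

sparse = {"d1": 12.0, "d2": 3.0, "d3": 0.5}   # e.g. BM25 scores
dense = {"d1": 0.2, "d2": 0.9, "d3": 0.8}     # e.g. cosine similarities
fused = hybrid_scores(sparse, dense)
ranked = sorted(fused, key=fused.get, reverse=True)
```

Here `d2` wins overall despite topping neither list alone, which is the behavior hybrid retrieval is after: a balance of keyword and semantic evidence.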

5. Adapter: finally, the method of choice when fine-tuning the whole model is impractical. As with LoRA-style fine-tuning, an external adapter is attached to help align the LLM with the retriever. UPRISE proposed a lightweight retriever that automatically fetches prompts suited to zero-shot tasks, and PKG goes further, skipping the retrieval step entirely and directly generating documents matched to the query.

 

 

4. Generation

 

 

Retrieving the documents well is not the end of the story. Processing the information into a form conducive to LLM reasoning elicits generation that is more accurate and better aligned with intent.

 

1. Context Curation: dumping the retrieved documents in wholesale is no skill. Redundant information clouds LLM reasoning, and an overly long context causes key information to be missed.

  1. Reranking: like humans, LLMs concentrate on the beginning and end of a long text and forget the middle, the 'Lost in the middle' phenomenon. Retrieved documents therefore need re-sorting so the most important information lands at the prompt's edges (front or back). Simple rules (diversity, relevance, etc.) can be used, but recent systems reorder more precisely with BERT-based encoder-decoder models or specialized rerankers such as Cohere rerank and bge-reranker.
  2. Context Selection/Compression: "the more relevant documents, the better" is likely false, because excessive context induces noise.
    • LLMLingua: uses small language models such as GPT-2 Small or LLaMA-7B to delete unnecessary tokens, compressing the prompt to a level humans find hard to read but LLMs can still understand. This dramatically shortens prompts without any extra LLM training.
    • Filter-Reranker: builds a pipeline using a small LM as a filter to screen out easy documents and the LLM as a reranker to rearrange the hard ones, raising efficiency.
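Context selection under a token budget can be sketched as greedy selection by relevance score. The crude whitespace token count stands in for a real tokenizer, and this is only the selection half of the idea; LLMLingua's token-level compression is far more involved:

```python
def compress_context(scored_docs, budget_tokens):
    # Greedy selection: keep the highest-scored documents until the token
    # budget is exhausted, then restore the original reading order.
    chosen, used = [], 0
    for i, (doc, score) in sorted(enumerate(scored_docs),
                                  key=lambda it: it[1][1], reverse=True):
        n = len(doc.split())  # crude token count, good enough for the sketch
        if used + n <= budget_tokens:
            chosen.append((i, doc))
            used += n
    return [doc for _, doc in sorted(chosen)]

docs = [
    ("alpha beta gamma", 0.2),
    ("delta epsilon", 0.9),
    ("zeta eta theta iota", 0.6),
]
kept = compress_context(docs, budget_tokens=6)
```

The low-scored first document is dropped to stay under budget, while the kept documents come back in their original order so the context still reads coherently.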

2. LLM Fine-tuning: the strategy of tuning the generator itself to maximize the RAG system's performance.

  1. Domain and format optimization: useful when data for a specific domain is scarce, or the output must follow a special format (JSON output, a particular tone, etc.).
    • SANTA: effective when handling structured data. It runs a three-stage training process to capture the structural and semantic nuances between the retriever and the generator.
  2. Alignment: the process of fitting the LLM's output to human preference or to the retriever's characteristics.
    • RLHF (reinforcement learning from human feedback): humans rate the generated answers, or relevance to the retrieved documents is scored, and reinforcement learning is run on that signal.
    • Distillation: take the outputs of a stronger model (GPT-4 or beyond) as training data to tune a smaller model, raising cost efficiency.
  3. Collaborative fine-tuning: instead of tuning retriever and generator separately, tune them together so they stay in step.
    • RA-DIT: aligns the scoring functions of retriever and generator using KL divergence (Kullback-Leibler divergence). That is, they exchange feedback during training so the retriever learns to fetch the documents the generator prefers.
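The KL-divergence alignment RA-DIT describes can be illustrated with plain probability arithmetic. The document scores below are made up; the point is only the quantity that joint training would push toward zero:

```python
import math

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def kl(p, q):
    # KL(p || q): how far the retriever's distribution q over candidate
    # documents sits from the generator-preference distribution p.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical scores over the same 3 candidate documents.
retriever_scores = [2.0, 0.5, 0.1]  # retriever similarity scores
lm_scores = [0.3, 1.8, 0.2]         # how much each doc helps the LM answer

q = softmax(retriever_scores)  # retriever's document distribution
p = softmax(lm_scores)         # generator's preferred distribution
gap = kl(p, q)                 # joint training minimizes this w.r.t. the retriever
```

Here the retriever favors document 1 while the generator benefits most from document 2, so the gap is large; gradient steps on the retriever's scores would shrink it.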

5. Augmentation Process in RAG

 

 

Standard RAG goes through a single, simple step: retrieve once, then generate once. But for problems requiring complex reasoning or multi-step knowledge, this is often insufficient. Recent work therefore optimizes the retrieval process along three lines, iterative, recursive, and adaptive, and this direction is drawing considerable attention.

 

A. Iterative Retrieval

 

 

Iterative retrieval searches the knowledge base multiple times while the LLM generates its answer, enriching the context.

  • ์ž‘๋™ ๋ฐฉ์‹: ์ดˆ๊ธฐ ์ฟผ๋ฆฌ๋กœ ๊ฒ€์ƒ‰์„ ํ•˜๊ณ , ๊ทธ ๊ฒฐ๊ณผ์™€ ํ˜„์žฌ๊นŒ์ง€ ์ƒ์„ฑ๋œ ํ…์ŠคํŠธ๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ๋‹ค์‹œ ๊ฒ€์ƒ‰์„ ์ˆ˜ํ–‰ํ•œ๋‹ค. ์ด ๊ณผ์ •์„ ๋ฐ˜๋ณตํ•˜๋ฉฐ ์ง€์‹์„ ์ ์ง„์ ์œผ๋กœ ๊ตฌ์ฒดํ™”ํ•œ๋‹ค.
  • ์žฅ์ : ํ•œ ๋ฒˆ์˜ ๊ฒ€์ƒ‰์œผ๋กœ๋Š” ๋†“์น  ์ˆ˜ ์žˆ๋Š” ์ถ”๊ฐ€์ ์ธ ๋ฌธ๋งฅ ์ •๋ณด๋ฅผ ์ œ๊ณตํ•˜์—ฌ ๋‹ต๋ณ€์˜ ๊ฒฌ๊ณ ์„ฑ(Robustness)์„ ๋†’์ธ๋‹ค.
  • ๋Œ€ํ‘œ ์—ฐ๊ตฌ: ITER-RETGEN์€ "๊ฒ€์ƒ‰์ด ์ƒ์„ฑ์„ ๋•๊ณ , ์ƒ์„ฑ์ด ๋‹ค์‹œ ๊ฒ€์ƒ‰์„ ๋•๋Š”" ์‹œ๋„ˆ์ง€ ํšจ๊ณผ๋ฅผ ์–ป๊ณ ์ž ํ•œ๋‹ค. ์ƒ์„ฑ๋œ ๋‚ด์šฉ์ด ๋‹ค์Œ ๊ฒ€์ƒ‰์˜ ๋ฌธ๋งฅ์ด ๋˜์–ด ๋” ๊ด€๋ จ์„ฑ ๋†’์€ ์ •๋ณด๋ฅผ ์ฐพ์•„์˜ค๋Š” ์„ ์ˆœํ™˜ ๊ตฌ์กฐ๋ฅผ ๋งŒ๋“ ๋‹ค.
  • ํ•œ๊ณ„: ๋ฐ˜๋ณต ๊ณผ์ •์—์„œ ์˜๋ฏธ๊ฐ€ ๋Š๊ธฐ๊ฑฐ๋‚˜(Semantic discontinuity), ๋ถˆํ•„์š”ํ•œ ์ •๋ณด๊ฐ€ ๋ˆ„์ ๋  ์œ„ํ—˜์ด ์žˆ๋‹ค.
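The retrieve-generate loop of iterative methods like ITER-RETGEN reduces to a few lines once the retriever and generator are stubbed out. Both stubs below are placeholders, not real models:

```python
def retrieve(query):
    # Stub retriever: a real system would search a knowledge base here.
    return f"<doc about: {query}>"

def generate(question, context):
    # Stub generator: a real system would call an LLM here.
    return f"partial answer using {context}"

def iter_retgen(question, iterations=3):
    answer = ""
    for _ in range(iterations):
        # Each round retrieves with question + generation-so-far, so
        # generation guides retrieval and retrieval guides generation.
        docs = retrieve(f"{question} {answer}".strip())
        answer = generate(question, docs)
    return answer

result = iter_retgen("Who advised the inventor of X?")
```

With real models, the answer fragments would carry new entity names into each retrieval round, which is where multi-hop questions benefit.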

 

B. Recursive Retrieval

 

Recursive retrieval progressively concretizes the query based on retrieval results, or decomposes a complex problem into small units and drills down.

  • Query refinement: when the initial retrieval results are unsatisfactory, use them as feedback to revise the query and search again. This resembles a user adjusting search terms in a search engine until the desired result appears.
  • Representative work:
    • IRCoT (Interleaving Retrieval with Chain-of-Thought): uses chain-of-thought reasoning to guide retrieval, then refines the CoT with what is retrieved.
    • ToC (Tree of Clarifications): given an ambiguous question, generates a 'clarification tree' to concretize and optimize the question's intent.
  • Structural use: using a hierarchical index, first retrieving document summaries and then descending into specific sections, also belongs here. Multi-hop retrieval over a knowledge graph, following link after link, is likewise a form of recursive retrieval.

 

Fig. 5. In addition to the most common once retrieval, RAG also includes three types of retrieval augmentation processes.

 

 

C. ์ ์‘ํ˜• ๊ฒ€์ƒ‰ (Adaptive Retrieval)

 

 

์ ์‘ํ˜• ๊ฒ€์ƒ‰์€ RAG ์‹œ์Šคํ…œ์ด "์–ธ์ œ ๊ฒ€์ƒ‰ํ• ์ง€" ํ˜น์€ "๊ฒ€์ƒ‰์ด ํ•„์š”ํ•œ์ง€"๋ฅผ ์Šค์Šค๋กœ ํŒ๋‹จํ•˜๋Š” ๊ฐ€์žฅ ์ง€๋Šฅ์ ์ธ ๋ฐฉ์‹์ด๋‹ค. ๋ถˆํ•„์š”ํ•œ ๊ฒ€์ƒ‰์„ ์ค„์—ฌ ํšจ์œจ์„ฑ์„ ๋†’์ด๊ณ , LLM์ด ์ž์‹ ์˜ ์ง€์‹๋งŒ์œผ๋กœ ์ถฉ๋ถ„ํ•  ๋•Œ๋Š” ๊ฒ€์ƒ‰์„ ๊ฑด๋„ˆ๋›ด๋‹ค.

  • Agentic approach: the LLM behaves like a tool-using agent. As in AutoGPT or Toolformer, the model calls the search API only when it judges it necessary.
  • Representative work:
    • WebGPT: trained GPT-3 with reinforcement learning to operate a search engine on its own, browse the results, and cite references.
    • FLARE: monitors the model's confidence during generation. If the probability of an upcoming word is low (confidence is lacking), only then does it fire the retrieval system to fetch information.
    • Self-RAG: one of the most noted models, it introduced 'reflection tokens'. While generating text, the model emits tokens such as "Retrieve" and "Critic" on its own, deciding whether retrieval is needed and self-verifying the quality of the generated answer. Self-RAG's design matters because it removes the dependence on an extra classifier or natural language inference (NLI) model, simplifying the decision of when to trigger the retrieval mechanism and strengthening the model's autonomous judgment in producing accurate responses.
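FLARE's trigger condition, retrieve when token confidence drops, fits in a few lines. The probability list below is a hard-coded stand-in for real language-model output:

```python
def generate_with_probs(prompt):
    # Stub: a real LM returns each tentative token with its probability.
    # Here "1944" is low-confidence, exactly the case that should trigger
    # retrieval before the sentence is committed.
    return [("The", 0.99), ("dam", 0.95), ("opened", 0.9),
            ("in", 0.97), ("1944", 0.35)]

def flare_step(prompt, threshold=0.5):
    tokens = generate_with_probs(prompt)
    if min(p for _, p in tokens) < threshold:
        # Low confidence: discard the tentative sentence, retrieve with it
        # as a query, and regenerate with the fetched evidence in context.
        return "retrieve_then_regenerate"
    return "accept"

decision = flare_step("When did the dam open?")
```

The threshold is the knob: set it low and the system behaves like a plain LLM; set it high and it approaches retrieve-always RAG.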

 

6. Task and Evaluation

 

As RAG technology matures, the range of tasks it is applied to keeps widening, and the methodology for measuring its performance is growing more sophisticated as well.

 

DOWNSTREAM TASKS AND DATASETS OF RAG

 

 

A. Downstream Tasks

RAG์˜ ํ•ต์‹ฌ ์‘์šฉ ๋ถ„์•ผ๋Š” ์—ฌ์ „ํžˆ ์งˆ์˜์‘๋‹ต(QA)์ด์ง€๋งŒ, ๊ทธ ์–‘์ƒ์ด ํ›จ์”ฌ ๋ณต์žกํ•˜๊ณ  ๋‹ค์–‘ํ•ด์กŒ๋‹ค.

  • Deepening QA: beyond traditional single-hop questions, QA has branched into multi-hop QA, which must synthesize and reason over information from multiple documents, domain-specific QA requiring specialized knowledge, and long-form QA demanding extended answers.
  • Expanding territory: RAG now reaches past QA into many natural language processing tasks, including text summarization, information extraction (IE), dialogue generation, and code search.
The table shows RAG being applied across an even wider range than one might expect: commonsense reasoning, fact verification, machine translation, and more. There are too many to describe individually, so please refer to the original paper.

 

B. ํ‰๊ฐ€ ๋ชฉํ‘œ (Evaluation Target)

Past work leaned on traditional metrics such as exact match (EM) and F1 score. But since RAG is a compound system coupling retrieval and generation, evaluating the two axes separately has become the standard.

  1. Retrieval quality: how well did the retrieval module fetch useful documents? Metrics used in recommender systems, such as Hit Rate, MRR, and NDCG, measure whether the gold documents appear in the top ranks.
  2. Generation quality: did the generation module answer with the context properly reflected? In the unlabeled case, the answer's faithfulness and relevance are judged; in the labeled case, accuracy is measured.

C. ํ‰๊ฐ€ ์ธก๋ฉด (Evaluation Aspects)

์ด ๋…ผ๋ฌธ์€ RAG ํ‰๊ฐ€๋ฅผ ์œ„ํ•ด 3๊ฐ€์ง€ ํ’ˆ์งˆ ์ ์ˆ˜์™€ 4๊ฐ€์ง€ ํ•„์ˆ˜ ๋Šฅ๋ ฅ์ด๋ผ๋Š” ๊ตฌ์ฒด์ ์ธ ๊ธฐ์ค€์„ ์ œ์‹œํ•œ๋‹ค.

  • Three quality scores:
    1. Context relevance: are the retrieved documents genuinely related to the question? Irrelevant content raises cost and confuses the LLM, so precise retrieval is essential.
    2. Answer faithfulness: is the generated answer thoroughly grounded in the retrieved context? This is the key metric against RAG's greatest enemy, hallucination, checking that the answer does not contradict the context.
    3. Answer relevance: does the answer match the intent of the user's question? It evaluates whether the response gets to the point rather than answering beside it.
  • Four required abilities of a RAG system:
    1. Noise robustness: does the system stay steady even when 'noise documents', related to the question but lacking the answer, are mixed in?
    2. Negative rejection: when none of the retrieved documents contains the answer, can the system decline ("I cannot answer due to insufficient information") rather than fabricate? This bears directly on system trustworthiness.
    3. Information integration: can it synthesize fragments scattered across multiple documents into an answer to a complex question?
    4. Counterfactual robustness: when documents contain known inaccuracies, can it identify and ignore them?

D. ํ‰๊ฐ€ ๋ฒค์น˜๋งˆํฌ ๋ฐ ๋„๊ตฌ (Benchmarks and Tools)

Scoring such complex criteria by hand is infeasible, so automated evaluation frameworks that use an LLM as judge have emerged.

  • ๋ฒค์น˜๋งˆํฌ: RGB, RECALL, CRUD ๋“ฑ์€ RAG์˜ ํ•„์ˆ˜ ๋Šฅ๋ ฅ(๊ฐ•๊ฑด์„ฑ, ์ •๋ณด ํ†ตํ•ฉ ๋“ฑ)์„ ์ธก์ •ํ•˜๊ธฐ ์œ„ํ•ด ์„ค๊ณ„๋œ ๋ฐ์ดํ„ฐ์…‹์ด๋‹ค.
  • ์ž๋™ํ™” ๋„๊ตฌ: RAGAS, ARES, TruLens ๊ฐ™์€ ๋„๊ตฌ๋“ค์€ ๋ฌธ๋งฅ ๊ด€๋ จ์„ฑ, ๋‹ต๋ณ€ ์ถฉ์‹ค์„ฑ ๋“ฑ์˜ ํ’ˆ์งˆ ์ ์ˆ˜๋ฅผ ์ •๋Ÿ‰์ ์œผ๋กœ ๊ณ„์‚ฐํ•ด ์ค€๋‹ค. ์ด๋“ค์€ RAG ํŒŒ์ดํ”„๋ผ์ธ์„ ๊ฐœ์„ ํ•  ๋•Œ ๋‚˜์นจ๋ฐ˜๊ณผ ๊ฐ™์€ ์—ญํ• ์„ ํ•œ๋‹ค.
 

7. Discussion and Future Prospects

 

 

๋…ผ๋ฌธ์€ RAG์˜ ๋ฏธ๋ž˜์— ๋Œ€ํ•ด ๋ช‡ ๊ฐ€์ง€ ํฅ๋ฏธ๋กœ์šด ํ™”๋‘์™€ ํ–ฅํ›„ ์—ฐ๊ตฌ ๋ฐฉํ–ฅ์„ ์ œ์‹œํ•œ๋‹ค.

1. RAG vs Long Context

With LLM context windows recently ballooning past 200K tokens (past one million as of 2025), the question arises: if an LLM can read an entire book in one go, is RAG still necessary? The paper argues that RAG still plays an irreplaceable role.

  • Efficiency: processing a long context in one pass slows inference, whereas RAG fetches only the needed information in chunks, which is far more efficient.
  • Transparency: an answer produced by reading a long context is an internal black box, while RAG presents its reference documents explicitly, letting the user verify the answer.

2. RAG Robustness

๊ฒ€์ƒ‰๋œ ์ •๋ณด์— ๋…ธ์ด์ฆˆ๋‚˜ ์ž˜๋ชป๋œ ์ •๋ณด๊ฐ€ ์„ž์—ฌ ์žˆ์„ ๋•Œ RAG ํ’ˆ์งˆ์ด ์ €ํ•˜๋˜๋Š” ๋ฌธ์ œ๋Š” ์—ฌ์ „ํ•˜๋‹ค. ์ €์ž๋“ค์€ "์ž˜๋ชป๋œ ์ •๋ณด๋Š” ์ •๋ณด๊ฐ€ ์—†๋Š” ๊ฒƒ๋ณด๋‹ค ๋” ๋‚˜์˜๋‹ค(Misinformation can be worse than no information at all)"๋ผ๊ณ  ๊ฒฝ๊ณ ํ•œ๋‹ค. ํฅ๋ฏธ๋กœ์šด ์ ์€ ๊ด€๋ จ ์—†๋Š” ๋ฌธ์„œ๊ฐ€ ํฌํ•จ๋˜์—ˆ์„ ๋•Œ ์˜คํžˆ๋ ค ์ •ํ™•๋„๊ฐ€ 30% ์ด์ƒ ์ฆ๊ฐ€ํ–ˆ๋‹ค๋Š” ์—ฐ๊ตฌ ๊ฒฐ๊ณผ๋„ ์žˆ๋‹ค๋Š” ๊ฒƒ์ด๋‹ค. ์ด๋Š” ์šฐ๋ฆฌ๊ฐ€ "๋…ธ์ด์ฆˆ"๋ผ๊ณ  ๋ถ€๋ฅด๋Š”, "๊ด€๋ จ ์—†์–ด ๋ณด์ด๋Š”" ์ •๋ณด๋“ค์ด ๋ฌด์กฐ๊ฑด ํ•ด๋กœ์šด ๊ฒƒ์ด ์•„๋‹ˆ๋ผ, ๊ฒฝ์šฐ์— ๋”ฐ๋ผ ๋ชจ๋ธ์˜ ์ถ”๋ก  ๋‹ค์–‘์„ฑ์„ ๋†’์ผ ์ˆ˜๋„ ์žˆ์Œ์„ ์‹œ์‚ฌํ•œ๋‹ค. ๋”ฐ๋ผ์„œ ๊ฒ€์ƒ‰๋œ ์ •๋ณด์™€ ์ƒ์„ฑ ๋ชจ๋ธ์„ ์–ด๋–ป๊ฒŒ ์œ ๊ธฐ์ ์œผ๋กœ ํ†ตํ•ฉํ• ์ง€์— ๋Œ€ํ•œ ์—ฐ๊ตฌ๊ฐ€ ๋” ํ•„์š”ํ•˜๋‹ค.

 

3. Hybrid Approaches

Combining RAG with fine-tuning is becoming mainstream.

  • Finding the optimal combination, whether to apply RAG and fine-tuning sequentially, alternately, or via end-to-end joint training, is an open research problem.
  • There is also a trend toward integrating small specialized models (small LMs) inside RAG systems, as when CRAG introduces a lightweight evaluation model to judge retrieval quality.

4. Scaling Laws of RAG

LLM์€ GPT3 ์ดํ›„๋กœ, ๋ชจ๋ธ, ์ฆ‰, ํŒŒ๋ผ๋ฏธํ„ฐ ์ˆ˜๊ฐ€ ์ปค์งˆ์ˆ˜๋ก ์„ฑ๋Šฅ์ด ์ข‹์•„์ง„๋‹ค๋Š” '์Šค์ผ€์ผ๋ง ๋ฒ•์น™'์ด ํ™•๋ฆฝ๋˜์–ด ์žˆ์ง€๋งŒ, RAG์—๋„ ์ด๊ฒƒ์ด ์ ์šฉ๋˜๋Š”์ง€๋Š” ๋ฏธ์ง€์ˆ˜๋‹ค.

  • The possibility of an 'inverse scaling law', where a smaller model sometimes outperforms a larger one, has even been raised, and it deserves in-depth investigation.

5. ์ƒ์šฉํ™” ์ค€๋น„ ๋ฐ ์ƒํƒœ๊ณ„ (Production-Ready RAG & Ecosystem)

These are the requirements for RAG to move beyond the lab into real services (production).

  • Engineering challenges: improving document recall over large knowledge bases, speeding up retrieval, and data security (keeping the LLM from accidentally leaking sensitive information) remain problems to solve.
  • Ecosystem: tools such as LangChain and LlamaIndex have settled in as the standard stack for RAG development, and the ecosystem is differentiating into low-code platforms like Flowise AI and personalized-assistant services like Weaviate Verba.
  • Direction: the technology stack is evolving along three axes: customization, simplification, and specialization.

6. Multi-modal RAG

Finally, RAG is expanding past the boundary of text into other modalities.

  • Images: models such as RA-CM3 and BLIP-2 retrieve and generate over text and images together.
  • Audio/video: RAG is used to consult external knowledge when converting speech to text (UEOP), or to predict and describe video timelines.
  • Code: models like RBPS retrieve code examples matching the developer's intent to assist programming. CoK (Chain of Knowledge), which handles structured knowledge, extracts facts from knowledge graphs to support code generation and reasoning.
 

8. Conclusion

 

Fig. 6. Summary of RAG ecosystem

 

์ด ๋…ผ๋ฌธ์€ RAG๊ฐ€ ๋‹จ์ˆœํžˆ "๊ฒ€์ƒ‰ํ•ด์„œ ๋ถ™์—ฌ๋„ฃ๊ธฐ"ํ•˜๋Š” ๊ธฐ์ˆ ์„ ๋„˜์–ด, LLM์˜ ํ•œ๊ณ„๋ฅผ ๋ณด์™„ํ•˜๊ณ  ์™ธ๋ถ€ ์ง€์‹์„ ๋Šฅ๋™์ ์œผ๋กœ ํ™œ์šฉํ•˜๋Š” ๊ฑฐ๋Œ€ํ•œ ์ธ์ง€ ์•„ํ‚คํ…์ฒ˜๋กœ ์ง„ํ™”ํ•˜๊ณ  ์žˆ์Œ์„ ๋ณด์—ฌ์ค€๋‹ค.

Where early RAG focused on fetching documents well, the field is now moving toward finely orchestrating retrieval and generation within modular structures. However far LLM context windows grow, the value of efficiently consulting vast external knowledge will not disappear, just as no human can memorize every book. I expect RAG to keep combining with ever more modalities and settle in as a core engine that raises AI's practical problem-solving ability.

 

์ด ๋…ผ๋ฌธ์€ ์‹ค๋ฌด์—์„œ ํ•ต์‹ฌ ๊ธฐ์ˆ ๋กœ ์“ฐ์ด๋Š” RAG์˜ ์ง„ํ™” ๊ณผ์ •๊ณผ ํ–ฅํ›„ ๋ฐœ์ „ ๋ฐฉํ–ฅ๊นŒ์ง€, ๊ฐ€์ด๋“œ๋ผ์ธ์„ ์ œ์‹œํ•ด์ฃผ๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.
๊ผญ ๋…ผ๋ฌธ์„ ์ฝ์–ด๋ณด์‹œ๊ณ  reference๋ฅผ ๋”ฐ๋ผ ์—ฌ๋Ÿฌ ์ตœ์‹  ๋ฐฉ๋ฒ•๋ก ๋“ค๋„ ํ•จ๊ป˜ ์‚ดํŽด๋ณด์‹œ๊ธฐ ๋ฐ”๋ž๋‹ˆ๋‹ค.
์ € ๊ฐœ์ธ์ ์œผ๋กœ๋„ ์•ฝ 2๋…„ ์ „, ์„์‚ฌ๋กœ ์—ฐ๊ตฌ์› ์ƒํ™œ์„ ์‹œ์ž‘ํ•  ๋•Œ ์ด ๋…ผ๋ฌธ์„ ์ฝ๊ณ  ์ „์ฒด์ ์ธ ํ๋ฆ„์„ ํŒŒ์•…ํ•˜๋Š” ๋ฐ ๋„์›€์„ ์–ป์—ˆ์Šต๋‹ˆ๋‹ค.
์‹ค์ œ๋กœ RAG๋ฅผ ๋‹ค๋ฃจ๋ฉฐ ์—ฐ๊ตฌ๋„ ํ•ด๋ณด๊ณ  ๊ธฐ์—…๊ณผ ํ”„๋กœ์ ํŠธ๋กœ ์‘์šฉ ์‹œ์Šคํ…œ๋„ ๊ฐœ๋ฐœํ•œ ๊ฒฝํ—˜์ด ์Œ“์ธ ๋’ค์— ์ด ๋…ผ๋ฌธ์„ ์ฝ์œผ๋‹ˆ ๋˜ ์ƒˆ๋กญ๋„ค์š”.
๊ฐํžˆ ํ•œ ๋ง์”€ ๋“œ๋ฆฌ์ž๋ฉด, RAG์˜ ํ•ต์‹ฌ์€ ๊ฒฐ๊ตญ ๋‚ด๊ฐ€ ํ’€๊ณ ์ž ํ•˜๋Š” ๋ฌธ์ œ์™€ ์ฃผ์–ด์ง„ ์ž์›(๋ฐ์ดํ„ฐ ๋“ฑ)์— ๋งž๋Š” ๋ฐฉ๋ฒ•๋ก ์„ ์„ค๊ณ„ํ•˜๋Š” ๊ฒƒ์ด ์ค‘์š”ํ•˜๋‹ค๊ณ  ์ƒ๊ฐํ•ฉ๋‹ˆ๋‹ค. RAG๋Š” ์—ฌ๋Ÿฌ ์š”์†Œ๋ฅผ ํƒˆ๋ถ€์ฐฉํ•ด๊ฐ€๋ฉฐ ์ž์‹ ์˜ ์„ค๊ณ„์— ๋งž๊ฒŒ ์กฐ๋ฆฝํ•ด๋ณผ ์ˆ˜ ์žˆ๋‹ค๋Š” ์žฅ์ ์ด ์žˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋งŒํผ ๋‹ค์–‘ํ•œ ์‹œ๋„์™€ ์‘์šฉ์ด ๊ฐ€๋Šฅํ•˜๊ธฐ์— ์ถฉ๋ถ„ํžˆ ๋งŽ์€ ์‹œ๋„๋ฅผ ํ•ด๊ฐ€๋ฉฐ ๋‚ด๊ฐ€ ํ’€์–ด์•ผํ•˜๋Š” ๋ฌธ์ œ์— ๋งž๋Š” ์ตœ์ ์˜ ์„ค๊ณ„๋„๋ฅผ ์ฐพ์•„ ๋‚˜๊ฐ€๋Š” ๊ณผ์ •์„ ๊ฒฝํ—˜ํ•ด๋ณด์‹œ๊ธฐ ๋ฐ”๋ž๋‹ˆ๋‹ค! (๊ผญ sota๋ผ๊ณ  ๋‹ค ์ข‹์€ ๊ฒƒ๋„ ์•„๋‹ˆ์ฃ .)

๋ชจ๋“  ํ•™์ƒ, ์—ฐ๊ตฌ์ž, ์‹ค๋ฌด์ž ๋ถ„๋“ค ์‘์›ํ•ฉ๋‹ˆ๋‹ค!

 

 

https://arxiv.org/abs/2312.10997

 


 

 

 
