openai-cookbook/examples/data/parsed_pdf_docs.json

[{"filename": "rag-deck.pdf", "text": "RAG\nTechnique\n\nFebruary 2024\n\n\fOverview\n\nRetrieval-Augmented Generation \nenhances the capabilities of language \nmodels by combining them with a \nretrieval system. This allows the model \nto leverage external knowledge sources \nto generate more accurate and \ncontextually relevant responses.\n\nExample use cases\n\n- Provide answers with up-to-date \n\ninformation\n\n- Generate contextual responses\n\nWhat we\u2019ll cover\n\n\u25cf Technical patterns\n\n\u25cf Best practices\n\n\u25cf Common pitfalls\n\n\u25cf Resources\n\n3\n\n\fWhat is RAG\n\nRetrieve information to Augment the model\u2019s knowledge and Generate the output\n\n\u201cWhat is your \nreturn policy?\u201d\n\nask\n\nresult\n\nsearch\n\nLLM\n\nreturn information\n\nTotal refunds: 0-14 days\n50% of value vouchers: 14-30 days\n$5 discount on next order: > 30 days\n\n\u201cYou can get a full refund up \nto 14 days after the \npurchase, then up to 30 days \nyou would get a voucher for \nhalf the value of your order\u201d\n\nKnowledge \nBase / External \nsources\n\n4\n\n\fWhen to use RAG\n\nGood for \u2705\n\nNot good for \u274c\n\n\u25cf\n\n\u25cf\n\nIntroducing new information to the model \n\n\u25cf\n\nTeaching the model a speci\ufb01c format, style, \n\nto update its knowledge\n\nReducing hallucinations by controlling \n\ncontent\n\n/!\\ Hallucinations can still happen with RAG\n\nor language\n\u2794 Use \ufb01ne-tuning or custom models instead\n\n\u25cf\n\nReducing token usage\n\u2794 Consider \ufb01ne-tuning depending on the use \n\ncase\n\n5\n\n\fTechnical patterns\n\nData preparation\n\nInput processing\n\nRetrieval\n\nAnswer Generation\n\n\u25cf Chunking\n\n\u25cf\n\n\u25cf\n\nEmbeddings\n\nAugmenting \ncontent\n\n\u25cf\n\nInput \naugmentation\n\n\u25cf NER\n\n\u25cf\n\nSearch\n\n\u25cf Context window\n\n\u25cf Multi-step \nretrieval\n\n\u25cf Optimisation\n\n\u25cf\n\nSafety checks\n\n\u25cf\n\nEmbeddings\n\n\u25cf Re-ranking\n\n6\n\n\fTechnical patterns\nData preparation\n\nchunk documents into multiple \npieces for easier consumption\n\ncontent\n\nembeddings\n\n0.983, 0.123, 0.289\u2026\n\n0.876, 0.145, 0.179\u2026\n\n0.983, 0.123, 0.289\u2026\n\nAugment content \nusing LLMs\n\nEx: parse text only, ask gpt-4 to rephrase & \nsummarize each part, generate bullet points\u2026\n\nBEST PRACTICES\n\nPre-process content for LLM \nconsumption: \nAdd summary, headers for each \npart, etc.\n+ curate relevant data sources\n\nKnowledge \nBase\n\nCOMMON PITFALLS\n\n\u2794 Having too much low-quality \n\ncontent\n\n\u2794 Having too large documents\n\n7\n\n\fTechnical patterns\nData preparation: chunking\n\nWhy chunking?\n\nIf your system doesn\u2019t require \nentire documents to provide \nrelevant answers, you can \nchunk them into multiple pieces \nfor easier consumption (reduced \ncost & latency).\n\nOther approaches: graphs or \nmap-reduce\n\nThings to consider\n\n\u25cf\n\nOverlap:\n\n\u25cb\n\n\u25cb\n\nShould chunks be independent or overlap one \nanother?\nIf they overlap, by how much?\n\n\u25cf\n\nSize of chunks: \n\n\u25cb What is the optimal chunk size for my use case?\n\u25cb\n\nDo I want to include a lot in the context window or \njust the minimum?\n\n\u25cf Where to chunk:\n\n\u25cb\n\n\u25cb\n\nShould I chunk every N tokens or use speci\ufb01c \nseparators? 
Technical patterns
Data preparation: embeddings

What to embed?
Depending on your use case you might not want to embed just the text in the documents but metadata as well - anything that will make it easier to surface this specific chunk or document when performing a search.

Examples
Embedding Q&A posts in a forum
You might want to embed the title of the posts, the text of the original question and the c
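To illustrate embedding content together with its metadata, here is a minimal sketch using the openai Python package; the text-embedding-3-small model choice, the "Title:" prefix, and the embed_post helper are illustrative assumptions, not details from the deck.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed_post(title: str, question: str) -> list[float]:
    """Embed a forum post's title together with its question text.

    Prepending metadata such as the title to the body is one way to make
    the chunk easier to surface when performing a search.
    """
    combined = f"Title: {title}\n\n{question}"
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=combined,
    )
    return response.data[0].embedding
```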