Latest revision as of 10:19, 28 May 2025

YouTube ... Quora ...Google search ...Google News ...Bing News

Architectures for AI ... Generative AI Stack ... Enterprise Architecture (EA) ... Enterprise Portfolio Management (EPM) ... Architecture and Interior Design
Artificial Intelligence (AI) ... Generative AI ... Machine Learning (ML) ... Deep Learning ... Neural Network ... Reinforcement ... Learning Techniques
Large Language Model (LLM) ... Multimodal ... Foundation Models (FM) ... Generative Pre-trained ... Transformer ... Attention ... GAN ... BERT
The Modern AI Stack: Design Principles for the Future of Enterprise AI Architectures - Menlo Ventures

Excerpt from... Emerging Architectures for LLM Applications | Matt Bornstein & Rajko Radovanovic - andreessen horowitz ... Large language models are a powerful new primitive for building software. But since they are so new—and behave so differently from normal computing resources—it’s not always obvious how to use them. At a very high level, the workflow can be divided into three stages:

Data preprocessing / embedding: This stage involves storing private data (legal documents, in our example) to be retrieved later. Typically, the documents are broken into chunks, passed through an embedding model, then stored in a specialized database called a vector database.

Contextual data Contextual data for LLM apps can come in a variety of formats, including text documents, PDFs, CSVs, and SQL tables.
- Data pipelines Data pipelines with LLMs are a way to use large language models (LLMs) to process and analyze large amounts of data. LLMs can be used to extract information from text, translate languages, and answer questions.
  - Databricks ... direct file access and direct native support for Python, data science and AI frameworks
  - Airflow ... an open-source project that lets you programmatically author, schedule, and monitor your data pipelines using Python
  - Unstructured ... is data that does not have a predefined schema, such as text documents, images, and audio files.
- Embedding model
  - OpenAI ... called text-embedding-ada-002, 2nd generation embedding model
  - Cohere ... a large language model (LLM) that is trained on a massive dataset of text and code. It can be used to represent the meaning of text as a list of numbers, which is useful for comparing text for similarity, clustering text, and classifying text.
  - Hugging Face
- Vector database
  - Pinecone ... provide long-term memory for storing and query vector embeddings, a type of data that represents semantic information
  - Weaviate ... open source vector database that allows storing and retrieving data objects based on their semantic properties by indexing them with vectors
  - ChromaDB ... the open-source embedding database, build Python or JavaScript LLM apps with memory

Prompt construction / retrieval: When a user submits a query (a legal question, in this case), the application constructs a series of prompts to submit to the language model. A compiled prompt typically combines a prompt template hard-coded by the developer; examples of valid outputs called few-shot examples; any necessary information retrieved from external APIs; and a set of relevant documents retrieved from the vector database.

Prompt Few-shot examples Most developers start new projects by experimenting with simple prompts, consisting of direct instructions (zero-shot prompting) or possibly some example outputs (few-shot prompting).
- Playground
  - OpenAI ... a web-based tool that makes it easy to test prompts and get familiar with how the API works.
  - nat.dev ... An LLM playground you can run on your laptop; Use any model from OpenAI, Anthropic, Cohere, Forefront, HuggingFace, Aleph Alpha, Replicate, Banana and llama.cpp. Open Playground - GitHub
  - Humanloop ... use the playground to experiment with new prompts, collect model generated data and user feedback, and finetune models
- Orchestration
  - LangChain ... chain together different components to create more advanced use cases around LLMs
  - LlamaIndex ... data framework that allows users to connect custom data sources to Large Language Model (LLM)s. It provides tools to structure data, offers data connectors to ingest existing data sources and data formats (APIs, PDFs, docs, SQL, etc.), and provides an advanced retrieval/query interface over the data.
  - ChatGPT ... creates chat-based applications.
- APIs/plugins
  - Serp ... giving ChatGPT the ability to use web browsing, python code execution, and custom plugins
  - Wolfram... Simple API, Short Answers API, Spoken Results API, Full Results API, & Conversational API
  - Zapier ... automates 5,000+ app integrations

Prompt execution / inference: Once the prompts have been compiled, they are submitted to a pre-trained LLM for inference—including both proprietary model APIs and open-source or self-trained models. Some developers also add operational systems like logging, caching, and validation at this stage.

LLM cache
- Redis ... (remote dictionary server) provides an adaptive prompt creation mechanism based on the current context, which helps overcome the context length limitations of LLMs
- SQLite ... disk-based storage to cache LLM prompts and responses for LangChain
- GPTCache ... uses different storage backends, such as Redis, SQLite, or MinIO, to cache LLM prompts
Logging/LLMops
- Weights & Biases (W&B) ... improve prompt engineering with visually interactive evaluation loops.
- MLflow
- PromptLayer
- Helicone
Validation
- Guardrails
- Rebuff
- Microsoft Guidance
- LMQL
App hosting
- Vercel
- Steamship
- Streamlit
- Modal
LM APIs (proprietary)
- OpenAI
- Anthropic
LLM APIs (open)
- Hugging Face
- Replicate
Cloud providers
- AWS
- GCP
- Azure
- CoreWeave
Opinionated clouds
- Databricks
- Anyscale
- Mosaic
- Modal
- RunPod

'https://i0.wp.com/a16z.com/wp-content/uploads/2023/06/2657-Emerging-LLM-App-Stack-R2-1-of-4-2.png'

The Building Blocks of Generative AI

The Building Blocks of Generative AI | Jonathan Shriftman - Medium

'https://miro.medium.com/v2/resize:fit:828/format:webp/1*HGK_A6g9EuQqQENdPP_e_g.jpeg'

Analyzing Layers and Players

Winning in the Generative AI Stack: Analyzing Layers and Players, Including the Rise of E2E AI Applications | Liat Ben-Zur - LinkedIn

'https://media.licdn.com/dms/image/D5612AQHe75n0tww_eA/article-cover_image-shrink_600_2000/0/1677813607702?e=1700697600&v=beta&t=WQtreRkX1abXMbXXf5To_45ij8Rj1OKqm5JHzEHEZCA'

The Missing Link in Generative AI

The Missing Link in Generative AI | A. Paka, K. Gade & K. Kenthapadi - Fiddler

'https://global-uploads.webflow.com/5e069c660d37a18c17207c29/6488b386000b9dfa5245ddf3_the-new-generative-ai-stack-1.png'

The New Infrastructure Stack for Generative AI

The New Infrastructure Stack for Generative AI | Matt Lu & Amanda Robs - Cowboy Ventures - Medium

'https://miro.medium.com/v2/resize:fit:828/format:webp/1*Wx3APBj_i6P385EMmY4_Mg.png'

@@ Line 2: / Line 2: @@
 |title=PRIMO.ai
 |titlemode=append
-|keywords=ChatGPT, artificial, intelligence, machine, learning, GPT-4, GPT-5, NLP, NLG, NLC, NLU, models, data, singularity, moonshot, Sentience, AGI, Emergence, Moonshot, Explainable, TensorFlow, Google, Nvidia, Microsoft, Azure, Amazon, AWS, Hugging Face, OpenAI, Tensorflow, OpenAI, Google, Nvidia, Microsoft, Azure, Amazon, AWS, Meta, LLM, metaverse, assistants, agents, digital twin, IoT, Transhumanism, Immersive Reality, Generative AI, Conversational AI, Perplexity, Bing, You, Bard, Ernie, prompt Engineering LangChain, Video/Image, Vision, End-to-End Speech, Synthesize Speech, Speech Recognition, Stanford, MIT |description=Helpful resources for your journey with artificial intelligence; videos, articles, techniques, courses, profiles, and tools
+|keywords=ChatGPT, artificial, intelligence, machine, learning, NLP, NLG, NLC, NLU, models, data, singularity, moonshot, Sentience, AGI, Emergence, Moonshot, Explainable, TensorFlow, Google, Nvidia, Microsoft, Azure, Amazon, AWS, Hugging Face, OpenAI, Tensorflow, OpenAI, Google, Nvidia, Microsoft, Azure, Amazon, AWS, Meta, LLM, metaverse, assistants, agents, digital twin, IoT, Transhumanism, Immersive Reality, Generative AI, Conversational AI, Perplexity, Bing, You, Bard, Ernie, prompt Engineering LangChain, Video/Image, Vision, End-to-End Speech, Synthesize Speech, Speech Recognition, Stanford, MIT |description=Helpful resources for your journey with artificial intelligence; videos, articles, techniques, courses, profiles, and tools
 <!-- Google tag (gtag.js) -->
@@ Line 21: / Line 21: @@
 * [[Architectures]] for AI ... [[Generative AI Stack]] ... [[Enterprise Architecture (EA)]] ... [[Enterprise Portfolio Management (EPM)]] ... [[Architecture and Interior Design]]
-* [[Large Language Model (LLM)]] ... [[Large Language Model (LLM)#Multimodal|Multimodal]] ... [[Foundation Models (FM)]] ... [[Generative Pre-trained Transformer (GPT)|Generative Pre-trained]] ... [[Transformer]] ... [[GPT-4]] ... [[GPT-5]] ... [[Attention]] ... [[Generative Adversarial Network (GAN)|GAN]] ... [[Bidirectional Encoder Representations from Transformers (BERT)|BERT]]
+* [[What is Artificial Intelligence (AI)? | Artificial Intelligence (AI)]] ... [[Generative AI]] ... [[Machine Learning (ML)]] ... [[Deep Learning]] ... [[Neural Network]] ... [[Reinforcement Learning (RL)|Reinforcement]] ... [[Learning Techniques]]
+* [[Large Language Model (LLM)]] ... [[Large Language Model (LLM)#Multimodal|Multimodal]] ... [[Foundation Models (FM)]] ... [[Generative Pre-trained Transformer (GPT)|Generative Pre-trained]] ... [[Transformer]] ... [[Attention]] ... [[Generative Adversarial Network (GAN)|GAN]] ... [[Bidirectional Encoder Representations from Transformers (BERT)|BERT]]
+* [https://menlovc.com/perspective/the-modern-ai-stack-design-principles-for-the-future-of-enterprise-ai-architectures/  The Modern AI Stack: Design Principles for the Future of Enterprise AI Architectures - Menlo Ventures]
 <i>Excerpt from...</i> [https://a16z.com/2023/06/20/emerging-architectures-for-llm-applications/ Emerging Architectures for LLM Applications | Matt Bornstein & Rajko Radovanovic - andreessen horowitz] ... Large language models are a powerful new primitive for building software. But since they are so new—and behave so differently from normal computing resources—it’s not always obvious how to use them. At a very high level, the workflow can be divided into three stages:
@@ Line 39: / Line 41: @@
 *** [[Hugging Face]]
 ** <b>Vector database</b>
-*** [[Database#Pinecone|Pinecone]]  ... provide long-term memory for storing and query vector [[embedding]]s, a type of data that represents semantic information
+*** [[Database#Pinecone|Pinecone]]  ... provide long-term [[memory]] for storing and query vector [[embedding]]s, a type of data that represents semantic information
 *** [https://weaviate.io/ Weaviate] ... open source vector database that allows storing and retrieving data objects based on their semantic properties by indexing them with vectors
-*** [https://pypi.org/project/chromadb/ ChromaDB] ... the open-source [[embedding]] database, build Python or JavaScript LLM apps with memory
+*** [https://pypi.org/project/chromadb/ ChromaDB] ... the open-source [[embedding]] database, build Python or JavaScript LLM apps with [[memory]]
@@ Line 103: / Line 105: @@
-<img src="https://i0.wp.com/a16z.com/wp-content/uploads/2023/06/2657-Emerging-LLM-App-Stack-R2-1-of-4-2.png" width="1000">
+<center><img src="https://i0.wp.com/a16z.com/wp-content/uploads/2023/06/2657-Emerging-LLM-App-Stack-R2-1-of-4-2.png" width="1000"><br>'https://i0.wp.com/a16z.com/wp-content/uploads/2023/06/2657-Emerging-LLM-App-Stack-R2-1-of-4-2.png'</center>
 = The Building Blocks of Generative AI =
 * [https://shriftman.medium.com/the-building-blocks-of-generative-ai-a75350466a2f The Building Blocks of Generative AI | Jonathan Shriftman - Medium]
-https://miro.medium.com/v2/resize:fit:828/format:webp/1*HGK_A6g9EuQqQENdPP_e_g.jpeg
+<center><img src="https://miro.medium.com/v2/resize:fit:828/format:webp/1*HGK_A6g9EuQqQENdPP_e_g.jpeg" width="1000"><br>'https://miro.medium.com/v2/resize:fit:828/format:webp/1*HGK_A6g9EuQqQENdPP_e_g.jpeg'</center>
+= Analyzing Layers and Players =
+* [https://www.linkedin.com/pulse/winning-generative-ai-stack-analyzing-layers-players-rise-ben-zur/ Winning in the Generative AI Stack: Analyzing Layers and Players, Including the Rise of E2E AI Applications | Liat Ben-Zur - LinkedIn]
+<center><img src="https://media.licdn.com/dms/image/D5612AQHe75n0tww_eA/article-cover_image-shrink_600_2000/0/1677813607702?e=1700697600&v=beta&t=WQtreRkX1abXMbXXf5To_45ij8Rj1OKqm5JHzEHEZCA" width="1000"><br>'https://media.licdn.com/dms/image/D5612AQHe75n0tww_eA/article-cover_image-shrink_600_2000/0/1677813607702?e=1700697600&v=beta&t=WQtreRkX1abXMbXXf5To_45ij8Rj1OKqm5JHzEHEZCA'</center>
+= The Missing Link in Generative AI =
+* [https://www.fiddler.ai/blog/the-missing-link-in-generative-ai The Missing Link in Generative AI | A. Paka, K. Gade & K. Kenthapadi - Fiddler]
+<center><img src="https://global-uploads.webflow.com/5e069c660d37a18c17207c29/6488b386000b9dfa5245ddf3_the-new-generative-ai-stack-1.png" width="1000"><br>'https://global-uploads.webflow.com/5e069c660d37a18c17207c29/6488b386000b9dfa5245ddf3_the-new-generative-ai-stack-1.png'</center>
+= The New Infrastructure Stack for Generative AI =
+* [https://medium.com/cowboy-ventures/the-new-infra-stack-for-generative-ai-9db8f294dc3f The New Infrastructure Stack for Generative AI | Matt Lu & Amanda Robs - Cowboy Ventures - Medium]
+<center><img src="https://miro.medium.com/v2/resize:fit:828/format:webp/1*Wx3APBj_i6P385EMmY4_Mg.png" width="1000"><br>'https://miro.medium.com/v2/resize:fit:828/format:webp/1*Wx3APBj_i6P385EMmY4_Mg.png'</center>

Difference between revisions of "Generative AI Stack"

Latest revision as of 10:19, 28 May 2025

Contents

The Building Blocks of Generative AI

Analyzing Layers and Players

The Missing Link in Generative AI

The New Infrastructure Stack for Generative AI

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools