Module 2 Quiz: AI Fundamentals

Test your knowledge of how AI actually works

Question 1: What is the primary advantage of GPUs over CPUs for AI training?

Explanation: GPUs were originally designed for rendering graphics, which requires massive parallel computation. This same capability makes them ideal for the matrix multiplications used in neural network training - they can perform thousands of calculations at once, unlike CPUs, which are optimized for sequential processing.

Question 2: What are Google TPUs specifically optimized for?

Explanation: TPU stands for Tensor Processing Unit. Google designed these custom chips specifically for tensor operations (multi-dimensional array calculations) that neural networks use. They can be 15-30x faster than GPUs for certain AI tasks while using less energy.

Question 3: In a neural network, what are 'weights'?

Explanation: Weights are the numbers that determine how strongly each neuron connection influences the next layer. Training a neural network means adjusting billions of these weights to minimize prediction errors. Modern language models have hundreds of billions of parameters (weights) that must be tuned.
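The idea of weights determining how strongly a connection influences the output can be sketched with a single toy neuron (the sigmoid activation and the example numbers here are illustrative, not from any particular model):

```python
import math

def neuron(inputs, weights, bias):
    """A single artificial neuron: weighted sum of inputs, then a sigmoid activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-z))  # sigmoid squashes the result into (0, 1)

# Same inputs, different weights -> very different outputs.
# Training means adjusting these weights to reduce prediction error.
print(neuron([1.0, 2.0], [0.5, -0.5], 0.0))  # weights partly cancel: output near 0.38
print(neuron([1.0, 2.0], [0.9, 0.9], 0.0))   # strong positive weights: output near 0.94
```

A modern LLM is conceptually this same operation repeated across billions of weights arranged in layers.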

Question 4: What is backpropagation?

Explanation: Backpropagation works backward through the neural network to calculate how much each weight contributed to the prediction error. This information is then used by gradient descent to adjust the weights slightly to reduce that error. It's the fundamental algorithm that makes neural network training possible.

Question 5: What is a vector embedding?

Explanation: Vector embeddings convert data (words, images, etc.) into lists of numbers (vectors) with hundreds or thousands of dimensions. Similar concepts have similar vectors. This is how AI systems mathematically represent and process information - they don't 'understand' words; they work with numerical vectors.
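"Similar concepts have similar vectors" is usually measured with cosine similarity. A sketch with made-up 3-dimensional embeddings (real ones are learned and have hundreds or thousands of dimensions):

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: hand-made for illustration, not learned from data.
emb = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.85, 0.75, 0.2],
    "car": [0.1, 0.2, 0.9],
}
print(cosine(emb["cat"], emb["dog"]))  # high: similar concepts
print(cosine(emb["cat"], emb["car"]))  # low: unrelated concepts
```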

Question 6: The classic vector math example 'king - man + woman = queen' demonstrates what?

Explanation: This example shows that embeddings capture semantic relationships. The vector difference between 'king' and 'man' represents royalty/gender, and adding that to 'woman' points to 'queen'. The model learned these relationships from patterns in training data, not explicit programming.
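The analogy is literal vector arithmetic. With hand-made 2-dimensional vectors encoding (royalty, gender) - a deliberate simplification of what real embeddings learn - the computation looks like this:

```python
# Toy 2-dim "embeddings": first number = royalty, second = gender (1 = male).
emb = {
    "king":  [1.0, 1.0],
    "man":   [0.0, 1.0],
    "woman": [0.0, 0.0],
    "queen": [1.0, 0.0],
}

# king - man + woman, computed component by component
result = [k - m + w for k, m, w in zip(emb["king"], emb["man"], emb["woman"])]

# Find the word whose vector is closest to the result (squared Euclidean distance).
nearest = min(emb, key=lambda word: sum((a - b) ** 2 for a, b in zip(emb[word], result)))
print(nearest)  # prints "queen"
```

Subtracting 'man' removes the male component while keeping royalty; adding 'woman' lands on 'queen'.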

Question 7: What does LLM stand for and what are they?

Explanation: LLM stands for Large Language Model. GPT-4 (the model behind ChatGPT), Claude, and Gemini are all LLMs - neural networks trained on trillions of words to predict the next token in a sequence. They're also called foundation models or generative AI models. The 'large' refers to parameter count (billions to trillions).
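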

Question 8: What is the key innovation of the transformer architecture introduced in 2017?

Explanation: The transformer architecture introduced the attention mechanism in the paper 'Attention is All You Need.' Attention allows the model to understand context - when processing 'it was too tired,' attention helps determine whether 'it' refers to 'animal' or 'street' based on the full sentence context.
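The core of attention is scaled dot-product scoring followed by a softmax. A minimal sketch with toy 2-dimensional vectors (real transformers use learned projections and many attention heads, all omitted here):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)  # how much each token matters for this query
    # Output: weighted mix of the value vectors
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]

# The query matches the first key, so the output leans toward the first value.
out = attention([1.0, 0.0], keys=[[1.0, 0.0], [0.0, 1.0]], values=[[10.0, 0.0], [0.0, 10.0]])
print(out)
```

This weighting is what lets the model decide which earlier words (like 'animal' vs 'street') a pronoun refers to.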

Question 9: What is a token in LLM processing?

Explanation: Tokens are the basic units LLMs process - roughly a word or word fragment. English averages ~1.3 tokens per word. 'understanding' might be one token, while 'ChatGPT' might be two: 'Chat' + 'GPT'. API pricing is per token: Claude costs $3/$15 per million tokens (input/output).
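Per-token pricing makes cost estimation simple arithmetic. A sketch using the ~1.3 tokens-per-word heuristic and the $3/$15 per-million-token figures quoted above (real tokenizers count exactly; this is a rough estimate):

```python
TOKENS_PER_WORD = 1.3                  # rough average for English text
PRICE_IN, PRICE_OUT = 3.00, 15.00      # USD per million tokens (input/output)

def estimate_cost(input_words, output_words):
    """Approximate API cost in USD for a single request."""
    tokens_in = input_words * TOKENS_PER_WORD
    tokens_out = output_words * TOKENS_PER_WORD
    return tokens_in / 1e6 * PRICE_IN + tokens_out / 1e6 * PRICE_OUT

# A 10,000-word document in, a 1,000-word summary out:
print(round(estimate_cost(10_000, 1_000), 4))  # prints 0.0585 - about 6 cents
```

Output tokens cost 5x more than input tokens here, which is why long generations dominate the bill.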

Question 10: What is the MOST serious problem with LLMs that cannot be eliminated?

Explanation: Hallucinations are the most serious LLM problem. LLMs are prediction engines that generate plausible text based on patterns, which sometimes means inventing citations, statistics, or facts. This is fundamental to how they work - hallucinations can't be eliminated, only reduced. Always verify factual claims from LLMs.

Question 11: What is model drift?

Explanation: Model drift occurs when LLM behavior changes over time, even without retraining. As companies update models (fix bugs, adjust safety filters, improve performance), responses to identical prompts can change. A prompt that worked perfectly in January might fail in March. Enterprise applications must test regularly and version-control prompts.

Question 12: What is a context window?

Explanation: Context windows are the LLM's memory limit. Claude has a 200K token context (~150,000 words), GPT-4 has 128K tokens. Everything in your conversation must fit in this window. When it fills up, older content gets truncated and the model 'forgets' it. This limits use cases like analyzing 1000-page contracts.
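The "forgetting" behavior can be sketched as a truncation policy: when the conversation exceeds the window, the oldest messages are dropped first. The tiny window size and per-message token counts below are made up for illustration:

```python
CONTEXT_WINDOW = 8  # tokens - tiny for illustration; real windows are 100K+

def fit_to_window(messages, window=CONTEXT_WINDOW):
    """Keep the most recent (message, token_count) pairs that fit in the window."""
    kept, used = [], 0
    for msg, tokens in reversed(messages):  # walk newest to oldest
        if used + tokens > window:
            break  # everything older than this is truncated - "forgotten"
        kept.append((msg, tokens))
        used += tokens
    return list(reversed(kept))

history = [("hello", 3), ("long question", 4), ("answer", 2), ("follow-up", 3)]
print(fit_to_window(history))  # the two oldest messages fall out
```

Real systems vary in strategy (some summarize instead of dropping), but the hard limit is the same.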

Question 13: What does RLHF stand for and what is it?

Explanation: RLHF (Reinforcement Learning from Human Feedback) is the training stage where human raters rank different AI responses, and the model learns to produce higher-rated outputs. This is how ChatGPT became helpful and safe instead of just completing text. It's the final training stage after pre-training and supervised fine-tuning.

Question 14: What is RAG (Retrieval-Augmented Generation)?

Explanation: RAG connects LLMs to knowledge bases instead of fine-tuning. When a user asks a question: (1) Search your documents for relevant information, (2) Include that information in the prompt, (3) The model generates an answer based on your data. RAG is cheaper than fine-tuning and easier to update - most enterprise AI uses RAG.
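The three steps above can be sketched in a few lines. Real RAG systems retrieve by embedding similarity from a vector database; this toy version uses keyword overlap, and the prompt template is an illustrative placeholder, not any product's API:

```python
def retrieve(question, documents, top_k=2):
    """Step 1: rank documents by word overlap with the question (toy scoring)."""
    q_words = {w.strip("?.,").lower() for w in question.split()}
    def score(doc):
        return len(q_words & {w.strip("?.,").lower() for w in doc.split()})
    return sorted(documents, key=score, reverse=True)[:top_k]

def build_prompt(question, documents):
    """Steps 2-3: stuff the retrieved text into the prompt the model will answer from."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = [
    "Our refund policy allows returns within 30 days.",
    "The office is open Monday through Friday.",
    "Shipping takes 5 business days.",
]
print(build_prompt("What is the refund policy?", docs))
```

The model never sees your whole knowledge base - only the retrieved snippets - which is why updating RAG is as easy as updating the documents.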

Question 15: How much did training GPT-4 reportedly cost in compute resources?

Explanation: Training GPT-4 cost an estimated $100 million in compute resources. The limiting factor in AI advancement is often not algorithms but access to enough computing power. This is why AI leaders (OpenAI, Google, Meta, Anthropic) spend billions on infrastructure - thousands of GPUs running for weeks or months.

Question 16: What is prompt engineering?

Explanation: Prompt engineering is crafting effective inputs to get better AI outputs. Techniques include: being specific about format/length, providing examples (few-shot learning), assigning roles ('You are an expert...'), using chain-of-thought ('Let's think step-by-step'), and iterating. Some companies hire prompt engineers at $200K+ salaries.

Question 17: Why do LLMs sometimes give inconsistent answers to the same question?

Explanation: LLMs use a temperature parameter that adds randomness to generation - useful for creative writing but problematic for tasks requiring deterministic outputs. Ask the same question three times, get three different answers. Setting temperature to 0 reduces but doesn't eliminate variance. High-stakes applications need multiple runs and human validation.
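Temperature works by rescaling the model's raw next-token scores (logits) before they are turned into sampling probabilities. A sketch with made-up logits for three candidate tokens:

```python
import math

def apply_temperature(logits, temperature):
    """Convert logits to sampling probabilities at a given temperature."""
    scaled = [l / temperature for l in logits]   # low T exaggerates differences
    m = max(scaled)                              # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # the model's raw scores for three candidate tokens
print(apply_temperature(logits, 0.2))  # sharp: top token is near-certain
print(apply_temperature(logits, 2.0))  # flat: probability spread across all three
```

At low temperature the top token dominates (nearly deterministic); at high temperature the distribution flattens and any token might be sampled, which is where run-to-run inconsistency comes from.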

Question 18: What is the environmental cost of training large AI models?

Explanation: Training large models consumes enormous energy. GPT-3 training used 1,287 MWh of electricity - equivalent to 120 US homes for a year, producing 552 tons of CO2. Data centers running inference at scale also use significant power for computation and cooling. As AI adoption grows, energy consumption is a major sustainability concern.

Question 19: What is the best practice for using LLM outputs in high-stakes decisions (medical, legal, financial)?

Explanation: For high-stakes decisions, implement human-in-the-loop validation. LLMs should draft, suggest, or assist - not make final calls on hiring, medical diagnoses, loan approvals, or legal judgments. LLMs are probabilistic systems that hallucinate and can't guarantee correctness. Human review is both safer and often legally required in regulated industries.

Question 20: What is fine-tuning an LLM?

Explanation: Fine-tuning takes a pre-trained model and continues training it on your specific data. This specializes the model for your use case. Example: fine-tune GPT-4 on your customer service transcripts to create a model that answers in your brand voice with your product knowledge. Requires hundreds-thousands of examples and GPU time.