Module 2 Quiz: AI Fundamentals
Test your knowledge of how AI actually works
Question 1: What is the primary advantage of GPUs over CPUs for AI training?
GPUs are cheaper than CPUs
GPUs excel at parallel processing and can perform thousands of calculations simultaneously
GPUs use less electricity
GPUs have more memory
Explanation:
GPUs were originally designed for rendering graphics, which requires massive parallel computation. This same capability makes them ideal for the matrix multiplications used in neural network training: they can perform thousands of calculations at once, unlike CPUs, which are optimized for sequential processing.
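A few lines of Python make the point concrete: every element of a matrix product depends only on one row and one column, so all elements could be computed at the same time. This nested-loop version is purely illustrative; a GPU would assign each (i, j) pair to its own thread.

```python
# Why matrix multiplication parallelizes well: each output element C[i][j]
# depends only on row i of A and column j of B, so all of them could be
# computed simultaneously -- which is what a GPU's thousands of cores do.
def matmul(A, B):
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for i in range(m):        # on a GPU, every (i, j) pair
        for j in range(n):    # would run as its own thread
            C[i][j] = sum(A[i][p] * B[p][j] for p in range(k))
    return C

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
print(matmul(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
```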
Question 2: What are Google TPUs specifically optimized for?
Video game rendering
Cryptocurrency mining
Tensor operations used in neural networks
Database queries
Explanation:
TPU stands for Tensor Processing Unit. Google designed these custom chips specifically for tensor operations (multi-dimensional array calculations) that neural networks use. They can be 15-30x faster than GPUs for certain AI tasks while using less energy.
Question 3: In a neural network, what are 'weights'?
The size of the training dataset
Numbers that determine the strength of connections between neurons
The computational cost of running the model
The accuracy score of the model
Explanation:
Weights are the numbers that determine how strongly each neuron connection influences the next layer. Training a neural network means adjusting billions of these weights to minimize prediction errors. Modern language models have hundreds of billions of parameters (weights) that must be tuned.
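A single artificial neuron makes this concrete. The weight values below are invented for illustration; in a real network, training would adjust exactly these numbers.

```python
import math

# One artificial neuron: a weighted sum of inputs passed through an
# activation function. The weights (and bias) are what training adjusts.
def neuron(inputs, weights, bias):
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-z))  # sigmoid squashes z into (0, 1)

# These weights are made up for illustration; training would learn them.
out = neuron([1.0, 0.5], weights=[0.8, -0.4], bias=0.1)
print(round(out, 3))  # 0.668
```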
Question 4: What is backpropagation?
Running the model backward to undo mistakes
An algorithm that calculates how much each weight contributed to the error by working backward through the network
A technique for reducing model size
A method for increasing training speed
Explanation:
Backpropagation works backward through the neural network to calculate how much each weight contributed to the prediction error. This information is then used by gradient descent to adjust the weights slightly to reduce that error. It's the fundamental algorithm that makes neural network training possible.
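A one-weight toy model shows the whole loop described above: a forward pass, a backward pass (the chain rule computing how the weight contributed to the error), and a gradient-descent update. Real networks do the same thing across billions of weights; the numbers here are chosen only for illustration.

```python
# Toy training loop: model y = w * x, loss = (y - target)^2.
# The "backward" step computes dLoss/dw; gradient descent nudges w.
x, target = 2.0, 10.0  # we want w * 2 == 10, so w should approach 5
w, lr = 0.0, 0.05      # initial weight and learning rate

for step in range(100):
    y = w * x                    # forward pass: make a prediction
    grad = 2 * (y - target) * x  # backward pass: chain rule for dLoss/dw
    w -= lr * grad               # update: step against the gradient

print(round(w, 3))  # 5.0
```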
Question 5: What is a vector embedding?
A compression technique for reducing file sizes
Converting words, images, or data into lists of numbers that AI models can process
A type of neural network architecture
A security protocol for AI systems
Explanation:
Vector embeddings convert data (words, images, etc.) into lists of numbers (vectors) with hundreds or thousands of dimensions. Similar concepts have similar vectors. This is how AI systems mathematically represent and process information: they don't 'understand' words; they work with numerical vectors.
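"Similar concepts have similar vectors" can be checked with cosine similarity. The 3-dimensional vectors below are hand-made for illustration; real embeddings have hundreds of dimensions and are learned from data.

```python
import math

# Tiny hand-made "embeddings" -- invented for illustration only.
emb = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.85, 0.75, 0.2],
    "car": [0.1, 0.2, 0.95],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Similar concepts -> similar vectors -> higher cosine similarity.
print(cosine(emb["cat"], emb["dog"]) > cosine(emb["cat"], emb["car"]))  # True
```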
Question 6: The classic vector math example 'king - man + woman = queen' demonstrates what?
AI systems can do basic arithmetic
Word embeddings capture semantic relationships that can be manipulated mathematically
Neural networks understand human language
Transformers are better than other architectures
Explanation:
This example shows that embeddings capture semantic relationships. The vector difference between 'king' and 'man' represents royalty/gender, and adding that to 'woman' points to 'queen'. The model learned these relationships from patterns in training data, not explicit programming.
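The arithmetic can be reproduced with made-up 2-dimensional vectors where one axis stands for royalty and the other for maleness. Real embeddings learn such directions across hundreds of dimensions; these values are invented so the math is easy to follow.

```python
# Made-up 2-d "embeddings": dimensions are (royalty, maleness).
vec = {
    "king":  [1.0, 1.0],
    "man":   [0.0, 1.0],
    "woman": [0.0, 0.0],
    "queen": [1.0, 0.0],
}

# king - man + woman, element-wise:
result = [k - m + w for k, m, w in zip(vec["king"], vec["man"], vec["woman"])]

# Find the word whose vector is closest (squared Euclidean distance):
nearest = min(vec, key=lambda word: sum((a - b) ** 2
                                        for a, b in zip(vec[word], result)))
print(nearest)  # queen
```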
Question 7: What does LLM stand for and what are they?
Linear Learning Models - simple statistical systems
Large Language Models - neural networks trained on massive text datasets to predict the next word
Logical Language Machines - rule-based AI systems
Limited Learning Memory - AI with restricted data access
Explanation:
LLM stands for Large Language Model. ChatGPT, Claude, Gemini, and GPT-4 are all LLMs - neural networks trained on trillions of words to predict the next token in a sequence. They're also called foundation models or generative AI models. The 'large' refers to parameter count (billions to trillions).
Question 8: What is the key innovation of the transformer architecture introduced in 2017?
Using CPUs instead of GPUs
The attention mechanism that lets models weigh the importance of different words in context
Eliminating the need for training data
Running on smartphones
Explanation:
The transformer architecture introduced the attention mechanism in the paper 'Attention Is All You Need.' Attention allows the model to understand context: when processing 'The animal didn't cross the street because it was too tired,' attention helps determine that 'it' refers to 'animal' rather than 'street' based on the full sentence.
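At its core, attention is a weighted average: each position scores every other position, turns the scores into weights with a softmax, and mixes the values accordingly. Below is a minimal single-head sketch with invented toy vectors; real transformers use many heads and learned projections.

```python
import math

# Minimal scaled dot-product attention (single head, no learned projections).
def attention(Q, K, V):
    d = len(Q[0])
    out = []
    for q in Q:
        # How relevant is each key to this query?
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        exps = [math.exp(s) for s in scores]
        weights = [e / sum(exps) for e in exps]  # softmax -> attention weights
        # Output is the attention-weighted mix of the values.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# One query attending over two key/value pairs (toy numbers):
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
print(attention(Q, K, V))  # weights favor the first key, so output leans toward [10, 0]
```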
Question 9: What is a token in LLM processing?
A security credential for API access
Roughly a word or word fragment - the basic unit of text processed by LLMs
A reward signal for reinforcement learning
A type of neural network layer
Explanation:
Tokens are the basic units LLMs process - roughly a word or word fragment. English averages ~1.3 tokens per word. 'understanding' might be one token, while 'ChatGPT' might be two: 'Chat' + 'GPT'. API pricing is per token: Claude costs $3/$15 per million tokens (input/output).
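Those per-token prices make costs easy to estimate. The sketch below plugs in the rates and the ~1.3 tokens-per-word average quoted above; real tokenizers vary by text, so treat this as a back-of-envelope estimate.

```python
# Rough API cost estimate using the rates quoted above
# ($3 input / $15 output per million tokens) and ~1.3 tokens per English word.
def api_cost(input_words, output_words,
             in_rate=3.00, out_rate=15.00, tokens_per_word=1.3):
    in_tokens = input_words * tokens_per_word
    out_tokens = output_words * tokens_per_word
    return (in_tokens * in_rate + out_tokens * out_rate) / 1_000_000

# A 2,000-word prompt with a 500-word answer:
print(f"${api_cost(2000, 500):.4f}")
```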
Question 10: What is the MOST serious problem with LLMs that cannot be eliminated?
They are too expensive to run
Hallucinations - confidently stating false information as fact
They require too much electricity
They can't process images
Explanation:
Hallucinations are the most serious LLM problem. LLMs are prediction engines that generate plausible text based on patterns, which sometimes means inventing citations, statistics, or facts. Because hallucination is fundamental to how LLMs work, it can only be reduced, never eliminated. Always verify factual claims from LLMs.
Question 11: What is model drift?
When models become less accurate over time due to hardware degradation
Changes in LLM behavior over time as providers update models - prompts that worked before may fail later
The tendency of models to prefer certain topics
When training data becomes outdated
Explanation:
Model drift occurs when LLM behavior changes over time, even without retraining. As companies update models (fix bugs, adjust safety filters, improve performance), responses to identical prompts can change. A prompt that worked perfectly in January might fail in March. Enterprise applications must test regularly and version-control prompts.
Question 12: What is a context window?
The time period covered by training data
The maximum amount of text an LLM can process at once - everything must fit or older content gets truncated
The user interface for entering prompts
The computational resources needed for inference
Explanation:
Context windows are the LLM's memory limit. Claude has a 200K-token context (~150,000 words); GPT-4 Turbo has 128K tokens. Everything in your conversation must fit in this window. When it fills up, older content gets truncated and the model 'forgets' it. This limits use cases like analyzing 1,000-page contracts.
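The truncation behavior can be sketched as a rolling buffer: keep the newest messages that fit the token budget and drop the rest. The word-count "tokenizer" here is a crude stand-in for a real one, used only to keep the example self-contained.

```python
# Sketch of a rolling context window: keep the newest messages that fit
# the token budget; older messages are dropped -- the model "forgets" them.
def fit_context(messages, budget):
    kept, used = [], 0
    for msg in reversed(messages):  # walk newest-first
        tokens = len(msg.split())   # crude stand-in for a real tokenizer
        if used + tokens > budget:
            break
        kept.append(msg)
        used += tokens
    return list(reversed(kept))     # restore chronological order

history = ["first message here", "second message", "third and newest message"]
print(fit_context(history, budget=7))  # the oldest message is dropped
```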
Question 13: What does RLHF stand for and what is it?
Rapid Learning High Frequency - fast training technique
Reinforcement Learning from Human Feedback - humans rate AI outputs and the model learns to produce higher-rated responses
Recursive Language Formatting Helper - a prompting technique
Real-time Language Flow Handler - streaming response system
Explanation:
RLHF (Reinforcement Learning from Human Feedback) is the training stage where human raters rank different AI responses, and the model learns to produce higher-rated outputs. This is how ChatGPT became helpful and safe instead of just completing text. It's the final training stage after pre-training and supervised fine-tuning.
Question 14: What is RAG (Retrieval-Augmented Generation)?
A type of neural network architecture
Giving LLMs access to external knowledge bases - search documents, include relevant info in prompt, generate answer
A GPU acceleration technique
A method for reducing hallucinations through repetition
Explanation:
RAG connects LLMs to knowledge bases instead of fine-tuning. When a user asks a question: (1) Search your documents for relevant information, (2) Include that information in the prompt, (3) Model generates answer based on your data. RAG is cheaper than fine-tuning and easier to update - most enterprise AI uses RAG.
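The three steps above can be sketched end to end. This toy version retrieves by keyword overlap and only builds the final prompt; a real system would use vector embeddings for search and send the prompt to an actual LLM API. All document text is invented for illustration.

```python
# Toy RAG pipeline: (1) retrieve relevant docs, (2) put them in the prompt,
# (3) a real system would now send this prompt to an LLM.
DOCS = [
    "Refunds are processed within 5 business days.",
    "Our office is open Monday through Friday.",
    "Premium support is available 24/7 by phone.",
]

def retrieve(question, docs, top_k=1):
    # Rank docs by word overlap with the question (stand-in for embeddings).
    q_words = set(question.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:top_k]

def build_prompt(question, docs):
    context = "\n".join(retrieve(question, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("How fast are refunds processed?", DOCS))
```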
Question 15: How much did training GPT-4 reportedly cost in compute resources?
$1 million
$10 million
$100 million
$1 billion
Explanation:
Training GPT-4 cost an estimated $100 million in compute resources. The limiting factor in AI advancement is often not algorithms but access to enough computing power. This is why AI leaders (OpenAI, Google, Meta, Anthropic) spend billions on infrastructure - thousands of GPUs running for weeks or months.
Question 16: What is prompt engineering?
Writing computer code to control AI systems
Crafting effective inputs to AI systems using techniques like few-shot learning, chain-of-thought, and role assignment
Designing the user interface for AI applications
Training custom AI models
Explanation:
Prompt engineering is crafting effective inputs to get better AI outputs. Techniques include: being specific about format/length, providing examples (few-shot learning), assigning roles ('You are an expert...'), using chain-of-thought ('Let's think step-by-step'), and iterating. Some companies hire prompt engineers at $200K+ salaries.
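Several of those techniques can appear in a single prompt. The example below combines a role, few-shot examples, and a constrained output format; all ticket text is invented for demonstration.

```python
# Illustrative prompt combining role assignment, few-shot examples,
# and an explicit output format. The ticket text is made up.
prompt = (
    "You are an expert support agent. "                            # role
    "Classify each ticket as BUG, BILLING, or OTHER.\n\n"          # format
    'Ticket: "I was charged twice this month."\nLabel: BILLING\n'  # example 1
    'Ticket: "The app crashes on upload."\nLabel: BUG\n\n'         # example 2
    'Ticket: "Can you add a dark mode?"\nLabel:'                   # the query
)
print(prompt)
```

Ending the prompt with `Label:` nudges the model to complete the pattern with just the category, which makes the output easy to parse.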
Question 17: Why do LLMs sometimes give inconsistent answers to the same question?
They randomly forget their training
They use randomness (temperature parameter) to generate varied outputs, which creates creativity but reduces consistency
The models are poorly trained
Network latency affects responses
Explanation:
LLMs use a temperature parameter that adds randomness to generation - useful for creative writing but problematic for tasks requiring deterministic outputs. Ask the same question three times, get three different answers. Setting temperature to 0 reduces but doesn't eliminate variance. High-stakes applications need multiple runs and human validation.
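Mechanically, temperature divides the model's raw scores (logits) before the softmax that produces token probabilities. The logits below are invented for illustration; the effect on the distribution is the point.

```python
import math

# Temperature scales logits before softmax: low temperature sharpens the
# distribution (more deterministic); high temperature flattens it (more varied).
def softmax_with_temperature(logits, temperature):
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    return [e / sum(exps) for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up raw next-token scores
cold = softmax_with_temperature(logits, 0.2)
hot = softmax_with_temperature(logits, 2.0)
print(round(cold[0], 3), round(hot[0], 3))  # 0.993 0.481
```

At low temperature nearly all probability mass lands on the top token; at high temperature the choice is closer to a coin flip, which is where the run-to-run inconsistency comes from.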
Question 18: What is the environmental cost of training large AI models?
Minimal - AI training uses less energy than a household
Moderate - about the same as a small office building
Substantial - GPT-3 training used 1,287 MWh (equivalent to 120 US homes for a year) and produced 552 tons of CO2
Negligible - cloud providers use 100% renewable energy
Explanation:
Training large models consumes enormous energy. GPT-3 training used 1,287 MWh of electricity - equivalent to 120 US homes for a year, producing 552 tons of CO2. Data centers running inference at scale also use significant power for computation and cooling. As AI adoption grows, energy consumption is a major sustainability concern.
Question 19: What is the best practice for using LLM outputs in high-stakes decisions (medical, legal, financial)?
Trust the LLM completely since it's trained on expert data
Implement human-in-the-loop validation - LLMs assist but humans make final decisions
Use multiple LLMs and average their answers
Only use the most expensive models like GPT-4
Explanation:
For high-stakes decisions, implement human-in-the-loop validation. LLMs should draft, suggest, or assist - not make final calls on hiring, medical diagnoses, loan approvals, or legal judgments. LLMs are probabilistic systems that hallucinate and can't guarantee correctness. Human review is both safer and often legally required in regulated industries.
Question 20: What is fine-tuning an LLM?
Adjusting the temperature parameter for better outputs
Continuing to train a pre-trained model on your specific data to specialize it for your use case
Writing better prompts through trial and error
Reducing the model size to run faster
Explanation:
Fine-tuning takes a pre-trained model and continues training it on your specific data. This specializes the model for your use case. Example: fine-tune GPT-4 on your customer service transcripts to create a model that answers in your brand voice with your product knowledge. It requires hundreds to thousands of examples plus GPU time.