Module 2 Quiz: AI Fundamentals

Test your knowledge of how AI actually works

Question 1: What is the primary advantage of GPUs over CPUs for AI training?

Explanation: GPUs were originally designed for rendering graphics, which requires massive parallel computation. This same capability makes them ideal for the matrix multiplications used in neural network training - they can perform thousands of calculations at once, unlike CPUs, which are optimized for sequential processing.

Question 2: What are Google TPUs specifically optimized for?

Explanation: TPU stands for Tensor Processing Unit. Google designed these custom chips specifically for tensor operations (multi-dimensional array calculations) that neural networks use. They can be 15-30x faster than GPUs for certain AI tasks while using less energy.

Question 3: In a neural network, what are 'weights'?

Explanation: Weights are the numbers that determine how strongly each neuron connection influences the next layer. Training a neural network means adjusting billions of these weights to minimize prediction errors. Modern language models have hundreds of billions of parameters (weights) that must be tuned.
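The idea of weights determining how strongly a connection influences the output can be sketched with a single toy neuron (the sigmoid activation and the example numbers here are illustrative, not from any particular model):

```python
import math

def neuron(inputs, weights, bias):
    """A single artificial neuron: weighted sum of inputs, then a sigmoid activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-z))  # sigmoid squashes the result into (0, 1)

# Same inputs, different weights -> very different outputs.
# Training means adjusting these weights to reduce prediction error.
print(neuron([1.0, 2.0], [0.5, -0.5], 0.0))  # weights partly cancel: output near 0.38
print(neuron([1.0, 2.0], [0.9, 0.9], 0.0))   # strong positive weights: output near 0.94
```

A modern LLM is conceptually this same operation repeated across billions of weights arranged in layers.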

Question 4: What is backpropagation?

Explanation: Backpropagation works backward through the neural network to calculate how much each weight contributed to the prediction error. This information is then used by gradient descent to adjust the weights slightly to reduce that error. It's the fundamental algorithm that makes neural network training possible.

Question 5: What is a vector embedding?

Explanation: Vector embeddings convert data (words, images, etc.) into lists of numbers (vectors) with hundreds or thousands of dimensions. Similar concepts have similar vectors. This is how AI systems mathematically represent and process information - they don't 'understand' words; they work with numerical vectors.
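"Similar concepts have similar vectors" is usually measured with cosine similarity. A sketch with made-up 3-dimensional embeddings (real ones are learned and have hundreds or thousands of dimensions):

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: hand-made for illustration, not learned from data.
emb = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.85, 0.75, 0.2],
    "car": [0.1, 0.2, 0.9],
}
print(cosine(emb["cat"], emb["dog"]))  # high: similar concepts
print(cosine(emb["cat"], emb["car"]))  # low: unrelated concepts
```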

Question 6: The classic vector math example 'king - man + woman = queen' demonstrates what?

Explanation: This example shows that embeddings capture semantic relationships. The vector difference between 'king' and 'man' represents royalty/gender, and adding that to 'woman' points to 'queen'. The model learned these relationships from patterns in training data, not explicit programming.
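The analogy is literal vector arithmetic. With hand-made 2-dimensional vectors encoding (royalty, gender) - a deliberate simplification of what real embeddings learn - the computation looks like this:

```python
# Toy 2-dim "embeddings": first number = royalty, second = gender (1 = male).
emb = {
    "king":  [1.0, 1.0],
    "man":   [0.0, 1.0],
    "woman": [0.0, 0.0],
    "queen": [1.0, 0.0],
}

# king - man + woman, computed component by component
result = [k - m + w for k, m, w in zip(emb["king"], emb["man"], emb["woman"])]

# Find the word whose vector is closest to the result (squared Euclidean distance).
nearest = min(emb, key=lambda word: sum((a - b) ** 2 for a, b in zip(emb[word], result)))
print(nearest)  # prints "queen"
```

Subtracting 'man' removes the male component while keeping royalty; adding 'woman' lands on 'queen'.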

Question 7: What does LLM stand for and what are they?

Explanation: LLM stands for Large Language Model. GPT-4 (the model behind ChatGPT), Claude, and Gemini are all LLMs - neural networks trained on trillions of words to predict the next token in a sequence. They're also called foundation models or generative AI models. The 'large' refers to parameter count (billions to trillions).
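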

Question 8: What is the key innovation of the transformer architecture introduced in 2017?

Explanation: The transformer architecture introduced the attention mechanism in the paper 'Attention is All You Need.' Attention allows the model to understand context - when processing 'it was too tired,' attention helps determine whether 'it' refers to 'animal' or 'street' based on the full sentence context.
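The core of attention is scaled dot-product scoring followed by a softmax. A minimal sketch with toy 2-dimensional vectors (real transformers use learned projections and many attention heads, all omitted here):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)  # how much each token matters for this query
    # Output: weighted mix of the value vectors
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]

# The query matches the first key, so the output leans toward the first value.
out = attention([1.0, 0.0], keys=[[1.0, 0.0], [0.0, 1.0]], values=[[10.0, 0.0], [0.0, 10.0]])
print(out)
```

This weighting is what lets the model decide which earlier words (like 'animal' vs 'street') a pronoun refers to.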

Question 9: What is a token in LLM processing?

Explanation: Tokens are the basic units LLMs process - roughly a word or word fragment. English averages ~1.3 tokens per word. 'understanding' might be one token, while 'ChatGPT' might be two: 'Chat' + 'GPT'. API pricing is per token: Claude costs $3/$15 per million tokens (input/output).
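Per-token pricing makes cost estimation simple arithmetic. A sketch using the ~1.3 tokens-per-word heuristic and the $3/$15 per-million-token figures quoted above (real tokenizers count exactly; this is a rough estimate):

```python
TOKENS_PER_WORD = 1.3                  # rough average for English text
PRICE_IN, PRICE_OUT = 3.00, 15.00      # USD per million tokens (input/output)

def estimate_cost(input_words, output_words):
    """Approximate API cost in USD for a single request."""
    tokens_in = input_words * TOKENS_PER_WORD
    tokens_out = output_words * TOKENS_PER_WORD
    return tokens_in / 1e6 * PRICE_IN + tokens_out / 1e6 * PRICE_OUT

# A 10,000-word document in, a 1,000-word summary out:
print(round(estimate_cost(10_000, 1_000), 4))  # prints 0.0585 - about 6 cents
```

Output tokens cost 5x more than input tokens here, which is why long generations dominate the bill.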

Question 10: What is the MOST serious problem with LLMs that cannot be eliminated?

Explanation: Hallucinations are the most serious LLM problem. LLMs are prediction engines that generate plausible text based on patterns, which sometimes means inventing citations, statistics, or facts. This is fundamental to how they work - hallucinations can't be eliminated, only reduced. Always verify factual claims from LLMs.

Question 11: What is model drift?

Explanation: Model drift occurs when LLM behavior changes over time, even without retraining. As companies update models (fix bugs, adjust safety filters, improve performance), responses to identical prompts can change. A prompt that worked perfectly in January might fail in March. Enterprise applications must test regularly and version-control prompts.

Question 12: What is a context window?

Explanation: Context windows are the LLM's memory limit. Claude has a 200K token context (~150,000 words), GPT-4 has 128K tokens. Everything in your conversation must fit in this window. When it fills up, older content gets truncated and the model 'forgets' it. This limits use cases like analyzing 1000-page contracts.
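The "forgetting" behavior can be sketched as a truncation policy: when the conversation exceeds the window, the oldest messages are dropped first. The tiny window size and per-message token counts below are made up for illustration:

```python
CONTEXT_WINDOW = 8  # tokens - tiny for illustration; real windows are 100K+

def fit_to_window(messages, window=CONTEXT_WINDOW):
    """Keep the most recent (message, token_count) pairs that fit in the window."""
    kept, used = [], 0
    for msg, tokens in reversed(messages):  # walk newest to oldest
        if used + tokens > window:
            break  # everything older than this is truncated - "forgotten"
        kept.append((msg, tokens))
        used += tokens
    return list(reversed(kept))

history = [("hello", 3), ("long question", 4), ("answer", 2), ("follow-up", 3)]
print(fit_to_window(history))  # the two oldest messages fall out
```

Real systems vary in strategy (some summarize instead of dropping), but the hard limit is the same.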

Question 13: What does RLHF stand for and what is it?

Explanation: RLHF (Reinforcement Learning from Human Feedback) is the training stage where human raters rank different AI responses, and the model learns to produce higher-rated outputs. This is how ChatGPT became helpful and safe instead of just completing text. It's the final training stage after pre-training and supervised fine-tuning.

Question 14: What is RAG (Retrieval-Augmented Generation)?

Explanation: RAG connects LLMs to knowledge bases instead of fine-tuning. When a user asks a question: (1) Search your documents for relevant information, (2) Include that information in the prompt, (3) The model generates an answer based on your data. RAG is cheaper than fine-tuning and easier to update - most enterprise AI uses RAG.
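The three steps above can be sketched in a few lines. Real RAG systems retrieve by embedding similarity from a vector database; this toy version uses keyword overlap, and the prompt template is an illustrative placeholder, not any product's API:

```python
def retrieve(question, documents, top_k=2):
    """Step 1: rank documents by word overlap with the question (toy scoring)."""
    q_words = {w.strip("?.,").lower() for w in question.split()}
    def score(doc):
        return len(q_words & {w.strip("?.,").lower() for w in doc.split()})
    return sorted(documents, key=score, reverse=True)[:top_k]

def build_prompt(question, documents):
    """Steps 2-3: stuff the retrieved text into the prompt the model will answer from."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = [
    "Our refund policy allows returns within 30 days.",
    "The office is open Monday through Friday.",
    "Shipping takes 5 business days.",
]
print(build_prompt("What is the refund policy?", docs))
```

The model never sees your whole knowledge base - only the retrieved snippets - which is why updating RAG is as easy as updating the documents.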

Question 15: How much did training GPT-4 reportedly cost in compute resources?

Explanation: Training GPT-4 cost an estimated $100 million in compute resources. The limiting factor in AI advancement is often not algorithms but access to enough computing power. This is why AI leaders (OpenAI, Google, Meta, Anthropic) spend billions on infrastructure - thousands of GPUs running for weeks or months.

Question 16: What is prompt engineering?

Explanation: Prompt engineering is crafting effective inputs to get better AI outputs. Techniques include: being specific about format/length, providing examples (few-shot learning), assigning roles ('You are an expert...'), using chain-of-thought ('Let's think step-by-step'), and iterating. Some companies hire prompt engineers at $200K+ salaries.

Question 17: Why do LLMs sometimes give inconsistent answers to the same question?

Explanation: LLMs use a temperature parameter that adds randomness to generation - useful for creative writing but problematic for tasks requiring deterministic outputs. Ask the same question three times, get three different answers. Setting temperature to 0 reduces but doesn't eliminate variance. High-stakes applications need multiple runs and human validation.
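Temperature works by rescaling the model's raw next-token scores (logits) before they are turned into sampling probabilities. A sketch with made-up logits for three candidate tokens:

```python
import math

def apply_temperature(logits, temperature):
    """Convert logits to sampling probabilities at a given temperature."""
    scaled = [l / temperature for l in logits]   # low T exaggerates differences
    m = max(scaled)                              # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # the model's raw scores for three candidate tokens
print(apply_temperature(logits, 0.2))  # sharp: top token is near-certain
print(apply_temperature(logits, 2.0))  # flat: probability spread across all three
```

At low temperature the top token dominates (nearly deterministic); at high temperature the distribution flattens and any token might be sampled, which is where run-to-run inconsistency comes from.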

Question 18: What is the environmental cost of training large AI models?

Explanation: Training large models consumes enormous energy. GPT-3 training used 1,287 MWh of electricity - equivalent to 120 US homes for a year, producing 552 tons of CO2. Data centers running inference at scale also use significant power for computation and cooling. As AI adoption grows, energy consumption is a major sustainability concern.

Question 19: What is the best practice for using LLM outputs in high-stakes decisions (medical, legal, financial)?

Explanation: For high-stakes decisions, implement human-in-the-loop validation. LLMs should draft, suggest, or assist - not make final calls on hiring, medical diagnoses, loan approvals, or legal judgments. LLMs are probabilistic systems that hallucinate and can't guarantee correctness. Human review is both safer and often legally required in regulated industries.

Question 20: What is fine-tuning an LLM?

Explanation: Fine-tuning takes a pre-trained model and continues training it on your specific data. This specializes the model for your use case. Example: fine-tune GPT-4 on your customer service transcripts to create a model that answers in your brand voice with your product knowledge. Requires hundreds-thousands of examples and GPU time.