Top Interview Questions
Teradata is a powerful enterprise-level data warehousing and analytics platform designed to handle large volumes of data efficiently. It is widely used by organizations that require high-performance analytics, complex query processing, and scalability. Teradata is best known for its massively parallel processing (MPP) architecture, which allows it to process huge datasets quickly and reliably.
Founded in 1979, Teradata later became part of NCR Corporation and was spun off as an independent company in 2007. Over the years, it has evolved from a traditional on-premises data warehouse solution to a modern, cloud-enabled analytics platform that supports advanced analytics, machine learning, and hybrid architectures.
Teradata is primarily a relational database management system (RDBMS) optimized for data warehousing and analytics. Unlike traditional databases that are designed for transaction processing, Teradata is designed for Online Analytical Processing (OLAP). It allows users to run complex queries on large datasets without significantly affecting system performance.
Teradata supports standard SQL, making it easy for developers, data analysts, and BI professionals to interact with the system. It is commonly used in industries such as banking, telecommunications, retail, healthcare, and airlines, where data volumes are extremely large and business decisions depend on accurate and fast analytics.
Teradata’s core strength lies in its MPP architecture. Data and queries are distributed across multiple nodes, allowing tasks to be processed in parallel. This significantly improves query performance and scalability.
Teradata can scale linearly, meaning performance improves as more nodes are added. Organizations can expand their systems as data grows without major redesigns.
Teradata efficiently handles complex joins, aggregations, and large scans, making it ideal for analytical workloads involving terabytes or petabytes of data.
Teradata uses ANSI-compliant SQL, enabling users to write complex queries easily. Advanced SQL features such as window functions and analytical functions are well supported.
Teradata distributes rows across AMPs by hashing each row's Primary Index (PI) value. A PI with many distinct values spreads data evenly, which reduces data skew and improves performance.
Teradata includes powerful workload management tools that prioritize critical queries and ensure fair resource allocation among users.
Teradata architecture is designed to support parallel processing and high availability. The main components include:
A Teradata system consists of multiple nodes. Each node is an independent processing unit with its own CPU, memory, and storage.
The Parsing Engine is responsible for:
Receiving SQL requests from users
Checking syntax and semantics
Creating execution plans
Managing sessions and security
AMPs are the workhorses of Teradata. They:
Store data on disk
Retrieve and manipulate data
Perform aggregations and joins
Write results back to disk
Each AMP processes data independently, enabling parallel execution.
BYNET is the high-speed communication layer that connects nodes and enables data transfer between AMPs.
Indexes play a critical role in Teradata performance.
Primary Index (PI):
Determines how data is distributed across AMPs
Can be unique or non-unique
Helps in even data distribution and faster access
Secondary Index (SI):
Optional index for faster access when the PI is not used in queries
Can be unique or non-unique
Consumes additional storage
Join Index (JI):
Pre-joins tables to improve query performance
Useful for complex queries involving multiple tables
Teradata SQL is similar to standard SQL but includes some extensions. Query processing follows these steps:
User submits an SQL query
Parsing Engine validates and optimizes the query
Query plan is generated
AMPs execute the plan in parallel
Results are returned to the user
Teradata’s optimizer is cost-based and highly sophisticated, choosing the most efficient execution path based on statistics and data distribution.
Efficient data loading is essential for data warehouses. Teradata provides several utilities:
FastLoad:
Used for loading large volumes of data into empty tables
Very fast, but discards duplicate rows
MultiLoad:
Used for bulk loading, updating, deleting, and upserting data
Works on empty or populated tables
TPump:
Used for near-real-time data loading
Suitable for small batches and continuous loads
Teradata Parallel Transporter (TPT):
A unified framework that replaces FastLoad, MultiLoad, and TPump
Supports both batch and continuous loads
On-premises – Traditional deployment where Teradata runs on dedicated hardware in a company's data center. Suitable for organizations with strict security or regulatory requirements.
Teradata Vantage is the modern analytics platform that unifies data warehousing, data lakes, and advanced analytics.
Teradata supports major cloud platforms:
AWS
Microsoft Azure
Google Cloud Platform
Cloud deployment offers flexibility, scalability, and reduced infrastructure costs.
Hybrid – Combines on-premises and cloud environments, allowing organizations to move workloads to the cloud gradually.
Teradata supports advanced analytics and AI/ML workloads through:
In-database analytics
Integration with Python, R, and SAS
Machine learning functions
Graph and time-series analytics
With Teradata Vantage, users can run analytics where the data resides, reducing data movement and improving performance.
Security is a critical component of Teradata systems. Key security features include:
User authentication and authorization
Role-based access control
Data encryption at rest and in transit
Auditing and logging
Compliance with industry standards
These features ensure data confidentiality, integrity, and availability.
Advantages:
Handles extremely large datasets efficiently
Excellent performance for complex analytical queries
Highly scalable and reliable
Strong workload management
Mature and stable platform
Widely used in large enterprises
Limitations:
High cost compared to some modern cloud-native databases
Requires skilled professionals for administration
Not ideal for small datasets or OLTP workloads
Migration and licensing can be complex
Common use cases:
Enterprise data warehousing
Customer behavior analytics
Financial risk analysis
Fraud detection
Supply chain analytics
Telecom call detail record (CDR) analysis
Professionals skilled in Teradata can work as:
Teradata Developer
Data Warehouse Engineer
Data Analyst
BI Developer
Database Administrator (DBA)
Key skills include SQL, data modeling, performance tuning, and ETL tools.
Answer:
Teradata is a massively parallel processing (MPP) relational database management system (RDBMS) designed to handle very large volumes of data. It is mainly used in data warehousing and analytics environments where high performance, scalability, and reliability are required.
Teradata distributes data across multiple nodes and processes queries in parallel, which makes it extremely fast for complex analytical queries on large datasets.
Answer:
Key features of Teradata include:
Massively Parallel Processing (MPP) architecture
High scalability (supports petabytes of data)
Shared-nothing architecture
Automatic data distribution
Advanced query optimization
High availability and fault tolerance
Support for ANSI SQL
Answer:
MPP is an architecture where multiple processors work independently and in parallel to process data. In Teradata:
Each processor handles a portion of the data
Queries are divided into smaller tasks
Tasks are executed simultaneously
This results in faster query execution and better performance for large datasets.
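To make the divide-and-combine idea concrete, here is a small Python simulation (not Teradata code). It assumes a hypothetical system of 4 AMPs, deals rows to them round-robin so the same key can land on several AMPs, lets each simulated AMP compute a partial SUM in its own thread, and then merges the partial results, roughly the way a parallel GROUP BY combines per-AMP aggregates. Threads only show the structure here; Python's GIL means there is no real CPU speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

NUM_AMPS = 4  # hypothetical system size


def distribute(rows, num_amps=NUM_AMPS):
    """Deal (key, amount) rows round-robin so one key can live on several AMPs."""
    slices = [[] for _ in range(num_amps)]
    for i, row in enumerate(rows):
        slices[i % num_amps].append(row)
    return slices


def local_aggregate(amp_rows):
    """Work done independently by one simulated AMP: a partial SUM per key."""
    partial = Counter()
    for key, amount in amp_rows:
        partial[key] += amount
    return partial


def parallel_sum(rows):
    slices = distribute(rows)
    # Each slice is processed concurrently, like AMPs working on their own rows.
    with ThreadPoolExecutor(max_workers=NUM_AMPS) as pool:
        partials = list(pool.map(local_aggregate, slices))
    total = Counter()
    for partial in partials:
        total.update(partial)  # combine the per-AMP partial sums
    return dict(total)


rows = [(1, 10), (2, 5), (1, 7), (3, 1), (6, 4), (2, 2)]
print(parallel_sum(rows))  # {1: 17, 6: 4, 2: 7, 3: 1}
```

Key 1 appears on two different simulated AMPs, so the final merge step is what produces the correct total of 17.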
Answer:
A node is a physical or virtual server in a Teradata system. Each node contains:
CPUs
Memory
Disk storage
AMPs (Access Module Processors)
Multiple nodes work together to process queries in parallel.
Answer:
AMP stands for Access Module Processor. It is the fundamental unit of work in Teradata. Each AMP:
Stores a portion of the database
Performs row-level operations such as sorting, aggregating, and joining
Works independently of other AMPs
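Because each AMP works on its own share of the rows in parallel, the number of AMPs (together with even data distribution) directly affects performance.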
Answer:
The Parsing Engine (PE) is responsible for:
Receiving SQL queries from users
Checking syntax and semantics
Optimizing queries
Generating execution plans
Sending work instructions to AMPs
The PE does not store data.
Answer:
BYNET is the high-speed communication network in Teradata that connects:
Parsing Engines
AMPs
Nodes
It enables fast data transfer and synchronization between system components.
Answer:
Teradata architecture consists of:
Client Layer – Tools like SQL Assistant, BTEQ, or BI tools
Parsing Engine Layer – Parses and optimizes queries
AMP Layer – Stores and processes data
BYNET – Communication layer between components
This architecture ensures high performance and scalability.
Answer:
A Primary Index determines how rows are distributed across AMPs. It:
Is required on every table except NoPI tables
Can be unique or non-unique
Controls data distribution
Gives the fastest (single-AMP) access when a query supplies equality conditions on all PI columns
Answer:
| Unique PI | Non-Unique PI |
|---|---|
| Each value is unique | Duplicate values allowed |
| Faster access | Slightly slower |
| Ensures even distribution | May cause data skew |
Answer:
A Secondary Index (SI) provides an alternative access path to data. It is used when queries do not use the Primary Index.
Types:
Unique Secondary Index (USI)
Non-Unique Secondary Index (NUSI)
Answer:
Data skew occurs when data is unevenly distributed across AMPs. This leads to:
Performance issues
Longer query execution time
Overloaded AMPs
Choosing the right Primary Index helps reduce data skew.
Answer:
Teradata uses a hash function to:
Convert Primary Index values into hash values
Determine which AMP stores a row
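Because the hash function spreads distinct PI values uniformly, a PI with many distinct values produces even data distribution; a PI with few distinct values or many duplicates causes skew.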
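The hash-based placement can be simulated in a few lines of Python. This is a sketch under stated assumptions: MD5 stands in for Teradata's own (different) row-hash function, the system has a hypothetical 8 AMPs, and the hash map simply assigns hash buckets to AMPs round-robin. It contrasts a high-cardinality PI (a customer ID) with a low-cardinality one (gender).

```python
import hashlib
from collections import Counter

NUM_AMPS = 8            # hypothetical system size
NUM_BUCKETS = 2 ** 16   # simplified number of hash buckets (assumption)

# Simplified hash map: hash bucket -> AMP, assigned round-robin.
HASH_MAP = [bucket % NUM_AMPS for bucket in range(NUM_BUCKETS)]


def row_hash(pi_value):
    """32-bit row hash of a PI value (MD5 is a stand-in, not Teradata's hash)."""
    digest = hashlib.md5(str(pi_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def amp_for(pi_value):
    bucket = row_hash(pi_value) >> 16  # top 16 bits pick the hash bucket
    return HASH_MAP[bucket]           # the same PI value always maps to the same AMP


def rows_per_amp(pi_values):
    counts = Counter(amp_for(v) for v in pi_values)
    return [counts.get(amp, 0) for amp in range(NUM_AMPS)]


print(rows_per_amp(range(10_000)))       # customer_id: rows spread across all 8 AMPs
print(rows_per_amp(["M", "F"] * 5_000))  # gender: all rows land on at most 2 AMPs
```

With only two distinct PI values, at most two AMPs receive any rows no matter how good the hash function is, which is why low-cardinality columns make poor primary indexes.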
Answer:
A fallback table stores a duplicate copy of data on a different AMP. It provides:
Data protection
High availability in case of AMP failure
Fallback requires additional storage space.
Answer:
A multiset table allows duplicate rows. Teradata supports:
SET tables – Do not allow duplicate rows
MULTISET tables – Allow duplicate rows
Multiset tables are commonly used in data warehouses.
Answer:
A volatile table:
Exists only during a user session
Is stored in the user's spool space, and its definition is not saved in the data dictionary
Is automatically dropped at session end
Is mainly used for intermediate results
Answer:
Spool space is temporary disk space used to:
Store intermediate query results
Sort and join data during query execution
Each user is allocated a specific amount of spool space.
Answer:
| DELETE | DROP |
|---|---|
| Removes rows | Removes entire table |
| Can use WHERE clause | Cannot use WHERE |
| Table structure remains | Table structure removed |
Answer:
BTEQ (Basic Teradata Query) is a command-line utility used to:
Run SQL scripts
Automate batch jobs
Export and import data
Answer:
Advantages include:
Excellent performance for large data
Linear scalability
High reliability and availability
Strong support for analytics
Efficient data distribution
Answer:
| Teradata | Oracle |
|---|---|
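| MPP, shared-nothing architecture | SMP, or shared-disk clusters with RAC |
| Built primarily for data warehousing | Strong for OLTP; also used for warehousing |
| Automatic hash distribution of rows across AMPs | Data placement managed through tablespaces and partitioning |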
Answer:
Statistics provide information about data distribution. The optimizer uses statistics to:
Choose efficient execution plans
Improve query performance
Statistics are collected using the COLLECT STATISTICS command.
Answer:
A Join Index is a pre-joined table that improves query performance by reducing join processing at runtime.
Answer:
| SET Table | MULTISET Table |
|---|---|
| No duplicate rows | Duplicate rows allowed |
| Slower inserts | Faster inserts |
| Default in Teradata session mode | Default in ANSI session mode; preferred in DW |
Answer:
Teradata is ideal for data warehousing because it:
Handles huge data volumes
Executes complex analytical queries efficiently
Supports parallel processing
Provides high scalability and reliability
Answer:
In Teradata, a database is a logical container that holds:
Tables
Views
Indexes
Macros
Stored procedures
A database also controls space allocation and user access.
Answer:
A user is a special type of database that:
Can log in to Teradata
Owns objects (tables, views)
Has permissions and spool space
Every user is technically a database, but not every database is a user.
Answer:
| USER | DATABASE |
|---|---|
| Can log in | Cannot log in |
| Has password | No password |
| Has spool space for running its own queries | Any spool setting only limits users created under it |
| Used by people/apps | Used for storage |
Answer:
A view is a virtual table created using a SELECT query. It:
Does not store data physically
Stores only the query definition
Is used for security and simplicity
Answer:
A macro is a stored set of SQL statements that can be executed with a single call.
Benefits:
Reduces repeated SQL coding
Improves consistency
Easy execution
Answer:
| Macro | Stored Procedure |
|---|---|
| Only SQL statements | SQL + control logic |
| No IF/LOOP | Supports IF, LOOP |
| Faster | Slightly slower |
| Simple tasks | Complex logic |
Answer:
A stored procedure is a program stored in the database that:
Contains SQL and procedural logic
Supports conditions and loops
Can accept input/output parameters
Used for complex business logic.
Answer:
A surrogate key is an artificial key (usually a number) used instead of a natural key.
Example:
Customer_ID generated by system
It helps in:
Performance
Insulating the warehouse from changes to business keys
Answer:
A natural key is a real-world attribute that uniquely identifies a record.
Example:
PAN number
Email ID
Answer:
Normalization is the process of:
Organizing data
Removing redundancy
Improving data integrity
In Teradata data warehouses, normalization is often reduced for performance.
Answer:
Denormalization is the process of:
Combining tables
Reducing joins
Improving query performance
It is commonly used in data warehouses.
Answer:
A fact table stores measurable data such as:
Sales amount
Quantity
Revenue
It usually contains foreign keys to dimension tables.
Answer:
A dimension table contains descriptive information like:
Customer
Product
Time
Location
Used for filtering and grouping data.
Answer:
A star schema has:
One fact table in the center
Multiple dimension tables around it
It is simple and provides fast query performance.
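A star-schema query joins the central fact table to its dimensions through foreign keys and then aggregates a measure. The Python sketch below mimics SELECT p.category, s.region, SUM(f.amount) ... GROUP BY p.category, s.region using dictionaries; all table and column names are hypothetical.

```python
from collections import defaultdict

# Hypothetical dimension tables, keyed by surrogate key
product_dim = {
    1: {"name": "Laptop", "category": "Electronics"},
    2: {"name": "Desk", "category": "Furniture"},
    3: {"name": "Phone", "category": "Electronics"},
}
store_dim = {10: {"region": "North"}, 20: {"region": "South"}}

# Fact table: foreign keys to each dimension plus a numeric measure
sales_fact = [
    {"product_id": 1, "store_id": 10, "amount": 1200},
    {"product_id": 2, "store_id": 20, "amount": 300},
    {"product_id": 3, "store_id": 10, "amount": 800},
    {"product_id": 1, "store_id": 20, "amount": 1100},
]


def revenue_by_category_and_region(facts, products, stores):
    totals = defaultdict(int)
    for row in facts:
        category = products[row["product_id"]]["category"]  # join to product dimension
        region = stores[row["store_id"]]["region"]           # join to store dimension
        totals[(category, region)] += row["amount"]          # aggregate the measure
    return dict(totals)


print(revenue_by_category_and_region(sales_fact, product_dim, store_dim))
```

Each fact row needs only one lookup per dimension, which is why the star layout keeps queries simple: there are no dimension-to-dimension joins as in a snowflake schema.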
Answer:
A snowflake schema is an extension of star schema where:
Dimension tables are normalized
More joins are required
Answer:
| Star Schema | Snowflake Schema |
|---|---|
| Simple design | Complex design |
| Fewer joins | More joins |
| Faster queries | Slower queries |
| More storage | Less storage |
Answer:
A full table scan occurs when:
No index is used
Teradata scans all rows of the table
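It reads every row on every AMP, so it is expensive for selective queries on large tables; when a query needs most of a table's rows, a parallel full table scan can still be the most efficient plan.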
Answer:
The EXPLAIN command shows:
How Teradata executes a query
Whether indexes are used
Data movement between AMPs
Used for performance tuning.
Answer:
A NoPI table:
Has no primary index
Rows are distributed randomly
Commonly used for staging data
Answer:
A hash join:
Builds an in-memory hash table on the join column of the smaller table
Probes that hash table with rows from the larger table
Is efficient when the smaller table fits in memory
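The build-and-probe idea behind a hash join fits in a short Python function. This is a generic in-memory sketch, not Teradata's implementation (which works on spool inside each AMP), and the customers/orders data are hypothetical.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Inner equi-join: hash the smaller (build) input, then probe with the larger."""
    table = defaultdict(list)
    for row in build_rows:                       # build phase
        table[row[build_key]].append(row)
    result = []
    for row in probe_rows:                       # probe phase
        for match in table.get(row[probe_key], []):
            result.append({**match, **row})      # combine columns of both rows
    return result


customers = [{"cust_id": 1, "name": "Asha"}, {"cust_id": 2, "name": "Ben"}]
orders = [
    {"order_id": 100, "cust_id": 1},
    {"order_id": 101, "cust_id": 3},   # no matching customer, dropped by the inner join
    {"order_id": 102, "cust_id": 1},
]
print(hash_join(customers, orders, "cust_id", "cust_id"))
```

Each probe is a constant-time dictionary lookup, so the larger table is read once; the cost is the memory needed to hold the build side.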
Answer:
A merge join:
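Requires both inputs sorted on the join columns (in Teradata, by the row hash of those columns)
Compares rows sequentially
Is especially efficient when both tables are already stored in that order, for example when joining on their PI columns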
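A merge join walks two sorted inputs in step. For readability this Python sketch sorts on the key value itself, whereas Teradata orders spool rows by the row hash of the join columns. Duplicate keys on both sides are handled by pairing each run of equal keys.

```python
def merge_join(left, right, key):
    """Inner equi-join of two row lists by sorting on `key` and scanning once."""
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    i = j = 0
    result = []
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1                      # advance the side with the smaller key
        elif lk > rk:
            j += 1
        else:
            j_end = j                   # find the run of equal keys on the right
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    result.append({**left[i], **r})
                i += 1
            j = j_end
    return result


left = [{"k": 2, "a": "x"}, {"k": 1, "a": "y"}, {"k": 1, "a": "z"}]
right = [{"k": 1, "b": 10}, {"k": 3, "b": 30}, {"k": 1, "b": 11}]
print(merge_join(left, right, "k"))  # four rows, all with k == 1
```

Once both inputs are sorted, each side is read only once, which is why data already stored in join-column order makes this method cheap.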
Answer:
A cartesian join:
Occurs when join condition is missing
Produces all possible combinations
Very expensive and should be avoided
Answer:
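A PPI (Partitioned Primary Index) groups the rows on each AMP into partitions based on a column such as:
Date
Region
It improves performance for range queries through partition elimination: only partitions that can contain qualifying rows are read.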
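Partition elimination can be sketched in Python for a single AMP. The monthly partitioning expression below plays the role of a RANGE_N on a date column (an assumption for illustration); a date-range query reads only the partitions whose month overlaps the range.

```python
from collections import defaultdict
from datetime import date


def month_partition(d):
    """Partitioning expression: one partition per calendar month."""
    return (d.year, d.month)


def build_partitions(rows):
    partitions = defaultdict(list)
    for row in rows:
        partitions[month_partition(row["sale_date"])].append(row)
    return partitions


def range_query(partitions, start, end):
    """Return rows with start <= sale_date <= end and the number of partitions read."""
    first, last = month_partition(start), month_partition(end)
    to_read = [p for p in sorted(partitions) if first <= p <= last]  # elimination
    rows = [r for p in to_read for r in partitions[p] if start <= r["sale_date"] <= end]
    return rows, len(to_read)


sales = [
    {"id": 1, "sale_date": date(2024, 1, 15)},
    {"id": 2, "sale_date": date(2024, 2, 20)},
    {"id": 3, "sale_date": date(2024, 3, 3)},
    {"id": 4, "sale_date": date(2024, 3, 28)},
    {"id": 5, "sale_date": date(2024, 4, 2)},
]
parts = build_partitions(sales)
rows, read = range_query(parts, date(2024, 2, 10), date(2024, 3, 5))
print([r["id"] for r in rows], f"{read} of {len(parts)} partitions read")  # [2, 3] 2 of 4 partitions read
```

The January and April partitions are never touched, which is the saving partition elimination provides on much larger tables.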
Answer:
A secondary index subtable:
Stores index values separately
Points to base table rows
Requires extra storage
Answer:
FastLoad is used to:
Load large volumes of data into empty tables
Load data quickly
It discards duplicate rows instead of loading them.
Answer:
MultiLoad is used to:
Load, update, delete data
Work with large tables
Support restart capability
Answer:
| FastLoad | MultiLoad |
|---|---|
| Insert only | Insert, Update, Delete |
| Empty table only | Empty or populated table |
| No duplicates | Allows duplicates |
| Very fast | Slightly slower |
Answer:
TPT is a modern utility that:
Replaces FastLoad, MultiLoad, TPump, and FastExport through a common set of operators
Supports parallel data movement
Improves performance
Answer:
Locking ensures:
Data consistency
Conflict-free concurrent access
Types:
Access lock
Read lock
Write lock
Exclusive lock
Answer:
Deadlock occurs when:
Two transactions wait for each other
Neither can proceed
Teradata automatically detects and resolves deadlocks.
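Deadlock detection is commonly modeled as finding a cycle in a wait-for graph, where an edge T1 → T2 means T1 waits for a lock that T2 holds. The Python sketch below finds such a cycle with a depth-first search. Teradata's internal detector and its victim-selection rule are not shown; choose_victim uses a hypothetical "abort the most recently started transaction" policy purely for illustration.

```python
def find_deadlock(wait_for):
    """Return the transactions on one cycle of the wait-for graph, or None.

    wait_for maps each transaction id to the ids it is waiting on.
    """
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    color = {}
    path = []

    def visit(t):
        color[t] = GRAY
        path.append(t)
        for nxt in wait_for.get(t, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:             # back edge: nxt is already on the path
                return path[path.index(nxt):]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[t] = BLACK
        return None

    for t in wait_for:
        if color.get(t, WHITE) == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None


def choose_victim(cycle, start_time):
    """Hypothetical policy: abort the transaction that started most recently."""
    return max(cycle, key=lambda t: start_time[t])


waits = {"T1": {"T2"}, "T2": {"T1"}, "T3": {"T1"}}
cycle = find_deadlock(waits)
print(cycle, "-> abort", choose_victim(cycle, {"T1": 1, "T2": 5}))
```

T3 is waiting but is not part of the cycle, so it simply proceeds once T1 finishes; only transactions on the cycle need a victim to be rolled back.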
Answer:
A checkpoint allows:
Restarting a job from failure point
Reducing reprocessing time
Common in MultiLoad and FastLoad.
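FastLoad and MultiLoad manage their checkpoints internally; the Python sketch below only models the restart logic. It assumes a hypothetical JSON checkpoint file that records how many rows were committed, writes a checkpoint every `every` rows, and on restart discards rows written after the last checkpoint before resuming.

```python
import json
import os
import tempfile


def load_with_checkpoint(records, target, checkpoint_path, every=100, fail_at=None):
    """Append records to target, checkpointing every `every` rows.

    Returns the number of rows processed in this run. `fail_at` simulates a crash.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["rows_done"]
    del target[start:]  # roll back rows written after the last checkpoint
    for i in range(start, len(records)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError(f"simulated failure at row {i}")
        target.append(records[i])
        if (i + 1) % every == 0:
            with open(checkpoint_path, "w") as f:
                json.dump({"rows_done": i + 1}, f)
    return len(records) - start


records = list(range(1_000))
target = []
checkpoint = os.path.join(tempfile.mkdtemp(), "load.ckpt.json")
try:
    load_with_checkpoint(records, target, checkpoint, fail_at=250)
except RuntimeError as err:
    print(err, "| rows in target:", len(target))  # 250 rows, last checkpoint at 200
resumed = load_with_checkpoint(records, target, checkpoint)
print("rows processed after restart:", resumed)  # 800
```

Only the 50 rows loaded after the last checkpoint are redone; without a checkpoint, the restart would reprocess all 1,000 rows. A smaller interval means less rework but more checkpoint overhead.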
Answer:
Referential integrity ensures:
Child records reference valid parent records
Data consistency across tables
Answer:
ANSI mode enforces:
Standard SQL behavior
Case-sensitive string comparisons by default
Explicit transaction control (each transaction ends with COMMIT)
MULTISET as the default table type
Answer:
It is a GUI tool used to:
Run SQL queries
View results
Monitor sessions
Answer:
A Teradata DBA:
Manages users and space
Monitors performance
Handles backups and recovery
Ensures system availability
Answer:
Teradata follows a Massively Parallel Processing (MPP) and shared-nothing architecture.
Main components:
Client Layer – BTEQ, SQL Assistant, BI tools
Parsing Engine (PE) – Parses SQL, checks syntax, creates execution plan
BYNET – High-speed network for communication
AMPs – Store data and execute queries in parallel
This architecture enables linear scalability and high performance.
Answer:
Data distribution is based on:
Primary Index value
Hashing algorithm
Hash Map
The PI value is hashed and mapped to a specific AMP, ensuring even data distribution.
Answer:
Data skew occurs when data is unevenly distributed across AMPs.
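To detect skew:
Check AMP usage in Viewpoint
Analyze the EXPLAIN plan
Compare per-AMP space (CurrentPerm) in DBC.TableSizeV
To reduce skew:
Choose a better Primary Index
Use a composite PI
Use NoPI tables for staging
Redistribute data on a column with more distinct values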
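One widely used way to quantify skew is the skew factor, (1 - AVG / MAX) * 100, computed over a table's per-AMP space (for example, the CurrentPerm value of each Vproc in DBC.TableSizeV). The Python function below simply applies that formula to a list of per-AMP byte counts: 0 means perfectly even, and higher values mean the busiest AMP holds far more than the average.

```python
def skew_factor(per_amp_bytes):
    """Skew factor in percent: (1 - average / maximum) * 100; 0.0 means perfectly even."""
    if not per_amp_bytes or max(per_amp_bytes) == 0:
        return 0.0
    average = sum(per_amp_bytes) / len(per_amp_bytes)
    return (1 - average / max(per_amp_bytes)) * 100


print(skew_factor([100, 100, 100, 100]))           # 0.0
print(skew_factor([400, 0, 0, 0]))                 # 75.0 (one AMP holds everything)
print(round(skew_factor([150, 100, 100, 50]), 1))  # 33.3
```

A skew factor of 50 means the busiest AMP holds twice the average, so an all-AMP step on that table takes roughly twice as long as it would with evenly distributed data, because the step finishes only when the slowest AMP does.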
Answer:
| Primary Index | PPI |
|---|---|
| Distributes data across AMPs | Divides data within AMP |
| Improves join performance | Improves range queries |
| Hash-based | Defined by a partitioning expression (RANGE_N or CASE_N) |
Answer:
A Join Index is a physical structure that stores pre-joined data.
Use it when:
Tables are frequently joined on the same columns
Complex joins run on large tables
Query performance is critical
Answer:
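Merge Join – Requires both inputs sorted by the row hash of the join columns; very common
Hash Join – Builds an in-memory hash table from the smaller table
Nested Join – Used when one table is accessed through a unique index (UPI or USI) value
Product Join – Compares every row of one table with every row of the other; with no join condition it becomes a Cartesian join (to be avoided)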
Answer:
Statistics help the optimizer:
Estimate row counts
Choose join order
Select join method
Without statistics, Teradata may choose inefficient execution plans.
Answer:
COLLECT STATISTICS COLUMN(column_name) ON table_name;
Best practice:
Collect stats on PI columns, join columns, and frequently filtered (WHERE) columns
Answer:
Spool space stores intermediate query results.
To avoid running out of spool space:
Optimize queries
Use proper WHERE conditions
Drop unnecessary volatile tables
Increase user spool allocation
Answer:
SET tables avoid duplicates but slow inserts
MULTISET tables allow duplicates and faster loads
MULTISET is therefore preferred in data warehouses.
Answer:
NoPI tables have:
No primary index
Random row distribution
Used in:
Staging tables
Temporary processing
Answer:
| Utility | Use Case |
|---|---|
| FastLoad | Initial bulk load |
| MultiLoad | Update/Delete/Insert |
| TPT | Modern parallel utility |
Answer:
Uses locking mechanisms
Supports multiple sessions
Manages workload via TASM
Answer:
Teradata Active System Management (TASM) manages:
Workload prioritization
Resource allocation
Query throttling
Answer:
Fallback stores duplicate data on different AMPs.
Used when:
High availability is required
Data loss is unacceptable
Answer:
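ACCESS – Reads without waiting for writers (dirty reads possible); blocked only by EXCLUSIVE
READ – Consistent read; blocks WRITE and EXCLUSIVE requests
WRITE – Blocks READ, WRITE, and EXCLUSIVE requests; only ACCESS readers can proceed
EXCLUSIVE – Blocks all access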
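The lock levels above can be summarized as a compatibility matrix. The Python sketch below encodes the commonly documented Teradata rules (held lock in rows, requested lock in columns) and grants a request only if it is compatible with every lock already held on the object. Real lock management also involves lock queues and lock granularity (database, table, row hash), which are not modeled here.

```python
# Row: lock already held. Column: lock requested. True means granted immediately.
COMPATIBLE = {
    "ACCESS":    {"ACCESS": True,  "READ": True,  "WRITE": True,  "EXCLUSIVE": False},
    "READ":      {"ACCESS": True,  "READ": True,  "WRITE": False, "EXCLUSIVE": False},
    "WRITE":     {"ACCESS": True,  "READ": False, "WRITE": False, "EXCLUSIVE": False},
    "EXCLUSIVE": {"ACCESS": False, "READ": False, "WRITE": False, "EXCLUSIVE": False},
}


def can_grant(requested, held_locks):
    """Grant only if the request is compatible with every lock currently held."""
    return all(COMPATIBLE[held][requested] for held in held_locks)


print(can_grant("ACCESS", ["WRITE"]))          # True: dirty read while a write is in progress
print(can_grant("READ", ["WRITE"]))            # False: READ waits for the writer
print(can_grant("WRITE", ["READ", "ACCESS"]))  # False: blocked by the READ lock
```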
Answer:
Steps:
Check EXPLAIN plan
Verify statistics
Check data skew
Optimize joins
Reduce spool usage
Use proper indexes
Answer:
Volatile tables:
Exist for session duration
Used for intermediate results
Improve performance by reducing I/O
Answer:
| DELETE | TRUNCATE | DROP |
|---|---|---|
| Row-level | Removes all rows | Removes table |
| WHERE allowed | No WHERE | No table left |
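| Slow | Fast | Permanent |

Note: In Teradata, TRUNCATE-like behavior usually comes from an unconstrained DELETE (for example, DELETE FROM table_name ALL;), which runs as a fast-path delete.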
Answer:
Query banding attaches metadata to queries, useful for:
Monitoring
Auditing
Workload management
Answer:
The optimizer:
Uses cost-based optimization
Analyzes statistics
Chooses best execution path
Answer:
Fallback – Logical data protection
RAID – Physical disk protection
Both are used together for high availability.
Answer:
Temporal tables store:
Historical data
Valid time and transaction time
Used for auditing and tracking changes.
Answer:
Use batch deletes
Use partitioning
Avoid full table locks
Answer:
Typical architecture:
Source systems → Staging (NoPI)
Transformation layer
Core warehouse
Reporting layer
Answer:
Occurs when one AMP processes most join rows.
Fix:
Change join order
Redistribute smaller table
Use statistics
Answer:
| USI | NUSI |
|---|---|
| Unique values | Duplicate values |
| Typically two-AMP access | Typically all-AMP access |
| Subtable rows hash-distributed by index value | Subtable rows kept on the same AMP as their base rows |
Answer:
A hash map maps hash values to AMPs and ensures even distribution.
Answer:
Handled by:
TASM
Priority scheduling
Query throttling
Answer:
Because it offers:
Linear scalability
High performance
Parallel processing
Robust workload management
Answer:
A good Primary Index (PI) should:
Be frequently used in WHERE and JOIN clauses
Have high cardinality
Ensure even data distribution
Be stable (values should not change)
Avoid:
Low-cardinality columns (gender, status)
Volatile business keys
Answer:
A Composite PI consists of multiple columns.
Used when:
Single column doesn’t distribute data evenly
Queries frequently use multiple columns together
Example:
PRIMARY INDEX (customer_id, order_date)
Single-AMP access through a composite PI requires equality conditions on all of its columns.
Answer:
AMP-local join – Data already on same AMP (fastest)
Redistributed join – Rows moved between AMPs
Duplicated join – Smaller table copied to all AMPs
Best practice: Aim for AMP-local joins.
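A rough way to see why one join geometry beats another is to count how many rows must cross the BYNET. The Python sketch below does only that, under simplifying assumptions: redistributing a table moves all of its rows, duplicating copies the small table once per AMP, and CPU, I/O, and spool costs (which the real cost-based optimizer weighs) are ignored. The function names are hypothetical.

```python
def movement_estimates(large_rows, small_rows, num_amps, large_pi_on_key, small_pi_on_key):
    """Rows moved between AMPs for each join geometry (illustrative counts only)."""
    estimates = {}
    if large_pi_on_key and small_pi_on_key:
        estimates["amp_local"] = 0  # matching rows already share an AMP
    # Redistribute each table whose PI is not the join column.
    estimates["redistribute"] = (0 if large_pi_on_key else large_rows) + (
        0 if small_pi_on_key else small_rows
    )
    # Copy the small table to every AMP; the large table stays in place.
    estimates["duplicate_small"] = small_rows * num_amps
    return estimates


def cheapest(estimates):
    return min(estimates, key=estimates.get)


est = movement_estimates(10_000_000, 500, 100, large_pi_on_key=False, small_pi_on_key=False)
print(est, "->", cheapest(est))  # duplicate_small: 50,000 copies vs 10,000,500 moved rows
```

With a 500-row table and 100 AMPs, copying the small table everywhere moves far fewer rows than redistributing 10 million rows, which matches the guidance that duplication suits very small tables.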
Answer:
Duplication copies a small table to all AMPs to avoid redistribution.
Used when:
One table is very small
Copying it to every AMP moves less data than redistributing the larger table
Answer:
Spool skew occurs when:
Spool usage is uneven across AMPs
Fix:
Change join order
Use better PI
Collect statistics
Break query into steps
Answer:
Steps:
Check Viewpoint
Review EXPLAIN plan
Check statistics
Identify skew
Monitor spool usage
Tune joins and filters
Answer:
QueryGrid allows Teradata to:
Query external systems (Hadoop, Oracle)
Avoid data movement
Perform cross-platform analytics
Answer:
| ACCESS | READ |
|---|---|
| Does not block writers; blocked only by EXCLUSIVE | Blocks WRITE and EXCLUSIVE requests |
| Faster | Consistent reads |
| Dirty reads possible | No dirty reads |
Answer:
Stale stats lead to:
Poor execution plans
Longer run times
Increased spool usage
Solution: Recollect statistics regularly.
Answer:
Teradata:
Detects deadlocks automatically
Aborts one transaction
Rolls back changes
Answer:
Perm – Permanent table storage
Spool – Query processing
Temp – Global temporary tables
Answer:
Drop unused tables
Archive old data
Monitor disk usage
Increase space allocation
Answer:
A GTT:
Structure persists
Data is session-specific
Used for temporary processing
Answer:
| Volatile Table | GTT |
|---|---|
| Dropped at logout | Structure persists |
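| Definition exists only in the creating session | Definition stored in the data dictionary and usable by all sessions |
| No dictionary entry, so quick to create | Created once and reused; each session sees only its own rows |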
Answer:
Used for:
Audit history
Slowly Changing Dimensions
Time travel queries
Answer:
Use TPT
Use NoPI staging
Batch loads
Validate after load
Answer:
Checkpoint:
Saves job progress
Allows restart from failure
Answer:
FastExport:
Extracts large volumes of data
Uses parallelism
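Faster than exporting large result sets with a BTEQ SELECT, because it uses multiple sessions and block-level transfers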
Answer:
A Hash Index:
Improves equality lookups
Uses hashing
Alternative to NUSI
Answer:
| Join Index | Hash Index |
|---|---|
| Pre-joined data | Single-table lookup |
| Improves joins | Improves filters |
| Uses more space | Lightweight |
Answer:
Use SET tables
Use ROW_NUMBER() with QUALIFY
Remove duplicates via staging
Answer:
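QUALIFY filters rows using the results of window (ordered analytical) functions, just as HAVING filters groups after aggregation.
Example (each customer's first sale; DATE is a reserved word in Teradata, so the column is named sale_date):
```sql
SELECT *
FROM sales
QUALIFY ROW_NUMBER() OVER (PARTITION BY cust_id ORDER BY sale_date) = 1;
```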
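The QUALIFY ROW_NUMBER() = 1 pattern can be emulated in Python to check what it returns: sort by the partition and order columns, then keep the first row of each partition. The same pattern removes duplicates (keep row 1 per key). This is an illustration with hypothetical data; ties in the ORDER BY column are broken by input order here, whereas in SQL they are arbitrary unless a tiebreaker column is added.

```python
from itertools import groupby
from operator import itemgetter


def first_row_per_group(rows, partition_key, order_key):
    """Emulate QUALIFY ROW_NUMBER() OVER (PARTITION BY p ORDER BY o) = 1."""
    ordered = sorted(rows, key=itemgetter(partition_key, order_key))
    # groupby needs input sorted on the grouping key; next() takes row number 1
    return [next(group) for _, group in groupby(ordered, key=itemgetter(partition_key))]


sales = [
    {"cust_id": 1, "sale_date": "2024-03-01", "amount": 50},
    {"cust_id": 2, "sale_date": "2024-01-15", "amount": 20},
    {"cust_id": 1, "sale_date": "2024-01-10", "amount": 75},
    {"cust_id": 2, "sale_date": "2024-02-01", "amount": 10},
]
for row in first_row_per_group(sales, "cust_id", "sale_date"):
    print(row)
```

ISO-formatted date strings sort chronologically, so the first row per customer is the earliest sale.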
Answer:
| WHERE | HAVING | QUALIFY |
|---|---|---|
| Filters rows | Filters groups | Filters window results |
Answer:
Limits number of concurrent queries to:
Prevent system overload
Maintain performance
Handled by TASM.
Answer:
Short-running, high-priority queries such as:
Dashboard lookups
Single-row queries
Answer:
DEV – Testing and development
QA/UAT – Validation
PROD – Live business data
Answer:
Use scripts
Validate permissions
Test performance
Monitor post-deployment
Answer:
Change Data Capture tracks:
Inserts
Updates
Deletes
Used in incremental loads.
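Production CDC usually reads change logs or change tables. A simpler technique, shown here in Python as a stand-in, compares two snapshots keyed by primary key and classifies each difference as an insert, update, or delete. The customer data is hypothetical.

```python
def capture_changes(previous, current):
    """Classify differences between two snapshots keyed by primary key."""
    inserts = {k: v for k, v in current.items() if k not in previous}
    deletes = {k: v for k, v in previous.items() if k not in current}
    updates = {k: v for k, v in current.items() if k in previous and previous[k] != v}
    return {"inserts": inserts, "updates": updates, "deletes": deletes}


yesterday = {1: {"city": "Pune"}, 2: {"city": "Delhi"}, 3: {"city": "Goa"}}
today = {1: {"city": "Pune"}, 2: {"city": "Mumbai"}, 4: {"city": "Chennai"}}
changes = capture_changes(yesterday, today)
print(changes)
```

An incremental load then applies only these three sets to the warehouse instead of reloading the whole table; unchanged row 1 is skipped entirely.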
Answer:
Viewpoint is used for:
Monitoring sessions
Query performance
Resource usage
System health
Answer:
Scenario: Query suddenly takes 10x more time
Solution:
Check stats freshness
Identify skew
Review EXPLAIN
Apply tuning