Top Interview Questions
NumPy, short for Numerical Python, is an open-source Python library designed for numerical and scientific computing. It is the foundational library for many Python-based data science, machine learning, and artificial intelligence projects. Developed by Travis Oliphant in 2005, NumPy provides support for large multi-dimensional arrays and matrices, along with a collection of high-level mathematical functions to operate on these arrays efficiently.
The key advantage of NumPy is its performance. While Python’s standard data structures like lists and tuples are flexible, they are not optimized for numerical computations. NumPy arrays, however, are implemented in C, which allows for much faster computation, memory efficiency, and vectorized operations.
N-Dimensional Array (ndarray):
The heart of NumPy is the ndarray (N-dimensional array) object. Unlike Python lists, ndarray stores elements of the same data type in contiguous memory, allowing for fast mathematical operations. NumPy arrays support various dimensions, such as 1D arrays (vectors), 2D arrays (matrices), and even higher-dimensional arrays.
Vectorized Operations:
NumPy allows element-wise operations on arrays without explicit loops, also known as vectorization. For example, adding two arrays of the same size or multiplying an array by a scalar can be performed in a single operation. This significantly improves speed and simplifies code.
Mathematical Functions:
NumPy provides a wide range of mathematical functions, including linear algebra, statistical operations, Fourier transforms, and random number generation. These functions are optimized for performance and can handle large datasets efficiently.
Broadcasting:
Broadcasting is a powerful feature that allows operations on arrays of different shapes. NumPy automatically expands smaller arrays to match the shape of larger arrays during arithmetic operations, reducing the need for manual reshaping.
Memory Efficiency:
NumPy arrays consume less memory than Python lists because they store elements of the same data type in contiguous memory blocks. This efficiency allows handling of very large datasets in scientific and engineering applications.
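The memory claim can be checked with a rough comparison. The sketch below is only an estimate: it assumes CPython, where a list stores pointers to separate int objects, and the variable names are illustrative.

```python
import sys
import numpy as np

n = 1_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# Size of the list object (its pointer table) plus every int object it refers to.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
# Size of the array's data buffer: n elements * 8 bytes each.
array_bytes = np_arr.nbytes

print(list_bytes, array_bytes)
```

The exact list figure varies by Python version, but the array buffer is always n times the item size.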
Integration with Other Libraries:
NumPy serves as the base for many Python libraries like Pandas, Matplotlib, SciPy, TensorFlow, and scikit-learn. Its array structures and functions provide a standard interface for data manipulation and numerical computations.
NumPy arrays can be created in multiple ways:
From Python lists or tuples:
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr) # Output: [1 2 3 4]
Using built-in functions:
np.zeros(shape) creates an array filled with zeros.
np.ones(shape) creates an array filled with ones.
np.arange(start, stop, step) generates values within a specified range.
np.linspace(start, stop, num) generates a specified number of evenly spaced values.
Example:
arr = np.arange(0, 10, 2)
print(arr) # Output: [0 2 4 6 8]
Random arrays:
NumPy has a random module to generate arrays with random numbers.
rand_arr = np.random.rand(3, 3) # 3x3 array with values between 0 and 1
NumPy provides a wide range of operations on arrays:
Arithmetic Operations:
Element-wise operations like addition, subtraction, multiplication, and division are straightforward:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b) # Output: [5 7 9]
print(a * b) # Output: [4 10 18]
Statistical Functions:
NumPy offers functions like mean(), median(), std() (standard deviation), var() (variance), and sum():
arr = np.array([1, 2, 3, 4, 5])
print(np.mean(arr)) # Output: 3.0
print(np.std(arr)) # Output: 1.4142135623730951
Linear Algebra:
Operations like matrix multiplication, determinants, inverses, and eigenvalues are possible using NumPy:
A = np.array([[1, 2], [3, 4]])
B = np.array([[2, 0], [1, 2]])
print(np.dot(A, B))
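The determinant, inverse, and eigenvalue operations mentioned above live in the np.linalg module. A minimal sketch, reusing the same 2x2 matrix as floats:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])

det_A = np.linalg.det(A)        # determinant: 1*4 - 2*3 = -2
inv_A = np.linalg.inv(A)        # inverse exists because the determinant is non-zero
eigvals = np.linalg.eigvals(A)  # eigenvalues only (no eigenvectors)

# A matrix times its inverse should be the identity, up to rounding error.
identity_check = np.allclose(A @ inv_A, np.eye(2))
print(det_A, identity_check)
```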
Reshaping and Indexing:
NumPy allows reshaping arrays and extracting specific elements using slicing or boolean indexing:
arr = np.arange(12)
arr = arr.reshape(3, 4)
print(arr[1, 2]) # Access element at row 1, column 2
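Slicing and boolean indexing, both mentioned above, might look like this on the same 3x4 array (variable names are illustrative):

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

first_two_rows = arr[:2]    # slice: rows 0 and 1
last_column = arr[:, -1]    # every row, last column
evens = arr[arr % 2 == 0]   # boolean mask keeps even values, returned as a 1D array
```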
Broadcasting Example:
A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([1, 0, 1])
print(A + B)
# Output:
# [[2 2 4]
# [5 5 7]]
The biggest advantage of NumPy over standard Python lists is speed. NumPy operations run in compiled C loops, avoiding the Python interpreter's per-element overhead. For instance, adding two arrays of 1 million elements with NumPy is typically tens of times faster than an equivalent Python loop, though the exact ratio depends on the machine and data type. Additionally, memory efficiency allows handling larger datasets without consuming excessive system resources.
NumPy is widely used across multiple domains:
Data Analysis:
NumPy arrays form the backbone of Pandas, enabling efficient data manipulation, filtering, and aggregation.
Machine Learning:
scikit-learn is built directly on NumPy arrays, while TensorFlow and PyTorch interoperate with NumPy for data preprocessing and matrix operations.
Scientific Computing:
NumPy supports numerical simulations, statistical computations, and solving differential equations, making it popular in physics, chemistry, and biology research.
Finance:
NumPy is used for risk analysis, portfolio optimization, and modeling stock prices due to its statistical and matrix computation capabilities.
Image Processing:
Libraries like OpenCV use NumPy arrays to represent images and perform pixel-level manipulations efficiently.
High-performance operations with large arrays.
Reduced memory consumption compared to native Python lists.
Supports multi-dimensional arrays and matrices.
Wide range of built-in mathematical functions.
Seamless integration with other Python libraries.
Enables vectorized operations, avoiding explicit loops.
Supports broadcasting, making operations flexible and concise.
NumPy arrays are homogeneous, meaning all elements must be of the same type.
Not ideal for handling missing data; other libraries like Pandas are better suited.
Less intuitive for beginners compared to Python lists for small, simple tasks.
Limited support for symbolic mathematics (use SymPy for that).
Answer:
NumPy (Numerical Python) is a Python library used for numerical computations. It provides support for:
Large, multi-dimensional arrays and matrices.
Mathematical functions to operate on these arrays.
Efficient memory and performance compared to regular Python lists.
NumPy is widely used in data science, machine learning, and scientific computing.
Answer:
Performance: NumPy arrays are faster due to optimized C code and contiguous memory allocation.
Memory Efficiency: NumPy arrays consume less memory than Python lists.
Convenience: Supports vectorized operations and broadcasting.
Functionality: Provides built-in functions for linear algebra, statistics, and random number generation.
Answer:
pip install numpy
Answer:
import numpy as np
np is the common alias for NumPy.
Answer:
A NumPy array is a grid of values of the same data type, indexed by a tuple of non-negative integers.
Example:
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr)
| Feature | Python List | NumPy Array |
|---|---|---|
| Data Type | Can store mixed types | Stores single type |
| Performance | Slower | Faster |
| Memory Usage | More | Less |
| Operations | Element-wise loops | Vectorized |
Answer:
import numpy as np
# 1D array
arr1 = np.array([1, 2, 3])
# 2D array
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
# 3D array
arr3 = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
Answer:
1D array: Single row or column.
2D array: Matrix with rows and columns.
3D array: A stack of matrices (a cube of values); higher dimensions extend the same idea.
Answer:
import numpy as np
arr = np.array([[1,2,3],[4,5,6]])
print(arr.ndim) # Output: 2
Answer:
print(arr.shape) # Output: (2, 3)
shape returns a tuple representing (rows, columns, …).
Answer:
print(arr.dtype)
You can also define a data type explicitly:
arr = np.array([1,2,3], dtype=float)
Answer:
# Zeros
zeros = np.zeros((2,3))
# Ones
ones = np.ones((3,2))
# Range
range_arr = np.arange(0,10,2) # 0,2,4,6,8
# Linearly spaced numbers
linspace_arr = np.linspace(0,1,5) # 5 evenly spaced numbers from 0 to 1, endpoints included
Answer:
arr = np.arange(6)
reshaped = arr.reshape((2,3))
reshape() changes the dimensions but keeps the data the same.
Total elements must match.
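The element-count rule can be seen directly. In this small sketch, -1 asks NumPy to infer one dimension, and a mismatched shape raises ValueError:

```python
import numpy as np

arr = np.arange(6)

auto = arr.reshape(2, -1)   # -1 lets NumPy infer the second dimension (3)
print(auto.shape)

try:
    arr.reshape(4, 2)       # 8 slots for 6 elements: not allowed
except ValueError as exc:
    error_message = str(exc)
    print("reshape failed:", error_message)
```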
Answer:
a = np.array([1,2,3])
b = np.array([4,5,6])
print(a + b) # [5,7,9]
print(a * b) # [4,10,18]
print(a - b) # [-3,-3,-3]
print(a / b) # [0.25,0.4,0.5]
Answer:
Broadcasting allows NumPy to perform operations on arrays of different shapes.
Example:
a = np.array([1,2,3])
b = 2
print(a + b) # [3,4,5]
Answer:
arr = np.array([10,20,30,40,50])
print(arr[1:4]) # [20,30,40]
print(arr[:3]) # [10,20,30]
print(arr[-2:]) # [40,50]
For 2D arrays:
arr2d = np.array([[1,2,3],[4,5,6]])
print(arr2d[0,1]) # 2
print(arr2d[:,2]) # [3,6] # All rows, 3rd column
Answer:
arr = np.array([1,2,3,4,5])
print(arr.max()) # 5
print(arr.min()) # 1
print(arr.sum()) # 15
print(arr.mean()) # 3.0
print(np.median(arr)) # 3.0
Answer:
import numpy as np
# Random float [0,1)
rand_float = np.random.rand(3,2)
# Random integer [low, high)
rand_int = np.random.randint(1,10,size=(2,3))
# Random normal distribution
rand_normal = np.random.randn(3,3)
Answer:
a = np.array([1,2,3])
b = np.array([4,5,6])
concatenated = np.concatenate((a,b)) # [1,2,3,4,5,6]
# For 2D arrays along axis
arr1 = np.array([[1,2],[3,4]])
arr2 = np.array([[5,6],[7,8]])
np.concatenate((arr1, arr2), axis=0) # Stack rows
np.concatenate((arr1, arr2), axis=1) # Stack columns
Answer:
View: Shares data with original array.
Copy: Creates a separate array.
arr = np.array([1,2,3])
v = arr.view()
c = arr.copy()
arr[0] = 100
print(v) # [100,2,3]
print(c) # [1,2,3]
Answer:
arr = np.array([1,2,3])
print(arr.nbytes) # Total bytes used
Answer:
a = np.array([[1,2],[3,4]])
b = np.array([[5,6],[7,8]])
print(np.dot(a,b)) # Matrix multiplication
print(a.T) # Transpose
print(np.linalg.inv(a)) # Inverse
Answer:
arr = np.array([1,2,np.nan,4])
np.isnan(arr) # [False, False, True, False]
np.nan_to_num(arr) # Replace nan with 0
Answer:
Data analysis: Handling large datasets efficiently.
Machine learning: Feeding data into models.
Image processing: Representing images as arrays.
Scientific computing: Simulations and mathematical modeling.
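np.array and np.asarray?
Answer:
np.array() creates a new array and copies data by default.
np.asarray() converts input to an array without copying if possible. A copy is avoided only when the input is already an ndarray of a compatible dtype; a Python list is always copied into a new array.
import numpy as np
src = np.array([1, 2, 3])
arr1 = np.array(src)    # copies by default
arr2 = np.asarray(src)  # returns the same array, no copy
src[0] = 100
print(arr1) # [1 2 3] -> independent copy
print(arr2) # [100 2 3] -> shares data with src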
Answer:
Flatten converts a multi-dimensional array into 1D.
arr = np.array([[1,2,3],[4,5,6]])
flat1 = arr.flatten() # returns a copy
flat2 = arr.ravel() # returns a view when possible
print(flat1) # [1 2 3 4 5 6]
print(flat2) # [1 2 3 4 5 6]
flatten() and ravel()?
Answer:
| Method | Returns | Copies Data? |
|---|---|---|
| flatten() | 1D array copy | Yes |
| ravel() | 1D array view | No (if possible) |
ravel() is memory efficient.
flatten() is safer if you don’t want changes in the original array.
Answer:
a = np.array([1,2,3])
b = np.array([4,5,6])
v_stack = np.vstack((a,b)) # vertical stack
h_stack = np.hstack((a,b)) # horizontal stack
print(v_stack)
# [[1 2 3]
# [4 5 6]]
print(h_stack) # [1 2 3 4 5 6]
Answer:
arr = np.array([1,2,3,4,5,6])
split_arr = np.split(arr, 3) # Splits into 3 equal arrays
print(split_arr) # [array([1,2]), array([3,4]), array([5,6])]
np.array_split() can split arrays unequally if size doesn’t divide evenly.
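As a brief illustration of the uneven case, np.array_split gives the earlier pieces one extra element each when the length is not divisible:

```python
import numpy as np

arr = np.arange(7)
parts = np.array_split(arr, 3)      # 7 elements into 3 parts
sizes = [len(p) for p in parts]     # the first part gets the extra element
print(sizes)
```

Calling np.split(arr, 3) on the same array would raise an error instead.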
Answer:
Boolean arrays are arrays of True/False values used for filtering.
arr = np.array([1,2,3,4,5])
bool_arr = arr > 3
print(bool_arr) # [False False False True True]
filtered = arr[arr > 3]
print(filtered) # [4 5]
Answer:
You can select specific elements using arrays of indices.
arr = np.array([10,20,30,40,50])
indices = [0,2,4]
print(arr[indices]) # [10,30,50]
For 2D arrays:
arr2d = np.array([[1,2,3],[4,5,6],[7,8,9]])
rows = np.array([0,2])
cols = np.array([1,2])
print(arr2d[rows[:,None], cols]) # [[2 3] [8 9]]
np.dot and np.matmul?
Answer:
np.dot() performs dot product for vectors or matrix multiplication.
np.matmul() is specifically for matrix multiplication, especially for 2D or higher.
a = np.array([[1,2],[3,4]])
b = np.array([[5,6],[7,8]])
print(np.dot(a,b)) # [[19 22],[43 50]]
print(np.matmul(a,b)) # [[19 22],[43 50]]
Difference is more visible in higher-dimensional arrays (batch matrix multiplication).
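The higher-dimensional difference shows up in the result shapes. A minimal sketch with two 3D arrays of ones (shapes chosen only for illustration):

```python
import numpy as np

a = np.ones((2, 3, 4))
b = np.ones((2, 4, 5))

# matmul treats the leading axis as a batch: two separate (3x4) @ (4x5) products.
batched = np.matmul(a, b)
# dot contracts the last axis of a with the second-to-last axis of b
# for every combination of the remaining axes.
dotted = np.dot(a, b)

print(batched.shape)  # (2, 3, 5)
print(dotted.shape)   # (2, 3, 2, 5)
```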
Answer:
arr = np.array([1,2,3,4])
print(np.cumsum(arr)) # [1 3 6 10]
print(np.cumprod(arr)) # [1 2 6 24]
print(np.diff(arr)) # [1 1 1]
Useful in statistics, time-series, and analysis.
Answer:
arr = np.array([3,1,4,2])
sorted_arr = np.sort(arr)
print(sorted_arr) # [1 2 3 4]
# In-place sort
arr.sort()
print(arr) # [1 2 3 4]
For 2D arrays:
arr2d = np.array([[3,2,1],[6,5,4]])
print(np.sort(arr2d, axis=1)) # Sort rows
print(np.sort(arr2d, axis=0)) # Sort columns
Answer:
arr = np.array([1,2,2,3,3,3])
unique = np.unique(arr)
print(unique) # [1 2 3]
np.unique() can also return indices or counts:
values, counts = np.unique(arr, return_counts=True)
print(values, counts) # [1 2 3] [1 2 3]
where in NumPy?
Answer:
Called with only a condition, np.where() returns the indices where the condition holds.
arr = np.array([1,2,3,4,5])
indices = np.where(arr > 3)
print(indices) # (array([3,4]),)
print(arr[indices]) # [4 5]
Answer:
arr = np.array([1,2,3,4])
np.save('my_array.npy', arr) # Save as binary file
loaded_arr = np.load('my_array.npy') # Load array
print(loaded_arr)
For multiple arrays: np.savez() and np.load().
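A sketch of the multi-array case follows. It writes to an in-memory buffer rather than a real file so it leaves nothing on disk; the key names first and second are arbitrary.

```python
import io
import numpy as np

a = np.array([1, 2, 3])
b = np.array([[1.0, 2.0], [3.0, 4.0]])

buffer = io.BytesIO()                 # stands in for a file such as 'arrays.npz'
np.savez(buffer, first=a, second=b)   # keyword names become the archive keys
buffer.seek(0)

with np.load(buffer) as archive:
    names = sorted(archive.files)     # names of the stored arrays
    restored_a = archive["first"]
    restored_b = archive["second"]
```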
Answer:
arr = np.array([1,2,3,4,5,6])
print(np.mean(arr)) # Average
print(np.median(arr)) # Median
print(np.std(arr)) # Standard deviation
print(np.var(arr)) # Variance
print(np.ptp(arr)) # Peak-to-peak (max-min)
Answer:
a = np.array([True, False, True])
b = np.array([False, False, True])
print(np.logical_and(a,b)) # [False False True]
print(np.logical_or(a,b)) # [True False True]
print(np.logical_not(a)) # [False True False]
ix_?
Answer:
arr = np.arange(16).reshape(4,4)
rows = [0,2]
cols = [1,3]
print(arr[np.ix_(rows, cols)])
# [[1 3]
# [9 11]]
Useful for selecting specific rows and columns simultaneously.
np.dot, np.vdot, and np.inner?
Answer:
np.dot(a, b): Performs matrix multiplication or dot product depending on input shapes.
np.vdot(a, b): Flattens both inputs and returns their dot product; for complex inputs it takes the complex conjugate of the first argument.
np.inner(a, b): Returns the inner product over the last axis of both inputs; it matches dot for 1D vectors but differs for higher-dimensional arrays.
a = np.array([1,2])
b = np.array([3,4])
print(np.dot(a,b)) # 11
print(np.vdot(a,b)) # 11
print(np.inner(a,b))# 11
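The three functions agree on real 1D vectors, so the differences only appear with 2D or complex inputs. A short sketch with illustrative values:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

dot_result = np.dot(A, B)      # rows of A times columns of B
inner_result = np.inner(A, B)  # rows of A times rows of B, same as A @ B.T

x = np.array([1 + 2j, 3 + 4j])
y = np.array([5 + 6j, 7 + 8j])
vdot_result = np.vdot(x, y)    # conjugates x before multiplying
dot_complex = np.dot(x, y)     # no conjugation
```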
Answer:
a = np.array([1,2,3])
b = np.array([2,2,1])
print(a == b) # [False True False]
print(a != b) # [True False True]
print(a > b) # [False False True]
These boolean arrays can be used for filtering or conditional operations.
Answer:
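Using np.vectorize or universal functions (ufuncs). np.vectorize is a convenience wrapper that still calls the Python function once per element, whereas built-in ufuncs run in compiled code:
arr = np.array([1,2,3,4])
def square(x):
    return x**2
vec_square = np.vectorize(square)
print(vec_square(arr)) # [1 4 9 16]
# Using ufunc
print(np.sqrt(arr)) # [1.         1.41421356 1.73205081 2.        ]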
Answer:
arr = np.array([[1,2],[3,4]])
print(np.trace(arr)) # 1+4 = 5
print(np.linalg.det(arr)) # -2.0
print(np.linalg.matrix_rank(arr)) # 2
Useful in linear algebra and ML algorithms.
Answer:
arr = np.array([[1,2],[2,1]])
values, vectors = np.linalg.eig(arr)
print(values) # Eigenvalues
print(vectors) # Eigenvectors (columns)
Frequently used in PCA and dimensionality reduction.
Answer:
Broadcasting allows operations between arrays of different shapes.
Rules:
If the shapes have different lengths, the shorter shape is padded with 1s on the left.
Dimensions are compatible if they are equal or one of them is 1.
a = np.array([[1,2,3],[4,5,6]])
b = np.array([1,0,1])
print(a + b)
# [[2 2 4]
# [5 5 7]]
Answer:
Fancy indexing allows selecting elements using arrays of indices.
arr = np.arange(10)
indices = [1,3,5,7]
print(arr[indices]) # [1 3 5 7]
On 2D arrays, two index lists are paired element-wise, so the example below picks the elements at (0,1) and (2,3):
arr2d = np.arange(16).reshape(4,4)
print(arr2d[[0,2],[1,3]]) # [1 11]
Answer:
arr = np.array([1,2,3,4,5])
arr[arr % 2 == 0] = 0 # Replace even numbers with 0
print(arr) # [1 0 3 0 5]
# Using np.where
arr = np.array([1,2,3,4,5])
new_arr = np.where(arr % 2 == 0, -1, arr)
print(new_arr) # [1 -1 3 -1 5]
Answer:
arr = np.array([[1,2,3],[4,5,6]])
print(np.cumsum(arr, axis=0)) # Sum along columns
# [[1 2 3]
# [5 7 9]]
print(np.cumprod(arr, axis=1)) # Product along rows
# [[1 2 6]
# [4 20 120]]
Answer:
A = np.array([[3,1],[1,2]])
B = np.array([9,8])
# Solve Ax = B
x = np.linalg.solve(A,B)
print(x) # [2. 3.]
# Pseudo-inverse for non-square matrices
A = np.array([[1,2,3],[4,5,6]])
pinv = np.linalg.pinv(A)
print(pinv)
Used in regression, machine learning, and signal processing.
Answer:
arr = np.array([1,2,3,4])
fft_arr = np.fft.fft(arr)
ifft_arr = np.fft.ifft(fft_arr)
print(fft_arr) # Frequency domain
print(ifft_arr) # Back to time domain
Useful in signal processing, image processing, and audio analysis.
Answer:
Structured arrays store heterogeneous data like a table.
data = np.array([(1, 'Alice', 3.5),
(2, 'Bob', 4.0)],
dtype=[('id', 'i4'), ('name', 'U10'), ('gpa', 'f4')])
print(data['name']) # ['Alice' 'Bob']
print(data['gpa']) # [3.5 4.0]
Used in datasets where each column has a different type.
Answer:
Use vectorized operations instead of Python loops.
Use np.dot / np.matmul for matrix multiplication.
Use np.ravel() instead of flatten() when an independent copy is not needed.
Avoid frequent array resizing; preallocate arrays.
Use numexpr or Cython for heavy computations.
Answer:
np.random.seed(42)
arr = np.random.rand(3)
print(arr) # Always same values each run
Important for reproducibility in ML experiments.
Answer:
points = np.array([[1,2],[4,6]])
dist = np.sqrt(np.sum((points[0] - points[1])**2))
print(dist) # 5.0
Can be extended for multiple points using broadcasting.
Answer:
NumPy (Numerical Python) is a Python library used for numerical computing. It provides support for:
Multi-dimensional arrays (ndarray)
Mathematical operations on arrays
Linear algebra, Fourier transforms, and random number generation
Why use NumPy:
High-performance array operations compared to Python lists
Efficient memory storage
Integration with C/C++/Fortran code
Basis for libraries like Pandas, SciPy, and scikit-learn
Example:
import numpy as np
arr = np.array([1, 2, 3])
print(arr * 2) # Output: [2 4 6]
Answer:
ndarray is NumPy’s core data structure, representing a multidimensional, homogeneous array of fixed-size items.
Key attributes:
ndarray.ndim → number of dimensions
ndarray.shape → shape of the array
ndarray.size → total elements
ndarray.dtype → data type of elements
Example:
arr = np.array([[1,2,3],[4,5,6]])
print(arr.ndim) # 2
print(arr.shape) # (2,3)
print(arr.size) # 6
print(arr.dtype) # int64 (platform dependent; may be int32 on some systems)
| Feature | NumPy Array | Python List |
|---|---|---|
| Storage | Homogeneous, fixed type | Heterogeneous |
| Performance | Faster (vectorized operations) | Slower (loops) |
| Memory | Less memory | More memory |
| Functionality | Many mathematical functions | Limited |
Example:
import numpy as np
import time
arr = np.arange(1000000)
start = time.time()
arr2 = arr * 2
print("NumPy:", time.time() - start)
lst = list(range(1000000))
start = time.time()
lst2 = [x*2 for x in lst]
print("List:", time.time() - start)
Answer:
Broadcasting allows operations on arrays of different shapes without making copies of data. NumPy automatically expands smaller arrays to match the shape of larger arrays.
Rules:
If arrays have different dimensions, prepend 1 to the smaller shape.
Arrays are compatible if dimensions are equal or one is 1.
Arrays are broadcasted to the maximum dimension.
Example:
a = np.array([[1,2,3],[4,5,6]])
b = np.array([10,20,30])
print(a + b)
# Output:
# [[11 22 33]
# [14 25 36]]
Answer:
NumPy arrays can be created using:
np.array() → from Python lists/tuples
np.arange(start, stop, step) → like range()
np.zeros(shape) → array of zeros
np.ones(shape) → array of ones
np.empty(shape) → uninitialized array
np.linspace(start, stop, num) → evenly spaced numbers
Example:
arr1 = np.zeros((2,3))
arr2 = np.ones((3,3))
arr3 = np.linspace(0, 1, 5)
Answer:
View: A new array object that shares data with the original array. Changes affect both arrays.
Copy: A new array object with its own data. Changes do not affect the original array.
Example:
a = np.array([1,2,3])
b = a.view()
c = a.copy()
b[0] = 10
c[1] = 20
print(a) # [10 2 3] -> affected by view
print(c) # [1 20 3] -> independent
Answer:
Integer indexing: Access elements using integers
Boolean indexing: Use a boolean array to filter elements
Fancy indexing: Index arrays using arrays of integers
Example:
a = np.array([10,20,30,40])
index = [0,2]
print(a[index]) # [10 30]
mask = a > 20
print(a[mask]) # [30 40]
Answer:
NumPy provides numpy.linalg module for:
Matrix multiplication: np.dot(a,b) or @
Determinant: np.linalg.det()
Inverse: np.linalg.inv()
Eigenvalues: np.linalg.eig()
Example:
A = np.array([[1,2],[3,4]])
B = np.array([[5,6],[7,8]])
C = np.dot(A,B)
inv_A = np.linalg.inv(A)
det_A = np.linalg.det(A)
Answer:
reshape(), flatten(), ravel() → shape changes
concatenate(), vstack(), hstack() → join arrays
split(), hsplit(), vsplit() → split arrays
transpose() → change axes
Example:
a = np.arange(6).reshape(2,3)
print(a.flatten()) # [0 1 2 3 4 5]
print(np.hstack([a,a])) # horizontal stack
Answer:
NumPy has np.nan, np.inf and functions:
np.isnan() → check NaN
np.isinf() → check infinity
np.nan_to_num() → replace NaN/Inf
np.nanmean(), np.nansum() → ignore NaN in computations
Example:
arr = np.array([1, np.nan, 3, np.inf])
print(np.isnan(arr)) # [False True False False]
print(np.nan_to_num(arr)) # [1. 0. 3. 1.79769313e+308] -> NaN becomes 0, inf becomes the largest finite float
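The NaN-aware reductions listed above can be contrasted with the ordinary ones in a short sketch:

```python
import numpy as np

arr = np.array([1.0, np.nan, 3.0])

plain_mean = np.mean(arr)    # NaN propagates through ordinary reductions
nan_mean = np.nanmean(arr)   # ignores the NaN: (1 + 3) / 2
nan_sum = np.nansum(arr)     # ignores the NaN: 1 + 3
```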
ravel() and flatten()?
| Feature | ravel() | flatten() |
|---|---|---|
| Return | View if possible | Copy always |
| Memory | Efficient | Takes more memory |
| Changes to result affect original | Yes (when a view) | No |
Example:
a = np.array([[1,2],[3,4]])
b = a.ravel()
c = a.flatten()
b[0] = 10
print(a) # [[10 2] [3 4]]
c[1] = 20
print(a) # [[10 2] [3 4]] -> unaffected
Answer:
Use numpy.random module:
np.random.rand(d0,d1) → uniform [0,1)
np.random.randn(d0,d1) → normal distribution
np.random.randint(low, high, size) → random integers
np.random.choice(array, size) → random sampling
Example:
np.random.seed(0) # reproducible
print(np.random.rand(2,3))
print(np.random.randint(1,10,5))
Answer:
Use vectorized operations instead of loops
Avoid Python lists for large data
Use dtype optimization (e.g., float32 vs float64)
Leverage numexpr or Cython for heavy computations
Example of vectorization:
a = np.arange(1000000)
# Loop
# for i in range(len(a)): a[i] = a[i]*2
# Vectorized
a = a*2
Answer:
C-contiguous (row-major): rows are stored consecutively in memory
F-contiguous (column-major): columns are stored consecutively in memory
Check memory layout:
a = np.array([[1,2],[3,4]], order='C')
b = np.array([[1,2],[3,4]], order='F')
print(a.flags) # C_CONTIGUOUS = True
print(b.flags) # F_CONTIGUOUS = True
Answer:
Structured arrays store heterogeneous data like a table (columns with different types).
Example:
data = np.array([(1, 'Alice', 25), (2, 'Bob', 30)],
dtype=[('id', 'i4'), ('name', 'U10'), ('age', 'i4')])
print(data['name']) # ['Alice' 'Bob']
einsum in NumPy.
Answer:
numpy.einsum is a powerful function for summations, products, and tensor contractions using Einstein notation. It can avoid creating intermediate arrays, which may save memory, although for plain matrix products np.matmul is usually at least as fast.
Example:
a = np.array([[1,2],[3,4]])
b = np.array([[5,6],[7,8]])
# Matrix multiplication using einsum
c = np.einsum('ij,jk->ik', a, b)
print(c)
np.dot(), np.matmul(), and @.
| Function | Use Case |
|---|---|
| np.dot(a,b) | Dot product, 1D & 2D arrays |
| np.matmul(a,b) | Matrix multiplication, 2D or higher |
| a @ b | Python operator equivalent to matmul |
Use memory-mapped arrays (np.memmap)
Use float32 instead of float64 if precision allows
Use in-place operations to save memory
Avoid unnecessary copies
Example:
fp = np.memmap('data.dat', dtype='float32', mode='w+', shape=(1000,1000))
Answer:
np.save('filename.npy', array) → save single array
np.load('filename.npy') → load array
np.savez('filename.npz', a=array1, b=array2) → save multiple arrays
Example:
np.save('arr.npy', a)
b = np.load('arr.npy')
How does np.vectorize work?
How to perform masking operations efficiently?
Explain chunking large arrays for memory efficiency.
How to combine structured arrays with views?
Explain NumPy ufuncs and custom ufuncs.
Explain strided tricks using as_strided.
Answer:
A ufunc (universal function) is a vectorized function that operates element-wise on arrays. They are implemented in C for speed.
Examples of ufuncs:
np.add, np.subtract, np.multiply, np.divide
np.sin, np.cos, np.exp, np.log
Example:
a = np.array([1,2,3])
b = np.array([4,5,6])
print(np.add(a,b)) # [5 7 9]
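Custom ufuncs:
np.vectorize wraps a Python function so it can be applied element-wise. It is a convenience wrapper rather than compiled code; np.frompyfunc builds an actual ufunc object from a Python function.
def square(x):
    return x*x
vfunc = np.vectorize(square)
print(vfunc(np.array([1,2,3]))) # [1 4 9]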
Answer:
In-place operations modify the existing array rather than creating a new one. This saves memory and improves speed.
Example:
a = np.array([1,2,3])
a += 5 # in-place addition
print(a) # [6 7 8]
astype and view
Answer:
astype(dtype) → creates a copy of the array with a new dtype
view(dtype) → creates a view (shared data) of the array with a different dtype
Example:
a = np.array([1,2,3], dtype=np.int32)
b = a.astype(np.float64) # copy
c = a.view(np.uint32) # view
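The difference becomes clearer when the target dtype is a float. In this sketch astype converts the values, while view only reinterprets the same bytes:

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int32)

as_float = a.astype(np.float32)      # converts values: 1 -> 1.0 (new memory)
reinterpreted = a.view(np.float32)   # same bytes read as float32: not 1.0, 2.0, 3.0

shares_memory = np.shares_memory(a, reinterpreted)
copy_shares_memory = np.shares_memory(a, as_float)
```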
Answer:
NumPy provides cumulative operations:
np.cumsum() → cumulative sum
np.cumprod() → cumulative product
Example:
arr = np.array([1,2,3,4])
print(np.cumsum(arr)) # [1 3 6 10]
print(np.cumprod(arr)) # [1 2 6 24]
np.memmap)
Answer:
np.memmap allows you to handle large arrays on disk without loading them entirely into memory.
Example:
fp = np.memmap('data.dat', dtype='float32', mode='w+', shape=(1000,1000))
fp[:10,:10] = 1
fp.flush() # write to disk
as_strided and its use
Answer:
numpy.lib.stride_tricks.as_strided lets you create new views on arrays with custom strides. This is memory-efficient but requires careful use to avoid memory errors.
Example:
from numpy.lib.stride_tricks import as_strided
a = np.arange(10)
strided = as_strided(a, shape=(4,3), strides=(a.itemsize, a.itemsize))
print(strided)
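Because as_strided does no bounds checking, a safer way to build the same overlapping windows is sliding_window_view, assuming NumPy 1.20 or newer. A minimal sketch:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(10)

# Every length-3 window over a, returned as a read-only view.
windows = sliding_window_view(a, window_shape=3)
first_four = windows[:4]   # the same rows the as_strided call with shape (4, 3) produces
print(first_four)
```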
Answer:
Advanced broadcasting involves reshaping arrays or adding dummy dimensions to match shapes for operations.
Example:
a = np.array([1,2,3])[:, np.newaxis] # shape (3,1)
b = np.array([10,20]) # shape (2,)
print(a + b)
# [[11 21]
# [12 22]
# [13 23]]
np.einsum for tensor operations
Answer:
einsum allows complex tensor contractions, summations, and multiplications with a single expression. It can avoid temporary arrays.
Example:
A = np.array([[1,2],[3,4]])
B = np.array([[5,6],[7,8]])
C = np.einsum('ij,jk->ik', A, B) # Matrix multiplication
Answer:
Use in-place operations
Reduce dtype size (float32 vs float64)
Avoid Python loops
Use vectorized operations
Use numexpr or Cython for very large computations
Answer:
Use boolean indexing or np.ma module for masked arrays.
Example:
a = np.array([1,2,3,4,5])
mask = a > 3
print(a[mask]) # [4 5]
import numpy.ma as ma
masked = ma.masked_greater(a, 3)
print(masked) # [1 2 3 -- --]
np.lib.recfunctions
Answer:
recfunctions helps combine, split, or modify structured arrays. Useful in heterogeneous datasets.
Example:
from numpy.lib import recfunctions as rfn
a = np.array([(1,'Alice'),(2,'Bob')], dtype=[('id','i4'),('name','U10')])
b = np.array([(3,'Charlie')], dtype=a.dtype)
combined = rfn.stack_arrays([a,b], usemask=False)
print(combined)
Answer:
numpy.char module allows vectorized string operations on arrays of strings.
Example:
arr = np.array(['apple', 'banana'])
print(np.char.upper(arr)) # ['APPLE' 'BANANA']
print(np.char.add(arr, '_fruit')) # ['apple_fruit' 'banana_fruit']
ravel() and flatten() in detail
Answer:
| Feature | ravel | flatten |
|---|---|---|
| Copy | No, returns a view when possible | Yes, returns copy |
| Memory | Efficient | More memory usage |
| Modifying original array | Affects original if view | Does not affect original |
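np.copy() and copy.deepcopy()
Answer:
np.copy() → shallow copy: an independent array, but with dtype=object the elements still refer to the same Python objects
copy.deepcopy() (from Python's copy module; NumPy has no np.deepcopy) → fully independent, including objects stored in the array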
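The distinction only matters for object arrays. In the sketch below, the array holds Python lists, and the deep copy comes from the standard library's copy module:

```python
import copy
import numpy as np

obj_arr = np.empty(2, dtype=object)
obj_arr[0] = [1, 2]
obj_arr[1] = [3, 4]

shallow = np.copy(obj_arr)       # new array, same list objects inside
deep = copy.deepcopy(obj_arr)    # new array, new list objects inside

obj_arr[0].append(99)            # mutate a list held by the original

print(shallow[0])  # [1, 2, 99] -> the shallow copy sees the change
print(deep[0])     # [1, 2]     -> the deep copy does not
```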
Answer:
Use axis parameter:
Example:
a = np.array([[1,2,3],[4,5,6]])
print(np.sum(a, axis=0)) # column-wise: [5 7 9]
print(np.sum(a, axis=1)) # row-wise: [6 15]
np.tile() vs np.repeat()
Answer:
np.tile() → repeats the whole array
np.repeat() → repeats each element
Example:
a = np.array([1,2])
print(np.tile(a, 2)) # [1 2 1 2]
print(np.repeat(a, 2)) # [1 1 2 2]
Answer:
Use vectorized comparisons:
a = np.array([1,2,3])
b = np.array([2,2,4])
print(a == b) # [False True False]
print(a > b) # [False False False]
np.unique() and its return options
Answer:
np.unique() returns unique elements and optionally:
return_index → indices in original array
return_inverse → map original to unique
return_counts → frequency of each unique element
Example:
arr = np.array([1,2,2,3])
uniq, counts = np.unique(arr, return_counts=True)
print(uniq, counts) # [1 2 3] [1 2 1]
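The other two options can be sketched the same way. Here return_index gives the first position of each unique value, and return_inverse lets the original array be rebuilt:

```python
import numpy as np

arr = np.array([3, 1, 3, 2, 1])
uniq, first_idx, inverse = np.unique(arr, return_index=True, return_inverse=True)

# uniq holds the sorted unique values; first_idx the first position of each one.
# inverse maps every original element to its position in uniq.
rebuilt = uniq[inverse]
```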
np.where() and its uses
Answer:
np.where(condition, x, y) selects x where condition is True and y where False.
Example:
a = np.array([1,2,3,4])
b = np.where(a>2, a*2, a+1)
print(b) # [2 3 6 8]
Answer:
Fancy indexing allows selection using multiple integer arrays.
Example:
a = np.arange(9).reshape(3,3)
row = np.array([0,1])
col = np.array([1,2])
print(a[row, col]) # [1 5]
Answer:
Use broadcasting instead of loops:
points = np.array([[0,0],[1,1],[2,2]])
diff = points[:,np.newaxis,:] - points[np.newaxis,:,:]
dist = np.sqrt(np.sum(diff**2, axis=-1))
np.add.reduce and np.multiply.reduce
Answer:
These are ufunc reductions for summing or multiplying along an axis:
a = np.array([1,2,3])
print(np.add.reduce(a)) # 6
print(np.multiply.reduce(a)) # 6
np.fromfunction
Answer:
Creates an array by applying a function to indices.
a = np.fromfunction(lambda i,j: i+j, (3,3), dtype=int)
print(a)
np.meshgrid and its use
Answer:
Generates coordinate matrices from coordinate vectors. Used in plotting and vectorized computations.
x = np.array([1,2])
y = np.array([3,4])
X, Y = np.meshgrid(x, y)
Answer:
Structured arrays allow heterogeneous data like a table.
Record arrays allow attribute access with arr.fieldname.
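A small sketch of the two access styles. It views a structured array as a record array; the field names id and name are illustrative:

```python
import numpy as np

structured = np.array([(1, 'Alice'), (2, 'Bob')],
                      dtype=[('id', 'i4'), ('name', 'U10')])

records = structured.view(np.recarray)   # same data, attribute access enabled

by_key = structured['name']   # structured array: access a field by key
by_attr = records.name        # record array: access a field as an attribute
```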
NaN) in large arrays?
Use np.isnan() to detect NaNs
Use np.nanmean(), np.nansum(), etc. to ignore NaNs
Replace NaNs with np.nan_to_num()
np.loadtxt and np.genfromtxt
| Feature | loadtxt | genfromtxt |
|---|---|---|
| Missing values | Cannot handle | Can handle |
| Data type | Single | Mixed |
| Speed | Faster | Slower |
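The missing-value row of the table can be illustrated with a short sketch. It reads from an in-memory string instead of a file and relies on genfromtxt's default behaviour of filling an empty float field with NaN:

```python
import io
import numpy as np

# Second row has an empty middle field.
text = io.StringIO("1,2,3\n4,,6\n")

data = np.genfromtxt(text, delimiter=",")
missing_mask = np.isnan(data)   # True where a field was empty
print(data)
```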
np.hstack, np.vstack, np.concatenate
hstack → stack horizontally
vstack → stack vertically
concatenate → flexible axis selection
Answer:
A = np.array([[1,2],[3,4]])
mask = A > 2
A[mask] = 0
np.dot vs np.tensordot
np.dot → 1D/2D dot product
np.tensordot → generalized contraction along axes
a = np.array([[1,2],[3,4]])
b = np.array([[5,6],[7,8]])
np.tensordot(a,b, axes=1)
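To show what the axes argument changes, the sketch below compares axes=1 and axes=2 on the same two matrices:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

one_axis = np.tensordot(a, b, axes=1)   # contracts last axis of a with first of b: same as a @ b
two_axes = np.tensordot(a, b, axes=2)   # contracts both axes: sum of element-wise products

print(one_axis)
print(two_axes)
```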