NumPy

Top Interview Questions

About NumPy

 

Introduction to NumPy

NumPy, short for Numerical Python, is an open-source Python library designed for numerical and scientific computing. It is the foundational library for many Python-based data science, machine learning, and artificial intelligence projects. Developed by Travis Oliphant in 2005, NumPy provides support for large multi-dimensional arrays and matrices, along with a collection of high-level mathematical functions to operate on these arrays efficiently.

The key advantage of NumPy is its performance. While Python’s standard data structures like lists and tuples are flexible, they are not optimized for numerical computations. NumPy arrays, however, are implemented in C, which allows for much faster computation, memory efficiency, and vectorized operations.


Core Features of NumPy

  1. N-Dimensional Array (ndarray):
    The heart of NumPy is the ndarray (N-dimensional array) object. Unlike Python lists, ndarray stores elements of the same data type in contiguous memory, allowing for fast mathematical operations. NumPy arrays support various dimensions, such as 1D arrays (vectors), 2D arrays (matrices), and even higher-dimensional arrays.

  2. Vectorized Operations:
    NumPy allows element-wise operations on arrays without explicit loops, also known as vectorization. For example, adding two arrays of the same size or multiplying an array by a scalar can be performed in a single operation. This significantly improves speed and simplifies code.

  3. Mathematical Functions:
    NumPy provides a wide range of mathematical functions, including linear algebra, statistical operations, Fourier transforms, and random number generation. These functions are optimized for performance and can handle large datasets efficiently.

  4. Broadcasting:
    Broadcasting is a powerful feature that allows operations on arrays of different but compatible shapes. During arithmetic, NumPy treats the smaller array as if it were stretched to match the larger one, without actually copying its data, reducing the need for manual reshaping.

  5. Memory Efficiency:
    NumPy arrays consume less memory than Python lists because they store elements of the same data type in contiguous memory blocks. This efficiency allows handling of very large datasets in scientific and engineering applications.

  6. Integration with Other Libraries:
    NumPy serves as the base for many Python libraries like Pandas, Matplotlib, SciPy, TensorFlow, and scikit-learn. Its array structures and functions provide a standard interface for data manipulation and numerical computations.


Creating NumPy Arrays

NumPy arrays can be created in multiple ways:

  1. From Python lists or tuples:

    import numpy as np
    arr = np.array([1, 2, 3, 4])
    print(arr)  # Output: [1 2 3 4]
    
  2. Using built-in functions:

    • np.zeros(shape) creates an array filled with zeros.

    • np.ones(shape) creates an array filled with ones.

    • np.arange(start, stop, step) generates values within a specified range.

    • np.linspace(start, stop, num) generates a specified number of evenly spaced values.

    Example:

    arr = np.arange(0, 10, 2)
    print(arr)  # Output: [0 2 4 6 8]
    
  3. Random arrays:
    NumPy has a random module to generate arrays with random numbers.

    rand_arr = np.random.rand(3, 3)  # 3x3 array with values between 0 and 1
    

Array Operations

NumPy provides a wide range of operations on arrays:

  1. Arithmetic Operations:
    Element-wise operations like addition, subtraction, multiplication, and division are straightforward:

    a = np.array([1, 2, 3])
    b = np.array([4, 5, 6])
    print(a + b)  # Output: [5 7 9]
    print(a * b)  # Output: [ 4 10 18]
    
  2. Statistical Functions:
    NumPy offers functions like mean(), median(), std() (standard deviation), var() (variance), and sum():

    arr = np.array([1, 2, 3, 4, 5])
    print(np.mean(arr))  # Output: 3.0
    print(np.std(arr))   # Output: 1.4142135623730951
    
  3. Linear Algebra:
    Operations like matrix multiplication, determinants, inverses, and eigenvalues are possible using NumPy:

    A = np.array([[1, 2], [3, 4]])
    B = np.array([[2, 0], [1, 2]])
    print(np.dot(A, B))
    # Output:
    # [[ 4  4]
    #  [10  8]]
    
  4. Reshaping and Indexing:
    NumPy allows reshaping arrays and extracting specific elements using slicing or boolean indexing:

    arr = np.arange(12)
    arr = arr.reshape(3, 4)
    print(arr[1, 2])  # Element at row index 1, column index 2 (zero-based). Output: 6
    
  5. Broadcasting Example:

    A = np.array([[1, 2, 3], [4, 5, 6]])
    B = np.array([1, 0, 1])
    print(A + B)
    # Output:
    # [[2 2 4]
    #  [5 5 7]]
    

Performance Advantage

The biggest advantage of NumPy over standard Python lists is speed. NumPy operations run in compiled C code, bypassing much of the Python interpreter’s overhead. For instance, adding two arrays of 1 million elements with NumPy is typically far faster than an explicit Python loop, often by one to two orders of magnitude depending on hardware and data types. The compact memory layout also allows larger datasets to be handled without excessive memory use.
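
The speed claim above can be checked with a rough timing sketch like the one below. It assumes the standard-library timeit module; the function names add_loop and add_numpy are illustrative, and the measured ratio will vary by machine.

```python
import timeit
import numpy as np

n = 1_000_000
a = np.arange(n)
b = np.arange(n)
la, lb = a.tolist(), b.tolist()  # plain Python lists with the same values

def add_loop():
    # Element-by-element addition in the Python interpreter
    return [x + y for x, y in zip(la, lb)]

def add_numpy():
    # One vectorized call executed in compiled code
    return a + b

t_loop = timeit.timeit(add_loop, number=3)
t_numpy = timeit.timeit(add_numpy, number=3)
print(f"list comprehension: {t_loop:.4f} s, NumPy: {t_numpy:.4f} s")
```

Both functions compute the same values; only the execution path differs.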


Applications of NumPy

NumPy is widely used across multiple domains:

  1. Data Analysis:
    NumPy arrays form the backbone of Pandas, enabling efficient data manipulation, filtering, and aggregation.

  2. Machine Learning:
    scikit-learn is built on NumPy arrays, and frameworks such as TensorFlow and PyTorch interoperate with them for data preprocessing and matrix operations.

  3. Scientific Computing:
    NumPy supports numerical simulations, statistical computations, and solving differential equations, making it popular in physics, chemistry, and biology research.

  4. Finance:
    NumPy is used for risk analysis, portfolio optimization, and modeling stock prices due to its statistical and matrix computation capabilities.

  5. Image Processing:
    Libraries like OpenCV use NumPy arrays to represent images and perform pixel-level manipulations efficiently.


Advantages of NumPy

  • High-performance operations with large arrays.

  • Reduced memory consumption compared to native Python lists.

  • Supports multi-dimensional arrays and matrices.

  • Wide range of built-in mathematical functions.

  • Seamless integration with other Python libraries.

  • Enables vectorized operations, avoiding explicit loops.

  • Supports broadcasting, making operations flexible and concise.


Limitations of NumPy

  • NumPy arrays are homogeneous, meaning all elements must be of the same type.

  • Not ideal for handling missing data; other libraries like Pandas are better suited.

  • Less intuitive for beginners compared to Python lists for small, simple tasks.

  • Limited support for symbolic mathematics (use SymPy for that).

Fresher Interview Questions

 

1. What is NumPy?

Answer:
NumPy (Numerical Python) is a Python library used for numerical computations. It provides support for:

  • Large, multi-dimensional arrays and matrices.

  • Mathematical functions to operate on these arrays.

  • Efficient memory and performance compared to regular Python lists.

NumPy is widely used in data science, machine learning, and scientific computing.


2. What are the advantages of NumPy over Python lists?

Answer:

  1. Performance: NumPy arrays are faster due to optimized C code and contiguous memory allocation.

  2. Memory Efficiency: NumPy arrays consume less memory than Python lists.

  3. Convenience: Supports vectorized operations and broadcasting.

  4. Functionality: Provides built-in functions for linear algebra, statistics, and random number generation.


3. How do you install NumPy?

Answer:

pip install numpy

4. How do you import NumPy in Python?

Answer:

import numpy as np
  • np is the common alias for NumPy.


5. What is a NumPy array?

Answer:
A NumPy array is a grid of values of the same data type, indexed by a tuple of non-negative integers.

Example:

import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr)

6. Difference between Python list and NumPy array

Feature      | Python List            | NumPy Array
-------------|------------------------|----------------------
Data Type    | Can store mixed types  | Stores a single type
Performance  | Slower                 | Faster
Memory Usage | More                   | Less
Operations   | Element-wise loops     | Vectorized

7. How to create a NumPy array?

Answer:

import numpy as np

# 1D array
arr1 = np.array([1, 2, 3])

# 2D array
arr2 = np.array([[1, 2, 3], [4, 5, 6]])

# 3D array
arr3 = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

8. What are the different types of arrays in NumPy?

Answer:

  1. 1D array: A single sequence of values (a vector).

  2. 2D array: A matrix with rows and columns.

  3. 3D array: A stack of matrices (for example, a cube of values). Higher dimensions are also supported.


9. How to check the dimension of a NumPy array?

Answer:

import numpy as np
arr = np.array([[1,2,3],[4,5,6]])
print(arr.ndim)  # Output: 2

10. How to check the shape of a NumPy array?

Answer:

print(arr.shape)  # Output: (2, 3)
  • shape returns a tuple representing (rows, columns, …).


11. How to check the data type of elements in a NumPy array?

Answer:

print(arr.dtype)
  • You can also define a data type explicitly:

arr = np.array([1,2,3], dtype=float)

12. How to create arrays of zeros, ones, or a range?

Answer:

# Zeros
zeros = np.zeros((2,3))

# Ones
ones = np.ones((3,2))

# Range
range_arr = np.arange(0,10,2)  # 0,2,4,6,8

# Linearly spaced numbers
linspace_arr = np.linspace(0,1,5)  # 5 numbers between 0 and 1

13. How to reshape an array?

Answer:

arr = np.arange(6)
reshaped = arr.reshape((2,3))
  • reshape() changes the dimensions but keeps the data the same.

  • The total number of elements must stay the same.
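
As a small illustration of the element-count rule, the sketch below lets NumPy infer one dimension with -1 and then shows the error raised for an incompatible shape. The variable names are illustrative only.

```python
import numpy as np

arr = np.arange(6)

# -1 asks NumPy to infer the missing dimension from the element count
auto = arr.reshape(2, -1)
print(auto.shape)  # (2, 3)

try:
    arr.reshape(4, 2)  # 8 slots requested for 6 elements
except ValueError as exc:
    error_message = str(exc)
    print("reshape failed:", error_message)
```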


14. How to perform element-wise operations?

Answer:

a = np.array([1,2,3])
b = np.array([4,5,6])

print(a + b)  # [5,7,9]
print(a * b)  # [4,10,18]
print(a - b)  # [-3,-3,-3]
print(a / b)  # [0.25,0.4,0.5]

15. What is broadcasting in NumPy?

Answer:
Broadcasting allows NumPy to perform operations on arrays of different shapes.

Example:

a = np.array([1,2,3])
b = 2
print(a + b)  # [3,4,5]

16. How to slice and index arrays?

Answer:

arr = np.array([10,20,30,40,50])
print(arr[1:4])  # [20,30,40]
print(arr[:3])   # [10,20,30]
print(arr[-2:])  # [40,50]
  • For 2D arrays:

arr2d = np.array([[1,2,3],[4,5,6]])
print(arr2d[0,1])  # 2
print(arr2d[:,2])  # [3,6]  # All rows, 3rd column

17. How to find max, min, sum, mean, median?

Answer:

arr = np.array([1,2,3,4,5])
print(arr.max())   # 5
print(arr.min())   # 1
print(arr.sum())   # 15
print(arr.mean())  # 3.0
print(np.median(arr))  # 3.0

18. How to generate random numbers?

Answer:

import numpy as np

# Random float [0,1)
rand_float = np.random.rand(3,2)

# Random integer [low, high)
rand_int = np.random.randint(1,10,size=(2,3))

# Random normal distribution
rand_normal = np.random.randn(3,3)

19. How to concatenate arrays?

Answer:

a = np.array([1,2,3])
b = np.array([4,5,6])
concatenated = np.concatenate((a,b))  # [1,2,3,4,5,6]

# For 2D arrays along axis
arr1 = np.array([[1,2],[3,4]])
arr2 = np.array([[5,6],[7,8]])
np.concatenate((arr1, arr2), axis=0)  # Stack rows
np.concatenate((arr1, arr2), axis=1)  # Stack columns

20. How to copy vs view an array?

Answer:

  • View: Shares data with original array.

  • Copy: Creates a separate array.

arr = np.array([1,2,3])
v = arr.view()
c = arr.copy()
arr[0] = 100
print(v)  # [100,2,3]
print(c)  # [1,2,3]
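
Views also arise implicitly. The sketch below, which assumes np.shares_memory for the check, contrasts basic slicing (a view) with integer-array indexing (a copy).

```python
import numpy as np

arr = np.arange(5)
sliced = arr[1:4]        # basic slicing returns a view
picked = arr[[1, 2, 3]]  # integer-array indexing returns a copy

print(np.shares_memory(arr, sliced))  # True
print(np.shares_memory(arr, picked))  # False

sliced[0] = 99           # writes through to arr
picked[1] = -1           # does not touch arr
print(arr)               # [ 0 99  2  3  4]
```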

21. How to check the memory size of an array?

Answer:

arr = np.array([1,2,3])
print(arr.nbytes)  # Total bytes used

22. How to perform matrix operations?

Answer:

a = np.array([[1,2],[3,4]])
b = np.array([[5,6],[7,8]])

print(np.dot(a,b))  # Matrix multiplication
print(a.T)          # Transpose
print(np.linalg.inv(a))  # Inverse

23. How to handle missing values in NumPy?

Answer:

arr = np.array([1,2,np.nan,4])
np.isnan(arr)  # [False, False, True, False]
np.nan_to_num(arr)  # Replace nan with 0

24. How is NumPy used in real-world applications?

Answer:

  • Data analysis: Handling large datasets efficiently.

  • Machine learning: Feeding data into models.

  • Image processing: Representing images as arrays.

  • Scientific computing: Simulations and mathematical modeling.


25. What is the difference between np.array and np.asarray?

Answer:

  • np.array() creates a new array and copies data by default.

  • np.asarray() converts input to an array without copying if possible.

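import numpy as np
src = np.array([1, 2, 3])
arr1 = np.array(src)    # copies by default
arr2 = np.asarray(src)  # no copy: the input is already an ndarray

src[0] = 100
print(arr1)  # [1 2 3]   -> independent copy
print(arr2)  # [100 2 3] -> shares data with src
# Note: a Python list is always copied, so np.asarray(list) never shares data with the list.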

26. How do you flatten a NumPy array?

Answer:
Flatten converts a multi-dimensional array into 1D.

arr = np.array([[1,2,3],[4,5,6]])
flat1 = arr.flatten()  # returns a copy
flat2 = arr.ravel()    # returns a view when possible
print(flat1)  # [1 2 3 4 5 6]
print(flat2)  # [1 2 3 4 5 6]

27. What is the difference between flatten() and ravel()?

Answer:

Method    | Returns                        | Copies data?
----------|--------------------------------|------------------
flatten() | 1D array (copy)                | Yes
ravel()   | 1D array (view when possible)  | No (if possible)
  • ravel() is memory efficient.

  • flatten() is safer if you don’t want changes in the original array.


28. How to stack arrays vertically or horizontally?

Answer:

a = np.array([1,2,3])
b = np.array([4,5,6])

v_stack = np.vstack((a,b))  # vertical stack
h_stack = np.hstack((a,b))  # horizontal stack

print(v_stack)
# [[1 2 3]
#  [4 5 6]]
print(h_stack)  # [1 2 3 4 5 6]

29. How do you split an array?

Answer:

arr = np.array([1,2,3,4,5,6])
split_arr = np.split(arr, 3)  # Splits into 3 equal arrays
print(split_arr)  # [array([1,2]), array([3,4]), array([5,6])]
  • np.array_split() can split arrays unequally if size doesn’t divide evenly.
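
A brief sketch of the difference: np.split raises an error when the array cannot be divided evenly, while np.array_split distributes the remainder across the first pieces. The variable names are illustrative.

```python
import numpy as np

arr = np.arange(7)

parts = np.array_split(arr, 3)   # 7 elements into 3 pieces
sizes = [len(p) for p in parts]
print(sizes)                     # [3, 2, 2]

split_failed = False
try:
    np.split(arr, 3)             # 7 is not divisible by 3
except ValueError:
    split_failed = True
print(split_failed)              # True
```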


30. What are boolean arrays in NumPy?

Answer:
Boolean arrays are arrays of True/False values used for filtering.

arr = np.array([1,2,3,4,5])
bool_arr = arr > 3
print(bool_arr)  # [False False False True True]

filtered = arr[arr > 3]
print(filtered)  # [4 5]

31. How to perform advanced indexing?

Answer:
You can select specific elements using arrays of indices.

arr = np.array([10,20,30,40,50])
indices = [0,2,4]
print(arr[indices])  # [10,30,50]

For 2D arrays:

arr2d = np.array([[1,2,3],[4,5,6],[7,8,9]])
rows = np.array([0,2])
cols = np.array([1,2])
print(arr2d[rows[:,None], cols])  # [[2 3] [8 9]]

32. What is the difference between np.dot and np.matmul?

Answer:

  • np.dot() performs dot product for vectors or matrix multiplication.

  • np.matmul() is specifically for matrix multiplication, especially for 2D or higher.

a = np.array([[1,2],[3,4]])
b = np.array([[5,6],[7,8]])

print(np.dot(a,b))      # [[19 22],[43 50]]
print(np.matmul(a,b))   # [[19 22],[43 50]]
  • Difference is more visible in higher-dimensional arrays (batch matrix multiplication).
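
The higher-dimensional difference can be seen from the result shapes alone. In this sketch (array contents are arbitrary), np.matmul treats the leading axis as a batch, while np.dot combines every matrix in a with every matrix in b.

```python
import numpy as np

a = np.ones((2, 3, 4))  # batch of two 3x4 matrices
b = np.ones((2, 4, 5))  # batch of two 4x5 matrices

batched = np.matmul(a, b)  # pairs a[i] with b[i]
combined = np.dot(a, b)    # sums over last axis of a and second-to-last of b

print(batched.shape)   # (2, 3, 5)
print(combined.shape)  # (2, 3, 2, 5)
```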


33. How to calculate cumulative sum, product, and differences?

Answer:

arr = np.array([1,2,3,4])

print(np.cumsum(arr))  # [1 3 6 10]
print(np.cumprod(arr)) # [1 2 6 24]
print(np.diff(arr))    # [1 1 1]
  • Useful in statistics, time-series, and analysis.


34. How to sort a NumPy array?

Answer:

arr = np.array([3,1,4,2])
sorted_arr = np.sort(arr)
print(sorted_arr)  # [1 2 3 4]

# In-place sort
arr.sort()
print(arr)         # [1 2 3 4]

For 2D arrays:

arr2d = np.array([[3,2,1],[6,5,4]])
print(np.sort(arr2d, axis=1))  # Sort rows
print(np.sort(arr2d, axis=0))  # Sort columns

35. How to find unique elements in an array?

Answer:

arr = np.array([1,2,2,3,3,3])
unique = np.unique(arr)
print(unique)  # [1 2 3]
  • np.unique() can also return indices or counts:

values, counts = np.unique(arr, return_counts=True)
print(values, counts)  # [1 2 3] [1 2 3]

36. How to use where in NumPy?

Answer:
With a single argument, np.where(condition) returns the indices where the condition is True.

arr = np.array([1,2,3,4,5])
indices = np.where(arr > 3)
print(indices)         # (array([3,4]),)
print(arr[indices])    # [4 5]

37. How to save and load NumPy arrays?

Answer:

arr = np.array([1,2,3,4])
np.save('my_array.npy', arr)          # Save as binary file
loaded_arr = np.load('my_array.npy')  # Load array
print(loaded_arr)
  • For multiple arrays: np.savez() and np.load().
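
A minimal sketch of saving several arrays in one .npz archive follows. The file name arrays.npz and the keys first and second are hypothetical; a temporary directory is used so nothing is left on disk.

```python
import os
import tempfile
import numpy as np

a = np.arange(3)
b = np.eye(2)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "arrays.npz")
    np.savez(path, first=a, second=b)  # keyword names become the keys
    with np.load(path) as data:        # NpzFile; closed when the block exits
        names = sorted(data.files)
        first = data["first"]
        second = data["second"]

print(names)  # ['first', 'second']
```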


38. How to calculate statistical measures?

Answer:

arr = np.array([1,2,3,4,5,6])

print(np.mean(arr))   # Average
print(np.median(arr)) # Median
print(np.std(arr))    # Standard deviation
print(np.var(arr))    # Variance
print(np.ptp(arr))    # Peak-to-peak (max-min)

39. How to perform element-wise logical operations?

Answer:

a = np.array([True, False, True])
b = np.array([False, False, True])

print(np.logical_and(a,b))  # [False False True]
print(np.logical_or(a,b))   # [True False True]
print(np.logical_not(a))    # [False True False]

40. How to handle multi-dimensional indexing with ix_?

Answer:

arr = np.arange(16).reshape(4,4)
rows = [0,2]
cols = [1,3]

print(arr[np.ix_(rows, cols)])
# [[1 3]
#  [9 11]]
  • Useful for selecting specific rows and columns simultaneously.


41. What is the difference between np.dot, np.vdot, and np.inner?

Answer:

  • np.dot(a, b): Performs matrix multiplication or dot product depending on input shapes.

  • np.vdot(a, b): Flattens both inputs and returns a scalar dot product; for complex inputs it uses the complex conjugate of the first argument.

  • np.inner(a, b): Sums products over the last axis of both inputs; identical to np.dot for 1D vectors, but for 2D arrays it computes a @ b.T rather than a @ b.

a = np.array([1,2])
b = np.array([3,4])

print(np.dot(a,b))  # 11
print(np.vdot(a,b)) # 11
print(np.inner(a,b))# 11
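
For real 1D vectors the three functions agree, so the differences only show up with complex or 2D inputs. The sketch below uses small illustrative arrays to make that visible.

```python
import numpy as np

# Complex vectors: vdot conjugates its first argument, dot does not
u = np.array([1 + 2j, 3])
v = np.array([1j, 1])
dot_uv = np.dot(u, v)
vdot_uv = np.vdot(u, v)
print(dot_uv)   # (1+1j)
print(vdot_uv)  # (5+1j)

# 2D arrays: inner contracts the last axes, i.e. A @ B.T
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(np.dot(A, B))    # [[19 22] [43 50]]
print(np.inner(A, B))  # [[17 23] [39 53]]
```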

42. How to perform element-wise comparison?

Answer:

a = np.array([1,2,3])
b = np.array([2,2,1])

print(a == b)  # [False True False]
print(a != b)  # [True False True]
print(a > b)   # [False False True]
  • Can use these boolean arrays for filtering or conditional operations.


43. How to apply a function to each element?

Answer:

  • Using np.vectorize (a convenience wrapper that still loops in Python) or built-in universal functions (ufuncs), which run in compiled code:

arr = np.array([1,2,3,4])
def square(x):
    return x**2

vec_square = np.vectorize(square)
print(vec_square(arr))  # [1 4 9 16]

# Using ufunc
print(np.sqrt(arr))     # [1. 1.414 1.732 2.]

44. How to perform matrix trace, determinant, and rank?

Answer:

arr = np.array([[1,2],[3,4]])

print(np.trace(arr))          # 1+4 = 5
print(np.linalg.det(arr))     # approximately -2.0 (floating-point result)
print(np.linalg.matrix_rank(arr)) # 2
  • Useful in linear algebra and ML algorithms.


45. How to find eigenvalues and eigenvectors?

Answer:

arr = np.array([[1,2],[2,1]])
values, vectors = np.linalg.eig(arr)

print(values)   # Eigenvalues
print(vectors)  # Eigenvectors (columns)
  • Frequently used in PCA and dimensionality reduction.
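
One way to sanity-check the result is to confirm that each column v of the eigenvector matrix satisfies A v = λ v. The sketch below does that and, as an aside, uses np.linalg.eigvalsh, which is intended for symmetric matrices and returns eigenvalues in ascending order.

```python
import numpy as np

arr = np.array([[1, 2], [2, 1]])
values, vectors = np.linalg.eig(arr)

# Each column of `vectors` pairs with the eigenvalue at the same index
checks = []
for i in range(len(values)):
    v = vectors[:, i]
    checks.append(np.allclose(arr @ v, values[i] * v))
print(checks)  # [True, True]

sym_values = np.linalg.eigvalsh(arr)  # symmetric input, ascending order
print(sym_values)                     # approximately [-1.  3.]
```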


46. What is broadcasting in detail?

Answer:

  • Broadcasting allows operations between arrays of different shapes.

  • Rules:

    1. If shapes are different, the smaller shape is padded with 1s on left.

    2. Dimensions are compatible if they are equal or one of them is 1.

a = np.array([[1,2,3],[4,5,6]])
b = np.array([1,0,1])

print(a + b)
# [[2 2 4]
#  [5 5 7]]
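
The two rules can be explored without building large arrays. This sketch assumes np.broadcast_shapes is available (NumPy 1.20 or later) and shows one incompatible pair for contrast.

```python
import numpy as np

# (3,) is padded to (1, 3), then the 1 stretches to 2
shape_a = np.broadcast_shapes((2, 3), (3,))
# Both arrays stretch along their size-1 axis
shape_b = np.broadcast_shapes((3, 1), (1, 4))
print(shape_a)  # (2, 3)
print(shape_b)  # (3, 4)

incompatible = False
try:
    # (2,) is padded to (1, 2); trailing sizes 3 and 2 do not match
    np.ones((2, 3)) + np.ones((2,))
except ValueError:
    incompatible = True
print(incompatible)  # True
```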

47. How to perform fancy indexing?

Answer:

  • Fancy indexing allows selecting elements using arrays of indices.

arr = np.arange(10)
indices = [1,3,5,7]
print(arr[indices])  # [1 3 5 7]
  • On 2D arrays, paired index lists select individual elements, here positions (0,1) and (2,3):

arr2d = np.arange(16).reshape(4,4)
print(arr2d[[0,2],[1,3]])  # [1 11]

48. How to perform masking and conditional replacement?

Answer:

arr = np.array([1,2,3,4,5])
arr[arr % 2 == 0] = 0  # Replace even numbers with 0
print(arr)  # [1 0 3 0 5]

# Using np.where
arr = np.array([1,2,3,4,5])
new_arr = np.where(arr % 2 == 0, -1, arr)
print(new_arr)  # [1 -1 3 -1 5]

49. How to perform cumulative operations along an axis?

Answer:

arr = np.array([[1,2,3],[4,5,6]])

print(np.cumsum(arr, axis=0))  # Sum along columns
# [[1 2 3]
#  [5 7 9]]

print(np.cumprod(arr, axis=1)) # Product along rows
# [[1 2 6]
#  [4 20 120]]

50. How to perform linear algebra operations: inverse, pseudo-inverse, solving linear equations?

Answer:

A = np.array([[3,1],[1,2]])
B = np.array([9,8])

# Solve Ax = B
x = np.linalg.solve(A,B)
print(x)  # [2. 3.]

# Pseudo-inverse for non-square matrices
A = np.array([[1,2,3],[4,5,6]])
pinv = np.linalg.pinv(A)
print(pinv)
  • Used in regression, machine learning, and signal processing.


51. How to perform FFT (Fast Fourier Transform)?

Answer:

arr = np.array([1,2,3,4])
fft_arr = np.fft.fft(arr)
ifft_arr = np.fft.ifft(fft_arr)

print(fft_arr)   # Frequency domain
print(ifft_arr)  # Back to time domain
  • Useful in signal processing, image processing, and audio analysis.


52. How to handle structured arrays in NumPy?

Answer:

  • Structured arrays store heterogeneous data like a table.

data = np.array([(1, 'Alice', 3.5),
                 (2, 'Bob', 4.0)],
                dtype=[('id', 'i4'), ('name', 'U10'), ('gpa', 'f4')])

print(data['name'])  # ['Alice' 'Bob']
print(data['gpa'])   # [3.5 4.0]
  • Used in datasets where each column has a different type.


53. How to improve performance in NumPy?

Answer:

  1. Use vectorized operations instead of Python loops.

  2. Use np.dot / np.matmul for matrix multiplication.

  3. Use ravel() instead of flatten() when you do not need an independent copy.

  4. Avoid frequent array resizing; preallocate arrays.

  5. Use numexpr or Cython for heavy computations.


54. How to generate random numbers with a fixed seed?

Answer:

np.random.seed(42)
arr = np.random.rand(3)
print(arr)  # Always same values each run
  • Important for reproducibility in ML experiments.
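
NumPy also offers a Generator-based interface, np.random.default_rng, which keeps the seed local to a generator object instead of setting global state. The sketch below is one way to get the same reproducibility with it; the variable names are illustrative.

```python
import numpy as np

# Two generators created with the same seed produce the same stream
rng1 = np.random.default_rng(42)
rng2 = np.random.default_rng(42)

x = rng1.random(3)
y = rng2.random(3)
print(np.array_equal(x, y))  # True

# Integers in [1, 10); the upper bound is exclusive by default
ints = rng1.integers(1, 10, size=5)
print(ints.shape)            # (5,)
```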


55. How to compute pairwise distance between points?

Answer:

points = np.array([[1,2],[4,6]])
dist = np.sqrt(np.sum((points[0] - points[1])**2))
print(dist)  # 5.0
  • Can be extended for multiple points using broadcasting.
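
A possible broadcasting version for several points is sketched below. The three sample points are made up for illustration; inserting new axes makes every point subtract from every other point in one step.

```python
import numpy as np

points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

# (3, 1, 2) - (1, 3, 2) broadcasts to (3, 3, 2): all pairwise differences
diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))  # (3, 3) distance matrix

print(dist)
# [[ 0.  5. 10.]
#  [ 5.  0.  5.]
#  [10.  5.  0.]]
```

The intermediate diff array has N x N x D entries, so for very large point sets a chunked or library-based approach may be preferable.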

Experienced Interview Questions

 

1. What is NumPy and why is it used?

Answer:
NumPy (Numerical Python) is a Python library used for numerical computing. It provides support for:

  • Multi-dimensional arrays (ndarray)

  • Mathematical operations on arrays

  • Linear algebra, Fourier transforms, and random number generation

Why use NumPy:

  • High-performance array operations compared to Python lists

  • Efficient memory storage

  • Integration with C/C++/Fortran code

  • Basis for libraries like Pandas, SciPy, and scikit-learn

Example:

import numpy as np
arr = np.array([1, 2, 3])
print(arr * 2)  # Output: [2 4 6]

2. What is an ndarray?

Answer:
ndarray is NumPy’s core data structure, representing a multidimensional, homogeneous array of fixed-size items.

Key attributes:

  • ndarray.ndim → number of dimensions

  • ndarray.shape → shape of the array

  • ndarray.size → total elements

  • ndarray.dtype → data type of elements

Example:

arr = np.array([[1,2,3],[4,5,6]])
print(arr.ndim)   # 2
print(arr.shape)  # (2,3)
print(arr.size)   # 6
print(arr.dtype)  # int64 on most platforms (the default integer type is platform-dependent)

3. How is NumPy different from Python lists?

Feature       | NumPy Array                    | Python List
--------------|--------------------------------|----------------
Storage       | Homogeneous, fixed type        | Heterogeneous
Performance   | Faster (vectorized operations) | Slower (loops)
Memory        | Less memory                    | More memory
Functionality | Many mathematical functions    | Limited

Example:

import numpy as np
import time

arr = np.arange(1000000)
start = time.perf_counter()  # perf_counter is suited to timing short intervals
arr2 = arr * 2
print("NumPy:", time.perf_counter() - start)

lst = list(range(1000000))
start = time.perf_counter()
lst2 = [x*2 for x in lst]
print("List:", time.perf_counter() - start)

4. Explain broadcasting in NumPy.

Answer:
Broadcasting allows operations on arrays of different shapes without making copies of data. NumPy automatically expands smaller arrays to match the shape of larger arrays.

Rules:

  1. If arrays have different dimensions, prepend 1 to the smaller shape.

  2. Arrays are compatible if dimensions are equal or one is 1.

  3. The result takes the larger size along each dimension; size-1 dimensions are stretched to match.

Example:

a = np.array([[1,2,3],[4,5,6]])
b = np.array([10,20,30])
print(a + b)
# Output:
# [[11 22 33]
#  [14 25 36]]

5. How can you create NumPy arrays?

Answer:
NumPy arrays can be created using:

  • np.array() → from Python lists/tuples

  • np.arange(start, stop, step) → like range()

  • np.zeros(shape) → array of zeros

  • np.ones(shape) → array of ones

  • np.empty(shape) → uninitialized array

  • np.linspace(start, stop, num) → evenly spaced numbers

Example:

arr1 = np.zeros((2,3))
arr2 = np.ones((3,3))
arr3 = np.linspace(0, 1, 5)

6. What are views and copies in NumPy?

Answer:

  • View: A new array object that shares data with the original array. Changes affect both arrays.

  • Copy: A new array object with its own data. Changes do not affect the original array.

Example:

a = np.array([1,2,3])
b = a.view()
c = a.copy()
b[0] = 10
c[1] = 20
print(a)  # [10 2 3] -> affected by view
print(c)  # [1 20 3] -> independent

7. Explain advanced indexing in NumPy.

Answer:

  • Integer indexing: Access elements using integers

  • Boolean indexing: Use a boolean array to filter elements

  • Fancy indexing: Index arrays using arrays of integers

Example:

a = np.array([10,20,30,40])
index = [0,2]
print(a[index])   # [10 30]

mask = a > 20
print(a[mask])    # [30 40]

8. How can you perform linear algebra in NumPy?

Answer:
NumPy supports linear algebra through core functions and the numpy.linalg module:

  • Matrix multiplication: np.dot(a,b) or @

  • Determinant: np.linalg.det()

  • Inverse: np.linalg.inv()

  • Eigenvalues: np.linalg.eig()

Example:

A = np.array([[1,2],[3,4]])
B = np.array([[5,6],[7,8]])
C = np.dot(A,B)
inv_A = np.linalg.inv(A)
det_A = np.linalg.det(A)

9. What are common functions for array manipulation?

Answer:

  • reshape(), flatten(), ravel() → shape changes

  • concatenate(), vstack(), hstack() → join arrays

  • split(), hsplit(), vsplit() → split arrays

  • transpose() → change axes

Example:

a = np.arange(6).reshape(2,3)
print(a.flatten())     # [0 1 2 3 4 5]
print(np.hstack([a,a])) # horizontal stack

10. How do you handle NaN and Inf in NumPy?

Answer:
NumPy has np.nan, np.inf and functions:

  • np.isnan() → check NaN

  • np.isinf() → check infinity

  • np.nan_to_num() → replace NaN/Inf

  • np.nanmean(), np.nansum() → ignore NaN in computations

Example:

arr = np.array([1, np.nan, 3, np.inf])
print(np.isnan(arr))         # [False  True False False]
print(np.nan_to_num(arr))    # NaN -> 0.0, inf -> largest finite float (about 1.8e308)
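
The NaN-aware reductions and the replacement values can be sketched as follows. The replacement of 1e6 for infinity is an arbitrary choice for illustration; the nan and posinf keywords assume NumPy 1.17 or later.

```python
import numpy as np

arr = np.array([1.0, np.nan, 3.0, np.inf])

# NaN-aware reductions skip NaN entries (inf excluded here by slicing)
mean_ignoring_nan = np.nanmean(arr[:3])
sum_ignoring_nan = np.nansum(arr[:3])
print(mean_ignoring_nan)  # 2.0
print(sum_ignoring_nan)   # 4.0

# Choose explicit replacement values instead of the defaults
cleaned = np.nan_to_num(arr, nan=0.0, posinf=1e6)

# Or keep only finite entries
finite_only = arr[np.isfinite(arr)]
print(finite_only)        # [1. 3.]
```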

11. What is the difference between ravel() and flatten()?

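Feature                             | ravel()                      | flatten()
------------------------------------|------------------------------|-------------------
Return                              | View if possible             | Copy always
Memory                              | Efficient                    | Takes more memory
Changes to result affect original?  | Yes, when a view is returned | No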

Example:

a = np.array([[1,2],[3,4]])
b = a.ravel()
c = a.flatten()
b[0] = 10
print(a)  # [[10 2] [3 4]]
c[1] = 20
print(a)  # [[10 2] [3 4]] -> unaffected

12. How can you generate random numbers in NumPy?

Answer:
Use numpy.random module:

  • np.random.rand(d0,d1) → uniform [0,1)

  • np.random.randn(d0,d1) → normal distribution

  • np.random.randint(low, high, size) → random integers

  • np.random.choice(array, size) → random sampling

Example:

np.random.seed(0)  # reproducible
print(np.random.rand(2,3))
print(np.random.randint(1,10,5))

13. How can you improve performance using NumPy?

Answer:

  • Use vectorized operations instead of loops

  • Avoid Python lists for large data

  • Use dtype optimization (e.g., float32 vs float64)

  • Leverage numexpr or Cython for heavy computations

Example of vectorization:

a = np.arange(1000000)
# Loop
# for i in range(len(a)): a[i] = a[i]*2
# Vectorized
a = a*2

14. Explain memory layout (C-contiguous vs F-contiguous).

Answer:

  • C-contiguous (row-major): rows are stored consecutively in memory

  • F-contiguous (column-major): columns are stored consecutively in memory

Check memory layout:

a = np.array([[1,2],[3,4]], order='C')
b = np.array([[1,2],[3,4]], order='F')
print(a.flags)  # C_CONTIGUOUS = True
print(b.flags)  # F_CONTIGUOUS = True
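
The two layouts can also be told apart by their strides, the number of bytes to step along each axis. This sketch fixes the dtype to int64 (8 bytes per item) so the numbers are predictable.

```python
import numpy as np

c = np.zeros((2, 3), dtype=np.int64, order='C')
f = np.zeros((2, 3), dtype=np.int64, order='F')

# C order: next row is 3 items away, next column is 1 item away
print(c.strides)  # (24, 8)
# F order: next row is 1 item away, next column is 2 items away
print(f.strides)  # (8, 16)

# Transposing a C-contiguous array yields an F-contiguous view
print(c.T.flags['F_CONTIGUOUS'])  # True
```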

15. Explain structured arrays in NumPy.

Answer:
Structured arrays store heterogeneous data like a table (columns with different types).

Example:

data = np.array([(1, 'Alice', 25), (2, 'Bob', 30)],
                dtype=[('id', 'i4'), ('name', 'U10'), ('age', 'i4')])
print(data['name'])  # ['Alice' 'Bob']

16. Advanced Question: Explain einsum in NumPy.

Answer:
numpy.einsum is a powerful function for summations, products, and tensor contractions using Einstein notation. It can avoid creating intermediate arrays, which may improve speed and memory use.

Example:

a = np.array([[1,2],[3,4]])
b = np.array([[5,6],[7,8]])
# Matrix multiplication using einsum
c = np.einsum('ij,jk->ik', a, b)
print(c)

17. Explain difference between np.dot(), np.matmul(), and @.

Function       | Use Case
---------------|--------------------------------------
np.dot(a,b)    | Dot product, 1D & 2D arrays
np.matmul(a,b) | Matrix multiplication, 2D or higher
a @ b          | Python operator equivalent to matmul

18. How to handle large datasets in NumPy efficiently?

  • Use memory-mapped arrays (np.memmap)

  • Use float32 instead of float64 if precision allows

  • Use in-place operations to save memory

  • Avoid unnecessary copies

Example:

fp = np.memmap('data.dat', dtype='float32', mode='w+', shape=(1000,1000))

19. How do you save and load NumPy arrays?

Answer:

  • np.save('filename.npy', array) → save single array

  • np.load('filename.npy') → load array

  • np.savez('filename.npz', a=array1, b=array2) → save multiple arrays

Example:

np.save('arr.npy', a)
b = np.load('arr.npy')

20. Miscellaneous Advanced Questions for 4 Years Experience

  1. How does np.vectorize work?

  2. How to perform masking operations efficiently?

  3. Explain chunking large arrays for memory efficiency.

  4. How to combine structured arrays with views?

  5. Explain NumPy ufuncs and custom ufuncs.

  6. Explain strided tricks using as_strided.


21. What are NumPy ufuncs?

Answer:
A ufunc (universal function) is a vectorized function that operates element-wise on arrays. They are implemented in C for speed.

Examples of ufuncs:

  • np.add, np.subtract, np.multiply, np.divide

  • np.sin, np.cos, np.exp, np.log

Example:

a = np.array([1,2,3])
b = np.array([4,5,6])
print(np.add(a,b))  # [5 7 9]

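Custom element-wise functions (np.vectorize wraps a Python function for convenience; the result is not a compiled ufunc):

def square(x):
    return x*x
vfunc = np.vectorize(square)  # still runs a Python-level loop internally
print(vfunc(np.array([1,2,3])))  # [1 4 9]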

22. Explain in-place operations in NumPy

Answer:
In-place operations modify the existing array rather than creating a new one. This saves memory and improves speed.

Example:

a = np.array([1,2,3])
a += 5   # in-place addition
print(a) # [6 7 8]
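
Besides augmented assignment, most ufuncs accept an out argument that writes into an existing array. The sketch below uses a second name, original, purely to show that the array object never changes.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
original = a            # second name bound to the same array object

a *= 2                  # in-place multiply
np.add(a, 1, out=a)     # in-place add via the ufunc `out` argument
print(a)                # [3. 5. 7.]
print(a is original)    # True

b = a * 2               # ordinary expression allocates a new array
print(np.shares_memory(a, b))  # False
```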

23. Explain the difference between astype and view

Answer:

  • astype(dtype) creates a copy of the array, converting each value to the new dtype

  • view(dtype) creates a view (shared data) that reinterprets the same bytes as a different dtype

Example:

a = np.array([1,2,3], dtype=np.int32)
b = a.astype(np.float64)  # copy
c = a.view(np.uint32)      # view
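
The difference is easiest to see when the two dtypes encode values differently. In this sketch a float32 value of 1.0 is converted with astype and reinterpreted with view; both dtypes are 4 bytes wide, which view requires here.

```python
import numpy as np

a = np.array([1.0], dtype=np.float32)

converted = a.astype(np.uint32)    # value conversion: 1.0 -> 1
reinterpreted = a.view(np.uint32)  # same 4 bytes read as an integer

print(converted)      # [1]
print(reinterpreted)  # [1065353216], the IEEE 754 bit pattern 0x3F800000

print(np.shares_memory(a, converted))      # False
print(np.shares_memory(a, reinterpreted))  # True
```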

24. How can you perform cumulative operations?

Answer:
NumPy provides cumulative operations:

  • np.cumsum() → cumulative sum

  • np.cumprod() → cumulative product

Example:

arr = np.array([1,2,3,4])
print(np.cumsum(arr))  # [1 3 6 10]
print(np.cumprod(arr)) # [1 2 6 24]

25. Explain memory-mapped files (np.memmap)

Answer:
np.memmap allows you to handle large arrays on disk without loading them entirely into memory.

Example:

fp = np.memmap('data.dat', dtype='float32', mode='w+', shape=(1000,1000))
fp[:10,:10] = 1
fp.flush()  # write to disk

26. Explain as_strided and its use

Answer:
numpy.lib.stride_tricks.as_strided lets you create new views on arrays with custom strides. This is memory-efficient but requires careful use to avoid memory errors.

Example:

from numpy.lib.stride_tricks import as_strided
a = np.arange(10)
strided = as_strided(a, shape=(4,3), strides=(a.itemsize, a.itemsize))
print(strided)
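
For the common sliding-window case, NumPy 1.20 and later provide sliding_window_view, which computes the shape and strides itself and returns a read-only view by default. The sketch below is an alternative to the hand-written strides, not a replacement for every as_strided use.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(10)
windows = sliding_window_view(a, window_shape=3)

print(windows.shape)  # (8, 3)
print(windows[:4])
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]
#  [3 4 5]]
print(windows.flags.writeable)  # False
```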

27. How do you perform advanced broadcasting?

Answer:
Advanced broadcasting involves reshaping arrays or adding dummy dimensions to match shapes for operations.

Example:

a = np.array([1,2,3])[:, np.newaxis]  # shape (3,1)
b = np.array([10,20])                 # shape (2,)
print(a + b)
# [[11 21]
#  [12 22]
#  [13 23]]

28. Explain np.einsum for tensor operations

Answer:
einsum allows complex tensor contractions, summations, and multiplications with a single expression. It can avoid some temporary arrays.

Example:

A = np.array([[1,2],[3,4]])
B = np.array([[5,6],[7,8]])
C = np.einsum('ij,jk->ik', A, B)  # Matrix multiplication
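
A few more subscript patterns, each checked against the equivalent NumPy call, may help in reading the notation. The arrays are small illustrative values.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
v = np.array([1, 2])

trace = np.einsum('ii->', A)        # repeated index, no output: sum of diagonal
row_sums = np.einsum('ij->i', A)    # j is dropped from the output, so it is summed
outer = np.einsum('i,j->ij', v, v)  # no shared index: outer product
transposed = np.einsum('ij->ji', A) # indices reordered: transpose

print(trace)     # 5
print(row_sums)  # [3 7]
print(outer)     # [[1 2] [2 4]]
```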

29. How can you optimize NumPy computations for large arrays?

Answer:

  • Use in-place operations

  • Reduce dtype size (float32 vs float64)

  • Avoid Python loops

  • Use vectorized operations

  • Use numexpr or Cython for very large computations
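
The first two points can be illustrated with a small sketch. The array size is an arbitrary choice, and numexpr and Cython are not shown because they are separate packages.

```python
import numpy as np

n = 1_000_000
x64 = np.ones(n, dtype=np.float64)
x32 = x64.astype(np.float32)    # half the memory, at reduced precision
print(x64.nbytes, x32.nbytes)   # 8000000 4000000

# Out-of-place: each step allocates a new array for its result.
y = x64 * 2.0 + 1.0

# In-place: the existing buffer is reused for both steps.
x64 *= 2.0
np.add(x64, 1.0, out=x64)

print(np.array_equal(x64, y))   # True
```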


30. How do you perform masked operations?

Answer:
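Use boolean indexing or the np.ma module for masked arrays. Boolean indexing returns only the selected elements, while a masked array keeps the original shape and marks the masked entries as invalid.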

Example:

a = np.array([1,2,3,4,5])
mask = a > 3
print(a[mask])  # [4 5]

import numpy.ma as ma
masked = ma.masked_greater(a, 3)
print(masked)   # [1 2 3 -- --]

31. Explain np.lib.recfunctions

Answer:
numpy.lib.recfunctions provides helper functions to combine, split, or modify structured arrays, which is useful for heterogeneous datasets.

Example:

from numpy.lib import recfunctions as rfn
a = np.array([(1,'Alice'),(2,'Bob')], dtype=[('id','i4'),('name','U10')])
b = np.array([(3,'Charlie')], dtype=a.dtype)
combined = rfn.stack_arrays([a,b], usemask=False)
print(combined)
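
Besides stacking, the module can add or remove fields. The following sketch uses append_fields and drop_fields; the age values are made up for illustration.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

a = np.array([(1, 'Alice'), (2, 'Bob')], dtype=[('id', 'i4'), ('name', 'U10')])
ages = np.array([30, 25], dtype='i4')   # hypothetical data

# Add a new field; usemask=False returns a plain structured array.
with_age = rfn.append_fields(a, 'age', ages, usemask=False)
print(with_age.dtype.names)   # ('id', 'name', 'age')

# Remove a field; the input array is left unchanged.
ids_only = rfn.drop_fields(a, 'name', usemask=False)
print(ids_only.dtype.names)   # ('id',)
```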

32. Explain vectorized string operations

Answer:
The numpy.char module provides vectorized string operations that apply element-wise to arrays of strings.

Example:

arr = np.array(['apple', 'banana'])
print(np.char.upper(arr))  # ['APPLE' 'BANANA']
print(np.char.add(arr, '_fruit')) # ['apple_fruit' 'banana_fruit']

33. Explain the difference between ravel() and flatten() in detail

Answer:

Feature                  | ravel()                                          | flatten()
Copy                     | Returns a view when possible, otherwise a copy   | Always returns a copy
Memory                   | Efficient when a view is returned                | Uses more memory
Modifying the result     | Affects the original if the result is a view     | Never affects the original
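
Whether ravel() returns a view depends on the memory layout of the input. A minimal sketch that checks this with np.shares_memory, using a transposed array as an assumed non-contiguous case:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

r = a.ravel()     # contiguous input: a view
f = a.flatten()   # always a copy
print(np.shares_memory(a, r), np.shares_memory(a, f))   # True False

r[0] = 99         # writes through to a
f[1] = -1         # does not touch a
print(a[0, 0], a[0, 1])   # 99 1

# The transpose is not C-contiguous, so ravel() has to copy here.
t = a.T.ravel()
print(np.shares_memory(a, t))   # False
```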

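34. Explain the difference between np.copy() and copy.deepcopy()

Answer:

  • np.copy() → shallow copy: the new array has its own data buffer, but with dtype=object it still refers to the same Python objects as the original

  • copy.deepcopy() (from the standard library copy module; NumPy has no np.deepcopy) → fully independent copy, including the objects stored in the array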
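
The difference only shows up with object arrays. In the sketch below, deepcopy is copy.deepcopy from the Python standard library, and the list stored in the array is an illustrative choice.

```python
import copy

import numpy as np

inner = [1, 2]
arr = np.empty(2, dtype=object)
arr[0] = inner
arr[1] = 'x'

shallow = np.copy(arr)       # new array, same Python objects inside
deep = copy.deepcopy(arr)    # new array, new Python objects inside

inner.append(3)              # mutate the shared list

print(arr[0])       # [1, 2, 3]
print(shallow[0])   # [1, 2, 3]
print(deep[0])      # [1, 2]
```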


35. How to efficiently compute row-wise or column-wise operations?

Answer:
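Use the axis parameter of reduction functions such as np.sum, np.mean, or np.max. axis=0 reduces down the rows (one result per column) and axis=1 reduces across the columns (one result per row).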

Example:

a = np.array([[1,2,3],[4,5,6]])
print(np.sum(a, axis=0))  # column-wise: [5 7 9]
print(np.sum(a, axis=1))  # row-wise: [ 6 15]

36. Explain np.tile() vs np.repeat()

Answer:

  • np.tile() → repeats the whole array

  • np.repeat() → repeats each element

Example:

a = np.array([1,2])
print(np.tile(a, 2))   # [1 2 1 2]
print(np.repeat(a, 2)) # [1 1 2 2]

37. How to perform element-wise comparisons efficiently?

Answer:
Use vectorized comparisons:

a = np.array([1,2,3])
b = np.array([2,2,4])
print(a == b)  # [False  True False]
print(a > b)   # [False False False]

38. Explain np.unique() and its return options

Answer:
np.unique() returns unique elements and optionally:

  • return_index → indices of the first occurrence of each unique value in the original array

  • return_inverse → indices into the unique array that reconstruct the original array

  • return_counts → frequency of each unique element

Example:

arr = np.array([1,2,2,3])
uniq, counts = np.unique(arr, return_counts=True)
print(uniq, counts)  # [1 2 3] [1 2 1]
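
All three options can be requested at once. A short sketch with an unsorted input array, chosen so that the returned indices are easy to follow:

```python
import numpy as np

arr = np.array([3, 1, 2, 2, 3])
uniq, first_idx, inverse, counts = np.unique(
    arr, return_index=True, return_inverse=True, return_counts=True
)

print(uniq)        # [1 2 3]
print(first_idx)   # [1 2 0]
print(inverse)     # [2 0 1 1 2]
print(counts)      # [1 2 2]

# The inverse indices rebuild the original array from the unique values.
rebuilt = uniq[inverse]
print(np.array_equal(rebuilt, arr))   # True
```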

39. Explain np.where() and its uses

Answer:
np.where(condition, x, y) selects x where condition is True and y where False.

Example:

a = np.array([1,2,3,4])
b = np.where(a>2, a*2, a+1)
print(b)  # [2 3 6 8]
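
When called with only a condition, np.where returns the indices where the condition holds, as a tuple with one index array per dimension. A minimal sketch; the 2D array is an illustrative example.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
idx = np.where(a > 2)      # tuple containing one index array
print(idx[0])              # [2 3]

m = np.array([[1, 5], [7, 2]])
rows, cols = np.where(m > 4)
print(rows, cols)          # [0 1] [1 0]
print(m[rows, cols])       # [5 7]
```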

40. Explain advanced indexing with multiple arrays

Answer:
Fancy indexing allows selection using multiple integer arrays.

Example:

a = np.arange(9).reshape(3,3)
row = np.array([0,1])
col = np.array([1,2])
print(a[row, col])  # [1 5]

41. How to efficiently compute pairwise distances?

Answer:
Use broadcasting instead of loops:

points = np.array([[0,0],[1,1],[2,2]])
diff = points[:,np.newaxis,:] - points[np.newaxis,:,:]
dist = np.sqrt(np.sum(diff**2, axis=-1))
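
Tracking the intermediate shapes makes the broadcasting step clearer. The sketch below repeats the computation with named intermediates (left and right are illustrative names) and checks the result.

```python
import numpy as np

points = np.array([[0, 0], [1, 1], [2, 2]])

left = points[:, np.newaxis, :]     # shape (3, 1, 2)
right = points[np.newaxis, :, :]    # shape (1, 3, 2)
diff = left - right                 # broadcast to shape (3, 3, 2)
dist = np.sqrt(np.sum(diff**2, axis=-1))   # shape (3, 3)

print(diff.shape, dist.shape)                    # (3, 3, 2) (3, 3)
print(np.allclose(dist, dist.T))                 # True
print(np.allclose(dist[0, 2], 2 * np.sqrt(2)))   # True
```

The intermediate diff array holds n * n * d values, so for a large number of points this approach trades memory for speed.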

42. Explain np.add.reduce and np.multiply.reduce

Answer:
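These are the reduce methods of the np.add and np.multiply ufuncs. They apply the operation repeatedly along an axis (axis 0 by default), which sums or multiplies the elements along that axis: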

a = np.array([1,2,3])
print(np.add.reduce(a))  # 6
print(np.multiply.reduce(a))  # 6

43. Explain np.fromfunction

Answer:
Creates an array by applying a function to indices.

a = np.fromfunction(lambda i,j: i+j, (3,3), dtype=int)
print(a)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]

44. Explain np.meshgrid and its use

Answer:
Generates coordinate matrices from coordinate vectors. Used in plotting and vectorized computations.

x = np.array([1,2])
y = np.array([3,4])
X, Y = np.meshgrid(x, y)
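
The resulting grids let a function of two variables be evaluated at every (x, y) combination in one expression. A minimal sketch; the function x + 10*y is an arbitrary example.

```python
import numpy as np

x = np.array([1, 2])
y = np.array([3, 4])
X, Y = np.meshgrid(x, y)

print(X)
# [[1 2]
#  [1 2]]
print(Y)
# [[3 3]
#  [4 4]]

# Evaluate f(x, y) = x + 10*y on the whole grid without loops.
Z = X + 10 * Y
print(Z)
# [[31 32]
#  [41 42]]
```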

45. Explain structured and record arrays

Answer:
Structured arrays allow heterogeneous data like a table.
Record arrays allow attribute access with arr.fieldname.
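
Example:

A short sketch of both access styles. The field names and values are made up for illustration.

```python
import numpy as np

people = np.array(
    [(1, 'Alice', 30.5), (2, 'Bob', 25.0)],
    dtype=[('id', 'i4'), ('name', 'U10'), ('score', 'f8')],
)

# Structured array: fields are accessed by key.
print(people['name'])        # ['Alice' 'Bob']

# Record array: the same data, with fields also available as attributes.
rec = people.view(np.recarray)
print(rec.name)              # ['Alice' 'Bob']
print(rec.score.mean())      # 27.75
```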


46. How do you handle missing values (NaN) in large arrays?

  • Use np.isnan() to detect

  • Use np.nanmean(), np.nansum(), etc. to ignore NaNs

  • Replace NaNs with np.nan_to_num()
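
Example:

The three approaches side by side, as a minimal sketch. The nan= keyword of np.nan_to_num assumes NumPy 1.17 or later.

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0, np.nan])

mask = np.isnan(data)
print(mask)                  # [False  True False  True]

print(np.nanmean(data))      # 2.0
print(np.nansum(data))       # 4.0

filled = np.nan_to_num(data, nan=0.0)
print(filled)                # [1. 0. 3. 0.]
```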


47. Explain differences between np.loadtxt and np.genfromtxt

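Feature          | loadtxt                                   | genfromtxt
Missing values   | No built-in handling; raises an error     | Handles them (missing_values, filling_values)
Data type        | One dtype for the whole array by default  | Can infer a dtype per column (dtype=None)
Speed            | Faster                                    | Slower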
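
Example:

The difference in missing-value handling can be seen with a small in-memory CSV. The sketch below uses io.StringIO in place of a real file, and the data is made up.

```python
import io

import numpy as np

text = "1,2,3\n4,,6\n"   # second row has an empty field

# genfromtxt fills the missing entry with NaN by default.
filled = np.genfromtxt(io.StringIO(text), delimiter=",")
print(filled)
# [[ 1.  2.  3.]
#  [ 4. nan  6.]]

# A replacement value can be supplied instead.
with_default = np.genfromtxt(io.StringIO(text), delimiter=",", filling_values=0)

# loadtxt cannot convert the empty field and raises an error.
try:
    np.loadtxt(io.StringIO(text), delimiter=",")
    loadtxt_failed = False
except ValueError:
    loadtxt_failed = True
print(loadtxt_failed)   # True
```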

48. Explain difference between np.hstack, np.vstack, np.concatenate

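  • hstack → stacks horizontally (along axis 1 for 2D arrays, axis 0 for 1D arrays)

  • vstack → stacks vertically (along axis 0; 1D arrays are treated as rows)

  • concatenate → joins along any existing axis chosen with the axis parameter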
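
Example:

A minimal sketch showing that, for 2D inputs, hstack and vstack match concatenate with axis=1 and axis=0.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

h = np.hstack([a, b])
v = np.vstack([a, b])
print(h.shape, v.shape)   # (2, 4) (4, 2)

print(np.array_equal(h, np.concatenate([a, b], axis=1)))   # True
print(np.array_equal(v, np.concatenate([a, b], axis=0)))   # True
```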


49. How to implement matrix masking efficiently?

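Answer:
Build a boolean mask with a comparison and assign through it. The update happens in place, without a Python loop: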

A = np.array([[1,2],[3,4]])
mask = A > 2
A[mask] = 0

50. Explain np.dot vs np.tensordot

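  • np.dot → inner product for 1D arrays and matrix product for 2D arrays; for higher dimensions it sums over the last axis of the first array and the second-to-last axis of the second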

  • np.tensordot → generalized contraction along axes

a = np.array([[1,2],[3,4]])
b = np.array([[5,6],[7,8]])
print(np.tensordot(a, b, axes=1))  # same result as np.dot(a, b)
# [[19 22]
#  [43 50]]
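
The axes argument controls which axes are contracted. A short sketch of the most common settings, using the same two matrices; the variable names are illustrative.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# axes=1: contract the last axis of a with the first axis of b.
matmul = np.tensordot(a, b, axes=1)
print(np.array_equal(matmul, np.dot(a, b)))   # True

# axes=0: no contraction, which gives the outer product.
outer = np.tensordot(a, b, axes=0)
print(outer.shape)                            # (2, 2, 2, 2)

# axes=2: contract both axes, which gives the sum of element-wise products.
full = np.tensordot(a, b, axes=2)
print(int(full))                              # 70

# Explicit axis lists: contract axis 0 of a with axis 0 of b.
custom = np.tensordot(a, b, axes=([0], [0]))
print(np.array_equal(custom, a.T @ b))        # True
```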