Top Interview Questions
Natural Language Processing (NLP) is a crucial subfield of artificial intelligence (AI) that focuses on the interaction between computers and humans through natural language. The ultimate goal of NLP is to enable computers to understand, interpret, and generate human language in a way that is both meaningful and useful. It combines elements from linguistics, computer science, and machine learning to process and analyze large amounts of natural language data. NLP has become increasingly important in today’s digital world, driving innovations in search engines, virtual assistants, sentiment analysis, machine translation, and more.
NLP involves several critical components that work together to process language effectively:
Syntax: Syntax refers to the grammatical structure of a sentence. NLP systems analyze syntax to understand the relationships between words, phrases, and clauses. Techniques such as parsing (breaking down a sentence into its components) help in identifying parts of speech like nouns, verbs, and adjectives.
Semantics: Semantics focuses on the meaning of words and sentences. It deals with understanding context, disambiguating word meanings, and interpreting the intended message. For instance, the word “bank” could refer to a financial institution or the side of a river. Semantic analysis helps resolve such ambiguities.
Pragmatics: Pragmatics deals with understanding language in context. It interprets meaning based on social norms, intentions, and the situational context. For example, the phrase “Can you pass the salt?” is usually a request, not a literal question about ability.
Morphology: Morphology studies the structure of words and their meaningful components, called morphemes. NLP systems analyze how words are formed and modified (e.g., “running” is derived from the root word “run”).
Discourse: Discourse analysis examines the structure and meaning of longer texts beyond individual sentences. It helps NLP systems understand the flow of conversation, coherence, and overall intent.
Phonology and Phonetics: Though more relevant to speech recognition, these aspects deal with the sounds of language and their patterns. Understanding phonetics is crucial for converting spoken language into text accurately.
NLP has a wide range of applications across industries. Here are some prominent use cases:
Machine Translation: NLP powers translation services like Google Translate. By analyzing the grammar and semantics of the source language, the system can generate accurate translations in the target language. Advanced models use neural networks and transformers to improve translation quality.
Sentiment Analysis: Businesses use NLP to analyze customer feedback, reviews, and social media posts to understand public sentiment. By detecting positive, negative, or neutral sentiments, organizations can make data-driven decisions for marketing, product development, and customer service.
Chatbots and Virtual Assistants: Virtual assistants like Siri, Alexa, and Google Assistant rely on NLP to understand spoken commands and provide meaningful responses. Chatbots use NLP for customer service, enabling automated interaction with users in natural language.
Information Retrieval: Search engines like Google utilize NLP to understand queries and retrieve relevant documents. Techniques like keyword extraction, query expansion, and semantic search improve the accuracy of search results.
Text Summarization: NLP helps summarize large texts into concise versions while retaining the essential information. Automatic summarization is valuable for news aggregation, research papers, and content curation.
Speech Recognition and Generation: NLP plays a vital role in converting speech to text and vice versa. This technology is used in transcription services, voice-controlled devices, and accessibility tools for individuals with disabilities.
Spam Detection and Filtering: Email services employ NLP techniques to detect spam, phishing attempts, and malicious content by analyzing text patterns and language characteristics.
Named Entity Recognition (NER): NER identifies entities such as names of people, locations, dates, and organizations within text. It is crucial in applications like information extraction, knowledge graphs, and question-answering systems.
NLP has evolved from rule-based approaches to more advanced statistical and deep learning methods:
Rule-Based NLP: Early NLP systems relied on handcrafted rules and linguistic expertise. These systems used dictionaries, grammar rules, and patterns to analyze text. While accurate in controlled domains, rule-based methods struggled with ambiguity and large-scale text.
Statistical NLP: With the rise of machine learning, statistical methods emerged. Algorithms like Hidden Markov Models (HMM), Naive Bayes classifiers, and Conditional Random Fields (CRF) were used to model language probabilistically. These approaches leveraged large corpora to learn patterns and improve performance.
Vector Space Models and Word Embeddings: Techniques like Word2Vec, GloVe, and FastText represented words as vectors in continuous space, capturing semantic similarities. For example, the vectors for “king” and “queen” are closely related in embedding space. Word embeddings revolutionized NLP by enabling machines to understand word relationships.
Deep Learning and Neural Networks: Neural networks, especially recurrent neural networks (RNNs) and long short-term memory (LSTM) networks, improved sequence modeling in NLP tasks such as translation, summarization, and text generation. Transformer architectures, introduced in the “Attention Is All You Need” paper and later adopted by models such as BERT, GPT, and T5, further enhanced NLP by capturing long-range dependencies and context effectively.
Transfer Learning in NLP: Pretrained language models, such as GPT, BERT, and RoBERTa, have transformed NLP by enabling fine-tuning on specific tasks with limited data. These models learn contextual representations from massive text corpora and achieve state-of-the-art results across various applications.
Attention Mechanism: Attention mechanisms allow models to focus on relevant parts of the input text when generating output. This is especially important in translation, question answering, and summarization tasks, enabling models to capture context more effectively.
Despite significant advancements, NLP faces several challenges:
Ambiguity: Words and sentences often have multiple meanings. Resolving ambiguity requires understanding context, which is not always straightforward for machines.
Context Understanding: Humans rely on shared knowledge and experience to interpret language. NLP models may struggle with sarcasm, idioms, and figurative language.
Multilingual Processing: Processing multiple languages with different syntax, grammar, and semantics remains a challenge. Low-resource languages often lack sufficient training data for NLP models.
Data Quality: NLP models rely heavily on high-quality text corpora. Poor-quality data, noise, and biases in training data can affect performance and fairness.
Real-Time Processing: Applications like chatbots and voice assistants require real-time language understanding and generation, demanding highly efficient NLP models.
The future of NLP is promising, with advancements in AI making human-computer interaction more natural. Some anticipated trends include:
Improved Conversational AI: NLP will continue to enhance chatbots, virtual assistants, and customer support systems, making interactions more human-like.
Multimodal NLP: Integrating NLP with computer vision and audio processing will allow systems to understand language in conjunction with images, videos, and sounds.
Low-Resource Language Support: NLP research is expanding to support underrepresented languages, making AI tools more inclusive globally.
Explainable NLP: As AI models become more complex, explainability will be crucial for understanding model decisions, especially in critical domains like healthcare and law.
Ethical NLP: Addressing biases and ensuring fairness in NLP applications will be a priority, ensuring AI systems serve society responsibly.
Answer:
Natural Language Processing (NLP) is a branch of artificial intelligence that enables computers to understand, interpret, and generate human language. It bridges the gap between human communication and computer understanding.
Example: Chatbots, Google Translate, sentiment analysis, and virtual assistants like Siri or Alexa use NLP.
Answer:
NLP tasks can be broadly categorized as:
Text Analysis: Tokenization, POS tagging, parsing.
Text Classification: Spam detection, sentiment analysis.
Named Entity Recognition (NER): Identifying names, dates, locations.
Machine Translation: Translating text from one language to another.
Question Answering & Chatbots: Answering user queries automatically.
Summarization: Generating summaries from large texts.
Speech Recognition & Generation: Converting speech to text and vice versa.
Answer:
Ambiguity: Words or sentences can have multiple meanings.
Context Understanding: Understanding the meaning of words in different contexts.
Idioms and Slang: Hard to interpret phrases and informal language.
Multilingual Support: Handling multiple languages is complex.
Sarcasm Detection: Hard for machines to detect tone or sarcasm.
Answer:
Tokenization is the process of breaking down text into smaller units called tokens (words, sentences, or subwords).
Word Tokenization: Splits text into individual words.
Example: "I love NLP" → ["I", "love", "NLP"]
Sentence Tokenization: Splits text into sentences.
Example: "I love NLP. It is fun." → ["I love NLP.", "It is fun."]
Use: Tokenization is the first step in almost all NLP tasks.
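A minimal tokenization sketch with NLTK (assuming the `punkt` tokenizer data, called `punkt_tab` in newer NLTK releases, is available):

```python
import nltk
nltk.download("punkt", quiet=True)  # tokenizer data; package name varies slightly by NLTK version
from nltk.tokenize import word_tokenize, sent_tokenize

text = "I love NLP. It is fun."
print(word_tokenize(text))   # ['I', 'love', 'NLP', '.', 'It', 'is', 'fun', '.']
print(sent_tokenize(text))   # ['I love NLP.', 'It is fun.']
```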
Answer:
Stop words are common words in a language that do not add significant meaning to text, such as “is”, “the”, “and”.
Example:
Original: "I love natural language processing"
After removing stop words: "love natural language processing"
Use: Reduces the size of data and focuses on important words.
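A small sketch using NLTK's English stop-word list (assuming the `stopwords` corpus has been downloaded):

```python
import nltk
nltk.download("stopwords", quiet=True)
from nltk.corpus import stopwords

stop_words = set(stopwords.words("english"))
tokens = "I love natural language processing".lower().split()
print([t for t in tokens if t not in stop_words])
# ['love', 'natural', 'language', 'processing']
```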
Answer:
Stemming: Reduces a word to its root form by chopping off prefixes/suffixes.
Example: "running", "runner" → "run"
Tools: Porter Stemmer, Snowball Stemmer
Lemmatization: Reduces a word to its dictionary form using vocabulary and POS tagging.
Example: "better" → "good", "running" → "run"
Difference: Lemmatization is more accurate; stemming is faster but less precise.
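A quick comparison using NLTK's Porter stemmer and WordNet lemmatizer (the `wordnet` corpus is assumed to be downloaded); note the lemmatizer needs the POS hint to map "better" to "good":

```python
import nltk
nltk.download("wordnet", quiet=True)
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer, lemmatizer = PorterStemmer(), WordNetLemmatizer()
print(stemmer.stem("running"))                    # 'run'
print(stemmer.stem("better"))                     # 'better' – stemming cannot reach 'good'
print(lemmatizer.lemmatize("running", pos="v"))   # 'run'
print(lemmatizer.lemmatize("better", pos="a"))    # 'good'
```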
Answer:
POS (Part-of-Speech) tagging assigns grammatical categories (noun, verb, adjective, etc.) to each word in a sentence.
Example:
Sentence: "I love NLP"
POS tags: [('I', 'PRP'), ('love', 'VBP'), ('NLP', 'NNP')]
Use: Useful in syntax parsing, sentiment analysis, and information extraction.
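A minimal POS-tagging sketch with NLTK (the tagger data package name varies slightly across NLTK versions):

```python
import nltk
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)
from nltk import word_tokenize, pos_tag

print(pos_tag(word_tokenize("I love NLP")))
# [('I', 'PRP'), ('love', 'VBP'), ('NLP', 'NNP')]
```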
Answer:
NER identifies and classifies entities in text into predefined categories like Person, Organization, Location, Date, Time, etc.
Example:
Sentence: "Barack Obama was born in Hawaii in 1961"
NER Result: {"Barack Obama": "Person", "Hawaii": "Location", "1961": "Date"}
Use: Key in information extraction, chatbots, and search engines.
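A short NER sketch with spaCy (assumes the `en_core_web_sm` model has been installed via `python -m spacy download en_core_web_sm`); the exact labels depend on the model:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Barack Obama was born in Hawaii in 1961")
print([(ent.text, ent.label_) for ent in doc.ents])
# e.g. [('Barack Obama', 'PERSON'), ('Hawaii', 'GPE'), ('1961', 'DATE')]
```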
Answer:
BoW is a feature extraction technique in NLP that represents text as a collection of words, ignoring grammar and order, but keeping frequency.
Example:
Text 1: "I love NLP"
Text 2: "I love AI"
Vocabulary: ["I", "love", "NLP", "AI"]
BoW vectors:
Text1 → [1, 1, 1, 0]
Text2 → [1, 1, 0, 1]
Use: Text classification, spam detection.
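A Bag-of-Words sketch with scikit-learn's `CountVectorizer`; the `token_pattern` is relaxed here so single-character tokens like "I" are kept:

```python
from sklearn.feature_extraction.text import CountVectorizer

texts = ["I love NLP", "I love AI"]
vectorizer = CountVectorizer(lowercase=False, token_pattern=r"\b\w+\b")
bow = vectorizer.fit_transform(texts)
print(vectorizer.get_feature_names_out())  # vocabulary (alphabetical order)
print(bow.toarray())                       # one count vector per text
```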
Answer:
TF-IDF (Term Frequency-Inverse Document Frequency) weighs words based on their importance in a document relative to a corpus.
TF: Frequency of a term in a document.
IDF: Reduces weight of common words across documents.
Use: Feature extraction for text mining, search engines, and recommendations.
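A TF-IDF sketch with scikit-learn; weights are higher for terms that are frequent in one document but rare across the corpus:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["I love NLP", "I love AI", "NLP powers search engines"]
tfidf = TfidfVectorizer()
matrix = tfidf.fit_transform(docs)
print(tfidf.get_feature_names_out())
print(matrix.toarray().round(2))
```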
Answer:
Word embeddings are dense vector representations of words capturing semantic meaning. Unlike BoW, embeddings consider context and similarity.
Popular Models:
Word2Vec: Continuous Bag of Words (CBOW) and Skip-Gram
GloVe: Global Vectors for Word Representation
FastText: Handles subword information
Example:
vector("king") - vector("man") + vector("woman") ≈ vector("queen")
Answer:
Sentiment analysis determines the emotional tone of text: positive, negative, or neutral.
Example:
"I love this product!" → Positive
"This is the worst experience." → Negative
Use: Product reviews, social media monitoring, customer feedback.
Answer:
N-grams are contiguous sequences of n items (words/characters) in text.
Unigram: 1 word → "I love NLP" → ["I", "love", "NLP"]
Bigram: 2 words → "I love NLP" → [("I", "love"), ("love", "NLP")]
Trigram: 3 words → [("I", "love", "NLP")]
Use: Language modeling, predictive text, spelling correction.
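A quick n-gram sketch with NLTK's `ngrams` helper:

```python
from nltk import ngrams

tokens = "I love NLP".split()
print(list(ngrams(tokens, 1)))  # [('I',), ('love',), ('NLP',)]
print(list(ngrams(tokens, 2)))  # [('I', 'love'), ('love', 'NLP')]
print(list(ngrams(tokens, 3)))  # [('I', 'love', 'NLP')]
```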
Answer:
WSD determines the correct meaning of a word in context.
Example:
"I went to the bank to deposit money" → bank = financial institution
"The fisherman sat on the bank of the river" → bank = river edge
Use: Machine translation, information retrieval.
Answer:
Seq2Seq models transform a sequence in one domain to another. Widely used in machine translation, chatbots, summarization.
Architecture:
Encoder: Converts input to a fixed-length vector
Decoder: Generates output from the vector
Common building blocks: LSTM-, GRU-, and Transformer-based encoder-decoders.
Answer:
Attention allows models to focus on important words in a sequence rather than treating all equally.
Example: In translation:
Input: "I love NLP"
Output: "J'aime le NLP"
Attention helps the model know which input words to emphasize when generating each output word.
Use: Transformers, BERT, GPT, machine translation.
Answer:
BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained model that understands context from both left and right of a word in a sentence.
Use: Question answering, NER, sentiment analysis.
| NLP | NLU |
|---|---|
| Deals with processing human language. | Focuses on understanding meaning. |
| Includes tasks like tokenization, POS tagging, and parsing. | Includes tasks like intent recognition and entity extraction. |
| Example: Text preprocessing | Example: Chatbot understanding queries |
Python Libraries: NLTK, SpaCy, TextBlob, Gensim
Deep Learning Frameworks: TensorFlow, PyTorch, Hugging Face Transformers
Other Tools: Stanford NLP, OpenNLP
Answer:
Use subword embeddings (FastText)
Use character-level embeddings
Apply unknown token <UNK> for rare words
Use contextual embeddings like BERT that can handle unseen words better
Answer:
Language Modeling predicts the probability of a sequence of words. It helps machines understand the structure of language.
Example:
Input: "I love"
Predicted next word: "NLP"
Use: Text prediction, autocomplete, speech recognition.
Types of Language Models:
Statistical Language Models: N-grams
Neural Language Models: RNN, LSTM, Transformer
Answer:
Cosine similarity measures the similarity between two vectors by calculating the cosine of the angle between them.
Formula:
\[
\text{Cosine Similarity} = \frac{A \cdot B}{\|A\| \times \|B\|}
\]
Use: Document similarity, clustering, information retrieval.
Example:
"I love NLP" and "I enjoy NLP" → High similarity
"I love NLP" and "The sky is blue" → Low similarity
Answer:
Topic modeling is an unsupervised technique that identifies hidden topics in a collection of documents.
Popular Methods:
LDA (Latent Dirichlet Allocation)
NMF (Non-negative Matrix Factorization)
Use: Content categorization, recommendation systems, trend analysis.
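An LDA sketch with scikit-learn on a toy corpus (real topic modeling needs far more documents):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["cats and dogs are pets", "dogs chase cats",
        "stocks and bonds are investments", "investors buy stocks"]
counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
print(lda.transform(counts).round(2))  # per-document topic distribution
```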
Answer:
| Feature | Stemming | Lemmatization |
|---|---|---|
| Output | Root form | Dictionary form |
| Accuracy | Less accurate | More accurate |
| Example | "running" → "run" |
"better" → "good" |
| Library | NLTK | SpaCy, NLTK |
Answer:
Dependency parsing analyzes grammatical structure and establishes relationships between “head” words and dependent words.
Example:
Sentence: "She loves NLP"
loves → root
She → subject
NLP → object
Use: Question answering, information extraction, machine translation.
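A dependency-parsing sketch with spaCy (again assuming `en_core_web_sm` is installed):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
for token in nlp("She loves NLP"):
    print(token.text, token.dep_, "→", token.head.text)
# e.g. She nsubj → loves | loves ROOT → loves | NLP dobj → loves
```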
Answer:
Chunking groups words into meaningful phrases (noun phrases, verb phrases) using POS tags.
Example:
Sentence: "The quick brown fox"
Chunked → [The quick brown fox] → Noun Phrase (NP)
Use: Information extraction, named entity recognition.
Answer:
Regex is a tool to match and manipulate text patterns. Used in text cleaning, tokenization, and extraction.
Example:
Extract email: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-z]{2,}/
Find digits: \d+
Use: Data preprocessing, pattern-based searches.
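A Python `re` sketch using the patterns above:

```python
import re

text = "Contact us at support@example.com or call 12345."
print(re.findall(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-z]{2,}", text))  # ['support@example.com']
print(re.findall(r"\d+", text))                                          # ['12345']
```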
Answer:
Embeddings are dense vector representations that capture semantic meaning.
Types:
Word2Vec – CBOW, Skip-gram
GloVe – Co-occurrence statistics
FastText – Subword-level embeddings
Contextual embeddings – BERT, GPT, RoBERTa
Use: Sentiment analysis, recommendation, text classification.
Answer:
Transformers use attention mechanisms instead of sequential RNNs to process text. They allow parallelization and capture long-range dependencies.
Key Components:
Encoder: Processes input sequence
Decoder: Generates output sequence
Self-Attention: Captures relationships between words
Popular Models: BERT, GPT, T5
| Feature | BERT | GPT |
|---|---|---|
| Training | Masked language modeling | Causal language modeling |
| Direction | Bidirectional | Left-to-right |
| Use case | NLU tasks | NLG tasks (text generation) |
| Example | Sentiment analysis, QA | Chatbots, story generation |
Answer:
Use <UNK> token for unknown words
Use subword embeddings (FastText)
Use character-level embeddings
Use contextual embeddings (BERT, GPT)
Answer:
Text classification is assigning predefined categories to text.
Example:
Spam detection: ["This is spam", "Hello friend"]
Sentiment: ["I love this", "I hate this"]
Techniques:
Naive Bayes, SVM, Logistic Regression, Deep Learning (CNN, LSTM, Transformers)
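A minimal classification sketch (Naive Bayes on TF-IDF features); the four labeled sentences are toy data for illustration only:

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["I love this", "Great product", "I hate this", "Terrible experience"]
labels = ["positive", "positive", "negative", "negative"]

clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(texts, labels)
print(clf.predict(["I really love it"]))  # expected: ['positive']
```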
Answer:
A chatbot is a system that interacts with users using natural language.
Types:
Rule-Based Chatbots: Predefined responses
AI-Based Chatbots: Use NLP & ML for understanding context
Example: Siri, Google Assistant
Answer:
Text summarization condenses a long document into a short summary while retaining meaning.
Types:
Extractive: Picks key sentences
Abstractive: Generates new sentences
Use: News aggregation, reports, emails
Answer:
An N-gram model predicts the next word based on the previous n-1 words.
Example:
Unigram: P(word)
Bigram: P(word_n | word_n-1)
Trigram: P(word_n | word_n-2, word_n-1)
Use: Speech recognition, autocomplete, spelling correction
Answer:
| Metric | Cosine Similarity | Euclidean Distance |
|---|---|---|
| Measures | Angle between vectors | Straight-line distance |
| Range | -1 to 1 | 0 to ∞ |
| Use | Semantic similarity | General distance measure |
| Better for | Text data | Numeric embeddings |
Answer:
TF: Number of times a term appears in a document.
IDF: Measures importance across corpus; rare words get higher weight.
TF-IDF formula:
\[
\text{TF-IDF} = \text{TF} \times \log\left(\frac{N}{\text{DF}}\right)
\]
Use: Feature extraction for text classification, search engines
| NLP | Text Mining |
|---|---|
| Focuses on understanding & generating human language | Focuses on extracting useful information from text |
| Uses linguistics & ML techniques | Uses NLP + data mining |
| Example: Sentiment analysis | Example: Trend analysis from articles |
Answer:
WSD determines the correct meaning of a word in context.
Example:
"Bank" → financial institution or river bank?
Use: Machine translation, QA, semantic search
| Term | Description |
|---|---|
| AI | Broad field of simulating human intelligence |
| ML | Subset of AI; systems learn from data |
| NLP | Subset of AI; systems understand/generate human language |
Answer:
Rule-Based NLP: Uses handcrafted linguistic rules. Limited scalability.
Example: Regex-based entity extraction.
Statistical NLP: Uses probabilistic models and frequency-based methods.
Example: N-gram models, HMM for POS tagging.
Neural NLP: Uses deep learning to model complex patterns.
Example: LSTM, GRU, Transformers.
Experience Tip: In real projects, hybrid approaches often perform best.
Answer:
Resampling Techniques: Oversample minority or undersample majority.
Weighted Loss Functions: Apply class weights in loss calculation.
Data Augmentation: Back-translation, synonym replacement.
Focal Loss: Focuses training on hard examples.
Example: Sentiment analysis with 90% neutral reviews and 10% positive/negative reviews.
Answer:
Non-Contextual (Word2Vec, GloVe): A word has a single vector regardless of context.
Example: "bank" has same vector in “river bank” and “financial bank”.
Contextual (BERT, GPT, RoBERTa): Word embeddings depend on surrounding words.
Example: "bank" vectors differ based on sentence context.
Use: Contextual embeddings significantly improve NER, QA, and sentiment analysis.
Answer:
Bidirectional: BERT reads the entire sentence simultaneously, capturing context from both sides.
Transformer Architecture: Uses self-attention instead of sequential processing.
Pretraining & Fine-tuning: Pretrained on large corpora, then fine-tuned for tasks.
Performance: Handles long-range dependencies better than RNNs/LSTMs.
Use Case: Question answering, classification, named entity recognition.
Answer:
Load pretrained model (e.g., BERT, RoBERTa).
Add task-specific layers (classification head, token classifier).
Prepare domain-specific dataset and tokenize.
Fine-tune using small learning rate (2e-5 to 5e-5 typical).
Monitor metrics (F1-score, accuracy).
Example: Financial document classification using FinBERT (a finance-domain BERT variant); a generic fine-tuning sketch follows below.
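A fine-tuning sketch with Hugging Face `transformers` and `datasets`; the tiny hard-coded dataset is a stand-in for a real domain corpus, and the hyperparameters are only illustrative:

```python
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Toy labeled data standing in for a real domain-specific dataset.
ds = Dataset.from_dict({"text": ["great service", "awful service"] * 8,
                        "label": [1, 0] * 8})
ds = ds.map(lambda b: tokenizer(b["text"], truncation=True, padding="max_length",
                                max_length=32), batched=True)

args = TrainingArguments(output_dir="out", learning_rate=2e-5,
                         num_train_epochs=1, per_device_train_batch_size=8)
Trainer(model=model, args=args, train_dataset=ds, eval_dataset=ds).train()
```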
Answer:
Self-Attention: Computes attention scores between all words in a sentence.
Multi-Head Attention: Captures multiple aspects of relationships simultaneously.
Positional Encoding: Adds sequence order information.
Feed-Forward Networks: Process attention outputs.
Encoder-Decoder Architecture:
Encoder: Processes input sequence.
Decoder: Generates output sequence.
Use: Translation, summarization, question answering.
Answer:
Task-specific metrics:
Text Classification: Accuracy, F1-score, Precision, Recall
NER: Precision, Recall, F1-score
Language Generation: BLEU, ROUGE, METEOR
Similarity: Cosine similarity, Spearman correlation
Experience Tip: Always use multiple metrics, especially for imbalanced datasets.
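A small scikit-learn sketch computing several classification metrics on toy predictions:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]
precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
print(accuracy_score(y_true, y_pred), precision, recall, f1)
```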
Answer:
Attention: Allows model to focus on important words for generating outputs.
Types:
Bahdanau (Additive) Attention: Scores calculated using neural networks.
Luong (Multiplicative) Attention: Scores computed via dot product.
Self-Attention: Words attend to all other words in the same sequence.
Use: Transformers, summarization, QA tasks.
Answer:
Subword tokenization (BPE, WordPiece).
Character-level embeddings.
Using <UNK> token with fallback logic.
Leveraging contextual models (BERT, GPT) which can handle rare words.
Example: "biodegradability" might not exist in training corpus; subword tokenization splits it intelligently.
Answer:
Encoder: Converts input sequence to a fixed-length vector.
Decoder: Generates output sequence from the vector.
Attention: Allows decoder to reference encoder outputs dynamically.
Use Case: Neural Machine Translation, text summarization.
Answer:
Export model (TorchScript, ONNX, TensorFlow SavedModel).
Use API frameworks (FastAPI, Flask).
Containerize (Docker/Kubernetes).
Monitor latency, throughput, and model drift.
Implement caching for frequent requests.
Example: Chatbot API serving millions of users with low latency.
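A minimal serving sketch with FastAPI; the default sentiment pipeline stands in for whatever exported model is actually deployed:

```python
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
classifier = pipeline("sentiment-analysis")  # loads a small default model

class Query(BaseModel):
    text: str

@app.post("/predict")
def predict(query: Query):
    return classifier(query.text)[0]  # e.g. {'label': 'POSITIVE', 'score': 0.99}

# Run with: uvicorn app:app --host 0.0.0.0 --port 8000
```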
Answer:
Ambiguity (e.g., "Apple" → company or fruit)
Domain-specific entities (e.g., chemicals, drugs)
Nested entities (entities inside other entities)
Data scarcity for supervised learning
Solution: Transfer learning, weak supervision, or active learning.
Answer:
Use multilingual pretrained models (mBERT, XLM-RoBERTa).
Translate text into a common language.
Train separate models per language (resource-intensive).
Tokenization that handles language-specific morphology.
Answer:
BLEU: Measures n-gram overlap with reference texts.
ROUGE: Measures recall of n-grams or sequences.
METEOR: Considers synonym matches.
Perplexity: Measures uncertainty of model predictions.
Experience Tip: Combine automatic metrics with human evaluation.
Answer:
Quantization: Reduce model size (float32 → int8).
Distillation: Train smaller model to mimic large model.
Pruning: Remove redundant weights.
Batching requests & caching: Reduce inference latency.
Hardware acceleration: Use GPUs/TPUs.
Example: Deploying BERT for real-time chatbots requires optimization.
Answer:
Fine-tune pretrained models on domain-specific data.
Use domain-specific embeddings (BioBERT, FinBERT).
Use data augmentation and transfer learning.
Monitor performance with domain-specific evaluation metrics.
Example: Healthcare NER tasks using BioBERT.
Answer:
Input text is tokenized (WordPiece/BPE).
The [CLS] token embedding represents the whole sequence.
Pass embedding through feed-forward layers for classification.
Fine-tune on task-specific dataset.
Example: Sentiment analysis, spam detection, intent recognition.
Answer:
Truncation: Limit sequence length.
Sliding windows: Break text into overlapping chunks.
Longformer / BigBird: Use sparse attention for long sequences.
Hierarchical models: Encode paragraphs separately, then aggregate.
Answer:
Align embeddings across languages to map semantically similar words close in vector space.
Techniques: MUSE, LASER, multilingual BERT.
Use Case: Cross-lingual retrieval, translation.
Answer Examples:
Model drift due to changing user behavior in chatbots.
Ambiguity in entity recognition in finance domain.
Latency issues deploying transformer-based models for real-time queries.
Data scarcity for low-resource languages.
Handling informal/slang text from social media.
Answer:
| Model | Key Features | Use Case |
|---|---|---|
| BERT | Bidirectional, Masked LM pretraining | NER, QA, classification |
| RoBERTa | Improved BERT: more data, longer training, dynamic masking | Same as BERT with better performance |
| ALBERT | Parameter reduction via factorized embedding and cross-layer sharing | Memory-efficient for large datasets |
Experience Tip: Use ALBERT or DistilBERT for production where latency is critical.
Answer:
Choose a pretrained model (BERT, SpaCy, Flair).
Fine-tune on domain-specific dataset.
Convert model to optimized format (ONNX/TorchScript).
Deploy as API with batching and caching.
Monitor performance, handle unknown entities with dictionary-based fallback.
Example: Healthcare NER: extracting drug names, symptoms, diseases.
Answer:
Static embeddings: Word has one vector representation. Example: Word2Vec, GloVe.
Contextual embeddings: Word vector changes depending on surrounding context. Example: BERT, ELMo, GPT.
Scenario: “Apple” in “Apple released a new iPhone” vs “I ate an apple.” Contextual embeddings distinguish meaning.
Answer:
Resampling: Oversample minority or undersample majority.
Weighted loss functions: Apply class weights during training.
Data augmentation: Synonym replacement, back translation.
Advanced Loss Functions: Focal loss to focus on difficult examples.
Experience Tip: Evaluate using F1-score instead of accuracy for imbalanced datasets.
Answer:
Long-range dependencies: RNNs/LSTMs struggle with very long sequences.
Exposure bias: During training, model sees ground truth but during inference, it predicts its own output.
OOV words: Model cannot generate unseen words unless using subword techniques.
Attention complexity: Memory-intensive for long sequences.
Solution: Use Transformers for long-range dependencies, subword tokenization for OOV words, and techniques such as scheduled sampling (rather than pure teacher forcing) to reduce exposure bias.
Answer:
Feature-based: Use pretrained embeddings as fixed features for downstream tasks.
Fine-tuning: Update pretrained model weights along with task-specific layers.
Example:
Feature-based: Word2Vec embeddings for sentiment analysis fed into an LSTM.
Fine-tuning: BERT weights updated on a sentiment dataset for better task adaptation.
Answer:
MLM predicts randomly masked words in a sentence using surrounding context.
Example:
Input: "I love [MASK] processing"
Model predicts: "natural language"
Use: Pretraining models like BERT to understand context bidirectionally.
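A masked-language-modeling sketch with the Hugging Face `fill-mask` pipeline (BERT predicts a single masked token, so each completion is one word such as "language"):

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("I love [MASK] processing.")[:3]:
    print(pred["token_str"], round(pred["score"], 3))
```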
Answer:
Allows the decoder to focus on relevant parts of input sequence at each step.
Reduces information bottleneck caused by encoding long sequences into a single vector.
Improves translation accuracy, summarization, and QA performance.
Example: When translating “I am learning NLP” into French, attention lets the decoder focus on the relevant source words at each step instead of relying on a single summary vector of the whole sentence.
| Type | Method | Example |
|---|---|---|
| Extractive | Selects key sentences from text | News summarization by picking top sentences |
| Abstractive | Generates new sentences using language model | GPT-generated summary that paraphrases content |
Use Case: Abstractive summarization is more human-like but harder to train.
Answer:
Truncation: Limit input length.
Sliding Window: Break long text into overlapping chunks.
Sparse Attention Models: Use Longformer, BigBird, Reformer for memory efficiency.
Hierarchical Models: Encode paragraphs separately, then aggregate embeddings.
Scenario: Legal documents or research papers often exceed 512 tokens.
Answer:
Automatic metrics: BLEU, ROUGE, METEOR, perplexity.
Human evaluation: Coherence, fluency, relevance.
Task-specific metrics: Question-answering accuracy, summarization compression ratio.
Tip: Combine automatic metrics with human assessment for reliable evaluation.
Answer:
Cross-lingual NLP: Process multiple languages using shared representations.
Multilingual embeddings: Map semantically similar words from different languages close in vector space.
Techniques: mBERT, XLM-RoBERTa, MUSE.
Use Case: Multilingual chatbots, cross-lingual search, machine translation.
Answer:
Quantization: Convert float32 weights to int8 to reduce memory.
Model distillation: Train smaller model to mimic large model.
Pruning: Remove redundant parameters.
Batching & caching: Reduce inference latency.
Hardware acceleration: Use GPUs/TPUs, mixed precision training.
Scenario: Deploying BERT-based chatbot with <100ms latency.
Answer:
Sarcasm is context-dependent and subtle.
Approaches:
Use context-aware embeddings (BERT, RoBERTa).
Incorporate user history, conversation context.
Fine-tune on sarcasm-labeled datasets.
Combine textual features with sentiment and emoji signals.
Example: "Oh great, another Monday!" → Negative sentiment despite positive wording.
Answer:
Zero-shot: Model predicts labels it has never seen, using natural language descriptions.
Few-shot: Model fine-tunes on a small number of labeled examples.
Example: GPT-3 can classify sentiment without explicit training using instructions (prompting).
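A zero-shot classification sketch with a Hugging Face NLI-based pipeline; no sentiment-labeled training data is used:

```python
from transformers import pipeline

zero_shot = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = zero_shot("The battery dies after an hour of use",
                   candidate_labels=["positive", "negative", "neutral"])
print(result["labels"][0])  # expected: 'negative'
```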
Answer:
Fine-tune pretrained models on domain-specific corpus (e.g., BioBERT for medical texts).
Use domain-specific embeddings.
Data augmentation to create domain-relevant examples.
Regularization to prevent overfitting on small domain datasets.
Scenario: Legal document classification using a small labeled dataset.
Answer:
Word-level tokenization: Split text into words (NLTK, SpaCy).
Subword tokenization: Handles OOV words (BPE, WordPiece).
Character-level tokenization: Useful for morphologically rich languages.
SentencePiece: Implements BPE and unigram language-model tokenization directly on raw text, making it well suited to multilingual models.
Use Case: Transformers rely on subword tokenization to manage rare words.
Answer:
Weighted cross-entropy loss or focal loss.
Oversample minority classes or undersample majority classes.
Data augmentation using synonym replacement, back translation.
Ensemble methods to improve performance on rare classes.
Example: Intent recognition in chatbots with rare intents.
Answer:
ELMo generates embeddings based on entire sentence context.
Uses bi-directional LSTM to capture forward and backward context.
Word representation changes dynamically depending on surrounding words.
Use Case: NER, coreference resolution, sentiment analysis.
Answer:
Latency issues with large Transformer models.
Data drift/model drift over time as language changes.
Handling OOV words and slang in user-generated text.
Multilingual support for global products.
Resource constraints (memory, GPU availability).
Monitoring & logging for inference quality and errors.