Question Bank - Fundamentals of Machine Learning and NLP

Unit 1: Introduction to Machine Learning

  1. A company is analyzing customer data to generate insights, build intelligent systems, and automate predictions. Explain how Data Science (data analysis), AI (intelligence), ML (learning from data), and DL (neural networks) are related in this scenario.
  2. Businesses rely on systems such as Netflix recommendations, Google Maps navigation, and fraud detection. Justify the need for ML in terms of automation, prediction, big data handling, and real-time decisions.
  3. A project involves collecting raw data, cleaning it, building a model, training, testing, and deploying it. Describe the Machine Learning life cycle (data collection → preprocessing → training → testing → deployment).
  4. A company is building a product recommendation system like Amazon's. Identify the required skills, such as Python, statistics, ML algorithms, data handling, and domain knowledge.
  5. Consider the following applications:
    • Predicting marks using past data
    • Grouping customers without labels
    • Training a robot using rewards
    Differentiate them as Supervised (labeled), Unsupervised (no labels), or Reinforcement (reward-based) learning.
  6. A student-marks prediction model performs very well on training data but poorly on test data. Explain this using the terms features, labels, training, testing, overfitting, underfitting, bias, and variance, and suggest solutions.
  7. While building a model, matrices are used for data, probability for uncertainty, statistics for analysis, and calculus for optimization. Explain the role of Linear Algebra, Probability, Statistics, and Calculus.

Unit 2: Supervised Machine Learning

  1. A real estate company wants to predict house prices based on size and location. Identify Regression (continuous output) as the technique and justify the choice.
  2. A dataset shows a relationship between house size and price. Explain Simple Linear Regression (y = mx + c, slope, intercept).
  3. A dataset includes size, location, and number of rooms affecting price. Explain Multiple Linear Regression (multiple inputs) and its use.
  4. A dataset shows a curved relationship between variables. Explain when to use Polynomial Regression (non-linear, curve fitting).
  5. A regression model is evaluated using error and goodness-of-fit values. Explain R², MSE, and RMSE, and interpret their meaning.
  6. A bank wants to classify customers into defaulters and non-defaulters. Identify and explain the Classification (binary output) approach.
  7. A model predicts the probability of customer default as a value between 0 and 1. Explain Logistic Regression (sigmoid function, probability interpretation).
  8. A model classifies data based on nearest points using distance. Explain KNN (K value, nearest neighbors, Euclidean distance).
  9. A model separates two classes using a boundary with maximum gap. Explain SVM (hyperplane, margin, support vectors).
  10. A classification model gives results in terms of TP, TN, FP, and FN. Evaluate it using Accuracy, Precision, Recall, F1-score, and the Confusion Matrix.
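
As a worked reference for questions 5 and 10, the metrics can be computed by hand. The following is a minimal sketch in plain Python, not a library implementation. The function names and the sample numbers are illustrative assumptions, and the code assumes no denominator is zero.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (MSE, RMSE, R^2) for two equal-length lists of numbers."""
    n = len(y_true)
    # Sum of squared residuals between actual and predicted values.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mse = ss_res / n
    rmse = math.sqrt(mse)
    # Total variation of the actual values around their mean.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot  # assumes y_true is not constant
    return mse, rmse, r2

def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, F1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)  # of predicted positives, how many are correct
    recall = tp / (tp + fn)     # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical house prices (actual vs predicted), in arbitrary units.
mse, rmse, r2 = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
print(round(mse, 3), round(rmse, 3), round(r2, 3))  # 0.5 0.707 0.9

# Hypothetical loan-default confusion matrix.
acc, prec, rec, f1 = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))  # 0.85 0.889 0.8 0.842
```

Lower MSE and RMSE mean smaller prediction errors, and RMSE is in the same unit as the target. R² closer to 1 means the model explains more of the variation in the target.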

Unit 3: Unsupervised Machine Learning

  1. A company groups customers based on similar buying behavior without labeled data. Explain clustering (grouping, similarity, segmentation).
  2. A retail company wants to divide customers for targeted marketing. Explain the need for clustering in terms of segmentation and personalization, and justify choosing K-Means (fast, simple).
  3. While applying K-Means, different values of K are tested and plotted on a graph. Explain the Elbow Method (WCSS, optimal K, bend point).
  4. A dataset requires hierarchical grouping and visualization using a tree structure. Compare Hierarchical Clustering (dendrogram) and K-Means (centroid-based).
  5. In clustering, terms like center, distance, and similarity are used. Define cluster, centroid, distance, similarity.
  6. A dataset contains noise and outliers, and clustering should ignore them. Explain DBSCAN (density-based, eps, min_samples, noise handling).
  7. A supermarket analyzes which products are bought together. Explain Association Rule Learning (market basket, relationships).
  8. A model finds frequent itemsets and rules like bread → butter. Explain Apriori (support, confidence, lift).
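
For question 8, the three rule measures can be checked on a small example. This is a minimal sketch in plain Python that scores a single rule. It does not implement the full Apriori candidate-generation procedure. The function name and the five transactions are made up for illustration, and the antecedent is assumed to occur at least once.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    transactions: list of sets of items
    antecedent, consequent: sets of items
    """
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    count_a = sum(1 for t in transactions if a <= t)           # contain antecedent
    count_c = sum(1 for t in transactions if c <= t)           # contain consequent
    count_both = sum(1 for t in transactions if (a | c) <= t)  # contain both

    support = count_both / n              # how often the items occur together
    confidence = count_both / count_a     # P(consequent | antecedent)
    lift = confidence / (count_c / n)     # confidence relative to baseline
    return support, confidence, lift

# Hypothetical supermarket baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]

support, confidence, lift = rule_metrics(baskets, {"bread"}, {"butter"})
print(support, confidence, lift)  # 0.6 0.75 0.9375
```

A lift above 1 suggests the items are bought together more often than chance would predict. Here the lift is slightly below 1, so bread does not make butter more likely in this toy data.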

Unit 4: Natural Language Processing

  1. A system processes human language like text and speech. Explain NLP (text processing, language understanding, AI communication).
  2. Applications like chatbots, translation, and sentiment detection are used in real life. Explain NLP applications (chatbot, sentiment, translation, voice).
  3. A text dataset is cleaned by splitting it into words, removing stopwords, and reducing words to their root form. Explain tokenization, stopword removal, stemming, and normalization.
  4. Words like “running” are converted to “run” or to a meaningful base form. Differentiate between stemming and lemmatization.
  5. Text is converted into numbers using word frequency and importance. Explain BoW (frequency) and TF-IDF (importance).
  6. Customer reviews are classified as positive or negative. Explain sentiment analysis (classification techniques).
  7. A sentence is broken into sequences like single words, pairs, or triples. Explain n-grams (unigram, bigram, trigram).
  8. One model processes sequence data but fails to remember long-term patterns, while another retains long-term memory. Compare RNN and LSTM.
  9. A modern model understands context using an attention mechanism and pre-training. Explain Transformers and BERT (self-attention, context).
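
Questions 3, 5, and 7 can be illustrated with a short preprocessing pipeline. This is a minimal sketch in plain Python. The stopword list is a tiny hypothetical one, the tokenizer is a simple regular expression, and the TF-IDF formula used (term frequency times log(N / document frequency)) is only one common variant. Libraries often apply smoothing and normalization on top of it.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; real lists are much longer.
STOPWORDS = {"the", "is", "a", "and", "of"}

def tokenize(text):
    """Lowercase the text (normalization) and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]

def ngrams(tokens, n):
    """Return all runs of n consecutive tokens (n=1 unigram, 2 bigram, 3 trigram)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_words(tokens):
    """Bag of Words: map each token to its frequency."""
    return Counter(tokens)

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

tokens = remove_stopwords(tokenize("The movie is great and the acting is great"))
print(tokens)             # ['movie', 'great', 'acting', 'great']
print(ngrams(tokens, 2))  # [('movie', 'great'), ('great', 'acting'), ('acting', 'great')]
print(bag_of_words(tokens)["great"])  # 2

weights = tf_idf([["great", "movie"], ["bad", "movie"]])
print(weights[0]["movie"])  # 0.0
```

The word "movie" appears in every document, so its TF-IDF weight is zero, while "great" and "bad" get positive weights because they distinguish the documents.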

 
