Fundamentals of Machine Learning (4341603) - Summer 2023 Solution

Author: Milav Dabgar

Question 1(a) [3 marks]

Define human learning. List out types of human learning.

Answer:

Human learning is the process by which humans acquire new knowledge, skills, behaviors, or modify existing ones through experience, study, or instruction.

Types of Human Learning:

| Type | Description |
|---|---|
| Supervised Learning | Learning with guidance from a teacher/mentor |
| Unsupervised Learning | Self-directed learning without external guidance |
| Reinforcement Learning | Learning through trial and error with feedback |

Mnemonic: “SUR - Supervised, Unsupervised, Reinforcement”

Question 1(b) [4 marks]

Differentiate between qualitative data and quantitative data.

Answer:

Table: Qualitative vs Quantitative Data

| Feature | Qualitative Data | Quantitative Data |
|---|---|---|
| Nature | Descriptive, categorical | Numerical, measurable |
| Analysis | Subjective interpretation | Statistical analysis |
| Examples | Colors, names, gender | Height, weight, age |
| Representation | Words, categories | Numbers, graphs |

Mnemonic: “QUAN-Numbers, QUAL-Words”

Question 1(c) [7 marks]

Compare the different types of machine learning.

Answer:

Table: Types of Machine Learning Comparison

| Type | Training Data | Goal | Examples |
|---|---|---|---|
| Supervised | Labeled data | Predict outcomes | Classification, Regression |
| Unsupervised | Unlabeled data | Find patterns | Clustering, Association |
| Reinforcement | Reward/penalty | Maximize rewards | Gaming, Robotics |

Key Differences:

  • Supervised: Uses input-output pairs for training
  • Unsupervised: Discovers hidden patterns in data
  • Reinforcement: Learns through interaction with environment

Mnemonic: “SUR-LAP: Supervised-Labeled, Unsupervised-Reveal, Reinforcement-Action”

Question 1(c OR) [7 marks]

Define machine learning. Explain any four applications of machine learning in brief.

Answer:

Machine learning is a subset of artificial intelligence that enables computers to learn and make decisions from data without being explicitly programmed.

Four Applications:

| Application | Description |
|---|---|
| Email Spam Detection | Classifies emails as spam or legitimate |
| Image Recognition | Identifies objects in photos |
| Recommendation Systems | Suggests products/content to users |
| Medical Diagnosis | Assists doctors in disease detection |

Mnemonic: “SIRM - Spam, Image, Recommendation, Medical”

Question 2(a) [3 marks]

Relate the appropriate data type of following examples.

Answer:

Data Type Classification:

| Example | Data Type |
|---|---|
| Nationality of students | Categorical (Nominal) |
| Education status of students | Categorical (Ordinal) |
| Height of students | Numerical (Continuous) |

Mnemonic: “NOC - Nominal, Ordinal, Continuous”

Question 2(b) [4 marks]

Explain data pre-processing in brief.

Answer:

Data pre-processing is the technique of preparing raw data for machine learning algorithms.

Key Steps:

| Step | Purpose |
|---|---|
| Data Cleaning | Remove errors and inconsistencies |
| Data Integration | Combine data from multiple sources |
| Data Transformation | Convert data to suitable format |
| Data Reduction | Reduce data size while preserving information |

Mnemonic: “CITR - Clean, Integrate, Transform, Reduce”

Question 2(c) [7 marks]

Show K-fold cross validation in detail.

Answer:

K-fold cross validation is a technique to evaluate model performance by dividing data into K equal parts.

Process:

graph LR
    A[Original Dataset] --> B[Split into K folds]
    B --> C[Use K-1 folds for training]
    C --> D[Use 1 fold for testing]
    D --> E[Repeat K times]
    E --> F[Average results]

Steps:

  • Divide: Split dataset into K equal parts
  • Train: Use K-1 folds for training
  • Test: Use remaining fold for validation
  • Repeat: Perform K iterations
  • Average: Calculate mean performance

Advantages:

  • Reduces overfitting
  • Better use of limited data
  • More reliable performance estimate

Mnemonic: “DTRA - Divide, Train, Repeat, Average”
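The divide-train-test-repeat loop can be sketched in plain Python (an illustrative sketch; libraries such as scikit-learn provide a ready-made `KFold` class for real use):

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for each of the k folds."""
    indices = list(range(n_samples))
    fold_size = n_samples // k
    for fold in range(k):
        start = fold * fold_size
        # The last fold absorbs any leftover samples
        end = start + fold_size if fold < k - 1 else n_samples
        test = indices[start:end]
        train = indices[:start] + indices[end:]
        yield train, test

# Example: 10 samples, 5 folds -> every sample is held out exactly once
for train, test in k_fold_indices(10, 5):
    print("test fold:", test, "| training samples:", len(train))
```

Each fold's score would be averaged at the end to obtain the final performance estimate.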

Question 2(a OR) [3 marks]

Define following terms: i) Mean, ii) Outliers, iii) Interquartile range

Answer:

Statistical Terms:

| Term | Definition |
|---|---|
| Mean | Average of all values in a dataset |
| Outliers | Data points significantly different from the others |
| Interquartile Range | Difference between the 75th and 25th percentiles |

Mnemonic: “MOI - Mean, Outliers, Interquartile”
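All three terms can be computed with Python's standard `statistics` module (the 1.5 × IQR rule used below is a common, but not the only, outlier criterion):

```python
import statistics

data = [10, 11, 12, 12, 12, 13, 13, 14, 15, 100]  # 100 looks suspicious

mean = statistics.mean(data)
q1, q2, q3 = statistics.quantiles(data, n=4)  # 25th, 50th, 75th percentiles
iqr = q3 - q1                                 # interquartile range

# 1.5 * IQR rule: values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] are outliers
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < lower or x > upper]
print("mean:", mean, "IQR:", iqr, "outliers:", outliers)  # outliers: [100]
```

Note how a single extreme value inflates the mean (21.2 here) while the IQR, being rank-based, stays unaffected.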

Question 2(b OR) [4 marks]

Explain structure of confusion matrix.

Answer:

Confusion Matrix Structure:

| Actual \ Predicted | Positive | Negative |
|---|---|---|
| Positive | True Positive (TP) | False Negative (FN) |
| Negative | False Positive (FP) | True Negative (TN) |

Components:

  • TP: Correctly predicted positive cases
  • TN: Correctly predicted negative cases
  • FP: Incorrectly predicted as positive
  • FN: Incorrectly predicted as negative

Mnemonic: “TTFF - True True, False False”
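Counting the four cells from a pair of label lists is straightforward; a minimal sketch (scikit-learn's `confusion_matrix` performs the same job):

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (TP, TN, FP, FN) for a binary classification task."""
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    tn = sum(a != positive and p != positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    return tp, tn, fp, fn

actual    = [1, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 1]
tp, tn, fp, fn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / len(actual)
print(tp, tn, fp, fn, accuracy)  # 2 1 1 1 0.6
```

Metrics such as precision (TP / (TP + FP)) and recall (TP / (TP + FN)) follow directly from these four counts.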

Question 2(c OR) [7 marks]

Prepare short note on feature subset selection.

Answer:

Feature subset selection is the process of selecting relevant features from the original feature set.

Methods:

| Method | Description |
|---|---|
| Filter Methods | Use statistical measures to rank features |
| Wrapper Methods | Use ML algorithms to evaluate feature subsets |
| Embedded Methods | Feature selection during model training |

Benefits:

  • Reduced complexity: Fewer features, simpler models
  • Improved performance: Eliminates noise and irrelevant features
  • Faster training: Less computational overhead

Popular Techniques:

  • Chi-square test
  • Recursive Feature Elimination
  • LASSO regularization

Mnemonic: “FWE - Filter, Wrapper, Embedded”
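A filter method can be as simple as ranking features by variance and dropping the near-constant ones; a hypothetical sketch with made-up data (the 0.01 threshold is arbitrary):

```python
def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

# Rows are samples, columns are features; feature 1 is constant (uninformative)
X = [
    [1.0, 5.0, 10.0],
    [2.0, 5.0, 20.0],
    [3.0, 5.0, 30.0],
    [4.0, 5.0, 40.0],
]
columns = list(zip(*X))  # transpose: one tuple per feature
scores = [variance(col) for col in columns]

# Filter step: keep only features whose variance exceeds the threshold
threshold = 0.01
selected = [i for i, s in enumerate(scores) if s > threshold]
print("variances:", scores, "-> selected features:", selected)  # [0, 2]
```

Wrapper and embedded methods go further by involving the model itself, but this variance filter already illustrates the ranking-then-selecting pattern.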

Question 3(a) [3 marks]

Give the difference between predictive model and descriptive model.

Answer:

Model Type Comparison:

| Feature | Predictive Model | Descriptive Model |
|---|---|---|
| Purpose | Forecast future outcomes | Understand current patterns |
| Output | Predictions/classifications | Insights/summaries |
| Examples | Regression, classification | Clustering, association rules |

Mnemonic: “PF-DC: Predictive-Future, Descriptive-Current”

Question 3(b) [4 marks]

Discuss the difference between classification and regression.

Answer:

Classification vs Regression:

| Aspect | Classification | Regression |
|---|---|---|
| Output | Discrete categories | Continuous values |
| Goal | Predict class labels | Predict numerical values |
| Examples | Spam detection, image recognition | Price prediction, temperature |
| Evaluation | Accuracy, precision, recall | MSE, RMSE, R-squared |

Mnemonic: “CCNM - Classification-Categories, Regression-Numbers”

Question 3(c) [7 marks]

Define classification. Illustrate classification learning steps in details.

Answer:

Classification is a supervised learning technique that predicts discrete class labels for input data.

Classification Learning Steps:

graph LR
    A[Data Collection] --> B[Data Preprocessing]
    B --> C[Feature Selection]
    C --> D[Train-Test Split]
    D --> E[Model Training]
    E --> F[Model Evaluation]
    F --> G[Model Deployment]

Detailed Steps:

  • Data Collection: Gather labeled training data
  • Preprocessing: Clean and prepare data
  • Feature Selection: Choose relevant attributes
  • Split Data: Divide into training and testing sets
  • Training: Build model using training data
  • Evaluation: Test model performance
  • Deployment: Use model for predictions

Mnemonic: “DCFSTED - Data, Clean, Features, Split, Train, Evaluate, Deploy”
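The steps above can be walked through end to end with a toy nearest-centroid classifier (a deliberately simple stand-in for a real algorithm, on made-up 2-D data):

```python
# Data collection: (feature_x, feature_y) -> class label
data = [((1, 1), 'A'), ((1, 2), 'A'), ((2, 1), 'A'),
        ((8, 8), 'B'), ((8, 9), 'B'), ((9, 8), 'B')]

# Split data: hold out one sample per class for testing
train = data[:2] + data[3:5]
test = [data[2], data[5]]

# Training: a nearest-centroid model just stores each class's mean point
centroids = {}
for label in ('A', 'B'):
    points = [x for x, y in train if y == label]
    centroids[label] = tuple(sum(c) / len(points) for c in zip(*points))

def predict(point):
    # Assign the class whose centroid is closest (squared Euclidean distance)
    return min(centroids, key=lambda lb: sum(
        (a - b) ** 2 for a, b in zip(point, centroids[lb])))

# Evaluation: accuracy on the held-out samples
correct = sum(predict(x) == y for x, y in test)
print("accuracy:", correct / len(test))
```

Deployment would then mean calling `predict()` on genuinely new points.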

Question 3(a OR) [3 marks]

Give the difference between bagging and boosting.

Answer:

Bagging vs Boosting:

| Feature | Bagging | Boosting |
|---|---|---|
| Sampling | Bootstrap sampling | Sequential weighted sampling |
| Training | Parallel training | Sequential training |
| Focus | Reduce variance | Reduce bias |

Mnemonic: “BPV-BSB: Bagging-Parallel-Variance, Boosting-Sequential-Bias”

Question 3(b OR) [4 marks]

Explain different types of logistic regression in brief.

Answer:

Types of Logistic Regression:

| Type | Classes | Use Case |
|---|---|---|
| Binary | 2 classes | Yes/No, Pass/Fail |
| Multinomial | 3+ classes (unordered) | Color classification |
| Ordinal | 3+ classes (ordered) | Rating scales |

Mnemonic: “BMO - Binary, Multinomial, Ordinal”

Question 3(c OR) [7 marks]

Write and show the use of k-NN algorithms.

Answer:

K-Nearest Neighbors (k-NN) is a lazy learning algorithm that classifies data points based on the majority class of k nearest neighbors.

Algorithm:

1. Choose the value of k
2. Calculate distance from the test point to all training points
3. Select the k nearest neighbors
4. For classification: assign the majority class of the k neighbors
5. For regression: take the average of the k neighbor values

Distance Calculation:

  • Euclidean Distance: √[(x₁-x₂)² + (y₁-y₂)²]

Applications:

  • Recommendation systems: Similar user preferences
  • Image recognition: Pattern matching
  • Medical diagnosis: Symptom similarity

Advantages:

  • Simple to implement
  • No training required
  • Works well with small datasets

Mnemonic: “CDSA - Choose, Distance, Select, Assign”
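The CDSA steps map directly to code; a minimal pure-Python k-NN classifier on made-up points:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Distance: Euclidean distance to every training point
    dists = [(math.dist(x, query), label) for x, label in train]
    # Select: the k closest neighbors
    neighbors = sorted(dists)[:k]
    # Assign: majority class among the selected neighbors
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((1, 1), 'cat'), ((1, 2), 'cat'), ((2, 1), 'cat'),
         ((7, 7), 'dog'), ((8, 7), 'dog'), ((7, 8), 'dog')]
print(knn_predict(train, (2, 2), k=3))  # cat
print(knn_predict(train, (7, 6), k=3))  # dog
```

"No training required" is visible here: the model is just the stored training list, and all work happens at prediction time.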

Question 4(a) [3 marks]

List out applications of support vector machine.

Answer:

SVM Applications:

| Application | Domain |
|---|---|
| Text Classification | Document categorization |
| Image Recognition | Face detection |
| Bioinformatics | Gene classification |

Mnemonic: “TIB - Text, Image, Bio”

Question 4(b) [4 marks]

Create pseudo code for k-means algorithm.

Answer:

K-means Pseudo Code:

BEGIN K-means
1. Initialize k cluster centroids randomly
2. REPEAT
   a. Assign each point to nearest centroid
   b. Update centroids as mean of assigned points
   c. Calculate total within-cluster sum of squares
3. UNTIL convergence or max iterations
4. RETURN final clusters and centroids
END

Mnemonic: “IAUC - Initialize, Assign, Update, Check”
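The pseudo code translates almost line for line into Python; a 1-D sketch with fixed initial centroids (real implementations such as scikit-learn's `KMeans` add random restarts and smarter initialization):

```python
def k_means(points, centroids, max_iters=100):
    """Simple 1-D k-means; `centroids` supplies the initial guesses."""
    clusters = []
    for _ in range(max_iters):
        # a. Assign each point to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # b. Update centroids as the mean of their assigned points
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        # 3. Stop when centroids no longer move (convergence)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

points = [1.0, 2.0, 1.5, 10.0, 11.0, 10.5]
centroids, clusters = k_means(points, centroids=[0.0, 5.0])
print(centroids)  # [1.5, 10.5]
```

The two initial guesses pull apart until each settles at the mean of one natural group.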

Question 4(c) [7 marks]

Write and explain applications of unsupervised learning.

Answer:

Unsupervised learning discovers hidden patterns in data without labeled examples.

Major Applications:

| Application | Description | Example |
|---|---|---|
| Customer Segmentation | Group customers by behavior | Market research |
| Anomaly Detection | Identify unusual patterns | Fraud detection |
| Data Compression | Reduce dimensionality | Image compression |
| Association Rules | Find item relationships | Market basket analysis |

Clustering Applications:

  • Market research: Customer grouping
  • Social network analysis: Community detection
  • Gene sequencing: Biological classification

Dimensionality Reduction:

  • Visualization: High-dimensional data plotting
  • Feature extraction: Noise reduction

Mnemonic: “CADA - Customer, Anomaly, Data, Association”

Question 4(a OR) [3 marks]

List out applications of regression.

Answer:

Regression Applications:

| Application | Purpose |
|---|---|
| Stock Price Prediction | Financial forecasting |
| Sales Forecasting | Business planning |
| Medical Diagnosis | Risk assessment |

Mnemonic: “SSM - Stock, Sales, Medical”

Question 4(b OR) [4 marks]

Define following terms: i) Support ii) Confidence

Answer:

Association Rule Terms:

| Term | Definition | Formula |
|---|---|---|
| Support | Frequency of itemset in database | Support(A) = Transactions containing A / Total transactions |
| Confidence | Conditional probability of rule | Confidence(A→B) = Support(A∪B) / Support(A) |

Example:

  • If 30% transactions contain bread and milk: Support = 0.3
  • If 80% of bread buyers also buy milk: Confidence = 0.8

Mnemonic: “SF-CP: Support-Frequency, Confidence-Probability”
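Both measures can be computed directly from a list of transactions; the basket data below is made up for illustration:

```python
transactions = [
    {'bread', 'milk'}, {'bread', 'milk'}, {'bread', 'milk'}, {'bread', 'milk'},
    {'bread'}, {'milk'}, {'eggs'}, {'milk', 'eggs'}, {'eggs'}, {'milk', 'eggs'},
]

def support(itemset):
    """Fraction of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Confidence(A -> B) = Support(A ∪ B) / Support(A)
sup_bread_milk = support({'bread', 'milk'})
conf_bread_milk = sup_bread_milk / support({'bread'})
print(sup_bread_milk, conf_bread_milk)  # 0.4 0.8
```

Here 4 of 10 baskets contain both bread and milk (support 0.4), and 4 of the 5 bread baskets also contain milk (confidence 0.8).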

Question 4(c OR) [7 marks]

Explain apriori algorithm in detail.

Answer:

Apriori algorithm finds frequent itemsets in transactional data using the apriori property.

Algorithm Steps:

graph LR
    A[Find frequent 1-itemsets] --> B[Generate candidate 2-itemsets]
    B --> C[Prune using apriori property]
    C --> D[Count support in database]
    D --> E[Find frequent k-itemsets]
    E --> F{More candidates?}
    F -->|Yes| B
    F -->|No| G[Generate rules]

Apriori Property:

  • If an itemset is frequent, all its subsets are frequent
  • If an itemset is infrequent, all its supersets are infrequent

Steps:

  1. Scan database: Count 1-item support
  2. Generate candidates: Create k+1 itemsets from frequent k-itemsets
  3. Prune: Remove candidates with infrequent subsets
  4. Count support: Scan database for candidate frequencies
  5. Repeat: Until no new frequent itemsets found

Applications:

  • Market basket analysis
  • Web usage patterns
  • Protein sequences

Mnemonic: “SGPCR - Scan, Generate, Prune, Count, Repeat”
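The SGPCR loop fits in about thirty lines of Python using `itertools` (a teaching sketch on made-up baskets, not an optimized miner):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return all frequent itemsets (as frozensets) at the given support."""
    n = len(transactions)

    def support(itemset):
        # Scan: fraction of transactions containing the itemset
        return sum(itemset <= t for t in transactions) / n

    # Frequent 1-itemsets
    items = {i for t in transactions for i in t}
    frequent = [frozenset([i]) for i in items
                if support(frozenset([i])) >= min_support]
    result = list(frequent)
    k = 2
    while frequent:
        # Generate: join frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune: apriori property - every (k-1)-subset must itself be frequent
        prev = set(frequent)
        candidates = [c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))]
        # Count: keep candidates that meet the minimum support
        frequent = [c for c in candidates if support(c) >= min_support]
        result.extend(frequent)
        k += 1  # Repeat for larger itemsets
    return result

baskets = [{'bread', 'milk'}, {'bread', 'butter'}, {'bread', 'milk', 'butter'},
           {'milk', 'butter'}, {'bread', 'milk'}]
for itemset in apriori(baskets, min_support=0.6):
    print(set(itemset))
```

At a 0.6 support threshold this finds the three frequent single items plus {bread, milk}; rule generation from these itemsets would be a separate final pass.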

Question 5(a) [3 marks]

List out the major features of matplotlib.

Answer:

Matplotlib Features:

| Feature | Description |
|---|---|
| Multiple Plot Types | Line, bar, scatter, histogram |
| Customization | Colors, styles, labels |
| Export Options | PNG, PDF, SVG formats |

Mnemonic: “MCE - Multiple, Customization, Export”

Question 5(b) [4 marks]

How to load iris dataset in Numpy program? Explain.

Answer:

Loading Iris Dataset in NumPy:

import numpy as np
from sklearn.datasets import load_iris

# Load iris dataset
iris = load_iris()
data = iris.data    # Features
target = iris.target # Labels

Steps:

  • Import: Import required libraries
  • Load: Use sklearn’s load_iris() function
  • Extract: Get features and target arrays
  • Access: Use .data and .target attributes

Mnemonic: “ILEA - Import, Load, Extract, Access”

Question 5(c) [7 marks]

Explain features and applications of Pandas.

Answer:

Pandas is a powerful data manipulation and analysis library for Python.

Key Features:

| Feature | Description |
|---|---|
| DataFrame | 2D labeled data structure |
| Series | 1D labeled array |
| Data I/O | Read/write various file formats |
| Data Cleaning | Handle missing values |
| Grouping | Group and aggregate operations |

Applications:

| Application | Use Case |
|---|---|
| Data Analysis | Statistical analysis |
| Data Cleaning | Preprocessing for ML |
| Financial Analysis | Stock market data |
| Web Scraping | Parse HTML tables |

Common Operations:

  • Reading data: pd.read_csv(), pd.read_excel()
  • Filtering: df[df['column'] > value]
  • Grouping: df.groupby('column').mean()

Mnemonic: “DSDCG - DataFrame, Series, Data I/O, Cleaning, Grouping”
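The three common operations fit in a few lines (the data below is made up; `io.StringIO` stands in for a real CSV file, assuming pandas is installed):

```python
import io
import pandas as pd

csv_text = "name,dept,salary\nAsha,IT,50000\nRaj,IT,60000\nMira,HR,45000\n"
df = pd.read_csv(io.StringIO(csv_text))           # Reading data

high_paid = df[df['salary'] > 48000]              # Filtering rows
avg_salary = df.groupby('dept')['salary'].mean()  # Grouping and aggregating

print(high_paid['name'].tolist())  # ['Asha', 'Raj']
print(avg_salary['IT'])            # 55000.0
```

The same chainable pattern (read, filter, group, aggregate) covers most day-to-day pandas work.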

Question 5(a OR) [3 marks]

List out the applications of matplotlib.

Answer:

Matplotlib Applications:

| Application | Purpose |
|---|---|
| Scientific Visualization | Research data plotting |
| Business Analytics | Dashboard creation |
| Educational Content | Teaching materials |

Mnemonic: “SBE - Scientific, Business, Educational”

Question 5(b OR) [4 marks]

Develop and explain the steps to import csv file in Pandas.

Answer:

Steps to Import CSV in Pandas:

import pandas as pd

# Step 1: Import pandas library
# Step 2: Use read_csv() function
df = pd.read_csv('filename.csv')

# Optional parameters
df = pd.read_csv('file.csv', 
                 header=0,     # First row as header
                 sep=',',      # Comma separator
                 index_col=0)  # First column as index

Process:

  • Import: Import pandas library
  • Read: Use pd.read_csv() function
  • Specify: Add file path and parameters
  • Store: Assign to DataFrame variable

Mnemonic: “IRSS - Import, Read, Specify, Store”

Question 5(c OR) [7 marks]
#

Explain features and applications of Scikit-Learn.

Answer:

Scikit-Learn is a comprehensive machine learning library for Python.

Key Features:

| Feature | Description |
|---|---|
| Algorithms | Classification, regression, clustering |
| Preprocessing | Data scaling and transformation |
| Model Selection | Cross-validation and grid search |
| Metrics | Performance evaluation tools |

Applications:

| Domain | Use Case |
|---|---|
| Healthcare | Disease prediction |
| Finance | Credit scoring |
| Marketing | Customer segmentation |
| Technology | Recommendation systems |

Algorithm Categories:

  • Supervised: SVM, Random Forest, Linear Regression
  • Unsupervised: K-means, DBSCAN, PCA
  • Ensemble: Bagging, Boosting

Workflow:

  1. Data preparation: Preprocessing
  2. Model selection: Choose algorithm
  3. Training: Fit model to data
  4. Evaluation: Assess performance
  5. Prediction: Make forecasts

Mnemonic: “APME - Algorithms, Preprocessing, Metrics, Evaluation”
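The five-step workflow looks like this on the iris dataset (assuming scikit-learn is installed; logistic regression is just one of many algorithms that could fill step 2):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Data preparation: load features X and labels y, then split
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# 2-3. Model selection and training
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# 4-5. Evaluation and prediction
y_pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, y_pred))
```

Every scikit-learn estimator follows this same fit/predict interface, which is what makes swapping algorithms in step 2 a one-line change.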
