Skip to main content
  1. Resources/
  2. Study Materials/
  3. Information Technology Engineering/
  4. IT Semester 4/
  5. Fundamentals of Machine Learning (4341603)/

Fundamentals of Machine Learning (4341603) - Summer 2025 Solution

·
Study-Material Solutions Machine-Learning 4341603 2025 Summer
Milav Dabgar
Author
Milav Dabgar
Experienced lecturer in the electrical and electronic manufacturing industry. Skilled in Embedded Systems, Image Processing, Data Science, MATLAB, Python, STM32. Strong education professional with a Master’s degree in Communication Systems Engineering from L.D. College of Engineering - Ahmedabad.
Table of Contents

Question 1(a) [3 marks]
#

Define machine Learning. Give any two applications of machine learning.

Answer:

Machine Learning is a subset of artificial intelligence that enables computers to learn and make decisions from data without being explicitly programmed for every task.

Applications:

  • Email spam detection: Automatically identifies and filters spam emails
  • Recommendation systems: Suggests products on e-commerce sites like Amazon

Table: ML vs Traditional Programming

Traditional ProgrammingMachine Learning
Input data + Program → OutputInput data + Output → Program
Rules are explicitly codedRules are learned from data

Mnemonic: “ML = Make Learning from data”

Question 1(b) [4 marks]
#

Define: Under fitting and overfitting.

Answer:

Underfitting occurs when a model is too simple to capture underlying patterns in data, resulting in poor performance on both training and test data.

Overfitting occurs when a model learns training data too well, including noise, causing poor performance on new unseen data.

Table: Comparison

AspectUnderfittingOverfitting
Training accuracyLowHigh
Test accuracyLowLow
Model complexityToo simpleToo complex
SolutionIncrease complexityReduce complexity

Mnemonic: “Under = Under-performs, Over = Over-learns”

Question 1(c) [7 marks]
#

Describe different types of machine learning with suitable example.

Answer:

Table: Types of Machine Learning

TypeDescriptionExample
SupervisedUses labeled training dataEmail classification
UnsupervisedNo labeled data, finds patternsCustomer segmentation
ReinforcementLearns through rewards/penaltiesGame playing AI

Supervised Learning uses input-output pairs to train models. The algorithm learns from examples to predict outcomes for new data.

Unsupervised Learning discovers hidden patterns in data without target labels. It groups similar data points together.

Reinforcement Learning trains agents to make decisions by rewarding good actions and penalizing bad ones.

Diagram:

graph TD
    A[Machine Learning] --> B[Supervised Learning]
    A --> C[Unsupervised Learning]
    A --> D[Reinforcement Learning]
    B --> E[Classification]
    B --> F[Regression]
    C --> G[Clustering]
    C --> H[Association Rules]

Mnemonic: “Super Un-supervised Reinforces learning”

Question 1(c) OR [7 marks]
#

Describe different tools and technology used in the field machine learning.

Answer:

Table: ML Tools and Technologies

CategoryToolsPurpose
ProgrammingPython, RCore development
LibrariesScikit-learn, TensorFlowModel building
Data ProcessingPandas, NumPyData manipulation
VisualizationMatplotlib, SeabornData plotting

Python is the most popular language due to its simplicity and extensive libraries.

Scikit-learn provides simple tools for data mining and analysis, perfect for beginners.

TensorFlow and PyTorch are advanced frameworks for deep learning applications.

Jupyter Notebook offers interactive development environment for experimentation.

Diagram:

graph LR
    A[Data] --> B[Pandas/NumPy]
    B --> C[Scikit-learn]
    C --> D[Model]
    D --> E[Matplotlib]

Mnemonic: “Python Pandas Scikit Tensor Jupyter”

Question 2(a) [3 marks]
#

Give the difference between Qualitative data and Quantitative data.

Answer:

Table: Qualitative vs Quantitative Data

Qualitative DataQuantitative Data
Non-numerical categoriesNumerical values
Colors, names, gradesHeight, weight, price
Cannot be measuredCan be measured

Qualitative data describes qualities or characteristics that cannot be measured numerically.

Quantitative data represents measurable quantities expressed as numbers.

Mnemonic: “Quality = Categories, Quantity = Numbers”

Question 2(b) [4 marks]
#

Find the mean and median for the following data: 3,4,5,5,7,8,9,11,12,14.

Answer:

Given data: 3, 4, 5, 5, 7, 8, 9, 11, 12, 14

Mean calculation:

  • Sum = 3+4+5+5+7+8+9+11+12+14 = 78
  • Count = 10 numbers
  • Mean = 78/10 = 7.8

Median calculation:

  • Data is already sorted
  • For 10 numbers: Median = (5th + 6th value)/2
  • Median = (7+8)/2 = 7.5

Table: Results

MeasureValue
Mean7.8
Median7.5

Mnemonic: “Mean = Average, Median = Middle”

Question 2(c) [7 marks]
#

Describe machine learning activities in detail.

Answer:

Table: Machine Learning Activities

ActivityDescriptionExample
Data CollectionGathering relevant dataSurvey responses
Data PreprocessingCleaning and preparing dataRemoving duplicates
Feature SelectionChoosing important variablesAge, income for loans
Model TrainingTeaching algorithm patternsFeeding training data
Model EvaluationTesting model performanceAccuracy measurement

Data Collection involves gathering information from various sources like databases, sensors, or surveys.

Data Preprocessing includes cleaning, transforming, and organizing raw data for analysis.

Feature Selection identifies the most relevant variables that contribute to predictions.

Model Training uses algorithms to learn patterns from prepared training data.

Model Evaluation tests how well the trained model performs on new, unseen data.

Diagram:

flowchart TD
    A[Data Collection] --> B[Data Preprocessing]
    B --> C[Feature Selection]
    C --> D[Model Training]
    D --> E[Model Evaluation]
    E --> F[Deployment]

Mnemonic: “Collect Process Feature Train Evaluate Deploy”

Question 2(a) OR [3 marks]
#

Give the difference between predicative model and descriptive model.

Answer:

Table: Predictive vs Descriptive Models

Predictive ModelDescriptive Model
Forecasts future outcomesExplains current patterns
Uses supervised learningUses unsupervised learning
Stock price predictionCustomer segmentation

Predictive models use historical data to make predictions about future events or unknown outcomes.

Descriptive models analyze existing data to understand current patterns and relationships.

Mnemonic: “Predict = Future, Describe = Present”

Question 2(b) OR [4 marks]
#

Classify the following using appropriate data type: hair color, gender, blood group type, time of day.

Answer:

Table: Data Type Classification

DataTypeReason
Hair colorNominalCategories with no order
GenderNominalCategories with no order
Blood groupNominalCategories with no order
Time of dayContinuousMeasurable quantity

Nominal data represents categories without any natural ordering.

Continuous data can take any value within a range and is measurable.

Mnemonic: “Names = Nominal, Numbers = Numerical”

Question 2(c) OR [7 marks]
#

Explain various methods used in data pre-processing.

Answer:

Table: Data Preprocessing Methods

MethodPurposeExample
Data CleaningRemove errors and inconsistenciesFix typos, remove duplicates
Data IntegrationCombine multiple sourcesMerge customer databases
Data TransformationConvert to suitable formatNormalize values 0-1
Data ReductionReduce dataset sizeSelect important features

Data Cleaning removes or corrects erroneous, incomplete, or irrelevant data.

Data Integration combines data from multiple sources into a unified dataset.

Data Transformation converts data into appropriate formats for analysis.

Data Reduction decreases dataset size while maintaining information quality.

Diagram:

graph TD
    A[Raw Data] --> B[Data Cleaning]
    B --> C[Data Integration]
    C --> D[Data Transformation]
    D --> E[Data Reduction]
    E --> F[Processed Data]

Mnemonic: “Clean Integrate Transform Reduce”

Question 3(a) [3 marks]
#

Give difference between classification and regression.

Answer:

Table: Classification vs Regression

ClassificationRegression
Discrete outputContinuous output
Predicts categoriesPredicts numerical values
Email: spam/not spamHouse price prediction

Classification predicts discrete categories or classes from input data.

Regression predicts continuous numerical values from input data.

Mnemonic: “Class = Categories, Regress = Real numbers”

Question 3(b) [4 marks]
#

Write confusion matrix using appropriate example. Calculate accuracy and error rate for it.

Answer:

Example: Email Classification

Table: Confusion Matrix

Predicted SpamPredicted Not Spam
Actual Spam85 (TP)15 (FN)
Actual Not Spam10 (FP)90 (TN)

Calculations:

  • Accuracy = (TP+TN)/(TP+TN+FP+FN) = (85+90)/200 = 87.5%
  • Error Rate = (FP+FN)/(TP+TN+FP+FN) = (10+15)/200 = 12.5%

Key Terms:

  • TP: True Positive - Correctly predicted spam
  • TN: True Negative - Correctly predicted not spam

Mnemonic: “True Positive True Negative = Correct predictions”

Question 3(c) [7 marks]
#

Explain KNN algorithm in detail.

Answer:

K-Nearest Neighbors (KNN) is a simple classification algorithm that classifies data points based on the majority class of their K nearest neighbors.

Table: KNN Algorithm Steps

StepDescriptionExample
Choose KSelect number of neighborsK=3
Calculate DistanceFind distance to all pointsEuclidean distance
Find NeighborsIdentify K closest points3 nearest points
VoteMajority class wins2 cats, 1 dog → cat

Working Process:

  1. Calculate distances between test point and all training points
  2. Sort distances and select K nearest neighbors
  3. Count votes from each class among neighbors
  4. Assign class with majority votes

Diagram:

graph TD
    A[New Data Point] --> B[Calculate Distances]
    B --> C[Find K Nearest]
    C --> D[Majority Vote]
    D --> E[Predict Class]

Advantages:

  • Simple to implement and understand
  • No training required - lazy learning algorithm

Mnemonic: “K Nearest Neighbors Vote for classification”

Question 3(a) OR [3 marks]
#

Give any three applications of multiple linear regression.

Answer:

Applications of Multiple Linear Regression:

Table: Applications

ApplicationVariablesPurpose
House Price PredictionSize, location, ageEstimate property value
Sales ForecastingAdvertising, season, pricePredict revenue
Medical DiagnosisSymptoms, age, historyRisk assessment

Multiple Linear Regression uses multiple input variables to predict a continuous output variable.

Mnemonic: “Multiple inputs, One output”

Question 3(b) OR [4 marks]
#

Explain bagging, boosting and stacking in detail.

Answer:

Table: Ensemble Methods

MethodApproachExample
BaggingParallel training, average resultsRandom Forest
BoostingSequential training, learn from errorsAdaBoost
StackingMeta-learner combines modelsNeural network combiner

Bagging trains multiple models on different data subsets and averages predictions.

Boosting trains models sequentially, each learning from previous model’s mistakes.

Stacking uses a meta-model to learn how to combine predictions from base models.

Mnemonic: “Bag parallel, Boost sequential, Stack meta”

Question 3(c) OR [7 marks]
#

Explain single linear regression with its application.

Answer:

Single Linear Regression finds the best straight line relationship between one input variable (X) and one output variable (Y).

Formula: Y = a + bX

  • a: Y-intercept
  • b: Slope of line

Table: Application Example - House Price vs Size

House Size (sq ft)Price (lakhs)
100050
150075
2000100

Working Process:

  1. Collect data with input-output pairs
  2. Plot points on scatter graph
  3. Find best line that minimizes error
  4. Make predictions using line equation

Diagram:

Pric75e5010001500Size

Applications:

  • Sales vs Advertising: More ads → More sales
  • Temperature vs Ice cream sales: Hot weather → More sales

Mnemonic: “One X predicts One Y with a line”

Question 4(a) [3 marks]
#

Define the following: (1)support (2) confidence.

Answer:

Support measures how frequently an itemset appears in the dataset.

Confidence measures how often items in consequent appear when antecedent is present.

Table: Definitions

MeasureFormulaExample
SupportCount(itemset)/Total transactionsBread appears in 60% transactions
ConfidenceSupport(A∪B)/Support(A)80% who buy bread also buy butter

Support = Frequency of occurrence Confidence = Reliability of rule

Mnemonic: “Support = How often, Confidence = How reliable”

Question 4(b) [4 marks]
#

Explain applications of unsupervised learning.

Answer:

Table: Unsupervised Learning Applications

ApplicationPurposeExample
Customer SegmentationGroup similar customersMarketing campaigns
Data CompressionReduce data sizeImage compression
Anomaly DetectionFind unusual patternsFraud detection
Recommendation SystemsSuggest similar itemsMusic recommendations

Customer Segmentation groups customers with similar buying behavior for targeted marketing.

Data Compression reduces storage space by finding patterns and removing redundancy.

Anomaly Detection identifies unusual patterns that may indicate fraud or errors.

Mnemonic: “Segment Compress Detect Recommend”

Question 4(c) [7 marks]
#

Write and explain apriori algorithm with suitable example.

Answer:

Apriori Algorithm finds frequent itemsets and generates association rules for market basket analysis.

Table: Algorithm Steps

StepDescriptionExample
Find frequent 1-itemsetsCount individual items{Bread}:4, {Milk}:3
Generate 2-itemsetsCombine frequent items{Bread,Milk}:2
Apply minimum supportFilter infrequent setsKeep if support ≥ 50%
Generate rulesCreate if-then rulesBread → Milk

Example Dataset:

  • Transaction 1: {Bread, Milk, Eggs}
  • Transaction 2: {Bread, Milk}
  • Transaction 3: {Bread, Eggs}
  • Transaction 4: {Milk, Eggs}

Working Process:

  1. Scan database to count item frequencies
  2. Generate candidate itemsets of increasing size
  3. Prune infrequent itemsets below minimum support
  4. Generate association rules from frequent itemsets

Diagram:

flowchart TD
    A[Transaction Database] --> B[Find Frequent 1-itemsets]
    B --> C[Generate 2-itemsets]
    C --> D[Apply Min Support]
    D --> E[Generate Rules]

Mnemonic: “A-priori knowledge helps find frequent patterns”

Question 4(a) OR [3 marks]
#

List out the difference between clustering and classification.

Answer:

Table: Clustering vs Classification

ClusteringClassification
Unsupervised learningSupervised learning
No labeled dataUses labeled training data
Groups similar dataAssigns predefined labels

Clustering discovers hidden groups in unlabeled data.

Classification assigns new data to known categories using trained models.

Mnemonic: “Cluster = Groups unknown, Classify = Labels known”

Question 4(b) OR [4 marks]
#

Explain the clustering process in detail.

Answer:

Table: Clustering Process Steps

StepDescriptionPurpose
Data PreparationClean and normalize dataEnsure quality input
Distance MetricChoose similarity measureEuclidean, Manhattan
Algorithm SelectionPick clustering methodK-means, Hierarchical
Cluster ValidationEvaluate cluster qualitySilhouette score

Clustering Process groups similar data points together based on their characteristics.

Key decisions include choosing the number of clusters and appropriate distance metrics.

Validation ensures clusters are meaningful and well-separated.

Mnemonic: “Prepare Distance Algorithm Validate”

Question 4(c) OR [7 marks]
#

Write and explain K-means clustering algorithm with suitable example.

Answer:

K-means partitions data into K clusters by minimizing within-cluster sum of squares.

Table: Algorithm Steps

StepDescriptionExample
Initialize centroidsRandom K center pointsC1(2,3), C2(8,7)
Assign pointsEach point to nearest centroidPoint(1,2) → C1
Update centroidsMean of assigned pointsNew C1(1.5, 2.5)
RepeatUntil centroids stop movingConvergence

Example: Customer Income vs Age

  • Customer 1: (Income=30k, Age=25)
  • Customer 2: (Income=35k, Age=30)
  • Customer 3: (Income=70k, Age=45)
  • Customer 4: (Income=75k, Age=50)

Working Process:

  1. Choose K=2 clusters for young/old customers
  2. Initialize centroids randomly
  3. Calculate distances from each customer to centroids
  4. Assign customers to nearest centroid
  5. Update centroid positions to center of assigned customers
  6. Repeat until stable

Diagram:

flowchart TD
    A[Choose K] --> B[Initialize Centroids]
    B --> C[Assign Points to Nearest Centroid]
    C --> D[Update Centroid Positions]
    D --> E{Converged?}
    E -->|No| C
    E -->|Yes| F[Final Clusters]

Advantages:

  • Simple and fast for large datasets
  • Works well with spherical clusters

Mnemonic: “K centroids Mean their assigned points”

Question 5(a) [3 marks]
#

List the applications of matplotlib.

Answer:

Table: Matplotlib Applications

ApplicationPurposeExample
Data VisualizationCreate charts and graphsBar charts, histograms
Scientific PlottingResearch presentationsMathematical functions
Dashboard CreationInteractive displaysBusiness metrics

Matplotlib is Python’s primary plotting library for creating static, animated, and interactive visualizations.

Key features include support for multiple plot types and customizable styling.

Mnemonic: “Mat-plot-lib = Math Plotting Library”

Question 5(b) [4 marks]
#

Write down code to plot a vertical line and horizontal line using matplotlib.

Answer:

Code Block:

import matplotlib.pyplot as plt

# Create figure
plt.figure(figsize=(8, 6))

# Plot vertical line at x=3
plt.axvline(x=3, color='red', linestyle='--', label='Vertical Line')

# Plot horizontal line at y=2
plt.axhline(y=2, color='blue', linestyle='-', label='Horizontal Line')

# Add labels and title
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Vertical and Horizontal Lines')
plt.legend()
plt.grid(True)
plt.show()

Key Functions:

  • axvline(): Creates vertical line
  • axhline(): Creates horizontal line

Mnemonic: “axvline = Vertical, axhline = Horizontal”

Question 5(c) [7 marks]
#

Explain features and applications of Scikit-Learn.

Answer:

Table: Scikit-Learn Features

FeatureDescriptionExample
Simple APIEasy to use interfacefit(), predict()
Multiple AlgorithmsVarious ML methodsSVM, Random Forest
Data PreprocessingBuilt-in data toolsStandardScaler
Model EvaluationPerformance metricsaccuracy_score

Scikit-Learn is Python’s most popular machine learning library providing simple tools for data analysis.

Key Strengths:

  • Consistent interface across all algorithms
  • Excellent documentation with examples
  • Active community support and development

Applications:

  • Classification: Email spam detection
  • Regression: House price prediction
  • Clustering: Customer segmentation
  • Dimensionality Reduction: Data visualization

Diagram:

graph TD
    A[Scikit-Learn] --> B[Classification]
    A --> C[Regression]
    A --> D[Clustering]
    A --> E[Preprocessing]
    B --> F[SVM, Decision Trees]
    C --> G[Linear, Polynomial]
    D --> H[K-means, DBSCAN]
    E --> I[Scaling, Encoding]

Mnemonic: “Scikit = Science Kit for machine learning”

Question 5(a) OR [3 marks]
#

Give the purpose of NumPy in machine learning.

Answer:

Table: NumPy Purpose in ML

PurposeDescriptionBenefit
Numerical ComputingFast array operationsEfficient calculations
Foundation LibraryBase for other librariesPandas, Scikit-learn use it
Mathematical FunctionsBuilt-in math operationsStatistics, linear algebra

NumPy provides the foundation for numerical computing in Python machine learning applications.

Essential for handling large datasets and performing mathematical operations efficiently.

Mnemonic: “Num-Py = Numerical Python”

Question 5(b) OR [4 marks]
#

Write down steps to import csv file in pandas.

Answer:

Code Block:

import pandas as pd

# Step 1: Import pandas library
# Step 2: Use read_csv() function
data = pd.read_csv('filename.csv')

# Step 3: Display first few rows
print(data.head())

# Optional: Specify parameters
data = pd.read_csv('file.csv', 
                   delimiter=',',
                   header=0,
                   index_col=0)

Steps:

  1. Import pandas library
  2. Use read_csv() function with filename
  3. Verify data with head() method

Mnemonic: “Import Read Verify”

Question 5(c) OR [7 marks]
#

Explain features and applications of Pandas.

Answer:

Table: Pandas Features

FeatureDescriptionExample
Data StructuresDataFrame and SeriesTabular data handling
Data I/ORead/write multiple formatsCSV, Excel, JSON
Data CleaningHandle missing valuesdropna(), fillna()
Data AnalysisStatistical operationsgroupby(), describe()

Pandas is the primary data manipulation library in Python for machine learning projects.

Key Capabilities:

  • Data Loading from various file formats
  • Data Cleaning and preprocessing operations
  • Data Transformation and reshaping
  • Statistical Analysis and aggregation

Applications:

  • Data Preprocessing: Clean datasets before ML
  • Exploratory Analysis: Understand data patterns
  • Feature Engineering: Create new variables
  • Data Integration: Merge multiple data sources

Diagram:

graph TD
    A[Raw Data] --> B[Pandas DataFrame]
    B --> C[Data Cleaning]
    C --> D[Data Analysis]
    D --> E[Feature Engineering]
    E --> F[ML Ready Data]

Advantages:

  • Intuitive syntax for data operations
  • High performance with optimized operations
  • Integration with other ML libraries

Mnemonic: “Pandas = Panel Data for analysis”

Related

Fundamentals of Machine Learning (4341603) - Summer 2023 Solution
Study-Material Solutions Machine-Learning 4341603 2023 Summer
Fundamentals of Machine Learning (4341603) - Summer 2024 Solution
Study-Material Solutions Machine-Learning 4341603 2024 Summer
Computer Networking (4343202) - Summer 2025 Solution
14 mins
Study-Material Solutions Networking 4343202 2025 Summer
Database Management System (1333204) - Summer 2025 Solution
17 mins
Study-Material Solutions Database 1333204 2025 Summer
Fundamentals of Software Development (4331604) - Summer 2025 Solution
Study-Material Solutions Software-Development 4331604 2025 Summer
Linear Integrated Circuit (4341105) - Summer 2025 Solution
16 mins
Study-Material Solutions Linear-Integrated-Circuit 4341105 2025 Summer