# Fundamentals of Machine Learning (4341603) - Winter 2023 Solution


## Question 1(a) [3 marks]

Define human learning and explain how machine learning is different from human learning?

Answer:

Table: Human Learning vs Machine Learning

| Aspect | Human Learning | Machine Learning |
|--------|----------------|------------------|
| Method | Experience, trial and error | Data and algorithms |
| Speed | Slow, gradual | Fast processing |
| Data Requirement | Limited examples needed | Large datasets required |

  • Human Learning: Process of acquiring knowledge through experience, observation, and reasoning
  • Machine Learning: Automated learning from data using algorithms to identify patterns

Mnemonic: “Humans Experience, Machines Analyze Data” (HEMAD)


## Question 1(b) [4 marks]

Describe the use of machine learning in finance and banking.

Answer:

Applications in Finance and Banking:

| Application | Purpose | Benefit |
|-------------|---------|---------|
| Fraud Detection | Identify suspicious transactions | Reduce financial losses |
| Credit Scoring | Assess loan default risk | Better lending decisions |
| Algorithmic Trading | Automated trading decisions | Faster market responses |

  • Risk Assessment: ML analyzes customer data to predict creditworthiness
  • Customer Service: Chatbots provide 24/7 support using NLP
  • Regulatory Compliance: Automated monitoring for suspicious activities

Mnemonic: “Finance Needs Smart Analysis” (FNSA)


## Question 1(c) [7 marks]

Give difference between Supervised Learning, Unsupervised Learning and Reinforcement Learning.

Answer:

Comparison Table:

| Feature | Supervised Learning | Unsupervised Learning | Reinforcement Learning |
|---------|---------------------|-----------------------|------------------------|
| Data Type | Labeled data | Unlabeled data | Environment interaction |
| Goal | Predict output | Find patterns | Maximize rewards |
| Examples | Classification, Regression | Clustering, Association | Game playing, Robotics |
| Feedback | Immediate | None | Delayed rewards |

Key Characteristics:

  • Supervised Learning: Teacher-guided learning with correct answers provided
  • Unsupervised Learning: Self-discovery of hidden patterns in data
  • Reinforcement Learning: Learning through trial and error with rewards/penalties

Mnemonic: “Supervised Teachers, Unsupervised Explores, Reinforcement Rewards” (STUER)


## Question 1(c OR) [7 marks]

Explain different tools and technology used in machine learning.

Answer:

ML Tools and Technologies:

| Category | Tools | Purpose |
|----------|-------|---------|
| Programming | Python, R, Java | Algorithm implementation |
| Libraries | Scikit-learn, TensorFlow | Ready-made algorithms |
| Visualization | Matplotlib, Seaborn | Data visualization |
| Data Processing | Pandas, NumPy | Data manipulation |

Key Technologies:

  • Cloud Platforms: AWS, Google Cloud for scalable computing
  • Development Environments: Jupyter Notebook, Google Colab
  • Big Data Tools: Spark, Hadoop for large datasets

Mnemonic: “Python Libraries Visualize Data Effectively” (PLVDE)


## Question 2(a) [3 marks]

Define outliers with one example.

Answer:

Definition: Outliers are data points that significantly differ from other observations in a dataset.

Example Table:

| Student Heights (cm) | Classification |
|----------------------|----------------|
| 165, 170, 168, 172 | Normal values |
| 195 | Outlier (too tall) |
| 140 | Outlier (too short) |

  • Detection: Values beyond 1.5 × IQR from quartiles
  • Impact: Can skew statistical analysis and model performance
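A minimal Python sketch of the 1.5 × IQR rule above, using the heights from the example table (NumPy assumed available):

```python
import numpy as np

heights = [165, 170, 168, 172, 195, 140]

# Quartiles and interquartile range
q1, q3 = np.percentile(heights, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Anything outside [lower, upper] is flagged as an outlier
print([h for h in heights if h < lower or h > upper])  # [195, 140]
```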

Mnemonic: “Outliers Stand Apart” (OSA)


## Question 2(b) [4 marks]

Explain regression steps in detail.

Answer:

Regression Process Steps:

```mermaid
flowchart TD
    A[Data Collection] --> B[Data Preprocessing]
    B --> C[Feature Selection]
    C --> D[Model Training]
    D --> E[Model Evaluation]
    E --> F[Prediction]
```

Detailed Steps:

  • Data Collection: Gather relevant dataset with input-output pairs
  • Preprocessing: Clean data, handle missing values, normalize features
  • Feature Selection: Choose relevant variables that affect target
  • Model Training: Fit regression line to minimize prediction errors
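A hedged sketch of these steps with scikit-learn; the synthetic data and variable names are illustrative only:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 1-2: collect and preprocess (synthetic data stands in for a real dataset)
rng = np.random.default_rng(0)
X = rng.random((100, 3))                      # three candidate features
y = 4 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.1, size=100)
X = StandardScaler().fit_transform(X)         # normalize features

# 3-4: select features (all kept here) and fit the regression line
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = LinearRegression().fit(X_train, y_train)

# 5-6: evaluate, then predict on new inputs
print(mean_squared_error(y_test, model.predict(X_test)))
print(model.predict(X_test[:1]))
```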

Mnemonic: “Data Preprocessing Features Train Evaluation Predicts” (DPFTEP)


## Question 2(c) [7 marks]

Define Accuracy and for the following binary classifier’s confusion matrix, find the various measurement parameters like 1. Accuracy 2. Precision.

Answer:

Confusion Matrix Analysis:

| | Predicted No | Predicted Yes |
|---|--------------|---------------|
| Actual No | 10 (TN) | 3 (FP) |
| Actual Yes | 2 (FN) | 15 (TP) |

Calculations:

| Metric | Formula | Calculation | Result |
|--------|---------|-------------|--------|
| Accuracy | (TP+TN)/(TP+TN+FP+FN) | (15+10)/(15+10+3+2) | 83.33% |
| Precision | TP/(TP+FP) | 15/(15+3) | 83.33% |

Definitions:

  • Accuracy: Proportion of correct predictions out of total predictions
  • Precision: Proportion of true positive predictions out of all positive predictions
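These figures can be verified with a few lines of Python:

```python
# Values taken from the confusion matrix above
TP, TN, FP, FN = 15, 10, 3, 2

accuracy = (TP + TN) / (TP + TN + FP + FN)   # 25/30
precision = TP / (TP + FP)                   # 15/18
print(f"Accuracy: {accuracy:.2%}, Precision: {precision:.2%}")  # both 83.33%
```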

Mnemonic: “Accuracy Counts All, Precision Picks Positives” (ACAPP)


## Question 2(a OR) [3 marks]

Identify basic steps of feature subset selection.

Answer:

Feature Subset Selection Steps:

```mermaid
flowchart LR
    A[Original Features] --> B[Generate Subsets]
    B --> C[Evaluate Subsets]
    C --> D[Select Best Subset]
```

Basic Steps:

  • Generation: Create different combinations of features
  • Evaluation: Test each subset using performance metrics
  • Selection: Choose optimal subset based on criteria
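One common way to realize this generate-evaluate-select loop is recursive feature elimination; a sketch with scikit-learn's RFE, with the bundled iris data assumed for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# RFE repeatedly drops the weakest feature until 2 remain
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2)
selector.fit(X, y)
print(selector.support_)   # boolean mask marking the selected subset
```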

Mnemonic: “Generate, Evaluate, Select” (GES)


## Question 2(b OR) [4 marks]

Discuss the strength and weakness of the KNN algorithm.

Answer:

KNN Algorithm Analysis:

| Strengths | Weaknesses |
|-----------|------------|
| Simple to understand | Computationally expensive |
| No training required | Sensitive to irrelevant features |
| Works with non-linear data | Performance degrades with high dimensions |
| Effective for small datasets | Requires optimal K value selection |

Key Points:

  • Lazy Learning: No explicit training phase required
  • Distance-Based: Classification based on neighbor proximity
  • Memory-Intensive: Stores entire training dataset
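A short scikit-learn sketch showing these traits: fit() merely stores the training set, and all distance work happens at prediction time (iris data used for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)   # K must be chosen carefully
knn.fit(X_train, y_train)                   # "lazy": just memorizes the data
print(knn.score(X_test, y_test))            # accuracy on held-out samples
```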

Mnemonic: “Simple but Slow, Effective but Expensive” (SBSEBE)


## Question 2(c OR) [7 marks]

Define Error-rate and for the following binary classifier’s confusion matrix, find the various measurement parameters like 1. Error value 2. Recall.

Answer:

Confusion Matrix Analysis:

| | Predicted No | Predicted Yes |
|---|--------------|---------------|
| Actual No | 20 (TN) | 3 (FP) |
| Actual Yes | 2 (FN) | 15 (TP) |

Calculations:

| Metric | Formula | Calculation | Result |
|--------|---------|-------------|--------|
| Error Rate | (FP+FN)/(TP+TN+FP+FN) | (3+2)/(15+20+3+2) | 12.5% |
| Recall | TP/(TP+FN) | 15/(15+2) | 88.24% |

Definitions:

  • Error Rate: Proportion of incorrect predictions out of total predictions
  • Recall: Proportion of actual positives correctly identified
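A quick check of both numbers, here by rebuilding label lists that match the matrix and letting scikit-learn compute recall:

```python
from sklearn.metrics import recall_score

# TN=20, FP=3, FN=2, TP=15 reconstructed as label lists
y_true = [0] * 23 + [1] * 17
y_pred = [0] * 20 + [1] * 3 + [0] * 2 + [1] * 15

errors = sum(t != p for t, p in zip(y_true, y_pred))
print(errors / len(y_true))            # 5/40 = 0.125 (12.5%)
print(recall_score(y_true, y_pred))    # 15/17 ≈ 0.8824
```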

Mnemonic: “Error Excludes, Recall Retrieves” (EERR)


## Question 3(a) [3 marks]

Give any three examples of unsupervised learning.

Answer:

Unsupervised Learning Examples:

| Example | Description | Application |
|---------|-------------|-------------|
| Customer Segmentation | Group customers by behavior | Marketing strategies |
| Document Classification | Organize documents by topics | Information retrieval |
| Gene Sequencing | Group similar DNA patterns | Medical research |

  • Market Basket Analysis: Finding product purchase patterns
  • Social Network Analysis: Identifying community structures
  • Anomaly Detection: Detecting unusual patterns in data

Mnemonic: “Customers, Documents, Genes Group Automatically” (CDGGA)


## Question 3(b) [4 marks]

Find Mean and Median for the following data: 4,6,7,8,9,12,14,15,20

Answer:

Statistical Calculations:

| Statistic | Calculation | Result |
|-----------|-------------|--------|
| Mean | (4+6+7+8+9+12+14+15+20)/9 | 10.56 |
| Median | Middle value (5th position) | 9 |

Step-by-step:

  • Data: Already sorted: 4,6,7,8,9,12,14,15,20
  • Mean: Sum all values ÷ count = 95 ÷ 9 = 10.56
  • Median: Middle value in sorted list = 9 (5th position)
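The same result from Python's standard library:

```python
import statistics

data = [4, 6, 7, 8, 9, 12, 14, 15, 20]
print(statistics.mean(data))    # 95/9 ≈ 10.56
print(statistics.median(data))  # 9
```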

Mnemonic: “Mean Averages All, Median Middle Value” (MAAMV)


## Question 3(c) [7 marks]

Describe k-fold cross validation method in detail.

Answer:

K-Fold Cross Validation Process:

```mermaid
flowchart TD
    A[Original Dataset] --> B[Split into K folds]
    B --> C[Train on K-1 folds]
    C --> D[Test on 1 fold]
    D --> E[Repeat K times]
    E --> F[Average Results]
```

Process Steps:

| Step | Description | Purpose |
|------|-------------|---------|
| 1. Data Division | Split data into K equal parts | Ensure balanced testing |
| 2. Iterative Training | Use K-1 folds for training | Maximum data utilization |
| 3. Validation | Test on remaining fold | Unbiased evaluation |
| 4. Averaging | Calculate mean performance | Robust performance estimate |

Advantages:

  • Unbiased Estimation: Each data point used for both training and testing
  • Reduced Overfitting: Multiple validation rounds increase reliability
  • Efficient Data Use: All data utilized for both training and validation
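A compact sketch with scikit-learn; the iris data and a decision tree are chosen only for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# cv=5: train on 4 folds, test on the remaining fold, repeat 5 times
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=5)
print(scores)          # one accuracy per fold
print(scores.mean())   # averaged, more robust estimate
```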

Mnemonic: “K-fold Keeps Keen Knowledge” (KKKK)


## Question 3(a OR) [3 marks]

Give any three applications of multiple linear regression.

Answer:

Multiple Linear Regression Applications:

| Application | Variables | Purpose |
|-------------|-----------|---------|
| House Price Prediction | Size, location, age | Real estate valuation |
| Sales Forecasting | Marketing spend, season, economy | Business planning |
| Medical Diagnosis | Symptoms, age, history | Disease prediction |

  • Stock Market Analysis: Multiple economic indicators predict stock prices
  • Academic Performance: Study hours, attendance, previous grades predict scores
  • Marketing ROI: Various marketing channels impact sales revenue

Mnemonic: “Houses, Sales, Medicine Predict Multiple Variables” (HSMPV)


## Question 3(b OR) [4 marks]

Find Standard Deviation for the following data: 4,15,20,28,35,45

Answer:

Standard Deviation Calculation:

| Step | Calculation | Value |
|------|-------------|-------|
| Mean | (4+15+20+28+35+45)/6 | 24.5 |
| Variance | Σ(xi − mean)²/n | 178.92 |
| Std Dev | √Variance | 13.38 |

Detailed Calculation:

  • Deviations from mean: −20.5, −9.5, −4.5, 3.5, 10.5, 20.5
  • Squared deviations: 420.25, 90.25, 20.25, 12.25, 110.25, 420.25
  • Sum: 1073.5
  • Variance: 1073.5/6 = 178.92
  • Standard Deviation: √178.92 = 13.38
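Since the solution divides by n (population variance), Python's statistics.pvariance/pstdev reproduce it directly:

```python
import statistics

data = [4, 15, 20, 28, 35, 45]
print(statistics.pvariance(data))  # 1073.5/6 ≈ 178.92
print(statistics.pstdev(data))     # √178.92 ≈ 13.38
```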

Mnemonic: “Deviation Measures Data Spread” (DMDS)


## Question 3(c OR) [7 marks]

Explain Bagging, Boosting in detail.

Answer:

Ensemble Methods Comparison:

| Aspect | Bagging | Boosting |
|--------|---------|----------|
| Strategy | Parallel training | Sequential training |
| Data Sampling | Random with replacement | Weighted sampling |
| Combination | Simple averaging/voting | Weighted combination |
| Bias-Variance | Reduces variance | Reduces bias |

Bagging (Bootstrap Aggregating):

```mermaid
flowchart LR
    A[Original Data] --> B[Bootstrap Sample 1]
    A --> C[Bootstrap Sample 2]
    A --> D[Bootstrap Sample n]
    B --> E[Model 1]
    C --> F[Model 2]
    D --> G[Model n]
    E --> H[Final Prediction]
    F --> H
    G --> H
```

Boosting Process:

  • Sequential Learning: Each model learns from previous model’s mistakes
  • Weight Adjustment: Increase weight of misclassified examples
  • Final Prediction: Weighted combination of all models

Key Differences:

  • Bagging: Independent models trained in parallel, reduces overfitting
  • Boosting: Dependent models trained sequentially, improves accuracy
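Both ensembles are available in scikit-learn; a minimal comparison sketch (iris data assumed for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Bagging: independent trees on bootstrap samples, majority vote
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50)

# Boosting: sequential learners; misclassified points get more weight
boost = AdaBoostClassifier(n_estimators=50)

for name, model in [("Bagging", bag), ("Boosting", boost)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```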

Mnemonic: “Bagging Builds Parallel, Boosting Builds Sequential” (BBPBS)


## Question 4(a) [3 marks]

Define: Support, Confidence.

Answer:

Association Rule Metrics:

| Metric | Definition | Formula |
|--------|------------|---------|
| Support | Frequency of itemset in transactions | Support(A) = Count(A)/Total transactions |
| Confidence | Conditional probability of rule | Confidence(A→B) = Support(A∪B)/Support(A) |

Example:

  • Support(Bread) = 0.6 (60% transactions contain bread)
  • Confidence(Bread→Butter) = 0.8 (80% of bread buyers also buy butter)

Applications:

  • Market Basket Analysis: Finding product associations
  • Recommendation Systems: Suggesting related items
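Both formulas in plain Python, on a toy transaction list (the numbers here are illustrative, not the 0.6/0.8 from the example above):

```python
transactions = [
    {"bread", "butter"}, {"bread", "milk"}, {"bread", "butter", "milk"},
    {"milk"}, {"bread", "butter"},
]
n = len(transactions)

support_bread = sum("bread" in t for t in transactions) / n              # 4/5
support_both = sum({"bread", "butter"} <= t for t in transactions) / n   # 3/5
confidence = support_both / support_bread   # Confidence(Bread→Butter)
print(support_bread, confidence)            # 0.8 0.75
```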

Mnemonic: “Support Shows Frequency, Confidence Shows Connection” (SSFC)


## Question 4(b) [4 marks]

Illustrate any two applications of logistic regression.

Answer:

Logistic Regression Applications:

| Application | Input Variables | Output | Use Case |
|-------------|-----------------|--------|----------|
| Email Spam Detection | Word frequency, sender, subject | Spam/Not Spam | Email filtering |
| Medical Diagnosis | Symptoms, age, test results | Disease/No Disease | Healthcare |

Key Features:

  • Binary Classification: Predicts probability between 0 and 1
  • S-shaped Curve: Uses sigmoid function for probability estimation
  • Linear Decision Boundary: Separates classes with linear boundary

Real-world Examples:

  • Marketing: Customer purchase probability based on demographics
  • Finance: Credit approval based on credit history and income
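A brief scikit-learn sketch in the medical-diagnosis spirit; the bundled breast-cancer dataset stands in for real patient data:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print(clf.predict_proba(X_test[:3]))   # sigmoid outputs: probabilities in [0, 1]
print(clf.score(X_test, y_test))       # classification accuracy
```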

Mnemonic: “Logistic Limits Linear Logic” (LLLL)


## Question 4(c) [7 marks]

Discuss the main purpose of Numpy and Pandas in machine learning.

Answer:

NumPy and Pandas in ML:

| Library | Purpose | Key Features |
|---------|---------|--------------|
| NumPy | Numerical computing | Arrays, mathematical functions |
| Pandas | Data manipulation | DataFrames, data cleaning |

NumPy Functions:

```mermaid
graph LR
    A[NumPy] --> B[Array Operations]
    A --> C[Mathematical Functions]
    A --> D[Linear Algebra]
    A --> E[Random Numbers]
```

Pandas Capabilities:

  • Data Import/Export: Read CSV, Excel, JSON files
  • Data Cleaning: Handle missing values, duplicates
  • Data Transformation: Group, merge, pivot operations
  • Statistical Analysis: Descriptive statistics, correlation

Integration with ML:

  • Data Preprocessing: Clean and prepare data for algorithms
  • Feature Engineering: Create new features from existing data
  • Model Input: Convert data to format required by ML algorithms

Key Benefits:

  • Performance: Optimized C/C++ backend for speed
  • Memory Efficiency: Efficient data storage and manipulation
  • Ecosystem Integration: Works seamlessly with scikit-learn, matplotlib
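A tiny end-to-end taste of both libraries; the column names are made up for illustration:

```python
import numpy as np
import pandas as pd

# NumPy: fast vectorized math on homogeneous arrays
arr = np.array([[150.0, 50.0], [160.0, np.nan], [170.0, 70.0]])
print(arr.shape, arr[:, 0].mean())   # array shape and a column mean

# Pandas: labeled data plus cleaning and summaries
df = pd.DataFrame(arr, columns=["height", "weight"])
df = df.fillna(df.mean())            # handle the missing value
print(df.describe())                 # descriptive statistics
```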

Mnemonic: “NumPy Numbers, Pandas Processes Data” (NNPD)


## Question 4(a OR) [3 marks]

Give any three examples of Supervised Learning.

Answer:

Supervised Learning Examples:

| Example | Type | Input → Output |
|---------|------|----------------|
| Email Classification | Classification | Email features → Spam/Not Spam |
| House Price Prediction | Regression | House features → Price |
| Image Recognition | Classification | Pixel values → Object class |

  • Medical Diagnosis: Patient symptoms → Disease classification
  • Stock Price Prediction: Market indicators → Future price
  • Speech Recognition: Audio signals → Text transcription

Mnemonic: “Emails, Houses, Images Learn Supervised” (EHILS)


## Question 4(b OR) [4 marks]

Explain any two applications of the apriori algorithm.

Answer:

Apriori Algorithm Applications:

| Application | Description | Business Value |
|-------------|-------------|----------------|
| Market Basket Analysis | Find products bought together | Cross-selling strategies |
| Web Usage Mining | Discover website navigation patterns | Improve user experience |

Market Basket Analysis:

  • Example: “Customers who buy bread and milk also buy eggs”
  • Business Impact: Product placement, promotional offers
  • Implementation: Analyze transaction data to find frequent itemsets

Web Usage Mining:

  • Example: “Users visiting page A often visit page B next”
  • Website Optimization: Improve navigation, recommend content
  • User Experience: Personalized website layouts

Algorithm Process:

  • Generate Candidates: Create frequent itemsets
  • Prune: Remove infrequent items
  • Generate Rules: Create association rules with confidence
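A sketch of this process using the third-party mlxtend library (assumed installed; its apriori and association_rules functions implement the generate-prune-rules loop):

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder

transactions = [["bread", "milk"], ["bread", "butter"],
                ["bread", "milk", "butter"], ["milk", "eggs"]]

# One-hot encode the transactions, then mine itemsets and rules
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions),
                  columns=te.columns_)
itemsets = apriori(df, min_support=0.5, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```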

Mnemonic: “Apriori Analyzes Associations Automatically” (AAAA)


## Question 4(c OR) [7 marks]

Explain the features and applications of Matplotlib.

Answer:

Matplotlib Features and Applications:

| Feature Category | Capabilities | Applications |
|------------------|--------------|--------------|
| Plot Types | Line, bar, scatter, histogram | Data exploration |
| Customization | Colors, labels, styles | Professional presentations |
| Subplots | Multiple plots in one figure | Comparative analysis |
| 3D Plotting | Three-dimensional visualizations | Scientific modeling |

Key Features:

```mermaid
graph TD
    A[Matplotlib] --> B[2D Plotting]
    A --> C[3D Plotting]
    A --> D[Interactive Plots]
    A --> E[Publication Quality]
    B --> F[Line Charts]
    B --> G[Bar Charts]
    B --> H[Scatter Plots]
    C --> I[Surface Plots]
    C --> J[3D Scatter]
```

Applications in Machine Learning:

  • Data Exploration: Visualize data distribution and patterns
  • Model Performance: Plot accuracy, loss curves during training
  • Result Presentation: Display predictions vs actual values
  • Feature Analysis: Correlation matrices, feature importance plots

Advanced Capabilities:

  • Animation: Create animated plots for time-series data
  • Interactive Widgets: Add sliders, buttons for user interaction
  • Integration: Works with Jupyter notebooks, web applications

Benefits:

  • Flexibility: Highly customizable plotting options
  • Community: Large user base with extensive documentation
  • Compatibility: Integrates with NumPy, Pandas seamlessly
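A small illustration of the ML plotting tasks above, using subplots for side-by-side loss and accuracy curves (the values are made up):

```python
import matplotlib.pyplot as plt
import numpy as np

epochs = np.arange(1, 11)
loss = 1 / epochs              # illustrative training loss
acc = 1 - 0.5 / epochs         # illustrative accuracy

# Subplots: two related panels in one figure
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.plot(epochs, loss, "r-o")
ax1.set_title("Training Loss")
ax2.plot(epochs, acc, "g-o")
ax2.set_title("Accuracy")
plt.tight_layout()
plt.show()
```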

Mnemonic: “Matplotlib Makes Meaningful Visual Displays” (MMVD)


## Question 5(a) [3 marks]

List out the major features of Numpy.

Answer:

NumPy Major Features:

| Feature | Description | Benefit |
|---------|-------------|---------|
| N-dimensional Arrays | Efficient array operations | Fast mathematical computations |
| Broadcasting | Operations on different sized arrays | Flexible array manipulation |
| Linear Algebra | Matrix operations, decompositions | Scientific computing support |

  • Universal Functions: Element-wise operations on arrays
  • Memory Efficiency: Contiguous memory layout for speed
  • C/C++ Integration: Interface with compiled languages
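Each feature in a line or two of NumPy:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # 2x3 N-dimensional array
b = np.array([10, 20, 30])       # shape (3,)

print(a + b)        # broadcasting: b is stretched across both rows
print(np.sqrt(a))   # universal function applied element-wise
print(a @ a.T)      # linear algebra: matrix product with the transpose
```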

Mnemonic: “NumPy Numbers Need Neat Operations” (NNNNO)


## Question 5(b) [4 marks]

How to load an iris dataset csv file in a Pandas Dataframe program? Explain with example.

Answer:

Loading Iris Dataset:

```python
import pandas as pd

# Method 1: Load from file
df = pd.read_csv('iris.csv')

# Method 2: Load from sklearn
from sklearn.datasets import load_iris
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target

# Display basic information
print(df.head())
print(df.info())
print(df.describe())
```

Code Explanation:

  • pd.read_csv(): Reads CSV file into DataFrame
  • columns parameter: Assigns column names
  • head(): Shows first 5 rows
  • info(): Displays data types and memory usage

Mnemonic: “Pandas Reads CSV Files Easily” (PRCFE)


## Question 5(c) [7 marks]

Compare and Contrast Supervised Learning and Unsupervised Learning.

Answer:

Comprehensive Comparison:

| Aspect | Supervised Learning | Unsupervised Learning |
|--------|---------------------|-----------------------|
| Data Type | Labeled (input-output pairs) | Unlabeled (input only) |
| Learning Goal | Predict target variable | Discover hidden patterns |
| Evaluation | Accuracy, precision, recall | Silhouette score, inertia |
| Complexity | Less complex to evaluate | More complex to validate |
| Applications | Classification, regression | Clustering, dimensionality reduction |

Detailed Comparison:

```mermaid
graph LR
    A[Machine Learning] --> B[Supervised]
    A --> C[Unsupervised]
    B --> D[Classification]
    B --> E[Regression]
    C --> F[Clustering]
    C --> G[Association Rules]
```

Supervised Learning Characteristics:

  • Training Process: Learn from examples with known correct answers
  • Performance Measurement: Direct comparison with actual outcomes
  • Common Algorithms: Decision trees, SVM, neural networks
  • Business Applications: Fraud detection, medical diagnosis, price prediction

Unsupervised Learning Characteristics:

  • Exploration: Find unknown patterns without guidance
  • Validation Challenges: No ground truth for direct comparison
  • Common Algorithms: K-means, hierarchical clustering, PCA
  • Business Applications: Customer segmentation, market research, anomaly detection

Key Contrasts:

  • Feedback: Supervised has immediate feedback, unsupervised relies on domain expertise
  • Data Requirements: Supervised needs expensive labeled data, unsupervised uses readily available unlabeled data
  • Problem Types: Supervised solves prediction problems, unsupervised solves discovery problems
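The contrast in code, on the same data (iris, via scikit-learn; chosen only for illustration):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Supervised: learns from the labels y, judged by accuracy
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.score(X, y))

# Unsupervised: never sees y, simply groups similar rows
km = KMeans(n_clusters=3, n_init=10).fit(X)
print(km.labels_[:10])   # cluster IDs, not class labels
```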

Mnemonic: “Supervised Seeks Specific Solutions, Unsupervised Uncovers Unknown” (SSSUU)


## Question 5(a OR) [3 marks]

List out the applications of Pandas.

Answer:

Pandas Applications:

| Application | Description | Industry |
|-------------|-------------|----------|
| Data Cleaning | Handle missing values, duplicates | All industries |
| Financial Analysis | Stock market, trading data | Finance |
| Business Intelligence | Sales reports, KPI analysis | Business |

  • Scientific Research: Experimental data analysis
  • Web Analytics: Website traffic, user behavior analysis
  • Healthcare: Patient records, clinical trial data

Mnemonic: “Pandas Processes Data Perfectly” (PPDP)


## Question 5(b OR) [4 marks]

How to plot a vertical line and horizontal line in matplotlib? Explain with examples.

Answer:

Matplotlib Line Plotting:

```python
import matplotlib.pyplot as plt
import numpy as np

# Create sample data
x = np.linspace(0, 10, 100)
y = np.sin(x)

# Plot the main curve
plt.plot(x, y, label='sin(x)')

# Vertical line at x = 5
plt.axvline(x=5, color='red', linestyle='--', label='Vertical Line')

# Horizontal line at y = 0.5
plt.axhline(y=0.5, color='green', linestyle=':', label='Horizontal Line')

# Formatting
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.title('Vertical and Horizontal Lines')
plt.grid(True)
plt.show()
```

Key Functions:

  • axvline(): Creates vertical line at specified x-coordinate
  • axhline(): Creates horizontal line at specified y-coordinate
  • Parameters: color, linestyle, linewidth, alpha

Mnemonic: “Matplotlib Makes Lines Easily” (MMLE)


## Question 5(c OR) [7 marks]

Describe the concept of clustering using appropriate real-world examples.

Answer:

Clustering Concept and Applications:

| Clustering Type | Real-World Example | Business Impact |
|-----------------|--------------------|-----------------|
| Customer Segmentation | Group customers by purchase behavior | Targeted marketing campaigns |
| Image Segmentation | Medical imaging for tumor detection | Improved diagnosis accuracy |
| Gene Analysis | Group genes with similar expression | Drug discovery and treatment |

Clustering Process:

```mermaid
flowchart TD
    A[Raw Data] --> B[Feature Selection]
    B --> C[Distance Calculation]
    C --> D[Cluster Formation]
    D --> E[Cluster Validation]
    E --> F[Business Insights]
```

Detailed Examples:

1. Customer Segmentation:

  • Data: Purchase history, demographics, website behavior
  • Clusters: High-value customers, price-sensitive buyers, occasional shoppers
  • Business Value: Customized marketing, product recommendations, retention strategies

2. Social Media Analysis:

  • Data: User interactions, post topics, engagement patterns
  • Clusters: Influencers, casual users, brand advocates
  • Applications: Viral marketing, content strategy, community management

3. Market Research:

  • Data: Survey responses, product preferences, demographics
  • Clusters: Market segments with similar needs
  • Insights: Product development, pricing strategy, market positioning

Clustering Algorithms:

  • K-Means: Partitions data into k clusters
  • Hierarchical: Creates tree-like cluster structure
  • DBSCAN: Finds clusters of varying density
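A K-Means sketch on hypothetical customer-segmentation data (the features are invented for illustration), including the silhouette check described next:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical features: [annual spend, visits per month]
customers = np.array([[200, 2], [220, 3], [1500, 12],
                      [1600, 10], [50, 1], [60, 1]])

km = KMeans(n_clusters=3, n_init=10).fit(customers)
print(km.labels_)                               # segment of each customer
print(silhouette_score(customers, km.labels_))  # cluster-quality measure
```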

Validation Methods:

  • Silhouette Score: Measures cluster quality
  • Elbow Method: Determines optimal number of clusters
  • Domain Expertise: Business knowledge validation

Benefits:

  • Pattern Discovery: Reveals hidden data structures
  • Decision Support: Provides insights for business decisions
  • Automation: Reduces manual data analysis effort

Mnemonic: “Clustering Creates Clear Categories” (CCCC)
