Naive Bayes Classification Using Python Programming

Name: Tejas Sahoo
Roll No: K057 Branch: BTech Cyber Security

Aim:

To implement supervised classification in Python using the Naive Bayes classifier.

Introduction:

Naive Bayes is a probabilistic classifier based on Bayes' theorem. It assumes that the features are conditionally independent of one another given the class label, a simplifying ("naive") assumption that often does not hold in real data.
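For reference, the standard form of the rule: for a class y and features x_1 to x_n, Bayes' theorem combined with the conditional-independence assumption gives the posterior up to a normalising constant (the denominator is the same for every class, so it can be dropped), and the predicted class is the one that maximises it.

```latex
\begin{align}
P(y \mid x_1, \dots, x_n) &= \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}
  \propto P(y) \prod_{i=1}^{n} P(x_i \mid y) \\
\hat{y} &= \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
\end{align}
```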

Advantages:

  1. Extremely fast for both training and prediction.
  2. Very interpretable and requires few parameters.

Applications:

  1. Text classification and spam filtering.
  2. Predictive modeling in various domains.
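As an illustration of the first application, a minimal spam-filter sketch, assuming scikit-learn's CountVectorizer for word counts and MultinomialNB (the Naive Bayes variant usually paired with word-count features). The six messages and their labels are made up for the example.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy training messages and labels (1 = spam, 0 = not spam)
messages = [
    "win a free prize now",
    "claim your free cash prize",
    "free entry win cash now",
    "meeting moved to monday",
    "can you send the lab report",
    "see you at the library tomorrow",
]
labels = [1, 1, 1, 0, 0, 0]

# Turn each message into a vector of word counts
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(messages)

# Learn P(word | class) from the counts (Laplace smoothing, alpha=1.0 by default)
clf = MultinomialNB()
clf.fit(X_train, labels)

# Classify unseen messages with the same vocabulary; words never seen in training are ignored
new_messages = ["free cash prize", "send the report by monday"]
predictions = clf.predict(vectorizer.transform(new_messages))
print(predictions)  # [1 0]
```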

Output

Description (pandas DataFrame.head):

Return the first n rows.

This function returns the first n rows of the object based on position. It is useful for quickly checking whether the object contains the right type of data.

For negative values of n, it returns all rows except the last n rows, equivalent to df[:-n].
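A short sketch of the behaviour described above, using a small hypothetical DataFrame (the column names and values are made up for illustration):

```python
import pandas as pd

# Hypothetical example data: 6 rows, 2 columns
df = pd.DataFrame({"feature": [5.1, 4.9, 6.3, 5.8, 7.1, 6.5],
                   "status": ["A", "A", "B", "B", "C", "C"]})

# First 5 rows (n defaults to 5)
print(df.head())

# First 3 rows
first_three = df.head(3)

# Negative n: every row except the last 2, the same as df[:-2]
all_but_last_two = df.head(-2)
print(all_but_last_two.equals(df[:-2]))  # True
```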

Accuracy:

    from sklearn.metrics import accuracy_score
    print(accuracy_score(y_test, y_pred) * 100)
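For context, a minimal end-to-end sketch of where y_test and y_pred in the accuracy line come from. It assumes scikit-learn's GaussianNB and uses the bundled breast-cancer dataset as a stand-in, because the practical's own dataset and its status column are not shown here; the 25% test split and random_state are also illustrative choices, so the printed accuracy will not match the 87.69% reported in the conclusion.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Stand-in dataset (assumption): 569 samples, 30 numeric features, binary target
X, y = load_breast_cancer(return_X_y=True)

# Hold out 25% of the rows as a test set; stratify keeps class proportions similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

# Train the Naive Bayes classifier and predict on the unseen test rows
model = GaussianNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Accuracy as a percentage, as in the accuracy line above
accuracy = accuracy_score(y_test, y_pred) * 100
print(f"Accuracy: {accuracy:.2f}%")

# Table of actual vs predicted labels for the test rows
comparison = pd.DataFrame({"actual": y_test, "predicted": y_pred})
print(comparison.head())
```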

b) What is the significance of data preprocessing in classification?

Data preprocessing cleans raw data and converts it into a form a machine learning model can use, which also improves the model's accuracy and efficiency. It is a data mining technique that transforms raw data into an understandable format. Raw (real-world) data is often incomplete, noisy or inconsistent, and feeding it to a model as-is typically causes errors, since most classifiers expect complete, numeric input.
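To make the last point concrete, a small sketch assuming scikit-learn: GaussianNB rejects a feature matrix that contains a missing value (NaN) and fits once the gap is filled. Mean imputation with SimpleImputer is just one possible choice, and the numbers are hypothetical.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import GaussianNB

# Hypothetical raw data: one feature value is missing (np.nan)
X_raw = np.array([[1.0, 20.0],
                  [2.0, np.nan],
                  [3.0, 22.0],
                  [8.0, 40.0],
                  [9.0, 41.0],
                  [10.0, 39.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Fitting on the raw data fails: scikit-learn validates the input and raises ValueError on NaN
try:
    GaussianNB().fit(X_raw, y)
    raw_fit_failed = False
except ValueError as err:
    raw_fit_failed = True
    print("Raw data rejected:", err)

# Preprocessing step: replace each missing value with its column mean, then fit again
X_clean = SimpleImputer(strategy="mean").fit_transform(X_raw)
model = GaussianNB().fit(X_clean, y)
print(model.predict([[2.5, 21.0]]))
```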

It typically involves the following steps:

● Getting the dataset
● Importing libraries
● Importing the dataset
● Finding missing data
● Encoding categorical data
● Splitting the dataset into training and test sets
● Feature scaling
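The steps above can be sketched in Python as follows, assuming pandas and scikit-learn. The small table, its column names (age, city, income, status) and the choice of mean imputation, one-hot encoding and standard scaling are illustrative assumptions; in practice the dataset would usually be read from a file, for example with pd.read_csv.

```python
# Importing libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Getting / importing the dataset (hypothetical inline data instead of a CSV file)
df = pd.DataFrame({
    "age":    [25, 32, np.nan, 45, 38, 29, 50, 41],
    "city":   ["Pune", "Delhi", "Pune", "Mumbai", "Delhi", "Mumbai", "Pune", "Delhi"],
    "income": [30.0, 45.0, 38.0, np.nan, 52.0, 35.0, 60.0, 48.0],
    "status": ["no", "yes", "no", "yes", "yes", "no", "yes", "yes"],
})

# Finding missing data: count NaNs per column, then fill numeric gaps with the column mean
print(df.isna().sum())
numeric_cols = ["age", "income"]
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())

# Encoding categorical data: one-hot encode the city feature, map the target to 0/1
X = pd.get_dummies(df.drop(columns="status"), columns=["city"], dtype=float)
y = df["status"].map({"no": 0, "yes": 1})

# Splitting the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Feature scaling: fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

For simplicity the imputation means here are computed over the whole table; a stricter setup would compute them on the training rows only, as is done for the scaler.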

Conclusion:

We successfully applied supervised data mining and obtained a Naive Bayes model with a test-set accuracy of 87.69%. We also generated a list of actual vs predicted status values from the model's predictions.