ABSTRACT

In this paper, we propose a feature-free method for detecting phishing websites using the Normalized Compression Distance (NCD), a parameter-free similarity measure that computes the similarity of two websites by compressing them, eliminating the need for feature extraction and removing any dependence on a specific set of website features. The method examines the HTML of webpages and computes their similarity to known phishing websites in order to classify them. We use the Furthest Point First algorithm to extract phishing prototypes, selecting instances that are representative of a cluster of phishing webpages. We also introduce an incremental learning algorithm as a framework for continuous and adaptive detection that does not require extracting new features when concept drift occurs. On a large dataset, our proposed method significantly outperforms previous methods in detecting phishing websites, with an AUC score of 98.68% and a true positive rate (TPR) of around 90%, while maintaining a low false positive rate (FPR) of 0.58%. Because our approach uses prototypes, it eliminates the need to retain long-term data, and it is feasible to deploy in real systems, with a processing time of roughly 0.3 seconds.
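
The two building blocks named in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the paper's implementation: it uses the standard NCD definition, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), with zlib standing in as the compressor C (the abstract does not say which compressor is used), and a greedy Furthest Point First selection that starts from the first page (an arbitrary choice). The function names are hypothetical.

```python
import zlib
from typing import List


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input (compressor choice is an assumption)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def furthest_point_first(pages: List[bytes], k: int) -> List[int]:
    """Greedy FPF: return indices of up to k prototypes, each one the page
    furthest (by NCD) from the prototypes already chosen."""
    if not pages or k <= 0:
        return []
    chosen = [0]  # arbitrary starting prototype (assumption)
    # Distance from every page to its nearest chosen prototype so far.
    nearest = [ncd(p, pages[0]) for p in pages]
    while len(chosen) < min(k, len(pages)):
        candidates = [i for i in range(len(pages)) if i not in chosen]
        nxt = max(candidates, key=lambda i: nearest[i])
        chosen.append(nxt)
        for i, p in enumerate(pages):
            nearest[i] = min(nearest[i], ncd(p, pages[nxt]))
    return chosen


def distance_to_prototypes(page: bytes, prototypes: List[bytes]) -> float:
    """Smallest NCD between a page and any phishing prototype."""
    return min(ncd(page, proto) for proto in prototypes)
```

A new page's HTML could then be scored with `distance_to_prototypes` and flagged when the distance falls below a threshold tuned on validation data; the abstract does not give a threshold, so one would have to be chosen empirically.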

EXISTING SYSTEM

  • Malicious websites are the basis of most criminal activity on the internet.
  • The dangers posed by malicious sites are enormous, and end users must be prevented from visiting them.
  • Users should avoid clicking on the Uniform Resource Locators (URLs) of such sites.
  • To prevent such attacks, the paper proposes using machine learning algorithms to detect phishing websites.
  • The existing PWD (Phishing Website Detection) model is trained on an existing dataset of URLs, each with unique features, and is applied to three machine learning classifiers: support vector machine, logistic regression and Naïve Bayes.
  • After training and testing the algorithms, the Naïve Bayes classifier was observed to record the highest accuracy.

DISADVANTAGES

  • Low accuracy due to training loss
  • Many website features are not taken into consideration

PROPOSED SYSTEM

  • Collect a dataset containing phishing and legitimate websites from open-source platforms.
  • Write code to extract the required features from the URL database.
  • Analyze and preprocess the dataset using exploratory data analysis (EDA) techniques.
  • Divide the dataset into training and testing sets.
  • Run the selected machine learning algorithms and a deep neural network (DNN) on the dataset.
  • Write code to display the evaluation results using accuracy metrics.
  • Compare the results obtained for the trained models and identify which performs better.
  • DNN: This is a supervised classification algorithm that is easy to use. It can be used for both classification and regression, but it is better known for classification. Each data item is plotted as a point in an n-dimensional space, where n is the number of features of the data. Classification is then performed by separating the classes, that is, the groups of data points that lie in different regions of that space.
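
The feature extraction, dataset split and evaluation steps listed above can be sketched with the Python standard library alone. This is a minimal sketch under assumptions: the source does not list which URL features it extracts, so the lexical features below are hypothetical examples, and the function names, the 80/20 split and the fixed seed are illustrative choices. Labels are assumed to be 1 for phishing and 0 for legitimate.

```python
import random
from typing import Dict, List, Sequence, Tuple
from urllib.parse import urlparse


def extract_url_features(url: str) -> Dict[str, int]:
    """Hypothetical lexical URL features (the source does not name its feature set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots": host.count("."),
        "has_at_symbol": int("@" in url),
        "has_hyphen_in_host": int("-" in host),
        "uses_https": int(parsed.scheme == "https"),
        "host_is_ip": int(host.replace(".", "").isdigit()),
    }


def train_test_split(rows: Sequence, test_ratio: float = 0.2,
                     seed: int = 42) -> Tuple[List, List]:
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, true positive rate and false positive rate for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "tpr": tp / (tp + fn) if (tp + fn) else 0.0,
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
    }
```

Any classifier, including the DNN, can be placed between the split and the evaluation step. Reporting TPR and FPR alongside accuracy makes the comparison between trained models more informative when the phishing and legitimate classes are imbalanced.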

ADVANTAGES

  • Provides a clear idea of how effective each classifier is at phishing detection.
  • High accuracy, obtained by combining the strengths of multiple classifiers.
  • Fast classification, low memory consumption, adaptation over time, and online operation.

HARDWARE AND SOFTWARE REQUIREMENTS

Software Requirements:

  • Front End – Anaconda IDE
  • Backend – SQL
  • Language – Python 3.8

Hardware Requirements:

  • Hard Disk: Greater than 500 GB
  • RAM: Greater than 4 GB
  • Processor: Intel Core i3 or above

INCLUDED PACKAGES

  • Base Paper
  • Complete Source Code
  • Complete Documentation
  • Complete Presentation Slides
  • Flow Diagram
  • Database File
  • Screenshots
  • Execution Procedure
  • Readme File
  • Addons
  • Video Tutorials