LO3 – Software Engineering Approach to Software and Systems Development
Design and Modeling of the Phishing Detection Solution
The Software Solution, is based on heuristics implementation of an expert system that is capable of detecting possible phishing websites using URL, Domain, and Abnormal features. The proposed system will be capable of accepting a URL as user input and detect whether it is a legitimate, suspicious or a phished URL. This determination will be based on several grouped heuristics implementations as per feature category.
The system architecture entails a User Interface design used for URL input by a user, which then displays the results to the user. Upon input of the URL by the user, the phishing detection system utilizes standard inbuilt JAVA functions to extract the resulting URL’s site features, which are collected and utilized in the URL input classification phase. This is illustrated in figure 1 below, which represents the system architecture. It involves five major steps: (1) User Input, (2) Features Extraction, (3) Classification of the URL, (4) Results Output, and (5) Evaluation.
User Input – Involves input of the URL (either legitimate or Phishing) or upload of URL containing file. Systems then extracts URL features such as: number of page visits, number of visitors to URL, giving an overview of URL category.
Feature Extraction – Relevant URL features are extracted to help categorize whether legitimate or phishing URLs based on three groups: URL address based features, domain-based features, and abnormal-based features.
URL Classification – Entails subjecting URL features to different heuristics to determine whether it is legitimate, suspicious or phished.
Classified URLs Output – Classification results are displayed in the form of tables and graphs providing a visual representation of the URLs. In addition, in case of phishing category, a severity scale (of 1-5) is applied to each URL to gauge its impact.
Evaluation – Classification results are evaluated based on recall and precision methods. A confusion matrix is applied in the evaluation of the classification process using true negatives, false negatives, false positives and true positives.
Figure 1: System Architecture of the Phishing Detection System Based on Heuristics
Use Case Diagrams
Phishing Detection Based on Heuristics Use Case Diagram
URL-related Features Use Case Diagram
Abnormal-based Features Use Case Diagram
Domain-based Features Use Case Diagram
System Class Diagram
Features Class Diagram
Classification Class Diagram
Sequence Diagrams / Communication Diagrams
URL Classification and Evaluation
Data Verification and Validation
Data verification and validation of the phishing detection system based on heuristics will be based on Use Case Testing, where negative and positive test cases will be executed. This is will involve feeding the system with both malicious and legitimate URLs to validate and verify whether URL classification is accurate, correct or not. In addition, various heuristics combinations will be applied.
Unit testing method will be applied to validate each software module unit as per the Usecase model. For each test case the following elements will be specified for system validation: the test case description; the input URL; heuristics used; results; and severity of results.
Importance of Data Verification and Validation
Ensure correct implementation of system data flows as outlined in the system design specifications
Ensure adequacy of implemented system, where the system solution satisfies proposed requirements
Eliminate possible redundancy, by removing unnecessary system design elements, hence significantly improving system design quality
Promote logical consistency of solution design across the entire system, while eliminating any contradictory portions
Ensure there is internal completeness of the system design based on various conditions combinations
Promote design principle compliance, such as, low coupling, high cohesion, and concerns separation
To ensure module integration and interfacing is accurate through explicit communication between modules (Kung 2007)
Kendall, K & Kendall, J 2011, Systems Analysis and Design, Prentice Hall, New Jersey, USA.
Kung, D 2007, ‘Software Verification and Validation’, Encyclopedia of Computer Science and Engineering, viewed 2 March 2017