Protect Your Online Security

Deep Learning-Based Phishing URL Intelligent Detection System

We use advanced artificial intelligence to provide high-precision phishing website detection and help protect your personal information.

System Features

High Accuracy

99.8% Detection Accuracy

Our detection model is trained on data from millions of websites and identifies a wide range of phishing website features. It reaches a detection accuracy of up to 99.8%, well above traditional rule-based detection methods.

Real-time Detection

Millisecond-level Response

Based on an optimized neural network architecture, our system can complete URL feature extraction and risk assessment within milliseconds, providing immediate security protection.

Batch Detection

Supports Multi-URL Analysis

The system supports batch URL detection and file import. Enterprise users can check hundreds or thousands of URLs at once, which improves efficiency and makes the system suitable for large-scale security audits.

System Statistics

200,000+ URLs Scanned
99.8% Detection Accuracy
100,000+ Phishing Sites Identified
1,000+ Active Users

Phishing Website Type Distribution

Monthly Detection Trend

Technology Implementation

System Design

System Design Architecture Diagram

Our system utilizes a multi-stage processing workflow to detect phishing URLs through feature extraction and deep learning models. The system primarily includes the following components:

Feature Extraction Module

Manual Features
  • URL Length
  • Special Character Count
  • Numeric Ratio
  • Sensitive Word Detection
  • Domain Length
Automatic Features
  • Branch 1: Character-level Features
  • Branch 2: Word-level Features
  • Branch 3: N-grams Analysis
  • Branch 4: TF-IDF Vectorization
Domain Features
  • pr_pos: Position Feature
  • pr_val: Value Feature
  • harmonic_pos: Harmonic Position
  • harmonic_val: Harmonic Value
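
The five manual features listed above can be computed directly from the URL string. The following is a minimal sketch, not the system's actual code: the function name `manual_features`, the `SENSITIVE_WORDS` list, and the definition of a "special character" as any non-alphanumeric character are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hypothetical word list; the page does not say which words count as sensitive.
SENSITIVE_WORDS = ("login", "secure", "account", "verify", "update", "bank")


def manual_features(url: str) -> dict:
    """Compute the five manual features for a single URL."""
    host = urlparse(url).hostname or ""
    lowered = url.lower()
    digits = sum(ch.isdigit() for ch in url)
    # Assumption: "special" means any character that is not a letter or digit.
    specials = sum(not ch.isalnum() for ch in url)
    return {
        "url_length": len(url),
        "special_char_count": specials,
        "numeric_ratio": digits / len(url) if url else 0.0,
        "has_sensitive_word": int(any(w in lowered for w in SENSITIVE_WORDS)),
        "domain_length": len(host),
    }


features = manual_features("http://paypa1-secure.example.com/login?id=123")
```

The resulting dictionary can be turned into a fixed-order numeric vector before it is combined with the automatic and domain features.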

Detection Process

  1. Data Input → Retrieve URL samples from URLhaus and Common Crawl
  2. Similarity Filtering → Ensure Data Set Diversity
  3. Feature Extraction → Generate Multi-dimensional Feature Vectors
  4. Model Prediction → Use MGCF-Net Deep Learning Model
  5. Output Classification → Phishing URL/Legitimate URL
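
The similarity filtering step in the process above is not specified in detail on this page. One possible approach, sketched here under assumptions, is a greedy filter that drops a URL when it is too similar to one already kept. The use of `difflib.SequenceMatcher`, the `filter_similar` name, and the 0.9 threshold are illustrative choices, not the system's documented method.

```python
from difflib import SequenceMatcher


def filter_similar(urls, threshold=0.9):
    """Greedily keep a URL only if it is not too similar to any URL already kept."""
    kept = []
    for url in urls:
        # ratio() returns a similarity score between 0.0 and 1.0.
        if all(SequenceMatcher(None, url, k).ratio() < threshold for k in kept):
            kept.append(url)
    return kept


samples = [
    "http://example.com/login",
    "http://example.com/login1",  # near-duplicate of the first URL
    "http://another-site.org/index.html",
]
diverse = filter_similar(samples)
```

This pairwise comparison is quadratic in the number of URLs, so a data set with hundreds of thousands of samples would likely need a cheaper approximation such as hashing or blocking by domain.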

Data Set Construction

Our data set is built from URLhaus and Common Crawl. The construction process is as follows:

Data Source | Sample Quantity | Processing Method
URLhaus (Phishing URLs) | 368,319 samples | 50% of the data set composition
Common Crawl (Legitimate URLs) | About 370,000 samples | 50% of the data set composition
Similarity Filtering | About 200,000 samples | Used for model training
Adversarial Sample Generation | About 32,000 samples | Enhances model robustness
Data Set Construction Flow Diagram

Data Set Division

Training Set

80%

Validation Set

10%

Test Set

10%
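
An 80/10/10 division like the one above can be sketched as a shuffle followed by slicing. The function name `split_dataset` and the fixed seed are assumptions for illustration; the page does not say whether the actual split is random or stratified by class.

```python
import random


def split_dataset(samples, seed=42):
    """Shuffle samples and split them into 80% train, 10% validation, 10% test."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = len(shuffled) * 8 // 10
    n_val = len(shuffled) // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


train, val, test = split_dataset(range(1000))
```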

Adversarial Sample Generation

Character Replacement Technique

Replace characters like "o" with "0", "-" with "_", etc.

Subdomain Addition

Add a brand name as a subdomain of the domain

Consecutive Character Replacement

Replace dots in the domain with consecutive characters

Deceptive Path

Add a fake path like "/secure/login/"
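
Three of the techniques above can be sketched with the standard library. This is a hedged illustration rather than the generator used to build the data set: the function names and the `CHAR_SWAPS` table are assumptions, and consecutive character replacement is left out because the page does not state which characters replace the dots.

```python
from urllib.parse import urlsplit, urlunsplit

# The two swaps named in the text; a real generator would likely use more.
CHAR_SWAPS = {"o": "0", "-": "_"}


def replace_characters(url):
    """Swap look-alike characters in the host part of the URL only."""
    parts = urlsplit(url)
    host = "".join(CHAR_SWAPS.get(ch, ch) for ch in parts.netloc)
    return urlunsplit(parts._replace(netloc=host))


def add_brand_subdomain(url, brand):
    """Prefix the host with a brand name as a subdomain."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(netloc=f"{brand}.{parts.netloc}"))


def add_deceptive_path(url, fake_path="/secure/login/"):
    """Insert a trustworthy-looking path in front of the original path."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=fake_path + parts.path.lstrip("/")))


original = "http://my-shop.com/index"
variants = [
    replace_characters(original),
    add_brand_subdomain(original, "paypal"),
    add_deceptive_path(original),
]
```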

Feature Extraction Method

N-grams (n=3) Analysis

We use trigrams to extract semantic information from URLs:

http://www.example.com/login.php?user=admin&action=delete
http www example com login php user admin action delete

By analyzing each part of the URL (protocol, hostname, path, and parameters), we can capture the structural features of the URL.
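
The tokenization shown in the example above, followed by trigram construction, can be sketched as below. It assumes that trigrams are built over word tokens rather than characters, which the page does not state explicitly; the names `tokenize` and `ngrams` are illustrative.

```python
import re


def tokenize(url):
    """Split a URL on every run of non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^A-Za-z0-9]+", url) if tok]


def ngrams(tokens, n=3):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = tokenize("http://www.example.com/login.php?user=admin&action=delete")
trigrams = ngrams(tokens)
```

For the example URL this yields the ten tokens shown above and eight trigrams, starting with ("http", "www", "example").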

N-grams Analysis Diagram

TF-IDF Vectorization

Term Frequency (TF):

Measures the frequency of term t in URL u.

TF(t,u) = Number of times term t appears in URL u / Total number of terms in URL u

Inverse Document Frequency (IDF):

Measures how rare term t is across the entire corpus.

IDF(t) = log(Total number of URLs in the corpus / (Number of URLs containing term t + 1))

TF-IDF Weight:

TF-IDF(t,u) = TF(t,u) × IDF(t)

This weight can highlight key features in the URL while reducing the impact of common terms.
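
The formulas above translate almost line for line into code. This is a minimal sketch using only the standard library, with each URL already split into tokens; the function names and the tiny corpus are assumptions for illustration, and a production system would more likely rely on an existing vectorizer.

```python
import math
from collections import Counter


def tf(tokens):
    """TF(t,u): count of t in the URL divided by the URL's total token count."""
    counts = Counter(tokens)
    return {t: c / len(tokens) for t, c in counts.items()}


def idf(corpus):
    """IDF(t): log(number of URLs / (number of URLs containing t + 1))."""
    doc_freq = Counter(t for tokens in corpus for t in set(tokens))
    return {t: math.log(len(corpus) / (d + 1)) for t, d in doc_freq.items()}


def tf_idf(corpus):
    """Return one {term: weight} dictionary per URL in the corpus."""
    idf_weights = idf(corpus)
    return [
        {t: w * idf_weights[t] for t, w in tf(tokens).items()}
        for tokens in corpus
    ]


corpus = [
    ["http", "example", "com", "login"],
    ["http", "paypal", "secure", "login", "verify"],
    ["http", "news", "org"],
    ["http", "shop", "com"],
]
vectors = tf_idf(corpus)
```

With the +1 in the denominator, a term that appears in every URL (here "http") receives a negative weight, while a term unique to one URL (here "paypal") receives a positive one.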

TF-IDF Vectorization Process

Neural Network Model

MGCF-Net
  • Use Word Embedding to Capture Semantic Information
  • CNN Extracts Local Context Features
  • BiLSTM Captures Global Sequence Information
  • Semantic Fusion Layer Integrates Features
  • Cross Attention Mechanism Enhances Feature Representation
  • Integrate Manual Features and Domain Knowledge
DeepCNN_Light_Hybrid
  • Lightweight Architecture, Suitable for Low Resource Environments
  • Deep CNN Extracts Semantic Representation
  • Combine Manual Features to Enhance Performance
  • Domain Name Reputation Assessment
  • Efficient Feature Concatenation Strategy

Our Development Team

Meet the talented professionals behind this phishing detection system

Huang Hao

Leading the Project

Oversees the entire project, coordinates the team, and ensures the project's goals are achieved through efficient collaboration and planning.

Leadership, Strategy

Chen Zijie

Designing the Web Interface

Creates and designs the web interface, ensuring the user experience is intuitive, engaging, and aligned with project goals.

UI/UX Design, Web Development

Zhao Chuyu

Analyzing the Data

Focuses on analyzing data to extract meaningful insights, helping guide decisions that drive the project forward.

Data Analysis, Data Insights

Tan Mingshu

Managing Version Control

Handles version control and ensures smooth integration of the team’s work, maintaining a seamless development workflow.

Version Control, Git Management

Li Zhuyi

Extracting Key Features

Focuses on identifying and extracting essential features from raw data to enhance model performance.

Feature Engineering, Data Processing

Wen Tianshu

Collecting Valuable Sources

Gathers and curates the datasets needed for the project, ensuring a high-quality and diverse data foundation.

Data Collection, Source Management

Meng Yu

Drawing Illustrations

Creates visual illustrations, charts, and diagrams to represent data and models in an easily digestible format.

Illustration, Data Visualization

Zhou Yitong

Creating Artistic Designs

Designs creative and artistic visuals to enhance the project's aesthetic appeal and visual communication.

Art, Creative Design