Affordable Ubiquitous Access (AUA)
The Network Security Research Group carries out network security research as a group, covering both network security and information security.
Wireless Sensor & Communication Technology (WiSeCT) Research Group
Phishing Dataset – A Phishing and Legitimate Dataset for Rapid Benchmarking
The Phishing Websites Dataset contains a total of 30,000 webpage samples: 15,000 legitimate samples and 15,000 phishing samples. All webpage elements (i.e., images, URLs, HTML, screenshots and WHOIS information) are organized into a separate folder for each sample.
Anti-phishing is an active research field in information security. Most studies use their own datasets for experiments, which makes benchmarking challenging and inefficient. The main objective of this project is to propose and construct a standard offline dataset that is universal and suitable for a wide range of anti-phishing research. The dataset contains phishing and legitimate webpage samples in equal proportion (50 percent each). To make the dataset as comprehensive as possible, the project surveyed the major anti-phishing studies in the literature and investigated them thoroughly. This work included identifying the raw elements needed, the sources of the samples, the dataset size, and the factors that influence the dataset, which together form the basic criteria for its construction. The final outcome of this project is a readily downloadable dataset of 30,000 phishing and legitimate webpage samples. Each sample includes all the elements required by the different studies in the literature. The dataset is useful for a wide range of anti-phishing research, for benchmarking and rapid proof-of-concept experiments, and also for machine learning-based phishing detection and phishing feature analysis.
Details of the dataset:
Folder structure: 2 folders (the first with 150 zip files, the second with 15 zip files)
Option 1: Download as smaller zip files (if you want to try a few before downloading the whole dataset):
Option 2: Download as large zip files:
Contains 2 large zip files: (1) Legitimate.zip (56 GB) and (2) Phishing.zip (7.3 GB)
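Once some of the zip files have been downloaded, they can be unpacked and the samples counted with a short script. The following is only a minimal sketch: it assumes that each archive unpacks into one folder per sample, as described above, and the function names and any paths passed to them are hypothetical, not part of the dataset.

```python
import zipfile
from pathlib import Path


def extract_parts(zip_dir: Path, out_dir: Path) -> int:
    """Extract every .zip file found in zip_dir into out_dir.

    Returns the number of archives extracted.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = 0
    for archive in sorted(zip_dir.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(out_dir)
        extracted += 1
    return extracted


def count_samples(root: Path) -> int:
    """Count the immediate subfolders of root.

    Assumption: each sample lives in its own folder directly under root.
    """
    return sum(1 for entry in root.iterdir() if entry.is_dir())
```

For example, calling `extract_parts(Path("downloads/phishing"), Path("data/phishing"))` followed by `count_samples(Path("data/phishing"))` would give a quick check of how many phishing samples are available locally. Note the disk space required for the full archives before extracting them.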
How to cite:
K. L. Chiew, E. H. Chang, C. L. Tan, J. Abdullah, and K. S. C. Yong (2018), "Building Standard Offline Anti-Phishing Dataset for Benchmarking", International Journal of Engineering and Technology, 7(4.31), pp. 7-14.
Phishing Dataset for Machine Learning: Feature Evaluation
A total of 48 features are extracted from 5,000 phishing webpages and 5,000 legitimate webpages, which were downloaded from January to May 2015 and from May to June 2017. These features were selected after surveying 11 research papers on machine learning-based phishing website detection, published between 2007 and 2016. A detailed listing of the features is provided in the Appendix section of our paper: https://www.sciencedirect.com/science/article/pii/S0020025519300763
Details of the dataset:
Size: 1.30 MB
Type: ARFF (Weka-ready)
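Since the file is in ARFF format, it can also be read outside Weka. The sketch below is a minimal reader using only the Python standard library. It assumes a simple dense ARFF file with unquoted attribute names and treats the last column as the class label, which is an assumption that should be checked against the actual file. For real work, an established reader such as `scipy.io.arff.loadarff` is likely a better choice.

```python
import csv
from collections import Counter
from typing import Iterable, List, Tuple


def read_arff(lines: Iterable[str]) -> Tuple[List[str], List[List[str]]]:
    """Parse a simple dense ARFF file into (attribute names, data rows).

    Values are kept as strings. Comment lines (starting with %) and blank
    lines are skipped. Quoted attribute names containing spaces are not
    handled by this sketch.
    """
    names: List[str] = []
    data_lines: List[str] = []
    in_data = False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue
        if in_data:
            data_lines.append(line)
            continue
        lower = line.lower()
        if lower.startswith("@attribute"):
            # The second whitespace-separated token is the attribute name.
            names.append(line.split(None, 2)[1].strip("'\""))
        elif lower.startswith("@data"):
            in_data = True
    rows = list(csv.reader(data_lines))
    return names, rows


def class_counts(rows: List[List[str]]) -> Counter:
    """Count rows per class, assuming the class label is the last column."""
    return Counter(row[-1] for row in rows)
```

With the real file, `read_arff(open(path, encoding="utf-8"))` followed by `class_counts(rows)` gives a quick check that the two classes are balanced as described (5,000 each) and that 48 features plus a label are present.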
How to cite:
K. L. Chiew, C. L. Tan, K. Wong, K. S. C. Yong, and W. K. Tiong (2019), "A new hybrid ensemble feature selection framework for machine learning-based phishing detection system", Information Sciences, 484, pp. 153-166.
For further information, or to collaborate on this dataset, please contact:
Associate Professor Dr Chiew Kang Leng
Faculty of Computer Science and Information Technology
Universiti Malaysia Sarawak
94300 Kota Samarahan
Phone: +6082 58 3735
AGREEMENT AND DISCLAIMER
By downloading the dataset, I hereby agree to the following terms and conditions:
1. The dataset should be used only for non-commercial research and educational purposes.
2. The copyright of all components in the dataset fully belongs to the original owners.
4. In no event will we be liable for any loss or damage including without limitation, indirect or consequential loss or damage, or any loss or damage whatsoever arising from loss of data or profits arising out of, or in connection with, the use of this dataset.
5. We reserve the right to terminate access to the dataset at any time without notice.
6. This agreement must be retained with the dataset.
1. An Energy-Aware Soft-Handoff Algorithm to Support Strong Mobility In Mobile Wireless Sensor Network (MWSN) (Associate Professor Dr Kartinah Binti Zen)
2. Enhancing Soft Skills Assessment for Future Ready Graduates (Associate Professor Dr Kartinah Binti Zen)
3. Performance Evaluation of Teaching and Learning Activities at Selected Schools Through Computational Thinking in Sarawak (Associate Professor Dr Kartinah Binti Zen)
4. Intelligent Event Correlation for End-To-End Internet Performance Monitoring (Associate Professor Dr Johari Bin Abdullah)
5. A Dynamic and Automated Signature Detection Framework for Malware Analysis (Associate Professor Dr Johari Bin Abdullah)
6. Proposal for Utilizing Rm50k Seed Fund for Centre for Big Data Analytics and Visualization (Associate Professor Dr Johari Bin Abdullah)
7. Automated Analysis of Palm Oil Tree Health Using Deep Learning Approach (Associate Professor Dr Johari Bin Abdullah)
8. Secondary Security Layer for Anti-Phishing Technique using Image Processing Algorithm (Associate Professor Dr Chiew Kang Leng)
9. Enhancing The Phishing Website Detection Model Through Uniform Resources Locator Analysis (Associate Professor Dr Chiew Kang Leng)
10. Enhancing Pharming Attack Detection Model (Associate Professor Dr Chiew Kang Leng)
11. Enhancing Phishing Attack Detection Model Through Data Mining and Data Analysis for Better Detection Accuracy (Associate Professor Dr Chiew Kang Leng)
12. Modelling Crowd Management for Public Healthcare: A Case Study of Hospital Duchess of Kent (Associate Professor Dr Chiew Kang Leng)
13. Sport Management Framework Using Biometric Identification (Associate Professor Dr Chiew Kang Leng)
14. Social Based Neighbor Selection Protocol for Data Dissemination in Opportunistic Networks (Associate Professor Dr Halikul Bin Lenando)
15. A CSI-based Throughput and QoS Enhancement Scheme for TCP over Cooperative Broadband Wireless Net (Associate Professor Dr Tan Chong Eng)
16. An Optimized and Weighted Genetic Algorithm approach for PAPR Reduction in Large Sub-Carriers OFDM Systems adopting (Associate Professor Dr Tan Chong Eng)
17. Resource efficient scheme and intelligent mesh network for long range rural WiFi connectivity (Associate Professor Dr Tan Chong Eng)
18. Cost Effective and Self-Sufficient Power Supply Model for Wireless End User Access Equipment in Rural Long Houses (Associate Professor Dr Tan Chong Eng)
19. The Project for Electrification by Solar Energy in Long Puah (Associate Professor Dr Tan Chong Eng)
20. Intelligent Power Switching Algorithm for Rural Hybrid Renewable Energy System (Associate Professor Dr Tan Chong Eng)
21. Community Self-Sustainable Cultural Mapping for Buayan, Sabah (Associate Professor Dr Tan Chong Eng)
22. Mobile Application Testing Model (Associate Professor Dr Mohamad Nazim Bin Jambli)
23. A Dynamic Energy-Savvy Routing Algorithm for Mobile Ad-Hoc and Sensor Networks (Associate Professor Dr Mohamad Nazim Bin Jambli)
24. A Dynamic Clustering Routing Algorithm to Prolong Network Lifetime in Mobile Ad-hoc Sensor Networks (Associate Professor Dr Mohamad Nazim Bin Jambli)
25. ICT Technopreneurship for Rural Community (ICT4RC) (Associate Professor Dr Mohamad Nazim Bin Jambli)
26. A Lightweight Routing Algorithm to Support A High Mobility Nodes for Data Hovering in Distributed Sensor Networks (Associate Professor Dr Mohamad Nazim Bin Jambli)
27. MMR: Mobile Medical Record Application (Associate Professor Dr Mohamad Nazim Bin Jambli)
28. Designing of Security Protocols for Mobile Multihop Relay Based WiMAX Networks (Dr Adnan Shahid Khan)
29. Secure and Dynamic Multiple Junction Selection Routing Protocol in VANET (Dr Adnan Shahid Khan)
30. IoT based Energy Efficient Solar Panel Tracking and Monitoring System (Dr Adnan Shahid Khan)
31. Securing Internet-of-things (IoT) through Blockchain (Dr Adnan Shahid Khan)
32. Blockchain Based Cybersecurity (Dr Adnan Shahid Khan)
33. Quantum Cryptographic Scheme for Mobile Multihop D2D Using Deep Learning Approach (Dr Adnan Shahid Khan)
34. Efficient load balancing algorithm for Domain Name Services (DNS) in Web Based Application and Services (Dr Mohamad Imran Bin Bandan)
35. Reliability-aware Checkpoint Insertion Technique for Cloud Services Architecture (Dr Mohamad Imran Bin Bandan)
36. An Efficient Reliability-aware Checkpointing Algorithm for Tasks Processing in Cloud Environment (Dr Mohamad Imran Bin Bandan)
37. Effective Compression Scheme for High Latency Networks in Supporting Real-Time Applications (Dr Lau Sei Ping)
38. Investigating the Performance of Flooding Protocol in Distributed Traffic-Aware Lighting Scheme Management Network (Dr Lau Sei Ping)
39. An Attendance Taking and Offline Job Verification System (Dr Lau Sei Ping)
40. Efficient Routing for Data Aggregation in Wireless Sensor Network to Minimize Energy Consumption (Mr Ahmad Hadinata Bin Fauzi)
41. Mitigating MAC Layer Attacks in 5G Cellular Networks (Mr Ahmad Hadinata Bin Fauzi)
42. Sarawak Traditional Food Locator using Mobile Apps (Mr Ahmad Hadinata Bin Fauzi)
43. Tyre Tread Wear Indicator Analysis Using Digital Image Processing (Mr Ahmad Hadinata Bin Fauzi)
44. Scalable Rekeying Secrecy Model for D2D Group Communication in 5G Cellular Networks (Mr Rajan Thangaveloo)
45. Strengthening Alternative Assessment of Network Computing Undergraduate Programme (WC11) Towards Future Ready (Madam Seleviawati Binti Tarmizi)
46. Impact Study on Computational Thinking Techniques Towards STEM Based and Non-STEM Based Subjects at Selected Schools (Madam Seleviawati Binti Tarmizi)
47. Energy-Aware Clustering Routing Algorithm for Mobile Ad-Hoc Sensor Networks (Madam Azlina Binti Ahmadi Julaihi)
48. Simulating Crowd Behaviour in Emergency Evacuation Situation using Modified Social Force Model (Madam Azlina Binti Ahmadi Julaihi)