Malicious URL Detection using Natural Language Processing and Machine Learning

  • Bugata Nageswara Rao, K. Narasimha Raju, Dekka Satish
Keywords: Malicious, URL, Machine Learning, Kaggle, Python


Malicious (phishing) URLs are a critical threat to cybersecurity because they can lead to scams in which individuals lose money, personally identifiable information, and accounts. It is crucial to be able to respond appropriately to these attacks. The most reliable established approach is blacklisting, but it struggles to respond to new URLs that have not yet been listed. Machine learning is a process in which a system learns from training data and then uses what it has learned to make predictions. It has become a buzzword in today's digital world because it can address many cybersecurity problems. In this work, we collected a phishing URL dataset from Kaggle (containing more than 500,000 URLs) and provide a machine learning-oriented solution for malicious URL detection. As the URLs are in text format, we also applied various text preprocessing and text encoding techniques. First, we applied three text encoding techniques: hashing vectorization, count vectorization, and TF-IDF vectorization. We then applied five machine learning algorithms: SVM, Decision Tree, K-NN, Logistic Regression, and Random Forest. We achieved an accuracy of 97.8% with Random Forest. Our model outperforms previous models for malicious URL detection. We used Python for the implementation.
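
To make the encoding step concrete, the following is a minimal sketch of how URLs could be tokenized and turned into count and hashed feature vectors. It is an illustration only, not the authors' implementation: the abstract does not specify the tokenizer or library used, and the function names (tokenize_url, count_vector, hashed_vector), the sample URLs, and the choice of 16 hash buckets are all assumptions. TF-IDF weighting is omitted for brevity.

```python
import re
import zlib
from collections import Counter


def tokenize_url(url):
    """Lowercase a URL and split it on runs of non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^a-z0-9]+", url.lower()) if tok]


def count_vector(tokens, vocabulary):
    """Count vectorization: one slot per vocabulary term, holding its frequency."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def hashed_vector(tokens, n_features=16):
    """Hashing vectorization: map each token to a slot with a stable hash.

    No vocabulary is stored, so unseen tokens still land in some slot;
    distinct tokens may collide in the same slot.
    """
    vec = [0] * n_features
    for tok in tokens:
        vec[zlib.crc32(tok.encode("utf-8")) % n_features] += 1
    return vec


# Hypothetical sample URLs, not taken from the Kaggle dataset.
urls = [
    "http://secure-login.example.com/update/account",
    "https://www.example.org/docs/index.html",
]

tokenized = [tokenize_url(u) for u in urls]
vocabulary = sorted({tok for toks in tokenized for tok in toks})
count_matrix = [count_vector(toks, vocabulary) for toks in tokenized]
hashed_matrix = [hashed_vector(toks) for toks in tokenized]
```

Feature vectors of this kind would then be passed, together with the phishing or legitimate labels, to a classifier such as a random forest. The hashed variant trades exact vocabulary lookup for a fixed memory footprint, which can matter for a dataset with several hundred thousand URLs.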

How to Cite
Dekka Satish, B. N. R. K. N. R. (2022). Malicious URL Detection using Natural Language Processing and Machine Learning. Design Engineering, (1), 248-255. Retrieved from