Malware Similarity Unveiling Using Twin Neural Networks
Abstract
To mitigate the risks associated with the growth of big data, a better understanding of malware characteristics and interrelationships is required. Malicious content can be found in the majority of data repositories. As a result, the primary goal of malware professionals is to design a defense against it. In the current era of deep learning, neural networks can perform effectively on almost any task, but they typically require large amounts of input data to do so. For some tasks, such as malware similarity detection, facial recognition, and signature verification, more input data is not necessarily the answer: the system must detect irregularities from minimal input. A state-of-the-art approach is to build the system on a neural network architecture called the Siamese network, which learns from a small amount of data and still produces good predictions. Siamese networks have become more prominent in recent years because of this ability. The goal of this paper is to explain what a twin (Siamese) network is and to describe the development of a malware similarity detection system based on it. The approach first captures the semantics of the malware code. It then uses the Siamese architecture to extract semantic feature elements. Finally, the cosine distance is used to estimate the similarity of the feature vectors in high-dimensional space. Experiments are conducted on the Malimg dataset using Google Colab. A pre-trained ResNet50 serves as part of the sub-network that produces the feature embeddings. The network examines a pair of input files to determine whether they belong to the same family. The experimental results show that the proposed technique performs better in several respects, including accuracy and overall performance.
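As an illustration of the final comparison step, the following minimal sketch computes the cosine distance between two embedding vectors and uses it to decide whether a pair belongs to the same family. It is not the paper's implementation: the function names `cosine_distance` and `same_family` and the threshold value of 0.5 are assumptions made for this example, and the short lists stand in for the embeddings that the ResNet50-based sub-network would produce.

```python
import math
from typing import Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cos(u, v): 0 for the same direction, 2 for opposite directions."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def same_family(emb_a: Sequence[float], emb_b: Sequence[float],
                threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: a pair is 'same family' when the
    cosine distance between its embeddings is below the threshold.
    The threshold is an assumed value, not one taken from the paper."""
    return cosine_distance(emb_a, emb_b) < threshold


# Toy embeddings standing in for the outputs of the shared sub-network.
sample_a = [1.0, 2.0, 3.0]
sample_b = [2.0, 4.0, 6.0]   # same direction as sample_a
sample_c = [-1.0, 0.0, 0.0]  # points away from sample_a

print(same_family(sample_a, sample_b))
print(same_family(sample_a, sample_c))
```

Because the cosine distance depends only on the angle between the vectors, two embeddings that differ in magnitude but not in direction are treated as identical. In practice the threshold would be chosen on validation pairs.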