A Combined Feature Selection Techniques and Distance Measures based Approach for Author Profiling

  • T. Raghunadha Reddy, P. Vijaya Pal Reddy, M. Tejaswini, N. Shravya, Ch. Upendra
Keywords: Author Profiling, Gender Prediction, Age Prediction, Feature Selection Techniques

Abstract

Author profiling (AP) is the task of determining demographic characteristics of a document's author, such as gender, age, location, native language and personality traits. It is used in applications such as marketing, forensic science, security and linguistic profiling. Researchers have proposed solutions to author profiling based on stylistic features, content-based features and deep learning techniques. Content-based features have proved significant in improving the prediction of author profiles, but several approaches that used them faced the problem of high feature dimensionality. In this work, experiments are conducted with various feature selection techniques, both to avoid the high-dimensionality problem and to determine the features most relevant for distinguishing the writing styles of authors. Feature selection techniques such as information gain, Gini index, chi-square, mutual information and relative discriminative criterion are used to identify relevant features and remove redundant ones. The documents are represented as vectors over the selected features and passed to machine learning algorithms. Algorithms such as support vector machine, naïve Bayes, decision tree, random forest, k-nearest neighbor and logistic regression are used to evaluate the proposed feature-selection-based approach. The experiments are then extended with six distance measures (Euclidean, Manhattan, Minkowski, cosine, Jaccard and Dice) to compute the similarity between training and test documents. The PAN 2014 competition reviews dataset is used for gender and age prediction. The experimental results obtained in this work compare favorably with several popular solutions to author profiling.
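
As an illustration of the six measures named in the abstract, the following minimal Python sketch computes each of them between two document vectors. It is not the authors' implementation. The function names, the toy vectors, the Minkowski order p = 3, and the choice to compute Jaccard and Dice over the sets of features with non-zero weight are all assumptions made here for illustration; the abstract does not specify these details.

```python
import math

def minkowski(u, v, p):
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

def euclidean(u, v):
    return minkowski(u, v, 2)

def manhattan(u, v):
    return minkowski(u, v, 1)

def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def _support(u):
    """Indices of features with non-zero weight (assumed set representation)."""
    return {i for i, a in enumerate(u) if a != 0}

def jaccard_similarity(u, v):
    """|A ∩ B| / |A ∪ B| over the features present in each document."""
    su, sv = _support(u), _support(v)
    union = su | sv
    return len(su & sv) / len(union) if union else 0.0

def dice_similarity(u, v):
    """2|A ∩ B| / (|A| + |B|) over the features present in each document."""
    su, sv = _support(u), _support(v)
    total = len(su) + len(sv)
    return 2 * len(su & sv) / total if total else 0.0

# Hypothetical term-weight vectors over four selected features.
train_doc = [3, 0, 4, 0]
test_doc = [0, 0, 4, 3]

if __name__ == "__main__":
    print("Euclidean:", euclidean(train_doc, test_doc))
    print("Manhattan:", manhattan(train_doc, test_doc))
    print("Minkowski (p=3):", minkowski(train_doc, test_doc, 3))
    print("Cosine:", cosine_similarity(train_doc, test_doc))
    print("Jaccard:", jaccard_similarity(train_doc, test_doc))
    print("Dice:", dice_similarity(train_doc, test_doc))
```

Note that the first three functions return distances (smaller means more similar), while cosine, Jaccard and Dice return similarities (larger means more similar), so a nearest-document comparison would need to sort them in opposite directions.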

Published
2021-08-07
How to Cite
Reddy, T. R., Reddy, P. V. P., Tejaswini, M., Shravya, N., & Upendra, Ch. (2021). A Combined Feature Selection Techniques and Distance Measures based Approach for Author Profiling. Design Engineering, 7021-7040. Retrieved from http://www.thedesignengineering.com/index.php/DE/article/view/3220
Section
Articles