Abstract:
Authorship attribution is the natural language processing task of identifying the author of an input text. The main goal of this task is to define the salient characteristics of documents that capture the author's writing style. In this paper, we analyze language-independent features for authorship attribution. All experiments were conducted on a corpus of Ukrainian scientific papers. For the experiments we used a Bayes-based algorithm (Naive Bayes Multinomial), a Support Vector Machine (SMO), and Decision Tree methods (LMT, J48). The experimental results of the scientific text classification demonstrated that the Decision Tree methods in most cases outperform the other machine learning methods, and that the language-independent features proposed in this paper are appropriate for authorship attribution of Ukrainian scientific documents.