QuanTA Server - Quantitative Text Analysis
The QuanTA ("Quantitative Text Analysis") server is part of the QuanTA Research Project. This project studies quantitative aspects of texts in order to describe linguistic and textual regularities, for example in the frequencies of letters, words, word lengths and sentence lengths. Particular emphasis is placed on the frequency distributions of these units and on the regularities governing their interdependencies.
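As an illustration of one of the units named above, a word length frequency distribution for a single text can be computed roughly as follows. This is a minimal Python sketch, not the project's own script. It assumes that a word is a maximal run of letters and that word length is counted in letters (many word length studies count syllables instead). The function name `word_length_distribution` is hypothetical.

```python
import re
from collections import Counter


def word_length_distribution(text: str) -> dict[int, float]:
    """Return the relative frequency of each word length in `text`.

    Assumption (hypothetical definition): a word is a maximal run of Unicode
    letters, and its length is the number of letters it contains.
    """
    # [^\W\d_] matches letters only, including Slavic diacritics such as č, š, ž.
    words = re.findall(r"[^\W\d_]+", text)
    counts = Counter(len(word) for word in words)
    total = sum(counts.values())
    # Relative frequencies, ordered by word length; an empty text yields {}.
    return {length: counts[length] / total for length in sorted(counts)}
```

For example, `word_length_distribution("Ja sam student.")` assigns each of the lengths 2, 3 and 7 a relative frequency of 1/3. Replacing the letter count with a syllable count would only change the `len(word)` expression.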
What distinguishes the QuanTA Server is that it combines two functions. On the one hand, several thousand texts in different languages are stored in a text database, which makes each individual text available for statistical analysis. On the other hand, analytical tools and scripts for the quantitative study of these texts are implemented on the same server. A corpus interface is planned through which the data of the QuanTA Server will be made available.
The project's text database currently contains more than 4,000 texts by more than 100 authors. So far, the project has concentrated on Slavic languages, mainly Croatian, Russian and Slovenian, but current work also includes Czech and Slovak texts.
The texts are carefully selected with regard to specific characteristics: not only language, but also author, text type, epoch, etc. Statistical results will therefore be obtainable both for individual texts and for subgroups of texts selected according to one or more of these characteristics. All files are checked for encoding errors before they are uploaded. At present, access is limited to project members.
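Selecting a subgroup by one or more characteristics amounts to filtering the texts' metadata. The following Python sketch assumes a hypothetical `TextRecord` structure whose fields mirror the characteristics listed above; the actual database schema is not described here, and `select_subgroup` is likewise a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextRecord:
    # Hypothetical metadata fields mirroring the selection characteristics.
    title: str
    language: str
    author: str
    text_type: str
    epoch: str


def select_subgroup(records: list[TextRecord], **criteria: str) -> list[TextRecord]:
    """Return the records that match every given characteristic.

    Example criteria: language="Slovenian", text_type="novel".
    With no criteria, all records are returned. An unknown field name
    raises AttributeError.
    """
    return [
        record
        for record in records
        if all(getattr(record, field) == value for field, value in criteria.items())
    ]
```

Because the criteria are combined with `all`, adding a characteristic can only narrow the subgroup, which matches selecting texts "according to one or more" characteristics.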
The files are being tagged specifically for analysis (see below). The analysis scripts are being developed and adapted to run on the server, where they can be accessed both through a web interface and via XML-RPC. Results will be viewable in the browser and downloadable as files for import into SPSS and Excel.
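For the download option, a result such as a frequency distribution could be written as plain CSV, a format that both SPSS and Excel can import. The sketch below assumes a two-column layout with a header row and a hypothetical function name `distribution_to_csv`; the server's actual export format is not specified.

```python
import csv
import io


def distribution_to_csv(distribution: dict[int, float]) -> str:
    """Serialize a {value: relative_frequency} mapping as CSV text.

    Assumption: a two-column layout with a header row; the real export
    format of the QuanTA Server may differ.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["value", "relative_frequency"])
    # Rows are sorted by value so that the file reads as an ordered distribution.
    for value, frequency in sorted(distribution.items()):
        writer.writerow([value, frequency])
    return buffer.getvalue()
```

Returning a string rather than writing a file keeps the function usable both for a browser download response and for saving to disk.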
A dedicated tagging system has been developed for the quantitative analyses.