SONIVIS Text Mining GUI
General
SONIVIS offers three ways to use text mining: the Content Analysis Perspective, transformers, and metrics. The Content Analysis Perspective includes the Term Table View and the Term Clustering View.
TermTransformer
The TermTransformer extracts the terms from each revision text of a page. This data is needed to load content-based terms into the term table and for content-based clustering.
An additional view defines which terms are extracted. Deleting more terms in the transformer phase speeds up the subsequent analysis (term table, clustering), but it also risks losing important terms. The control parameters of the transformer are:
| Control Parameter | Description |
| Minimum Term Length | Only terms of at least the defined length are extracted. |
| Minimum Term Frequency per Document | The minimum number of times a term has to occur in a text. |
| Delete Pattern for Meta Elements | This pattern deletes unimportant text elements, for example the definition of the text color. The file defines regular expressions, one per line, that match the elements to delete. |
| Blacklist of general Terms | The file contains terms, one per line, that are deleted. These are general terms such as names. |
| Blacklist of terms (language specific, currently: German, English) | The file contains the stop words of a language. These terms are deleted. |
| Stemming (language specific, currently: German, English) | Activates language-specific stemming. |
A stop word list can be defined and stemming can be activated for more than one language. This is helpful when the texts are written in different languages.
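The effect of the control parameters can be illustrated with a minimal Python sketch. It is not the SONIVIS implementation: the function name `extract_terms`, its parameter names, the simple word tokenizer, the toy stemmer, and the order of the steps (in particular stemming before the length check) are all assumptions made for illustration.

```python
import re
from collections import Counter


def extract_terms(text, min_length=1, min_frequency=1,
                  delete_patterns=(), blacklist=(), stop_words=(), stem=None):
    """Hypothetical sketch of a term extraction step for one revision text.

    Returns a dict mapping each kept term to its frequency in the text.
    """
    # Delete meta elements matched by the regular expressions.
    for pattern in delete_patterns:
        text = re.sub(pattern, " ", text)

    # Split into lower-case word tokens (assumed tokenization).
    tokens = re.findall(r"\w+", text.lower())

    # Remove blacklisted general terms and language-specific stop words.
    excluded = {w.lower() for w in blacklist} | {w.lower() for w in stop_words}
    tokens = [t for t in tokens if t not in excluded]

    # Optional stemming; `stem` is any callable mapping a token to its stem.
    if stem is not None:
        tokens = [stem(t) for t in tokens]

    # Keep only terms with the minimum length.
    tokens = [t for t in tokens if len(t) >= min_length]

    # Keep only terms that occur often enough in this text.
    counts = Counter(tokens)
    return {term: n for term, n in counts.items() if n >= min_frequency}


revision = ('The <font color="red">network</font> links networks; '
            'Alice edits the network.')
terms = extract_terms(
    revision,
    min_length=4,
    min_frequency=2,
    delete_patterns=[r"<[^>]+>"],   # assumed pattern: strip markup tags
    blacklist=["alice"],            # general term, here a name
    stop_words=["the"],
    stem=lambda t: t[:-1] if t.endswith("s") else t,  # toy stemmer
)
print(terms)  # {'network': 3}
```

In this example, stricter settings leave only one term, which shows the trade-off described above: fewer terms make later steps faster, but "link" and "edit" are lost because they occur only once.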
Term Table
The term table view shows a list of terms and their properties: the page on which the term occurs (column: Page), the time when the term was added to the page (column: Created), the actor who added the term (column: Term), the type of the term (column: Term Type), the number of adds (column: +), the number of deletes (column: -), the sum of adds and deletes (column: Sum) and the absolute sum of adds and deletes (column: |Sum|).
The table can be loaded and manipulated through a second view, the Table Manipulation View, which is opened with the Y-Button. The selected Term Types are loaded into the table. For each column in the table, a filter can be defined and grouping can be activated. A filter defines the values that are loaded into the table; multiple values are separated by ";". For example, with the page filter "Main;Network", only the pages "Main" and "Network" are loaded. Grouping works like a GROUP BY clause in MySQL: the attributes of the grouped column are counted and summed. For example, grouping by term aggregates the attributes of each term, so the table shows for each term the number of pages it occurs on, the number of actors, the total number of adds and deletes, and so on.
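The filter and grouping behavior can be sketched in Python as follows. This is an illustration under assumptions, not the SONIVIS code: the function names, the dictionary-based row layout, and the "Actor" key are hypothetical, and an empty filter is assumed to mean "no restriction".

```python
from collections import defaultdict


def parse_filter(value):
    """Split a filter string such as 'Main;Network' into a set of values."""
    return {part.strip() for part in value.split(";") if part.strip()}


def apply_filters(rows, filters):
    """Keep only rows whose column values are listed in the column's filter."""
    parsed = {}
    for column, value in filters.items():
        allowed = parse_filter(value)
        if allowed:  # assumption: an empty filter does not restrict the column
            parsed[column] = allowed
    return [row for row in rows
            if all(row[col] in allowed for col, allowed in parsed.items())]


def group_by(rows, column):
    """Aggregate rows per value of `column`, similar to SQL GROUP BY."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    return {
        key: {
            "pages": len({m["Page"] for m in members}),    # distinct pages
            "actors": len({m["Actor"] for m in members}),  # distinct actors
            "+": sum(m["+"] for m in members),             # total adds
            "-": sum(m["-"] for m in members),             # total deletes
        }
        for key, members in groups.items()
    }


# Made-up sample rows for illustration only.
rows = [
    {"Page": "Main", "Actor": "alice", "Term": "graph", "+": 2, "-": 0},
    {"Page": "Network", "Actor": "bob", "Term": "graph", "+": 1, "-": 1},
    {"Page": "Network", "Actor": "alice", "Term": "node", "+": 3, "-": 0},
    {"Page": "Help", "Actor": "carol", "Term": "graph", "+": 5, "-": 2},
]

selected = apply_filters(rows, {"Page": "Main;Network"})
summary = group_by(selected, "Term")
print(summary["graph"])  # {'pages': 2, 'actors': 2, '+': 3, '-': 1}
```

The row from the page "Help" is dropped by the page filter before grouping, so it does not contribute to the totals for "graph".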
The text field of the filter shows the table manipulation definition as a string. The definition can also be edited directly in the text field. Furthermore, it can be saved and reused for other analyses.
The export button saves the term table in CSV format.
An example of using the term table is given here.

