Automated text analysis allows researchers to analyze large quantities of text. Yet comparative researchers face a big challenge: across countries, people speak different languages. To address this issue, some analysts have suggested using Google Translate to convert all texts into English before starting the analysis (Lucas et al. 2015). But in doing so, do we get lost in translation? This paper evaluates the usefulness of machine translation for bag-of-words models, such as topic models. We use the Europarl dataset and compare term-document matrices (TDMs) as well as topic model results from gold-standard translated text and machine-translated text. We evaluate results at both the document and the corpus level. We first find the TDMs for both text corpora to be highly similar, with minor differences across languages. What is more, we find considerable overlap in the set of features generated from human-translated and machine-translated texts. With regard to LDA topic models, we find topical prevalence and topical content to be highly similar, again with only small differences across languages. We conclude that Google Translate is a useful tool for comparative researchers when using bag-of-words text models.

Automated text analysis is like a gold rush. Many researchers have noticed its potential and are now using methods such as topic modeling, scaling and sentiment analysis to analyze political texts (for an overview see Grimmer and Stewart 2013). But researchers interested in cross-country comparisons face a problem: people speak different languages. In order to make comparisons across countries, researchers first need to translate texts from several languages into one. On the plus side, nowadays this can be automated by using machine translation, such as Google Translate. But does the meaning of these texts get lost in Google translation? That is, do we lose (too much) information if we Google Translate texts before we analyze them? Or does doing so leave us like the poor souls who journeyed west for gold but were left with nothing? This paper evaluates the usefulness of machine translation for automated bag-of-words models. We identify and evaluate four reasons why the meaning of a text may get lost in translation.
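The document-level comparison of term-document matrices mentioned in the abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the text does not say which similarity measures were used, so cosine similarity and Jaccard feature overlap are assumptions here, the function names are hypothetical, and the two example sentences are invented rather than taken from the Europarl data.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase, split on non-letters, and count terms (one TDM row)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def feature_overlap(a, b):
    """Jaccard overlap between the vocabularies of two count vectors."""
    union = a.keys() | b.keys()
    return len(a.keys() & b.keys()) / len(union) if union else 0.0


# Hypothetical toy pair (assumption): a human translation and a machine
# translation of the same source sentence.
human = "The committee approved the budget proposal"
machine = "The committee adopted the budget proposal"

row_h = term_counts(human)
row_m = term_counts(machine)
print(round(cosine_similarity(row_h, row_m), 3))
print(round(feature_overlap(row_h, row_m), 3))
```

Because a bag-of-words representation ignores word order, a single differing word choice ("approved" versus "adopted") lowers both scores only slightly, which is the intuition behind asking whether machine translation is good enough for this class of models.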