Classical Chinese LIWC: A Brief Introduction and Pilot Analysis
(Chinese title: 古文LIWC词典的构建及初步分析)

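Submitted: 2019-12-20
Authors: Fan Miaorong (范妙榕) 1,2; Xing Fugui (邢付贵) 1,2; Liu Xingyun (刘兴云) 1,2; Zhu Tingshao (朱廷劭) 2
Affiliations: 1. University of Chinese Academy of Sciences, Beijing 100049; 2. Institute of Psychology, Chinese Academy of Sciences, Beijing 100101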

Abstract

[Background] LIWC (Linguistic Inquiry and Word Count) counts how often words from specially selected categories occur in a text. From these counts it quantifies the psychological meaning of texts written by individuals or groups. Classical Chinese differs markedly from modern Chinese in how it expresses meaning. To analyze the psychological meaning of classical Chinese texts, we therefore built a Classical Chinese LIWC dictionary (CC-LIWC) based on the 2015 edition of the Simplified Chinese LIWC dictionary (SC-LIWC).
[Objective] This paper describes how CC-LIWC was constructed and shows how to use it to analyze classical Chinese texts.
[Methods] First, we collected every entry, covering both modern and classical Chinese words, together with its explanation from an online Chinese dictionary. We kept the classical Chinese words and their modern Chinese translations. Second, we searched these translations for SC-LIWC words, which matched SC-LIWC words to the corresponding classical Chinese words. Finally, experts in classical Chinese manually annotated the matching results to ensure their consistency and accuracy.
[Results] The final dictionary contains 81 word categories and 49,136 classical Chinese entries.
[Limitations] In classical Chinese, a single word commonly has several meanings (polysemy) or several parts of speech. This affects how words are classified in the dictionary.
[Conclusion] We used CC-LIWC to run a word-frequency analysis of The Analects (excerpts) and The Isolated Indignation (孤愤, Gu Fen). The results reflect the difference between the Confucian doctrine of moderation and the Legalist emphasis on logical, dialectical argument. This indicates that CC-LIWC can effectively distinguish the expressive tendencies of texts.
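The [Methods] and [Conclusion] steps above can be illustrated with a minimal Python sketch. This is not the authors' implementation, and it rests on several assumptions:

- The toy dictionaries SC_LIWC and GLOSSES are hypothetical, as are the functions build_cc_liwc, segment and category_percentages.
- A classical word is assumed to receive the categories of every SC-LIWC word that occurs as a substring of its modern gloss.
- Texts are assumed to be segmented by forward maximum matching against the dictionary.
- Category frequencies are reported as percentages of all tokens, as LIWC does for word counts.

```python
from collections import Counter

# Hypothetical toy data, NOT entries from the real SC-LIWC or CC-LIWC:
# modern Chinese word -> LIWC-style categories.
SC_LIWC = {
    "高兴": {"posemo", "affect"},
    "悲伤": {"negemo", "affect"},
    "因为": {"cause", "cogproc"},
}

# Hypothetical classical word -> modern Chinese gloss, as it might be
# collected from an online dictionary.
GLOSSES = {
    "乐": "快乐;高兴",
    "戚": "悲伤;忧愁",
    "忧戚": "忧愁悲伤",
    "以": "因为;用",
    "朋": "朋友",
}

def build_cc_liwc(glosses, sc_liwc):
    """Tag each classical word with the categories of every SC-LIWC word
    found (by substring search) in its gloss. The result is only a
    candidate mapping; the paper adds a manual annotation step."""
    cc = {}
    for classical, gloss in glosses.items():
        cats = set()
        for modern, modern_cats in sc_liwc.items():
            if modern in gloss:
                cats |= modern_cats
        if cats:  # words whose gloss contains no SC-LIWC word are dropped
            cc[classical] = cats
    return cc

def segment(text, vocab, max_len=4):
    """Forward maximum matching (assumed segmenter): at each position take
    the longest dictionary entry up to max_len characters, else one character."""
    tokens, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in vocab:
                tokens.append(piece)
                i += n
                break
    return tokens

# Full-width punctuation excluded from the token count (assumed list).
PUNCT = set(",。、;:?!“”《》")

def category_percentages(text, cc_liwc):
    """Percentage of non-punctuation tokens that fall in each category."""
    tokens = [t for t in segment(text, cc_liwc) if t not in PUNCT]
    counts = Counter()
    for tok in tokens:
        for cat in cc_liwc.get(tok, ()):
            counts[cat] += 1
    if not tokens:
        return {}
    return {cat: 100.0 * c / len(tokens) for cat, c in counts.items()}

cc = build_cc_liwc(GLOSSES, SC_LIWC)
print(category_percentages("有朋自远方来,不亦乐乎", cc))
```

For the line from The Analects used here, the sketch counts 10 tokens, one of which (乐) is tagged. It should therefore report 10.0 for both posemo and affect. Substring matching over-generates for polysemous glosses. For example, the toy entry 以 is tagged as causal even though its gloss also gives "用" (use). This is the kind of ambiguity the [Limitations] note describes, and it is why the candidate mapping still needs manual review.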
Submitted by: Zhu Tingshao (朱廷劭)
DOI:10.12074/201912.00027
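Recommended citation: Fan Miaorong (范妙榕), Xing Fugui (邢付贵), Liu Xingyun (刘兴云), Zhu Tingshao (朱廷劭). (2019). 古文LIWC词典的构建及初步分析 [Classical Chinese LIWC: A Brief Introduction and Pilot Analysis]. ChinaXiv:201912.00027.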
Version History
[V1] 2019-12-20 15:57:51 chinaXiv:201912.00027V1