How to Interpret rime's userdb.txt File (d)

doggie · March 2, 2025, 8:22am

This article was transcoded by SimpRead; the original is at kymotzhong.info

Preface

The words “原因” (reason) and “元音” (vowel) share the same code and often collide when I type. Because I work in linguistics, I type “元音” all the time, but “原因” is the more common word in everyday use and should naturally be the first candidate.

There are other ways to handle this, such as pinning the order with an additional dictionary, as shown in this issue, but I don’t want to change my schema just for this.

So I thought of modifying the word frequency in the user dictionary. The user dictionary itself is stored in a binary format that is hard to edit, so instead I edit the text snapshot generated during synchronization. Because resynchronization always keeps the maximum of the frequency values, writing a very large value safely overrides the original one.

It should be emphasized that this approach is not recommended.

Practice

Because I have already modified my own dictionary, I will use synchronized data from an earlier device as the example.

yuan2 yin1  元音  c=253 d=3.46751e-11 t=317773
yuan2 yin1  原因  c=104 d=0.00588705 t=317773

c is the total input count. I don’t know exactly what d abbreviates (deviation, maybe?), but in any case changing d changes the candidate order: the larger the value, the higher the priority.
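To make the field layout concrete, here is a minimal Python sketch that splits such a line into its parts. The names `UserDbEntry` and `parse_entry` are made up for illustration and are not part of Rime; the sketch also assumes the phrase contains no whitespace and that the last three tokens are always the c, d and t pairs, which holds for the two lines above.

```python
from dataclasses import dataclass


@dataclass
class UserDbEntry:
    code: str    # spelling, e.g. "yuan2 yin1"
    phrase: str  # the word itself
    c: int       # total input count, per the article
    d: float     # value that affects candidate order
    t: int       # tick recorded for the entry


def parse_entry(line: str) -> UserDbEntry:
    """Parse one data line of a *.userdb.txt snapshot (hypothetical helper)."""
    tokens = line.split()  # works whether fields are separated by tabs or spaces
    fields = dict(tok.split("=", 1) for tok in tokens[-3:])
    return UserDbEntry(
        code=" ".join(tokens[:-4]),
        phrase=tokens[-4],
        c=int(fields["c"]),
        d=float(fields["d"]),
        t=int(fields["t"]),
    )


sample = [
    "yuan2 yin1  元音  c=253 d=3.46751e-11 t=317773",
    "yuan2 yin1  原因  c=104 d=0.00588705 t=317773",
]
entries = [parse_entry(line) for line in sample]
for e in entries:
    print(e.phrase, e.c, e.d, e.t)
```

Comment lines starting with `#` are not handled here; a real script would skip them before calling `parse_entry`.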

To minimize the potential impact of the modification, I created a new folder dummy/ in the synchronization directory and, inside it, a terra_pinyin.userdb.txt with the following content:

# Rime user dictionary
#@/db_name  terra_pinyin
#@/db_type  userdb
#@/rime_version 1.7.3
#@/tick 393271
#@/user_id  dummy
yuan2 yin1  原因  c=500 d=114514 t=393271
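For anyone who would rather script this than type it by hand, the following Python sketch builds the same text. `build_snapshot` is a hypothetical helper, not a Rime tool. The listing above does not show whether fields are separated by tabs or spaces; the sketch assumes tabs, so compare its output with a snapshot that Rime itself generated before relying on it.

```python
def build_snapshot(db_name, user_id, tick, rime_version, entries):
    """Return the text of a userdb snapshot shaped like the listing above.

    entries is a list of (code, phrase, c, d) tuples; every entry is
    stamped with the same tick, as in the article's example.
    Tab separators are an assumption, not something the article confirms.
    """
    header = [
        "# Rime user dictionary",
        f"#@/db_name\t{db_name}",
        "#@/db_type\tuserdb",
        f"#@/rime_version\t{rime_version}",
        f"#@/tick\t{tick}",
        f"#@/user_id\t{user_id}",
    ]
    body = [
        f"{code}\t{phrase}\tc={c} d={d:g} t={tick}"
        for code, phrase, c, d in entries
    ]
    return "\n".join(header + body) + "\n"


text = build_snapshot(
    db_name="terra_pinyin",
    user_id="dummy",
    tick=393271,
    rime_version="1.7.3",
    entries=[("yuan2 yin1", "原因", 500, 114514)],
)
print(text)
```

The returned string can then be saved as dummy/terra_pinyin.userdb.txt inside the synchronization directory, for example with `pathlib.Path(...).write_text(text, encoding="utf-8")`.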

Then I synchronized. In the terra_pinyin.userdb.txt generated by the synchronization, the d for “原因” had become 10000, so there seems to be an upper limit.

Either way, “原因” is now guaranteed to come first.
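The behaviour described so far (the larger value wins on merge, and the result appears to stop at 10000) can be modelled in a few lines of Python. This is only a model of what the article reports, not Rime's actual merge code; `merge_d` and `OBSERVED_D_CAP` are names invented for the sketch, and the cap is simply the value seen in this one experiment.

```python
OBSERVED_D_CAP = 10000.0  # upper limit seen in the article's experiment, not from Rime's source


def merge_d(local_d: float, incoming_d: float, cap: float = OBSERVED_D_CAP) -> float:
    """Model of the reported sync result: keep the larger d, then clamp it."""
    return min(max(local_d, incoming_d), cap)


# "原因" had d=0.00588705 locally; the dummy snapshot supplied d=114514.
merged = merge_d(0.00588705, 114514)
print(merged)
```

Under this model any value at or above the cap gives the same result, so writing 114514 is no stronger than writing 10000.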

Additionally, this method is not recommended for single characters. I tried it, and the character also gets ranked first for its short code, which disrupts normal input. For example, any character whose pinyin starts with l, if boosted this way, will rank ahead of “了”, one of the most commonly used characters. The side effect is milder for phrases of two or more characters that are genuinely common.

© kymot 2024 Partial rights reserved.
Unless otherwise stated, the content is published under CC-BY 4.0.

doggie · March 3, 2025, 3:52pm

github.com/rime/home

What do the c, d, and t values in the dictionary snapshot file *.userdb.txt mean?

Opened 01:09PM - 22 Oct 15 UTC

Closed 10:19AM - 14 May 19 UTC

Wujidadi

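May I ask what the three values c, d, and t after each word in the dictionary snapshot file mean? For example: c=1 d=9.15797e-25 t=783834 c=4 d=4….06581e-31 t=783834 My own guess is that c is the number of times the word has been typed, d affects the order in the candidate list, and t is a time? But c also seems to decrease: after I raised the d of one of two words with the same code (to make it rank first), the d of the other word stayed the same but its c went down. For example: c=2 d=1 t=783834 → c=2 d=1.2 t=784000 c=2 d=1 t=783834 → c=1 d=1 t=784000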
