@nahanil/zh-tokenizer v0.1.3

Tokenizes Chinese text into words using the CC-CEDICT dictionary.

Extended from https://github.com/takumif/cedict-lookup
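Dictionary-based tokenizers commonly split text by looking up the longest dictionary word at each position. As an illustration only, the sketch below shows forward maximum matching (greedy longest match) against a tiny hypothetical word list. It is an assumption for explanatory purposes, not the confirmed algorithm of this package or of cedict-lookup; the segment function and its maxLen parameter are hypothetical names.

```javascript
// Forward maximum matching: at each position take the longest dictionary
// word, falling back to a single character when nothing matches.
// Hypothetical sketch; the Set below stands in for CC-CEDICT headwords.
function segment (text, dict, maxLen = 4) {
  const chars = Array.from(text) // split by code point, not UTF-16 unit
  const tokens = []
  let i = 0
  while (i < chars.length) {
    let matched = chars[i] // single-character fallback (e.g. punctuation)
    // Try the longest candidate first, shrinking until a dictionary hit
    for (let len = Math.min(maxLen, chars.length - i); len > 1; len--) {
      const candidate = chars.slice(i, i + len).join('')
      if (dict.has(candidate)) {
        matched = candidate
        break
      }
    }
    tokens.push(matched)
    i += Array.from(matched).length
  }
  return tokens
}

const dict = new Set(['我', '是', '中国', '中国人', '国人'])
console.log(segment('我是中国人。', dict))
// → [ '我', '是', '中国人', '。' ]
```

Greedy matching prefers 中国人 over 中国 + 人 here, which lines up with the single 中国人 token in the output shown further down; real segmenters may add tie-breaking or statistical scoring on top of this.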

Installation

Use npm to install:

npm install @nahanil/zh-tokenizer --save

Usage

Make sure to provide the CC-CEDICT data file; the path passed to the factory function ('./cedict.txt' below) must point to it. Called with only the dictionary path, the tokenizer handles simplified text, as in the first example. Pass 'traditional' as the second argument to tokenize traditional text, as in the second example.

// Simplified input
const tokenizer = require('@nahanil/zh-tokenizer')('./cedict.txt')
console.log(tokenizer.tokenize('我是中国人。'))

// Traditional input: use a separately named instance, since redeclaring
// `const tokenizer` in the same scope is a SyntaxError
const tradTokenizer = require('@nahanil/zh-tokenizer')('./cedict.txt', 'traditional')
console.log(tradTokenizer.tokenize('我是中國人。'))
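The dictionary file uses CC-CEDICT's line format, Traditional Simplified [pinyin] /gloss/gloss/, with comment lines starting with #. A minimal parsing sketch follows; parseCedict and its keyBy parameter are hypothetical names, not part of this package's API, and the sample lines only mirror the glosses in the output shown below. Indexing by simplified headword groups the 是 and 昰 entries under one key, which would be consistent with the two glosses joined for 是 in that output; this is an inference, not documented behaviour.

```javascript
// Hypothetical loader: parse CC-CEDICT lines of the form
//   Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/
// into a Map keyed by the simplified or traditional headword.
function parseCedict (text, keyBy = 'simplified') {
  const entryRe = /^(\S+)\s+(\S+)\s+\[([^\]]*)\]\s+\/(.*)\/\s*$/
  const index = new Map()
  for (const line of text.split(/\r?\n/)) {
    if (line === '' || line.startsWith('#')) continue // skip blanks and comments
    const m = entryRe.exec(line)
    if (!m) continue // ignore malformed lines
    const entry = { traditional: m[1], simplified: m[2], pinyin: m[3], english: m[4] }
    const key = entry[keyBy]
    if (!index.has(key)) index.set(key, [])
    index.get(key).push(entry) // one headword can map to several entries
  }
  return index
}

const sample = [
  '# CC-CEDICT sample',
  '是 是 [shi4] /is/are/am/yes/to be/',
  '昰 是 [shi4] /variant of 是[shi4]/(used in given names)/',
  '中國人 中国人 [Zhong1 guo2 ren2] /Chinese person/'
].join('\n')

const bySimplified = parseCedict(sample)
console.log(bySimplified.get('是').length) // 2
const byTraditional = parseCedict(sample, 'traditional')
console.log(byTraditional.get('是').length) // 1
```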

Output:

[ { traditional: '我',
    simplified: '我',
    pinyin: 'wo3',
    pinyinPretty: 'wǒ',
    english: 'I/me/my' },
  { traditional: '是',
    simplified: '是',
    pinyin: 'shi4',
    pinyinPretty: 'shì',
    english: 'is/are/am/yes/to be\nvariant of 是[shi4]/(used in given names)' },
  { traditional: '中國人',
    simplified: '中国人',
    pinyin: 'zhong1 guo2 ren2',
    pinyinPretty: 'zhōng guó rén',
    english: 'Chinese person' },
  { traditional: '。',
    simplified: '。',
    pinyin: null,
    pinyinPretty: null,
    english: null } ]
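Each token's pinyinPretty field renders CC-CEDICT's numbered pinyin (wo3) with tone marks (wǒ). The sketch below reimplements the standard placement rules: a or e takes the mark if present, otherwise the o in ou, otherwise the last vowel; tone 5 (neutral) is unmarked; CC-CEDICT spells ü as u:. The function names are hypothetical and this is not the package's own code.

```javascript
// Hypothetical helpers: convert CC-CEDICT numbered pinyin to tone marks.
const TONE_MARKS = {
  a: ['ā', 'á', 'ǎ', 'à'],
  e: ['ē', 'é', 'ě', 'è'],
  i: ['ī', 'í', 'ǐ', 'ì'],
  o: ['ō', 'ó', 'ǒ', 'ò'],
  u: ['ū', 'ú', 'ǔ', 'ù'],
  'ü': ['ǖ', 'ǘ', 'ǚ', 'ǜ']
}

function syllableToMarked (syllable) {
  const m = /^([a-z:ü]+)([1-5])$/i.exec(syllable)
  if (!m) return syllable // no trailing tone number: leave untouched
  const base = m[1].replace(/u:/gi, 'ü') // CC-CEDICT writes ü as "u:"
  const tone = Number(m[2])
  if (tone === 5) return base // neutral tone carries no mark
  const lower = base.toLowerCase()
  // Rule 1: a or e takes the mark; rule 2: the o in "ou"; rule 3: last vowel
  let idx = lower.search(/[ae]/)
  if (idx === -1) idx = lower.indexOf('ou')
  if (idx === -1) {
    for (let k = lower.length - 1; k >= 0; k--) {
      if ('iouü'.includes(lower[k])) { idx = k; break }
    }
  }
  if (idx === -1) return base // syllabic consonants such as "m2" stay unmarked
  const vowel = lower[idx]
  let marked = TONE_MARKS[vowel][tone - 1]
  if (base[idx] !== vowel) marked = marked.toUpperCase() // keep a capital vowel
  return base.slice(0, idx) + marked + base.slice(idx + 1)
}

function pinyinToMarked (pinyin) {
  if (pinyin == null) return null // punctuation tokens have no pinyin
  return pinyin.split(' ').map(syllableToMarked).join(' ')
}

console.log(pinyinToMarked('zhong1 guo2 ren2')) // zhōng guó rén
```

Returning null for a null input mirrors the punctuation token (。) in the output above, where both pinyin and pinyinPretty are null.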
