a tool to prepare text for pasting into HTML
Online web app: https://detergent.io
npm i detergent
Consume via a require():
const { det, opts, version } = require("detergent");
or as an ES Module:
import { det, opts, version } from "detergent";
or for web pages, as a production-ready minified script file (so-called "UMD build"), straight from CDN:
<script src="https://cdn.jsdelivr.net/npm/detergent/dist/detergent.umd.js"></script>
// in which case you get a global variable "detergent" which you consume like this:
const { det, opts, version } = detergent;
This package has three builds in the dist/ folder:
Type | Key in package.json | Path | Size |
---|---|---|---|
Main export - CommonJS version, transpiled to ES5, contains require and module.exports | main | dist/detergent.cjs.js | 47 KB |
ES module build that Webpack/Rollup understands. Untranspiled ES6 code with import/export. | module | dist/detergent.esm.js | 52 KB |
UMD build for browsers, transpiled, minified, containing IIFEs, with all dependencies baked in | browser | dist/detergent.umd.js | 392 KB |
Detergent is a tool which cleans and prepares text so you can paste it safely into an HTML template.
For starters, Detergent will:
Then Detergent will optionally encode non-ASCII characters (for example, £ into &pound;).
Adobe Photoshop and Illustrator both place ETX characters when you insert line breaks using SHIFT+ENTER to break the line but keep the text within the same paragraph (as opposed to normal line breaks using ENTER alone, which break paragraphs). When text with an ETX character is pasted into an HTML template, the character is invisible in the code editor but might surface later as "�" when a CMS, ESP or other platform attempts to read the code.
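Because ETX characters are invisible in most editors, it can help to check for them explicitly. The following is a minimal sketch, not Detergent's own code: the helper names findEtx and replaceEtx are hypothetical, and replacing ETX with a line break is an assumption based on how the character is produced (SHIFT+ENTER).

```javascript
// ETX ("end of text") is the control character U+0003.
const ETX = "\u0003";

// Returns the zero-based positions of every ETX character in `str`.
function findEtx(str) {
  const positions = [];
  for (let i = 0; i < str.length; i++) {
    if (str[i] === ETX) positions.push(i);
  }
  return positions;
}

// Assumption: treat each ETX as a soft line break and swap it for "\n".
function replaceEtx(str, replacement = "\n") {
  return str.split(ETX).join(replacement);
}

const pasted = "First line\u0003Second line";
console.log(findEtx(pasted)); // [ 10 ]
console.log(JSON.stringify(replaceEtx(pasted))); // "First line\nSecond line"
```

A different replacement (a space, or an empty string) can be passed as the second argument if line breaks are not wanted.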
Detergent has optional features to improve the English style:
Widow word prevention adds a non-breaking space (&nbsp;) between the last two words.
Extra features are:
You can set <BR>'s to appear with a closing slash (XHTML) or without (HTML), so your HTML code should pass the W3C validator.
Named HTML entities are used where possible, instead of numeric ones, so you can read and recognise them. Not all named HTML entities work in all email clients, so we did the testing, found out which ones don't render correctly and set those to be numeric.
The main function is exported in a plain object under the key det, so please import it like this:
const { det } = require("detergent");
// or request everything:
const { det, opts, version } = require("detergent");
// this gives an extra plain object `opts` with the default options. Handy when
// developing front-ends that consume Detergent.
det is the main function. See its API below.
opts is the default options object. You pass it (or a tweaked version of it) to det.
version returns the value of the same-named package.json key: the version of the particular copy of Detergent you've got.
det() input
The det above is a function. You pass two input arguments to it:
Input argument | Type | Obligatory? | Description |
---|---|---|---|
input | String | yes | The string you want to clean. |
options | Object | no | Options object. See its key arrangement below. |
det() options object
Options object's key | Type of its value | Default | Description |
---|---|---|---|
{ | |||
fixBrokenEntities | Boolean | True | should we try to fix any broken named HTML entities like &nsp; ("b" missing) |
removeWidows | Boolean | True | replace the last space in paragraph with a non-breaking space |
convertEntities | Boolean | True | encode all non-ASCII chars |
convertDashes | Boolean | True | typographically-correct the n/m-dashes |
convertApostrophes | Boolean | True | typographically-correct the apostrophes |
replaceLineBreaks | Boolean | True | replace all line breaks with br's |
removeLineBreaks | Boolean | False | put everything on one line (removes any line breaks, inserting space where necessary) |
useXHTML | Boolean | True | add closing slashes on br's |
dontEncodeNonLatin | Boolean | True | skip non-latin character encoding (for example, CJK, Alefbet Ivri or Arabic abjad) |
addMissingSpaces | Boolean | True | adds missing spaces after dots/colons/semicolons, unless it's a URL |
convertDotsToEllipsis | Boolean | True | convert three dots into … (the ellipsis character). When set to false, all encoded ellipses will be converted to three dots. |
stripHtml | Boolean | True | by default, all HTML tags are stripped (except the tags listed in opts.stripHtmlButIgnoreTags, such as b and strong). You can turn off HTML tag removal completely here. |
stripHtmlButIgnoreTags | Array | ["b", "strong", "i", "em", "br", "sup"] | List zero or more strings, each meaning a tag name that should not be stripped. For example, ["a", "sup"]. |
stripHtmlAddNewLine | Array | ["li", "/ul"] | List of zero or more tag names which, if stripped, are replaced with a line break. Closing tags must start with slash. |
} |
Here it is in one place:
det("text to clean", {
  fixBrokenEntities: true,
  removeWidows: true,
  convertEntities: true,
  convertDashes: true,
  convertApostrophes: true,
  replaceLineBreaks: true,
  removeLineBreaks: false,
  useXHTML: true,
  dontEncodeNonLatin: true,
  addMissingSpaces: true,
  convertDotsToEllipsis: true,
  stripHtml: true,
  stripHtmlButIgnoreTags: ["b", "strong", "i", "em", "br", "sup"],
  stripHtmlAddNewLine: ["li", "/ul"]
});
The default set is a wise choice for the most common scenario - preparing text to be pasted into HTML.
You can also set the options to numeric 0 or 1, which is shorter than Boolean true or false.
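As a rough illustration of tweaking the defaults, the sketch below overlays a few overrides, written with the numeric shorthand, on top of a full set of default values. The defaults object here is copied by hand from the table above so the snippet runs on its own; in real code you would start from the exported opts instead. The helper name buildOptions is hypothetical.

```javascript
// Defaults copied from the options table; a stand-in for the exported `opts`.
const defaults = {
  fixBrokenEntities: true,
  removeWidows: true,
  convertEntities: true,
  convertDashes: true,
  convertApostrophes: true,
  replaceLineBreaks: true,
  removeLineBreaks: false,
  useXHTML: true,
  dontEncodeNonLatin: true,
  addMissingSpaces: true,
  convertDotsToEllipsis: true,
  stripHtml: true,
  stripHtmlButIgnoreTags: ["b", "strong", "i", "em", "br", "sup"],
  stripHtmlAddNewLine: ["li", "/ul"]
};

// Hypothetical helper: returns a new object, leaving `defaults` untouched.
function buildOptions(overrides = {}) {
  return { ...defaults, ...overrides };
}

// 0 and 1 are used here in place of false and true.
const custom = buildOptions({ removeWidows: 0, removeLineBreaks: 1 });
console.log(custom.removeWidows); // 0
console.log(custom.removeLineBreaks); // 1
console.log(custom.convertDashes); // true
```

The resulting object would then be passed as the second argument, as in det("text to clean", custom).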
det() output object
Output object's key | Type of its value | Description |
---|---|---|
{ | ||
res | String | The cleaned string |
applicableOpts | Plain Object | Copy of the options object, without the keys that have array values. Each remaining key is set to a boolean that tells whether that option is applicable to the given input. |
} |
Function det returns a plain object, for example:
{
  res: "abc",
  applicableOpts: {
    fixBrokenEntities: false,
    removeWidows: false,
    convertEntities: false,
    convertDashes: false,
    convertApostrophes: false,
    replaceLineBreaks: false,
    removeLineBreaks: false,
    useXHTML: false,
    dontEncodeNonLatin: false,
    addMissingSpaces: false,
    convertDotsToEllipsis: false,
    stripHtml: false
  }
}
applicableOpts
Next-generation web applications are designed to show only the options that are applicable to the given input. This saves the user's time and also conserves mental resources: you don't even need to read the labels of options that are not applicable.
Detergent currently has 14 option keys, 12 of them boolean. That's not a lot but if you use the tool every day, every optimisation counts.
I got the inspiration for this feature while visiting a competitor application (https://typograf.github.io). It has 110 checkboxes grouped into 12 groups, and the options are hidden twice: first, the sidebar is hidden when you visit the page; second, the option groups are collapsed.
Another example of an overwhelming set of options is Kangax's HTML minifier (https://kangax.github.io/html-minifier/), which has 26 options with heavy descriptions.
Detergent tackles this problem by changing its algorithm: it processes the given input and notes whether each option is applicable, independently of whether that option is enabled. Then, if the option is enabled, it changes the result value.
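The "note applicability first, then apply only if enabled" idea can be sketched with a single toy rule. This is an illustration of the pattern only, not Detergent's real implementation; the function name ellipsisRule and its return shape are assumptions made for the example.

```javascript
// Toy rule, not Detergent's code: turn "..." into the ellipsis character.
// It always reports whether the rule is applicable to the input, and it
// changes the text only when the option is enabled.
function ellipsisRule(input, enabled) {
  const applicable = input.includes("...");
  const res =
    applicable && enabled ? input.split("...").join("\u2026") : input;
  return { res, applicable };
}

console.log(ellipsisRule("Wait...", true));  // { res: 'Wait…', applicable: true }
console.log(ellipsisRule("Wait...", false)); // { res: 'Wait...', applicable: true }
console.log(ellipsisRule("abc", true));      // { res: 'abc', applicable: false }
```

Note that the second call still reports applicable: true even though the option is off, which is what lets a front-end show the checkbox for an option the user has disabled.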
For example, Detergent's output might look like this, with no options applicable because there's nothing to do on "abc":
{
  res: "abc",
  applicableOpts: {
    fixBrokenEntities: false,
    removeWidows: false,
    convertEntities: false,
    convertDashes: false,
    convertApostrophes: false,
    replaceLineBreaks: false,
    removeLineBreaks: false,
    useXHTML: false,
    dontEncodeNonLatin: false,
    addMissingSpaces: false,
    convertDotsToEllipsis: false,
    stripHtml: false
  }
}
The option keys whose values are arrays (stripHtmlButIgnoreTags and stripHtmlAddNewLine) are omitted from the applicableOpts report.
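A front-end could use the report to decide which option controls to display. The sketch below assumes only the output shape described above (an object with res and applicableOpts); the helper name applicableKeys is hypothetical and the sample values are made up for illustration rather than taken from a real det() call.

```javascript
// Hypothetical helper: list the option keys worth showing for a given result.
function applicableKeys(result) {
  return Object.keys(result.applicableOpts).filter(
    (key) => result.applicableOpts[key]
  );
}

// Sample shaped like det()'s output; the values are invented.
const sample = {
  res: "some cleaned text",
  applicableOpts: {
    fixBrokenEntities: false,
    removeWidows: true,
    convertEntities: true,
    convertDashes: false,
    convertApostrophes: false,
    replaceLineBreaks: false,
    removeLineBreaks: false,
    useXHTML: false,
    dontEncodeNonLatin: false,
    addMissingSpaces: false,
    convertDotsToEllipsis: false,
    stripHtml: false
  }
};

console.log(applicableKeys(sample)); // [ 'removeWidows', 'convertEntities' ]
```

For the "abc" output shown earlier, the same helper would return an empty array, so no option controls would need to be displayed.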
The simplest possible operation - encoding using default settings:
const { det } = require("detergent");
let { res } = det("clean this text £");
console.log(res);
// > 'clean this text &pound;'
Now, using a custom settings object with one custom setting, convertEntities (the others are left at their defaults):
const { det } = require("detergent");
let { res } = det("clean this text £", {
  convertEntities: 0 // <--- zero is like "false", turns off the feature
});
console.log(res);
// > 'clean this text £'
In the monorepo, npm libraries are located in the packages/ folder. Inside each, the source code is located either in the src/ folder (normal npm library) or in the root, in cli.js (if it's a command-line application).
The npm script "dev" ("dev": "rollup -c --dev --silent") builds the development version, retaining all console.logs with row numbers. It's handy to have js-row-num-cli installed globally so you can automatically update the row numbers on all console.logs.
MIT License
Copyright (c) 2015-2019 Roy Revelt and other contributors
Passes unit tests from https://github.com/kemitchell/straight-to-curly-quotes.json, licenced under CC0-1.0