Detergent (v5.7.2)

a tool to prepare text for pasting into HTML


Online web app: https://detergent.io


Install

npm i detergent

Consume via a require():

const { det, opts, version } = require("detergent");

or as an ES Module:

import { det, opts, version } from "detergent";

or for web pages, as a production-ready minified script file (so-called "UMD build"), straight from CDN:

<script src="https://cdn.jsdelivr.net/npm/detergent/dist/detergent.umd.js"></script>
<script>
  // the UMD build exposes a global variable "detergent", which you consume like this:
  const { det, opts, version } = detergent;
</script>

This package has three builds in the dist/ folder:

Type | Key in package.json | Path | Size
--- | --- | --- | ---
Main export: CommonJS version, transpiled to ES5, contains require and module.exports | main | dist/detergent.cjs.js | 47 KB
ES module build that Webpack/Rollup understands: untranspiled ES6 code with import/export | module | dist/detergent.esm.js | 52 KB
UMD build for browsers: transpiled, minified, contains IIFEs and has all dependencies baked in | browser | dist/detergent.umd.js | 392 KB


Rationale

Detergent is a tool that cleans and prepares text so you can paste it safely into an HTML template.

For starters, Detergent will:

  • delete invisible Unicode characters
  • collapse whitespace chunks longer than one space (considering newlines)
  • strip HTML and recursively decode anything HTML-encoded
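The three baseline steps above can be approximated in a few lines of JavaScript. This is a simplified sketch, not Detergent's own code: the `cleanBasics` and `decodeRecursively` helpers, the short list of invisible code points and the four-entry entity map are all assumptions made for illustration, and tag stripping is left out.

```javascript
// Hypothetical sketch of Detergent-style baseline cleaning (not its real code).

// A few invisible code points, including ETX (U+0003). The real list is
// assumed to be much longer.
const INVISIBLE = /[\u0003\u200B\u200C\u200D\u2060\uFEFF]/g;

// Tiny assumed entity map; a real decoder knows every named entity.
const ENTITIES = { "&amp;": "&", "&lt;": "<", "&gt;": ">", "&pound;": "£" };

// One decoding pass: hex, decimal, then named entities.
function decodeOnce(str) {
  return str
    .replace(/&#x([0-9a-f]+);/gi, (_, hex) => String.fromCodePoint(parseInt(hex, 16)))
    .replace(/&#(\d+);/g, (_, dec) => String.fromCodePoint(parseInt(dec, 10)))
    .replace(/&[a-z]+;/gi, (m) => ENTITIES[m] ?? m);
}

// Decode repeatedly until the string stops changing, so that
// double-encoded input like "&amp;pound;" ends up as "£".
function decodeRecursively(str) {
  let prev;
  do {
    prev = str;
    str = decodeOnce(str);
  } while (str !== prev);
  return str;
}

function cleanBasics(str) {
  return decodeRecursively(str.replace(INVISIBLE, ""))
    // collapse runs of spaces/tabs into a single space
    .replace(/[ \t]+/g, " ")
    // drop spaces hugging a line break
    .replace(/ *\n */g, "\n")
    .trim();
}

console.log(JSON.stringify(cleanBasics("Price:\u0003   &amp;pound;5  \n  only")));
// → "Price: £5\nonly"
```

Decoding in a loop until the output stops changing is what makes the decoding recursive: each pass peels off one layer of encoding.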

Then Detergent will optionally:

  • encode all non-ASCII characters (for example, £ into &pound;)
  • improve English grammar style (for example, convert straight quotes to curly)
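To give a feel for the grammar-style step, here is a deliberately naive sketch of straight-to-curly quote conversion. The `curlyQuotes` helper and its three regex rules are assumptions; Detergent's real rules handle many more edge cases.

```javascript
// Hypothetical, deliberately simplified "straight to curly" conversion.
// Real rules also cover leading apostrophes ('til), nested quotes,
// primes in measurements and so on.
function curlyQuotes(str) {
  return str
    // apostrophe between two letters: don't → don’t
    .replace(/(\p{L})'(\p{L})/gu, "$1\u2019$2")
    // a double quote at the start or after whitespace/opening bracket opens a quote
    .replace(/(^|[\s([{])"/g, "$1\u201C")
    // any remaining straight double quote closes a quote
    .replace(/"/g, "\u201D");
}

console.log(curlyQuotes(`She said "don't" twice.`));
// → She said “don’t” twice.
```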

Adobe Photoshop and Illustrator both insert ETX characters when you use SHIFT+ENTER to break a line while keeping the text within the same paragraph (as opposed to a normal ENTER, which breaks paragraphs). When text containing an ETX character is pasted into an HTML template, the character is invisible in the code editor but might surface later as "�" when a CMS, ESP or other platform attempts to read the code.

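Detergent also has optional features to improve English style, for example typographically correct dashes, apostrophes and ellipses (see the options object below).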

Extra features are:

  • You can skip the HTML encoding of non-Latin letters. This is useful when you are deploying Japanese or Chinese emails, because otherwise everything would be HTML-encoded.
  • Detergent is both XHTML- and HTML-friendly. You can choose how your <br> tags appear: with a closing slash (XHTML) or without (HTML), so your code should pass the W3C validator.
  • Detergent handles the full range of Unicode code points. In other words, it's emoji-friendly.
  • Detergent uses named HTML entities (for example, &nbsp; instead of &#xA0;) so you can read and recognise them. Not all named HTML entities work in all email clients, so we tested them, found out which ones don't render correctly and set those to be numeric.
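The named-entity-with-numeric-fallback approach, the emoji handling and the non-Latin skip can be sketched as follows. The `encodeEntities` helper, its four-entry allowlist and the set of scripts treated as "non-Latin" are assumptions for illustration, not Detergent's tested lists.

```javascript
// Hypothetical sketch: encode non-ASCII characters, preferring named entities
// from an allowlist and falling back to numeric ones.
const NAMED = new Map([
  [0x00a0, "&nbsp;"],
  [0x00a3, "&pound;"],
  [0x2019, "&rsquo;"],
  [0x2026, "&hellip;"],
]);

// Rough "non-Latin letter" test via Unicode script properties (an assumption;
// Detergent's own rule may differ).
const NON_LATIN =
  /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}\p{Script=Hangul}\p{Script=Hebrew}\p{Script=Arabic}]/u;

function encodeEntities(str, { dontEncodeNonLatin = true } = {}) {
  let out = "";
  // for...of iterates by code point, so emoji (surrogate pairs) stay whole
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    if (cp < 0x80 || (dontEncodeNonLatin && NON_LATIN.test(ch))) {
      out += ch; // ASCII, or a non-Latin letter we were told to keep
    } else {
      out += NAMED.get(cp) ?? `&#x${cp.toString(16).toUpperCase()};`;
    }
  }
  return out;
}

console.log(encodeEntities("£5 \u{1F600} 日本"));
// → &pound;5 &#x1F600; 日本
```

Iterating with for...of walks the string by code point rather than by UTF-16 unit, which is what keeps an emoji such as U+1F600 in one piece when it is encoded.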


API

The main function is exported in a plain object under the key det, so import it like this:

const { det } = require("detergent");
// or request everything:
// const { det, opts, version } = require("detergent");
// this also gives you `opts`, a plain object with the default options. Handy
// when developing front-ends that consume Detergent.

det is the main function. See its API below.

opts is the default options object. You can pass it (or a tweaked copy of it) to det.

version is the value of the same-named package.json key: the version of the particular copy of Detergent you've got.


API - det() Input

det is a function. You pass two input arguments to it:

Input argument | Type | Obligatory? | Description
--- | --- | --- | ---
input | String | yes | The string you want to clean.
options | Object | no | Options object. See its key arrangement below.


API - det() options object

Options object's key | Type of its value | Default | Description
--- | --- | --- | ---
fixBrokenEntities | Boolean | true | Try to fix broken named HTML entities, such as &nsp; (the "b" of &nbsp; is missing)
removeWidows | Boolean | true | Replace the last space in a paragraph with a non-breaking space
convertEntities | Boolean | true | Encode all non-ASCII characters
convertDashes | Boolean | true | Typographically correct the en and em dashes
convertApostrophes | Boolean | true | Typographically correct the apostrophes
replaceLineBreaks | Boolean | true | Replace all line breaks with br tags
removeLineBreaks | Boolean | false | Put everything on one line (removes any line breaks, inserting a space where necessary)
useXHTML | Boolean | true | Add closing slashes on br tags
dontEncodeNonLatin | Boolean | true | Skip encoding of non-Latin characters (for example, CJK, Alefbet Ivri or Arabic abjad)
addMissingSpaces | Boolean | true | Add missing spaces after dots, colons and semicolons, unless it's a URL
convertDotsToEllipsis | Boolean | true | Convert three dots into &hellip;, the ellipsis character. When set to false, all encoded ellipses are converted to three dots.
stripHtml | Boolean | true | By default, all HTML tags are stripped, except those listed in stripHtmlButIgnoreTags. You can turn off HTML tag removal completely here.
stripHtmlButIgnoreTags | Array | ["b", "strong", "i", "em", "br", "sup"] | Zero or more tag names that should not be stripped, for example ["a", "sup"].
stripHtmlAddNewLine | Array | ["li", "/ul"] | Zero or more tag names which, when stripped, are replaced with a line break. Closing tags must start with a slash.

Here it is in one place:

det("text to clean", {
  fixBrokenEntities: true,
  removeWidows: true,
  convertEntities: true,
  convertDashes: true,
  convertApostrophes: true,
  replaceLineBreaks: true,
  removeLineBreaks: false,
  useXHTML: true,
  dontEncodeNonLatin: true,
  addMissingSpaces: true,
  convertDotsToEllipsis: true,
  stripHtml: true,
  stripHtmlButIgnoreTags: ["b", "strong", "i", "em", "br", "sup"],
  stripHtmlAddNewLine: ["li", "/ul"]
});
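The interplay of stripHtml, stripHtmlButIgnoreTags and stripHtmlAddNewLine, as described in the options table, can be illustrated with a regex-based sketch. The `stripTags` helper is hypothetical, and a regex is not a robust HTML parser; treat this as an illustration of the option semantics only.

```javascript
// Hypothetical regex-based sketch of the tag-stripping options.
// Detergent's real tag stripping is more careful than this.
function stripTags(
  str,
  ignoreTags = ["b", "strong", "i", "em", "br", "sup"], // like stripHtmlButIgnoreTags
  addNewLine = ["li", "/ul"] // like stripHtmlAddNewLine
) {
  return str.replace(/<(\/?)([a-z][a-z0-9]*)\b[^>]*>/gi, (tag, slash, name) => {
    const lower = name.toLowerCase();
    // tags on the ignore list are kept verbatim (opening and closing)
    if (ignoreTags.includes(lower)) return tag;
    // closing tags are looked up with a leading slash, e.g. "/ul"
    const key = slash ? `/${lower}` : lower;
    return addNewLine.includes(key) ? "\n" : "";
  });
}

console.log(JSON.stringify(stripTags("<ul><li><b>One</b></li><li>Two</li></ul>")));
// → "\n<b>One</b>\nTwo\n"
```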

The default set is a wise choice for the most common scenario - preparing text to be pasted into HTML.

You can also set the options to numeric 0 or 1, which is shorter than Boolean true or false.
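A minimal sketch of how defaults and 0/1 values might be resolved, assuming a plain object spread over the documented defaults. `resolveOpts` and `DEFAULTS` are hypothetical names, and Detergent's own option validation may behave differently (for example, it may reject unknown keys).

```javascript
// Hypothetical: the documented defaults, merged under the user's options.
const DEFAULTS = {
  fixBrokenEntities: true,
  removeWidows: true,
  convertEntities: true,
  convertDashes: true,
  convertApostrophes: true,
  replaceLineBreaks: true,
  removeLineBreaks: false,
  useXHTML: true,
  dontEncodeNonLatin: true,
  addMissingSpaces: true,
  convertDotsToEllipsis: true,
  stripHtml: true,
  stripHtmlButIgnoreTags: ["b", "strong", "i", "em", "br", "sup"],
  stripHtmlAddNewLine: ["li", "/ul"],
};

function resolveOpts(userOpts = {}) {
  const merged = { ...DEFAULTS, ...userOpts };
  for (const [key, value] of Object.entries(merged)) {
    // array-valued options pass through; everything else becomes a Boolean,
    // so 0 → false and 1 → true
    if (!Array.isArray(value)) merged[key] = Boolean(value);
  }
  return merged;
}

console.log(resolveOpts({ convertEntities: 0 }).convertEntities); // → false
```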


API - det() output object

Output object's key | Type of its value | Description
--- | --- | ---
res | String | The cleaned string
applicableOpts | Plain Object | A copy of the options object without the array-valued keys; each value is a Boolean telling whether that option is applicable to the given input

Function det returns a plain object, for example:

{
  res: "abc",
  applicableOpts: {
    fixBrokenEntities: false,
    removeWidows: false,
    convertEntities: false,
    convertDashes: false,
    convertApostrophes: false,
    replaceLineBreaks: false,
    removeLineBreaks: false,
    useXHTML: false,
    dontEncodeNonLatin: false,
    addMissingSpaces: false,
    convertDotsToEllipsis: false,
    stripHtml: false
  }
}


applicableOpts

Next-generation web applications are designed to show only the options that are applicable to the given input. This saves users' time and also conserves mental resources: you don't even need to read the labels of options that are not applicable.

Detergent currently has 14 option keys, 12 of them Boolean. That's not a lot, but if you use the tool every day, every optimisation counts.

I got the inspiration for this feature while visiting a competitor application, https://typograf.github.io. It has 110 checkboxes in 12 groups, and the options are hidden twice: first, the sidebar is hidden when you visit the page; second, the option groups are collapsed.

Another example of an overwhelming option set is Kangax's HTML minifier, https://kangax.github.io/html-minifier/, which has 26 options with lengthy descriptions.

Detergent tackles this problem through its algorithm: while processing the input, it notes for each option whether that option is applicable, regardless of whether it is enabled. Only if the option is enabled does it change the result.
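The applicability pattern can be sketched with two simplified options. `miniDet` is a hypothetical toy, not Detergent's code: its widow rule treats the whole input as one paragraph and its ellipsis rule only looks for three consecutive dots.

```javascript
// Hypothetical toy showing "record applicability first, apply only if enabled".
function miniDet(input, opts = {}) {
  // both options enabled unless the caller overrides them
  const enabled = { convertDotsToEllipsis: true, removeWidows: true, ...opts };
  const applicableOpts = { convertDotsToEllipsis: false, removeWidows: false };
  let res = input;

  // Ellipsis: applicable whenever three dots are present, enabled or not.
  if (res.includes("...")) {
    applicableOpts.convertDotsToEllipsis = true;
    if (enabled.convertDotsToEllipsis) res = res.replace(/\.{3}/g, "&hellip;");
  }

  // Widows: applicable when there is a last space to protect.
  const lastSpace = res.lastIndexOf(" ");
  if (lastSpace !== -1) {
    applicableOpts.removeWidows = true;
    if (enabled.removeWidows) {
      res = res.slice(0, lastSpace) + "&nbsp;" + res.slice(lastSpace + 1);
    }
  }

  return { res, applicableOpts };
}

console.log(JSON.stringify(miniDet("abc")));
// → {"res":"abc","applicableOpts":{"convertDotsToEllipsis":false,"removeWidows":false}}
console.log(JSON.stringify(miniDet("wait for it...", { convertDotsToEllipsis: false })));
// → {"res":"wait for&nbsp;it...","applicableOpts":{"convertDotsToEllipsis":true,"removeWidows":true}}
```

Because applicability is recorded before the enabled check, a front-end can still show the ellipsis checkbox for the second input even though the user switched it off.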

For example, here is Detergent's output for the input "abc". No option is applicable because there's nothing to clean:

{
  res: "abc",
  applicableOpts: {
    fixBrokenEntities: false,
    removeWidows: false,
    convertEntities: false,
    convertDashes: false,
    convertApostrophes: false,
    replaceLineBreaks: false,
    removeLineBreaks: false,
    useXHTML: false,
    dontEncodeNonLatin: false,
    addMissingSpaces: false,
    convertDotsToEllipsis: false,
    stripHtml: false
  }
}

Option keys with array values (stripHtmlButIgnoreTags and stripHtmlAddNewLine) are omitted from the applicableOpts report.
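Deriving the shape of applicableOpts from an options object is a short transformation over its entries. This is a hypothetical sketch (`applicableOptsTemplate` is not part of Detergent's API) that drops array-valued keys and starts every remaining key at false.

```javascript
// Hypothetical: build an all-false applicableOpts skeleton from an options object.
function applicableOptsTemplate(opts) {
  return Object.fromEntries(
    Object.entries(opts)
      .filter(([, value]) => !Array.isArray(value)) // drop array-valued keys
      .map(([key]) => [key, false])
  );
}

const sample = {
  stripHtml: true,
  useXHTML: true,
  stripHtmlButIgnoreTags: ["b", "strong"],
  stripHtmlAddNewLine: ["li", "/ul"],
};
console.log(applicableOptsTemplate(sample));
// → { stripHtml: false, useXHTML: false }
```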


Example

The simplest possible operation - encoding using default settings:

const { det } = require("detergent");
let { res } = det("clean this text £");
console.log(res);
// > 'clean this text &pound;'

Now, using a custom settings object that overrides only convertEntities (the other options keep their defaults):

const { det } = require("detergent");
let { res } = det("clean this text £", {
  convertEntities: 0 // <--- zero is like "false", turns off the feature
});
console.log(res);
// > 'clean this text £'


Contributing

  • If you see an error, raise an issue.
  • If you want a new feature but can't code it up yourself, also raise an issue. Let's discuss it.
  • If you tried to use this package, but something didn't work out, also raise an issue. We'll try to help.
  • If you want to contribute some code, fork the monorepo on GitLab, write your code, then open a merge request on GitLab. We'll merge it in and release.

In the monorepo, npm libraries are located in the packages/ folder. Inside each one, the source code is located either in the src/ folder (normal npm library) or in the root, in cli.js (if it's a command-line application).

The npm script "dev" ("rollup -c --dev --silent") builds the development version, retaining all console.logs with row numbers. It's handy to have js-row-num-cli installed globally so you can automatically update the row numbers on all console.logs.


Licence

MIT License

Copyright (c) 2015-2019 Roy Revelt and other contributors

Passes unit tests from https://github.com/kemitchell/straight-to-curly-quotes.json, licensed under CC0-1.0
