| Categories: | credativ® Inside HowTos |
|---|
LanguageTool is one of the leading open-source solutions for grammar and style checking. While most users are likely familiar with the cloud-based version, the on-premise (self-hosted) variant is becoming increasingly important, especially for businesses, educational institutions, and organizations with strict data protection and control requirements.
The core of LanguageTool is licensed under the GNU Lesser General Public License (LGPL-2.1). This license permits free use, modification, and distribution of the software, even in commercial environments, provided that modified versions of LanguageTool itself are also made available under the LGPL when they are distributed. The license is "weak copyleft," meaning that applications using LanguageTool as a library do not have to be open source themselves. License information can be found in COPYING.txt in the official GitHub repository. Third-party components such as dictionaries may be under different licenses (e.g., GPL).
There is an open-source version as well as an extended premium version with additional features such as improved style, semantic, and formatting checks. A detailed overview can be found on the website. Note that for self-hosted instances, premium features are available only for commercial use and on individual quotation. However, this is not clearly communicated and is mostly found in the forum on request. It also appears that not all premium features are available there.
Unfortunately, LanguageTool changed how its browser extensions can be used in 2026: cloud usage now requires a premium subscription. The self-hosted version is unaffected. Here, the browser extension can still be connected to your own server, enabling seamless integration into web applications such as email, CMS, or forms.
LanguageTool has a modular design and combines several technologies that go far beyond the built-in spell checking of, for example, LibreOffice or Thunderbird. However, one much-requested feature is not yet available: support for multiple languages within a single document.
Morphological Analyzer & POS Tagger
First, the text is broken down into sentences and words. Each word receives at least one Part-of-Speech (POS) tag (e.g., noun, verb, adjective). The analyzer also considers inflectional forms, so “gegangen” (gone) is correctly identified as a past participle.
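To make the idea of several readings per word concrete, here is a deliberately tiny Python sketch. The lexicon, its simplified tag names, and the regex-based sentence and word splitting are assumptions for illustration only; LanguageTool's real German tagger relies on a large morphological dictionary and a much finer tagset.

```python
import re

# Hypothetical mini lexicon: word form -> list of (lemma, POS tag) readings.
# The tags are simplified labels, NOT LanguageTool's actual German tagset.
LEXICON = {
    "ich": [("ich", "PRONOUN")],
    "bin": [("sein", "VERB:1SG")],
    "nach": [("nach", "PREPOSITION")],
    "hause": [("Haus", "NOUN:DAT")],
    "gegangen": [("gehen", "VERB:PAST_PARTICIPLE")],
    "die": [("die", "ARTICLE"), ("die", "PRONOUN")],
    "bank": [("Bank", "NOUN")],
}


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence: str) -> list[str]:
    """Extract runs of word characters (Unicode, so umlauts and ß are kept)."""
    return re.findall(r"\w+", sentence)


def analyze(text: str) -> list[list[tuple[str, list[tuple[str, str]]]]]:
    """Return, per sentence, each token together with all of its readings.

    Unknown words get a single (token, "UNKNOWN") reading so that every
    token carries at least one tag.
    """
    result = []
    for sentence in split_sentences(text):
        tokens = []
        for token in tokenize(sentence):
            readings = LEXICON.get(token.lower(), [(token, "UNKNOWN")])
            tokens.append((token, readings))
        result.append(tokens)
    return result
```

Note that "Die" keeps two readings (article and pronoun); choosing between them is left to the next stage.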
Disambiguator
Many words have multiple meanings (e.g., "Bank" as a bench or a financial institution). The disambiguator uses contextual information to select the correct interpretation. This is done either with rules or statistically, and it improves the accuracy of the subsequent rule application.
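A rule-based disambiguator can be sketched as picking the reading that is best supported by cue words nearby. The cue lists, tag names, and window size below are hypothetical; LanguageTool's actual disambiguation rules are defined per language and look different. The sketch only illustrates the principle.

```python
# Hypothetical readings and context cues for the ambiguous German word "Bank".
READINGS = {"bank": ["NOUN:SEAT", "NOUN:FINANCIAL_INSTITUTION"]}
CUES = {
    "NOUN:SEAT": {"park", "sitzen", "garten", "holz"},
    "NOUN:FINANCIAL_INSTITUTION": {"konto", "geld", "kredit", "überweisung"},
}


def disambiguate(tokens: list[str], index: int, window: int = 4) -> list[str]:
    """Keep the reading(s) of tokens[index] best supported by nearby words.

    Counts cue words within `window` tokens on either side. If no cue
    matches at all, every reading is kept; on a tie, all tied readings
    are kept, so ambiguity is passed on instead of guessed away.
    """
    readings = READINGS.get(tokens[index].lower(), [])
    if len(readings) < 2:
        return readings
    lo = max(0, index - window)
    neighbourhood = enumerate(tokens[lo:index + window + 1], start=lo)
    context = {t.lower() for i, t in neighbourhood if i != index}
    scores = {r: len(CUES.get(r, set()) & context) for r in readings}
    best = max(scores.values())
    if best == 0:
        return readings
    return [r for r in readings if scores[r] == best]
```

For example, in "Ich habe Geld von der Bank auf mein Konto überwiesen" the cues "Geld" and "Konto" select the financial reading.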
Rule Engine (XML & Java)
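Error detection is based on a combination of declarative XML rules, which describe error patterns over words and POS tags, and rules implemented in Java for checks that are too complex to express as patterns.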
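The interplay of declarative pattern rules and procedural rules can be mimicked in a few lines of Python. Everything here is invented for illustration: the rule IDs, the (word, tag) token format, and tag names such as MODAL do not reflect LanguageTool's XML schema or its Java API.

```python
from dataclasses import dataclass

Token = tuple[str, str]  # (surface form, POS tag); tag names are hypothetical


@dataclass
class Match:
    rule_id: str
    start: int  # index of the first matched token
    end: int  # index after the last matched token
    message: str


@dataclass
class PatternRule:
    """Declarative rule, loosely modelled on the idea of an XML pattern."""
    rule_id: str
    pattern: list[dict]  # each element may constrain "text" and/or "pos"
    message: str

    def check(self, tokens: list[Token]) -> list[Match]:
        n = len(self.pattern)
        matches = []
        for i in range(len(tokens) - n + 1):
            if all(_fits(tokens[i + k], cond) for k, cond in enumerate(self.pattern)):
                matches.append(Match(self.rule_id, i, i + n, self.message))
        return matches


def _fits(token: Token, cond: dict) -> bool:
    """True if the token satisfies every constraint given in cond."""
    text, pos = token
    if "text" in cond and text.lower() != cond["text"]:
        return False
    if "pos" in cond and pos != cond["pos"]:
        return False
    return True


def repeated_word_rule(tokens: list[Token]) -> list[Match]:
    """Procedural rule, standing in for logic LanguageTool writes in Java."""
    return [
        Match("REPEATED_WORD", i, i + 2, f"Possible repeated word: '{tokens[i][0]}'")
        for i in range(len(tokens) - 1)
        if tokens[i][0].lower() == tokens[i + 1][0].lower()
    ]


# Hypothetical pattern: a modal verb followed by "of" was probably meant as "have".
MODAL_OF = PatternRule("MODAL_OF", [{"pos": "MODAL"}, {"text": "of"}],
                       "Did you mean 'have'?")


def check_all(tokens: list[Token]) -> list[Match]:
    """Run the pattern rule first, then the procedural rule."""
    return MODAL_OF.check(tokens) + repeated_word_rule(tokens)
```

The pattern rule only states *what* to look for, while the procedural rule can use arbitrary logic; this is the trade-off behind keeping both kinds.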
N-Gram Model (optional)
For improved detection of commonly confused words (e.g., "ihre" vs. "Ihre"), an n-gram model can be added. It uses statistical data from very large text corpora (e.g., Google Books) and compares the probabilities of word sequences. The n-gram data is not included in the standard package but can be downloaded and used locally.
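The core idea, picking the candidate that makes the surrounding word sequence more probable, can be sketched with bigram counts. The counts, the vocabulary size, and the use of add-one smoothing are assumptions for this sketch, and the confusion pair "seit"/"seid" is simply another well-known German example. LanguageTool's actual implementation works on its downloaded n-gram index and differs in detail.

```python
import math

# Hypothetical unigram and bigram counts, standing in for the corpus
# statistics contained in LanguageTool's downloadable n-gram data.
UNIGRAMS = {"ihr": 900, "seid": 400, "seit": 2000}
BIGRAMS = {
    ("ihr", "seid"): 350, ("ihr", "seit"): 5,
    ("seit", "gestern"): 300, ("seid", "gestern"): 2,
}
VOCAB_SIZE = 10_000  # assumed vocabulary size for add-one smoothing


def bigram_logprob(prev: str, word: str) -> float:
    """log P(word | prev) with add-one (Laplace) smoothing."""
    count = BIGRAMS.get((prev, word), 0) + 1
    return math.log(count / (UNIGRAMS.get(prev, 0) + VOCAB_SIZE))


def sequence_logprob(words: list[str]) -> float:
    """Sum of bigram log-probabilities over consecutive word pairs."""
    return sum(bigram_logprob(a, b) for a, b in zip(words, words[1:]))


def best_candidate(words: list[str], index: int, confusion_set: tuple[str, ...]) -> str:
    """Return the member of confusion_set that makes the sequence most probable."""
    def score(candidate: str) -> float:
        variant = words[:index] + [candidate] + words[index + 1:]
        return sequence_logprob(variant)
    return max(confusion_set, key=score)
```

With these counts, best_candidate(["ihr", "seit", "da"], 1, ("seit", "seid")) prefers "seid", because "ihr seid" is far more frequent than "ihr seit".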
Custom words can be added via spelling_custom.txt. With AnnotatedText, HTML, LaTeX, or XML can be processed without distorting position information. For developers, JLanguageTool offers a powerful Java interface.
Integration is versatile: in addition to the browser extension, LanguageTool supports APIs for custom applications, plugins for LibreOffice, Microsoft Word, and Thunderbird, and direct connection to development tools. The self-hosted solution thus offers maximum flexibility, security, and scalability, which makes it ideal for use in sensitive or regulated environments.
A complete list can be found at the following link. Notably absent is a dedicated plugin for the Outlook client. As far as could be determined, the demand probably did not justify the effort, although there are only older forum posts on the subject. Nevertheless, the LanguageTool browser extension works without problems with Outlook in the browser, so the limitation should only affect the desktop client.
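Regarding the AnnotatedText support mentioned earlier: over the HTTP API, text containing markup can be sent in the data parameter as a JSON list of text and markup segments instead of the plain text parameter, so the reported offsets still refer to the original document. The following Python sketch converts simple HTML into that structure. The regex-based splitting and the choice of which tags are interpreted as line breaks (interpretAs) are simplifications for illustration; on the Java side, AnnotatedTextBuilder serves the same purpose.

```python
import json
import re

# Tags treated as line breaks for checking purposes; all other tags are
# markup without textual content. This mapping is an assumption.
BLOCK_TAGS = re.compile(r"</?(p|br|div|li)\b[^>]*>", re.IGNORECASE)


def html_to_annotation(html: str) -> dict:
    """Split simple HTML into the text/markup segments of an AnnotatedText payload.

    Naive regex splitting for illustration only; real HTML should go through
    a proper parser. Block-level tags get interpretAs set to a newline so that
    text from different paragraphs is not glued together during checking.
    """
    segments = []
    for part in re.split(r"(<[^>]+>)", html):
        if not part:
            continue  # re.split yields empty strings between adjacent tags
        if part.startswith("<"):
            seg = {"markup": part}
            if BLOCK_TAGS.fullmatch(part):
                seg["interpretAs"] = "\n"
            segments.append(seg)
        else:
            segments.append({"text": part})
    return {"annotation": segments}


def check_form_fields(html: str, language: str = "de-DE") -> dict:
    """Form fields for POST /v2/check, using 'data' instead of 'text'."""
    return {"data": json.dumps(html_to_annotation(html)), "language": language}
```

The resulting fields can be form-encoded and posted to /v2/check exactly like a plain-text request.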
On GitHub, you will find various options for installing a self-hosted service. For local installations in particular, a Docker container is probably the fastest to deploy.
Several images are linked here; the author chose one as an example.
The image's maintainer also offers several nearly ready-to-use, copy-paste solutions for starting the service. These include a Docker Compose template that runs the service as an unprivileged user and keeps the file system read-only:
To use it, write the content to a file such as docker-compose.yml, create the 'ngrams' and 'fasttext' directories, and adjust their ownership for an unprivileged user such as 'nobody'. All of the following examples were run on a Debian 13 system.
```shell
$ mkdir -p ~/Programme/Languagetool
$ cd ~/Programme/Languagetool
$ mkdir ngrams fasttext
# chown to another user requires root privileges
$ sudo chown nobody:nogroup ngrams fasttext
```
Below is the content of the Compose YAML file with n-gram support for German and English. Note that the n-gram data is quite large and requires several GB of storage: currently about 3 GB for German and 15 GB for English.
```yaml
services:
  languagetool:
    image: meyay/languagetool:latest
    container_name: languagetool
    restart: unless-stopped
    # 65534:65534 corresponds to nobody:nogroup on Debian
    user: "65534:65534"
    read_only: true
    tmpfs:
      - /tmp:exec
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges
    ports:
      - "8081:8081"
    environment:
      # comma-separated list of languages whose n-grams are downloaded
      download_ngrams_for_langs: de,en
    volumes:
      - ./ngrams:/ngrams
      - ./fasttext:/fasttext
```
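Since the n-gram data for German and English together needs roughly 18 GB (using the approximate figures above), a quick free-space check before the first start can avoid a failed download. This is a minimal Python sketch; the size table simply repeats the article's approximate numbers, and the headroom value is an arbitrary assumption.

```python
import shutil
from pathlib import Path

# Approximate n-gram sizes in GB as stated in the text; they change over time.
NGRAM_SIZE_GB = {"de": 3, "en": 15}


def required_gb(langs: list[str]) -> int:
    """Sum the approximate download sizes for the requested languages."""
    return sum(NGRAM_SIZE_GB[lang] for lang in langs)


def enough_space(path: str, langs: list[str], headroom_gb: int = 2) -> bool:
    """True if the filesystem holding `path` has room for the n-grams plus headroom.

    Free space is measured in GiB (1024**3 bytes), which is slightly smaller
    than the same amount in GB, so the check errs on the safe side.
    """
    free_gib = shutil.disk_usage(Path(path).expanduser()).free / 1024**3
    return free_gib >= required_gb(langs) + headroom_gb
```

For the setup above, enough_space("~/Programme/Languagetool/ngrams", ["de", "en"]) would be run after creating the directories.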
The service can then be started with the following command:
```shell
$ docker compose up -d
# Downloading the n-grams can take a while.
$ docker ps
2af60ed08544 meyay/languagetool:latest "/sbin/tini -g -e 14…" 4 weeks ago Up 3 hours (healthy) 0.0.0.0:8081->8081/tcp, :::8081->8081/tcp languagetool
```
The service is now available, and the plugins should be able to access it. Note that there is no authentication or any other access control: anyone who can reach the URL and port can use it.
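To verify the server from a script, a minimal client using only the Python standard library might look like the following. It assumes the server from the Compose file is reachable at http://localhost:8081 and uses the /v2/check endpoint with the text and language form fields; the helper names are hypothetical. Since there is no authentication, the same request works from any machine that can reach the port.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8081"  # assumed host/port from the Compose file


def build_check_request(text: str, language: str = "auto",
                        base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare a POST to /v2/check with form-encoded 'text' and 'language'."""
    body = urllib.parse.urlencode({"text": text, "language": language}).encode()
    return urllib.request.Request(
        f"{base_url}/v2/check",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def summarize_matches(response: dict) -> list[str]:
    """Turn the JSON answer into one readable line per reported problem."""
    lines = []
    for m in response.get("matches", []):
        start, end = m["offset"], m["offset"] + m["length"]
        # Show at most three suggested replacements per problem.
        suggestions = ", ".join(r["value"] for r in m.get("replacements", [])[:3])
        lines.append(f"[{m['rule']['id']}] {start}-{end}: {m['message']} -> {suggestions or '(none)'}")
    return lines


def check(text: str, language: str = "auto") -> list[str]:
    """Send the request to the running server and summarize the result."""
    with urllib.request.urlopen(build_check_request(text, language), timeout=30) as resp:
        return summarize_matches(json.load(resp))
```

Keeping request construction and response parsing separate from the network call makes both easy to test without a running server.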
LanguageTool on-premise combines data protection-compliant text checking with flexible integration. The LGPL-2.1 license allows free use, while comprehensive interfaces enable seamless integration into office and web applications. With the correct configuration, a local server becomes a fully functional, enterprise-grade solution for linguistic checking.
About the author
Consultant
About the person
Danilo has been a consultant at credativ GmbH since 2016. His technical focus is on container technologies such as Kubernetes, Podman, Docker, and their ecosystem. He also has experience with projects and training courses in the area of RDBMS (MySQL/MariaDB and PostgreSQL®). Since 2015, he has also been a member of the organizing team of the German PostgreSQL® conference PGConf.DE.