Like the traditional Unix spell(1), hspell outputs a sorted list of the incorrect words, and does not offer a friendlier interactive interface for making the corrections for you. However, unlike spell(1), hspell can suggest possible corrections for some spelling errors. Such suggestions can be enabled with the -c (correct) and -n (notes) options.
Hspell currently expects ISO-8859-8-encoded input files. Non-Hebrew characters in the input files are ignored, which makes it easy to spellcheck mixed Hebrew-English texts, as well as HTML or TeX files. Files that use a different encoding (e.g., UTF-8) must first be converted to ISO-8859-8 (see, e.g., iconv(1) or recode(1)).
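As a rough illustration of the conversion step, the following Python sketch re-encodes Unicode text as ISO-8859-8 before it is handed to hspell. The names to_iso8859_8 and convert_file are hypothetical and not part of Hspell, and replacing unencodable characters (such as niqqud) with "?" is an assumption made here for simplicity.

```python
def to_iso8859_8(text: str) -> bytes:
    """Encode Unicode text as ISO-8859-8 bytes.

    Characters with no ISO-8859-8 equivalent (for example niqqud marks)
    are replaced with "?"; this policy is an assumption of the sketch.
    """
    return text.encode("iso-8859-8", errors="replace")


def convert_file(src_path: str, dst_path: str) -> None:
    """Read a UTF-8 file and write an ISO-8859-8 copy (hypothetical helper)."""
    with open(src_path, encoding="utf-8") as src:
        data = to_iso8859_8(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(data)
```

The resulting file can then be passed to hspell as usual; iconv(1) or recode(1) achieve the same result from the command line.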
The output will also be in ISO-8859-8 encoding, in so-called "logical order", so it is normally useful to pipe it to bidiv(1) before viewing.
If Hspell was built without morphological analysis support, this option will only show the correct splits of the given word into prefix + word, as the full information incurs a 4-fold increase in the installation size.
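To make the idea of a "prefix + word" split concrete, here is a toy Python sketch. The PREFIXES and WORDS sets and the function prefix_splits are hypothetical illustrations; Hspell's real dictionary also records which prefixes are legal for each word, which this sketch does not model.

```python
from typing import List, Tuple

# Hypothetical toy data; Hspell's real dictionary and prefix rules are far richer.
PREFIXES = {"", "ה", "ב", "ל", "ו", "וה"}
WORDS = {"בית", "ספר"}


def prefix_splits(token: str) -> List[Tuple[str, str]]:
    """Return every (prefix, word) split whose two parts are both known."""
    splits = []
    for i in range(len(token)):
        prefix, word = token[:i], token[i:]
        if prefix in PREFIXES and word in WORDS:
            splits.append((prefix, word))
    return splits
```

Under these assumptions, a token is accepted when at least one split exists, and the list of splits is the kind of information a build without full morphological data could still report.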
Giving the -c option in addition to -l results in special behavior. In that case hspell suggests "corrections" for every word (regardless of whether it is in the dictionary), and shows the linguistic information on all those words. This can be useful for a reader application, which may also want to be able to understand misspellings and their possible meanings.
Running hspell with the program name hspell-i also enables the -i option. This is a useful trick when an application expects just the name of a spell-checking program, and adds only the "-a" option (without giving the user an option to also add "-i"). The multispell script supplied with hspell serves a similar purpose, with more control over encodings and which spell-checker to run for non-Hebrew words.
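The program-name trick can be sketched in a few lines of Python. This is not Hspell's actual code (hspell itself is not written in Python); the function options_for is a hypothetical illustration of how a program can enable an option based on the name it was invoked under.

```python
import os
from typing import List


def options_for(argv0: str, args: List[str]) -> List[str]:
    """Return the effective option list for a given invocation name."""
    options = list(args)
    # If the program was started as "hspell-i" (for example through a
    # symbolic link), behave as if -i had been given on the command line.
    if os.path.basename(argv0) == "hspell-i" and "-i" not in options:
        options.append("-i")
    return options
```

A real program would call this with its own invocation name and arguments, for instance options_for(sys.argv[0], sys.argv[1:]) in Python.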
This is both an advantage and a disadvantage, depending on your viewpoint. It is an advantage because it encourages a correct and consistent spelling style throughout your writing. It is a disadvantage because a few of the Academia's official spelling decisions are relatively unknown to the general public.
Users of Hspell (and all Hebrew writers, for that matter) are encouraged to read the Academia's official niqqud-less spelling rules (these are printed at the end of most modern Hebrew dictionaries, and an abridged version is available at http://hebrew-academy.huji.ac.il/decision4.html). Users are also encouraged to refer to Hebrew dictionaries which use the niqqud-less spelling (such as Millon Ha-hove, Rav Milim, and the new Even Shoshan).
Hspell's distribution (and Web site) also include a document, niqqudless.odt, which explains Hspell's spelling standard in detail (in Hebrew). It explains both the overall principles, and why specific words are spelled the way they are.
In order for this dictionary to be completely free of other people's copyright restrictions, the Hspell project is a clean-room implementation, not based on pre-existing word lists or spell checkers, or on copying of printed dictionaries.
The word list is also not based on automatic scanning of available Hebrew documents (such as online newspapers), because there is no way to guarantee that such a list will be correct, complete, or consistent in its spelling standard.
Instead, our idea was to write programs which know how to correctly inflect Hebrew nouns and conjugate Hebrew verbs. The input to these programs is a list of noun stems and verb roots, plus hints needed for the correct inflection when these cannot be figured out automatically. Most of the effort that went into the Hspell project went into building these input files. Then, "word list generators" (written in Perl, and also part of the Hspell project) create the complete inflected word list that will be used by the spellchecking program, hspell. This generation process is done only once, when building hspell from source.
These lists, before and after inflection, may be useful for much more than spellchecking. Morphological analysis (which hspell provides with the -l option) is one example. For more ideas, see the Hspell project's Web site, at http://ivrix.org.il/projects/spell-checker.
Note that only these exact words will be added - they are not inflected, and prefixes are not automatically allowed.
Hspell is free software, released under the GNU Affero General Public License (AGPL) version 3. Note that not only the programs in the distribution, but also the dictionary files and the generated word lists, are licensed under the AGPL. There is no warranty of any kind.
See the LICENSE file for more information and the exact license terms.
The latest version of this software can be found at http://hspell.ivrix.org.il/
Although we wrote all of Hspell's code ourselves, we are truly indebted to the old-style "open source" pioneers - people who wrote books instead of hiding their knowledge in proprietary software. For the correct noun inflections, Dr. Shaul Barkali's "The Complete Noun Book" has been a great help. Prof. Uzzi Ornan's booklet "Verb Conjugation in Flow Charts" has been instrumental in the implementation of verb conjugation, and Barkali's "The Complete Verb Book" was used too.
During our work we have made extensive use of a number of Hebrew dictionaries, including Even Shoshan, Millon Ha-hove and Rav Milim, to ensure the correctness of certain words. Various Hebrew newspapers and books, both printed and online, were used for inspiration and for finding words we still do not recognize.
We wish to thank Cilla Tuviana and Dr. Zvi Har'El for their assistance with some grammatical questions.
Several other people helped us in various releases, with suggestions, fixes or patches - they are listed in the WHATSNEW file in the distribution.
For GUI-lovers, hspell's user interface is an abomination. However, as more and more applications learn to interface with hspell, and as Hspell's data becomes available in multi-lingual spellcheckers (such as aspell and hunspell), this will no longer be an issue. See http://hspell.ivrix.org.il/ for instructions on how to use Hspell in a variety of applications.
hspell's being limited to the ISO-8859-8 encoding, and not recognizing UTF-8 or even CP1255 (including niqqud), is an anachronism today.