OpenPipeline – an open-source document processing pipeline

Most commercial search engines include a more or less advanced document processing pipeline for transforming raw input into something that can be indexed. The process involves normalization, entity extraction, linguistic processing, annotation, data cleansing and so on. Open source search engines, for their part, are getting pretty good at the core of indexing and search,

Rana vs Wium Lie

Note: Links are to Norwegian sites. In a recent post on his blog, Shahzad Rana (Microsoft’s most high-profile OOXML promoter in Norway) comments on the wording used by Håkon Wium Lie (Opera Software’s tech director and a high-profile standards promoter) in a comment to VG TV. Here, Lie introduces the term “Microsoft tax” to explain what happens when ordinary

Norwegian search portal Sesam.no releases middleware as GPL

In this blog post, Sesam announces that their middleware architecture, Sesam Search Application Toolkit (SESAT), has been released as open source software. This is the piece of software (written in Java) which sits between the portal (such as sesam.no) and the data sources (such as FAST ESP, Yahoo! or a database) and dispatches in parallel a

The state of open source search

Open Source Software (OSS) and free software have been an alternative to commercial, licensed software for decades. The best known and most successful are perhaps projects like GNU/Linux (licensed under the GNU General Public License, GPL), OpenOffice.org, the Apache web server and MySQL. They have all managed to produce excellent, high-quality, stable software that is impressively widely used. Other well-known Open Source projects are the Java programming language, Qt from Norwegian Trolltech (now Nokia), Mozilla Firefox, Thunderbird, eZ Publish, and the list goes on.

In search, there are a few players picking up speed that you should be aware of: