The built-in Microsoft Windows search is slow, and its results are often poor. So I decided to write my own search engine for the desktop. It should crawl the filesystem, extract content and metadata, and ideally return results as relevant as Google's.
I also wanted to try out some new technologies, such as JavaFX with embedded HTML5, Apache Lucene as the full-text search engine, and Apache Tika as the content extraction framework. But before we dive into the internals, let's take a look at the frontend:
JavaFXDesktopSearch also comes with a visualization of the current full-text index, presented as a clickable sunburst diagram. It looks as follows:
Under the hood, it uses D3.js (d3js.org) to visualize the Lucene index. It is quite nice and fast, so just try it out. The project is hosted at github.com/mirkosertic/FXDesktopSearch. The original version was deployed via WebStart, but WebStart support was dropped because of Oracle's changes to its security policies. FXDesktopSearch is now installed with JavaFX-based native installers, which also bundle the right Java runtime. Check out the releases on Google Drive.
Of course, I want to say thank you to In-SideFX for the cool Undecorator tool, which can be found here.
I use a multithreaded pipes-and-filters architecture for file indexing. The FileSystemCrawler searches for files and puts them on the ContentExtractionQueue. The ContentExtractor takes entries from the ContentExtractionQueue, extracts the content and metadata with Apache Tika, and puts the result on the IndexWriterQueue. The LuceneIndexHandler takes content from the IndexWriterQueue and updates the Apache Lucene full-text index.
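To make the data flow more concrete, here is a minimal sketch of such a three-stage pipeline in plain Java. It is not the actual FXDesktopSearch code: the class IndexingPipelineSketch, the Doc type, and the END sentinel are hypothetical names chosen for illustration, the extraction step only fabricates a placeholder string instead of calling Apache Tika, and a ConcurrentHashMap stands in for the Lucene index. It also assumes one thread per stage and bounded queues, which the real project may handle differently.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

class IndexingPipelineSketch {

    /** Hypothetical unit of work passed from stage to stage. */
    static final class Doc {
        final String path;
        final String content;
        Doc(String path, String content) { this.path = path; this.content = content; }
    }

    // Sentinel, compared by identity, that tells a stage no more work will arrive.
    private static final Doc END = new Doc(null, null);

    static Map<String, String> run(List<String> paths) {
        // Bounded queues: a fast producer blocks instead of flooding a slow consumer.
        BlockingQueue<Doc> contentExtractionQueue = new LinkedBlockingQueue<>(16);
        BlockingQueue<Doc> indexWriterQueue = new LinkedBlockingQueue<>(16);
        Map<String, String> index = new ConcurrentHashMap<>(); // stand-in for the Lucene index

        // Stage 1: plays the role of the FileSystemCrawler.
        Thread crawler = new Thread(() -> {
            try {
                for (String path : paths) {
                    contentExtractionQueue.put(new Doc(path, null));
                }
                contentExtractionQueue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Stage 2: plays the role of the extraction step (Apache Tika in the real project).
        Thread extractor = new Thread(() -> {
            try {
                for (Doc d = contentExtractionQueue.take(); d != END; d = contentExtractionQueue.take()) {
                    indexWriterQueue.put(new Doc(d.path, "content of " + d.path));
                }
                indexWriterQueue.put(END); // pass the shutdown signal downstream
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Stage 3: plays the role of the LuceneIndexHandler.
        Thread indexer = new Thread(() -> {
            try {
                for (Doc d = indexWriterQueue.take(); d != END; d = indexWriterQueue.take()) {
                    index.put(d.path, d.content);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        crawler.start();
        extractor.start();
        indexer.start();
        try {
            crawler.join();
            extractor.join();
            indexer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the pipeline", e);
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> index = run(List.of("notes.txt", "report.pdf"));
        System.out.println(index.size() + " documents indexed");
    }
}
```

The stages only know about their input and output queues, so each one can be replaced or scaled independently. A production version would additionally need error handling per file and a way to shut down cleanly when a stage fails, which this sketch leaves out.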
The JavaFX/HTML5 hybrid is very powerful. It lets us build rich user interfaces with full access to the whole Java stack, using the described Gateway approach. The HTML application could also be deployed standalone, without desktop interaction, for instance to support mobile devices such as tablets or smartphones.