Activate the PDF Preview for Your Results
Category: Crawling Pipeline
Learn to enable high-quality PDF previews for Office documents in Mindbreeze, transforming basic text extractions into visually engaging, fully formatted previews.
Exploring the Advantages of PDF Previews in Mindbreeze InSpire
Let's start with the advantages of PDF previews in Mindbreeze InSpire, using a PDF document that has already been indexed. This document covers the BMW 7 Series, a luxury sedan.
When we search for "BMW 7" in our setup and click the preview, we're presented with a well-structured PDF preview. It includes powerful features such as navigating between the individual keyword matches, which is particularly useful for large PDFs like manuals or technical documents. For example, I can jump directly from one occurrence of "BMW 7 Series" to the next throughout the document.
This feature aligns the preview results with the query, even in the case of complex searches, ensuring the highlighted matches are clear and easy to navigate.
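To make the idea of jumping between matches concrete, here is a minimal Python sketch. It is not how Mindbreeze implements the preview; the function names `find_hits` and `next_hit` are hypothetical, and the sketch only handles a plain, case-insensitive phrase rather than a complex query.

```python
import re


def find_hits(text, query):
    """Return the start offset of every case-insensitive occurrence of query."""
    pattern = re.escape(query)  # treat the query as literal text, not a regex
    return [m.start() for m in re.finditer(pattern, text, flags=re.IGNORECASE)]


def next_hit(hits, current):
    """Return the index of the hit after `current`, wrapping to the first hit."""
    if not hits:
        raise ValueError("no hits to navigate")
    return (current + 1) % len(hits)
```

A viewer built on this would keep the current index, scroll to `hits[index]`, and call `next_hit` each time the user asks for the next match.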
Activating PDF Previews
Next, I will show you how to activate the PDF preview for other document types. First, let's look at a few examples of what it changes.
Example: Microsoft Word Documents
When searching for a Microsoft Word document like the Mindbreeze InSpire Cheat Sheet, the default preview provides a clear and readable text representation, ensuring quick access to the content.
However, with the enhanced PDF preview feature, the experience is elevated even further, offering beautifully formatted documents that retain their original structure and design.
Example: Excel Files
For an Excel file, such as statistics about the space economy, the standard preview delivers the essential data in plain text, allowing users to quickly locate the information they need.
With the new PDF preview, however, the document is transformed into a visually polished and well-organized layout, making it easier to interpret complex data at a glance.
Example: PowerPoint Presentations
The most impressive upgrade is for PowerPoint presentations. In the default preview, the slides' text is extracted and fully searchable, perfect for quick text-based queries.
But with the enhanced PDF preview, the slides are displayed in their original, beautifully formatted design, providing a seamless and user-friendly experience that combines functionality with aesthetics.
The PDF preview takes an already powerful tool and makes it even better, allowing users to access their documents in a way that is both efficient and visually engaging.
Enabling the PDF Content Filter
To enable these features, you can activate the OfficeDocumentToPDFContentFilter. Follow these steps:
1. Open the Filter Service
Open the Filter Service in the Mindbreeze Management Center. If you have multiple Filter Services, ensure you select the correct one.
2. Configure Supported File Extensions
Scroll down to the list of supported file extensions. Identify the extensions of the file types you want to enhance with PDF previews, such as Word documents, PowerPoint presentations, and Excel spreadsheets. Activate the OfficeDocumentToPDFContentFilter for these extensions. This will replace the default Apache Tika content filter with the new filter for these file types.
3. Apply Changes and Restart
Apply the changes and restart the Filter Service to activate the new settings. Once the Filter Service is updated, it's crucial to re-index your existing documents. This ensures the new content filter is applied to previously indexed files, as the filter settings are part of the indexing pipeline.
4. Re-index Data Sources
Navigate to Services and re-index your data sources. This process may take some time, depending on your data volume. After the re-indexing is complete, your users can open the fully formatted PDF previews.
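Before changing the settings, it can help to write down which extensions will switch filters. The following Python sketch models that decision as a simple lookup table. It is only a planning aid and an assumption on our part: the actual change is made in the Mindbreeze Management Center as described above, the extension list in `OFFICE_EXTENSIONS` is an example rather than an official list, and `plan_filter_changes` is a hypothetical helper, not a Mindbreeze API.

```python
# Labels for illustration only; the real selection happens in the Management Center.
DEFAULT_FILTER = "Apache Tika content filter"
PDF_FILTER = "OfficeDocumentToPDFContentFilter"

# Assumed example set of Office extensions; adjust it to your own environment.
OFFICE_EXTENSIONS = {"doc", "docx", "xls", "xlsx", "ppt", "pptx"}


def plan_filter_changes(current):
    """Return (updated mapping, sorted list of extensions that switch filters).

    `current` maps a file extension to the filter currently assigned to it.
    The input mapping is left unmodified.
    """
    updated = dict(current)
    changed = []
    for ext in sorted(current):
        if ext.lower() in OFFICE_EXTENSIONS and current[ext] != PDF_FILTER:
            updated[ext] = PDF_FILTER
            changed.append(ext)
    return updated, changed
```

The returned list doubles as a checklist: after the Filter Service restart and the re-index, these are the file types whose previews should be spot-checked.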
Unlocking the Full Potential
By enabling the OfficeDocumentToPDFContentFilter, you can significantly enhance the usability and appearance of document previews. This allows you to unlock the full potential of Mindbreeze for Microsoft Office content, providing seamless, high-quality PDF previews for your users.