Basic Indexing

This section describes how to send an object from a data source to Mindbreeze. You will learn how to implement a custom crawler and what is needed to send objects to an index.


Send objects to Mindbreeze

An object can only be found by a search once it is in the index. This chapter explains how to send objects from your data source to Mindbreeze. Making an object searchable is simple: the following lines are all you need to add an object with the title “title” and the key “1” to the index:

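// client: the FilterAndIndexClient that Mindbreeze passes to the crawler (see performCrawlRun below)
Indexable indexable = new Indexable();
indexable.setKey("1");
indexable.setTitle("title");
client.filterAndIndex(indexable);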

A few points about these lines need clarification. First and foremost, consider which documents from your data source are relevant for search.

What objects are there in my data source?

When you want to add a new data source to the search, you should always consider what content is interesting for the users.
This example uses CMIS services as a data source. CMIS offers four different types of objects: folders, documents, relationships and policies. Here only documents are sent.

How are objects sent? Which process takes care of it?

Mindbreeze uses crawlers to send objects to the index. A crawler knows how to access its data source and sends the objects it contains for indexing. There is one crawler for each data source type; for Mindbreeze InSpire these include, among others, a Microsoft Exchange crawler and a Microsoft SharePoint crawler. The SDK offers the same plug-in interface that Mindbreeze uses for its own crawlers.
As a first step, package the example crawler as a plug-in and load it onto your appliance. To do so, right-click the file build.xml and select Run As > Ant Build.


This creates the plug-in archive cmis-datasource.zip in the directory build.
Next, upload the plug-in to the appliance: open the Mindbreeze configuration user interface and go to the tab Plug-ins. Select the ZIP file and confirm with “Upload”. The plug-in is now installed.


Now set up an index and add a new data source.


Send an object

The central class of the CMIS crawler is mindbreeze.example.cmis.crawler.CmisCrawler. It implements the crawler plug-in interface com.mindbreeze.enterprisesearch.mesapi.crawler.Crawler:

public interface Crawler {
	public void init (Configuration configuration);
	
	public void performCrawlRun(FilterAndIndexClient client) throws Exception;
	
	public void shutdown();
}

The init method prepares the crawler for its crawl runs; this is where, for example, resources such as databases can be set up. Its counterpart is the shutdown method, which is invoked before the crawler terminates.
performCrawlRun is invoked for every crawl run; this is where the objects of the data source are indexed. The call

client.filterAndIndex(indexable);

sends an object.
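The following self-contained sketch makes this lifecycle concrete. It does not use the SDK: Client, Doc and InMemoryCrawler are hypothetical stand-ins that only mirror the method names shown above, and a Map<String, String> plays the role of both the configuration and the data source. It shows the intended order of calls: init once, performCrawlRun for every crawl run, and shutdown at the end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for FilterAndIndexClient: receives every object the crawler sends.
interface Client {
    void filterAndIndex(Doc doc);
}

// Hypothetical stand-in for Indexable, reduced to key and title.
final class Doc {
    final String key;
    final String title;

    Doc(String key, String title) {
        this.key = key;
        this.title = title;
    }
}

// Mirrors the Crawler lifecycle: init once, performCrawlRun per run, shutdown at the end.
final class InMemoryCrawler {
    private Map<String, String> source; // hypothetical data source: key -> title
    private boolean ready;

    void init(Map<String, String> configuration) {
        // Prepare everything the crawl runs need (connections, settings, ...).
        source = configuration;
        ready = true;
    }

    void performCrawlRun(Client client) {
        if (!ready) {
            throw new IllegalStateException("init() must be called before a crawl run");
        }
        // Send every object of the data source to the index.
        for (Map.Entry<String, String> entry : source.entrySet()) {
            client.filterAndIndex(new Doc(entry.getKey(), entry.getValue()));
        }
    }

    void shutdown() {
        // Release whatever init() acquired.
        source = null;
        ready = false;
    }

    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();
        InMemoryCrawler crawler = new InMemoryCrawler();
        crawler.init(Map.of("1", "title"));
        crawler.performCrawlRun(doc -> sent.add(doc.key + ":" + doc.title));
        crawler.shutdown();
        System.out.println(sent); // [1:title]
    }
}
```

In a real plug-in, init, performCrawlRun and shutdown are called by Mindbreeze rather than by your own main method.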

The Indexable object collects the data relevant for a transmitted object. The following minimal example prepares an object for indexing:

Indexable indexable = new Indexable();
indexable.setKey("1");
indexable.setTitle("my title");

setKey sets the object's key within the data source, and setTitle sets its title.
Once the object has been sent with client.filterAndIndex(indexable), you can already search for the indexed document in the Mindbreeze Web Client.

Important: When defining the key, keep in mind that the crawler can be configured for multiple data sources of the same type. This example indexes objects from CMIS data sources, so multiple crawlers can be configured, for example for Fabasoft Folio Cloud Austria and Fabasoft Folio Cloud Germany. To identify an object uniquely across multiple data sources, its key in the index consists of three parts: the type of the data source (category), the data source itself (category instance) and the key. The category is defined by the plug-in. The category instance can be configured in the configuration user interface.


For this example you can keep the default settings. When you create an additional data source, however, you must configure a different category instance; otherwise objects from, e.g., Fabasoft Folio Cloud Austria will overwrite objects from Fabasoft Folio Cloud Germany.
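Why distinct category instances matter can be illustrated with a small, hypothetical model of the three-part key. IndexKey, CompositeKeyDemo and the category name "CMIS" are made up for this sketch and do not reflect how Mindbreeze stores keys internally; the map simply stands in for the index.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the three-part key: category, category instance, key.
record IndexKey(String category, String categoryInstance, String key) {}

final class CompositeKeyDemo {
    // Indexes one object with key "1" from each of two CMIS data sources.
    static Map<IndexKey, String> indexBoth(String instanceAustria, String instanceGermany) {
        Map<IndexKey, String> index = new HashMap<>();
        index.put(new IndexKey("CMIS", instanceAustria, "1"), "Folio Cloud Austria");
        index.put(new IndexKey("CMIS", instanceGermany, "1"), "Folio Cloud Germany");
        return index;
    }

    public static void main(String[] args) {
        // Distinct category instances: both objects survive.
        System.out.println(indexBoth("austria", "germany").size()); // 2
        // Same category instance: the second put overwrites the first.
        System.out.println(indexBoth("default", "default").values()); // [Folio Cloud Germany]
    }
}
```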

Displayed date and change date

To set the displayed date of the indexed object, use setDate:

indexable.setDate(cmisDocument.getCreationDate());

This is also the date used when results are sorted by date.
In addition to the displayed date, there is the document's modification date. It is used to determine whether a document has changed:

indexable.setModificationDate(cmisDocument.getLastModificationDate());

All date values must be converted to UTC before they are sent. Only then are the values displayed correctly, without problems caused by time zones or daylight saving time.
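A minimal sketch of the conversion with java.time, assuming the data source delivers local wall-clock times in a known time zone (Europe/Vienna here, chosen only as an example). UtcDates is a hypothetical helper; whether setDate expects a java.util.Date, a Calendar or another type should be checked against the SDK, so the sketch stops at the UTC value.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.Date;

final class UtcDates {
    // Interprets a wall-clock time in the data source's time zone and
    // expresses the same instant in UTC.
    static ZonedDateTime toUtc(LocalDateTime local, ZoneId sourceZone) {
        return local.atZone(sourceZone).withZoneSameInstant(ZoneOffset.UTC);
    }

    // java.util.Date stores milliseconds since 1970-01-01T00:00Z, so it is
    // unambiguous once the source time zone has been applied.
    static Date toDate(LocalDateTime local, ZoneId sourceZone) {
        return Date.from(local.atZone(sourceZone).toInstant());
    }

    public static void main(String[] args) {
        ZoneId vienna = ZoneId.of("Europe/Vienna");
        // Summer time (CEST, UTC+2) and winter time (CET, UTC+1) shift by different amounts.
        System.out.println(toUtc(LocalDateTime.of(2023, 7, 1, 12, 0), vienna));  // 2023-07-01T10:00Z
        System.out.println(toUtc(LocalDateTime.of(2023, 1, 15, 12, 0), vienna)); // 2023-01-15T11:00Z
    }
}
```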


Send contents and files

The previous section indexed a single object with a title and a key. Mindbreeze also makes it very easy to index an object's content: you can send existing files directly and let the Mindbreeze Filter Service extract all of their content. The methods setContent and setExtension are important for this. setContent transmits the bytes of the content, and setExtension defines the file type of the content. If you enter .doc here, for example, and the content holds the bytes of a Word document, the filter automatically extracts the text and additional information such as the author.
You can also simply send a string as content:

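// StandardCharsets is java.nio.charset.StandardCharsets (no checked UnsupportedEncodingException)
indexable.setContent("content".getBytes(StandardCharsets.UTF_8));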
indexable.setExtension("txt");

In short, an object's content is defined by the properties content and extension: content carries the data, and extension specifies its file type. The extension txt means that the content is treated as a text document.
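The two values can be prepared with a small, hypothetical helper such as ContentPayload below: it turns a string into UTF-8 bytes with the extension txt, or reads an existing file unchanged and takes the extension from its name. Writing the extension without a leading dot follows the txt example above and is an assumption of this sketch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical holder for the values passed to setContent and setExtension.
record ContentPayload(byte[] content, String extension) {

    // A plain string becomes UTF-8 bytes with the extension "txt".
    static ContentPayload fromText(String text) {
        return new ContentPayload(text.getBytes(StandardCharsets.UTF_8), "txt");
    }

    // An existing file is sent as-is; the extension from its name tells the
    // filter which format to extract (e.g. "doc", "pdf"). No dot means "".
    static ContentPayload fromFile(Path file) throws IOException {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String extension = (dot >= 0 && dot < name.length() - 1)
                ? name.substring(dot + 1).toLowerCase(Locale.ROOT)
                : "";
        return new ContentPayload(Files.readAllBytes(file), extension);
    }
}
```

The values would then be passed on with indexable.setContent(payload.content()) and indexable.setExtension(payload.extension()).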


Index all CMIS documents

The example project traverses all available folders and calls the method buildIndexable for each document, which creates an Indexable object with the desired values:

private Indexable buildIndexable(Document cmisDocument) throws IOException {
	Indexable indexable = new Indexable();
	indexable.setKey(cmisDocument.getId());
	indexable.setTitle(cmisDocument.getName());
	indexable.setDate(cmisDocument.getCreationDate());
	indexable.setModificationDate(cmisDocument.getLastModificationDate());

	byte[] content = getContentBytes(cmisDocument.getContentStream());
	indexable.setContent(content);
	String extension = getExtension(cmisDocument);
	indexable.setExtension(extension);
	
	return indexable;
}
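buildIndexable relies on the helpers getContentBytes and getExtension, whose bodies are not part of this section. One possible implementation is sketched below; it is not the example project's code. It assumes the content is available as an InputStream (with Apache Chemistry OpenCMIS, ContentStream.getStream() returns one), that a document without content yields an empty byte array, and that the extension comes from the file name, falling back to a small, hypothetical MIME-type table.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;

final class ContentHelpers {
    // Hypothetical fallback table for documents whose name has no extension.
    private static final Map<String, String> EXTENSION_BY_MIME_TYPE = Map.of(
            "text/plain", "txt",
            "application/pdf", "pdf",
            "application/msword", "doc");

    // Reads and closes the stream; a missing stream (no content) yields no bytes.
    static byte[] getContentBytes(InputStream stream) throws IOException {
        if (stream == null) {
            return new byte[0];
        }
        try (InputStream in = stream) {
            return in.readAllBytes();
        }
    }

    // Prefers the extension of the file name, then the MIME type, else "".
    static String getExtension(String fileName, String mimeType) {
        if (fileName != null) {
            int dot = fileName.lastIndexOf('.');
            if (dot >= 0 && dot < fileName.length() - 1) {
                return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
            }
        }
        return mimeType == null ? "" : EXTENSION_BY_MIME_TYPE.getOrDefault(mimeType, "");
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(getContentBytes(in).length);          // 5
        System.out.println(getExtension("Offer.PDF", null));      // pdf
        System.out.println(getExtension("README", "text/plain")); // txt
    }
}
```

Closing the stream after reading avoids holding the connection to the repository open; if the caller manages the stream itself, the try-with-resources can be dropped.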

The search in the Mindbreeze Web Client now delivers all the demo user’s documents in Fabasoft Folio Cloud.


Index and search properties

The example currently indexes the title, content and date. Mindbreeze, however, can store any number of properties per object. Properties are set with indexable.putProperty:

indexable.putProperty(NamedValue.newBuilder()
	.setName("name")
	.addValue(ValueHelper.newBuilder("value"))
);

To store a list value, simply call addValue multiple times:

indexable.putProperty(NamedValue.newBuilder()
	.setName("name")
	.addValue(ValueHelper.newBuilder("value"))
	.addValue(ValueHelper.newBuilder("value2"))
);

The added properties are contained in the index right away and can be searched. For example, a search for name:value returns only those objects whose property name has the value value.
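To illustrate how list values relate to a name:value search, the sketch below models objects as a key plus multi-valued properties and keeps those where any value of the named property equals the search term. PropertyFilter and its matching rule are assumptions for illustration only; the real evaluation happens inside Mindbreeze and is not limited to exact matches.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class PropertyFilter {
    // Hypothetical in-memory object: key plus multi-valued properties.
    record Obj(String key, Map<String, List<String>> properties) {}

    // Splits "name:value" into label and term and keeps objects where any value
    // of the named property equals the term (matching rule assumed for this sketch).
    static List<String> search(List<Obj> objects, String query) {
        int colon = query.indexOf(':');
        if (colon <= 0) {
            throw new IllegalArgumentException("expected name:value, got " + query);
        }
        String name = query.substring(0, colon);
        String term = query.substring(colon + 1);
        return objects.stream()
                .filter(o -> o.properties().getOrDefault(name, List.of()).contains(term))
                .map(Obj::key)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Obj> objects = List.of(
                new Obj("1", Map.of("name", List.of("value", "value2"))),
                new Obj("2", Map.of("name", List.of("other"))),
                new Obj("3", Map.of()));
        System.out.println(search(objects, "name:value")); // [1]
    }
}
```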

Display of properties in the Mindbreeze client

For the properties to be displayed in the Mindbreeze client as well, they must be entered in the CategoryDescriptor. Each property to be displayed needs a metadatum entry, and its translations are specified with name elements:

<metadatum id="name">
  <name xml:lang="de">Name de</name>
  <name xml:lang="en">Name en</name>
</metadatum>

After changing the CategoryDescriptor, package the plug-in again and upload it.


Display of properties with SearchService

The SearchService returns the same values that are displayed in the client service, so you can adapt the CategoryDescriptor as described in the section "Index and search properties". Alternatively, you can request properties for a search with addRequestedProperty:

// userQuery: the search term entered by the user
QueryExpr query = QueryExpr.newBuilder().setKind(QueryExpr.Kind.EXPR_UNPARSED).setUnparsedExpr(userQuery).build();
SearchRequest searchRequest = SearchRequest.newBuilder()
	.setRankingStrategy(RankingStrategy.RANK)
	.setDetailLimit(DetailLimit.CONTENT)
	.setUserQuery(query)
	.addRequestedProperty(PropertyDefinition.newBuilder().setName("title"))
	.addView(View.newBuilder()
		.setId("View")
		.setCount(10)
		.setRankingStrategy(mindbreeze.query.ViewProtos.View.RankingStrategy.RELEVANCE)
	)
	.build();

The returned value objects are identical to those you specified when indexing.
Instead of a simple EXPR_UNPARSED QueryExpr, you can also use an EXPR_LABELED QueryExpr. This lets you specify the property name separately from the search term:

QueryExpr query = QueryExpr.newBuilder()
	.setKind(QueryExpr.Kind.EXPR_LABELED)
	.setNamedExpr(Labeled.newBuilder()
		.setLabel("name")
		.setExpr(QueryExpr.newBuilder()
			.setKind(QueryExpr.Kind.EXPR_UNPARSED)
			.setUnparsedExpr("value")
		)
	)
	.build();

This is useful, for example, for an input field dedicated to a single property: the property name is fixed, and the user's input is used as the search term.


Data types

The previous examples show how string values are added. The Mindbreeze InSpire SDK automatically detects numbers in such properties. For example, you can set the property version to 12 and Mindbreeze interprets the value as a number, so you can search for version:[12 TO 24]. This also works if version has the value “Version 12”; the restriction expression stays the same.
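The following sketch illustrates the idea of finding a number inside a string value and checking it against a range such as version:[12 TO 24]. It is not Mindbreeze's detection algorithm: NumberInValue is hypothetical, and taking the first run of digits and treating both bounds as inclusive are assumptions made for the example.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumberInValue {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns the first run of digits in the value, e.g. "Version 12" -> 12.
    static OptionalLong firstNumber(String value) {
        Matcher m = DIGITS.matcher(value);
        return m.find() ? OptionalLong.of(Long.parseLong(m.group())) : OptionalLong.empty();
    }

    // Inclusive range check, as assumed for a restriction like [12 TO 24].
    static boolean inRange(String value, long from, long to) {
        OptionalLong n = firstNumber(value);
        return n.isPresent() && n.getAsLong() >= from && n.getAsLong() <= to;
    }

    public static void main(String[] args) {
        System.out.println(inRange("12", 12, 24));         // true
        System.out.println(inRange("Version 12", 12, 24)); // true
        System.out.println(inRange("Version 30", 12, 24)); // false
    }
}
```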
Numbers can also be specified as numerical values. This speeds up range searches, and the SearchService then returns a numerical value. The following example indexes an integer:

indexable.putProperty(NamedValue.newBuilder()
	.setName("version")
	.addValue(
		Value.newBuilder()
			.setKind(Value.Kind.INTEGER)
			.setIntegerValue(12)
		)
);

To make these quantity values searchable, the attribute regexmatchable must be set to true in the CategoryDescriptor.


The search is possible with version:[12] or version:[10 TO 20].
In addition to simple numerical values, date values are also possible:

indexable.putProperty(NamedValue.newBuilder()
	.setName("mydate")
	.addValue(
		Value.newBuilder()
			.setKind(Value.Kind.QUANTITY)
			.setQuantityValue(Quantity.newBuilder()
				.setKind(Quantity.Kind.TIME)
				.setUnit(Unit.MS_SINCE_01_01_1970_00_00)
				.setIntValue(Indexable.getCalendarFromDate(new Date()).getTime().getTime())
			)
		)
);

The search mydate:[2003-01-01 TO 2023-01-01] restricts the results to the period between 2003 and 2023.
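The date above is stored as milliseconds since 1970-01-01 00:00 UTC (Unit.MS_SINCE_01_01_1970_00_00); java.util.Date.getTime() returns exactly this unit. The sketch below uses java.time to show how the calendar bounds of a range like mydate:[2003-01-01 TO 2023-01-01] map to such values. EpochMillis is a hypothetical helper, and reading the bounds as midnight UTC is an assumption for illustration.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

final class EpochMillis {
    // Milliseconds since 1970-01-01T00:00Z for midnight (UTC) of the given day.
    static long startOfDayUtc(LocalDate day) {
        return day.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    }

    // True if the timestamp lies between midnight UTC of 'from' and of 'to' (inclusive).
    static boolean within(long millis, LocalDate from, LocalDate to) {
        return startOfDayUtc(from) <= millis && millis <= startOfDayUtc(to);
    }

    public static void main(String[] args) {
        System.out.println(startOfDayUtc(LocalDate.of(1970, 1, 2))); // 86400000
        long value = Instant.parse("2010-06-15T08:30:00Z").toEpochMilli();
        System.out.println(within(value, LocalDate.of(2003, 1, 1), LocalDate.of(2023, 1, 1))); // true
    }
}
```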
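Sizes can be indexed as quantities as well; a search on documentsize, for example, restricts the size of the documents found. The following example indexes a size of 1024 × 1024 bytes in the property availablespace: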

indexable.putProperty(NamedValue.newBuilder()
	.setName("availablespace")
	.addValue(
		Value.newBuilder()
			.setKind(Value.Kind.QUANTITY)
			.setQuantityValue(Quantity.newBuilder()
				.setKind(Quantity.Kind.INFORMATION_STORAGE)
				.setUnit(Unit.BYTE)
				.setIntValue(1024 * 1024)
			)
	)
);

Facets

Facets allow the user to easily refine the search results using the content of the index.


To use an indexed property as a facet, set the attribute aggregatable to true for that property in the CategoryDescriptor.
Important: Currently, only string values can be used for facets.
Facets are displayed automatically in the Mindbreeze client but can be deactivated with the option aggregations.
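Conceptually, a facet entry pairs a property value with the number of objects that carry it. The sketch below computes such counts for a string property over an in-memory list of objects. FacetCounts and its data model are hypothetical; Mindbreeze computes facets in the index, so this only illustrates what the entries mean.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class FacetCounts {
    // For each value of the given string property, counts how many objects carry it.
    // distinct() makes an object count once even if it lists the same value twice.
    static Map<String, Integer> count(List<Map<String, List<String>>> objects, String property) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted for stable output
        for (Map<String, List<String>> properties : objects) {
            properties.getOrDefault(property, List.of()).stream()
                    .distinct()
                    .forEach(value -> counts.merge(value, 1, Integer::sum));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map<String, List<String>>> objects = List.of(
                Map.of("author", List.of("Anna")),
                Map.of("author", List.of("Anna", "Ben")),
                Map.of("title", List.of("untitled")));
        System.out.println(count(objects, "author")); // {Anna=2, Ben=1}
    }
}
```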


The SearchService offers the same facets as the client service. In addition, you can request any facets you want for a View with addAggregation:

…
View.newBuilder()
	.setId("View")
	.setCount(10)
	.addAggregation("name")
…

The ResultSet contains the facets as a list of Aggregation objects. The following JSP fragment renders them as links:

<% 
for (Aggregation aggregation : resultSet.getAggregationList()) {				
		if (aggregation.getEntryCount() == 0) continue;
%>
<div class="facet">
		<div class="facetName">
<b><%="mes:date".equals(aggregation.getName())? "Date" : aggregation.getName()%></b>
</div>
		<div class="facetEntries">
<% 
			for (Aggregation.Entry entry : aggregation.getEntryList()) {
				byte[] queryExpr = entry.getQueryExpr().toByteArray();
				String facetQuery = Base64.encodeBase64URLSafeString(queryExpr);
%>
		<div><a href="<%=Nav.self(request, "start", "0")%>&facet=<%=facetQuery%>"><%= entry.getName() %> (<%= entry.getCount() %>)</a></div>	
<% 	
			} 
%>
</div>
</div>
<% 
} 
%>

Each facet entry contains the property value, the number of matching objects and a QueryExpr object that restricts the results to those objects. Pass this QueryExpr with the search request to restrict the search results:

// Base64 is org.apache.commons.codec.binary.Base64 (Apache Commons Codec), as in the JSP above
QueryExpr.And.Builder constraints = QueryExpr.And.newBuilder();
String[] facets = request.getParameterValues("facet");
if (facets != null) { 
	for (String facet : facets) {
		if (facet.isEmpty()) continue;
		QueryExpr queryExpr = QueryExpr.newBuilder().mergeFrom(Base64.decodeBase64(facet)).build();
		constraints.addExpr(queryExpr);		
	}
}
if (constraints.getExprCount() > 0) {
	// searchRequest must still be the SearchRequest.Builder here, i.e. before build() is called
	searchRequest.setQueryConstraints(QueryExpr.newBuilder()
		.setKind(QueryExpr.Kind.EXPR_AND)
		.setAndExpr(constraints)
	);
}
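The JSP and the request handler exchange each facet's QueryExpr as URL-safe Base64 via Apache Commons Codec (encodeBase64URLSafeString emits '-' and '_' and no padding). If you prefer not to depend on Commons Codec, java.util.Base64 provides the same encoding. FacetParam below is a hypothetical helper, and the byte array stands in for queryExpr.toByteArray().

```java
import java.util.Arrays;
import java.util.Base64;

final class FacetParam {
    // URL-safe alphabet ('-' and '_') without '=' padding, so the value can be
    // placed in a query string without further escaping.
    static String encode(byte[] serializedQuery) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(serializedQuery);
    }

    // The URL decoder accepts input with or without padding.
    static byte[] decode(String facetParam) {
        return Base64.getUrlDecoder().decode(facetParam);
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xfb, (byte) 0xff, 0x01}; // stands in for queryExpr.toByteArray()
        String param = encode(bytes);
        System.out.println(param);                              // -_8B
        System.out.println(Arrays.equals(bytes, decode(param))); // true
    }
}
```

With this helper, FacetParam.decode(facet) would take the place of Base64.decodeBase64(facet) before the bytes are merged into the QueryExpr builder.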
