Start Up Fusion Solr as Non-root User in CentOS 6.6

One of the basic things to do when installing and configuring Lucidworks Fusion 2.x on a single server, using the Solr instance included in the Fusion distribution, is getting all Fusion services to start on their own after a server reboot.

Lucidworks covers the fundamentals of this in its Fusion 2.1 User Guide, but the instructions assume Ubuntu Upstart scripts, which are not what you get out of the box on CentOS 6.6 or RHEL 6.5, where services are normally managed with SysV init scripts and chkconfig.

Fusion needs to be started as a non-root user, much like other web services such as Tomcat. It has simple start, stop, restart and status commands, but they are not based on *.sh files.

Here are the steps that worked for me.

Create init script

sudo nano /etc/init.d/fusion

Add commands to script

#!/bin/bash
# description: Fusion Startup
# processname: fusion
# chkconfig: 234 20 80
# by max.derungs@providence.org

FUSION_CMD=/opt/fusion/bin/fusion

# Source the function library for daemon.
. /etc/init.d/functions

# Summon the daemons.
case "$1" in
start)
    daemon --check fusion $FUSION_CMD start
;;
stop)
    daemon --check fusion $FUSION_CMD stop
;;
status)
    daemon --check fusion $FUSION_CMD status
;;
restart)
    daemon --check fusion $FUSION_CMD restart
;;
*)
    echo $"Usage: $0 {start|stop|restart|status}"
;;
esac
exit 0
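
Note that the script above never names an account, so when init runs it at boot the fusion commands are launched by root unless the fusion script switches users itself. A minimal sketch of how the command line could be built for a dedicated account follows. It assumes an account called fusion, which is a hypothetical name not taken from the steps above; runuser is the tool the daemon function in /etc/init.d/functions relies on when it is given its --user option.

```shell
#!/bin/bash
# Sketch only: FUSION_USER is a hypothetical account name, not from the original steps.
FUSION_USER="${FUSION_USER:-fusion}"
FUSION_CMD="${FUSION_CMD:-/opt/fusion/bin/fusion}"

# Print the command line that would run "$FUSION_CMD <action>" as $FUSION_USER.
# If the current user already is that account, no wrapper is needed;
# otherwise the command is wrapped in runuser.
fusion_cmdline() {
    local action="$1"
    if [ "$(id -un)" = "$FUSION_USER" ]; then
        printf '%s %s\n' "$FUSION_CMD" "$action"
    else
        printf 'runuser -s /bin/bash %s -c "%s %s"\n' "$FUSION_USER" "$FUSION_CMD" "$action"
    fi
}

# Show what would be executed for a start request.
fusion_cmdline start
```

In the init script itself, the equivalent change would be to add --user to each call, for example daemon --user fusion --check fusion $FUSION_CMD start, again assuming the account is named fusion.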

Set permissions and make the script executable

sudo chmod 755 /etc/init.d/fusion

Set up the chkconfig utility to start the service at boot time

sudo /sbin/chkconfig --add fusion
sudo /sbin/chkconfig --level 234 fusion on
sudo /sbin/chkconfig --list fusion
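
To confirm the result without reading the columns by eye, the output of chkconfig --list can be parsed. The sketch below is one way to do that; the function name runlevels_on is made up for illustration, and the sample line only mimics the layout chkconfig prints.

```shell
#!/bin/bash
# Print the runlevels marked "on" in one line of `chkconfig --list` output.
runlevels_on() {
    local line="$1" field levels=""
    # Unquoted on purpose: split the line into whitespace-separated fields.
    for field in $line; do
        case "$field" in
            [0-6]:on) levels="${levels}${field%%:*}" ;;
        esac
    done
    echo "$levels"
}

# Sample line in the layout chkconfig prints; on a real server use:
#   runlevels_on "$(/sbin/chkconfig --list fusion)"
sample="fusion          0:off   1:off   2:on    3:on    4:on    5:off   6:off"
runlevels_on "$sample"    # prints 234
```

If the server boots into runlevel 5 (the usual default when a desktop is installed), levels 234 will not start Fusion, so it is worth checking the default runlevel in /etc/inittab.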

Test

sudo service fusion start
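
Since the whole point is a non-root start, it is worth checking who owns the Fusion processes once the service is up. The following is a rough sketch, assuming the Fusion processes have /opt/fusion somewhere in their command line (an assumption based on the FUSION_CMD path); owners_of is a hypothetical helper name and the sample lines are made up.

```shell
#!/bin/bash
# Read "USER COMMAND..." lines on stdin and print each distinct user that
# owns a command line containing the given text.
owners_of() {
    local pattern="$1"
    awk -v pat="$pattern" 'index($0, pat) && !seen[$1]++ { print $1 }'
}

# Made-up sample in the layout of `ps -e -o user= -o args=`; on a real server use:
#   ps -e -o user= -o args= | owners_of /opt/fusion
sample='root /sbin/init
fusion java /opt/fusion/example-a
fusion java /opt/fusion/example-b
root sshd'
printf '%s\n' "$sample" | owners_of /opt/fusion    # prints fusion
```

If root shows up in the output, the services were not started under the intended account.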

Semaphore 4.x ontology model not flushing properly when published

The first ontology I built/published called “Locations” was rough. So, I built/published a better one called “Organizations.” I deleted the first ontology from the Ontology Editor (OE), but my Search Application Framework (SAF) search dialog box seemed to continue autosuggesting old ontology terms found “in Locations.”

Bad Search Autosuggest

I then realized I was unaware of a way to formally flush an old ontology model from the Classification Server (CS) or Semantic Enhancement Server (SES). So, I went to Smartlogic’s Portal and looked through the knowledge base articles related to the Semaphore Publisher. Although I did see a “purge” command line switch, the command applied only to Windows. There were no “drop” or “purge” switches available to the Publisher from the Linux command line.

Emails with Smartlogic suggested that publishing should automatically update CS and SES assuming the publisher configuration has not changed. But, “should” was the operative word here – the old stuff did not seem to go away when I published.

They then offered the idea that a change in the model name might leave some old rules lingering around. Well, I had not changed the name of the old model – I had just deleted it. But, if this idea also applied to the renaming of Concept Schemes in a model, then the issue may well not have stemmed from an old lingering model at all.

Maybe search suggestions were returning instances from the current Organizations model! I did have a Concept Scheme in the current model called “Locations” at one time with a few instances. After deleting those instances, I had renamed the Concept Scheme to “Facilities” and added a new set of instances.

Concept Schema

Both old and new places began showing up as suggestions in the search box thereafter. Oddly, the old ones were showing up after typing one or two characters, while the new ones required at least four characters before showing up.

I was publishing only the “Organizations” ontology sure enough, but looking more closely in the editor, it was clear that renaming the Concept Scheme from “Locations” to “Facilities” had changed only the Concept Scheme’s label rather than its URI in the back-end triplestore (see Note below) …

Facilities Schema

… which was reflected in the SPARQL Editor as well:

SPARQL Editor with URIs

With this in mind, Smartlogic recommended I restart the Tomcat service supporting the Semaphore Workbench Ontology Editor. They had heard of Tomcat not flushing deleted terms properly. Indeed, that seemed to solve the SES autosuggest problem. The current schema appears as designed.

Good Search Autosuggest

Note: the URI remains intact, because the model is published as a linked data resource by default. Smartlogic has debated internally about this default, and allowing modelers to edit the URI to correct things when they change their minds might be a future enhancement. In the meantime, it “may” be possible to modify the URI using the Ontology Editor’s built-in SPARQL editor.

Using “Append Text to Query” in Search Core Results

Say you create a document library with uniquely named columns, populate the columns with values, and wait for Search to crawl the library. You then set up a Display Group and Scope on the Site Collection Administration Search scopes management page (_layouts/viewscopes.aspx), add a Search Box and Search Core Results web part to a page, and configure them to use your new group and scope. The library documents are displayed, but hmmm… so are the library views and folders.

Naturally, SharePoint Search indexes and returns as much as it can, and it depends on you to filter out what you don’t want. If you can’t filter the results using Scope rules, then one alternative is to use the Search Core Results web part -> Search Core Results Action Links -> Results Query Options -> Append Text to Query field. You could simply add…

IsDocument=Yes

or

IsDocument:1

… to filter out the folders. The difference between the “=” and “:” property operators is spelled out in the MSDN article Property Restriction Keyword Queries.

But what if your library has a Link to a Document content type that refers to ASPX pages rather than DOCs or PDFs? This is where an understanding of the default metadata properties might come in handy. The Search Core Results web part Display Properties -> Fetched Properties lists the default metadata properties used by Search. See the TechNet article Change how search results appear in the Search Core Results Web Part (Search Server 2010) for more details.

<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
	<Columns>
		<Column Name="WorkId" />
		<Column Name="Rank" />
		<Column Name="Title" />
		<Column Name="Author" />
		<Column Name="Size" />
		<Column Name="Path" />
		<Column Name="Description" />
		<Column Name="Write" />
		<Column Name="SiteName" />
		<Column Name="CollapsingStatus" />
		<Column Name="HitHighlightedSummary" />
		<Column Name="HitHighlightedProperties" />
		<Column Name="ContentClass" />
		<Column Name="IsDocument"/>
		<Column Name="PictureThumbnailURL"/>
		<Column Name="PopularSocialTags"/>
		<Column Name="PictureWidth"/>
		<Column Name="PictureHeight"/>
		<Column Name="DatePictureTaken"/>
		<Column Name="ServerRedirectedURL"/>
	</Columns>
</root>

The trick here is to use ContentClass in the Append Text to Query field instead of IsDocument, as in …

ContentClass:STS_ListItem_DocumentLibrary

Multiple items can be combined in the Append Text to Query field by putting a space between them, like …

contentclass:STS_ListItem_DocumentLibrary Size<300000

… which will filter results to document library list items with a Size property under 300,000 bytes (roughly 300 KB).
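
When the restrictions are assembled by a script rather than typed by hand, the same space-separated text can be built as in the sketch below. It is only an illustration: append_text and kb_to_bytes are hypothetical helper names, and the conversion assumes decimal kilobytes (1 KB = 1000 bytes) to match the 300000 figure above.

```shell
#!/bin/bash
# Join property restrictions with single spaces, the separator used in the
# Append Text to Query field.
append_text() {
    local IFS=' '
    echo "$*"
}

# Convert decimal kilobytes to bytes for use in a Size restriction.
kb_to_bytes() {
    echo $(( $1 * 1000 ))
}

append_text "contentclass:STS_ListItem_DocumentLibrary" "Size<$(kb_to_bytes 300)"
# prints contentclass:STS_ListItem_DocumentLibrary Size<300000
```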

To make a library column available in the Search Core Results window, map it as a Managed Property in Search and then add it among the other Column Names in the Fetched Properties XML. For example, <Column Name="PHSPMGDocumentStatus"/> would fetch values from a crawled library column created with the ungainly name of “PHSPMGDocumentStatus”, a name that lets everyone know where it came from and lets the System Administrators identify it uniquely among the other crawled Status columns in the system. Once mapped and crawled, you can filter on this column in the search scope or add it to the Append Text to Query field.

What’s more:

  • ModifiedBy:"userlogin"
  • Scopes and the Cross-Web Part Query ID
  • Aggregating Content
  • ContentClass:spspeople