Start Up Fusion Solr as Non-root User in CentOS 6.6

A basic task when installing and configuring Lucidworks Fusion 2.x on a single server, using the Solr instance bundled with the Fusion distribution, is getting all Fusion services to start on their own after a server reboot.

Lucidworks included the fundamentals for how to do this in their Fusion 2.1 User Guide, but the instructions assume you have Ubuntu Upstart scripts at your fingertips, which are not available out of the box in CentOS 6.6 or RHEL 6.5.

Fusion needs to be started as a non-root user, much like other web services such as Tomcat. It has simple start, stop, restart and status commands, but they are not based on *.sh files.

Here are the steps that worked for me.

Create init script

sudo nano /etc/init.d/fusion

Add commands to script

#!/bin/bash
# description: Fusion Startup
# processname: fusion
# chkconfig: 234 20 80
# by max.derungs@providence.org

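FUSION_CMD=/opt/fusion/bin/fusion
# Assumption: a non-root account named "fusion" owns /opt/fusion.
# Change FUSION_USER to match the account on your server.
FUSION_USER=fusion

# Source the function library for daemon.
. /etc/init.d/functions

# Summon the daemons.
case "$1" in
start)
    daemon --user=$FUSION_USER --check fusion $FUSION_CMD start
;;
stop)
    daemon --user=$FUSION_USER --check fusion $FUSION_CMD stop
;;
status)
    daemon --user=$FUSION_USER --check fusion $FUSION_CMD status
;;
restart)
    daemon --user=$FUSION_USER --check fusion $FUSION_CMD restart
;;
*)
    echo $"Usage: $0 {start|stop|restart|status}"
;;
esac
exit 0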

Set permissions and make the script executable

sudo chmod 755 /etc/init.d/fusion

Set up the chkconfig utility to start the service at boot time

sudo /sbin/chkconfig --add fusion
sudo /sbin/chkconfig --level 234 fusion on
sudo /sbin/chkconfig --list fusion

Test

sudo service fusion start

Semaphore 4.x ontology model not flushing properly when published

The first ontology I built and published, called “Locations,” was rough. So, I built and published a better one called “Organizations.” I deleted the first ontology from the Ontology Editor (OE), but my Search Application Framework (SAF) search box continued to autosuggest old ontology terms found “in Locations.”

Bad Search Autosuggest

I then realized I was unaware of a way to formally flush an old ontology model from the Classification Server (CS) or Semantic Enhancement Server (SES). So, I went to Smartlogic’s Portal and looked through the knowledge base articles related to the Semaphore Publisher. Although I did see a “purge” command line switch, the command applied only to Windows. There were no “drop” or “purge” switches available to the Publisher from the Linux command line.

Emails with Smartlogic suggested that publishing should automatically update CS and SES assuming the publisher configuration has not changed. But, “should” was the operative word here – the old stuff did not seem to go away when I published.

They then offered the idea that a change in the model name might leave some old rules lingering around. Well, I had not changed the name of the old model; I had just deleted it. But if this idea also applied to renaming Concept Schemes in a model, then the issue may not have stemmed from an old lingering model at all.

Maybe search suggestions were returning instances from the current Organizations model! I did have a Concept Scheme in the current model called “Locations” at one time with a few instances. After deleting those instances, I had renamed the Concept Scheme to “Facilities” and added a new set of instances.

Concept Schema

Both old and new places began showing up as suggestions in the search box thereafter. Oddly, the old ones were showing up after typing one or two characters, while the new ones required at least four characters before showing up.

Sure enough, I was publishing only the “Organizations” ontology. Looking more closely in the editor, though, it was clear that renaming the Concept Scheme from “Locations” to “Facilities” had changed only the concept’s label rather than the URI in the back-end triplestore (see Note below) …

Facilities Schema

… which was reflected in the SPARQL Editor as well:

SPARQL Editor with URIs

With this in mind, Smartlogic recommended I restart the Tomcat service supporting the Semaphore Workbench Ontology Editor. They had heard of Tomcat not flushing deleted terms properly. Indeed, that seemed to solve the SES autosuggest problem. The current schema appears as designed.

Good Search Autosuggest

Note: the URI remains intact, because the model is published as a linked data resource by default. Smartlogic has debated internally about this default, and allowing modelers to edit the URI to correct things when they change their minds might be a future enhancement. In the meantime, it “may” be possible to modify the URI using the Ontology Editor’s built-in SPARQL editor.

Lucidworks Fusion 2.1 datasource connector to SQL returns only one document

Issue

Been working with Lucidworks Fusion 2.1 search engine platform lately. Really cool stuff; worthy of respect. One of the first steps in using it is to crawl content from a data source. It can crawl just about anything.

Trying to crawl an MS SQL table recently threw an odd error that took a bit longer than usual to figure out. The settings were straightforward enough, and the job would look like it was finding content, but it would return only one document (that’s “row” in db-speak).

Analysis

It had nothing to do with the JDBC driver, permissions or the SQL select statement, which had a WHERE clause. It was an issue with one of the fields. Testing the query one field at a time showed that all string fields returned all records, but one numeric field consistently caused the job to fail.

The solr.log explained the issue:

2015-12-09T11:50:40,786 - ERROR [qtp355629945-19:SolrException@139] - {collection=dg, core=dg_shard1_replica1, node_name=10.246.71.13:8983_solr, replica=core_node1, shard=shard1} - org.apache.solr.common.SolrException: ERROR: [doc=a596f774-7289-4d03-802b-e1ca4945c9bc] Error adding field 'NPINumber'='1.174598122E9' msg=For input string: "1.174598122E9"

It was a float field in the data source, so Fusion passed the value along in scientific notation (1.174598122E9), and Solr failed to parse it when adding the field. Why was the number a float? Who knows.

Solution

Changing it to an “int” fixed the problem. That’s really what the field was anyhow, or a string in this case.
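To see why the value tripped up indexing, here is a minimal JavaScript sketch. It is not Fusion or Solr code, and the function name toIntegerString is made up for illustration. It assumes the target field accepts only plain digit strings, which is what the “For input string” message suggests, and shows how a whole number rendered in scientific notation could be normalized.

```javascript
// Hypothetical helper for illustration; not part of Fusion or Solr.
// Returns a plain integer string, or null when the value is not a whole number.
function toIntegerString(raw) {
  // A strict integer parser accepts only an optional sign followed by digits.
  if (/^[+-]?\d+$/.test(raw)) {
    return raw;
  }
  // Scientific notation such as "1.174598122E9" fails the test above,
  // but it still parses as a number.
  var n = Number(raw);
  if (raw.trim() !== "" && Number.isSafeInteger(n)) {
    return String(n);
  }
  return null;
}

console.log(toIntegerString("1.174598122E9")); // 1174598122
console.log(toIntegerString("1.5"));           // null
```

Fixing the column type at the source, as described above, avoids the need for any conversion like this.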

Fusion datasource job status


Workaround – Display Tableau dashboard in SharePoint web part

When the business needs to view Tableau dashboards or visualizations in SharePoint 2010, the most complete solution is to install the Tableau Server Service Application into the SharePoint farm. But what if it is not installed or supported on the farm?

The most obvious workaround is to add a Page Viewer web part to a SharePoint page and point it to the URL of the Tableau dashboard page. This method shows the entire Tableau page: navigation header, footer and everything in between. But who really wants a page in a page, unless it’s the only option?

To display just the dashboard itself on a SharePoint page, use a Content Editor Web Part. Click the “Share” hyperlink on the Tableau dashboard page, and a dialog is displayed with Email and Embed code strings.

tableaupage

The Embed string in the dialog contains object and param tags:

tableaupage-codesnip

Just copy and paste them into the HTML Source pane of the Content Editor Web Part.

tableaupage-contenteditor

tableaupage-htmleditor

Finish configuring the web part settings and publish the page to view the Tableau dashboard. Depending on network, server and dashboard conditions, the report may take a few seconds to load at times, but it works.

After publishing the page, it is possible to replace the “Embed” code in the HTML Source Editor code, but SharePoint 2010 makes it a bit tricky. In my case, it took clicking on the content area and hitting the delete key a couple of times to remove a veneer of sorts before the “Edit HTML Source” button would apply to the correct web part on the page. If that fails to satisfy, just create a new Content Editor Web Part to replace the previous one.

Remove HTML formatting from SharePoint Rich Text Editor content with JavaScript

Compared with Microsoft Word, SharePoint’s out of the box Rich Text editor seems like a tool from the dark ages. These days, anyone can use Microsoft Word to format content better than the US Mint could 20 years ago, so when it comes to publishing content in SharePoint, most of us follow the path of least resistance and just paste Word content into the SharePoint editor.

This habit, however, can cause problems when there is a need to re-purpose Rich Text and Enhanced Rich Text content elsewhere in the organization. When Engineering, for example, wants the content without all of Marketing’s 30pt Bodoni Bold fluff, we need a way to dispose of the Word HTML/XML formatting.

Have you ever looked at Word’s HTML? It has a beauty all its own. But good luck editing it with anything other than the Word client application. Stripping the HTML with a blissfully simple “jQuery(html).text();” won’t do in many cases, because it wipes out essential spaces and line returns as well. Regular expressions will do the job, but it takes more than one or two.
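A small sketch of the problem, under a couple of assumptions: the sample markup is invented, and a strip-every-tag regex stands in for a text-only extraction so that the example runs without jQuery or a browser. The second function is one possible way to keep line breaks, not the approach used in the code further down.

```javascript
// Hypothetical sample markup, for illustration only.
var html = "<p>First line</p><p>Second line</p>";

// Rough stand-in for a text-only extraction: drop every tag.
function stripAllTags(markup) {
  return markup.replace(/<[^>]+>/g, "");
}

// Turn <br> and block-level closing tags into newlines before dropping tags.
function stripTagsKeepBreaks(markup) {
  return markup
    .replace(/<br\s*\/?>/gi, "\n")
    .replace(/<\/(p|div|h[1-6]|li)>/gi, "\n")
    .replace(/<[^>]+>/g, "")
    .trim();
}

console.log(stripAllTags(html));        // First lineSecond line
console.log(stripTagsKeepBreaks(html)); // the two lines, separated by a newline
```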

Fortunately, some good people at 1st Class Media in Edinburgh posted a respectable set of expressions back in 2007 – http://www.1stclassmedia.co.uk/developers/clean-ms-word-formatting.php

function CleanWordHTML( str )
{
	str = str.replace(/<o:p>\s*<\/o:p>/g, "") ;
	str = str.replace(/<o:p>.*?<\/o:p>/g, "&nbsp;") ;
	str = str.replace( /\s*mso-[^:]+:[^;"]+;?/gi, "" ) ;
	str = str.replace( /\s*MARGIN: 0cm 0cm 0pt\s*;/gi, "" ) ;
	str = str.replace( /\s*MARGIN: 0cm 0cm 0pt\s*"/gi, "\"" ) ;
	str = str.replace( /\s*TEXT-INDENT: 0cm\s*;/gi, "" ) ;
	str = str.replace( /\s*TEXT-INDENT: 0cm\s*"/gi, "\"" ) ;
	str = str.replace( /\s*TEXT-ALIGN: [^\s;]+;?"/gi, "\"" ) ;
	str = str.replace( /\s*PAGE-BREAK-BEFORE: [^\s;]+;?"/gi, "\"" ) ;
	str = str.replace( /\s*FONT-VARIANT: [^\s;]+;?"/gi, "\"" ) ;
	str = str.replace( /\s*tab-stops:[^;"]*;?/gi, "" ) ;
	str = str.replace( /\s*tab-stops:[^"]*/gi, "" ) ;
	str = str.replace( /\s*face="[^"]*"/gi, "" ) ;
	str = str.replace( /\s*face=[^ >]*/gi, "" ) ;
	str = str.replace( /\s*FONT-FAMILY:[^;"]*;?/gi, "" ) ;
	str = str.replace(/<(\w[^>]*) class=([^ |>]*)([^>]*)/gi, "<$1$3") ;
	str = str.replace( /<(\w[^>]*) style="([^\"]*)"([^>]*)/gi, "<$1$3" ) ;
	str = str.replace( /\s*style="\s*"/gi, '' ) ; 
	str = str.replace( /<SPAN\s*[^>]*>\s*&nbsp;\s*<\/SPAN>/gi, '&nbsp;' ) ; 
	str = str.replace( /<SPAN\s*[^>]*><\/SPAN>/gi, '' ) ; 
	str = str.replace(/<(\w[^>]*) lang=([^ |>]*)([^>]*)/gi, "<$1$3") ; 
	str = str.replace( /<SPAN\s*>(.*?)<\/SPAN>/gi, '$1' ) ; 
	str = str.replace( /<FONT\s*>(.*?)<\/FONT>/gi, '$1' ) ;
	str = str.replace(/<\\?\?xml[^>]*>/gi, "") ; 
	str = str.replace(/<\/?\w+:[^>]*>/gi, "") ; 
	str = str.replace( /<H\d>\s*<\/H\d>/gi, '' ) ;
	str = str.replace( /<H1([^>]*)>/gi, '' ) ;
	str = str.replace( /<H2([^>]*)>/gi, '' ) ;
	str = str.replace( /<H3([^>]*)>/gi, '' ) ;
	str = str.replace( /<H4([^>]*)>/gi, '' ) ;
	str = str.replace( /<H5([^>]*)>/gi, '' ) ;
	str = str.replace( /<H6([^>]*)>/gi, '' ) ;
	str = str.replace( /<\/H\d>/gi, '<br>' ) ; //remove this to take out breaks where Heading tags were 
	str = str.replace( /<(U|I|STRIKE)>&nbsp;<\/\1>/g, '&nbsp;' ) ;
	str = str.replace( /<b>&nbsp;<\/b>/gi, '' ) ; //the original closing pattern used \b (a word boundary) instead of the tag name
	str = str.replace( /<([^\s>]+)[^>]*>\s*<\/\1>/g, '' ) ;
	str = str.replace( /<([^\s>]+)[^>]*>\s*<\/\1>/g, '' ) ;
	str = str.replace( /<([^\s>]+)[^>]*>\s*<\/\1>/g, '' ) ;
	//some RegEx code for the picky browsers
	var re = new RegExp("(<P)([^>]*>.*?)(<\/P>)","gi") ;
	str = str.replace( re, "<div$2</div>" ) ;
	var re2 = new RegExp("(<font|<FONT)([^*>]*>.*?)(<\/FONT>|<\/font>)","gi") ; 
	str = str.replace( re2, "<div$2</div>") ;
	str = str.replace( /\s*size\s*=\s*"?\d"?/gi, '' ) ; //strip size attributes; the original pattern also deleted the bare word "size" from the text
	return str ;
}
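To get a feel for what these expressions do, here is a cut-down sketch that applies five of the rules above to a single invented paragraph of Word-style markup. The function name stripWordMarkup and the sample string are assumptions for illustration; the full function above covers many more cases.

```javascript
// Hypothetical sample of Word-style markup; not taken from a real document.
var sample = '<p class=MsoNormal style="mso-margin-top-alt: auto">' +
  '<span lang=EN-US>Hello<o:p></o:p></span></p>';

// Applies a small subset of the rules from CleanWordHTML, in the same order.
function stripWordMarkup(html) {
  return html
    .replace(/<o:p>\s*<\/o:p>/g, "")                            // empty Office paragraphs
    .replace(/<(\w[^>]*) class=([^ |>]*)([^>]*)/gi, "<$1$3")    // class attributes
    .replace(/<(\w[^>]*) style="([^"]*)"([^>]*)/gi, "<$1$3")    // inline styles
    .replace(/<(\w[^>]*) lang=([^ |>]*)([^>]*)/gi, "<$1$3")     // lang attributes
    .replace(/<span\s*>(.*?)<\/span>/gi, "$1");                 // spans left with no attributes
}

console.log(stripWordMarkup(sample)); // <p>Hello</p>
```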

Using “Append Text to Query” in Search Core Results

After creating a document library with uniquely named columns, populating the columns with values, and waiting for Search to crawl the library, you set up a Display Group and Scope on the Site Collection Administration Search scopes management page (_layouts/viewscopes.aspx). You then add a Search Box and a Search Core Results web part to a page and configure them to use your new group and scope. The library documents are displayed, but hmmm… so are the library views and folders.

Naturally, SharePoint Search indexes and returns as much as it can, and it depends on you to filter out what you don’t want. If you can’t filter the results using Scope rules, then one alternative is to use the Search Core Results web part -> Search Core Results Action Links -> Results Query Options -> Append Text to Query field. You could simply add…

isDocument=Yes or IsDocument:1

… to filter out the folders. The difference between the “=” and “:” property operator is spelled out in this MSDN Property Restriction Keyword Queries article.

But what if your library has a Link to a Document content type that refers to ASPX pages rather than DOCs or PDFs? This is where an understanding of the default metadata properties might come in handy. The Search Core Results web part Display Properties -> Fetched Properties lists the default metadata properties used by Search. See this Change how search results appear in the Search Core Results Web Part (Search Server 2010) Technet article for more details.

<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
	<Columns>
		<Column Name="WorkId" />
		<Column Name="Rank" />
		<Column Name="Title" />
		<Column Name="Author" />
		<Column Name="Size" />
		<Column Name="Path" />
		<Column Name="Description" />
		<Column Name="Write" />
		<Column Name="SiteName" />
		<Column Name="CollapsingStatus" />
		<Column Name="HitHighlightedSummary" />
		<Column Name="HitHighlightedProperties" />
		<Column Name="ContentClass" />
		<Column Name="IsDocument"/>
		<Column Name="PictureThumbnailURL"/>
		<Column Name="PopularSocialTags"/>
		<Column Name="PictureWidth"/>
		<Column Name="PictureHeight"/>
		<Column Name="DatePictureTaken"/>
		<Column Name="ServerRedirectedURL"/>
	</Columns>
</root>

The trick here is to use ContentClass in the Append Text to Query field instead of IsDocument, as in …

ContentClass:STS_ListItem_DocumentLibrary

To combine multiple items in the Append Text to Query field, put a space between each one, like …

contentclass:STS_ListItem_DocumentLibrary Size<300000

… which will filter results to list items with a size property of less than 300KB.
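If the appended text is being assembled in script rather than typed into the web part, the space-separated format can be sketched as below. The helper name buildAppendText is hypothetical and is not a SharePoint API; it only joins restrictions that you supply.

```javascript
// Hypothetical helper: joins property restrictions with single spaces,
// trimming each entry and dropping blank ones.
function buildAppendText(restrictions) {
  return restrictions
    .map(function (r) { return r.trim(); })
    .filter(function (r) { return r.length > 0; })
    .join(" ");
}

console.log(buildAppendText([
  "contentclass:STS_ListItem_DocumentLibrary",
  "Size<300000"
]));
// contentclass:STS_ListItem_DocumentLibrary Size<300000
```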

To make a library column available in the Search Core Results window, map it as a Managed Property in Search and then add it among the other Column Names in the Fetched Properties XML. For example, <Column Name="PHSPMGDocumentStatus"/> would fetch values from a crawled library column created with the ungainly name of “PHSPMGDocumentStatus,” a name that lets everyone know where it came from and lets the System Administrators identify it uniquely among the other crawled Status columns in the system. Once mapped and crawled, you can filter on this column in the search scope or add it to the Append Text to Query field.

What’s more:

  • ModifiedBy:"userlogin"
  • Scopes and the Cross-Web Part Query ID
  • Aggregating Content
  • ContentClass:spspeople

Open SharePoint List/Library Link in New Window

Why can’t a SharePoint list/library web part open links in a new window by default?

Right… maybe someday.

There are several workarounds, but here’s one of the “easy” ones that seems to make sense to site owners. It adds some JavaScript in an XML Web Part or Form Web Part placed below the list/library web part (maybe even in the bottom-most zone on the page). Why not use a Content Editor Web Part for the script? It was a fine place to put the script in SP2007, but to survive a migration from SP2007 to SP2010, the XML Web Part seems to be the way to go for small scripts like this. If you are already in SP2010, the Form Web Part should do the trick.

As the page loads, the code finds the list/library web part’s table by name and adds the “target” attribute to every link inside that table. So, say the list in the current site is called “Links.” The code would look for that list name in the page source by its Summary attribute:

<script language="javascript" type="text/javascript">
	// Find the table whose Summary attribute matches the list name.
	var tbl = document.getElementsByTagName('table');
	for(var i = 0; i < tbl.length; i++)
	{
		if(tbl[i].getAttribute("Summary") == "Links")
		{
			// Open every link in that table in a new window.
			var anc = tbl[i].getElementsByTagName('a');
			for(var j = 0; j < anc.length; j++)
			{
				anc[j].setAttribute('target', '_blank');
			}
		}
	}
</script>
    

For more than one list/library on the page, the code is a bit different…

<script language="javascript" type="text/javascript">
	var tbl = document.getElementsByTagName('table');
	for(var i = 0; i < tbl.length; i++)
	{
		// Match any of the list names on the page.
		if(tbl[i].getAttribute("Summary") == "Links1"
		|| tbl[i].getAttribute("Summary") == "Links2"
		|| tbl[i].getAttribute("Summary") == "Links3"
		)
		{
			var anc = tbl[i].getElementsByTagName('a');
			for(var j = 0; j < anc.length; j++)
			{
				anc[j].setAttribute('target', '_blank');
			}
		}
	}
</script>
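If the list of names keeps growing, the same idea can be wrapped in a function that takes the names as an array. This is only a sketch: the function name openListLinksInNewWindow and its return value are my own additions, and the list names are the placeholder ones used above. Inside a web part it would still need to sit within script tags below the list/library web parts.

```javascript
// Hypothetical helper: doc is a DOM document, listNames an array of list names
// as they appear in each table's summary attribute.
// Returns the number of links that were changed.
function openListLinksInNewWindow(doc, listNames) {
  var tables = doc.getElementsByTagName("table");
  var changed = 0;
  for (var i = 0; i < tables.length; i++) {
    // Skip tables that do not belong to one of the named lists.
    if (listNames.indexOf(tables[i].getAttribute("summary")) === -1) {
      continue;
    }
    var anchors = tables[i].getElementsByTagName("a");
    for (var j = 0; j < anchors.length; j++) {
      anchors[j].setAttribute("target", "_blank");
      changed++;
    }
  }
  return changed;
}

// Runs only where a page document exists, for example in the browser.
if (typeof document !== "undefined") {
  openListLinksInNewWindow(document, ["Links1", "Links2", "Links3"]);
}
```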