
CogniBlog

Thoughts from the Cognidox world

CMS Watch is a great site for learning about document management, web content management, enterprise content management, enterprise portals, web analytics, and enterprise search solutions. It's mandatory reading for those of us making products, and definitely worth the time of any buyer who needs to select a product in this area.

This week they published their predictions for what is likely to happen in 2010. They were also brave and smart enough to assess how well their 2009 predictions fared.

Selectively looking at some of their 2009 outcomes, they probably got it right that Open Source ECM made advances and that social computing diffused into the Enterprise. I think they were more wrong than they admit about SharePoint being derailed by Office 14: interest continued unabated, and the beta of SharePoint 2010 did it no harm either.

They predicted that SaaS vendors would expand offerings. There was plenty of activity in the SaaS sector, but many buyers are still unconvinced about putting company documents on a third party server. A term of service we looked at recently from one SaaS provider is not uncommon: "Once you cancel or terminate your registration with us, we will instantly delete data, content, text, documents, images and information from the service. After the cancellation of the service, all of your content will be lost for ever." Data lock-in is a variant of vendor lock-in, and just as dangerous.


For 2010, CMS Watch makes some predictions that we can compare and assess in regard to CogniDox.

For example, "Enterprise Content Management and Document Management will go their separate ways" (#1) gets our agreement. There is a tension between providing good workflow tools for document management and the tools used for website publishing. Clearly they need to be compatible and work in unison at some point, but doing both in one application isn't necessarily the best idea. Given that prediction, their "Document Services will become an integrated part of ECM" (#8) may seem to be a contradiction. Ignoring that, we'd agree that helping users to automate the production of documents will be helpful and important. We tend to describe this as "document assembly", which I think is a more specific and useful term for tasks such as creating a new legal document using metadata to populate key fields.

We'd also agree that "Faceted search will pervade enterprise applications" (#2). They mention Lucene/Solr, which we support in CogniDox 8.0 onwards, but we'd also mention the Xapian open source search engine library and the Flax Search Service as a technology to watch. CMS Watch only mentions SharePoint in the context of "Digital Asset Management vendors will focus on SharePoint integration over geographic expansion" (#3), which may be overly specific: most ECM and WCM companies will consider offering a SharePoint connector. That doesn't, however, imply that SharePoint is any more than a workgroup Intranet builder with file sharing capability. Buying decisions may verge on the ridiculous: why pay $$ for SharePoint user licenses and then add $$$$ for a proprietary ECM add-on? But SharePoint 2010 will be a force, and we can anticipate a rush on products that see themselves as SharePoint 'killers'.

One thing ECM suites and SharePoint have in common is that they can bring hidden costs in hardware and server requirements, and in the IT admin effort inherent in their client/server architectures. For that reason we'd agree that "Enterprises will lead thick client backlash" (#6) and to a lesser extent that "Cloud alternatives will become pervasive" (#7). Some of the same issues faced in 2008/9 for SaaS still remain.

We'd strongly agree that "Gadgets and Widgets will sweep the Portal world" (#9). People have become accustomed to seeing them on the public web at sites like iGoogle and they will become the definitive user interface for the Intranet. The only question is which widgets are the most essential.

The prediction that "Mobile will come of age for Document Management and Enterprise Search" (#4) depends on what is meant by "coming of age". It will certainly be the case that this will grow from the current small base, but we think this is one for 2011 instead.

That leaves 4 other predictions we have not commented on, and so a quick visit to CMS Watch to read these for yourself is recommended.


We published a paper just two days ago on open source Enterprise Search tools such as Lucene/Solr and Xapian/Flax, which basically asked whether these tools are now comparable for this purpose with the proprietary products from the likes of Autonomy and Microsoft FAST.

It's a very hot topic at the moment, and Matt Asay (VP Business Development, Alfresco) covers Lucene/Solr in particular in his CNET blog.

At the risk of simplification, the answer is more or less "yes", but the integration of these powerful tools can be held back by the fact that companies need to invest time in learning how to do it. Then there is the issue of who to call when you need support later on.

These problems are being addressed by companies whose business model is to provide those implementation, customization and technical support services. For Lucene/Solr, the leading name is Lucid Imagination based in San Mateo, California. One of their customers is Comcast Interactive Media, a division of the CableTV/ISP giant that specialises in online media. Their view is that Lucene/Solr has 80% of the features of rival proprietary search products (and they didn't need the other 20%).

For Xapian, the equivalent source of services and support is the Flax team. They are local to us in Cambridge (UK) and are very actively developing their Flax Search Service.

In June of this year, In-Q-Tel, the technology arm of the CIA, invested an undisclosed amount in Lucid Imagination. I guess that if ever an enterprise knew a thing about searching massively large datasets, it's the intelligence agencies! Both Lucene/Solr and Xapian/Flax are demonstrating that they are capable of scaling to more than 100 million documents.

The other problem with Enterprise Search engines is that it is hard to see the value until after you have integrated the service and can see the results on an actual document search. We're now in the final testing stages of our next release (v8.0.0) and are able to see that for ourselves. We've developed plug-ins for both search engines, and are building up a rich picture of the strengths of each.


Recently, we've been completely re-working the core CogniDox search functionality. When we initially wrote CogniDox, we were looking for a fast, open source search engine we could easily tailor to our needs. As it turned out, we chose swish-e and it has served us well in the ensuing years.

Well, times change, and we now have to cater for multi-site, multi-cluster installations with large, multilingual document sets. While swish-e is simple to set up and use, it lacks several features (UTF-8 support and incremental indexing spring to mind) that limit its usefulness for us. As a result, we've pluginised our search functionality to allow a variety of search systems to be used. Swish-e will remain in the mix, but now we'll have Solr and Flax in there too. Moving on, we will be able to support a wider range of search systems, depending on customer requirements.
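To give a flavour of what "pluginised" can mean in practice, here is a minimal sketch in Python. It is not the CogniDox code: the names SearchBackend, register, get_backend and InMemoryBackend are all hypothetical, and the in-memory backend is only a toy stand-in for a real engine such as swish-e, Solr or Flax. The idea is simply that the application talks to one small interface and the engine behind it is chosen by name.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Type


class SearchBackend(ABC):
    """Hypothetical plug-in interface (an assumption, not the CogniDox API)."""

    @abstractmethod
    def index(self, doc_id: str, text: str) -> None:
        """Add or update a single document without rebuilding the index."""

    @abstractmethod
    def search(self, query: str) -> List[str]:
        """Return the IDs of documents that match the query."""


# Registry mapping a configured backend name to the class implementing it.
_BACKENDS: Dict[str, Type[SearchBackend]] = {}


def register(name: str) -> Callable[[Type[SearchBackend]], Type[SearchBackend]]:
    """Class decorator that makes a backend available under the given name."""
    def decorator(cls: Type[SearchBackend]) -> Type[SearchBackend]:
        _BACKENDS[name] = cls
        return cls
    return decorator


def get_backend(name: str) -> SearchBackend:
    """Instantiate the backend selected in (hypothetical) site configuration."""
    if name not in _BACKENDS:
        raise ValueError(f"unknown search backend: {name!r}")
    return _BACKENDS[name]()


@register("memory")
class InMemoryBackend(SearchBackend):
    """Toy backend: keeps text in a dict and matches by substring."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}

    def index(self, doc_id: str, text: str) -> None:
        # Python strings are Unicode, so multilingual text is stored as-is.
        self._docs[doc_id] = text.lower()

    def search(self, query: str) -> List[str]:
        terms = query.lower().split()
        return sorted(
            doc_id
            for doc_id, text in self._docs.items()
            if all(term in text for term in terms)
        )
```

With something like this in place, switching engines becomes a configuration choice, for example get_backend("memory") here, while the calling code stays the same.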

The Flax system has been interesting as their web service is still in alpha. However, as it looks so promising, we've been happy to submit the Perl client code to the Flax project. Today, our submission landed in the tree. Time for a celebratory cuppa, I think.


Recently, I got around to reading the AIIM "State of the ECM Industry 2009" report published at the end of March. It's based on a survey of 568 respondents, with a majority (52%) from large organizations (1000+ employees). There was a high percentage (21%) of people from the local and national Government sector, and there was a distinct US & Canada (61%) skew.

What struck me about the report was that it could be seen as a prescription for smaller companies and startups, along the lines "This is what big companies don't like about themselves, you have an opportunity to do it differently". After all, one of the good things about a new startup is that you don't have to make the same mistakes.

It's easy to say, harder to do, but in my experience the degree of "start as you mean to go on" that smaller companies can muster is a good way to keep the growth curve as accelerated and smooth as possible. When you have to switch in a new process or workflow, it can be a massively disruptive force at a time when you can least afford disruption. Of course, cost is a factor: many a person has left behind an expensive ERP system at their former company, unable to replace it in their new startup. But with the growing availability of Free and Open Source software packages, that isn't the only factor any more.

So what ails the larger Enterprise in their Content Management, and how can SMEs adapt this as guidelines for a better strategy?

Inverting the AIIM ECM findings, I'd say the guidelines are:

(1) Start with a low-cost strategy and keep to it - never think that you will switch to an expensive solution at some point in the future. Make other requirements such as compliance and legal discovery a sub-set of the cost saving requirement.

(2) Treat e-mail usage as a dangerous drug - if you become over-dependent you will find it impossible and/or expensive to quit. If e-mail attachments become the main method for communication, it will get out of control more quickly than you'd prefer.

(3) Don't allow documents to be buried. It isn't enough that a document was ready by the deadline and reviewed at the meeting. It also has to be just as easy to find in 6 months' time.

(4) Beware multiple solutions. It's easier to let two teams 'go their own way', but then very hard to re-integrate them. Keep asking why it needs to be different and whether anything can be done to integrate them across the top.

(5) Don't abdicate business solution decisions to IT only (even when it is a part time role done by one of the engineers or founders). This is as important as all your other strategic decisions, and it will have a very significant impact on your company culture.

(6) Get support directly from vendors rather than consultants. They have up-to-date knowledge and it is in keeping with low-cost guideline #1.

(7) Enterprise content will become a large repository more quickly than you think, so the earlier you start to think about Enterprise search, the better.

(8) Blogs and wikis are useful, but remember they are content too and will need to be included in your records management.

I'm hardly neutral on this topic, but I don't see or accept why any startup or small company cannot develop a short strategy paper that sets out their business solutions needs and priorities. If they can't afford Oracle or SAP using third party implementers at this stage, it is totally not a problem. In fact, guideline #1 says it would be better to find an open source alternative.


