Posted by: trraju on Nov 29, 2003 - 05:18 AM (General)
Mike Lynch, chief executive of software supplier Autonomy, came to prominence when he was labelled 'the UK's first internet billionaire' during the dot-com boom. He may be worth a few bob less these days, but the company has remained profitable, thanks partly to a 98 per cent gross margin on sales of its software.

Autonomy's technology uses advanced mathematical techniques for pattern recognition to analyse the vast amounts of unstructured data passing through organisations. Its products can find linked information from sources such as documents, emails, telephone calls and even videos. Many major software vendors, including BEA and Computer Associates, are embedding Autonomy technology into their products.

Computing talked exclusively to Lynch about the challenges for users managing the rapid growth in unstructured data.

Most companies have enormous databases of information - why do they need to worry about data being structured or unstructured?

The fact is that 80 per cent of information in a company is unstructured - human-friendly stuff like email, news articles, phone calls. The IT industry is built around the other 20 per cent, around structured stuff like databases. Customers know that too. Software companies realise the next generation of products has to handle unstructured information in the same way as structured. It's a generic problem in every sector of software, but there's a nice return on investment because you're generally replacing something done by human beings.

You're a former academic, with a PhD from Cambridge University - where did the idea to create a software business come from?

Being a bit of a rebel was quite important - people have been trying to do what we do for a long time. When you look at what people do inside a company, they are doing a lot of tasks that it would be really useful to automate, or at least help them with. The problem is - can you do it?

Academics viewed unstructured information as a linguistic challenge, which sounds reasonable at first. But it's only when you work on the problem you realise linguistics doesn't apply the same way in the real world. If I said to you - 'the dog walked into the room it was furry,' you have to understand the use of the word 'it'. The rules of language will get it wrong. You will do it OK because you have experience of dogs and rooms and you probably know the dog is more likely to be furry than the room.

When you have a mathematical approach, it can start to realise that when you're talking about David Beckham you're also talking about football a lot. You can make those relationships, so it can move with the real world.

It's vital to have that kind of approach.
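The interview does not say how Autonomy's software learns such relationships. As a rough illustration of the general idea only, the sketch below counts how often words appear in the same document, so that 'football' surfaces as the word most associated with 'beckham'. The corpus and the function names are invented for this example.

```python
from collections import Counter
from itertools import combinations

# Toy corpus, invented for illustration only.
DOCS = [
    "beckham scores as football fans cheer",
    "football club signs beckham",
    "beckham free kick wins football match",
    "bank raises interest rates",
]


def cooccurrence(docs):
    """Count how many documents each unordered pair of words shares."""
    pairs = Counter()
    for doc in docs:
        words = sorted(set(doc.split()))  # each word counts once per document
        pairs.update(combinations(words, 2))
    return pairs


def related(term, docs, top=3):
    """Return the words that most often appear alongside `term`."""
    scores = Counter()
    for (a, b), n in cooccurrence(docs).items():
        if a == term:
            scores[b] += n
        elif b == term:
            scores[a] += n
    return scores.most_common(top)


print(related("beckham", DOCS, top=1))
```

No grammar rules are involved here: the association comes purely from counting, which is why it can shift as the documents change.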

So what are the applications of this in business?

Say you have an email address which receives customer support queries. You'd need to have people read all those emails to decide which part of the company should deal with them, but the technology can do it for you.
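A minimal sketch of that kind of routing, assuming a simple bag-of-words cosine similarity against a few example queries per department. The departments, example texts and function names are hypothetical, and this is not a description of Autonomy's product.

```python
import math
from collections import Counter

# Hypothetical departments and example queries, invented for illustration.
EXAMPLES = {
    "billing": [
        "invoice charged twice refund payment",
        "payment card declined invoice",
    ],
    "technical": [
        "software crashes error on install",
        "error message when login fails",
    ],
}


def vector(text):
    """Turn text into a word-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(email):
    """Return the department whose examples best match the email."""
    v = vector(email)
    return max(
        EXAMPLES,
        key=lambda dept: max(cosine(v, vector(e)) for e in EXAMPLES[dept]),
    )


print(route("I was charged twice and want a refund"))
print(route("the install gives an error"))
```

A real deployment would need far more training examples and a fallback for messages that match nothing well; the sketch only shows the shape of the decision.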

At the BBC, when a picture researcher wants to find something where Tony Blair was talking about Iraq six months ago, they could send the office junior down to get the tape and watch it for two hours to find out what he said, or they can get the technology to take them straight to it. Or take intelligence gathering - say a coastguard stops a boat, finds something interesting and has a hunch. Do they go in every day and start asking questions of the system, or do they just say - if anything like this comes up, let me know?

What form does information come in from customers? I've yet to meet a customer who sends in a database. They send an email or ring you up - that's unstructured information.

What is the future potential for your technology?

I think every piece of enterprise software in a few years' time will have to handle unstructured information, and most of them don't at the moment.

The problem with structured data is that it has to arise from a conversion process - something happens in the real world which is unstructured, then a process makes it structured, such as a bank worker typing in transactions. The amount of information captured is going up. You can't afford to keep up with the structuring process, so you have to deal with it as unstructured.

The problem you have is that the value of information is very high, but it's surrounded by so much stuff that's not valuable to the particular issue you're looking at. It all comes down to having technology that has the ability to sift things out.

These days you're dealing with a hundred emails a day, when travelling you're on the mobile phone, you work closely with people in New York and San Francisco. Technology has allowed the connections between people to greatly increase.

What technology needs to do is connect people by saying - you wrote an article two months ago about this thing I'm trying to do over here. As soon as you make that connection you can leverage all the work that person has done without having to redo it. It's going to become more and more important, because unstructured means flexible. Unstructured information will become the lifeblood of the IT industry.

How does that work in practice?

The IT industry always frames problems by what it can solve, rather than what the problem is. You have applications and data and they are separate from everything else. You may have a Lotus email server, and an Oracle marketing database. But the answer to a problem comes from all of those. If you're Ford, and you see something on your service system about tyres exploding, maybe you've got stuff from supply chain management noticing there are defects in the tyre. The idea that those are central systems is dead. You need to get an answer from all of those.

What you need is a layer that plumbs into every repository once and understands what the information is about and where it came from. The IT industry talks about integrating systems - but that just means you can get the bits from here to there. What will be fundamental in the next two years will be integration through understanding. This email about problems with tyres links to this report in the knowledge management system, which is linked to the PR system and to the legal database. Those are not linked because they can exchange data - they link because the system understands all those pieces of information are about the same thing, and the only way it can do that is by reading them. It's a really fundamental change.
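To make 'linked because they are about the same thing' concrete, here is a toy sketch that pairs up records from different systems when their wording overlaps, using Jaccard similarity on word sets. The systems, record texts, stopword list and threshold are all assumptions made for the example; the interview does not describe the actual mechanism.

```python
from itertools import combinations

# Hypothetical records from separate systems; names and text are invented.
RECORDS = [
    ("email", "customer reports tyre exploding on motorway"),
    ("supply_chain", "supplier batch shows tyre tread defects"),
    ("legal", "claim filed over tyre exploding injury"),
    ("marketing", "spring campaign for new hatchback launch"),
]

# Small stopword list so that filler words do not create links.
STOPWORDS = {"on", "over", "for", "the", "a", "of"}


def tokens(text):
    """Return the set of content words in a record."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def jaccard(a, b):
    """Shared words divided by all words across both sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link(records, threshold=0.05):
    """Pair up records from different systems whose wording overlaps."""
    links = []
    for (sys_a, text_a), (sys_b, text_b) in combinations(records, 2):
        score = jaccard(tokens(text_a), tokens(text_b))
        if sys_a != sys_b and score >= threshold:
            links.append((sys_a, sys_b, round(score, 2)))
    return links


for found in link(RECORDS):
    print(found)
```

The three tyre-related records end up connected to one another while the marketing record stays unlinked, even though none of the systems exchanges data with the others.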

The value of companies now is in what they know, not their assets.

To see more of VNUNet go to http://www.vnunet.com
Time to develop a strategy for your unstructured data