Software Links Text And Images For Categorisation

A Xerox product story
Edited by the Printingtalk editorial team Oct 11, 2007

Software that can link text and general images together when categorising online and paper-based information has been developed by Xerox researchers.

Xerox said that by linking image-based and text-based content, its new software improves document management tasks such as retrieving information from a database or automatically routing documents.

The company claims this will make searches more complete and business processes more streamlined.

The software remains under development, said Xerox, which has filed a number of patents on the technology.

According to the company, traditional tools classify or tag either text or images so that they can be processed, but until now the two had never been effectively combined.

Marco Bressan, a computer scientist who led the research team at Xerox Research Centre Europe, gave an example: if a brochure for an isolated hotel in the French Alps describes the hotel's features and includes maps and pictures of the mountainous surroundings, the categoriser will automatically identify the content and link the text and the images together.

Someone searching for an isolated mountain lodge within a certain price range would then retrieve the brochure, even if the phrase 'isolated lodge in the mountains' never appeared in the text.

Xerox added that the research is consistent with its goal of developing 'smarter' documents that make information-based work easier, more efficient and more effective.

Bressan believes there are many uses for the new categorisation software and commented: "Because the Xerox categoriser handles both text and visuals, it can identify the images, automatically match them to the written text and then enrich the visuals with additional information via hyperlinks to a knowledge base, such as Wikipedia."

According to Bressan, a second application could be at Xerox's imaging centres, where the company scans and digitises documents to create secure, accessible and searchable online information archives for its customers.

Currently, scanning, labelling and indexing documents is partly supervised by operators, but hybrid categorisation can streamline document management in this setting, improving accuracy and eliminating manual operations.

The company added that the hybrid categoriser has been made possible by recent advances in machine learning and pattern recognition, improvements in computer vision, and the large body of hybrid content now available.

According to the company, Xerox Research Centre Europe has experience in text categorisation and in 2005 demonstrated the industry's first generic image categoriser.

The new categoriser combines the earlier text and image categorisers to handle hybrid content.

Bressan explained: "Xerox's hybrid categoriser creates a shared knowledge space between text and images. The textual information enriches the visual and the visual information enriches the textual. The whole is ultimately greater than the sum of the parts."

A Pro-talk Publication