The Oldest Big Data Problem: Parsing Human Language
- by dan.mcclary
There's a new whitepaper up on Oracle Technology Network that details the use of Digital Reasoning Systems' Synthesys software on Oracle Big Data Appliance. Digital Reasoning's approach is inherently "big data friendly" because it builds on multiple components of the Hadoop ecosystem. The paper also addresses the oldest big data problem of them all: extracting knowledge from human text.
You can find the paper here.
From the Executive Summary:
There is a wealth of information to be extracted from natural language, but that extraction is challenging. The volume of human language we generate constitutes a natural Big Data problem, while its complexity and nuance requires a particular expertise to model and mine. In this paper we illustrate the impressive combination of Oracle Big Data Appliance and Digital Reasoning Synthesys software. The combination of Synthesys and Big Data Appliance makes it possible to analyze tens of millions of documents in a matter of hours. Moreover, this powerful combination achieves four times greater throughput than conducting the equivalent analysis on a much larger cloud-deployed Hadoop cluster.