I was recently tasked with building a Named Entity Recognizer as part of a project. The objective was to parse a given sentence and come up with all the possible combinations of the entities in it.
One suggested approach was to keep a lookup table of all the known connector words, such as articles and conjunctions, split the sentence on spaces, and remove the connector words from the resulting word list. This would leave the named entities in the sentence.
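A minimal sketch of that filtering step in Python, assuming a tiny hypothetical connector table (the names `CONNECTORS` and `candidate_words` are only illustrative, and a real table would be much larger):

```python
# Hypothetical connector-word table: a few articles and conjunctions only.
CONNECTORS = {"a", "an", "the", "and", "or", "but"}

def candidate_words(sentence):
    """Split on spaces and drop known connector words (case-insensitive)."""
    return [w for w in sentence.split(" ") if w and w.lower() not in CONNECTORS]

sentence = "Remember the Titans was a movie directed by Boaz Yakin"
print(candidate_words(sentence))
```

Note that dropping "the" here also splits "Remember the Titans" into separate words, so entities that contain connector words would need extra handling.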
Each identified entity is then looked up in a second table that maps it to an entity type. For example, if the sentence was: Remember the Titans was a movie directed by Boaz Yakin, the possible outputs would be:
{Remember the Titans,Movie} was {a movie,Movie} directed by {Boaz Yakin,director}
{Remember the Titans,Movie} was a movie directed by Boaz Yakin
{Remember the Titans,Movie} was {a movie,Movie} directed by Boaz Yakin
{Remember the Titans,Movie} was a movie directed by {Boaz Yakin,director}
Remember the Titans was {a movie,Movie} directed by Boaz Yakin
Remember the Titans was {a movie,Movie} directed by {Boaz Yakin,director}
Remember the Titans was a movie directed by {Boaz Yakin,director}
Remember the {the titans,Movie,Sports Team} was {a movie,Movie} directed by {Boaz Yakin,director}
Remember the {the titans,Movie,Sports Team} was a movie directed by Boaz Yakin
Remember the {the titans,Movie,Sports Team} was {a movie,Movie} directed by Boaz Yakin
Remember the {the titans,Movie,Sports Team} was a movie directed by {Boaz Yakin,director}
The entity lookup table here would contain the following data:
Remember the Titans=Movie
a movie=Movie
Boaz Yakin=director
the Titans=Movie
the Titans=Sports Team
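To make the enumeration concrete, here is a rough Python sketch of one way it could be done. It is an assumption on my part rather than the suggested implementation: it matches every contiguous word span against the entity table (case-insensitively) and then emits every non-empty set of non-overlapping matches. The function names are illustrative, and the matched text is printed as it appears in the sentence.

```python
from itertools import combinations

# Entity table from above; keys are lower-cased for case-insensitive matching.
ENTITY_TABLE = {
    "remember the titans": ["Movie"],
    "a movie": ["Movie"],
    "boaz yakin": ["director"],
    "the titans": ["Movie", "Sports Team"],
}

def find_matches(words):
    """Return (start, end, types) for every word span found in the table."""
    matches = []
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            phrase = " ".join(words[start:end]).lower()
            if phrase in ENTITY_TABLE:
                matches.append((start, end, ENTITY_TABLE[phrase]))
    return matches

def overlaps(a, b):
    """True if two (start, end, types) spans share at least one word."""
    return a[0] < b[1] and b[0] < a[1]

def render(words, chosen):
    """Rebuild the sentence, wrapping each chosen span as {text,types}."""
    out, i = [], 0
    for start, end, types in sorted(chosen, key=lambda m: m[0]):
        out.extend(words[i:start])
        out.append("{" + " ".join(words[start:end]) + "," + ",".join(types) + "}")
        i = end
    out.extend(words[i:])
    return " ".join(out)

def annotations(sentence):
    """All annotated variants that use at least one non-overlapping match."""
    words = sentence.split()
    matches = find_matches(words)
    results = []
    for size in range(1, len(matches) + 1):
        for subset in combinations(matches, size):
            if all(not overlaps(a, b) for a, b in combinations(subset, 2)):
                results.append(render(words, subset))
    return results

for line in annotations("Remember the Titans was a movie directed by Boaz Yakin"):
    print(line)
```

This brute-force version checks every subset of matches, so it grows exponentially with the number of matches. That is fine for short sentences like this one, but it would need pruning for longer text.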
An alternative that was put forward was to build a crude sentence tree in which the connector words from the lookup table are parent nodes, and then to look up the leaf nodes, which might contain the entities, in the entity table. The tree that was built for the sentence above would be:
My question is about the relative benefits of the two approaches. Should I go with the tree approach to represent the parsed sentence, since it provides a more semantic structure? Or is there a better approach I should use to solve this?