I have some free text (think: blog articles, interview transcripts, chat comments) and would like to explore it by analysing the proper nouns it contains.
I know of many ways to simply look the text up against a list of proper nouns. The problem with this approach is that it produces many false positives and false negatives. It also splits one proper noun (e.g. "John Allen") into two ("John" and "Allen"), and it struggles with long or unusual proper nouns, e.g. "the Gulf of Carpentaria" (a single proper noun containing the word "of") and long names like "Joost van der Westhuizen". These longer, non-conformist proper nouns tend to really trip up grep-style proper noun identification.
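To illustrate the problem, here is a minimal sketch of the kind of list lookup I mean. The `NAME_LIST` contents and the `lookup_proper_nouns` helper are made up for this example and are not taken from any real tool:

```python
import re

# Hypothetical single-word list of known proper nouns (illustration only).
NAME_LIST = {"John", "Allen", "Gulf", "Carpentaria", "Joost", "Westhuizen"}


def lookup_proper_nouns(text):
    """Return every word of `text` that appears in NAME_LIST, in order."""
    # Split the text into alphabetic words, ignoring punctuation.
    tokens = re.findall(r"[A-Za-z]+", text)
    # Keep only the words found in the list; each word is matched on its own,
    # so multi-word names cannot be recognised as a single unit.
    return [tok for tok in tokens if tok in NAME_LIST]


text = "John Allen sailed the Gulf of Carpentaria with Joost van der Westhuizen."
matches = lookup_proper_nouns(text)
print(matches)
```

The sentence contains three proper nouns, but the lookup returns six separate fragments ("John", "Allen", "Gulf", "Carpentaria", "Joost", "Westhuizen"), and the connecting words "of", "van" and "der" are lost entirely.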
Does anyone know of a publicly available AI tool that can identify proper nouns in free text more accurately?