Words to numbers bleg

Dear Scatterplotters,

I’m cross-posting an inquiry from my advisee and collaborator Alex Hanna regarding text parsing to convert qualitative descriptions of events into numerical estimates.

http://badhessian.org/2013/06/numerical-approximation-words-to-numbers/

I’ve done this myself in the past, but as a human coder: I read the whole story and used the text descriptions to categorize group size qualitatively, based on my best judgment. FYI, the codes I used for Madison protests in the 1990s were: Tiny (1-5), Very Small (6-15), Small (16-30), Modest (31-99), Medium (100-499), Larger (500-1500), Large (2000-10,000), Very Large (10,000+), and Huge (100,000+), which we then collapsed into Small (1-15), Medium (16-499), and Large (500+).
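For anyone who wants to apply these codes to numeric estimates, here is a minimal Python sketch of the scheme as a lookup. The function and table names are mine for illustration. It also rests on an assumption: the ranges as listed leave 1,501-1,999 unassigned and put 10,000 in both Large and Very Large, so the sketch treats each code as running from its lower bound up to the next code's lower bound.

```python
from bisect import bisect_right

# Lower bound of each fine-grained code, in ascending order.
# Assumption (not part of the original scheme): each code runs up to the
# next code's lower bound, so 1,501-1,999 is coded "Larger" and exactly
# 10,000 is coded "Very Large".
FINE_CODES = [
    (1, "Tiny"),
    (6, "Very Small"),
    (16, "Small"),
    (31, "Modest"),
    (100, "Medium"),
    (500, "Larger"),
    (2_000, "Large"),
    (10_000, "Very Large"),
    (100_000, "Huge"),
]

# The collapsed three-way scheme: Small (1-15), Medium (16-499), Large (500+).
COLLAPSED_CODES = [(1, "Small"), (16, "Medium"), (500, "Large")]


def code_for(size, codes):
    """Return the label whose lower bound is the largest one <= size."""
    if size < 1:
        raise ValueError("size must be at least 1")
    bounds = [lower for lower, _ in codes]
    return codes[bisect_right(bounds, size) - 1][1]


def fine_code(size):
    return code_for(size, FINE_CODES)


def collapsed_code(size):
    return code_for(size, COLLAPSED_CODES)
```

For example, `fine_code(250)` gives "Medium" and `collapsed_code(250)` also gives "Medium", while `fine_code(750)` gives "Larger" and `collapsed_code(750)` gives "Large".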

The problem here is to use automated text parsing of words like “several,” “scores,” “small,” “large,” etc. to categorize protests. I can find substantial literature on the problem of estimating crowd sizes while looking at a crowd, on the diversity of crowd size estimates from different sources (e.g., police and organizers), and on how news reporters decide which sources to use. But I can’t find anything about this problem of trying to get some rough event size estimate from text parsing. Can anyone point us to a source?
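To make the problem concrete, the most naive version of what we have in mind is a keyword lookup like the Python sketch below. The word-to-code pairings are placeholders I made up purely for illustration, not validated estimates, and the function name is hypothetical. Finding a defensible basis for such a table is exactly what we are asking about.

```python
import re

# Hypothetical pairings of size words with the collapsed codes
# (Small, Medium, Large). These are illustrative guesses only.
WORD_TO_CODE = {
    "several": "Small",
    "small": "Small",
    "scores": "Medium",
    "large": "Large",
}

# Match any listed word as a whole word, ignoring case.
SIZE_WORD = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in WORD_TO_CODE) + r")\b",
    re.IGNORECASE,
)


def size_cues(text):
    """Return (word, code) pairs for each size word, in order of appearance."""
    return [
        (match.lower(), WORD_TO_CODE[match.lower()])
        for match in SIZE_WORD.findall(text)
    ]
```

The sketch returns every cue rather than a single answer, because a story can contain conflicting words (a small group breaking away from a large crowd, say), and how to resolve those conflicts is part of the open question.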