extra-model
extra-model is an implementation of the ExtRA algorithm described in the paper "ExtRA: Extracting Prominent Review Aspects from Customer Feedback".
It is an unsupervised algorithm for an NLP task called Aspect-Based Sentiment Analysis.
At Wayfair, this algorithm is routinely used in production on more than 1 million reviews, for example to produce what we call Bubble Filters.
However, we found that extra-model can be used to analyze any text, not only reviews, so we hope this package lets you unlock the power of this algorithm in your own work too.
Examples
More examples are available here
extra-model input
More info about the workflow is available here
The input to extra-model is a .csv file with two columns: CommentId and Comments.
Both columns must be present and named exactly as shown.
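
As an illustration of the input format described above, here is a minimal sketch that writes a two-column file and checks a header for the required column names. It uses only the Python standard library and does not call extra-model itself. The helper names `write_input_csv` and `missing_columns` are hypothetical, and numbering comments sequentially from 1 is an assumption; any unique identifier should serve as CommentId.

```python
import csv
import io

# Column names required by extra-model, as stated above.
REQUIRED_COLUMNS = ("CommentId", "Comments")


def write_input_csv(comments, handle):
    """Write an iterable of comment strings as CommentId,Comments rows.

    Sequential integer ids are an assumption made for this sketch.
    """
    writer = csv.DictWriter(handle, fieldnames=REQUIRED_COLUMNS)
    writer.writeheader()
    for comment_id, text in enumerate(comments, start=1):
        writer.writerow({"CommentId": comment_id, "Comments": text})


def missing_columns(handle):
    """Return the required column names that are absent from the CSV header."""
    header = next(csv.reader(handle), [])
    return [name for name in REQUIRED_COLUMNS if name not in header]


# Build an in-memory file; a real run would use open("input.csv", "w", newline="").
buffer = io.StringIO()
write_input_csv(
    ["The bathrooms were clean.", "Nicely decorated place, no downside."],
    buffer,
)
buffer.seek(0)
print(missing_columns(buffer))  # an empty list means the header is valid
```

Running a check like `missing_columns` before starting the analysis catches misnamed columns early, since the names are matched exactly.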
extra-model output
After extra-model finishes its calculations, it produces a .csv file with the following structure:
AdCluster,Aspect,AspectCount,CommentId,Descriptor,Position,SentimentBinary,SentimentCompound,Topic,TopicCount,TopicImportance,TopicSentimentBinary,TopicSentimentCompound,WordnetNode
only,downside,1,321,only,9,0.0,0.0,downside.n.01,1,0.005572645018795278,0.0,0.0,downside.n.01
more,nothing,1,74,more,54,0.0,0.0,nothing.n.01,1,0.005572645018795278,0.0,0.0,nothing.n.01
clean,bathrooms,1,146,clean,4,1.0,0.4019,toilet.n.01,1,0.005572645018795278,1.0,0.4019,toilet.n.01
decorated,place,5,146,decorated,32,0.0,0.0,home.n.01,6,0.03343587011277168,0.0,-0.01131666666666666,home.n.01
The columns have the following meaning:
Column | Description |
---|---|
AdCluster | Adjectives are clustered together, and this value indicates the "center" of a cluster (e.g., the descriptors "awesome", "fantastic", and "great" might produce "great" as AdCluster) |
Aspect | Identified aspect. This is the actual word that a person wrote in the text |
AspectCount | How often this aspect has been found in all of the input |
CommentId | ID of the input. Since one input may produce multiple aspects, the ID column must always be present |
Descriptor | Identified adjective (not clustered). This is the actual word that a person wrote in the text |
Position | Character number where the aspect was found (e.g., "nice shirt" will have aspect "shirt" and Position 6) |
SentimentBinary | Binary sentiment for aspect |
SentimentCompound | Compound sentiment for aspect |
Topic | Collection of aspects. |
TopicCount | How often this topic has been found in the input |
TopicImportance | Importance of a topic |
TopicSentimentBinary | Similar to aspect, but on a topic level |
TopicSentimentCompound | Similar to aspect, but on a topic level |
WordnetNode | Mapping to a WordNet node. Identifiers of the form .n.01 mean the first meaning of the noun in WordNet |
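
To show how these columns fit together, the following sketch groups aspects under their topic and splits a WordNet identifier into its parts. It uses only the Python standard library and the sample rows shown above. The function names `aspects_by_topic` and `split_wordnet_node` are hypothetical helpers for this illustration, not part of extra-model.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the sample output above; a real run would read the produced file.
SAMPLE_OUTPUT = """\
AdCluster,Aspect,AspectCount,CommentId,Descriptor,Position,SentimentBinary,SentimentCompound,Topic,TopicCount,TopicImportance,TopicSentimentBinary,TopicSentimentCompound,WordnetNode
only,downside,1,321,only,9,0.0,0.0,downside.n.01,1,0.005572645018795278,0.0,0.0,downside.n.01
more,nothing,1,74,more,54,0.0,0.0,nothing.n.01,1,0.005572645018795278,0.0,0.0,nothing.n.01
clean,bathrooms,1,146,clean,4,1.0,0.4019,toilet.n.01,1,0.005572645018795278,1.0,0.4019,toilet.n.01
decorated,place,5,146,decorated,32,0.0,0.0,home.n.01,6,0.03343587011277168,0.0,-0.01131666666666666,home.n.01
"""


def aspects_by_topic(handle):
    """Group (Aspect, Descriptor, SentimentCompound) tuples under each Topic."""
    grouped = defaultdict(list)
    for row in csv.DictReader(handle):
        grouped[row["Topic"]].append(
            (row["Aspect"], row["Descriptor"], float(row["SentimentCompound"]))
        )
    return dict(grouped)


def split_wordnet_node(node):
    """Split an identifier such as 'toilet.n.01' into (lemma, part of speech, sense number)."""
    lemma, pos, sense = node.rsplit(".", 2)
    return lemma, pos, int(sense)


topics = aspects_by_topic(io.StringIO(SAMPLE_OUTPUT))
for topic, entries in topics.items():
    print(topic, split_wordnet_node(topic), entries)
```

Because one comment can yield several rows (CommentId 146 appears twice in the sample), grouping by Topic or by CommentId is usually the first step when summarizing the output.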