Why is TFIDF seen as a model in Gensim




I am familiar with the tfidf vectorizer.



However, in gensim it seems that tfidf is treated as a model in its own right, just like LDA, LSI and others.



Why is this the case? Can't tfidf just be used to vectorize the documents, with the result then fed into an LDA model, for example?



Link to documentation: https://radimrehurek.com/gensim/tut2.html





From the link you provided, it looks like they vectorize the corpus first and then pass it into an LSI transformation, which is standard. Looking at the TfidfModel documentation, it appears to do the same job as scikit-learn's TF-IDF; they just use the word "model" to describe it.
– W Stokvis
Jul 25 at 13:34





@WStokvis So you think you can feed the tfidf-vectorized documents into an LdaModel, for example?
– Daphne
Jul 25 at 13:41





Yes, that's exactly what they do in the documentation
– W Stokvis
Jul 25 at 13:47





TFIDF is not a static transformation. The corpus statistics (for each term, the number of documents it appears in) need to be learned and stored, i.e. it is a model. You could learn these statistics from one corpus and use them to transform another, so by making it a model in Gensim, it can be reused for multiple use cases.
– BrunoGL
yesterday





Thanks for the helpful answer! @BrunoGL
– Daphne
4 hours ago




1 Answer



TFIDF is not a static transformation.



The corpus statistics (for each term, the number of documents it appears in) need to be learned and stored, i.e. it is a model.



This means that you could learn these statistics from one corpus and use them to transform another, so by making it a model in Gensim, it can be reused for multiple use cases.
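
To make the "learn on one corpus, transform another" idea concrete, here is a minimal pure-Python sketch. It is not Gensim's implementation: the class name `TinyTfidf`, the toy documents, and the plain `count * log2(N / df)` weighting without normalization are all assumptions chosen for illustration. The point is only that fitting stores per-term document counts, and those stored counts are what get reused later.

```python
import math
from collections import Counter


class TinyTfidf:
    """Toy TF-IDF 'model' (hypothetical, for illustration only)."""

    def __init__(self, docs):
        # "Training": remember the corpus size and, for each term,
        # how many training documents contain it.
        self.num_docs = len(docs)
        self.doc_freq = Counter(term for doc in docs for term in set(doc))

    def idf(self, term):
        # Inverse document frequency from the stored counts.
        # Terms never seen during training get weight 0.
        df = self.doc_freq.get(term, 0)
        return math.log2(self.num_docs / df) if df else 0.0

    def transform(self, doc):
        # Weight each term count in a (possibly new) document
        # by the IDF learned from the training corpus.
        counts = Counter(doc)
        weights = {term: n * self.idf(term) for term, n in counts.items()}
        return {term: w for term, w in weights.items() if w > 0}


# Fit on one corpus ...
train_docs = [
    ["human", "computer", "interface"],
    ["computer", "survey"],
    ["graph", "trees"],
    ["graph", "minors", "survey"],
]
model = TinyTfidf(train_docs)

# ... and reuse the learned statistics on a document from elsewhere.
new_doc = ["graph", "computer", "graph", "unseen"]
weights = model.transform(new_doc)
print(weights)
```

In Gensim the two corresponding steps are `tfidf = models.TfidfModel(corpus)` to fit and `tfidf[bow_document]` to transform; the transformed corpus can then be passed on to another model such as LSI, as the linked tutorial does.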





