Distinct sentences by text similarity using Spark?




I want to deduplicate 200 million different sentences by text similarity using Spark. Suppose I have 4 sentences:



["Hi I heard about Spark","Hi I heard about Spark World",
"Logistic regression models ","Logistic regression goodmodels "]



The result I hope to get is
["Hi I heard about Spark", "Logistic regression models "]



This is because the first sentence is similar to the second, and the third is similar to the fourth, according to Levenshtein distance: https://rosettacode.org/wiki/Levenshtein_distance
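
To make the criterion concrete, here is a minimal single-machine sketch in plain Python of what "distinct by Levenshtein distance" could mean for the four sentences. The function names `levenshtein` and `greedy_distinct` and the threshold of 6 are my own assumptions for illustration, not anything from Spark. The pairwise loop is quadratic, so it only illustrates the desired output and is not a solution for 200 million rows.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) using a two-row DP table."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the row to save memory
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def greedy_distinct(sentences, max_distance):
    """Keep a sentence only if it is farther than max_distance from every kept one.

    Hypothetical helper: order-dependent and O(n^2), for illustration only.
    """
    kept = []
    for s in sentences:
        if all(levenshtein(s, k) > max_distance for k in kept):
            kept.append(s)
    return kept


sentences = ["Hi I heard about Spark", "Hi I heard about Spark World",
             "Logistic regression models ", "Logistic regression goodmodels "]

print(levenshtein(sentences[0], sentences[1]))  # 6 (" World" appended)
print(levenshtein(sentences[2], sentences[3]))  # 4 ("good" inserted)
print(greedy_distinct(sentences, max_distance=6))
# ['Hi I heard about Spark', 'Logistic regression models ']
```

The threshold is an assumed value chosen so that both example pairs count as similar; a real threshold would probably need to scale with sentence length.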



How can I achieve this efficiently using Spark? Because the data is 200 million sentences, I am hesitant to do a Cartesian join.








