Apache Commons LevenshteinDistance: the Levenshtein distance (similarity) algorithm
admin
2024-01-27 22:58:33

Apache Commons LevenshteinDistance: LevenshteinDistance(final Integer threshold). If the threshold is not null, distance calculations are limited to a maximum length.

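Introduction

If the threshold is not null, distance calculations are limited to a maximum length.

If the threshold is null, the unlimited version of the algorithm is used.

LevenshteinDistance() is a constructor.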

Syntax

The LevenshteinDistance() constructor of the LevenshteinDistance class is declared as:

public LevenshteinDistance(final Integer threshold)

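Parameters

The LevenshteinDistance() constructor takes the following parameter:

  • Integer threshold - if this value is null, distance calculations are not limited. It must not be negative.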
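To make the threshold semantics concrete, here is a minimal plain-Java sketch. It is not the Apache Commons Text implementation: the class name ThresholdSketch is hypothetical, and the sketch computes the full distance first and only then compares it with the threshold, whereas a real thresholded algorithm can stop early. It assumes that a negative threshold is rejected with an IllegalArgumentException and that -1 is returned when the distance exceeds the threshold, as the comment in the example further down describes.

```java
/**
 * Plain-Java sketch of a thresholded Levenshtein distance.
 * Hypothetical class for illustration; NOT the Apache Commons Text code.
 */
final class ThresholdSketch {

    // null means "no limit" (the unlimited version of the algorithm)
    private final Integer threshold;

    ThresholdSketch(Integer threshold) {
        // Assumption: a negative threshold is rejected up front
        if (threshold != null && threshold < 0) {
            throw new IllegalArgumentException("Threshold must not be negative");
        }
        this.threshold = threshold;
    }

    /** Returns the edit distance, or -1 if it exceeds a non-null threshold. */
    int apply(CharSequence left, CharSequence right) {
        int n = left.length();
        int m = right.length();
        int[] prev = new int[m + 1]; // distances for the previous row
        int[] curr = new int[m + 1]; // distances for the current row
        for (int j = 0; j <= m; j++) {
            prev[j] = j; // turning "" into right[0..j) takes j insertions
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = i; // turning left[0..i) into "" takes i deletions
            for (int j = 1; j <= m; j++) {
                int cost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1], prev[j]) + 1;
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows; prev now holds the latest row
            prev = curr;
            curr = tmp;
        }
        int distance = prev[m];
        return (threshold != null && distance > threshold) ? -1 : distance;
    }

    public static void main(String[] args) {
        String s1 = "hyperspace";
        String s2 = "cyberscape";
        System.out.println(new ThresholdSketch(null).apply(s1, s2)); // prints 4
        System.out.println(new ThresholdSketch(3).apply(s1, s2));    // prints -1
        System.out.println(new ThresholdSketch(4).apply(s1, s2));    // prints 4
    }
}
```

With a null threshold the sketch always returns the true distance. With a threshold of 3 the same pair of strings yields -1, because their distance of 4 is above the limit.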

The following code demonstrates how to use the Apache Commons LevenshteinDistance constructor LevenshteinDistance(final Integer threshold).

Example 1

import org.apache.commons.text.*;
import org.apache.commons.text.diff.*;
import org.apache.commons.text.similarity.*;
import org.apache.commons.text.translate.*;

import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class ShowVisitor<Character> implements CommandVisitor<Character> {

    private int inserts = 0;
    private int keeps = 0;
    private int deletes = 0;

    public void visitInsertCommand(Character character) {
        ++inserts;
        System.out.println(String.format("insert %s", character));
    }

    public void visitKeepCommand(Character character) {
        ++keeps;
        System.out.println(String.format("keep   %s", character));
    }

    public void visitDeleteCommand(Character character) {
        ++deletes;
        System.out.println(String.format("delete %s", character));
    }

    public void printStats() {
        System.out.println(String.format("%d inserts, %d deletes, %d keeps", inserts, deletes, keeps));
    }
}

public class CommonsTextExamples {

    public static void main(String[] args) {
        caseUtilsExample();
        stringEscapeUtilsExample();
        stringSubstitutorExample();
        wordUtilsExample();
        diffExample();
        translateExample();
        similaritiesExample();
        sentenceSimilarityExample();
        distancesExample();
        sentenceDistanceExample();
    }

    private static void printExampleHeader(String example) {
        // Contains an example of TextStringBuilder
        String header = "Examples of " + example;
        System.out.println("\n" + header);
        TextStringBuilder builder = new TextStringBuilder();
        System.out.println(builder.appendPadding(header.length(), '-').toString());
    }

    public static void caseUtilsExample() {
        printExampleHeader("CaseUtils");
        String string = "java-programming-language";
        System.out.println(CaseUtils.toCamelCase(string, true, '-'));
        System.out.println(CaseUtils.toCamelCase(string, false, '-'));
    }

    public static void stringEscapeUtilsExample() {
        printExampleHeader("StringEscapeUtils");
        String string = "Department, R&D";
        System.out.println(StringEscapeUtils.escapeHtml4(string));
        System.out.println(StringEscapeUtils.escapeXml11(string));
        System.out.println(StringEscapeUtils.escapeCsv(string));
        System.out.println(StringEscapeUtils.builder(StringEscapeUtils.ESCAPE_HTML4)
                .append("R&D dept: ").escape(string).toString());
    }

    public static void stringSubstitutorExample() {
        printExampleHeader("StringSubstitutor");
        Map<String, String> substitutions = new HashMap<>();
        substitutions.put("city", "London");
        substitutions.put("country", "England");
        // With static method
        System.out.println(StringSubstitutor.replace("${city} is the capital of ${country}", substitutions));
        // With StringSubstitutor object
        StringSubstitutor sub = new StringSubstitutor(substitutions);
        System.out.println(sub.replace("${city} is the capital of ${country}"));
        StringSubstitutor interpolator = StringSubstitutor.createInterpolator();
        System.out.println(interpolator.replace("Base64 encoder: ${base64Encoder:Secret password}"));
    }

    public static void wordUtilsExample() {
        printExampleHeader("WordUtils");
        String longString = "This is a very long string, from https://www.example.org";
        String allLower = "all lower but ONE";
        String allCapitalized = "All Capitalized But ONE";

        System.out.println("\nWordUtils: Abbreviation");
        // Take at least 9 characters, cutting to 12 characters if no space is found before
        System.out.println(WordUtils.abbreviate(longString, 9, 12, " ..."));
        // Take at least 10 characters, cutting to 12 characters if no space is found before
        System.out.println(WordUtils.abbreviate(longString, 10, 12, " ..."));
        // Take at least 10 characters, then cut on the first space wherever it is
        System.out.println(WordUtils.abbreviate(longString, 10, -1, " ..."));

        System.out.println("\nWordUtils: Initials");
        System.out.println(WordUtils.initials(allLower));
        System.out.println(WordUtils.initials(allCapitalized));

        System.out.println("\nWordUtils: Case change");
        // Doesn't lowercase the uppercase characters
        System.out.println(WordUtils.capitalize(allLower));
        // Lowercases everything, then capitalizes the first letter of each word
        System.out.println(WordUtils.capitalizeFully(allLower));
        // Lowercases the first letter of each word
        System.out.println(WordUtils.uncapitalize(allCapitalized));
        // Swaps the case of each character
        System.out.println(WordUtils.swapCase(allLower));

        System.out.println("\nWordUtils: Wrapping");
        // Line length is 10, uses '\n' as a line break, does not break words longer than the line
        System.out.println(WordUtils.wrap(longString, 10, "\n", false) + "\n");
        // Line length is 10, uses '\n' as a line break, breaks words longer than the line
        System.out.println(WordUtils.wrap(longString, 10, "\n", true) + "\n");
        // Line length is 10, uses '\n' as a line break, breaks words longer than the line, also breaks on commas
        System.out.println(WordUtils.wrap(longString, 10, "\n", true, ",") + "\n");
    }

    public static void diffExample() {
        printExampleHeader("diff");
        String s1 = "hyperspace";
        String s2 = "cyberscape";
        StringsComparator comparator = new StringsComparator(s1, s2);
        EditScript<Character> script = comparator.getScript();
        System.out.println("Longest Common Subsequence length (number of \"keep\" commands): "
                + script.getLCSLength());
        System.out.println("Effective modifications (number of \"insert\" and \"delete\" commands): "
                + script.getModifications());
        ShowVisitor<Character> visitor = new ShowVisitor<>();
        script.visit(visitor);
        visitor.printStats();
    }

    public static void translateExample() {
        printExampleHeader("translate");
        Map<CharSequence, CharSequence> translation = new HashMap<>();
        translation.put("e", "3");
        translation.put("l", "1");
        translation.put("t", "7");
        String s1 = "Let it be!";
        LookupTranslator lookupTranslator = new LookupTranslator(translation);
        System.out.println(lookupTranslator.translate(s1));
        UnicodeEscaper unicodeEscaper = new UnicodeEscaper();
        UnicodeUnescaper unicodeUnescaper = new UnicodeUnescaper();
        String unicodeString = unicodeEscaper.translate(s1);
        System.out.println(unicodeString);
        System.out.println(unicodeUnescaper.translate(unicodeString));
    }

    public static void similaritiesExample() {
        printExampleHeader("similarities");
        String s1 = "hyperspace";
        String s2 = "cyberscape";
        JaccardSimilarity jaccard = new JaccardSimilarity();
        System.out.println("Jaccard similarity: " + jaccard.apply(s1, s2));
        JaroWinklerSimilarity jaroWinkler = new JaroWinklerSimilarity();
        System.out.println("Jaro-Winkler similarity: " + jaroWinkler.apply(s1, s2));
        LongestCommonSubsequence lcs = new LongestCommonSubsequence();
        System.out.println("Longest Common Subsequence similarity: " + lcs.apply(s1, s2));
        FuzzyScore fuzzyScore = new FuzzyScore(Locale.ENGLISH);
        System.out.println("Fuzzy score similarity: " + fuzzyScore.fuzzyScore(s1, s2));
        System.out.println("Fuzzy score similarity: " + fuzzyScore.fuzzyScore(s1, "space"));
    }

    public static void sentenceSimilarityExample() {
        printExampleHeader("sentence similarity");
        String s1 = "string similarity";
        String s2 = "string distance";
        Map<CharSequence, Integer> vector1 = new HashMap<>();
        Map<CharSequence, Integer> vector2 = new HashMap<>();
        for (String token : s1.split(" ")) {
            vector1.put(token, vector1.getOrDefault(token, 0) + 1);
        }
        for (String token : s2.split(" ")) {
            vector2.put(token, vector2.getOrDefault(token, 0) + 1);
        }
        CosineSimilarity cosine = new CosineSimilarity();
        System.out.println("Cosine similarity: " + cosine.cosineSimilarity(vector1, vector2));
        // Adding one repetition of "string" to vector2
        vector2.put("string", vector2.getOrDefault("string", 0) + 1);
        System.out.println("Cosine similarity: " + cosine.cosineSimilarity(vector1, vector2));
    }

    public static void distancesExample() {
        printExampleHeader("distances");
        String s1 = "hyperspace";
        String s2 = "cyberscape";
        HammingDistance hamming = new HammingDistance();
        // Requires the two strings to have the same length
        System.out.println("Hamming distance: " + hamming.apply(s1, s2));
        JaccardDistance jaccard = new JaccardDistance();
        System.out.println("Jaccard distance: " + jaccard.apply(s1, s2));
        JaroWinklerDistance jaroWinkler = new JaroWinklerDistance();
        // The result is wrong at the moment (see https://issues.apache.org/jira/browse/TEXT-104)
        System.out.println("Jaro-Winkler distance: " + jaroWinkler.apply(s1, s2));
        LongestCommonSubsequenceDistance lcs = new LongestCommonSubsequenceDistance();
        System.out.println("Longest Common Subsequence distance: " + lcs.apply(s1, s2));
        LevenshteinDistance levenshtein = new LevenshteinDistance();
        System.out.println("Levenshtein distance: " + levenshtein.apply(s1, s2));
        LevenshteinDistance levenshteinWithThreshold = new LevenshteinDistance(3);
        // Returns -1 since the actual distance, 4, is higher than the threshold
        System.out.println("Levenshtein distance: " + levenshteinWithThreshold.apply(s1, s2));
        LevenshteinDetailedDistance levenshteinDetailed = new LevenshteinDetailedDistance();
        System.out.println("Levenshtein detailed distance: " + levenshteinDetailed.apply(s1, s2));
    }

    public static void sentenceDistanceExample() {
        printExampleHeader("sentence distance");
        String s1 = "string similarity";
        String s2 = "string distance";
        CosineDistance cosine = new CosineDistance();
        System.out.println("Cosine distance: " + cosine.apply(s1, s2));
        System.out.println("Cosine distance: " + cosine.apply(s1, s2 + " string"));
    }
}
