Apache Commons LevenshteinDistance: the Levenshtein (similarity) algorithm
admin
2024-01-27 22:58:33

Apache Commons LevenshteinDistance: LevenshteinDistance(final Integer threshold). If the threshold is not null, distance calculations are limited to a maximum length.

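Introduction

If the threshold is not null, distance calculations are limited to that maximum length. When the actual distance exceeds the threshold, apply() returns -1.

If the threshold is null, the unlimited version of the algorithm is used.

LevenshteinDistance() is a constructor.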
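To make the difference between the limited and unlimited modes concrete, here is a minimal plain-Java sketch of the result contract described above. It is not the library's implementation: the class ThresholdSketch and its methods are hypothetical names, and the sketch simply computes the full distance and then compares it with the threshold, whereas a real thresholded implementation can stop early.

```java
public class ThresholdSketch {

    /** Classic dynamic-programming Levenshtein distance, keeping only two rows. */
    static int unboundedDistance(CharSequence left, CharSequence right) {
        int[] previous = new int[right.length() + 1];
        int[] current = new int[right.length() + 1];
        for (int j = 0; j <= right.length(); j++) {
            previous[j] = j; // turning an empty prefix into j characters takes j insertions
        }
        for (int i = 1; i <= left.length(); i++) {
            current[0] = i; // turning i characters into an empty prefix takes i deletions
            for (int j = 1; j <= right.length(); j++) {
                int cost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                current[j] = Math.min(
                        Math.min(current[j - 1] + 1, previous[j] + 1), // insertion, deletion
                        previous[j - 1] + cost);                       // match or substitution
            }
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[right.length()];
    }

    /**
     * Hypothetical helper mirroring the contract in the text:
     * a null threshold means "unlimited"; otherwise -1 signals
     * that the distance is larger than the threshold.
     */
    static int distance(CharSequence left, CharSequence right, Integer threshold) {
        int d = unboundedDistance(left, right);
        if (threshold == null) {
            return d;
        }
        return d <= threshold ? d : -1;
    }

    public static void main(String[] args) {
        System.out.println(distance("hyperspace", "cyberscape", null)); // 4
        System.out.println(distance("hyperspace", "cyberscape", 3));    // -1
        System.out.println(distance("hyperspace", "cyberscape", 4));    // 4
    }
}
```

The practical benefit of a threshold is that a caller who only cares about "close enough" matches does not need the exact distance for strings that are far apart.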

Syntax

The LevenshteinDistance() constructor of the LevenshteinDistance class is declared as:

public LevenshteinDistance(final Integer threshold)

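Parameters

The LevenshteinDistance() constructor takes the following parameter:

  • Integer threshold - If this is null, distance calculations will not be limited. It must not be negative.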
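The parameter rule above (null is allowed, negative values are not) can be sketched as a small validation step. This is an illustration only: ThresholdValidationSketch and checkThreshold are hypothetical names, and the use of IllegalArgumentException with this message is an assumption about how such a check is typically written, not a quotation of the library's source.

```java
public class ThresholdValidationSketch {

    /**
     * Hypothetical helper: returns the threshold unchanged when it is
     * acceptable (null or non-negative) and rejects negative values.
     */
    static Integer checkThreshold(Integer threshold) {
        if (threshold != null && threshold < 0) {
            // Assumed exception type and message, chosen for illustration.
            throw new IllegalArgumentException("Threshold must not be negative");
        }
        return threshold;
    }

    public static void main(String[] args) {
        System.out.println(checkThreshold(null)); // null means "no limit"
        System.out.println(checkThreshold(0));    // zero is a valid threshold
        try {
            checkThreshold(-1);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because the parameter is an Integer rather than an int, null can serve as the "no limit" marker without reserving a special numeric value.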

The following code demonstrates how to use the Apache Commons LevenshteinDistance constructor LevenshteinDistance(final Integer threshold).

Example 1

import org.apache.commons.text.*;
import org.apache.commons.text.diff.*;
import org.apache.commons.text.similarity.*;
import org.apache.commons.text.translate.*;

import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class ShowVisitor<Character> implements CommandVisitor<Character> {

    private int inserts = 0;
    private int keeps = 0;
    private int deletes = 0;

    public void visitInsertCommand(Character character) {
        ++inserts;
        System.out.println(String.format("insert %s", character));
    }

    public void visitKeepCommand(Character character) {
        ++keeps;
        System.out.println(String.format("keep   %s", character));
    }

    public void visitDeleteCommand(Character character) {
        ++deletes;
        System.out.println(String.format("delete %s", character));
    }

    public void printStats() {
        System.out.println(String.format("%d inserts, %d deletes, %d keeps", inserts, deletes, keeps));
    }
}

public class CommonsTextExamples {

    public static void main(String[] args) {
        caseUtilsExample();
        stringEscapeUtilsExample();
        stringSubstitutorExample();
        wordUtilsExample();
        diffExample();
        translateExample();
        similaritiesExample();
        sentenceSimilarityExample();
        distancesExample();
        sentenceDistanceExample();
    }

    private static void printExampleHeader(String example) {
        // Contains an example of TextStringBuilder
        String header = "Examples of " + example;
        System.out.println("\n" + header);
        TextStringBuilder builder = new TextStringBuilder();
        System.out.println(builder.appendPadding(header.length(), '-').toString());
    }

    public static void caseUtilsExample() {
        printExampleHeader("CaseUtils");
        String string = "java-programming-language";
        System.out.println(CaseUtils.toCamelCase(string, true, '-'));
        System.out.println(CaseUtils.toCamelCase(string, false, '-'));
    }

    public static void stringEscapeUtilsExample() {
        printExampleHeader("StringEscapeUtils");
        String string = "Department, R&D";
        System.out.println(StringEscapeUtils.escapeHtml4(string));
        System.out.println(StringEscapeUtils.escapeXml11(string));
        System.out.println(StringEscapeUtils.escapeCsv(string));
        System.out.println(StringEscapeUtils.builder(StringEscapeUtils.ESCAPE_HTML4)
                .append("R&D dept: ")
                .escape(string)
                .toString());
    }

    public static void stringSubstitutorExample() {
        printExampleHeader("StringSubstitutor");
        Map<String, String> substitutions = new HashMap<>();
        substitutions.put("city", "London");
        substitutions.put("country", "England");
        // With static method
        System.out.println(StringSubstitutor.replace("${city} is the capital of ${country}", substitutions));
        // With StringSubstitutor object
        StringSubstitutor sub = new StringSubstitutor(substitutions);
        System.out.println(sub.replace("${city} is the capital of ${country}"));
        StringSubstitutor interpolator = StringSubstitutor.createInterpolator();
        System.out.println(interpolator.replace("Base64 encoder: ${base64Encoder:Secret password}"));
    }

    public static void wordUtilsExample() {
        printExampleHeader("WordUtils");
        String longString = "This is a very long string, from https://www.example.org";
        String allLower = "all lower but ONE";
        String allCapitalized = "All Capitalized But ONE";

        System.out.println("\nWordUtils: Abbreviation");
        // Take at least 9 characters, cutting to 12 characters if no space is found before
        System.out.println(WordUtils.abbreviate(longString, 9, 12, " ..."));
        // Take at least 10 characters, cutting to 12 characters if no space is found before
        System.out.println(WordUtils.abbreviate(longString, 10, 12, " ..."));
        // Take at least 10 characters, then cut on the first space wherever it is
        System.out.println(WordUtils.abbreviate(longString, 10, -1, " ..."));

        System.out.println("\nWordUtils: Initials");
        System.out.println(WordUtils.initials(allLower));
        System.out.println(WordUtils.initials(allCapitalized));

        System.out.println("\nWordUtils: Case change");
        // Doesn't lowercase the uppercase characters
        System.out.println(WordUtils.capitalize(allLower));
        // Lowercases everything, then capitalizes the first letter of each word
        System.out.println(WordUtils.capitalizeFully(allLower));
        // Lowercases the first letter of each word
        System.out.println(WordUtils.uncapitalize(allCapitalized));
        // Swaps the case of each character
        System.out.println(WordUtils.swapCase(allLower));

        System.out.println("\nWordUtils: Wrapping");
        // Line length is 10, uses '\n' as a line break, does not break words longer than the line
        System.out.println(WordUtils.wrap(longString, 10, "\n", false) + "\n");
        // Line length is 10, uses '\n' as a line break, breaks words longer than the line
        System.out.println(WordUtils.wrap(longString, 10, "\n", true) + "\n");
        // Line length is 10, uses '\n' as a line break, breaks words longer than the line, also breaks on commas
        System.out.println(WordUtils.wrap(longString, 10, "\n", true, ",") + "\n");
    }

    public static void diffExample() {
        printExampleHeader("diff");
        String s1 = "hyperspace";
        String s2 = "cyberscape";
        StringsComparator comparator = new StringsComparator(s1, s2);
        EditScript<Character> script = comparator.getScript();
        System.out.println("Longest Common Subsequence length (number of \"keep\" commands): "
                + script.getLCSLength());
        System.out.println("Effective modifications (number of \"insert\" and \"delete\" commands): "
                + script.getModifications());
        ShowVisitor<Character> visitor = new ShowVisitor<>();
        script.visit(visitor);
        visitor.printStats();
    }

    public static void translateExample() {
        printExampleHeader("translate");
        Map<CharSequence, CharSequence> translation = new HashMap<>();
        translation.put("e", "3");
        translation.put("l", "1");
        translation.put("t", "7");
        String s1 = "Let it be!";
        LookupTranslator lookupTranslator = new LookupTranslator(translation);
        System.out.println(lookupTranslator.translate(s1));
        UnicodeEscaper unicodeEscaper = new UnicodeEscaper();
        UnicodeUnescaper unicodeUnescaper = new UnicodeUnescaper();
        String unicodeString = unicodeEscaper.translate(s1);
        System.out.println(unicodeString);
        System.out.println(unicodeUnescaper.translate(unicodeString));
    }

    public static void similaritiesExample() {
        printExampleHeader("similarities");
        String s1 = "hyperspace";
        String s2 = "cyberscape";
        JaccardSimilarity jaccard = new JaccardSimilarity();
        System.out.println("Jaccard similarity: " + jaccard.apply(s1, s2));
        JaroWinklerSimilarity jaroWinkler = new JaroWinklerSimilarity();
        System.out.println("Jaro-Winkler similarity: " + jaroWinkler.apply(s1, s2));
        LongestCommonSubsequence lcs = new LongestCommonSubsequence();
        System.out.println("Longest Common Subsequence similarity: " + lcs.apply(s1, s2));
        FuzzyScore fuzzyScore = new FuzzyScore(Locale.ENGLISH);
        System.out.println("Fuzzy score similarity: " + fuzzyScore.fuzzyScore(s1, s2));
        System.out.println("Fuzzy score similarity: " + fuzzyScore.fuzzyScore(s1, "space"));
    }

    public static void sentenceSimilarityExample() {
        printExampleHeader("sentence similarity");
        String s1 = "string similarity";
        String s2 = "string distance";
        Map<CharSequence, Integer> vector1 = new HashMap<>();
        Map<CharSequence, Integer> vector2 = new HashMap<>();
        for (String token : s1.split(" ")) {
            vector1.put(token, vector1.getOrDefault(token, 0) + 1);
        }
        for (String token : s2.split(" ")) {
            vector2.put(token, vector2.getOrDefault(token, 0) + 1);
        }
        CosineSimilarity cosine = new CosineSimilarity();
        System.out.println("Cosine similarity: " + cosine.cosineSimilarity(vector1, vector2));
        // Adding one repetition of "string" to vector2
        vector2.put("string", vector2.getOrDefault("string", 0) + 1);
        System.out.println("Cosine similarity: " + cosine.cosineSimilarity(vector1, vector2));
    }

    public static void distancesExample() {
        printExampleHeader("distances");
        String s1 = "hyperspace";
        String s2 = "cyberscape";
        HammingDistance hamming = new HammingDistance();
        // Requires the two strings to have the same length
        System.out.println("Hamming distance: " + hamming.apply(s1, s2));
        JaccardDistance jaccard = new JaccardDistance();
        System.out.println("Jaccard distance: " + jaccard.apply(s1, s2));
        JaroWinklerDistance jaroWinkler = new JaroWinklerDistance();
        // The result is wrong at the moment (see https://issues.apache.org/jira/browse/TEXT-104)
        System.out.println("Jaro-Winkler distance: " + jaroWinkler.apply(s1, s2));
        LongestCommonSubsequenceDistance lcs = new LongestCommonSubsequenceDistance();
        System.out.println("Longest Common Subsequence distance: " + lcs.apply(s1, s2));
        LevenshteinDistance levenshtein = new LevenshteinDistance();
        System.out.println("Levenshtein distance: " + levenshtein.apply(s1, s2));
        LevenshteinDistance levenshteinWithThreshold = new LevenshteinDistance(3);
        // Returns -1 since the actual distance, 4, is higher than the threshold
        System.out.println("Levenshtein distance: " + levenshteinWithThreshold.apply(s1, s2));
        LevenshteinDetailedDistance levenshteinDetailed = new LevenshteinDetailedDistance();
        System.out.println("Levenshtein detailed distance: " + levenshteinDetailed.apply(s1, s2));
    }

    public static void sentenceDistanceExample() {
        printExampleHeader("sentence distance");
        String s1 = "string similarity";
        String s2 = "string distance";
        CosineDistance cosine = new CosineDistance();
        System.out.println("Cosine distance: " + cosine.apply(s1, s2));
        System.out.println("Cosine distance: " + cosine.apply(s1, s2 + " string"));
    }
}