Apache Commons LevenshteinDistance: the Levenshtein (similarity) algorithm
admin
2024-01-27 22:58:33

Apache Commons LevenshteinDistance: LevenshteinDistance(final Integer threshold). If the threshold is not null, distance calculations are limited to a maximum length.

Introduction

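If the threshold is not null, distance calculations are limited to a maximum length.

If the threshold is null, the unlimited version of the algorithm is used.

LevenshteinDistance() is a constructor.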
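
The threshold behaviour described above can be illustrated with a small, dependency-free sketch. The class ThresholdSketch and its distance helper are hypothetical names introduced here for illustration; they are not part of Apache Commons Text, and the library may compute the bounded case more efficiently. The sketch only mirrors the observable result: a null threshold means no limit, and a distance above a non-null threshold is reported as -1.

```java
// Hypothetical illustration; NOT the Apache Commons Text implementation.
final class ThresholdSketch {

    // Returns the Levenshtein distance between a and b, or -1 when a
    // non-null threshold is given and the distance exceeds it.
    static int distance(CharSequence a, CharSequence b, Integer threshold) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Swap the rows so that prev always holds the latest row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }

        int result = prev[b.length()];
        // A null threshold means "unlimited"; otherwise cap the result.
        if (threshold != null && result > threshold) {
            return -1;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(distance("hyperspace", "cyberscape", null)); // prints 4
        System.out.println(distance("hyperspace", "cyberscape", 3));    // prints -1
        System.out.println(distance("hyperspace", "cyberscape", 4));    // prints 4
    }
}
```

The three calls in main show the same pair of strings evaluated without a limit, with a limit below the actual distance, and with a limit equal to it.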

Syntax

The LevenshteinDistance() constructor of the LevenshteinDistance class is declared as:

public LevenshteinDistance(final Integer threshold)

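Parameters

The LevenshteinDistance() constructor has the following parameter:

  • Integer threshold - if this value is null, distance calculations are not limited. It must not be negative.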
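
The parameter contract above (null allowed, negative values not allowed) can be sketched as a constructor check. BoundedDistance is a hypothetical stand-in class, not the library type, and the choice of IllegalArgumentException for a negative threshold is an assumption about how such a constraint is typically enforced.

```java
// Hypothetical stand-in for a class whose constructor takes an optional,
// non-negative threshold; NOT the Apache Commons Text class itself.
final class BoundedDistance {

    // null means "no limit"; otherwise a non-negative maximum distance.
    private final Integer threshold;

    BoundedDistance(final Integer threshold) {
        // Assumption: a negative threshold is rejected with IllegalArgumentException.
        if (threshold != null && threshold < 0) {
            throw new IllegalArgumentException("Threshold must not be negative");
        }
        this.threshold = threshold;
    }

    Integer getThreshold() {
        return threshold;
    }

    boolean isUnlimited() {
        return threshold == null;
    }

    public static void main(String[] args) {
        System.out.println(new BoundedDistance(null).isUnlimited()); // prints true
        System.out.println(new BoundedDistance(3).getThreshold());   // prints 3
        try {
            new BoundedDistance(-1);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```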

The following code shows how to use the Apache Commons LevenshteinDistance constructor LevenshteinDistance(final Integer threshold).

Example 1

import org.apache.commons.text.*;
import org.apache.commons.text.diff.*;
import org.apache.commons.text.similarity.*;
import org.apache.commons.text.translate.*;

import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class ShowVisitor<Character> implements CommandVisitor<Character> {
    private int inserts = 0;
    private int keeps = 0;
    private int deletes = 0;

    public void visitInsertCommand(Character character) {
        ++inserts;
        System.out.println(String.format("insert %s", character));
    }

    public void visitKeepCommand(Character character) {
        ++keeps;
        System.out.println(String.format("keep   %s", character));
    }

    public void visitDeleteCommand(Character character) {
        ++deletes;
        System.out.println(String.format("delete %s", character));
    }

    public void printStats() {
        System.out.println(String.format("%d inserts, %d deletes, %d keeps", inserts, deletes, keeps));
    }
}

public class CommonsTextExamples {
    public static void main(String[] args) {
        caseUtilsExample();
        stringEscapeUtilsExample();
        stringSubstitutorExample();
        wordUtilsExample();
        diffExample();
        translateExample();
        similaritiesExample();
        sentenceSimilarityExample();
        distancesExample();
        sentenceDistanceExample();
    }

    private static void printExampleHeader(String example) {
        // Contains an example of TextStringBuilder
        String header = "Examples of " + example;
        System.out.println("\n" + header);
        TextStringBuilder builder = new TextStringBuilder();
        System.out.println(builder.appendPadding(header.length(), '-').toString());
    }

    public static void caseUtilsExample() {
        printExampleHeader("CaseUtils");
        String string = "java-programming-language";
        System.out.println(CaseUtils.toCamelCase(string, true, '-'));
        System.out.println(CaseUtils.toCamelCase(string, false, '-'));
    }

    public static void stringEscapeUtilsExample() {
        printExampleHeader("StringEscapeUtils");
        String string = "Department, R&D";
        System.out.println(StringEscapeUtils.escapeHtml4(string));
        System.out.println(StringEscapeUtils.escapeXml11(string));
        System.out.println(StringEscapeUtils.escapeCsv(string));
        System.out.println(StringEscapeUtils.builder(StringEscapeUtils.ESCAPE_HTML4)
                .append("R&D dept: ")
                .escape(string)
                .toString());
    }

    public static void stringSubstitutorExample() {
        printExampleHeader("StringSubstitutor");
        Map<String, String> substitutions = new HashMap<>();
        substitutions.put("city", "London");
        substitutions.put("country", "England");

        // With static method
        System.out.println(StringSubstitutor.replace("${city} is the capital of ${country}", substitutions));

        // With StringSubstitutor object
        StringSubstitutor sub = new StringSubstitutor(substitutions);
        System.out.println(sub.replace("${city} is the capital of ${country}"));

        StringSubstitutor interpolator = StringSubstitutor.createInterpolator();
        System.out.println(interpolator.replace("Base64 encoder: ${base64Encoder:Secret password}"));
    }

    public static void wordUtilsExample() {
        printExampleHeader("WordUtils");
        String longString = "This is a very long string, from https://www.example.org";
        String allLower = "all lower but ONE";
        String allCapitalized = "All Capitalized But ONE";

        System.out.println("\nWordUtils: Abbreviation");
        // Take at least 9 characters, cutting to 12 characters if no space is found before
        System.out.println(WordUtils.abbreviate(longString, 9, 12, " ..."));
        // Take at least 10 characters, cutting to 12 characters if no space is found before
        System.out.println(WordUtils.abbreviate(longString, 10, 12, " ..."));
        // Take at least 10 characters, then cut on the first space wherever it is
        System.out.println(WordUtils.abbreviate(longString, 10, -1, " ..."));

        System.out.println("\nWordUtils: Initials");
        System.out.println(WordUtils.initials(allLower));
        System.out.println(WordUtils.initials(allCapitalized));

        System.out.println("\nWordUtils: Case change");
        // Doesn't lowercase the uppercase characters
        System.out.println(WordUtils.capitalize(allLower));
        // Lowercases everything, then capitalizes the first letter of each word
        System.out.println(WordUtils.capitalizeFully(allLower));
        // Lowercases the first letter of each word
        System.out.println(WordUtils.uncapitalize(allCapitalized));
        // Swaps the case of each character
        System.out.println(WordUtils.swapCase(allLower));

        System.out.println("\nWordUtils: Wrapping");
        // Line length is 10, uses '\n' as a line break, does not break words longer than the line
        System.out.println(WordUtils.wrap(longString, 10, "\n", false) + "\n");
        // Line length is 10, uses '\n' as a line break, breaks words longer than the line
        System.out.println(WordUtils.wrap(longString, 10, "\n", true) + "\n");
        // Line length is 10, uses '\n' as a line break, breaks words longer than the line, also breaks on commas
        System.out.println(WordUtils.wrap(longString, 10, "\n", true, ",") + "\n");
    }

    public static void diffExample() {
        printExampleHeader("diff");
        String s1 = "hyperspace";
        String s2 = "cyberscape";
        StringsComparator comparator = new StringsComparator(s1, s2);
        EditScript<Character> script = comparator.getScript();
        System.out.println("Longest Common Subsequence length (number of \"keep\" commands): "
                + script.getLCSLength());
        System.out.println("Effective modifications (number of \"insert\" and \"delete\" commands): "
                + script.getModifications());
        ShowVisitor<Character> visitor = new ShowVisitor<>();
        script.visit(visitor);
        visitor.printStats();
    }

    public static void translateExample() {
        printExampleHeader("translate");
        Map<CharSequence, CharSequence> translation = new HashMap<>();
        translation.put("e", "3");
        translation.put("l", "1");
        translation.put("t", "7");
        String s1 = "Let it be!";
        LookupTranslator lookupTranslator = new LookupTranslator(translation);
        System.out.println(lookupTranslator.translate(s1));

        UnicodeEscaper unicodeEscaper = new UnicodeEscaper();
        UnicodeUnescaper unicodeUnescaper = new UnicodeUnescaper();
        String unicodeString = unicodeEscaper.translate(s1);
        System.out.println(unicodeString);
        System.out.println(unicodeUnescaper.translate(unicodeString));
    }

    public static void similaritiesExample() {
        printExampleHeader("similarities");
        String s1 = "hyperspace";
        String s2 = "cyberscape";

        JaccardSimilarity jaccard = new JaccardSimilarity();
        System.out.println("Jaccard similarity: " + jaccard.apply(s1, s2));

        JaroWinklerSimilarity jaroWinkler = new JaroWinklerSimilarity();
        System.out.println("Jaro-Winkler similarity: " + jaroWinkler.apply(s1, s2));

        LongestCommonSubsequence lcs = new LongestCommonSubsequence();
        System.out.println("Longest Common Subsequence similarity: " + lcs.apply(s1, s2));

        FuzzyScore fuzzyScore = new FuzzyScore(Locale.ENGLISH);
        System.out.println("Fuzzy score similarity: " + fuzzyScore.fuzzyScore(s1, s2));
        System.out.println("Fuzzy score similarity: " + fuzzyScore.fuzzyScore(s1, "space"));
    }

    public static void sentenceSimilarityExample() {
        printExampleHeader("sentence similarity");
        String s1 = "string similarity";
        String s2 = "string distance";
        Map<CharSequence, Integer> vector1 = new HashMap<>();
        Map<CharSequence, Integer> vector2 = new HashMap<>();
        for (String token : s1.split(" ")) {
            vector1.put(token, vector1.getOrDefault(token, 0) + 1);
        }
        for (String token : s2.split(" ")) {
            vector2.put(token, vector2.getOrDefault(token, 0) + 1);
        }

        CosineSimilarity cosine = new CosineSimilarity();
        System.out.println("Cosine similarity: " + cosine.cosineSimilarity(vector1, vector2));

        // Adding one repetition of "string" to vector2
        vector2.put("string", vector2.getOrDefault("string", 0) + 1);
        System.out.println("Cosine similarity: " + cosine.cosineSimilarity(vector1, vector2));
    }

    public static void distancesExample() {
        printExampleHeader("distances");
        String s1 = "hyperspace";
        String s2 = "cyberscape";

        HammingDistance hamming = new HammingDistance();
        // Requires the two strings to have the same length
        System.out.println("Hamming distance: " + hamming.apply(s1, s2));

        JaccardDistance jaccard = new JaccardDistance();
        System.out.println("Jaccard distance: " + jaccard.apply(s1, s2));

        JaroWinklerDistance jaroWinkler = new JaroWinklerDistance();
        // The result is wrong at the moment (see https://issues.apache.org/jira/browse/TEXT-104)
        System.out.println("Jaro-Winkler distance: " + jaroWinkler.apply(s1, s2));

        LongestCommonSubsequenceDistance lcs = new LongestCommonSubsequenceDistance();
        System.out.println("Longest Common Subsequence distance: " + lcs.apply(s1, s2));

        LevenshteinDistance levenshtein = new LevenshteinDistance();
        System.out.println("Levenshtein distance: " + levenshtein.apply(s1, s2));

        LevenshteinDistance levenshteinWithThreshold = new LevenshteinDistance(3);
        // Returns -1 since the actual distance, 4, is higher than the threshold
        System.out.println("Levenshtein distance: " + levenshteinWithThreshold.apply(s1, s2));

        LevenshteinDetailedDistance levenshteinDetailed = new LevenshteinDetailedDistance();
        System.out.println("Levenshtein detailed distance: " + levenshteinDetailed.apply(s1, s2));
    }

    public static void sentenceDistanceExample() {
        printExampleHeader("sentence distance");
        String s1 = "string similarity";
        String s2 = "string distance";
        CosineDistance cosine = new CosineDistance();
        System.out.println("Cosine distance: " + cosine.apply(s1, s2));
        System.out.println("Cosine distance: " + cosine.apply(s1, s2 + " string"));
    }
}
