[emf-dev] EMF Compare Name Similarity

Hi,

I am currently reverse engineering EMF Compare and have already read a lot of material on it. I think I have found some inconsistencies in that material and would like to ask whether I understand things correctly.

These are the statements in question:
a) According to [1], EMF Compare uses the Levenshtein distance for string similarity.
b) According to [3], EMF Compare 1.3 is similar to [4]. In [4], the Dice coefficient (although it is not named explicitly) is used for string similarity.


After a code review of [2] and [5], I came to the following conclusions:
I) EMF Compare 1.x and 2.x use the Dice coefficient with bi-grams for string similarity.
II) EMF Compare 2.x uses the Longest Common Subsequence to determine changes in multi-valued references of EObjects.
III) Statement a) is wrong or outdated.
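
To make I) concrete, this is how I understand the Dice coefficient over bi-grams: twice the number of shared character pairs, divided by the total number of character pairs in both strings. The following is only a minimal sketch of that measure and not the code from [2]. The class and method names are hypothetical, and I assume a case-sensitive comparison in which repeated bi-grams are counted as often as they occur.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a bi-gram Dice coefficient; not taken from EMF Compare.
final class DiceBigramSketch {
    private DiceBigramSketch() {
    }

    // Splits a string into its adjacent character pairs, e.g. "abc" -> [ab, bc].
    static List<String> bigrams(String s) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 1 < s.length(); i++) {
            result.add(s.substring(i, i + 2));
        }
        return result;
    }

    // Returns 2 * |shared pairs| / (|pairs of a| + |pairs of b|), in [0, 1].
    static double similarity(String a, String b) {
        if (a.equals(b)) {
            return 1.0; // also covers two empty or two identical one-character strings
        }
        List<String> pairsA = bigrams(a);
        List<String> pairsB = bigrams(b);
        int total = pairsA.size() + pairsB.size();
        if (total == 0) {
            return 0.0; // assumption: different strings without any bi-gram count as dissimilar
        }
        int shared = 0;
        for (String pair : pairsA) {
            // Removing the matched pair ensures each occurrence is matched only once.
            if (pairsB.remove(pair)) {
                shared++;
            }
        }
        return 2.0 * shared / total;
    }
}
```

For example, "night" and "nacht" have four bi-grams each and share only "ht", so the sketch yields 2 * 1 / 8 = 0.25. A Levenshtein-based measure as in a) would instead count the single-character edits needed to turn one string into the other, which is a different calculation.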

I would appreciate it if someone could confirm my conclusions.




References:

[1] http://eclipsesummit.org/summiteurope2006/presentations/ESE2006-EclipseModelingSymposium10_EMFCompareUtility.pdf

[2] http://git.eclipse.org/c/emfcompare/org.eclipse.emf.compare.git/tree/plugins/org.eclipse.emf.compare.match/src/org/eclipse/emf/compare/match/internal/statistic/NameSimilarity.java?h=1.3

[3] http://wiki.eclipse.org/EMF_Compare/FAQ/1.3#What_kind_of_.22strategies.22_use_EMF_compare_.3F

[4] http://ase.cs.uni-due.de/olbib/p54-xing-241.pdf

[5] http://git.eclipse.org/c/emfcompare/org.eclipse.emf.compare.git/tree/plugins/org.eclipse.emf.compare/src/org/eclipse/emf/compare/utils/DiffUtil.java?h=2.1

