Re: [emf-dev] EMF Compare Name Similarity


From: Cédric Brun <cedric.brun@xxxxxxx>
Date: Tue, 09 Jul 2013 09:03:35 +0200

Hi,

I'll start with a bit of history.

The original submission of EMF Compare was in fact made of two different products, one from Intalio and another from Obeo. Both had a comparison engine and a UI; the one from Obeo was more advanced regarding the engine, whereas the one from Intalio was more advanced from a UI perspective. [1] refers to the Intalio product (it was actually published before EMF Compare was created). The presentation of this work during the Modeling Symposium in 2006 led to the creation of the project in 2007. At that time it was decided to keep Obeo's engine (which relied on the Dice coefficient) and Intalio's UI. The Levenshtein distance was used by Intalio's engine.

EMF Compare 1.3 is indeed similar to [4] and leverages the Dice coefficient quite a lot. The matching strategy is quite different in EMF Compare 2.x but still uses the Dice coefficient.



In a nutshell:

> I) EMF Compare 1.x and 2.x use the Dice coefficient with bi-grams for string similarity

That's right
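
For readers unfamiliar with the measure, here is a minimal sketch of a Dice coefficient over character bi-grams, i.e. 2 * |X ∩ Y| / (|X| + |Y|) where X and Y are the bi-gram sets of the two strings. It is not the EMF Compare code: the class and method names are hypothetical, it assumes sets of distinct bi-grams (duplicates are not counted), and the handling of strings shorter than two characters is an arbitrary choice made for the example.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration, not the EMF Compare implementation.
class DiceSketch {

    /** Collects the distinct two-character substrings (bi-grams) of s. */
    static Set<String> bigrams(String s) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + 1 < s.length(); i++) {
            result.add(s.substring(i, i + 2));
        }
        return result;
    }

    /** Dice coefficient in [0, 1]: 2 * |X ∩ Y| / (|X| + |Y|). */
    static double dice(String a, String b) {
        Set<String> x = bigrams(a);
        Set<String> y = bigrams(b);
        if (x.isEmpty() && y.isEmpty()) {
            // Assumption: with no bi-grams on either side, fall back to equality.
            return a.equals(b) ? 1.0 : 0.0;
        }
        Set<String> common = new HashSet<>(x);
        common.retainAll(y); // set intersection
        return 2.0 * common.size() / (x.size() + y.size());
    }

    public static void main(String[] args) {
        // "night" and "nacht" have four bi-grams each and share only "ht".
        System.out.println(dice("night", "nacht"));
    }
}
```

With these inputs the shared bi-gram is "ht", so the score is 2 * 1 / (4 + 4).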

> II) EMF Compare 2.x uses the Longest Common Subsequence to determine changes in multi-references of EObjects

That's right, and it's used for multi-valued attributes too.
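
To illustrate the idea, here is a minimal sketch of a textbook Longest Common Subsequence computation on two lists, using dynamic programming. It is not the code from DiffUtil: the class and method names are hypothetical, elements are compared with equals() and assumed non-null, and how the result is turned into additions, deletions, or moves is left out. Elements that are not part of the LCS are the candidates for such changes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not the EMF Compare implementation.
class LcsSketch {

    /** Returns one longest common subsequence of a and b (elements must be non-null). */
    static <T> List<T> lcs(List<T> a, List<T> b) {
        // len[i][j] = length of the LCS of a[i..] and b[j..]
        int[][] len = new int[a.size() + 1][b.size() + 1];
        for (int i = a.size() - 1; i >= 0; i--) {
            for (int j = b.size() - 1; j >= 0; j--) {
                if (a.get(i).equals(b.get(j))) {
                    len[i][j] = len[i + 1][j + 1] + 1;
                } else {
                    len[i][j] = Math.max(len[i + 1][j], len[i][j + 1]);
                }
            }
        }
        // Walk the table from the start to rebuild one LCS.
        List<T> result = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            if (a.get(i).equals(b.get(j))) {
                result.add(a.get(i));
                i++;
                j++;
            } else if (len[i + 1][j] >= len[i][j + 1]) {
                i++;
            } else {
                j++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> left = Arrays.asList("A", "B", "C", "D");
        List<String> right = Arrays.asList("A", "C", "B", "D");
        // "B" falls outside the computed LCS on both sides.
        System.out.println(lcs(left, right));
    }
}
```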

> III) a) is wrong/outdated.

It refers to EMF Compare 1.3 (see the URL) and as such is neither wrong nor outdated, but there is no complete description of the 2.x algorithm on the wiki.



On 05/07/2013 14:53, Simon wrote:

Hi,
at the moment I am reverse engineering EMF Compare and I've already read a lot of material. I think I found some inconsistencies in the material and want to ask whether I understand things correctly.
These are the statements in question:
a) According to [1] EMF Compare uses Levenshtein distance for string similarity.
b) According to [3] EMF Compare 1.3 is similar to [4]. In [4] the Dice coefficient (although it is not named explicitly) is used for string similarity.
After a code review of [2] and [5], I came to the following conclusions:
I) EMF Compare 1.x and 2.x use the Dice coefficient with bi-grams for string similarity
II) EMF Compare 2.x uses the Longest Common Subsequence to determine changes in multi-references of EObjects
III) a) is wrong/outdated.

I would appreciate it if someone could confirm my conclusions.




References:
[1] http://eclipsesummit.org/summiteurope2006/presentations/ESE2006-EclipseModelingSymposium10_EMFCompareUtility.pdf
[2] http://git.eclipse.org/c/emfcompare/org.eclipse.emf.compare.git/tree/plugins/org.eclipse.emf.compare.match/src/org/eclipse/emf/compare/match/internal/statistic/NameSimilarity.java?h=1.3
[3] http://wiki.eclipse.org/EMF_Compare/FAQ/1.3#What_kind_of_.22strategies.22_use_EMF_compare_.3F
[4] http://ase.cs.uni-due.de/olbib/p54-xing-241.pdf
[5] http://git.eclipse.org/c/emfcompare/org.eclipse.emf.compare.git/tree/plugins/org.eclipse.emf.compare/src/org/eclipse/emf/compare/utils/DiffUtil.java?h=2.1