Re: [mylyn-integrators] [WikiText] Is it possible to extract attributes using WikiText
Luven,
The good news: the WikiText parser is used behind the scenes in the WikiText editor, so yes, it can give you the offsets you need to locate things in the source markup. The bad news: this functionality was retrofitted into the WikiText parser, and its API is probably not the most intuitive.
You're on the right track. org.eclipse.mylyn.wikitext.core.parser.Locator is indeed where you get the offsets you need. The WikiText markup editor uses a DocumentBuilder to build a model of the document with exact offsets. A good place to start looking is org.eclipse.mylyn.internal.wikitext.ui.editor.syntax.FastMarkupPartitioner.PartitionBuilder.
If you think there are bugs in the implementation (including documentation bugs or missing documentation), please file a bug at bugs.eclipse.org under Tools/Mylyn/WikiText. If you can attach JUnit tests that exercise or demonstrate the bug, even better.
To satisfy my curiosity, perhaps you could tell me more about your use case: Why do you want to know these offsets?
Regards,
David
On Wed, Jul 15, 2009 at 8:24 AM, siluven <siluven@xxxxxxxx> wrote:
Hello,
Now I would like to get the begin and end offsets of each link in the WikiText markup, and later perhaps of other components (images).
Is it possible to do this using Mylyn WikiText?
I tried using getLocator().getDocumentOffset() in the methods
void org.eclipse.mylyn.wikitext.core.parser.builder.NoOpDocumentBuilder.link(Attributes attributes, String hrefOrHashName, String text)
and
void org.eclipse.mylyn.wikitext.core.parser.builder.NoOpDocumentBuilder.characters(String text)
but I found that the document offset is not updated after link(Attributes attributes, String hrefOrHashName, String text) is called.
The offset is also sometimes decremented. As far as I know, it should only ever be incremented.
I tested with the following simple MediaWiki markup:
The '''EditorX''' is an [[text editor|editor]] of small to medium-sized [[text]].
This is a [[test]] too ('''yes''').
The result:
Offset | Component type   | Value                        | Comment
-1     | DOCUMENT_BEGIN   | []                           |
0      | BLOCK_BEGIN      | [PARAGRAPH]                  |
0      | CHARACTERS_GROUP | [The ]                       |
4      | SPAN_BEGIN       | [BOLD]                       |
7      | CHARACTERS_GROUP | [EditorX]                    |
7      | SPAN_END         | [BOLD]                       |
17     | CHARACTERS_GROUP | [ is an ]                    |
7      | LINK             | [editor]                     | The offset has somehow been decremented (bug?)
7      | CHARACTERS_GROUP | [ of small to medium-sized ] | From here on, all offsets are incorrect
55     | LINK             | [text]                       |
55     | CHARACTERS_GROUP | [.]                          |
83     | CHARACTERS_GROUP | []                           | The position of the new line is correct now
83     | CHARACTERS_GROUP | [This is a ]                 |
93     | LINK             | [test]                       |
93     | CHARACTERS_GROUP | [ too (]                     | 93 is the offset of the link
107    | SPAN_BEGIN       | [BOLD]                       | Correct again from here
110    | CHARACTERS_GROUP | [yes]                        |
110    | SPAN_END         | [BOLD]                       |
116    | CHARACTERS_GROUP | [).]                         |
118    | BLOCK_END        | [PARAGRAPH]                  |
118    | DOCUMENT_END     | []                           |
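A quick way to spot the problem in a trace like this is to check that the reported offsets never decrease. The following is a minimal, stand-alone Java sketch that runs that check over the offsets copied from the table; the class OffsetMonotonicityCheck and its method findDecreases are hypothetical and not part of WikiText. Note that it only catches decreases: wrong values that do not go backwards, such as the 7 reported for the CHARACTERS_GROUP right after the link or the 55 reported for [text], do not trip it.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetMonotonicityCheck {

    // Returns every index i at which offsets[i] is smaller than offsets[i - 1].
    static List<Integer> findDecreases(int[] offsets) {
        List<Integer> decreases = new ArrayList<>();
        for (int i = 1; i < offsets.length; i++) {
            if (offsets[i] < offsets[i - 1]) {
                decreases.add(i);
            }
        }
        return decreases;
    }

    public static void main(String[] args) {
        // Event labels and offsets copied row by row from the table above.
        String[] events = {
            "DOCUMENT_BEGIN", "BLOCK_BEGIN [PARAGRAPH]", "CHARACTERS_GROUP [The ]",
            "SPAN_BEGIN [BOLD]", "CHARACTERS_GROUP [EditorX]", "SPAN_END [BOLD]",
            "CHARACTERS_GROUP [ is an ]", "LINK [editor]",
            "CHARACTERS_GROUP [ of small to medium-sized ]", "LINK [text]",
            "CHARACTERS_GROUP [.]", "CHARACTERS_GROUP []", "CHARACTERS_GROUP [This is a ]",
            "LINK [test]", "CHARACTERS_GROUP [ too (]", "SPAN_BEGIN [BOLD]",
            "CHARACTERS_GROUP [yes]", "SPAN_END [BOLD]", "CHARACTERS_GROUP [).]",
            "BLOCK_END [PARAGRAPH]", "DOCUMENT_END"
        };
        int[] offsets = {
            -1, 0, 0, 4, 7, 7, 17, 7, 7, 55, 55, 83, 83, 93, 93, 107, 110, 110, 116, 118, 118
        };
        // Print each event whose offset went backwards relative to the previous event.
        for (int i : findDecreases(offsets)) {
            System.out.println("Offset decreased at " + events[i]
                    + ": " + offsets[i - 1] + " -> " + offsets[i]);
        }
    }
}
```

Run against these offsets it prints a single line, "Offset decreased at LINK [editor]: 17 -> 7", which matches the first comment in the table.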
For headings I am currently using
getLocator().getLineDocumentOffset() for the begin offset and
getLocator().getLineDocumentOffset() + getLocator().getLineLength() for the end offset,
and that works so far.
It does not work for links, because there can be several links on the same line.
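If all that is needed are the source offsets of MediaWiki internal links, one workaround that sidesteps the Locator entirely is to scan the markup text directly. The sketch below is plain Java and is not part of the WikiText API; the names LinkOffsetScanner, LinkSpan and findLinks are hypothetical, and the regular expression is a deliberate simplification of MediaWiki link syntax (it assumes no nested brackets, templates or nowiki sections). Because it reports a begin and end offset per match, several links on one line are not a problem. It assumes Windows (CRLF) line endings in the test text, which is what the offset 83 for the start of the second line in the table implies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkOffsetScanner {

    // Hypothetical value type: begin (inclusive) and end (exclusive) document offsets of one link.
    static final class LinkSpan {
        final int begin;
        final int end;
        final String markup;

        LinkSpan(int begin, int end, String markup) {
            this.begin = begin;
            this.end = end;
            this.markup = markup;
        }

        @Override
        public String toString() {
            return begin + "-" + end + " " + markup;
        }
    }

    // Simplified internal-link syntax: "[[", one or more characters other than "]", then "]]".
    private static final Pattern INTERNAL_LINK = Pattern.compile("\\[\\[[^\\]]+\\]\\]");

    // Returns every internal link in the markup together with its document offsets.
    static List<LinkSpan> findLinks(String markup) {
        List<LinkSpan> links = new ArrayList<>();
        Matcher m = INTERNAL_LINK.matcher(markup);
        while (m.find()) {
            links.add(new LinkSpan(m.start(), m.end(), m.group()));
        }
        return links;
    }

    public static void main(String[] args) {
        // The test text from this thread, assuming CRLF line endings.
        String markup = "The '''EditorX''' is an [[text editor|editor]] of small to medium-sized [[text]].\r\n"
                + "This is a [[test]] too ('''yes''').";
        for (LinkSpan link : findLinks(markup)) {
            System.out.println(link);
        }
    }
}
```

On the test text this prints "24-46 [[text editor|editor]]", "72-80 [[text]]" and "93-101 [[test]]"; the last one agrees with the offset 93 that the parser reported for [test].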
Best regards,
Luven
On 6/25/2009 6:59 PM, siluven wrote:
Thank you David,
that is the functionality I need. I've also tried it with headings and images, and it works for those too.
Best regards,
Luven
On 6/24/2009 6:47 PM, David Green wrote:
You want to do something like this:
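import java.io.Reader;
import java.util.HashSet;
import java.util.Set;
import org.eclipse.mylyn.wikitext.core.parser.Attributes;
import org.eclipse.mylyn.wikitext.core.parser.MarkupParser;
import org.eclipse.mylyn.wikitext.core.parser.builder.NoOpDocumentBuilder;
import org.eclipse.mylyn.wikitext.core.util.ServiceLocator;

// Collects the targets of all links and image links encountered while parsing.
public class ExtractHyperlinksBuilder extends NoOpDocumentBuilder {
    private final Set<String> hyperlinks = new HashSet<String>();

    @Override
    public void link(Attributes attributes, String hrefOrHashName, String text) {
        hyperlinks.add(hrefOrHashName);
    }

    @Override
    public void imageLink(Attributes linkAttributes, Attributes imageAttributes, String href, String imageUrl) {
        hyperlinks.add(href);
    }

    public Set<String> getHyperlinks() {
        return hyperlinks;
    }
}

// In a method that declares "throws IOException":
MarkupParser parser = new MarkupParser(ServiceLocator.getInstance().getMarkupLanguage("MediaWiki"));
ExtractHyperlinksBuilder builder = new ExtractHyperlinksBuilder();
parser.setBuilder(builder);

Reader markupContent = null; // open a reader on the markup here
try {
    parser.parse(markupContent);
} finally {
    // Guard against a reader that was never opened.
    if (markupContent != null) {
        markupContent.close();
    }
}
// do something with builder.getHyperlinks()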
Regards,
David
On Wed, Jun 24, 2009 at 4:13 AM, siluven <siluven@xxxxxxxx> wrote:
Hello everyone,
I am a new Mylyn user. I'm planning to work with wiki articles using Java and Eclipse.
Is it possible to use WikiText to extract attributes such as hyperlinks, headings and images directly from, e.g., MediaWiki markup, with something like:
- obj.getHyperlinks();
If it is possible, or if there are existing solutions for this, how could it be done?
Thank you and best regards
Luven
_______________________________________________
mylyn-integrators mailing list
mylyn-integrators@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/mylyn-integrators