Summary: | MediaWikiApiImageFetchingStrategy.java does not fetch all images | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | z_Archived | Reporter: | Carsten Hammer <carsten.hammer> | ||||||
Component: | Mylyn | Assignee: | Carsten Hammer <carsten.hammer> | ||||||
Status: | RESOLVED FIXED | QA Contact: | |||||||
Severity: | normal | ||||||||
Priority: | P3 | CC: | greensopinion, john.arthorne, steffen.pingel | ||||||
Version: | 1.7 | Keywords: | contributed | ||||||
Target Milestone: | 2.0 | ||||||||
Hardware: | All | ||||||||
OS: | All | ||||||||
Whiteboard: | |||||||||
Bug Depends on: | 418964 | ||||||||
Bug Blocks: | |||||||||
Attachments: |
|
Description
Carsten Hammer
2012-06-25 03:50:03 EDT
Trying to provide a patch file by dragging from git history does not work as expected. I generates a file too big to be attached for these few lines. Either the eclipse git support is buggy or I do something the wrong way. However commiting to gerrit does not work anyway. So here the diff as shown in eclipse git history: commit e2c6cca966e6d756ec0fc7f75ef76a046fe80e2c Author: chammera1a <carsten.hammer@oce.com> 2012-06-27 17:02:51 Committer: chammera1a <carsten.hammer@oce.com> 2012-06-27 17:37:34 Parent: 58b45e9c4b8bc1e12f986f7bb8887ae8fb2f5af8 (bug 348237: wikitext doesn't handle image file names properly https://bugs.eclipse.org/bugs/show_bug.cgi?id=348237) Branches: master Change-Id: I56e5f418b8c746f1ddba362142d1d6e34588c4e6 org.eclipse.mylyn.wikitext.mediawiki.core/src/org/eclipse/mylyn/internal/wikitext/mediawiki/core/tasks/MediaWikiApiImageFetchingStrategy.java diff --git a/org.eclipse.mylyn.wikitext.mediawiki.core/src/org/eclipse/mylyn/internal/wikitext/mediawiki/core/tasks/MediaWikiApiImageFetchingStrategy.java b/org.eclipse.mylyn.wikitext.mediawiki.core/src/org/eclipse/mylyn/internal/wikitext/mediawiki/core/tasks/MediaWikiApiImageFetchingStrategy.java index 2231f57..3c32ca5 100644 --- a/org.eclipse.mylyn.wikitext.mediawiki.core/src/org/eclipse/mylyn/internal/wikitext/mediawiki/core/tasks/MediaWikiApiImageFetchingStrategy.java +++ b/org.eclipse.mylyn.wikitext.mediawiki.core/src/org/eclipse/mylyn/internal/wikitext/mediawiki/core/tasks/MediaWikiApiImageFetchingStrategy.java @@ -56,7 +56,7 @@ throw new BuildException("please specify @pageName"); //$NON-NLS-1$ } if (!pageName.equals(pageName.trim())) { - throw new BuildException("@pageName must not have leading or training whitespace"); //$NON-NLS-1$ + throw new BuildException("@pageName must not have leading or trailing whitespace"); //$NON-NLS-1$ } String base; @@ -69,90 +69,97 @@ base += "/"; //$NON-NLS-1$ } - URL apiUrl; - try { - String queryString = String.format( - "action=query&titles=%s&generator=images&prop=imageinfo&iiprop=url&format=xml", URLEncoder.encode(pageName, "UTF-8")); //$NON-NLS-1$ //$NON-NLS-2$ - apiUrl = new URL(base + "api.php?" + queryString); //$NON-NLS-1$ - } catch (Exception e) { - throw new BuildException("Cannot compose API URL", e); //$NON-NLS-1$ - } - + ImageFetchingContentHandler contentHandler = new ImageFetchingContentHandler(); + String gimcontinue = null; Set<String> filenames = new HashSet<String>(); - final SAXParserFactory parserFactory = SAXParserFactory.newInstance(); parserFactory.setNamespaceAware(true); parserFactory.setValidating(false); + do { + contentHandler.setGimcontinue(null); + URL apiUrl; + try { + String queryString = String.format( + "action=query&titles=%s&generator=images&prop=imageinfo&iiprop=url&format=xml%s", //$NON-NLS-1$ + URLEncoder.encode(pageName, "UTF-8"), (gimcontinue == null ? "" : "&gimcontinue=" + URLEncoder.encode(gimcontinue, "UTF-8"))); //$NON-NLS-1$ + apiUrl = new URL(base + "api.php?" + queryString); //$NON-NLS-1$ + } catch (Exception e) { + throw new BuildException("Cannot compose API URL", e); //$NON-NLS-1$ + } - Reader input; - try { - input = new InputStreamReader(new BufferedInputStream(apiUrl.openStream()), "UTF-8"); //$NON-NLS-1$ - } catch (IOException e) { - throw new BuildException(String.format("Cannot contact %s: %s", apiUrl, e.getMessage()), e); //$NON-NLS-1$ - } - try { - SAXParser saxParser = parserFactory.newSAXParser(); - XMLReader xmlReader = saxParser.getXMLReader(); - xmlReader.setEntityResolver(IgnoreDtdEntityResolver.getInstance()); - - ImageFetchingContentHandler contentHandler = new ImageFetchingContentHandler(); - xmlReader.setContentHandler(contentHandler); + Reader input; + try { + input = new InputStreamReader(new BufferedInputStream(apiUrl.openStream()), "UTF-8"); //$NON-NLS-1$ + } catch (IOException e) { + throw new BuildException(String.format("Cannot contact %s: %s", apiUrl, e.getMessage()), e); //$NON-NLS-1$ + } try { - xmlReader.parse(new InputSource(input)); - } catch (IOException e) { - throw new BuildException(String.format("Unexpected exception retrieving data from %s", apiUrl), e); //$NON-NLS-1$ - } finally { + SAXParser saxParser = parserFactory.newSAXParser(); + XMLReader xmlReader = saxParser.getXMLReader(); + xmlReader.setEntityResolver(IgnoreDtdEntityResolver.getInstance()); + + xmlReader.setContentHandler(contentHandler); + try { - input.close(); + xmlReader.parse(new InputSource(input)); + gimcontinue = contentHandler.getGimcontinue(); } catch (IOException e) { - // ignore - } - } - int fileCount = 0; - for (Map.Entry<String, String> ent : contentHandler.imageTitleToUrl.entrySet()) { - String title = ent.getKey(); - String imageUrl = ent.getValue(); - Matcher titleMatcher = imageTitlePattern.matcher(title); - if (titleMatcher.matches()) { - String name = titleMatcher.group(1); - name = name.replace(' ', '_'); - String qualifiedUrl = base; - if (imageUrl.matches("https?://.*")) { //$NON-NLS-1$ - qualifiedUrl = imageUrl; - } else { - if (imageUrl.startsWith("/")) { //$NON-NLS-1$ - qualifiedUrl += imageUrl.substring(0); - } else { - qualifiedUrl += imageUrl; - } - } - - log("Fetching " + qualifiedUrl, Project.MSG_INFO); //$NON-NLS-1$ - Get get = new Get(); - get.setProject(getProject()); - get.setLocation(getLocation()); + throw new BuildException(String.format("Unexpected exception retrieving data from %s", apiUrl), e); //$NON-NLS-1$ + } finally { try { - get.setSrc(new URL(qualifiedUrl)); - } catch (MalformedURLException e) { - log("Skipping " + url + ": " + e.getMessage(), Project.MSG_WARN); //$NON-NLS-1$ //$NON-NLS-2$ - continue; + input.close(); + } catch (IOException e) { + // ignore } - get.setDest(new File(dest, name)); - get.execute(); - - filenames.add(name); - ++fileCount; - } else { - log(String.format("Unexpected title format: %s", title), Project.MSG_WARN); //$NON-NLS-1$ } + } catch (SAXException e) { + throw new BuildException("Unexpected error in XML content", e); //$NON-NLS-1$ + } catch (ParserConfigurationException e) { + throw new BuildException("Cannot configure SAX parser", e); //$NON-NLS-1$ } - log("Fetched " + fileCount + " image files for " + pageName, Project.MSG_INFO); //$NON-NLS-1$ //$NON-NLS-2$ - } catch (SAXException e) { - throw new BuildException("Unexpected error in XML content", e); //$NON-NLS-1$ - } catch (ParserConfigurationException e) { - throw new BuildException("Cannot configure SAX parser", e); //$NON-NLS-1$ + + } while (gimcontinue != null); + int fileCount = 0; + for (Map.Entry<String, String> ent : contentHandler.imageTitleToUrl.entrySet()) { + String title = ent.getKey(); + String imageUrl = ent.getValue(); + Matcher titleMatcher = imageTitlePattern.matcher(title); + if (titleMatcher.matches()) { + String name = titleMatcher.group(1); + name = name.replace(' ', '_'); + String qualifiedUrl = base; + if (imageUrl.matches("https?://.*")) { //$NON-NLS-1$ + qualifiedUrl = imageUrl; + } else { + if (imageUrl.startsWith("/")) { //$NON-NLS-1$ + qualifiedUrl += imageUrl.substring(0); + } else { + qualifiedUrl += imageUrl; + } + } + + log("Fetching " + qualifiedUrl, Project.MSG_INFO); //$NON-NLS-1$ + Get get = new Get(); + get.setProject(getProject()); + get.setLocation(getLocation()); + try { + get.setSrc(new URL(qualifiedUrl)); + } catch (MalformedURLException e) { + log("Skipping " + url + ": " + e.getMessage(), Project.MSG_WARN); //$NON-NLS-1$ //$NON-NLS-2$ + continue; + } + get.setDest(new File(dest, name)); + get.execute(); + + filenames.add(name); + ++fileCount; + } else { + log(String.format("Unexpected title format: %s", title), Project.MSG_WARN); //$NON-NLS-1$ + } } + log("Fetched " + fileCount + " image files for " + pageName, Project.MSG_INFO); //$NON-NLS-1$ //$NON-NLS-2$ + return filenames; } @@ -174,15 +181,27 @@ private class ImageFetchingContentHandler implements ContentHandler { - final Map<String, String> imageTitleToUrl = new HashMap<String, String>(); + public void setGimcontinue(String gimcontinue) { + this.gimcontinue = gimcontinue; + } + + public String getGimcontinue() { + return gimcontinue; + } + + private final Map<String, String> imageTitleToUrl = new HashMap<String, String>(); private String currentPage = null; private boolean inImageInfo = false; + private String gimcontinue = null; + public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException { if ("page".equals(localName)) { //$NON-NLS-1$ currentPage = atts.getValue("title"); //$NON-NLS-1$ + } else if ("images".equals(localName)) { //$NON-NLS-1$ + gimcontinue = atts.getValue("gimcontinue"); //$NON-NLS-1$ } else if ("imageinfo".equals(localName)) { //$NON-NLS-1$ inImageInfo = true; } else if (inImageI I added a patch fixing this as gerrit commit at https://git.eclipse.org/r/#/c/6537/. Is there something wrong with this approach to contribute? Or is there another better way to transfer contributions? Carsten, thanks for posting a review. I apologize for the slow response on this one. Creating a bug referencing a Gerrit review is definitely the best way to provide contributions. Alternatively you can attach your changes to the bug by creating a patch. It's very difficult to grok changes in a comment on a task. It would be great to have some unit tests for this one. Consider mocking up some of the components to avoid having to hit a server. I've commented on the review. Created attachment 218470 [details]
change to take into account review
Created attachment 218471 [details]
mylyn/context/zip
I finally created a gerrit review containing your suggestions at https://git.eclipse.org/r/#/c/6710/ It seems by abandoning a unchanged second patchset at the old gerrit review I abandened not only the patch but the whole review. The gerrit user interface is somewhat confusing in parts. Carsten, can you please answer these questions[1] as a comment on this bug 1. I have authored 100% of the content I'm contributing 2. I have the rights to donate the content to Eclipse 3. I contribute the content under the EPL fn1. http://wiki.eclipse.org/Development_Resources/Handling_Git_Contributions Hi David, 1. I have authored 100% of the content I'm contributing 2. I have the rights to donate the content to Eclipse 3. I contribute the content under the EPL Best regards, Carsten Hammer Hi David, anything else that I have to do to make it possible to merge it into mainstream? I think I dont have the right to do it myself. Best regards, Carsten Carsten, I've checked this change manually and it seems to work as expected, however I would like to see some unit tests to verify that it works as expected for responses containing a gimcontinue and responses without. Closed as part of backlog clean-up. Please re-open if you'd like to see this revisited, perhaps with a contribution. Reopening since this bug has been confirmed and there appears to be a fix in Gerrit. David, I think it would be good to consider this for the next (service) release. Thanks for the contribution Carsten, sorry this has taken so long to get in. I've merged the review. *** Bug 418964 has been marked as a duplicate of this bug. *** I grabbed the latest nightly snapshot from Mylyn downloads page and tried but still have the same problem. I'm guess it just hasn't been picked up by a build yet. John, for a short period there new builds were not being published correctly. It should be up there now. Let me know if you have any problems. |