I'm developing an NS chart generator for C source code based on Eclipse CDT. During development I ran into bad performance and an OutOfMemoryError caused by the CDT indexer. While debugging, I found something strange: during parsing of C source, a very large number of IASTProblem objects are created in DOMScanner.handleProblem() and eventually cause the OutOfMemoryError. As I understand it, these objects are generated to log SCANNER_BAD_CHARACTER problems. In detail, the problem is caused by null characters ('\0') found in the C source: many null characters => many IASTProblem objects => OutOfMemoryError.

In order to identify which source contains the null characters, I modified BaseScanner.fetchToken() as follows (BaseScanner.java, line 2026):

============================================
// skip over anything we don't handle
char[] x = new char[1];
x[0] = buffer[pos];
if (x[0] == '\0') {
    System.out.println("Suspicious String:");
    int s = Math.max(0, pos - 100);
    for (int i = s; i < pos; i++) {
        System.out.print(buffer[i]);
    }
    System.out.println();
} else {
    handleProblem(IASTProblem.SCANNER_BAD_CHARACTER, pos, x);
}
============================================

I tested it using the Apache Httpd 2.2.3 source code and got the following log. It was generated while parsing the "mod_auth_basic.c" file, and the logged string is part of "ap_config.h". However, the "ap_config.h" file itself does not contain any null characters.

============================================
Suspicious String:
VERSION__ >= 199901L) #define AP_HAVE_DESIGNATED_INITIALIZER #endif #endif /* AP_CONFIG_H */
Suspicious String:
ERSION__ >= 199901L) #define AP_HAVE_DESIGNATED_INITIALIZER #endif #endif /* AP_CONFIG_H */
Suspicious String:
RSION__ >= 199901L) #define AP_HAVE_DESIGNATED_INITIALIZER #endif #endif /* AP_CONFIG_H */
============================================

My guess is that unintentional null characters are appended to the buffer during macro expansion of the C source file, and that these waste CPU and memory.
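The flood described above can be sketched outside CDT: each stray '\0' in the scanned buffer turns into one problem object. The following is a minimal, self-contained illustration, not the actual CDT scanner; the class and method names here are hypothetical.

```java
public class NullFlood {
    // Count how many "bad character" reports a scanner-style loop would
    // emit for a buffer whose tail is padding. Each hit stands in for one
    // allocated IASTProblem object in the real scanner.
    static int countBadChars(char[] buffer) {
        int problems = 0;
        for (char c : buffer) {
            if (c == '\0') {
                problems++;
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // A 1024-char buffer holding only 12 chars of real source:
        // everything past the content is '\0' padding.
        char[] buffer = new char[1024];
        "int main(){}".getChars(0, 12, buffer, 0);
        System.out.println(countBadChars(buffer)); // prints 1012
    }
}
```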
Because I don't understand CDT internals, my guess may be wrong. Please check it; I hope my observation is helpful for improving CDT.

In addition, I found linear search logic over long arrays in several places in the CDT source code. As we know, the performance of linear search is O(n), i.e. proportional to the data size, so I think we could consider changing it to another algorithm such as binary search for handling heavy projects (of course, only when the array is sorted). Examples of methods using linear search over an array:

============================
ArrayUtil.append()
_CompositeContext.findContextContainingOffset()
LocationMap.getLocation()
============================
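The binary-search idea for the offset lookups can be sketched as below. This is an illustrative, standalone sketch under the stated sorted-array precondition, not the actual code of the CDT methods listed above; the class and method names are hypothetical.

```java
import java.util.Arrays;

public class OffsetLookup {
    // Find the index of the entry whose start offset is the greatest value
    // <= the query offset, i.e. the context that could contain the offset.
    // Precondition: startOffsets is sorted ascending.
    static int findContaining(int[] startOffsets, int offset) {
        int idx = Arrays.binarySearch(startOffsets, offset);
        if (idx >= 0) {
            return idx;               // exact match on a start offset
        }
        int insertion = -idx - 1;     // index of first element > offset
        return insertion - 1;         // -1 if offset precedes all entries
    }

    public static void main(String[] args) {
        int[] starts = {0, 120, 450, 900};
        // 500 falls inside the context starting at 450 (index 2).
        System.out.println(findContaining(starts, 500)); // prints 2
    }
}
```

This replaces an O(n) scan with an O(log n) lookup, which matters when the same array is probed once per token during indexing.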
I found out where the null characters are inserted. In the CodeReader.load() method, unintentional null characters are introduced while reading the file stream and decoding it to a character array (CodeReader.load(), line 102):

===============================
CharBuffer charBuffer = Charset.forName(encoding).decode(byteBuffer);
if (charBuffer.hasArray())
    return charBuffer.array();
// Got to copy it out
char[] buff = new char[charBuffer.length()];
charBuffer.get(buff);
return buff;
===============================

The backing array returned by charBuffer.array() can be larger than the decoded content, and the unused tail slots are filled with '\0'. In order to avoid this problem, I modified it as follows:

===============================
CharBuffer charBuffer = Charset.forName(encoding).decode(byteBuffer);
//if (charBuffer.hasArray())
//    return charBuffer.array();
// Got to copy it out
char[] buff = new char[charBuffer.length()];
charBuffer.get(buff, 0, charBuffer.length());
return buff;
===============================

After this modification, I verified that the SCANNER_BAD_CHARACTER problems caused by null characters no longer occur in BaseScanner.fetchToken(). Is my solution proper? Please check.

regards,
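The pitfall can be demonstrated standalone: Charset.decode() sizes its output buffer from a per-byte estimate, so for multi-byte encodings the backing array ends up longer than the decoded text. A sketch of the copy-based fix, assuming a hypothetical helper name (safeLoad) mirroring the patched CodeReader.load():

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Always copy exactly length() chars instead of handing out the
    // (possibly oversized, '\0'-padded) backing array.
    static char[] safeLoad(byte[] bytes, String encoding) {
        CharBuffer cb = Charset.forName(encoding).decode(ByteBuffer.wrap(bytes));
        char[] buff = new char[cb.length()];
        cb.get(buff, 0, buff.length);
        return buff;
    }

    public static void main(String[] args) {
        // "\u00e9" is one char but two UTF-8 bytes, so the decoder's
        // output buffer has spare capacity: array() is longer than the
        // decoded content and the tail is '\0'.
        byte[] utf8 = "int x; /* \u00e9 */".getBytes(StandardCharsets.UTF_8);
        CharBuffer cb = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(utf8));
        System.out.println("decoded chars: " + cb.length()
                + ", backing array: " + cb.array().length);

        char[] safe = safeLoad(utf8, "UTF-8");
        System.out.println("copied length: " + safe.length); // matches decoded chars
    }
}
```

Note that with pure ASCII input the estimate is exact and array() happens to be the right size, which is why the bug only shows up on files where the byte count exceeds the character count.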
As a summary: on my PC (it is old and very slow :) ), I tested the fast indexer on the Apache Httpd 2.2.3 source project.

With the original CDT core, my Eclipse crashed after about 4 minutes with an OutOfMemoryError, having generated a 300KB PDOM index. With the CDT core modified by me, indexing completed after about 4 minutes, generating a 4MB PDOM index.

Following are my modifications:
1. modify CodeReader.load() in order to avoid the unintentional null characters.
2. replace the linear search algorithm with a binary search algorithm in the following methods:
   ArrayUtil.append()
   _CompositeContext.findContextContainingOffset()
   LocationMap.getLocation()

regards,
(In reply to comment #2)
> Following is my modification :
> 1. modify CodeReader.load() in order to avoid unintentional null character.
> 2. replace linear search algorithm with binary search algorithm in following
> mehtods,
> ArrayUtil.append()
> _CompositeContext.findContextContainingOffset()
> LocationMap.getLocation()
> regards,

Can you help out by providing a patch?
(In reply to comment #3)
> Can you help out by providing a patch?

I think publishing a new patch is beyond my authority, and I have already submitted all the information I found to the authorized CDT developers. Most of all, because I don't understand CDT internals well, I cannot be sure that my modification has no side effects. We need a careful check by experts before applying anything.

regards,
> I think publishing new patch is beyond my authority
> and I already submit all information that I found for authorized developers of
> CDT.

Everybody who has a Bugzilla account can attach patches to bugs. See http://wiki.eclipse.org/index.php/CDT/contributing

> Most of all, because I don't understand CDT inside well, I can not sure my
> modification has no side effect. we need careful check of experts before
> applying something.

A patch will be applied only after it has been looked at. You seem to have enough expertise to provide helpful input.
Created attachment 52833 [details]
avoid unexpected null character problem

It slightly changes the way the character array is obtained from the C source file.
Created attachment 52834 [details]
improve search of long array and avoid unexpected null character

In addition to attachment #52833 [details], it changes the search logic over long arrays from linear search to binary search. But I'm not sure whether my change is correct; please check.
Even though my suggested patches are based on the source code in CVS (it may be 4.0.0 RC?), my investigation covered CDT 3.1.1, not 4.0. Actually, I looked at the CDT 4.0 code for the first time only to make the patch files, and CDT 4.0 seems quite different from 3.1.1 in many respects. Maybe my patches cannot be applied to CDT 4.0.

With CDT core 4.0.0(?), I tested the fast index facility on the Apache Httpd 2.2.3 project:

when my patches were not applied: about 4 minutes after the indexer started, Eclipse crashed with an OutOfMemoryError and a 300KB index file.

after applying attachment #52833 [details]: about 20 minutes after the indexer started, it was still working and had generated a 2MB index file. I stopped the indexer by force.

after applying attachment #52834 [details]: about 20 minutes after the indexer started, it was still working and had generated a 20MB index file. I stopped the indexer by force.

Actually, this result surprised me, because I expected indexing to complete with a 4MB index file in 4-5 minutes, just like CDT 3.1.1 after applying attachment #52834 [details].

regards,
Thanks Jeong-Sik! I have applied your first patch. There were bugs in the second one; I have made changes to the LocationMap along the lines of your ideas. I stayed away from changing ArrayUtil, see bug 162755. Fixed in 4.0 > 20061030.
assigning
done