I recently stumbled across a performance test of Java CSV libraries (can’t remember where). To my surprise, someone had tested multiple Open Source products, including my own CSV/Excel Utility Package. And even more surprising to me, mine was the worst. By far! It took four times as long as the others to parse a CSV file. Embarrassing! Why did I never spend any effort on measuring performance?
Anyway. I took the time and wrote a comparable JUnit performance test for all major CSV libraries and gave them a 150MB file to read. Analyzing the results of my own library with JProfiler, I found a very stupid performance eater (simplified here):
```java
String s = "";
for (char c : anotherString.toCharArray()) {
    ...
    s += c;
    ...
}
```
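The “addition” (s += c) was executed 150 million times, once for every single character in the file. Each += creates a new String and copies everything accumulated so far, which is what makes it so expensive inside a loop. After replacing it with a StringBuilder, the performance came close to that of the other major CSV libraries: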
```java
StringBuilder s = new StringBuilder();
for (char c : anotherString.toCharArray()) {
    ...
    s.append(c);
    ...
}
```
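If you want to see the difference on your own machine, here is a minimal sketch that runs both loops over the same input and prints rough timings. The class name ConcatDemo, the method names and the sample input are made up for this illustration and are not part of the CSV/Excel Utility Package. The timings are a crude System.nanoTime() measurement, not a proper benchmark.

```java
import java.util.Arrays;

public class ConcatDemo {

    // Copies the input character by character using "+=".
    // Every iteration allocates a new String and copies the old content.
    static String withConcatenation(String input) {
        String s = "";
        for (char c : input.toCharArray()) {
            s += c;
        }
        return s;
    }

    // Same loop, but appending into a single growable buffer.
    static String withStringBuilder(String input) {
        StringBuilder s = new StringBuilder();
        for (char c : input.toCharArray()) {
            s.append(c);
        }
        return s.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample data: 20,000 identical characters.
        char[] chars = new char[20_000];
        Arrays.fill(chars, 'x');
        String input = new String(chars);

        long t0 = System.nanoTime();
        String a = withConcatenation(input);
        long t1 = System.nanoTime();
        String b = withStringBuilder(input);
        long t2 = System.nanoTime();

        System.out.println("results equal:  " + a.equals(b));
        System.out.println("+= loop:        " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("StringBuilder:  " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

Both methods return the same string, only the cost of getting there differs. The gap grows with the length of the string being built, so try a few input sizes.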
So two things to learn here: (1) Do not underestimate performance testing before releasing something, (2) Take care when using the “addition” operator for strings. 🙂
PS: CSV/Excel Utility Package 1.7 will contain the fix.
PPS: StringBuilder is preferred over StringBuffer as it is not synchronized and therefore faster. Most use cases allow this simplification.
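To illustrate when each one fits, here is a small sketch. The class BufferChoiceDemo and its methods are invented for this example. It uses a StringBuilder for a buffer that never leaves the current method, and a StringBuffer only where two threads append to the same buffer.

```java
public class BufferChoiceDemo {

    // Buffer is local to this method, so no synchronization is needed:
    // StringBuilder is the natural choice.
    static String joinLocally(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    // Buffer is shared by two threads: StringBuffer's synchronized
    // append() makes sure no character gets lost.
    static int appendFromTwoThreads(int perThread) {
        StringBuffer shared = new StringBuffer();
        Runnable task = () -> {
            for (int i = 0; i < perThread; i++) {
                shared.append('x');
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return shared.length();
    }

    public static void main(String[] args) {
        System.out.println(joinLocally(new String[] {"a", ";", "b"}));
        System.out.println(appendFromTwoThreads(1000));
    }
}
```

A CSV parser that builds each value inside one method is the first case, which is why the simplification is allowed there.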