
CSV/Excel Utility improves performance

I recently stumbled across a performance test of Java CSV libraries (can’t remember where). To my surprise, someone had tested multiple open source products, including my own CSV/Excel Utility Package. Even more surprising to me, mine was the worst. By far! It took four times as long as the others to parse a CSV file. Embarrassing! Why did I never spend any effort on measuring performance?

Anyway. I took the time and wrote a comparable JUnit performance test for all major CSV libraries and gave them a 150MB file to read. Analyzing the results of my own library with JProfiler, I found a very stupid performance eater (simplified here):

String s = "";
for (char c : anotherString.toCharArray()) {
   ...
   s += c;
   ...
}

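The “addition” inside the loop (s += c) was executed 150 million times, once for every single character in the file. Each execution creates a new String and copies everything accumulated so far. After replacing it with a StringBuilder, performance came close to that of the other major CSV libraries: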

StringBuilder s = new StringBuilder();
for (char c : anotherString.toCharArray()) {
   ...
   s.append(c);
   ...
}
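
To try the difference yourself, here is a minimal, self-contained sketch of both variants. It is not the library's actual code: the class name ConcatDemo, the method names and the sample input are made up for illustration, and the timing in main is a crude one-shot measurement without JIT warm-up, so treat the numbers as a rough indication only.

```java
// Hypothetical demo class, not part of the CSV/Excel Utility Package.
final class ConcatDemo {

    // Slow variant: every += builds a new String and copies all previous characters.
    static String withConcat(String input) {
        String s = "";
        for (char c : input.toCharArray()) {
            s += c;
        }
        return s;
    }

    // Fast variant: StringBuilder appends into a growable internal buffer.
    static String withBuilder(String input) {
        StringBuilder s = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            s.append(c);
        }
        return s.toString();
    }

    public static void main(String[] args) {
        // Build some CSV-like sample data (60,000 characters).
        StringBuilder data = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            data.append("a;b;c\n");
        }
        String input = data.toString();

        long t0 = System.nanoTime();
        String viaConcat = withConcat(input);
        long t1 = System.nanoTime();
        String viaBuilder = withBuilder(input);
        long t2 = System.nanoTime();

        // Both variants must produce the same result; only the cost differs.
        System.out.println("same result:   " + viaConcat.equals(viaBuilder));
        System.out.println("concat  (ms):  " + (t1 - t0) / 1_000_000);
        System.out.println("builder (ms):  " + (t2 - t1) / 1_000_000);
    }
}
```

The work done by the first variant grows roughly quadratically with the input length, which is why it only becomes painful on large files. For serious measurements a proper benchmark harness is the better choice.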

So two things to learn here: (1) Do not underestimate performance testing before releasing something. (2) Take care when using the “addition” operator on strings, especially inside loops. 🙂

PS: CSV/Excel Utility Package 1.7 will contain the fix.
PPS: StringBuilder is preferred over StringBuffer because it is not synchronized and therefore faster. Most use cases allow this simplification.
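
As a small illustration of such a use case: when the builder is a local variable that never leaves the method, no other thread can reach it, so the synchronization of StringBuffer buys nothing. The sketch below uses a made-up helper class (BuilderVsBuffer) and shows that the two classes are drop-in replacements for each other here, since both offer the same append and toString calls.

```java
// Hypothetical helper for illustration only.
final class BuilderVsBuffer {

    // Local StringBuilder: confined to this method, so no locking is needed.
    static String joinWithBuilder(String[] cells, char separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < cells.length; i++) {
            if (i > 0) {
                sb.append(separator);
            }
            sb.append(cells[i]);
        }
        return sb.toString();
    }

    // Same logic with StringBuffer: each append is a synchronized method call.
    static String joinWithBuffer(String[] cells, char separator) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < cells.length; i++) {
            if (i > 0) {
                sb.append(separator);
            }
            sb.append(cells[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] row = {"id", "name", "price"};
        System.out.println(joinWithBuilder(row, ';'));
        System.out.println(joinWithBuffer(row, ';'));
    }
}
```

StringBuffer remains the right tool only when one buffer really is shared and modified by several threads.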