I had to parse a rather big XML document in Java that looked something like this:
There are hundreds of first-level child nodes, but each one has only a few children.
Given any child node, I had to get one of its own children by its name. I had two options:
- iterate over the list of children, and check each one to see if it’s the one I am looking for (5 lines of code);
- or be lazy and use a very simple XPath expression (1 line of code).
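The two options above can be sketched roughly as follows. This is only an illustration, not the code from my project: the class name ChildLookup, the helper parse, and the element names (item, name, price) are made up for the example, and the XPath expression is assumed to be just the child's name evaluated relative to the context node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ChildLookup {

    // One shared XPath object for the sketch (XPath objects are not thread-safe).
    private static final XPath XPATH = XPathFactory.newInstance().newXPath();

    // Option 1: walk the direct children and compare element names.
    static Node childByIteration(Node parent, String name) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE && name.equals(child.getNodeName())) {
                return child;
            }
        }
        return null; // no direct child element with that name
    }

    // Option 2: evaluate a relative XPath expression against the parent node.
    static Node childByXPath(Node parent, String name) {
        try {
            return (Node) XPATH.evaluate(name, parent, XPathConstants.NODE);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + name, e);
        }
    }

    // Helper for the example only: parse an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical document: one first-level node with two children.
        Document doc = parse("<root><item><name>a</name><price>1</price></item></root>");
        Node item = doc.getDocumentElement().getFirstChild();
        System.out.println(childByIteration(item, "price").getTextContent());
        System.out.println(childByXPath(item, "price").getTextContent());
    }
}
```

Both methods return the same node (or null when there is no match), so swapping one for the other only changes the cost of the lookup.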
Being lazy, and since each context node had only a few child nodes, I decided to go with XPath. I knew there would be some overhead, but it shouldn’t be very big, right?
My code wasn’t doing anything fancy, but it took 40 seconds to complete. Being curious about the overhead introduced by XPath, I replaced it with a bunch of node.getChildNodes() calls. The total time dropped to 0.1 seconds.
I wrote a small benchmark to illustrate this problem. It boils down to these two methods:
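As a rough sketch of what such a benchmark can look like (this is not the original listing): the class name LookupBenchmark, the synthetic document, the element names, and the item count of 500 are all assumptions made for the example. The timing is deliberately naive, with no JIT warm-up, so the numbers it prints are only indicative.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class LookupBenchmark {

    // Hypothetical child element names; the real document's names are not known.
    static final String[] FIELDS = {"id", "name", "price"};

    // Build <root> with `items` <item> children, each holding one element per FIELDS entry.
    static Document buildDocument(int items) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("root");
        doc.appendChild(root);
        for (int i = 0; i < items; i++) {
            Element item = doc.createElement("item");
            for (String field : FIELDS) {
                Element child = doc.createElement(field);
                child.setTextContent(field + i);
                item.appendChild(child);
            }
            root.appendChild(item);
        }
        return doc;
    }

    // Look up `name` under every first-level node by scanning getChildNodes().
    static int countByIteration(Document doc, String name) {
        int found = 0;
        NodeList items = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < items.getLength(); i++) {
            NodeList children = items.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                if (name.equals(children.item(j).getNodeName())) {
                    found++;
                    break;
                }
            }
        }
        return found;
    }

    // Same lookup, but with one XPath evaluation per first-level node.
    static int countByXPath(Document doc, String name) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        int found = 0;
        NodeList items = doc.getDocumentElement().getChildNodes();
        try {
            for (int i = 0; i < items.getLength(); i++) {
                if (xpath.evaluate(name, items.item(i), XPathConstants.NODE) != null) {
                    found++;
                }
            }
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + name, e);
        }
        return found;
    }

    public static void main(String[] args) {
        Document doc = buildDocument(500); // 500 first-level nodes is an arbitrary choice
        long t0 = System.nanoTime();
        int byIteration = countByIteration(doc, "price");
        long t1 = System.nanoTime();
        int byXPath = countByXPath(doc, "price");
        long t2 = System.nanoTime();
        System.out.printf("iteration: %d hits in %d ms%n", byIteration, (t1 - t0) / 1_000_000);
        System.out.printf("xpath:     %d hits in %d ms%n", byXPath, (t2 - t1) / 1_000_000);
    }
}
```

Both methods visit the same first-level nodes and report the same hit count, so any difference in the printed times comes from the lookup mechanism itself.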
The full code is available on GitHub.
On my Linux box with OpenJDK 1.6.0_20, I got the following results:
Lesson learned: javax.xml.xpath is slow as hell, even for very simple expressions.