knightly

Blog Archives

PHP & DOMDocument slow loading pages

While trying to load a very basic XML template with DOMDocument::load() I ran into some issues last Friday. During the weekend I completely forgot about this. Thank god :) But since it’s Monday I have no other option than to solve it. The problem I was experiencing was that page loads took about 2 minutes for every controller action in the application. My first thought was that it would be related to permissions, with the application not being able to write certain files. But after checking all permissions the problem still remained.

The logs weren’t showing anything. No errors, nothing. Time to enable the Xdebug profiler. After doing a few requests and loading the profiles up in KCachegrind I found the root of the problem quite fast: a call to the mkDom method was causing the slow load.

But because there were no errors or other indications that something was wrong, I added some debug statements to the code in question and found out that it was PHP’s native DOMDocument::load() that was taking two minutes to load a 200k XML template. That’s not good. So what’s going on here?
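The debug statements were nothing fancy. A minimal sketch of that kind of timing check is below; the temp file is a hypothetical stand-in for the real template, so the numbers it prints say nothing about the original problem.

```php
<?php
// Hypothetical stand-in for the real template file.
$file = tempnam(sys_get_temp_dir(), 'tpl');
file_put_contents($file, '<root><item>hello</item></root>');

$dom = new DOMDocument();

// Wrap only the suspicious call so nothing else pollutes the measurement.
$start   = microtime(true);
$loaded  = $dom->load($file);
$elapsed = microtime(true) - $start;

printf(
    "DOMDocument::load() took %.3f seconds (loaded: %s)\n",
    $elapsed,
    $loaded ? 'yes' : 'no'
);

unlink($file);
```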

First let’s check why there are no errors. Turns out that for some reason LIBXML_NOERROR was set, which suppresses libxml’s error reports. After removing this option I was presented with an error.

Caught exception: ErrorException: DOMDocument::load()
Entity ‘nbsp’ not defined in /some/file.tal.html, line: 363
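For reference, here is a small sketch of one way to get at libxml’s complaints without depending on how PHP warnings are configured. It uses libxml_use_internal_errors() and libxml_get_errors(); the inline XML string is a made-up stand-in for the template, not the real file.

```php
<?php
// Made-up snippet standing in for the template: &nbsp; is not defined in plain XML.
$xml = '<root><p>a&nbsp;b</p></root>';

// Collect libxml errors ourselves instead of having them raised as PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom    = new DOMDocument();
$loaded = $dom->loadXML($xml);   // no LIBXML_NOERROR here, so errors are recorded
$errors = libxml_get_errors();

foreach ($errors as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}

// Clean up so later parsing code starts with an empty error buffer.
libxml_clear_errors();
libxml_use_internal_errors($previous);
```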

WTF? HTML entities inside an XML template. Oh well. So I changed the &nbsp;’s inside the template to their XML equivalent, &#160;.
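I edited the template by hand, but the same swap can be sketched in code. This is only an illustration on a made-up fragment: it replaces the named HTML entity with the numeric character reference for the non-breaking space, which an XML parser understands without any DTD.

```php
<?php
// Made-up template fragment containing the HTML-only entity.
$template = '<root><p>a&nbsp;b</p></root>';

// Swap the named entity for its numeric character reference.
$fixed = str_replace('&nbsp;', '&#160;', $template);

$dom    = new DOMDocument();
$loaded = $dom->loadXML($fixed);

// The paragraph now contains a real non-breaking space (U+00A0).
$text = $dom->documentElement->textContent;
```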

I reloaded the page and everything was fast again. This solved my problem, but I wasn’t happy yet. Nobody else in the office was experiencing this issue, so there had to be some other cause. That’s when Geoff told me that network lookups of the DTD are disabled by using LIBXML_NONET. For this to work, however, there needs to be an extra W3C package installed.
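Passing the option looks roughly like this. It is a sketch on a made-up XHTML snippet, not our actual loading code. Whether a locally installed DTD then gets picked up depends on other parser flags and on the XML catalog setup of the machine, which is not shown here.

```php
<?php
// Made-up XHTML snippet with a DOCTYPE that points at w3.org.
$xml = '<?xml version="1.0"?>'
     . '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
     . '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
     . '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>hi</p></body></html>';

$dom = new DOMDocument();

// LIBXML_NONET: libxml may not touch the network while loading this document.
$loaded   = $dom->loadXML($xml, LIBXML_NONET);
$rootName = $dom->documentElement->nodeName;
```

The same constant can be passed as the second argument of DOMDocument::load() when reading from a file.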

This was one of those hard-to-track errors. But it’s fixed and everything is running smoothly again.

Parsing XML CDATA with SimpleXML

I was quite amazed today. When loading a SOAP response into a SimpleXMLElement I noticed some fields were left blank. I should have checked the SOAP response first, but instead I told our Delphi guy that the response was not filled correctly. This was not the case :)

When we both saw the SOAP response was in perfect shape, we started to poke around on the PHP side. The strange thing was that all tag names were taken from the response correctly. It was just the data that was missing. Then we noticed the missing data was inside CDATA sections.

From a first glance at the PHP manual it wasn’t clear what was going on. So I did some googling and found a good post by David Coallier. This post solved the problem. The example showed how to pass an extra LIBXML option to the simplexml_load_string function. Although David provided the solution, I still wanted to make a post. Maybe it will help somebody.

    // parsing with CDATA tags using the *_load_string method
    $xml = simplexml_load_string($string, 'SimpleXMLElement', LIBXML_NOCDATA);

    // parsing with CDATA tags using the OO way
    $xml = new SimpleXMLElement($string, LIBXML_NOCDATA);

The LIBXML options that can be passed to the *_load functions and the constructor can be found in the PHP documentation.

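It’s pretty damn weird though. I want to parse the CDATA sections inside my XML, and can only do so by providing the NOCDATA option. The name makes a bit more sense once you know that LIBXML_NOCDATA merges CDATA sections into plain text nodes instead of dropping them.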
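To see the difference for yourself, here is a small sketch with a made-up one-field response (not our real SOAP payload). It loads the same string with and without the option and encodes both results as JSON, which is one easy way to make the missing value visible.

```php
<?php
// Made-up response with a single CDATA-wrapped field.
$string = '<response><name><![CDATA[Bob]]></name></response>';

$plain  = simplexml_load_string($string);
$merged = simplexml_load_string($string, 'SimpleXMLElement', LIBXML_NOCDATA);

// Without the option the CDATA section stays its own node type, and the
// value does not come out as a plain string when the object is encoded.
$plainJson = json_encode($plain);

// With LIBXML_NOCDATA the CDATA section is merged into a text node.
$mergedJson = json_encode($merged);   // {"name":"Bob"}

echo $plainJson, "\n", $mergedJson, "\n";
```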