knightly

Blog Archives

Parsing XML CDATA with SimpleXML

I was quite amazed today. When loading a SOAP response into a SimpleXMLElement, I noticed some fields were left blank. I should have checked the SOAP response first, but instead I told our Delphi guy that the response was not filled in correctly. This was not the case :)

Once we both saw that the SOAP response was in perfect shape, we started to poke around on the PHP side. The strange thing was that all tag names were read from the response correctly. It was just the data that was missing. Then we noticed that the missing data was inside CDATA sections.

At first glance the PHP manual didn’t make it clear what was going on, so I did some googling and found a good post by David Coallier, which solved the problem. His example showed how to pass an extra LIBXML option to the simplexml_load_string function. Although David provided the solution, I still wanted to write a post about it. Maybe it will help somebody.

    // parsing CDATA sections using the simplexml_load_string() function
    $xml = simplexml_load_string($string, 'SimpleXMLElement', LIBXML_NOCDATA);

    // parsing CDATA sections the OO way
    $xml = new SimpleXMLElement($string, LIBXML_NOCDATA);

The LIBXML options that can be passed to the *_load functions and the constructor are listed in the PHP documentation.

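It’s pretty damn weird though. I want to read the CDATA sections inside my XML, and I can only do so by providing an option called NOCDATA. The name makes a bit more sense once you know what the flag does: it tells libxml to merge CDATA sections into ordinary text nodes.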
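To make the difference visible, here is a minimal sketch. The sample document and its element names are made up for illustration and are not the real SOAP response. As far as I can tell, the CDATA content is still reachable without the flag when a single element is cast to a string. The blank fields show up when the whole tree is dumped or converted, which this sketch does with a json_encode/json_decode round trip.

```php
<?php
// Hypothetical sample document; the element names are invented for this sketch.
$string = '<response>'
    . '<name><![CDATA[Jane & John]]></name>'
    . '<city>Utrecht</city>'
    . '</response>';

// Default parsing: the CDATA section is kept as a separate CDATA node.
$plain = simplexml_load_string($string);

// With LIBXML_NOCDATA the CDATA section is merged into a normal text node.
$merged = simplexml_load_string($string, 'SimpleXMLElement', LIBXML_NOCDATA);

// Casting one element to a string returns its text, even without the flag.
$nameCast = (string) $plain->name;

// Converting the whole tree to an array is where the flag matters.
$plainArray  = json_decode(json_encode($plain), true);
$mergedArray = json_decode(json_encode($merged), true);

echo $nameCast, "\n";
print_r($plainArray);
print_r($mergedArray);
```

So if only a handful of fields are needed, casting them to strings might be enough. The flag seems most useful when the whole structure is printed or converted in one go.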

PHP5 SimpleXML and xpath

Yesterday, when working with some rather large XML files, I noticed that ->xpath() calls on sub nodes of a SimpleXMLElement don’t behave the way I expected. If that’s not clear, here is a small example.

$xmlStr = '<root>
	<fareOption>
		<fare>
			<flight></flight>
			<flight></flight>
			<flight></flight>
		</fare>
	</fareOption>
	<fareOption>
		<fare>
			<flight></flight>
			<flight></flight>
			<flight></flight>
		</fare>
	</fareOption>
</root>';

$objXML = simplexml_load_string($xmlStr);
$fareList = $objXML->xpath('//fareOption');

foreach ($fareList as $fare) {
    $flightList = $fare->xpath('//flight');
    print_r($flightList);
}

So first I try to get all the fareOption elements from the XML structure and loop through the extracted elements. In the loop I want to do an ->xpath() call on $fare, which is an instance of SimpleXMLElement. The result, however, is not what I expected. Instead of extracting only the flights of that $fare node, it extracts all flights in the whole document. So instead of 3 flight nodes I get 6.

I couldn’t really find a solution for this problem at first. From reading some bug reports I understand this is expected behaviour: in XPath, an expression that starts with // is evaluated from the root of the document, no matter which node ->xpath() is called on. There is a small workaround though. Instead of running the ->xpath() call on the $fare node directly, we can create a new SimpleXMLElement from the $fare node, so that the node becomes the root of its own document, and do the ->xpath() call on that. This will look something like this.

foreach ($fareList as $fare) {
    $tempList = simplexml_load_string($fare->asXML());
    $flightList = $tempList->xpath('//flight');
    print_r($flightList);
}

This creates the expected output: 3 flight nodes for each fareOption.
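Re-parsing every node costs an extra serialize and parse step, which may matter on large files. A possible alternative, which is not part of the workaround above, is to keep the original nodes and start the expression with a dot, so that it is relative to the node ->xpath() is called on. A minimal sketch, using a shortened copy of the same XML:

```php
<?php
// Same structure as the example above, written on fewer lines.
$xmlStr = '<root>'
    . '<fareOption><fare><flight/><flight/><flight/></fare></fareOption>'
    . '<fareOption><fare><flight/><flight/><flight/></fare></fareOption>'
    . '</root>';

$objXML = simplexml_load_string($xmlStr);

$counts = array();
foreach ($objXML->xpath('//fareOption') as $fare) {
    // The leading dot makes the expression relative to $fare
    // instead of starting at the document root.
    $flightList = $fare->xpath('.//flight');
    $counts[] = count($flightList);
}

// One count per fareOption element.
print_r($counts);
```

Each entry in $counts is the number of flights found under one fareOption, so both entries should be 3 rather than 6.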
