ManticMoo.COM -> Jeff's Articles -> Programming Articles -> Problems parsing through javascript with the libxml SAX parser

Problems parsing through javascript with the libxml SAX parser

by Jeffrey P. Bigham

libxml (libxml2) is an easy-to-use C library for handling XML and HTML files. I use it to automatically alter web pages in an attempt to increase their accessibility. I recently ran into trouble with its SAX parser not correctly handling JavaScript that outputs HTML tags.

For instance, a page containing the following markup would make it fail in an odd way:


...
<script language="javascript">
document.write('<div>');
...
</script>
...

would come out of the parser as:


...
<script language="javascript">
document.write('</script></div>');
...
...
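Part of what makes this case tricky is that HTML treats the body of a script element as raw text: a string such as '<div>' inside the script is just characters, and the element ends only at the closing script tag. The C sketch below is a hypothetical, simplified illustration of that rule. It is not libxml's code, and the function name script_raw_text_length is made up. It looks for the first case-insensitive "</script", roughly as browsers do, and ignores finer points of the HTML specification.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libxml): s points just past the
   opening "<script ...>" tag. Returns the number of bytes of raw script
   text, i.e. the offset of the first case-insensitive "</script".
   Anything before that, including "<div>", is plain text, not markup.
   If no end tag is found, the whole remaining length is returned. */
static size_t script_raw_text_length(const char *s)
{
    static const char end_tag[] = "</script";
    const size_t tag_len = sizeof end_tag - 1;
    const size_t n = strlen(s);

    for (size_t i = 0; i + tag_len <= n; i++) {
        size_t j = 0;
        /* Compare case-insensitively so "</SCRIPT" also matches. */
        while (j < tag_len &&
               tolower((unsigned char)s[i + j]) == end_tag[j])
            j++;
        if (j == tag_len)
            return i;
    }
    return n;
}
```

Given the script body from the first example, "document.write('<div>');\n</script>", the function returns 25, the offset where the end tag begins, so the <div> stays inside the script text instead of being reported as an element.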

Eventually, I discovered that the problem had been fixed in the most recent version of the library. I was using version 2.6.1, and the most recent version, 2.6.26 (released in June 2006), fixes the problem. All recent versions can be downloaded from the xmlsoft homepage.
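If you need to tell whether a program is built against an affected release, note that libxml encodes its version number as major * 10000 + minor * 100 + micro, so 2.6.1 is 20601 and 2.6.26 is 20626. The sketch below assumes that encoding; the function names are hypothetical, and using 2.6.26 as the cutoff is conservative, since the fix may have landed in an earlier release.

```c
#include <stdbool.h>

/* Encode a version the way libxml's LIBXML_VERSION macro does:
   major * 10000 + minor * 100 + micro (e.g. 2.6.26 -> 20626). */
static long libxml_version_code(int major, int minor, int micro)
{
    return (long)major * 10000L + (long)minor * 100L + micro;
}

/* Hypothetical check: 2.6.26 is the release reported to fix the script
   problem. The fix may have appeared earlier, so this is conservative. */
static bool has_script_fix(long version_code)
{
    return version_code >= libxml_version_code(2, 6, 26);
}
```

In a real program you could pass the LIBXML_VERSION macro from libxml/xmlversion.h (the headers you compiled against) or atoi(xmlParserVersion) (the library actually loaded at run time) to has_script_fix.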
