I am working on a project that requires pulling and processing different XML feeds from the web and storing the data in MongoDB as JSON. Since new feeds come up every day, changing the Go program to process and publish each new feed is out of the question. A second constraint is that processing has to work in Iron.io or any other Linux cloud-based environment.
What I needed was a Go program that could take an XML document and XSLT stylesheet at runtime, transform the XML into JSON and then store the JSON to MongoDB. I have some specific field names and other requirements for the JSON document that I need to make sure exist. XSLT makes this real easy to support.
At first I looked at the different C libraries that exist. I figured I could integrate a library using cgo, but after a few hours I realized this was not going to work. The libraries I found were huge and complex. Then by chance I found a reference to a program called xsltproc. The program exists for both the Mac and Linux operating systems. In fact, it comes pre-installed on the Mac, and an apt-get will get you a copy of the program on Linux.
I have built a sample program that shows how to use xsltproc in your Go programs. Before we download the sample code we need to make sure you have xsltproc installed.
If you are running on a Mac, xsltproc should already exist under /usr/bin:
which xsltproc
/usr/bin/xsltproc
On Linux, just run apt-get if you don't already have xsltproc installed:
sudo apt-get install xsltproc
The xsltproc program will be installed in the same place under /usr/bin. To make sure everything is good, run the xsltproc program requesting the version:
xsltproc --version
xsltproc was compiled against libxml 20708, libxslt 10126 and libexslt 815
libxslt 10126 was compiled against libxml 20708
libexslt 815 was compiled against libxml 20708
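The same check can be made from inside a Go program at startup, so a missing install fails early with a clear message. This is a minimal sketch, not part of the sample repository; the helper name findTool is an assumption made for illustration, and it relies only on exec.LookPath from the standard library.

```go
package main

import (
	"fmt"
	"os/exec"
)

// findTool is a hypothetical helper: it reports the full path of an
// executable found on PATH, or an error if the tool is not installed.
func findTool(name string) (string, error) {
	path, err := exec.LookPath(name)
	if err != nil {
		return "", fmt.Errorf("%s not found on PATH: %w", name, err)
	}
	return path, nil
}

func main() {
	path, err := findTool("xsltproc")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("using", path)
}
```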
To download and try the sample program, open a terminal session and run the following commands:
export GOPATH=$HOME/example
go get github.com/goinggo/xslt
cd $GOPATH/src/github.com/goinggo/xslt
go build
If you want to install the code under your normal GOPATH, start with the 'go get' line. Here are the files that should exist after the build:
main.go -- Source code for test program
deals.xml -- Sample XML document from Yipit
stylesheet.xslt -- Stylesheet to transform the Yipit XML feed to JSON
xslt -- Test program
Let's look at a portion of the XML document the sample program will transform:
<deals>
    <list-item>
        <yipit_url>http://yipit.com/business/rondeaus-kickboxing/</yipit_url>
        <end_date>2014-01-27 16:00:03</end_date>
        <title>Let a Former Pro Teach You a Few Kicks of the Trade Month...</title>
        <tags>
            <list-item>
                <url />
                <name>Fitness Classes</name>
                <slug>fitness-classes</slug>
            </list-item>
        </tags>
        ...
    </list-item>
</deals>
The XML can be found in the deals.xml file. It is an extensive XML document and too large to show in its entirety.
Let's look at a portion of the XSLT stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:str="http://exslt.org/strings"
    version="1.0"
    extension-element-prefixes="str">
<xsl:output method="text" />
<xsl:template name="cleanText">
    <xsl:param name="pText" />
    <xsl:variable name="cleaned1" select="str:replace($pText, '&quot;', '')" />
    <xsl:variable name="cleaned2" select="str:replace($cleaned1, '\', '')" />
    <xsl:variable name="cleaned3" select="str:replace($cleaned2, '&#xA;', '')" />
    <xsl:value-of select="$cleaned3" />
</xsl:template>
...
<xsl:template match="/">{"deals": [
<xsl:for-each select="root/response/deals/list-item">{
"dealid": <xsl:value-of select="id" />,
"feed": "Yipit",
"date_added": "<xsl:value-of select="date_added" />",
"end_date": "<xsl:value-of select="end_date" />",
...
"categories": [<xsl:for-each select="tags/list-item">"<xsl:value-of select="slug"/>"<xsl:choose><xsl:when test="position() != last()">,</xsl:when></xsl:choose></xsl:for-each>],
...
}<xsl:choose><xsl:when test="position() != last()">,
</xsl:when></xsl:choose>
</xsl:for-each>
]}
</xsl:template>
</xsl:stylesheet>
This XSLT can be found in the stylesheet.xslt file. It is an extensive XSLT stylesheet with templates to help clean up the XML data. Something really great about xsltproc is that it already contains a bunch of useful extensions:
./xsltproc_darwin -dumpextensions
{http://exslt.org/strings}concat
{http://exslt.org/dates-and-times}date
{http://exslt.org/dates-and-times}day-name
{http://exslt.org/common}object-type
{http://exslt.org/math}atan
{http://exslt.org/strings}encode-uri
{http://exslt.org/strings}decode-uri
{http://exslt.org/dates-and-times}add-duration
{http://exslt.org/dates-and-times}difference
{http://exslt.org/dates-and-times}leap-year
{http://exslt.org/dates-and-times}month-abbreviation
{http://exslt.org/dynamic}map
{http://exslt.org/math}tan
{http://exslt.org/math}exp
{http://exslt.org/dates-and-times}date-time
{http://exslt.org/dates-and-times}day-in-week
{http://exslt.org/dates-and-times}second-in-minute
{http://exslt.org/dates-and-times}year
{http://icl.com/saxon}evaluate
{http://exslt.org/math}log
{http://exslt.org/dates-and-times}add
{http://exslt.org/dates-and-times}day-abbreviation
{http://icl.com/saxon}line-number
{http://exslt.org/math}constant
{http://exslt.org/sets}difference
{http://exslt.org/dates-and-times}duration
{http://exslt.org/dates-and-times}minute-in-hour
{http://icl.com/saxon}eval
{http://exslt.org/math}min
{http://exslt.org/math}max
{http://exslt.org/math}highest
{http://exslt.org/math}random
{http://exslt.org/math}sqrt
{http://exslt.org/math}cos
{http://exslt.org/sets}has-same-node
{http://exslt.org/strings}tokenize
{http://exslt.org/dates-and-times}seconds
{http://exslt.org/dates-and-times}time
{http://exslt.org/dynamic}evaluate
{http://exslt.org/common}node-set
{http://exslt.org/dates-and-times}month-name
{http://exslt.org/dates-and-times}week-in-year
{http://exslt.org/math}acos
{http://exslt.org/sets}intersection
{http://exslt.org/sets}leading
{http://exslt.org/sets}trailing
{http://exslt.org/strings}replace
{http://exslt.org/dates-and-times}day-in-year
{http://icl.com/saxon}expression
{http://exslt.org/math}abs
{http://exslt.org/math}sin
{http://exslt.org/math}asin
{http://exslt.org/math}atan2
{http://exslt.org/sets}distinct
{http://exslt.org/dates-and-times}hour-in-day
{http://exslt.org/dates-and-times}sum
{http://exslt.org/dates-and-times}week-in-month
{http://exslt.org/strings}split
{http://exslt.org/strings}padding
{http://exslt.org/strings}align
{http://exslt.org/dates-and-times}day-in-month
{http://exslt.org/dates-and-times}day-of-week-in-month
{http://exslt.org/dates-and-times}month-in-year
{http://xmlsoft.org/XSLT/}test

Registered Extension Elements:
{http://exslt.org/common}document
{http://exslt.org/functions}result
{http://xmlsoft.org/XSLT/}test

Registered Extension Modules:
http://exslt.org/functions
http://icl.com/saxon
http://xmlsoft.org/XSLT/
Look at the stylesheet to see how to access these extensions. I am using the strings extension to help replace characters that are not JSON compliant.
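For readers more at home in Go than XSLT, the effect of the cleanText template can be sketched as a small Go function. This is only an illustration of what the template does to a string (it drops double quotes, backslashes, and newlines); the Go function below is not in the sample repository, and the variable name jsonUnsafe is made up for this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// jsonUnsafe removes the same three characters the cleanText template
// strips: double quotes, backslashes, and newlines.
var jsonUnsafe = strings.NewReplacer(`"`, "", `\`, "", "\n", "")

// cleanText mirrors the XSLT template of the same name.
func cleanText(s string) string {
	return jsonUnsafe.Replace(s)
}

func main() {
	fmt.Println(cleanText("Learn \"Kickboxing\"\nToday")) // prints: Learn KickboxingToday
}
```

Doing the cleanup in the stylesheet rather than in Go keeps the feed-specific rules out of the compiled program, which is the point of the design described above.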
Now let's look at the sample code that uses xsltproc to process the XML against the XSLT stylesheet:
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"os/exec"
)

// document holds one deal decoded from the transformed JSON.
type document map[string]interface{}

func main() {
	jsonData, err := processXslt("stylesheet.xslt", "deals.xml")
	if err != nil {
		fmt.Printf("ProcessXslt: %s\n", err)
		os.Exit(1)
	}

	documents := struct {
		Deals []document `json:"deals"`
	}{}

	err = json.Unmarshal(jsonData, &documents)
	if err != nil {
		fmt.Printf("Unmarshal: %s\n", err)
		os.Exit(1)
	}

	fmt.Printf("Deals: %d\n\n", len(documents.Deals))

	for _, deal := range documents.Deals {
		// JSON numbers decode to float64 when the target is interface{}.
		fmt.Printf("DealId: %d\n", int(deal["dealid"].(float64)))
		fmt.Printf("Title: %s\n\n", deal["title"].(string))
	}
}

// processXslt runs xsltproc against the stylesheet and XML file and
// returns the JSON that xsltproc writes to stdout.
func processXslt(xslFile string, xmlFile string) ([]byte, error) {
	// exec.Command resolves xsltproc through PATH. A bare Path of
	// "xsltproc" in an exec.Cmd literal is only looked up relative to
	// the working directory.
	cmd := exec.Command("xsltproc", xslFile, xmlFile)

	// Output runs the command and returns its stdout as a []byte.
	jsonData, err := cmd.Output()
	if err != nil {
		return nil, err
	}

	fmt.Printf("%s\n", jsonData)
	return jsonData, nil
}
The processXslt function uses an exec.Cmd object to shell out and run the xsltproc program. The key to making this work is the cmd.Output function. The xsltproc program writes the result of the transformation to stdout, so we only need to write the XML and XSLT files to disk before running xsltproc. The cmd.Output call then returns the result as a slice of bytes.
Once the processXslt function has the resulting JSON from xsltproc, it displays the JSON on the screen and returns the slice of bytes for further processing.
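The sample reads files that already exist on disk. For a feed that arrives as bytes in memory, the write-to-disk step might look like the sketch below. The function name transform and the temp file pattern are assumptions made for illustration; os.CreateTemp requires Go 1.16 or later. The sketch also surfaces whatever xsltproc wrote to stderr, which cmd.Output stores in the exec.ExitError when the process exits with a non-zero status.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
)

// transform is a hypothetical helper: it writes xmlData to a temporary
// file, runs xsltproc with the given stylesheet, and returns whatever
// xsltproc wrote to stdout.
func transform(xslFile string, xmlData []byte) ([]byte, error) {
	tmp, err := os.CreateTemp("", "feed-*.xml")
	if err != nil {
		return nil, err
	}
	defer os.Remove(tmp.Name())

	if _, err := tmp.Write(xmlData); err != nil {
		tmp.Close()
		return nil, err
	}
	if err := tmp.Close(); err != nil {
		return nil, err
	}

	out, err := exec.Command("xsltproc", xslFile, tmp.Name()).Output()
	if err != nil {
		// On a non-zero exit, Output keeps the captured stderr in ExitError.
		var exitErr *exec.ExitError
		if errors.As(err, &exitErr) {
			return nil, fmt.Errorf("xsltproc: %w: %s", err, exitErr.Stderr)
		}
		return nil, fmt.Errorf("xsltproc: %w", err)
	}
	return out, nil
}

func main() {
	out, err := transform("stylesheet.xslt", []byte("<root/>"))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%s\n", out)
}
```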
In main, after the call to the processXslt function, the slice of bytes containing the JSON is unmarshalled into a struct holding a slice of maps, one map per deal, so it can be consumed by our Go program and displayed on the screen. In the future, those maps can be stored in MongoDB via the mgo MongoDB driver.
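If the required field names are known ahead of time, one alternative to the map and its type assertions is to decode into a typed struct and reject documents that are missing a field. This is a sketch under assumptions: the struct covers only field names that appear in the stylesheet excerpt and in main.go, the decodeDeals name is hypothetical, and the sample JSON in main is made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// deal mirrors a few of the fields the stylesheet emits.
type deal struct {
	DealID     int      `json:"dealid"`
	Feed       string   `json:"feed"`
	Title      string   `json:"title"`
	EndDate    string   `json:"end_date"`
	Categories []string `json:"categories"`
}

// decodeDeals unmarshals the transformed JSON and checks that every
// deal carries a dealid and a title.
func decodeDeals(jsonData []byte) ([]deal, error) {
	var doc struct {
		Deals []deal `json:"deals"`
	}
	if err := json.Unmarshal(jsonData, &doc); err != nil {
		return nil, err
	}
	for i, d := range doc.Deals {
		if d.DealID == 0 || d.Title == "" {
			return nil, fmt.Errorf("deal %d: missing dealid or title", i)
		}
	}
	return doc.Deals, nil
}

func main() {
	// Made-up sample in the shape the stylesheet produces.
	sample := []byte(`{"deals": [{"dealid": 7, "feed": "Yipit", "title": "Kickboxing", "categories": ["fitness-classes"]}]}`)
	deals, err := decodeDeals(sample)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(len(deals), deals[0].Title) // prints: 1 Kickboxing
}
```

The trade-off is that a struct has to be kept in sync with the stylesheets, while the map in the sample program accepts whatever fields a new feed produces.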
The xsltproc program can be uploaded to any cloud environment that will allow you to write the XML and XSLT to disk. I have been successful in using xsltproc inside an Iron.io IronWorker container.
If you have the need to process XSLT in your Go programs, give this a try.