Stripping an Atom RSS Feed

By Paul Heinlein | Sep 10, 2014

At work, we’ve got an internal-only blog for some upcoming special projects. Our experience, however, is that blogs get ignored without some external notification system like RSS. The problem is that some of our employees use public aggregators like Feedly, which cannot see behind our firewall.

Our internal blog generates an RSS feed, but the feed includes <content> and <summary> sections that we really don’t want published in a public RSS feed.

My solution is to copy the feed to a public web host, but massage it with XSLT to remove the sensitive sections.

An Atom-based RSS feed is standard XML, albeit with a defined namespace:

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Our Internal Blog</title>
  <link href="/atom.xml" rel="self"/>
  <link href="http://blog.company.com/"/>
  <updated>2014-09-10T16:29:51.335Z</updated>
  <id>http://blog.company.com/</id>
  <entry>
    <title>Nifty Blog Post</title>
    <link href="http://blog.company.com/2014/09/08/nifty-post/"/>
    <id>http://blog.company.com/2014/09/08/nifty-post/</id>
    <published>2014-09-08T21:40:00.000Z</published>
    <updated>2014-09-08T23:40:32.000Z</updated>
    <content>Stuff we want removed from public view...</content>
    <summary>Short version of stuff we want removed...</summary>
  </entry>
</feed>
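
To make the namespace point concrete, here is a minimal sketch in Python using only the standard library's xml.etree.ElementTree. It is an illustration rather than part of the XSLT setup described here, and the sample data is a trimmed copy of the feed above. A lookup by bare element name finds nothing, because every element in the feed belongs to the Atom namespace:

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# Trimmed copy of the sample feed above (illustrative data only).
SAMPLE_FEED = """<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Our Internal Blog</title>
  <entry>
    <title>Nifty Blog Post</title>
    <content>Stuff we want removed from public view...</content>
    <summary>Short version of stuff we want removed...</summary>
  </entry>
</feed>
"""

root = ET.fromstring(SAMPLE_FEED.encode("utf-8"))

# The root tag carries the namespace in Clark notation: {uri}localname.
root_tag = root.tag

# A bare name matches nothing: no element is called plain "entry".
bare_match = root.find("entry")

# Qualifying the name with a prefix mapped to the Atom URI does match.
qualified_match = root.find("feed:entry", {"feed": ATOM_NS})

print(root_tag)                      # {http://www.w3.org/2005/Atom}feed
print(bare_match)                    # None
print(qualified_match is not None)   # True
```

The prefix itself ("feed" here) is arbitrary; only the namespace URI it maps to has to match the one declared in the document.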

It took me a while to figure out the namespace issues, but here’s the XSL stylesheet that works for me:

<?xml version="1.0"?>
<xsl:stylesheet
  version='1.0'
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:feed="http://www.w3.org/2005/Atom">
  <!-- strip content and summary elements -->
  <xsl:template match="feed:content"/>
  <xsl:template match="feed:summary"/>
  <!-- print copy of everything else -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
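
For comparison, the same "copy everything except two elements" idea can be sketched without XSLT. The following is not the method used here, just a rough equivalent in Python's standard xml.etree.ElementTree; the function name strip_feed and the sample data are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Fully qualified (Clark notation) names of the elements to drop.
STRIPPED = {f"{{{ATOM_NS}}}content", f"{{{ATOM_NS}}}summary"}


def strip_feed(feed_xml: bytes) -> bytes:
    """Return the feed with every Atom <content> and <summary> removed."""
    # Serialize Atom as the default namespace so output tags stay unprefixed.
    # Note: this registration is global to the ElementTree module.
    ET.register_namespace("", ATOM_NS)
    root = ET.fromstring(feed_xml)
    # Snapshot the elements first so removals don't disturb the traversal.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in STRIPPED:
                parent.remove(child)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


# Illustrative input modeled on the sample feed above.
sample = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Our Internal Blog</title>
  <entry>
    <title>Nifty Blog Post</title>
    <link href="http://blog.company.com/2014/09/08/nifty-post/"/>
    <content>Stuff we want removed from public view...</content>
    <summary>Short version of stuff we want removed...</summary>
  </entry>
</feed>"""

public_feed = strip_feed(sample).decode("utf-8")
print(public_feed)
```

As with the stylesheet, the element names have to be namespace-qualified to match anything. A check that the output contains neither element is a cheap safeguard before copying the result to a public host.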

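Note that I can’t get rid of the feed: namespace tokens in the template declarations simply by making Atom the default namespace for the stylesheet. In XSLT 1.0, an unprefixed name in a match pattern always refers to an element in no namespace, regardless of any default xmlns declaration, so the templates would silently match nothing. An XSLT 2.0 processor does offer the xpath-default-namespace attribute for this purpose; the opening declaration would change a bit:

<xsl:stylesheet
  version='2.0'
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xpath-default-namespace="http://www.w3.org/2005/Atom">

I keep the namespace explicit, however, both so the stylesheet keeps working with XSLT 1.0 processors and to remind myself what I’d done when I review this stylesheet later.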

The end result of the transformation is a feed that lists largely harmless titles and URLs—so people can be notified of updates—but without leaking any potentially sensitive content.