RSS Teaser Field
March 22, 2006
Since developing my own blog/content management system/RSS feeds, I have learned a lot about how to manage this site and make it more efficient. I've created a backend system which, in essence, generates static HTML pages from data pulled from a MySQL database. All this is done using PHP. I'm proud of the work, and the traffic, lower Alexa ranking, and higher Google PageRank make the rewards that much sweeter.
One of the drawbacks to creating my own system, however, has been working with the RSS feeds. Part of the administrative interface to this site is a "create archive" script. This single PHP file is responsible for creating everything you see on the front-end. After I click the link to generate the archive, a process goes on behind the scenes that deletes every HTML file associated with the blog and generates new HTML files based on what exists in the blog database. Included with that is the creation of the RSS feeds.
Creating an RSS feed really is pretty straightforward. It basically consists of an XML file with some predefined tags. There are many different XML tags you can use, but I've incorporated the basics such as a post title, description, date, etc. Once the data for each post is pulled from the database, it is simply a matter of inserting the appropriate data between two tags and saving the entire string to an XML file.
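To make the "insert the data between two tags" step concrete, here is a minimal PHP sketch. It is an illustration under assumptions rather than the actual "create archive" script: the function names, the sample post, and the example.com URLs are all made up, and the posts array stands in for rows pulled from the database.

```php
<?php
// Hypothetical post rows, standing in for what a database query would return.
$posts = [
    [
        'title'       => 'RSS Teaser Field',
        'link'        => 'http://example.com/blog/rss-teaser-field.html',
        'description' => 'Why I added a teaser field to my blog entries.',
        'timestamp'   => 1143028800, // 22 March 2006, 12:00 UTC
    ],
];

// Escape the characters that are significant in XML (&, <, >, quotes).
function escape_xml(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

// Build an RSS 2.0 document as one string, placing each value between its tags.
function build_feed(string $title, string $link, string $about, array $posts): string
{
    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<rss version="2.0">' . "\n<channel>\n";
    $xml .= '<title>' . escape_xml($title) . "</title>\n";
    $xml .= '<link>' . escape_xml($link) . "</link>\n";
    $xml .= '<description>' . escape_xml($about) . "</description>\n";

    foreach ($posts as $post) {
        $xml .= "<item>\n";
        $xml .= '<title>' . escape_xml($post['title']) . "</title>\n";
        $xml .= '<link>' . escape_xml($post['link']) . "</link>\n";
        $xml .= '<description>' . escape_xml($post['description']) . "</description>\n";
        // RSS 2.0 expects RFC 822 style dates; gmdate() formats the timestamp in UTC.
        $xml .= '<pubDate>' . gmdate('D, d M Y H:i:s O', $post['timestamp']) . "</pubDate>\n";
        $xml .= "</item>\n";
    }

    return $xml . "</channel>\n</rss>\n";
}

$feed = build_feed('My Blog', 'http://example.com/', 'Recent posts', $posts);
```

Saving the result is then a single call such as file_put_contents('rss.xml', $feed), where the file name is again only an example.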
When I first started generating RSS feeds, I only incorporated text. Specifically, I would go to the description of each post and use the first 150 characters in that post for the feed. More often than not the description would get truncated in the middle of a word, so I would add "..." after the last character so people knew the cut was deliberate and not an accident. I then increased the RSS description to somewhere around 500 characters, and that is when I started to run into problems.
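The truncation itself can be sketched in a few lines of PHP. The function name is hypothetical, and the sketch assumes single-byte text, since substr() counts bytes rather than characters.

```php
<?php
// Cut text to $limit bytes and append "..." only when something was removed.
// Assumes single-byte text; multibyte text would need a character-aware cut.
function truncate_description(string $text, int $limit = 150): string
{
    if (strlen($text) <= $limit) {
        return $text; // short enough already, so no ellipsis
    }
    return substr($text, 0, $limit) . '...';
}

$short = truncate_description('A short post.');
$long  = truncate_description(str_repeat('word ', 100));
```

Here $short comes back unchanged, while $long is cut to 150 characters followed by the three dots.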
Every once in a while, I would create a hyperlink inside the description of a post and it would get cut off by the truncation limit. The <a href> tag would start, but there would be no closing </a> tag! As a result, every post before that one would have all of the text in the RSS description appear as one huge hyperlink. I don't know why; I would have thought that the closing XML description tag would "close" that post and start the next one as a new post, but it did not. The same thing would happen if I used a bold tag and the description got truncated before the ending bold tag. Every post before that would have its RSS description come through entirely in bold.
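The failure is easy to reproduce. The PHP sketch below cuts a description in the middle of a link and then counts opening and closing tags. The is_balanced() helper and the sample text are hypothetical, and the check is deliberately rough: it is not an HTML parser, and it only notices a cut that falls between complete tags, not one that lands inside a tag.

```php
<?php
// Compare the number of opening and closing tags of one kind in a fragment.
function is_balanced(string $html, string $tag): bool
{
    $name   = preg_quote($tag, '/');
    $opens  = preg_match_all('/<' . $name . '\b[^>]*>/i', $html);
    $closes = preg_match_all('/<\/' . $name . '\s*>/i', $html);
    return $opens === $closes;
}

$description = 'Read <a href="http://example.com/">this article</a> for the details.';

// Cutting at 40 characters keeps the opening <a ...> but drops the closing </a>.
$cut = substr($description, 0, 40) . '...';

$fullIsBalanced = is_balanced($description, 'a'); // true
$cutIsBalanced  = is_balanced($cut, 'a');         // false
```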
Then I decided to start adding images to the feeds, and I really ran into problems. My feeds were coming up as invalid because of the img tags. I didn't really know what the problem was, but I had already signed up with FeedBurner, and with the help of their excellent technical support I was able to fix it. I had to run my RSS description fields through the PHP htmlentities function to convert all special characters to HTML entities so they would not be read as start or end tags inside the RSS XML file.
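A small PHP sketch of that escaping step follows. The sample description is made up, and the flags passed to htmlentities() are an assumption, since the exact call used on this site is not shown here.

```php
<?php
// A description containing markup and an ampersand, as it might come from the database.
$description = '<img src="http://example.com/photo.jpg" alt="Photo"> Tom & Jerry';

// Convert special characters to entities so the XML parser sees text, not tags.
$escaped = htmlentities($description, ENT_QUOTES, 'UTF-8');

// $escaped is now:
// &lt;img src=&quot;http://example.com/photo.jpg&quot; alt=&quot;Photo&quot;&gt; Tom &amp; Jerry
$item = '<description>' . $escaped . '</description>';
```

One caveat worth knowing: htmlentities() also turns accented letters into named entities such as &eacute;, which plain XML does not define. If a feed contains such characters, htmlspecialchars() is a possible alternative because it escapes only the characters that are significant in markup.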
Problem solved! Almost... I was still having issues with the href tags and any character formatting tags (bold, italics, etc.) whose end tags fell past the 500-character mark and were lost to truncation. So, my solution was to add an "RSS teaser" field to my blog entries table. This way I could specify EXACTLY which text should be presented in the RSS description field to avoid any problems with missing HTML end tags. It was really quite simple, just a matter of adding one field to the entries table, adding a text box to the entries add/edit pages, and slightly editing my PHP scripts which handle the entries add/edit functions.
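In code, the change might look something like the PHP sketch below. The column name rss_teaser, the body key, and the feed_description() helper are hypothetical. The fallback for entries without a teaser (strip the tags, then truncate) is an added assumption for older posts, not something the site necessarily does.

```php
<?php
// Hypothetical schema change; the column name and type are assumptions.
$alterSql = 'ALTER TABLE entries ADD COLUMN rss_teaser TEXT NULL';

// Prefer the hand-written teaser. When it is empty, fall back to tag-free,
// truncated body text so that no HTML tag can be left open.
function feed_description(array $entry, int $limit = 500): string
{
    $teaser = trim($entry['rss_teaser'] ?? '');
    if ($teaser !== '') {
        return $teaser;
    }

    $plain = trim(strip_tags($entry['body']));
    return strlen($plain) > $limit ? substr($plain, 0, $limit) . '...' : $plain;
}

$withTeaser    = feed_description(['rss_teaser' => ' Hand-written summary. ', 'body' => '<b>Full</b> post']);
$withoutTeaser = feed_description(['rss_teaser' => null, 'body' => '<b>Hello</b> world']);
```

With a teaser present, the feed gets exactly the text that was typed into the new box. Without one, the sketch returns plain text only.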
Now the problem has finally been solved. My RSS feeds are completely valid and I'm not having issues with truncated HTML end tags.