
NaughtyWordChars

Posted January 5, 2006 | 05:00 PM (EST)




NaughtyWordChars is a plugin that strips malformed characters out of posts pasted in from Microsoft Word documents. It operates via a pre-save callback on MT::Entry, which cleans out the offending characters before they hit the DB. For older entries, there is an edit_entry param callback that cleans up the characters when they are loaded in the CMS.

Naughty Word Chars config

NaughtyWordChars detects double quotes, single quotes, ellipses, em dashes, and en dashes. They can be replaced with either ASCII equivalents or HTML character entities. Detection can be toggled separately for an entry's title, entry body, extended entry, keywords, and excerpt. The pre-save and edit_entry callbacks can each be turned on or off, though the edit_entry callback requires BigPAPI.
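For a rough idea of what that replacement step amounts to, here is a minimal sketch in Python. It is an illustration only, not the plugin's code: the names `REPLACEMENTS`, `clean_text`, and `clean_entry` are made up for this example, the field names are assumed, and the ASCII stand-ins chosen for each character (for instance `--` for an em dash) are one plausible choice rather than what the plugin necessarily emits.

```python
# Illustrative sketch only: a hypothetical mapping, not the plugin's actual table.
# Each Word-style character maps to (ASCII equivalent, HTML character entity).
REPLACEMENTS = {
    "\u201c": ('"', "&ldquo;"),     # left double quote
    "\u201d": ('"', "&rdquo;"),     # right double quote
    "\u2018": ("'", "&lsquo;"),     # left single quote
    "\u2019": ("'", "&rsquo;"),     # right single quote
    "\u2026": ("...", "&hellip;"),  # ellipsis
    "\u2014": ("--", "&mdash;"),    # em dash
    "\u2013": ("-", "&ndash;"),     # en dash
}


def clean_text(text, mode="ascii"):
    """Replace the characters above with ASCII or HTML entity equivalents."""
    if mode not in ("ascii", "entity"):
        raise ValueError("mode must be 'ascii' or 'entity'")
    index = 0 if mode == "ascii" else 1
    # str.translate accepts a dict of code point -> replacement string.
    table = {ord(char): pair[index] for char, pair in REPLACEMENTS.items()}
    return text.translate(table)


def clean_entry(entry, fields=("title", "text", "text_more", "keywords", "excerpt"),
                mode="ascii"):
    """Return a copy of `entry` (a plain dict here) with the chosen fields cleaned.

    The field names are assumptions standing in for title, body, extended
    entry, keywords, and excerpt; leaving a name out of `fields` mimics
    toggling detection off for that field.
    """
    cleaned = dict(entry)
    for field in fields:
        if isinstance(cleaned.get(field), str):
            cleaned[field] = clean_text(cleaned[field], mode)
    return cleaned
```

Calling `clean_entry(entry, fields=("title",), mode="entity")` would rewrite only the title and leave every other field untouched, which is the per-field toggle described above in miniature.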

Here at HuffPost we're using this on MT 3.2 with Apache + mod_cgi + MySQL. I've done some testing with lighttpd + FastCGI, but I'm not confident that it's production-ready in that environment. Any feedback on that (or anything else) would be very welcome.

Anyhow, enough description. Here it is.

Update 1/6: I've only run this under UTF-8 ... there's a pretty good chance it will not work with other encodings.

Update 1/6: The pre-save callback now has priority 10 rather than 11.

Update 1/17: I've added a few features. You can see more info here; the link above goes to the most recent version of the plugin.

Update 1/30: The clash with BetterFileUpload should be fixed now. More info here.

Update 2/7: Another update here.

Update 10/8: This plugin is now 3.2 and 3.3 compatible.
