Suggestion for the content monitoring plugin

If you’re using this plugin, add in the term askaak.net to its filter. It’s one of those “Your computer is infected, download our virus protection which is really malware” sites.

Edit: I won’t give you links, but all they’re doing is putting up offsite animated GIFs that look like YouTube videos and encouraging folks to click on those pretty-girl images. They get around kses because it’s nothing but a regular ‘img’ tag wrapped in an ‘a’ tag, and since the image is hosted offsite, the GIF stays animated.

  • drmike
    • DEV MAN’s Mascot

    For those who do not allow any advertising by end users, here are the domains that cj.com uses for its links:

    anrdoezrs.net

    dpbolvw.net

    kqzyfj.com

    tkqlhce.com

    jdoqocy.com

    I *think* those are spelled correctly.

    We’re also seeing a lot of MLM splogs with team41.com included in the content.
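For anyone who wants to check old posts against the domains above by hand, here is a minimal sketch. The function name find_blocked_domains and the sample post are made up for illustration; this is not the plugin's own code, just a plain case-insensitive substring check.

```php
<?php
// Hypothetical helper, not part of the plugin: returns the listed domains
// that appear anywhere in the given post content (case-insensitive).
function find_blocked_domains($content, array $domains)
{
    $found = array();
    foreach ($domains as $domain) {
        // stripos() returns false when the domain is absent.
        if (stripos($content, $domain) !== false) {
            $found[] = $domain;
        }
    }
    return $found;
}

// Domains copied from the list above.
$cj_domains = array('anrdoezrs.net', 'dpbolvw.net', 'kqzyfj.com', 'tkqlhce.com', 'jdoqocy.com');

// Made-up sample content: an offsite image wrapped in an affiliate link.
$sample_post = '<a href="http://www.DPBOLVW.net/click-123"><img src="http://example.com/banner.gif"></a>';

print_r(find_blocked_domains($sample_post, $cj_domains));
```

Because it is a substring check, it also catches the domain inside an href attribute, which is where these links usually sit.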

  • Adam W. Warner
    • Site Builder, Child of Zeus

    I have this plugin installed and originally I had defined 8 terms. The first seven were George Carlin’s “seven words you can’t say on television,” and the eighth was my URL, http://www.mysite.com.

    I have two people beta testing for me, and the filter was sending me an email every time a post was made or a page was created. I figured this was due to having my url in there, so I removed that and left only the seven words. However, I still get an email every time my testers add content, and I look every time, but do not find any “bad” words.

    I looked in the database for a table for this plugin, but couldn’t find one. Can anyone verify the table I should look in to see if my url is still in there by mistake, or offer any other solutions or thoughts?

  • drmike
    • DEV MAN’s Mascot

    It’s saved as the option ‘content_monitor_bad_words’ within the sitemeta table. At least that’s what I see in the plugin. I’m not 100% sure, though, as I’m at the college that locks me out of the non-standard ports and I can’t log into cPanel or DirectAdmin to check.

    Make sure you don’t have any empty spaces at the end of your list. We’ve hit that problem in the past.
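One way to rule out stray spaces or a trailing comma is to clean the list before it is used. The sketch below assumes the words are stored as a single comma-separated string (as in the "wordpress, elvis, thelinkup" example later in this thread); the function name normalize_bad_words is hypothetical and not something the plugin provides.

```php
<?php
// Hypothetical helper, not part of the plugin. Assumes the list is stored
// as one comma-separated string.
function normalize_bad_words($raw)
{
    // Split on commas, then trim whitespace and lowercase each term.
    $terms = array_map(function ($term) {
        return strtolower(trim($term));
    }, explode(',', $raw));

    // Drop terms that are empty after trimming (e.g. from a trailing comma).
    $terms = array_filter($terms, function ($term) {
        return $term !== '';
    });

    // Remove duplicates and reindex from zero.
    return array_values(array_unique($terms));
}

print_r(normalize_bad_words("wordpress, elvis, thelinkup, "));
```

With this in place, a list saved with a trailing comma or space cannot end up containing an empty term of its own.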

  • Adam W. Warner
    • Site Builder, Child of Zeus

    …still sending content notifications after verifying the disallowed words do not include my url anymore and that there are no empty spaces (as far as I can tell). I made a quick test post on a user blog and received a content notification email.

    I think I’ll remove the plugin, drop the content_monitor_bad_words option, reinstall, and see what happens.

  • Adam W. Warner
    • Site Builder, Child of Zeus

    OK, I’ve done some more testing, and here are the results:

    I removed the content monitor plugin.

    I removed the content_monitor_post_monitoring, content_monitor_email, and content_monitor_bad_words records from the wp_sitemeta table.

    I reinstalled the plugin.

    Configured the bad words (just bad words, no url).

    I set the option, Post/Page Monitoring::, to “Disable”

    Checked the wp_sitemeta table. It contained the content_monitor_post_monitoring, content_monitor_email, and content_monitor_bad_words records.

    Made a post on a user blog that contained no bad words…..no email….great.

    Made a post that contained a bad word….no email….expected because of Post/Page Monitoring:: being disabled.

    I then set Post/Page Monitoring:: to enable.

    Made a post with no bad word….received email warning of possible bad word….hmmmmm.

    Then it gets even more weird…

    Made a post with a bad word, then I received TWO emails warning of possible bad word. One from this url…http://site.example.com/?p=9, and one from this url…http://site.example.com/2008/06/03/another-test-4/

    …something definitely seems to be working incorrectly.

    Can anyone else verify the same?

  • drmike
    • DEV MAN’s Mascot

    OK, finally got a chance to look at this and I’m getting emails as well even though the content is not in the list.

    One thing I noticed though is that when the content contains words on the monitoring list, it sends the correct url. When the post doesn’t contain any of the words, it sends the old style url within the email.

    example:

    Post with monitored content: http://testuser.mydomain.tld/2008/06/09/test-post-1/

    Post without monitored content: http://testuser.mydomain.tld/?p=27

    I also noticed that posts with monitored content get a second email with the non pretty url along with the email with the pretty url.

    It’s weird.

  • drmike
    • DEV MAN’s Mascot

    OK, here’s where it gets really interesting. I added code to the email notice that shows how many flagged words there are. I’m using the following content:

    Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry’s standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.

    Flagged words are “wordpress, elvis, thelinkup”

    Making a post with just the content there gets me 5 words marked.

    Going back and adding in those words once gets me a count of three which would be right.

    So the error appears to be a false count.

    Hmmmm…..

  • drmike
    • DEV MAN’s Mascot

    As noted above, it’s those false positives. You can try the example code I linked to above. If you write the title of a post and then click on the text area for the post, you’ll see what gets kicked out on the right-hand side.

  • omar.ramos
    • Flash Drive

    I’ve been playing around with this issue for the past few hours and something I’ve noticed is that the script is unable to decipher monitored words at the end of a line and at the beginning of the next if there is a new line in between them.

    So for example:

    The fox jumped over the BAD_WORD1

    BAD_WORD2 some more text

    Will not send me an email, but:

    The fox jumped over the BAD_WORD1 BAD_WORD2 some more text

    Would send me an accurate email (marking two words as bad).

    I’ve also modified the script a little bit to include the Count of Bad Words Found, List of Bad Words Found, and the Blog Post Content.

    What’s been really annoying is trying to remove the newlines so that the script will be able to interpret everything accurately (my feeling is that they are what keeps the script from identifying those words as “bad”).
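For debugging, a helper that returns both the count and the list of matched words makes it easier to see what is being flagged. This is only a sketch under the assumption that matching should be whole-word and case-insensitive; find_bad_words is a made-up name, and the plugin's real matching logic may differ.

```php
<?php
// Hypothetical helper (not the plugin's code): reports which listed words
// occur in the content, assuming whole-word, case-insensitive matching
// against a lowercase word list.
function find_bad_words($content, array $bad_words)
{
    // Split on every run of characters other than lowercase letters,
    // digits and hyphens. Spaces, punctuation, carriage returns and
    // newlines therefore all act as word separators.
    $tokens = preg_split('/[^a-z0-9-]+/', strtolower($content), -1, PREG_SPLIT_NO_EMPTY);

    // Keep every token that is on the list (repeats are counted each time).
    $found = array_values(array_intersect($tokens, $bad_words));

    return array('count' => count($found), 'words' => $found);
}

$word_list = array('wordpress', 'elvis', 'thelinkup');

// Two listed words separated by a blank line, as in the example above.
$two_line_post = "The fox jumped over the elvis\r\n\r\nwordpress some more text";

print_r(find_bad_words($two_line_post, $word_list));
```

Because line breaks are treated as separators here, a word at the end of one line and another at the start of the next are still seen as two separate words.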

  • omar.ramos
    • Flash Drive

    I love it when you’re writing a post and get a small idea, then you go try it out a few seconds after your last post and it makes everything work.

    So, I changed the preg_replace line from:

    $tmp_post_title = preg_replace('/[^a-zA-Z0-9-\s]/', '', $tmp_post_title);

    to:

    $tmp_post_title = preg_replace('/[^a-zA-Z0-9-]/', '', $tmp_post_title);

    So now the script removes all of the space characters (which for some magical reason takes care of those newlines too) and now I’m getting pretty much perfect results (no false positives, but no missed bad words either).

    If anybody would like to try it out they can email me ([email protected]) and then post their results here so that the big wigs can be assured it’s working alright :).

  • omar.ramos
    • Flash Drive

    OK, I lied…it was only because I was making the newlines in HTML mode rather than visual mode…plus my previous post doesn’t even make any sense because I made the change to the post_title variable rather than the post_content one. I’ll keep on trying :).

  • omar.ramos
    • Flash Drive

    OK, here’s one that should work:

    Find these two lines:

    $tmp_post_content = strip_tags($tmp_post_content);

    $tmp_post_content = preg_replace('/[^a-zA-Z0-9-\s]/', '', $tmp_post_content);

    Add a preg_replace for the line breaks between them, so that the block reads:

    $tmp_post_content = strip_tags($tmp_post_content);

    $tmp_post_content = preg_replace("/(\r\n)+/", ' ', $tmp_post_content);

    $tmp_post_content = preg_replace('/[^a-zA-Z0-9-\s]/', '', $tmp_post_content);

    That should remove the line breaks (text areas from what I’ve read add both the return and newline characters). The + sign gets rid of any consecutive occurrences (like you get when you press enter and get a new paragraph in WordPress, so two linebreaks are actually added).

    Just to note, I haven’t commented out the line that drmike said was causing the trouble. I’ll go ahead and test a bit more, but this seems like a solution that works.
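The three steps can be wrapped in a small function to try them outside the plugin. The sketch below is a variation rather than the exact lines above: the function name normalize_post_content is made up, the pattern also accepts a lone \r or \n (an assumption, in case some content only uses one of them), and the hyphen is moved to the end of the character class, where it is always read as a literal.

```php
<?php
// Hypothetical wrapper around the three steps, for testing outside the plugin.
function normalize_post_content($content)
{
    // 1. Remove HTML tags.
    $content = strip_tags($content);

    // 2. Replace each run of line breaks (\r\n, \r or \n) with one space.
    //    The double-quoted string turns \r and \n into the real characters.
    $content = preg_replace("/(\r\n|\r|\n)+/", ' ', $content);

    // 3. Drop everything except letters, digits, whitespace and hyphens.
    $content = preg_replace('/[^a-zA-Z0-9\s-]/', '', $content);

    return $content;
}

// Two paragraphs separated by a blank line, as a textarea might submit them.
$raw_post = "<p>The fox jumped over the elvis</p>\r\n\r\n<p>wordpress some more text!</p>";

echo normalize_post_content($raw_post), "\n";
```

The point of step 2 is that the words on either side of a paragraph break end up separated by an ordinary space before the last step runs.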