Code

Four quick PHP filters to reduce contact form spam.

I maintain the code for a fairly popular, if localized, blog about the Hintonburg area called Miss Vicky's Offhand Remarks. It's been around for over 5 years now. It has a contact form to allow neighbourhood residents, and anyone else, to send in tips and requests. I've resisted attaching a captcha to it, as I find them annoying. As a result, we get occasional waves of bot spam. I have found that by studying the spam, I have been able to block most of the seriously egregious scripts out there. I've pulled out (and simplified) the actual code I use, so these snippets aren't going to work as is, but they should be enough to illustrate the methodology. First, I filter the referrer to ensure any form posted on the site appears to come from my server. Yes, this is easily faked, but it does cut down on a surprising number of poorly written (lazy) scripts.

...
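if ($_SERVER['REQUEST_METHOD'] == 'POST') {
//build a pattern matching this server's own scheme and host name
$srv_rx = '/^http';
$srv_rx .= (!empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] != 'off') ? 's' : '';
$srv_rx .= ':\/\/' . preg_quote($_SERVER['SERVER_NAME'], '/') . '/';
//the header may be missing entirely, so default to an empty string
$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
if (!preg_match($srv_rx, $referer)) {
//should track this, since it's probably a hacker/script
//instead, i will simply bail out.
$action = 'return';
}
}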
...
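
For anyone who wants to play with the idea outside PHP, here is a minimal JavaScript sketch of the same referrer test. The function name refererMatchesHost and its three parameters are made up for illustration. It is also slightly stricter than the snippet above, since it assumes the host name must be followed by a port, a slash, or the end of the string.

```javascript
// Hypothetical helper: true when the Referer value starts with this site's own origin.
function refererMatchesHost(referer, host, isHttps) {
  if (!referer) return false; // missing header: treat as suspicious
  const scheme = isHttps ? "https" : "http";
  // escape regex metacharacters so "." in the host matches a literal dot
  const escapedHost = host.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // assumption: the host must end at a port, a path, or the end of the string
  const pattern = new RegExp("^" + scheme + "://" + escapedHost + "(?::\\d+)?(?:/|$)");
  return pattern.test(referer);
}

console.log(refererMatchesHost("http://example.com/contact", "example.com", false));
console.log(refererMatchesHost("http://example.com.evil.test/", "example.com", false));
```

The first call should pass and the second should fail, because the look-alike host continues past the real name.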

Then I do three contact-form-specific checks. The first thing I look for is an inordinate number of links. If more than half of the text is URLs, I throw it back. Again, easily worked around, but this seems to catch most scripts. I do let a legitimate user know that their message has failed to send, in case they want to reformat the message. Again, most scripts don't really care if you've returned anything, so I'm not giving away a trade secret here.

...
if (strlen(preg_replace('/(\W|\s)(?:(?:ht|f)tp(?:s?)\:\/\/)?(?:\w+:\w+@)?'
. '(?:(?:[-\w]+\.)+'
. '(?:com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum|travel|[a-z]{2}))'
. '(?::[\d]{1,5})?(?:\/(?:[-\w~!$+|.,\:\*\/&?#=]|%[a-f\d]{2})*)?(\W|\s)/',
'$1$2',$email_text))/strlen($email_text) < (1/2)) {
//text is more than 1/2 urls. probably a bot.
$problem = 'Email not sent because the text looked too spammy'
. ' (url to "real text" ratio too high).<br />Sorry... sort of';
break;
}
...
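
The ratio idea is easier to see with a smaller pattern. The following JavaScript sketch uses a deliberately loose URL regex, which is an assumption for illustration and not the pattern from the snippet above, and the names URL_RX, nonUrlRatio and looksLikeLinkSpam are hypothetical. It also guards against an empty message, which would otherwise divide by zero.

```javascript
// Loose URL pattern (an assumption, much simpler than the one in the post):
// optional scheme, dotted host name, optional port, optional path.
const URL_RX = /(?:(?:https?|ftp):\/\/)?(?:[-\w]+\.)+[a-z]{2,}(?::\d{1,5})?(?:\/[^\s]*)?/gi;

// Fraction of the message left once URLs are removed (1 means no URLs at all).
function nonUrlRatio(text) {
  if (text.length === 0) return 1; // avoid dividing by zero on an empty message
  return text.replace(URL_RX, "").length / text.length;
}

// Same threshold as the post: less than half "real text" looks like a bot.
function looksLikeLinkSpam(text) {
  return nonUrlRatio(text) < 0.5;
}

console.log(looksLikeLinkSpam("Buy http://spam.example/a http://spam.example/b now"));
console.log(looksLikeLinkSpam("The pothole on Wellington St. is back, see http://example.com/photo"));
```

The first message is mostly links and should be flagged; the second has one link in a real sentence and should pass.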

Since this is a contact form I'm not expecting any formatting. It should just be text. If the email is more than one third HTML, I throw it back. Again, I let the user know, since I have had users send me bits of info and code about the site itself, when they've found bugs.

if (strlen(strip_tags($email_text))/strlen($email_text) < (2/3)) {
//text is more than 1/3 html. probably a bot.
$problem = 'Email not sent because the text looked too spammy'
. ' (HTML to "real text" ratio too high).<br />Sorry... sort of';
break;
}
...
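
Here is the same ratio test as a small JavaScript sketch. JavaScript has no strip_tags(), so the stripTags function below is a rough regex stand-in that I am assuming is good enough for a ratio check; it is not a real HTML parser, and both function names are made up for illustration.

```javascript
// Rough stand-in for PHP's strip_tags(): removes anything shaped like <tag ...> or </tag>.
// An assumption for illustration only; it does not parse HTML properly.
function stripTags(text) {
  return text.replace(/<\/?[a-z][^>]*>/gi, "");
}

// Same threshold as the post: less than two thirds "real text" looks like a bot.
function looksLikeHtmlSpam(text) {
  if (text.length === 0) return false; // nothing to measure
  return stripTags(text).length / text.length < 2 / 3;
}

console.log(looksLikeHtmlSpam('<a href="http://spam.example">x</a>'));
console.log(looksLikeHtmlSpam("if a < b then the page breaks"));
```

The second message contains a bare "<" but no tags, so nothing is stripped and it passes.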

Next, look for both an HTML anchor, and a BBCode url or link tag. If they both exist, it's spam. Again, I send it back, because you never know. Some people are confused.

...
if (preg_match('/<a(?:[^>])*href/',$email_text)
&& preg_match('/\[(?:url|link)=/',$email_text)) {
//text contains both anchor tag and bbcode link. probably a bot.
$problem = 'Email not sent because the text looked too spammy'
. ' (wacky linking).<br />Sorry... sort of';
break;
}
...
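
The two patterns port to JavaScript almost unchanged. In this sketch the function name hasMixedLinkMarkup is hypothetical, and I have added the case-insensitive flag as an assumption, so it will also catch things like <A HREF> and [URL=, which the PHP version above would not.

```javascript
// True when the text contains both an HTML anchor and a BBCode url/link tag.
function hasMixedLinkMarkup(text) {
  const hasHtmlAnchor = /<a[^>]*href/i.test(text);      // e.g. <a href="...">
  const hasBbcodeLink = /\[(?:url|link)=/i.test(text);  // e.g. [url=...] or [link=...]
  return hasHtmlAnchor && hasBbcodeLink;
}

console.log(hasMixedLinkMarkup('<a href="http://x.example">x</a> [url=http://x.example]x[/url]'));
console.log(hasMixedLinkMarkup('<a href="http://x.example">x</a>'));
```

Only the first message, which hedges its bets with both kinds of markup, should be flagged.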

And that cuts down on the majority of our contact form spam, and the rest will have to wait until I can figure out how to write a regular expression that detects 'crazy-talk'.

2009.09.02 03:09 PM

Fun with static methods in Flash AS3 : controlling instances

Every once in a while, especially when creating a public API, you want to be able to hide settings and actions from plain view. Here's a fun little trick: using public static methods to control instances of a class. By creating internal interfaces, you can use static methods to control various aspects of a class that wouldn't be accessible through "normal means". Now, the following code is obviously an oversimplified example, but it does show the concept: how to access normally inaccessible properties, do extended actions during setup, or even simulate constructor overloading.

package {

public class TestStatic {

private var _readOnly:Boolean = false;

private var _name:String;
private var _color:String;


public function TestStatic(name:String,color:String) {
_name = name;
_color = color;
}

//secondary constructors
public static function BlueTestStatic(name:String):TestStatic {
return new TestStatic(name,'Blue');
}

//access advanced settings
public static function makeReadOnly(instance:TestStatic):void {
instance.readOnly = true;
}

//change 'read only' properties
public static function rename(instance:TestStatic,name:String):void {
instance.name = name;
}


public function get color():String {
return _color;
}

public function set color(value:String):void {
if(!_readOnly) {
_color = value;
}
else {
throw new ReferenceError("this property is read only");
}
}

public function get name():String {
return _name;
}

internal function set name(value:String):void {
if(!_readOnly) {
_name = value;
}
}

internal function set readOnly(value:Boolean):void {
_readOnly = value;
}
}
}


var ts:TestStatic = new TestStatic("Henry","Orange");
trace(ts.name); //traces "Henry"
TestStatic.rename(ts,"Hank");
trace(ts.name); //traces "Hank"
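
For comparison, here is a loose JavaScript analogue of the same trick. It is only a sketch under a big assumption: JavaScript has no internal namespace, so I am using private class fields instead, which are stricter than AS3's internal (nothing outside the class body can touch them). The static method names mirror the AS3 example, but blue is a name I made up.

```javascript
class TestStatic {
  // private fields: not reachable from outside the class body
  #name;
  #color;
  #readOnly = false;

  constructor(name, color) {
    this.#name = name;
    this.#color = color;
  }

  // secondary "constructor"
  static blue(name) {
    return new TestStatic(name, "Blue");
  }

  // static methods live inside the class body, so they can reach private fields
  static makeReadOnly(instance) {
    instance.#readOnly = true;
  }

  // change a property that has no public setter
  static rename(instance, name) {
    if (!instance.#readOnly) instance.#name = name;
  }

  get name() { return this.#name; }
  get color() { return this.#color; }
  set color(value) {
    if (this.#readOnly) throw new ReferenceError("this property is read only");
    this.#color = value;
  }
}

const ts = TestStatic.blue("Henry");
console.log(ts.name); // prints "Henry"
TestStatic.rename(ts, "Hank");
console.log(ts.name); // prints "Hank"
```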
2009.08.19 10:26 PM

JavaScript roman numeral converter.

Here are the JavaScript Roman numeral conversion functions I alluded to in the last post.

//each numeral, starting with the largest
var numerals = {
"M":1000,
"CM":900,
"D":500,
"CD":400,
"C":100,
"XC":90,
"L":50,
"XL":40,
"X":10,
"IX":9,
"V":5,
"IV":4,
"I":1
};
function RomanToDecimal(roman) {
//the roman numeral value
var _v = roman.toUpperCase();
//this holds the decimal equivalent
var _d = 0;
//if the roman value contains more than the allowed letters return Not a Number
if (!_v.match(/^[MDCLXVI]+$/)) return NaN;
//for each roman numeral
for (var _n in numerals) {
//while this numeral is at the front of the passed string
while (_v.match(new RegExp("^"+_n))) {
//add the numeral's decimal value to the decimal equivalent
_d += numerals[_n];
//and pop off the numeral found
_v = _v.substr(_n.length);
}
}
// still letters left. Improper sequencing. return Not a Number
if (_v.length) return NaN;
//otherwise, return the decimal equivalent
else return _d;
}

function DecimalToRoman(num) {
//the decimal equivalent
var _d = parseInt(num, 10);
//this will hold the roman value
var _r = '';
//for each roman numeral
for(var _n in numerals) {
// get the number of times (if any) it divides into the decimal value
var _x =Math.floor(_d/numerals[_n]);
// if it does divide into the decimal value
if (_x) {
// subtract that amount from the decimal value
_d -= (_x*numerals[_n]);
//and add the appropriate number of numerals to the roman value
for(var a=0;a<_x;a++) {
_r += _n;
}
}
// if the decimal value now equals zero, stop building our number
if (!_d) break;
}
return _r;
}
DecimalToRoman(RomanToDecimal("MCCLXXVIIII")); //returns "MCCLXXIX"
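
That last line hints at a side use: because the parser accepts sloppy input such as "MCCLXXVIIII", a round trip can tell you whether a numeral was already in canonical form. Here is a self-contained sketch of that idea. The names PAIRS, toDecimal, toRoman and isCanonicalRoman are hypothetical, and the converters are compact rewrites that keep the numerals in an array of pairs (an assumption on my part, so the descending order is explicit) rather than copies of the functions above.

```javascript
// Numeral/value pairs in an array, largest first.
const PAIRS = [
  ["M", 1000], ["CM", 900], ["D", 500], ["CD", 400],
  ["C", 100], ["XC", 90], ["L", 50], ["XL", 40],
  ["X", 10], ["IX", 9], ["V", 5], ["IV", 4], ["I", 1]
];

function toDecimal(roman) {
  let rest = roman.toUpperCase();
  let total = 0;
  for (const [numeral, value] of PAIRS) {
    // consume this numeral for as long as it sits at the front
    while (rest.startsWith(numeral)) {
      total += value;
      rest = rest.slice(numeral.length);
    }
  }
  // leftover letters mean bad characters or improper sequencing
  return rest.length ? NaN : total;
}

function toRoman(num) {
  let rest = parseInt(num, 10);
  let roman = "";
  for (const [numeral, value] of PAIRS) {
    while (rest >= value) {
      roman += numeral;
      rest -= value;
    }
  }
  return roman;
}

// Hypothetical helper: true only when the input is already in canonical form.
function isCanonicalRoman(roman) {
  const value = toDecimal(roman);
  return !isNaN(value) && value > 0 && toRoman(value) === roman.toUpperCase();
}

console.log(isCanonicalRoman("MCCLXXVIIII")); // prints false
console.log(isCanonicalRoman("MCCLXXIX"));    // prints true
```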
2009.08.17 01:07 PM

Difference between using Object as associative array in Flash AS3 and JavaScript

While porting a Roman numeral converter I had written in JavaScript to AS3 (I will post it shortly), I ran into an interesting "quirk" of the AS3 Object class: it doesn't keep the keys in the order they were declared. The code used an Object literal as an associative array, holding each Roman "digit" along with its decimal equivalent in descending order by value. I'd then use a for ... in loop to cycle through the object. In JavaScript I'd get the values in the same order they had been declared. AS3 seems to be a little more cavalier. The following code in JavaScript:

var o = {
"name1":1,
"name2":1,
"name3":1,
"name4":1,
"name5":1
};
var a=0;
for (var n in o) {
document.write(" o " + a + ":"+n);
a++;
}

produces this result in all major browsers (Firefox 3.5, IE 8, Safari 4, Opera 9.54, Chrome 2):
o 0:name1
o 1:name2
o 2:name3
o 3:name4
o 4:name5

However, using the following code in the Flash IDE

var o:Object = {
"name1":1,
"name2":1,
"name3":1,
"name4":1,
"name5":1
};
var a:uint = 0
for (var n:String in o) {
trace(" o " + a + ":"+n);
a++;
}

resulted in the following trace result:
o 0:name4
o 1:name5
o 2:name1
o 3:name2
o 4:name3

Oddly, it seemed to produce the exact same order each time I ran it. I tried it with a Dictionary, to see if that made a difference, but the results were similar. I tried different values, to try and find a reason for the ordering, to no avail. If someone knows why it consistently picks the same seemingly random order, I'd love to know.
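
Whatever the reason, one way to stop caring about enumeration order is to sort the keys yourself before looping. Here is a minimal JavaScript sketch of that idea; the helper name keysByValueDesc is made up, and the object is declared out of order on purpose. I am assuming the same approach carries over to AS3 by collecting the keys into an Array with for ... in and sorting that.

```javascript
// Declared in no particular order, on purpose.
const numerals = { "I": 1, "M": 1000, "X": 10, "CM": 900, "V": 5 };

// Hypothetical helper: keys ordered by their values, largest first.
function keysByValueDesc(obj) {
  return Object.keys(obj).sort((a, b) => obj[b] - obj[a]);
}

for (const key of keysByValueDesc(numerals)) {
  console.log(key + ":" + numerals[key]);
}
// prints M:1000, CM:900, X:10, V:5, I:1 (one per line)
```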

2009.08.16 11:25 PM

Flash AS3 Showdown: Object vs. Dictionary

I decided to see what the difference was between using a Dictionary and an Object to store simple name:value pairs, where name was always a String. Fired up the Flash IDE and typed out the following code:

import flash.sampler.getSize;
import flash.utils.Dictionary;
var a:uint;
var s:Date;
var find:Array = new Array();
var val:*;

trace('creating 20000 entries');
s = new Date();
var d:Dictionary = new Dictionary();
for (a = 0;a<20000;a++) {
d[String('name'+a)] = a;
}
trace('Dictionary creation: '
+ (new Date().valueOf() - s.valueOf()) + 'ms'
+ ', size: ' + getSize(d) + ' bytes');
s = new Date();
var o:Object = {};
for (a = 0;a<20000;a++) {
o[String('name'+a)] = a;
}
trace('Object creation: '
+ (new Date().valueOf() - s.valueOf()) + 'ms'
+ ', size: ' + getSize(o) + ' bytes');

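//note: a starts at 1, so this pushes 999 values, and most of them
//(20000/a) are not whole numbers, so most lookups below will miss
for (a = 1;a <1000;a++) {
find.push(20000/a);
}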
trace('reading 1000 keys');

s = new Date();
for (a = 0;a<1000;a++) {
val = d['name' + find[a]];
}
trace('Dictionary read: '
+ (new Date().valueOf() - s.valueOf()) + 'ms');

s = new Date();
for (a = 0;a<1000;a++) {
val = o['name' + find[a]];
}
trace('Object read: '
+ (new Date().valueOf() - s.valueOf()) + 'ms');


trace('search for 1000 values');
s = new Date();
for (a = 0;a<1000;a++) {
for each(val in d) {
if (val == find[a]) break;
}
}
trace('Dictionary search: '
+ (new Date().valueOf() - s.valueOf()) + 'ms');

s = new Date();
for (a = 0;a<1000;a++) {
for each(val in o) {
if (val == find[a]) break;
}
}
trace('Object search: '
+ (new Date().valueOf() - s.valueOf()) + 'ms');

Which produced the following trace results:

creating 20000 entries
Dictionary creation: 47ms, size: 160032 bytes
Object creation: 47ms, size: 160024 bytes

So, same speed, and Object is a tiny bit smaller. Across multiple runs these values almost never changed, often enough to treat them as reliable.

reading 1000 keys
Dictionary read: 16ms
Object read: 16ms

Again, for direct key reads, the two are the same speed. In multiple runs this was always the same value.

search for 1000 values
Dictionary search: 2047ms
Object search: 2219ms

Now here, Object falls down a bit. The interesting thing is that, despite the same search values being generated each time, the search times were all over the map. Each time I ran this, I got a different value. I'm assuming this much churn was causing the garbage collector to kick in sporadically. That said, the difference between the two values would vary from 100 to 400 ms. The one constant, though, was that Object searches were almost always slower than Dictionary searches, and the instances where Object was faster were few and far between. However, the total time was always just over 2000ms for each. So, the average was still approx. 2-2.5ms per search for both Object and Dictionary.
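
With timings that jumpy, a single run doesn't say much, so it helps to repeat the measurement and look at the spread. Here is a small JavaScript sketch of that. The timeRuns helper is hypothetical, and the workload is a stand-in I made up (one linear search through a 20000-entry object), not the AS3 benchmark above.

```javascript
// Hypothetical helper: run fn several times and summarize the elapsed times in ms.
function timeRuns(fn, runs) {
  const times = [];
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    fn();
    times.push(Date.now() - start);
  }
  const total = times.reduce((sum, t) => sum + t, 0);
  return {
    runs: times.length,
    min: Math.min(...times),
    max: Math.max(...times),
    mean: total / times.length
  };
}

// Example workload (an assumption, not the benchmark from the post).
const store = {};
for (let i = 0; i < 20000; i++) store["name" + i] = i;

const stats = timeRuns(function () {
  for (const key in store) {
    if (store[key] === 19999) break;
  }
}, 5);
console.log(stats);
```

Comparing min, max and mean across runs makes it easier to tell a real difference from garbage collector noise.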

Conclusion: unless you really need strict type checking for your keys, there's no discernible difference between the two, except that using Object saves you an import.

2009.08.14 11:14 PM