PHP Retweeting Twitter Bot Using Streaming API Search Results

Submitted by nigel on Friday 16th March 2012

The introduction of the Twitter Streaming API now provides the Twitter developer with an accurate method of obtaining real-time search results. Previously the developer would have had to use the existing REST service which is not guaranteed to provide either timely or thorough results. The new Streaming API can be used with basic authentication based on an existing account, whilst the REST API requires the more complex OAuth method of authentication.

Armed with this knowledge I decided to develop a Twitter Bot with the following functionality:

  • Search for configurable terms on Twitter
  • Remove those tweets sent by the bot itself to prevent loops
  • Follow any and every mention in a tweet
  • Do not retweet retweets to minimise follower annoyance which could lead to being flagged as spamming

Twitter expects coded solutions using the Streaming API to be available 24/7 to receive search results, and if they are not they will be quickly barred from accessing the Twitter service. In addition, these Twitter clients could be potentially bombarded with search results if they are searching on a hashtag that is trending. It is important therefore that the client should only do the bare minimum of work - save the tweet as fast as possible and loop around to wait for the next one. In coding parlance, this is the Collector.

This flies in the face of the normal PHP ground rules - a short duration script that disappears once its job is done. Fenn Bailey has constructed a PHP class to circumvent that particular issue, using sockets. His class Phirehose (firehose geddit?) needs to be downloaded. We need to build a driver for this class and to include some configuration options at the same time. Phirehose is available on GitHub at https://github.com/fennb/phirehose so grab yourself a copy and save it into the directory structure you are going to use for this project.

Our Collector driver will instantiate the Phirehose class and write any captured tweets to a trivial database table to be consumed later. I have elected to use PHP's PDO class for database access to make it portable, although I personally use MySQL.
collector.php

<?php
// Copyright @badzillacouk http://www.badzilla.co.uk
// Licence GPL. This program may be distributed as per the terms of GPL and all credits
// must be retained
//
// If you find this script useful, please consider a donation to help me fund my web presence
// and encourage me to develop more products to be placed under the terms of GPL
// To donate, go to http://www.badzilla.co.uk and click on the donation button
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

require_once('config.php');
require_once('Phirehose/Phirehose.php');


class MyStream extends Phirehose {

    private $db_dsn;
    private $db_user;
    private $db_password;

    public function enqueueStatus($status) {

        try {
            $dbh = new PDO($this->db_dsn, $this->db_user, $this->db_password);
        } catch (PDOException $e) {
            echo 'Connection failed: ' . $e->getMessage();
            return; // no database handle, so this tweet cannot be stored
        }

        // add to database
        $stat = json_decode($status);
        $dbh->prepare('INSERT INTO tweets SET id = :id, content = :content')
            ->execute(array(':id' => $stat->id_str,
                            ':content' => serialize($stat)));
    }


    public function setConfig($db_dsn, $db_user, $db_password) {

        $this->db_dsn = $db_dsn;
        $this->db_user = $db_user;
        $this->db_password = $db_password;
    }

}


$stream = new MyStream($account_user, $account_password, MyStream::METHOD_FILTER);
$stream->setConfig($db_dsn, $db_user, $db_password);
$stream->setTrack($track);
$stream->consume();
?>

I'm keeping all the configuration options in their own config file which will be shared with the consumer. You will need to populate this file - hopefully all the settings should be self-explanatory for most PHP developers.
config.php

<?php
// Twitter account credentials
$account_user = '';
$account_password = '';


// get these values from http://dev.twitter.com/apps after setting up an application (select your app and click on My Access Token)
$access_token = '';
$access_token_secret = '';


// get these values from http://twitter.com/apps after setting up an application
$consumer_key = '';
$consumer_secret = '';

// twitter API params
$retweet_prefix = 'RT';

// Tracking keywords. Add into array here
$track = array();


// database details
$db_dsn = '';
$db_user = '';
$db_password = '';
?>
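For illustration only, the non-OAuth part of a populated config.php might look something like the sketch below. Every value is a made-up placeholder (the account name, the keywords, the database name "twitterbot" and the passwords are all assumptions), and the DSN assumes a local MySQL server.

```php
<?php
// Hypothetical example values only; substitute your own.
$account_user = 'my_bot_account';
$account_password = 'not-a-real-password';

// Tracking keywords: each array element is one term handed to setTrack()
$track = array('#php', 'phirehose');

// PDO data source name for a local MySQL database called "twitterbot"
$db_dsn = 'mysql:host=localhost;dbname=twitterbot;charset=utf8';
$db_user = 'twitterbot';
$db_password = 'not-a-real-password';
```

The OAuth tokens and keys are left out here because they can only come from your own application page on Twitter.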
Here is the database schema I am using to store the tweets for later consumption.
-- --------------------------------------------------------
 
--
-- Table structure for table `tweets`
--
 
CREATE TABLE IF NOT EXISTS `tweets` (
  `id` decimal(20,0) unsigned NOT NULL,
  `content` text NOT NULL,
  `t_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `t_time` (`t_time`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

Now we can work on the Consumer part of the project. This will sequentially read the database records, retweet each one and then delete the record. I am using the REST API for this part of the service, which uses OAuth, so you will need to download the PHP class that handles this and include it in your directory structure. In addition you will need a Twitter authentication wrapper class built by Abraham Williams which can be downloaded from https://github.com/abraham/twitteroauth. To cap it off, most of the legwork for our consumer has already been developed by Michael Soares in his Retweeter class at https://github.com/mikesoares/php_retweeter. Download this too and stick it into the working directory.

This is where I discovered a versioning problem between Retweeter and twitteroauth. Whilst I was coding up a class to extend Retweeter I had to write overrides for the verifyAccess and tweet methods due to a change in the way twitteroauth handles error responses.

So my Retweeter extender loops through all the database records and retweets each one unless it is already a retweet or was originated by the robot in the first instance. It is possible to determine whether a tweet is a retweet by checking for the existence of retweeted_status in the returned object.
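The retweet test described above can be pulled out into a small helper, sketched here. The function name should_retweet and the sample tweet are my own inventions for illustration; only retweeted_status and user->screen_name come from the tweet object itself.

```php
<?php
// Decide whether a stored tweet should be retweeted.
// $stat is a decoded tweet object; $account_user is the bot's screen name.
function should_retweet($stat, $account_user) {
    // Native retweets carry the original tweet in retweeted_status
    if (isset($stat->retweeted_status)) {
        return false;
    }
    // Never retweet the bot's own tweets, which would cause a loop
    if ($stat->user->screen_name == $account_user) {
        return false;
    }
    return true;
}

// A hand-built stand-in for a decoded tweet
$tweet = json_decode('{"text":"Hello #php","user":{"screen_name":"alice"}}');
var_dump(should_retweet($tweet, 'my_bot'));   // bool(true)
```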

The consumer needs to check how many Twitter REST API calls are remaining in the current hour before attempting to retweet. This makes sense since (a) there is no point in the application being rate-limited and (b) the API call to check the remaining calls is free. If we have insufficient headroom then the attempt to retweet is abandoned for 30 seconds.
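To make the headroom arithmetic concrete, here is a minimal sketch. It assumes the worst case of two calls per user (one friendship check and one friendship create) plus one call for the status update; the function names are hypothetical.

```php
<?php
// Worst case per tweet: one friendships/exists check and one
// friendships/create call for each user, plus one status update.
function calls_needed($user_count) {
    return ($user_count * 2) + 1;
}

// True when the remaining hourly allowance covers the worst case
function has_headroom($user_count, $remaining_hits) {
    return calls_needed($user_count) < $remaining_hits;
}

// The author of a tweet plus two mentions makes three users
var_dump(calls_needed(3));      // int(7)
var_dump(has_headroom(3, 7));   // bool(false)
var_dump(has_headroom(3, 8));   // bool(true)
```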
consumer.php

<?php
// Copyright @badzillacouk http://www.badzilla.co.uk
// Licence GPL. This program may be distributed as per the terms of GPL and all credits
// must be retained
//
// If you find this script useful, please consider a donation to help me fund my web presence
// and encourage me to develop more products to be placed under the terms of GPL
// To donate, go to http://www.badzilla.co.uk and click on the donation button
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

set_include_path(get_include_path() . PATH_SEPARATOR . 'twitterauth' . PATH_SEPARATOR . 'php_retweeter');

require_once('config.php');
require_once('php_retweeter.php');


class MyRetweet extends Retweeter {

    private $db_dsn;
    private $db_user;
    private $db_password;

    public function consumeStatus($account_user, $prefix) {

        $curr_limit = $this->oauth->get('account/rate_limit_status')->remaining_hits;

        try {
            $dbh = new PDO($this->db_dsn, $this->db_user, $this->db_password);
        } catch (PDOException $e) {
            echo 'Connection failed: ' . $e->getMessage();
            return; // no database handle; the main loop retries after its sleep
        }

        $sth = $dbh->prepare('SELECT id, content FROM tweets ORDER BY t_time ASC;');
        $sth->execute();
        while ($res = $sth->fetch(PDO::FETCH_OBJ)) {
            $stat = unserialize($res->content);

            // Construct a list of mentions to follow
            $users = array();
            $users[] = $stat->user->screen_name;
            if (isset($stat->entities->user_mentions) and is_array($stat->entities->user_mentions)) {
                foreach ($stat->entities->user_mentions as $value) {
                    if ($value->screen_name != $account_user) {
                        $users[] = $value->screen_name;
                    }
                }
            }

            // Worst case we'll need 2 x users + 1 API calls. Do we have enough headroom?
            if (((count($users) * 2) + 1) >= $curr_limit) {
                return;
            }

            // Retweet unless already a retweet / one of mine
            if (!isset($stat->retweeted_status) and $stat->user->screen_name != $account_user) {
                $this->tweet($prefix . ' ' . '@' . $stat->user->screen_name . ': ' . $stat->text);
            }

            // check if friendship exists, if it doesn't, create one
            foreach ($users as $user) {
                if ($user != $account_user) {
                    if (!$this->oauth->get('friendships/exists', array('screen_name_a' => $account_user, 'screen_name_b' => $user))) {
                        $this->oauth->post('friendships/create', array('screen_name' => $user));
                    }
                }
            }

            // Ruthlessly delete db record whether success or not
            $del = $dbh->prepare('DELETE FROM tweets WHERE id = :id;');
            // Bind the id as a string: tweet ids overflow 32-bit integers, and
            // bindParam() wants a variable rather than the result of intval()
            $del->bindValue(':id', $res->id, PDO::PARAM_STR);
            $del->execute();

            sleep(2);
        }

    }

    public function setConfig($db_dsn, $db_user, $db_password) {

        $this->db_dsn = $db_dsn;
        $this->db_user = $db_user;
        $this->db_password = $db_password;
    }


    // Overridden methods below. php_retweeter is currently not compatible with twitteroauth so rewritten
    public function verifyAccess() {

        $credentials = $this->oauth->get('account/verify_credentials');

        return ($this->oauth->http_code == 200) ? TRUE : FALSE;
    }

    private function tweet($tweet) {

        // limit char count, post it and check for error
        $this->oauth->post('statuses/update', array('status' => $this->_truncateText(htmlspecialchars_decode($tweet))));

        return ($this->oauth->http_code == 200) ? TRUE : FALSE;
    }

}


// create our object
$retweeter = new MyRetweet($access_token, $access_token_secret, $consumer_key, $consumer_secret);
$retweeter->setConfig($db_dsn, $db_user, $db_password);

// just want to make sure we've been authenticated properly, else fail
if ($retweeter->verifyAccess() == FALSE) {
    echo 'Access could not be verified';
    exit;
}

while (TRUE) {
    $retweeter->consumeStatus($account_user, $retweet_prefix);
    sleep(30);
}
?>
Possible Additional Functionality

Keep Alive Script

The most obvious shortcoming of the system as it stands is that somebody has to keep checking that collector.php and consumer.php are still running. A cron job should really check every minute or so and restart either process if it has died.
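A keep-alive along those lines could be sketched as below. This is an untested outline rather than part of the bot: it assumes a Unix-like host with ps and nohup, that both scripts live in the same directory as the keep-alive, and the file name keepalive.php and the --run switch are my own inventions.

```php
<?php
// Return the scripts from $wanted that appear in none of the running command lines.
function scripts_to_restart(array $wanted, array $command_lines) {
    $missing = array();
    foreach ($wanted as $script) {
        $found = false;
        foreach ($command_lines as $line) {
            if (strpos($line, $script) !== false) {
                $found = true;
                break;
            }
        }
        if (!$found) {
            $missing[] = $script;
        }
    }
    return $missing;
}

// Only touch real processes when explicitly asked to, e.g. from cron
if (isset($argv[1]) && $argv[1] === '--run') {
    $command_lines = array();
    exec('ps -A -o args', $command_lines);   // one command line per running process
    $wanted = array('collector.php', 'consumer.php');
    foreach (scripts_to_restart($wanted, $command_lines) as $script) {
        // Start in the background so the keep-alive itself exits promptly
        exec('nohup php ' . escapeshellarg(__DIR__ . '/' . $script) . ' > /dev/null 2>&1 &');
    }
}
```

A crontab entry such as `* * * * * php /path/to/keepalive.php --run` (the path is a placeholder) would then run the check once a minute.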

Whitelist

As mentioned previously, the bot will not retweet a retweet. There may well be circumstances where that is desirable, however. For instance, perhaps the bot is being used for a campaign in which celebrity Tweeters are tweeted asking for a RT. Should one of the celebrities actually retweet, then that really ought to be retweeted to all the bot's followers. This would maximise the campaign's reach and effect. It would therefore make sense to have a whitelist of all celebrities that have been tweeted, and the code should check whether a retweet originates from a whitelisted celebrity.
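One possible shape for the whitelist check is sketched here. It assumes that, for a native retweet, user->screen_name is the account that did the retweeting; the function name and the sample screen names are hypothetical.

```php
<?php
// $whitelist holds the screen names whose retweets should be passed on.
function should_retweet_with_whitelist($stat, $account_user, array $whitelist) {
    if ($stat->user->screen_name == $account_user) {
        return false;   // never loop on our own tweets
    }
    if (!isset($stat->retweeted_status)) {
        return true;    // ordinary tweet: retweet as before
    }
    // A retweet: only pass it on when the retweeter is whitelisted
    return in_array(strtolower($stat->user->screen_name),
                    array_map('strtolower', $whitelist));
}

$celebrity_rt = json_decode('{"text":"RT","user":{"screen_name":"BigStar"},"retweeted_status":{"id_str":"1"}}');
var_dump(should_retweet_with_whitelist($celebrity_rt, 'my_bot', array('bigstar')));   // bool(true)
```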

Blacklist

The reality is that a bot can be hijacked by the nefarious to preach a message that may be contrary to the bot's intention. The bot's functionality should be extended to provide the option of not retweeting tweets that originate from blacklisted Tweeters or contain specific phrases.
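A blacklist test might look something like this sketch, which matches screen names and phrases case-insensitively. The function name and both lists are assumptions for illustration; where the lists are stored is left open.

```php
<?php
// True when the tweet comes from a blocked user or contains a blocked phrase.
function is_blacklisted($stat, array $blocked_users, array $blocked_phrases) {
    if (in_array(strtolower($stat->user->screen_name), array_map('strtolower', $blocked_users))) {
        return true;
    }
    foreach ($blocked_phrases as $phrase) {
        // Skip empty entries so a blank line in the list cannot block everything
        if ($phrase !== '' && stripos($stat->text, $phrase) !== false) {
            return true;
        }
    }
    return false;
}

$tweet = json_decode('{"text":"Buy cheap followers now","user":{"screen_name":"alice"}}');
var_dump(is_blacklisted($tweet, array('spammer'), array('cheap followers')));   // bool(true)
```

The consumer would call such a check before retweeting and simply delete the record when it returns true.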

Statistical Hashtag Data

Statistical hashtag information can be obtained from 3rd party Twitter associates, but this is usually a paid-for option. By using the collector.php script to capture search results, it would be possible to write a complementary PHP script to perform statistical analysis.
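As a starting point, hashtag frequencies could be tallied from decoded tweet objects as sketched below. The function name is hypothetical, and the sketch assumes the tweets are read from a copy that the consumer has not already deleted.

```php
<?php
// Count hashtag occurrences across an array of decoded tweet objects.
function count_hashtags(array $tweets) {
    $counts = array();
    foreach ($tweets as $stat) {
        if (!isset($stat->entities->hashtags) || !is_array($stat->entities->hashtags)) {
            continue;   // no hashtag entities on this tweet
        }
        foreach ($stat->entities->hashtags as $tag) {
            $key = strtolower($tag->text);   // treat #PHP and #php as the same tag
            $counts[$key] = isset($counts[$key]) ? $counts[$key] + 1 : 1;
        }
    }
    arsort($counts);   // most frequent first
    return $counts;
}

$tweets = array(
    json_decode('{"entities":{"hashtags":[{"text":"PHP"},{"text":"Twitter"}]}}'),
    json_decode('{"entities":{"hashtags":[{"text":"php"}]}}'),
);
print_r(count_hashtags($tweets));   // php => 2, twitter => 1
```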

Badzilla is available for ad hoc freelance PHP Twitter developments. Please register on the site and send a private message on LinkedIn if you have a requirement.
