Scraping Instagram with R, with PHP

Lately I’ve had reason to collect information on the sale of human remains online, in various places. One such place is Instagram.

Working with Instagram is not straightforward. One approach that I had been using was a package for R called ‘instaR’ by Pablo Barbera. It worked great, after some initial confusion on my part about how to get the damned thing to authenticate (which involves setting up an Instagram developer’s account, etc.). Then, in the middle of last year, Instagram changed its developer API rules *such that* the only data I could access with the API *was my own*. So: all that publicly exposed data, but no tool to grab it. (If you read the new ToS, you can only get wider access to the data approved if you’re commercializing your app – that is, the reason you’re wanting the data – and ‘research’ is not an approved choice. I’m drawing on my memory here, not having the ToS in front of me at the moment.)

Long story short: no more data for me.

But then I came across a PHP library that did the trick, paging through the publicly displayed results. You can get it here: https://github.com/postaddictme/instagram-php-scraper. In what follows, I’m talking Mac; Windows folks, you’re on your own.

Getting it to work on this machine required installing Composer, a package manager for PHP, which I could do from the terminal. I didn’t initially realize that one could run a PHP file from the terminal prompt, the same way one would run a Python script:

$ php whatever.php

Who knew, eh? Anyway, with composer installed, the next hurdle is getting composer to do its damned job. Turns out this:

$ composer require raiym/instagram-php-scraper

actually had to be this:

$ composer.phar require raiym/instagram-php-scraper

The extra .phar probably means that I haven’t set something properly somewhere (most likely the installed file is still named composer.phar rather than plain composer), but screw it. It works.
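If the extra .phar bothers you, one possible fix is to give the file a second name. What follows is only a rough sketch, assuming composer.phar is already somewhere on your PATH (it must be, if the command above runs). The helper name link_composer is made up for illustration; it is not part of Composer.

```shell
# Create a symlink named "composer" next to an existing composer.phar.
# $1: full path to composer.phar (hypothetical helper, not a Composer command).
link_composer() {
  phar_path="$1"
  if [ ! -f "$phar_path" ]; then
    echo "no such file: $phar_path" >&2
    return 1
  fi
  # The link lives in the same directory, so it is on the PATH too.
  ln -s "$phar_path" "$(dirname "$phar_path")/composer"
}

# Usage (may need sudo if the directory is system-owned):
#   link_composer "$(command -v composer.phar)"
#   composer require raiym/instagram-php-scraper
```

A symlink leaves the original file where it is, so the composer.phar spelling keeps working as well.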

Then it becomes a matter of writing the PHP to do what you want it to do, and piping the output to where it needs to go. For this, I found this post by Tom Woodward super helpful. End result:

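<?php

// Load the classes that Composer installed into vendor/.
require_once 'vendor/autoload.php';

use InstagramScraper\Instagram;

// Show every error while testing.
error_reporting(E_ALL);
ini_set("display_errors", 1);

// The tag to search for.
$tag = '_whatever_it_is_youre_looking_for_';

// The second argument sets the number of results returned.
$medias = Instagram::getMediasByTag($tag, 3000);

// Print the results as readable JSON on standard output.
echo json_encode($medias, JSON_UNESCAPED_SLASHES | JSON_PRETTY_PRINT);
?>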

So, from the terminal:

$ php myweescript.php > output.json

Then, for the next stage of the affair, I need to convert the JSON to CSV. One can do it with jq, but json2csv made life so much easier. Make sure to install it globally (that’s the -g flag), so that the json2csv command is available from the terminal, like so:

$ npm install json2csv --save -g

And of course, you have to have npm and node.js to make *that* work…
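For the conversion itself, here is a minimal sketch of the jq route mentioned above, for anyone who would rather skip node. The field names (id, link, caption) are my assumptions about what the scraper writes out, so check them against your own output.json. The function name json_to_csv is likewise just for illustration.

```shell
# Flatten a JSON array of objects into CSV using jq.
# $1: path to the JSON file (e.g. output.json).
# NOTE: id, link and caption are assumed field names; adjust to your data.
json_to_csv() {
  jq -r '
    ["id", "link", "caption"],
    (.[] | [.id, .link, .caption])
    | @csv
  ' "$1"
}

# Usage:
#   json_to_csv output.json > output.csv
```

The first array becomes the header row and each post becomes one row after it. jq’s @csv takes care of quoting, and a missing value comes out as an empty cell. With json2csv, the equivalent should be something like `json2csv -i output.json -o output.csv`, though the options may differ between versions.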

Anyway, good luck.
