This page is meant mostly as another example of interacting with the Instagram API. For some of the HTML tutorials, I'll be using the data from government Instagram accounts. But maybe you want to use your own data? Then follow the steps in this guide.
You should start with the "Montage the world with Instagram and Google" guide.
Note: This guide is under a bit of construction
We will be accessing the /users/(userid)/media/recent endpoint, which is documented here. The parameters we care about are access_token, count, and max_id.
You'll need your own ACCESS_TOKEN. And use your own USERNAME instead of mypubliclands (which belongs to the U.S. Bureau of Land Management):
# paste in your own token if you didn't initialize it with .bashrc
ACCESS_TOKEN="$INSTAGRAM_TOKEN"
BASE_URL='https://api.instagram.com/v1'
USERNAME='mypubliclands'
You might know your Instagram username, but you may not know your Instagram user_id. So we need to use the users/search endpoint:
curl "$BASE_URL/users/search?access_token=$ACCESS_TOKEN&q=$USERNAME" > profile.json
Inside profile.json, the response for mypubliclands looks like this:
{
  "meta": {
    "code": 200
  },
  "data": [
    {
      "username": "mypubliclands",
      "bio": "The Bureau of Land Management manages more than 245 million acres of public land, with some of the most breathtaking landscapes anywhere on earth.",
      "website": "http://www.blm.gov",
      "profile_picture": "https://instagramimages-a.akamaihd.net/profiles/profile_225371825_75sq_1348864741.jpg",
      "full_name": "Bureau of Land Management",
      "id": "225371825"
    }
  ]
}
Your response may have more than one result in the data array, so eyeball the appropriate id value. If you happen to be the most popular account in the array of results, you can grab the id programmatically by assuming that you are the first in the results:
cat profile.json | jq -r '.data[0] .id'
# 225371825
And now we can set a USER_ID variable for the next step:
USER_ID='225371825'
Let's set some variables:
COUNT="50"
MEDIA_ENDPOINT="https://api.instagram.com/v1/users/$USER_ID/media/recent?access_token=$ACCESS_TOKEN&count=$COUNT"
To get the data from your first 50 photos:
curl -o temp.json "$MEDIA_ENDPOINT"
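Before parsing, it can be worth a quick sanity check that the request actually succeeded. The real check would run jq against temp.json; here a mock response is piped in so the snippet is self-contained:

```shell
# the API wraps every response in a "meta" envelope; code 200 means OK
echo '{"meta":{"code":200},"data":[{"id":"x"}]}' | jq '.meta .code'
# 200
```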
This endpoint returns images in reverse chronological order, i.e. most recent first. If we want anything older than the 50 most recent images, we have to use the max_id parameter, which is described as:

MAX_ID: Return media earlier than this max_id.
So, among the images that we've just curled into temp.json, we need the id of the oldest image. And since the images in temp.json are sorted starting from most recent, that means we want the last image in temp.json.
Using jq:
cat temp.json | jq --raw-output '.data | reverse[0] .id'
237497413860994153_181309234
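As an aside, jq's built-in last filter does the same job as reverse[0]; here it is on a mock response so you can verify it standalone:

```shell
# grab the id of the final element of .data using jq's `last`
echo '{"data":[{"id":"a"},{"id":"b"},{"id":"c"}]}' \
  | jq -r '.data | last | .id'
# c
```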
So let's pass it in to the API call as the value for max_id:
nid=$(cat temp.json | jq --raw-output '.data | reverse[0] .id')
curl "$MEDIA_ENDPOINT&max_id=$nid" -o "$nid.json"
Whatever exists in $nid.json should be a new batch of photos (assuming you have more than 50 photos).
So basically, the routine for collecting all the photos is:
# get the first batch of photos
curl "$MEDIA_ENDPOINT" -o first_batch.json
# get the 2nd batch, based on the `id` of the
# *last* photo in the current batch
nid=$(cat first_batch.json | jq --raw-output '.data | reverse[0] .id')
curl "$MEDIA_ENDPOINT&max_id=$nid" -o "$nid.json"
And at this point, you just repeat the following code:
# get the nth batch...and repeat
nid=$(cat $nid.json | jq --raw-output '.data | reverse[0] .id')
echo $nid
curl -s "$MEDIA_ENDPOINT&max_id=$nid" -o "$nid.json"
– until you get an empty batch, i.e. $nid is equal to null.
If you have a lot of photos, then copying-and-pasting should feel tedious. So now is a chance to practice your while loops. You don't have to do this, but you should aspire to learn it, as it's the only way to make a hands-free, automated process of re-downloading your Instagram data at will.
There are many ways to structure the loop; I've chosen a straightforward, if inelegant way:
1. nid is set to an empty string, i.e. ''
2. The while loop initiates and keeps going as long as $nid is not null. Note that null is not the same as the empty string '', which is how we enter the loop in the first place.
3. The output of curl is stored in the response variable.
4. We use jq to parse $response, by looking at the .data array, reversing it, and then grabbing the .id of the 0th element, which we assign to the nid variable.
5. If there were results in the data array, then $nid is not null, and we save the JSON contained in $response into $nid.json (that way, each response is in a different JSON file).
6. The loop exits once $nid is null.
nid=''
while [[ $nid != null ]]; do
  # if max_id is empty, you'll get the most recent photos by default
  response=$(curl -s "$MEDIA_ENDPOINT&max_id=$nid")
  nid=$(echo "$response" | jq --raw-output '.data | reverse[0] .id')
  echo "nid: $nid"
  if [[ $nid != null ]]; then
    # name each response with the id of its oldest photo
    echo "$response" > "$nid.json"
  fi
  sleep 2
done
# For 'mypubliclands', the echo output will look like this:
#nid: 886567966460382181_225371825
#nid: 853735645642525166_225371825
#nid: 823860786682955993_225371825
#nid: 804346466643249380_225371825
#nid: 776955719837321679_225371825
#nid: 742745337740692263_225371825
#nid: 704359738076072008_225371825
#nid: 673222394963069579_225371825
#nid: 648089023618518012_225371825
#nid: 584804608801278362_225371825
#nid: 568969379649981259_225371825
#nid: 526931139443330000_225371825
#nid: 502980035944941622_225371825
#nid: 470412227080128899_225371825
#nid: 415184442418301822_225371825
#nid: 390018053906925925_225371825
#nid: 363732785461758349_225371825
#nid: 342023866460965417_225371825
#nid: 318780852670335378_225371825
#nid: 302846550820555048_225371825
#nid: 291165315748628823_225371825
#nid: 290833966017841191_225371825
#nid: null
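If that exit condition seems too convenient, here's a standalone check of what happens on an empty batch: reverse[0] of an empty array is JSON null, and jq --raw-output prints it as the literal string null, which is exactly what the [[ ]] test compares against:

```shell
# an empty batch, like the final response the API sends back
nid=$(echo '{"data":[]}' | jq --raw-output '.data | reverse[0] .id')
echo "$nid"
# null
```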
Once you're done, you should have a few .json files, and you can do this to see the link to every Instagram photo of yours:

cat mypubliclands/*.json | jq -r '.data[] .link'

And to see the Like count of each photo:

cat mypubliclands/*.json | jq -r '.data[] .likes .count'

(Under construction, but basically, use jq)
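A couple more jq patterns in the same spirit, shown against two mock batch responses so the snippet runs standalone (with real data, substitute mypubliclands/*.json for the echoed mock):

```shell
# two mock batches, standing in for two downloaded .json files
batches='{"data":[{"likes":{"count":3}},{"likes":{"count":7}}]}
{"data":[{"likes":{"count":5}}]}'
# total number of photos across all batches
echo "$batches" | jq -s '[.[] .data | length] | add'
# 3
# total Likes across all batches
echo "$batches" | jq -s '[.[] .data[] .likes .count] | add'
# 15
```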
The following is just a proof-of-concept. You don't need to execute it yourself, but it's a demonstration of taking the previous logic and throwing it in a loop, i.e., having a loop within a loop, so that we can save the data for multiple U.S. government agencies' Instagrams:
This loop does what we did previously, though I keep things organized by creating a rawjson directory and then a subdirectory inside rawjson for each agency.
ACCESS_TOKEN="$INSTAGRAM_TOKEN"
BASE_URL='https://api.instagram.com/v1'
COUNT="50"
for username in glaciernps nasa usinterior mypubliclands USFWS smithsonian; do
  echo "Fetching $username"
  sleep 1
  mkdir -p ./rawjson/$username
  curl -o ./rawjson/$username/profile.json \
    -s "$BASE_URL/users/search?access_token=$ACCESS_TOKEN&q=$username"
  # Get the user_id from the username
  user_id=$(cat ./rawjson/$username/profile.json | jq -r '.data[0] .id')
  echo "Found user_id of $user_id for $username"
  MEDIA_ENDPOINT="https://api.instagram.com/v1/users/$user_id/media/recent?access_token=$ACCESS_TOKEN&count=$COUNT"
  # begin the loop to paginate through all the photo data
  nid=''
  while [[ $nid != null ]]; do
    # if max_id is empty, you'll get the most recent photos by default
    response=$(curl -s "$MEDIA_ENDPOINT&max_id=$nid")
    nid=$(echo "$response" | jq --raw-output '.data | reverse[0] .id')
    echo "$username,$user_id nid: $nid"
    if [[ $nid != null ]]; then
      # name each response with the id of its oldest photo
      echo "$response" > "./rawjson/$username/$nid.json"
    fi
    sleep 1
  done
done
Now let's get all of the photos that have longitude/latitude data:
mkdir -p "geocoded/images"
# gather every geotagged photo into one data file
datafile="geocoded/data.json"
cat ./rawjson/*/*.json | \
  jq '.data[] | select(.location .latitude != null)' | \
  jq -s '.' > $datafile
while read -r img; do
  id=$(echo $img | cut -d ',' -f 1)
  url=$(echo $img | cut -d ',' -f 2)
  echo $url
  # Download the image from Instagram
  curl -s "$url" -o "geocoded/images/$id.jpg"
  sleep 1
done < <(cat $datafile | \
  jq -r '.[] | [.id, .images .standard_resolution .url] | @csv' | tr -d '"')
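The @csv-then-cut handoff is the fiddliest part of that loop, so here is the parsing step in isolation on a made-up record. (Caveat: cut -d ',' would break if an id or URL ever contained a comma, so this pattern leans on Instagram's ids and URLs being comma-free.)

```shell
# one made-up photo record, in the same shape the API returns
echo '[{"id":"abc_123","images":{"standard_resolution":{"url":"http://example.com/a.jpg"}}}]' \
  | jq -r '.[] | [.id, .images .standard_resolution .url] | @csv' \
  | tr -d '"'
# abc_123,http://example.com/a.jpg
```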
for username in glaciernps nasa usinterior mypubliclands USFWS smithsonian; do
  mkdir -p "top25/$username/images"
  # build the top-25 data file
  datafile="top25/$username/data.json"
  cat ./rawjson/$username/*.json | \
    jq '.data[]' | \
    jq -s '.' | \
    jq 'sort_by(.likes .count) | reverse[0:25]' > $datafile
  echo "Downloading top images for $username"
  sleep 3
  while read -r img; do
    id=$(echo $img | cut -d ',' -f 1)
    url=$(echo $img | cut -d ',' -f 2)
    echo $url
    curl -s "$url" -o "top25/$username/images/$id.jpg"
    sleep 1
  done < <(cat $datafile | \
    jq -r '.[] | [.id, .images .standard_resolution .url] | @csv' | tr -d '"')
done
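The sort_by/reverse/slice chain is easiest to trust after seeing it on data small enough to check by hand; these ids and Like counts are made up:

```shell
# sort ascending by Like count, flip to descending, keep the top 2 ids
echo '[{"id":"a","likes":{"count":5}},
       {"id":"b","likes":{"count":9}},
       {"id":"c","likes":{"count":2}}]' \
  | jq -r 'sort_by(.likes .count) | reverse[0:2] | .[] .id'
# b
# a
```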
Of course we can montage them all like this:
montage top25/*/images/*.jpg insta.jpg
Now we have all the data we need to make a custom page of our Instagram photos. Let's make a simple page that consists of just a grid of our photos.
And while we're prototyping, instead of generating a page of hundreds of photos, let's just start with the top 25 photos by Like count. We can sort/extract those photos with jq and make a top.json to work from.
First, make a file that is an array of all of the images (this extracts the .data arrays from each file and combines them into one array stored inside all.json):
# jq '.data[]' emits a stream of objects; the -s ("slurp") flag
# gathers that stream back into a single array
cat *.json | jq '.data[]' | jq -s '.' > all.json
# sort by Like count and keep the top 25
cat all.json | jq 'sort_by(.likes .count) | reverse[0:25]' > top.json
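About that slurp flag: jq '.data[]' emits a stream of bare objects, not an array, and -s is what gathers a stream back into a single array. A tiny demonstration:

```shell
# two JSON documents in one stream, like two concatenated batch files
echo '{"data":[1,2]} {"data":[3]}' | jq '.data[]' | jq -s -c '.'
# [1,2,3]
```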
cat > basic.html<<EOF
<html>
<body>
<style>
.images{ width: 900px; margin: auto auto; }
.image{ float: left; width: 33% }
</style>
<div class="images">
EOF
for url in $(cat top.json | jq -r '.[] .images .low_resolution .url'); do
cat >> basic.html <<EOF
<div class="image">
<img src="$url">
</div>
EOF
done
cat >> basic.html <<EOF
</div> <!-- end of .images -->
</body>
</html>
EOF
Let's add some data
First, we have to reformat the JSON:
cat top.json | jq '.[] | { link: .link,
image_url: .images .standard_resolution .url,
thumb_url: .images .low_resolution .url,
caption: .caption .text,
created_time: .created_time,
lat: .location .latitude,
lng: .location .longitude } | @json'
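The @json filter is what makes the while read loop below workable: it serializes each reshaped object into one JSON string, so each record occupies exactly one line, and jq -r '.' decodes it back to raw text on the other side. A minimal round trip:

```shell
echo '{"a":{"b":1}}' | jq '{x: .a .b} | @json'
# "{\"x\":1}"
echo '{"a":{"b":1}}' | jq '{x: .a .b} | @json' | jq -r '.'
# {"x":1}
```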
The script
# better.sh
cat > better.html <<EOF
<html>
<head>
<link rel="stylesheet" href="http://stash.compciv.org/assets/css/bootstrap.min.css">
</head>
EOF
# insert jquery
# insert isotopejs
#
cat >> better.html<<EOF
<body>
<div class="container">
EOF
while read -r j; do
  obj=$(echo "$j" | jq -r '.')
  img_url=$(echo "$obj" | jq -r '.image_url')
  caption=$(echo "$obj" | jq -r '.caption')
  echo "$img_url"
cat >> better.html <<EOF
<section class="row">
<div class="col-sm-4">
<div class="image">
<img src="$img_url">
</div>
</div>
<div class="col-sm-4">
<div class="caption">
$caption
</div>
</div>
</section>
EOF
done < <(cat top.json | jq '.[] | { link: .link,
image_url: .images .standard_resolution .url,
thumb_url: .images .low_resolution .url,
caption: .caption .text,
created_time: .created_time,
lat: .location .latitude,
lng: .location .longitude } | @json')
cat >> better.html <<EOF
</div> <!-- end of .container -->
</body>
</html>
EOF