At 17:16 (UTC) on 1st June, Instagram made a planned change to their API. To achieve the high quality of service that Crisp’s customers have come to expect, Crisp’s services use this API to collect new comments and posts for our Instagram moderation.
Part of Crisp's ability to deliver fast, affordable moderation to our clients lies in how we collect data from different online communities. When one is available, Crisp makes use of the community’s API (application programming interface).
APIs are the backbone of the Internet, and we use them to collect content posted on pages owned by our customers, so that we spend less time browsing through individual customer pages and more time doing what we do best: moderation!
Unfortunately, Instagram's communications had not mentioned that the planned change would also introduce a number of other, unplanned breaking changes to their existing service. Around the time of the update, a large number of users and developers from around the world took to Twitter to complain that the changes had stopped their apps from working. For us, the result was a pretty major issue: we stopped collecting Instagram content.
Because we depend on external services like the Instagram API, we keep two on-call engineers on standby, 24/7, to ensure the best service we can offer. By 17:34, our platform monitoring service had automatically alerted the on-call engineers to an emerging issue that was affecting our Instagram data collection; they started looking into the problem immediately and had pinpointed the exact cause within twenty minutes.
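As an illustration only, an automatic alert of this kind can be as simple as a check that fires when no new content has arrived for too long. The sketch below is not Crisp's monitoring code: the function name `collection_is_stalled` and the 15-minute threshold are assumptions made for the example, and the year in the timestamps is arbitrary.

```python
from datetime import datetime, timedelta, timezone


def collection_is_stalled(last_item_at, now, max_silence=timedelta(minutes=15)):
    """Return True when nothing has been collected for longer than max_silence.

    Hypothetical helper: the threshold is an assumed value, not a real setting.
    """
    return now - last_item_at > max_silence


# Times taken from the incident timeline; the year is arbitrary.
last_item_at = datetime(2000, 6, 1, 17, 16, tzinfo=timezone.utc)
checked_at = datetime(2000, 6, 1, 17, 34, tzinfo=timezone.utc)

if collection_is_stalled(last_item_at, checked_at):
    print("ALERT: Instagram collection appears to have stalled")
```

A real monitor would also have to account for pages that are simply quiet, which is why the threshold would normally be tuned per source.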
With the nature and cause of the issue now known, an on-call developer was able to start working on a fix to counter the problems caused by Instagram’s change. By 18:56, the development work necessary to resolve the issue was complete.
At Crisp, the quality of our service is very important to us. We have a strong review and testing procedure that any platform changes must undergo before they can go live. Given that this was a critical issue with a major part of our stack, we immediately started an emergency change control process, with a view to releasing the hotfix as soon as we were happy that it was correct. The code changes were put in front of the scrutinising eyes of another Crisp developer and a member of the Testing team for review, and at 20:41 the proposed changes were given the green light and deployed onto our test environment for a final verification.
Fourteen minutes later, at 20:55, the fix had been through a full QA, had been verified to resolve the issue and had been signed off for release to the live platform. Just two minutes after the all-clear was given, the code was deployed to live and collection of Instagram data resumed without a hitch. Thanks to our service’s resilience, we were also able to catch up on the comments that we had missed as a result of Instagram’s change, and we moderated everything created during the downtime as soon as we were back up and running.
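This post does not go into how the catch-up works, but one common way to recover content missed during an outage is to remember a checkpoint for the last item collected and, once collection resumes, ask for everything created after it. The sketch below is a minimal illustration of that idea under our own assumptions, not Crisp's actual collector: `catch_up`, `fake_fetch_since`, the `created_at` field and the sample comments are all hypothetical.

```python
def catch_up(fetch_since, checkpoint):
    """Collect every item created after `checkpoint`, oldest first.

    `fetch_since` is any callable returning items newer than the checkpoint.
    Returns the collected items and the advanced checkpoint.
    """
    items = sorted(fetch_since(checkpoint), key=lambda item: item["created_at"])
    if items:
        checkpoint = items[-1]["created_at"]
    return items, checkpoint


def fake_fetch_since(checkpoint):
    """Stand-in for a real API client, using made-up comments."""
    comments = [
        {"id": "a", "created_at": 100},
        {"id": "b", "created_at": 205},
        {"id": "c", "created_at": 310},
    ]
    return [c for c in comments if c["created_at"] > checkpoint]


# Comment "a" was collected before the outage; "b" and "c" arrived during it.
missed, checkpoint = catch_up(fake_fetch_since, checkpoint=100)
print([c["id"] for c in missed], checkpoint)
```

Because the checkpoint only moves forward after items have been collected, an outage delays content but does not drop it.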
Whilst many other software companies and services were left crippled by last night’s changes, at Crisp we were able to deliver world-class support to our customers: we detected the problem, identified the cause and released a low-risk hotfix to our platform in under 4 hours.