Do you know this situation: you are listening to a Spotify playlist on your stereo but just can’t figure out who is singing that song? Since you are sitting comfortably on your couch (too far away from your stereo), you start thinking about how some clever engineering could solve this problem. You quickly sketch a design in your head: an ESP8266-based device connects to Spotify over the net. Every few seconds it downloads information about the currently playing song, maybe some cover art too, and shows it on a color display! It can’t be THAT hard, right? This post explains the challenges I faced when implementing the ESP8266 Spotify remote and how I solved them.

Order Now on Amazon US and have it at your door steps by tomorrow!

We also ship to any other country. Order now in our shop!

Challenge Accepted

Building a remote control for a service like Spotify is a fine engineering challenge. It involves several interesting aspects of modern connected systems. First of all, you have to find out if and how you can control the Spotify player over the network. Services like Spotify usually (hopefully) use some kind of security so that only permitted users can control your player. Once you have dealt with the security aspects, you have to find out what you can actually do with the application programming interface (API) offered by the service. Can you control play and pause? Can you display the current progress of the song? Does the service allow you to access additional information like cover art?

Once you understand all the possibilities, you go on to decide what your device should look like and what features it should initially have. I wanted to implement this for the ESP8266 Color Display Kit, so the hardware design was already clear. The color display with touch screen should show the current position and total duration of the song. The touch screen should let me pause and play the song and jump to the next or previous one in the current playlist. So everything is clear, right? Well, the challenge had just started.

OAuth 2.0 on ESP8266

Spotify lets you retrieve a lot of information and control the player through a service they call the “Web API”. The idea is that third-party software developers can build new and cool applications which extend the Spotify functionality. Third party means that the application is developed neither by the end user (the consumer) nor by Spotify. In this situation service providers usually use a security mechanism called OAuth. The third-party developer registers an application with the service provider, then develops the application and installs it either on a device or as a web application on a server. If you as an end user want to use the third-party application, you have to tell Spotify that you grant this application access.

Triangle of Trust

It’s a bit like telling the bank that your accountant can access your bank account. You don’t do this with everybody, only if you trust your accountant, right? The OAuth protocol allows you to grant a third party access to your resources. In the example of your bank and the accountant this is a triangular relationship where every party has to trust the other two: the bank needs to know that you entitled your accountant to access your bank information. Also, only the accountant should get access and not an impostor!

Because of this triangle of trust the flow to enable access to a resource like the Spotify Web API is a bit complicated. It involves several steps before the third party application can start using the service:

The third-party application needs to get credentials. Spotify calls these credentials Client ID and Client Secret.
With these credentials the application requests access to the user’s resources. If this is the first time, the user has to agree to it on a web page. If everything is OK, Spotify redirects the browser back to the application together with a temporary authorization code.
The application sends this authorization code, together with its own credentials, to Spotify and receives an access token which is valid for an hour. Together with this one-hour token it also gets a refresh token.
Now the application can start using the API by sending the access token with every request to Spotify.
Once the access token expires, the application can use the refresh token and its own credentials to request a new access token.

Spotify’s Web API documentation illustrates this flow with a diagram. If you don’t want your user to go through this process every time the device starts, you’ll also have to store the refresh token permanently in the flash memory. The next time the device starts, it will use the refresh token to get a new one-hour access token…
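To make the steps above a bit more tangible, here is a minimal sketch of the strings involved. It is plain C++ rather than the code from the Spotify remote, and the helper names (urlEncode, buildAuthorizeUrl, buildCodeExchangeBody, buildRefreshBody) are made up for this illustration. The endpoint and parameter names follow Spotify’s authorization code flow as I understand it, so double-check them against the current documentation. Both form bodies would be sent with a POST to https://accounts.spotify.com/api/token, authenticated with Client ID and Client Secret.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Percent-encode a value for use in a URL query string or form body.
std::string urlEncode(const std::string& value) {
    std::string out;
    for (unsigned char c : value) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out += static_cast<char>(c);
        } else {
            char buf[4];
            std::snprintf(buf, sizeof(buf), "%%%02X", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}

// Step 2: the URL the user's browser opens to grant access.
std::string buildAuthorizeUrl(const std::string& clientId,
                              const std::string& redirectUri,
                              const std::string& scope) {
    return "https://accounts.spotify.com/authorize?response_type=code"
           "&client_id=" + urlEncode(clientId) +
           "&redirect_uri=" + urlEncode(redirectUri) +
           "&scope=" + urlEncode(scope);
}

// Step 3: form body that trades the temporary code for the tokens.
std::string buildCodeExchangeBody(const std::string& code,
                                  const std::string& redirectUri) {
    return "grant_type=authorization_code&code=" + urlEncode(code) +
           "&redirect_uri=" + urlEncode(redirectUri);
}

// Step 5: form body that trades the stored refresh token for a new access token.
std::string buildRefreshBody(const std::string& refreshToken) {
    return "grant_type=refresh_token&refresh_token=" + urlEncode(refreshToken);
}
```

Keeping the string building separate from the actual HTTPS requests makes this part easy to test on a desktop computer before it ever runs on the ESP8266.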

mDNS to the rescue

All quite easy! Do a couple of web requests and exchange some codes for other ones! The real challenge here is to manage these steps properly and to deal with exceptions. What if the user did not grant access to the application? How do we detect that the access token is not valid anymore?

For our Spotify remote there is an additional problem: how can the user start this workflow? Our application is not a web application (yet). There are several solutions to this, but I decided to turn the remote control into a web application. The ESP starts a web server and the user connects to it. But there is another challenge. After the user grants the application access to the service, Spotify redirects the browser back to the web application. To do so it needs to know where the web application is hosted. As an additional security measure Spotify requires a whitelist of all allowed redirection targets. IP addresses like 192.168.0.100 can change at any time, so we need to give our ESP a name.

We can use a protocol called mDNS to announce the name in the network. On the Arduino/ESP8266 platform you can do this with:

MDNS.begin("esp8266");

After that you can reach the ESP8266 at http://esp8266.local. Now we can tell Spotify that the redirection target is http://esp8266.local/callback.

The funny thing about this is that Spotify doesn’t even know where esp8266.local is. It is only known in your local network! This still works because Spotify instructs your browser, which is in your network, to do the redirect. Your browser asks the network stack: “Who is esp8266.local?” and the network stack answers: “Try IP 192.168.0.100”. The browser then connects to the ESP and opens the web page. Cool, isn’t it?
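When the browser arrives at /callback, the temporary code travels in the query string of the request. The following sketch shows one way to pick it out. It is plain C++ with a hypothetical helper called queryParam and not the code used in the remote; on the ESP8266 the web server library can usually hand you request arguments directly. I am also assuming that a denied request comes back with an error parameter instead of a code, as the OAuth specification describes.

```cpp
#include <cstddef>
#include <string>

// Return the value of `name` in a query string such as "code=abc&state=xyz",
// or an empty string if the parameter is missing (for example because the
// user denied access and the redirect carries "error=access_denied" instead).
// Note: the value is returned as-is, without percent-decoding.
std::string queryParam(const std::string& query, const std::string& name) {
    std::size_t pos = 0;
    while (pos <= query.size()) {
        std::size_t end = query.find('&', pos);
        if (end == std::string::npos) end = query.size();
        const std::string pair = query.substr(pos, end - pos);
        const std::size_t eq = pair.find('=');
        // Only accept an exact match of the parameter name.
        if (eq != std::string::npos && pair.compare(0, eq, name) == 0) {
            return pair.substr(eq + 1);
        }
        pos = end + 1;
    }
    return "";
}
```

An empty result for "code" is the signal to show the user an error page instead of continuing with the token exchange.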

The MiniGrafx Frame Buffer

So we managed to get access to the API! We can now fetch information about the active player and the currently played song. Wonderful. Almost done. Now we just have to display the information on the color display. And since we don’t like flickering on a display we are going to use our MiniGrafx library.

There are at least two different ways you can draw on a (color) display like the 2.4″ display that comes with the Color Display Kit. You can draw directly to the screen, or you can draw the image in memory and then write the content to the screen. The latter approach has the advantage that you don’t see a black screen when you erase the content. Without this frame buffer we would see the cleared (black) screen for a split second and our eye would perceive it as flickering, which can be very annoying.

Reduce the Color Depth!

Like many things in life the frame buffer also has disadvantages: it consumes more memory and it is slower than direct screen access. With the ESP8266 the first disadvantage is a real problem. You have to reserve a lot of memory for the frame buffer. The display uses 16 bits, or two bytes, per pixel. At 320×240 pixels that makes 153,600 bytes, or about 150 KB, of memory! The ESP8266 doesn’t even have a fraction of that left for you. But there are some tricks we can use here. Some of those tricks were already used with the first game consoles and home computers. Here is a nice book about those tricks if you are interested.

If we don’t have enough memory for 16-bit colors we can reduce the number of colors used! The reduced set of colors is called a color palette. The MiniGrafx library lets you define the number of colors up front. If we can live with only four colors, we need just 2 bits per pixel, which is only about 19 KB for the frame buffer!

A frame buffer with a color palette works like this: for each pixel in a 2-bit buffer we only store an index between 0 and 3. Before we write the buffer to the display, we check in our lookup table which color code to use for index 0, 1, 2 and 3.
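Here is a small sketch of such a palette-based buffer. It is not the MiniGrafx implementation, just a plain C++ illustration with a made-up class name (PaletteBuffer) that packs four 2-bit indices into each byte and resolves them through a four-entry lookup table of 16-bit colors.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// A 2-bit-per-pixel frame buffer: four pixels are packed into every byte.
class PaletteBuffer {
public:
    PaletteBuffer(int width, int height, const std::array<uint16_t, 4>& palette)
        : width_(width), height_(height), palette_(palette),
          data_((static_cast<std::size_t>(width) * height + 3) / 4, 0) {}

    std::size_t sizeInBytes() const { return data_.size(); }

    // Store a palette index (0..3) for the pixel at (x, y).
    void setPixel(int x, int y, uint8_t index) {
        if (x < 0 || x >= width_ || y < 0 || y >= height_) return;
        const std::size_t pixel = static_cast<std::size_t>(y) * width_ + x;
        const int shift = static_cast<int>(pixel % 4) * 2;
        uint8_t& byte = data_[pixel / 4];
        byte = static_cast<uint8_t>((byte & ~(0x3 << shift)) | ((index & 0x3) << shift));
    }

    // Look up the 16-bit color that would be sent to the display for (x, y).
    uint16_t colorAt(int x, int y) const {
        if (x < 0 || x >= width_ || y < 0 || y >= height_) return palette_[0];
        const std::size_t pixel = static_cast<std::size_t>(y) * width_ + x;
        const int shift = static_cast<int>(pixel % 4) * 2;
        return palette_[(data_[pixel / 4] >> shift) & 0x3];
    }

private:
    int width_;
    int height_;
    std::array<uint16_t, 4> palette_;
    std::vector<uint8_t> data_;
};
```

For a 320×240 buffer, sizeInBytes() returns 19,200, which matches the roughly 19 KB mentioned above, compared to 153,600 bytes for the full 16-bit version.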

Cover Art with 2bit Color?

But wait! We said we wanted to display cover art of the currently played song! What would this look like with 2-bit color depth?

Not very nice, right? But then again maybe there is a little trick? We don’t need to update the cover art all the time! We actually only need to do this when the song changes.

So far our MiniGrafx library only supported frame buffers covering the whole screen, so it had to be extended to allow partial updates. This new feature lets you define a smaller frame buffer and copy it anywhere on the display. Such a small buffer is sometimes also called a sprite.

With this feature we can now update the current song information several times a second with a frame buffer that only uses 2-bit colors to save memory. The cover art, which is updated only every few minutes, is written directly to the screen.
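The idea of a partial frame buffer can be sketched in a few lines. This is a simplified illustration in plain C++ and not the MiniGrafx code: the types Screen and Sprite and the function blit are made up for this example, and the sprite stores 16-bit colors instead of 2-bit palette indices to keep the copy loop short.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the physical display: one 16-bit color per pixel.
struct Screen {
    int width;
    int height;
    std::vector<uint16_t> pixels;
    Screen(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, 0) {}
};

// A small off-screen buffer that only covers part of the screen.
struct Sprite {
    int width;
    int height;
    std::vector<uint16_t> pixels;
    Sprite(int w, int h, uint16_t fill)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, fill) {}
};

// Copy the sprite to the screen with its top-left corner at (x0, y0),
// skipping anything that would fall outside the screen.
void blit(const Sprite& sprite, Screen& screen, int x0, int y0) {
    for (int y = 0; y < sprite.height; ++y) {
        for (int x = 0; x < sprite.width; ++x) {
            const int sx = x0 + x;
            const int sy = y0 + y;
            if (sx < 0 || sx >= screen.width || sy < 0 || sy >= screen.height) continue;
            screen.pixels[static_cast<std::size_t>(sy) * screen.width + sx] =
                sprite.pixels[static_cast<std::size_t>(y) * sprite.width + x];
        }
    }
}
```

Only the rows covered by the sprite are touched, so whatever was drawn directly to the rest of the screen (the cover art in our case) stays as it is.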

JPEG Decoding

The next challenge was to display JPEG-encoded images on the screen. Luckily there is a library for the ESP8266 that can decode JPEG images. The interface is rather complicated, but this is probably because the library decodes one MCU (minimum coded unit) at a time and then lets you draw it on the screen. MCUs are blocks of variable size defined in the JPEG format. Relatively common sizes are 8×8 pixels or 16×16 pixels. After drawing an MCU you have to ask the library for the next block and so on…
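To illustrate the block-by-block drawing, here is a sketch that only calculates where each MCU ends up on the screen. It does not use the JPEG library’s interface; the struct Block and the function mcuBlocks are made up for this example, and I am assuming that the blocks arrive left to right, top to bottom.

```cpp
#include <algorithm>
#include <vector>

// Screen rectangle covered by one decoded block.
struct Block {
    int x;
    int y;
    int width;
    int height;
};

// List the rectangles of all MCUs of an image in decoding order
// (left to right, top to bottom). Blocks on the right and bottom edges are
// clipped when the image size is not a multiple of the MCU size.
std::vector<Block> mcuBlocks(int imageWidth, int imageHeight,
                             int mcuWidth, int mcuHeight) {
    std::vector<Block> blocks;
    if (mcuWidth <= 0 || mcuHeight <= 0) return blocks;
    for (int y = 0; y < imageHeight; y += mcuHeight) {
        for (int x = 0; x < imageWidth; x += mcuWidth) {
            blocks.push_back({x, y,
                              std::min(mcuWidth, imageWidth - x),
                              std::min(mcuHeight, imageHeight - y)});
        }
    }
    return blocks;
}
```

For a hypothetical 300×300 cover with 16×16 MCUs this gives 19 blocks per row, and the last block in each row is only 12 pixels wide. Forgetting this clipping is an easy way to draw over neighboring parts of the screen.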

Initially everything looked fine, until I saw strange artifacts in the displayed cover art. For many hours I thought I must have a bug in my code, until I finally had the idea to upgrade to the latest version of the JPEG library, and the problem was gone! So sometimes it actually IS the library and not your code.

Non-Blocking Screen Updates

In my day job I usually program in Java, JavaScript or, these days, TypeScript. This means that my code runs on powerful computers. Multi-threading is usually done for me by the operating system, the web server or other components of the software stack, so I often don’t have to worry about whether things can run in parallel.

A microcontroller like the ESP8266 is a totally different story. You’ll have to take care of this yourself. Imagine this scenario: in the Spotify remote code you just downloaded the latest information about the currently playing song: we are at position 2:31 of a 5:25 song. Downloading this information usually takes less than a second, but this time it took 2 seconds! Maybe the server was slow or there was some network hiccup. This means the last time we updated the screen was around 2:29 into the song, and that is what was displayed on screen. For the user of our beautiful Spotify remote the time would jump from 2:29 to 2:31 without ever displaying 2:30. This is not cool and certainly not what we expect from a modern device.

The Drawing Callback

But how can we solve this? Make the download faster? This reduces the chance of these time jumps, but they will still happen in case of connection issues. So the other solution is to give control back to the drawing routine while we are still downloading the JSON object and to guess the current time. Every now and then the JSON parser tells the drawing routine to update the player information. In our previous example this might happen several times between the two JSON downloads:

2:29:300(ms): The last download of the JSON object has finished and the player information is updated
2:29:700: A new download has started. The download routine gives back control to the drawing routine. The drawing routine measures the time since the last update (400ms) and updates the time. 2:29:700 will be displayed as 2:29. No visible change…
2:30:100: We are still downloading the JSON file with the player information. The drawing routine calculates how much time has passed since the last JSON download (800ms) and updates the time as 2:30.
2:30:500: New update, time still displayed as 2:30
2:30:900: New update, time still displayed as 2:30
2:31:100: The JSON download has finished. The player information is updated and the time displayed as 2:31

This means of course, that we are guessing that the player is still running. If the user had stopped the player at 2:29:400 the Spotify Remote would update the time as if nothing had happened.
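The guessing itself is simple arithmetic. Here is a minimal sketch in plain C++; the struct PlayerSnapshot and the functions estimateProgressMs and formatTime are made up for this illustration and are not taken from the Spotify remote code. On the device the local clock would typically be the value returned by millis().

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <string>

// Snapshot taken when the last player JSON finished downloading.
struct PlayerSnapshot {
    uint32_t progressMs;   // song position reported by the API
    uint32_t durationMs;   // total song length
    uint32_t receivedAtMs; // local clock when the snapshot arrived
    bool isPlaying;
};

// Guess the current song position from the last snapshot and the local clock.
uint32_t estimateProgressMs(const PlayerSnapshot& s, uint32_t nowMs) {
    if (!s.isPlaying) return s.progressMs;
    // Unsigned subtraction still gives the right difference after the
    // 32-bit clock wraps around.
    const uint32_t elapsed = nowMs - s.receivedAtMs;
    const uint64_t guess = static_cast<uint64_t>(s.progressMs) + elapsed;
    // Never run past the end of the song.
    return static_cast<uint32_t>(std::min<uint64_t>(guess, s.durationMs));
}

// Format milliseconds as minutes:seconds, e.g. 150100 becomes "2:30".
std::string formatTime(uint32_t ms) {
    char buf[16];
    std::snprintf(buf, sizeof(buf), "%u:%02u",
                  static_cast<unsigned>(ms / 60000),
                  static_cast<unsigned>((ms / 1000) % 60));
    return buf;
}
```

With a snapshot of 2:29:300, a call 400 ms later still formats as 2:29 and a call 800 ms later as 2:30, just like in the timeline above.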

The same of course also applies when we download cover art. Downloading a JPEG file and displaying it on screen takes even longer than the JSON file with the player information. So between receiving bits and bytes we always give back control to the drawing routine to update the time.
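Giving back control between chunks of data can be sketched like this. It is a plain C++ illustration with a made-up function processInChunks, not the download code of the remote: on the device the chunks would come from the network client, and the callback would redraw the time on screen.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Consume a response in small chunks and call `onIdle` after each one, so
// the caller can redraw the screen while the transfer is still running.
// Returns the number of chunks processed.
std::size_t processInChunks(const std::string& payload, std::size_t chunkSize,
                            const std::function<void()>& onIdle) {
    if (chunkSize == 0) return 0; // avoid an endless loop
    std::size_t chunks = 0;
    for (std::size_t pos = 0; pos < payload.size(); pos += chunkSize) {
        const std::string chunk = payload.substr(pos, chunkSize);
        // A real implementation would feed `chunk` to the JSON parser or
        // JPEG decoder here.
        (void)chunk;
        ++chunks;
        if (onIdle) onIdle();
    }
    return chunks;
}
```

The smaller the chunks, the more often the drawing routine gets a turn, at the price of a slightly slower download.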

Summary

In this post I showed you some of the problems I faced when I implemented the ESP8266 Spotify Remote control. There were other challenges of course. One of them was the calibration of the touch screen. But the topics I showed here were certainly the most interesting ones.

If you would like to build this project, you can find the instructions and the code in this GitHub repository: https://github.com/ThingPulse/esp8266-spotify-remote/ If you don’t have the hardware yet to build the Spotify Remote, the Color Weather Station or the Plane Spotter, you can find it in our shop.

