Sunday, February 14, 2016

Data scraping to save money

Recently, I heard about a new company called Raise.  On the surface, they are in the business of buying used gift cards and then reselling to their customers at a discount.  I am really only interested in their service because I (unfortunately) occasionally go to Starbucks early in the morning to get breakfast.  Their turkey bacon breakfast sandwich with egg whites makes me feel like I'm going with a healthier option (instead of McDonald's), and the price is acceptable.  Raise makes the price even better, and their mobile app makes purchasing these gift cards a snap!

However, when is the right time to buy a discounted gift card?  "Buy low" is certainly the right mindset, but I have seen the discounts fluctuate between 8% and 19% recently.  I thought it would at least be cool to track gift card discounts over time so I'd be in a position to make a more informed decision when I need to get another card.

I couldn't find a Raise API online, so I opted to scrape their pages for data.  To simplify the approach, I limited the data of interest to the best discount available for a gift card, i.e. the first one on the page.  Thankfully, Raise displays all gift card (GC) information without a user login, so I don't have to deal with an extra level of complexity that I'm not experienced enough to handle right now.
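Here is a minimal sketch of the "first discount on the page" idea, using a regular expression in Python.  The markup in SAMPLE_HTML and the "discount" class name are made up for illustration; the real Raise page source isn't reproduced in this post, so the pattern would need to be adapted to whatever the page actually contains.

```python
import re

# Hypothetical markup: a stand-in for the real page, which is not shown here.
SAMPLE_HTML = """
<tr class="listing"><td class="value">$25.00</td><td class="discount">14.2%</td></tr>
<tr class="listing"><td class="value">$50.00</td><td class="discount">11.0%</td></tr>
"""

# Assumes each discount sits in an element with class="discount" (an assumption).
# Captures the number in front of the percent sign, e.g. "14.2" or "8".
DISCOUNT_PATTERN = re.compile(r'class="discount">\s*(\d+(?:\.\d+)?)\s*%')


def first_discount(html):
    """Return the first listed discount as a float, or None if nothing matches."""
    match = DISCOUNT_PATTERN.search(html)
    return float(match.group(1)) if match else None
```

Because search() stops at the first match, this only ever looks at the top listing, which is the best discount if the page sorts listings that way.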

My regex skills are rudimentary, but I easily came up with a pattern that was good enough to scrape the data I wanted from the HTML source.  Now I just needed a way to post the data on a webpage.  Since I'm not a web developer, I had to come up with a really kludgy implementation that I think will work:



The app is complete.  Once an hour, it downloads the Raise webpage for the Starbucks GC, runs my regex, and scrapes the GC data from the page.  If the discount is different from the one in the previous query, it pushes the new value to data.sparkfun.com.
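The "only push when the value changes" logic can be sketched roughly like this in Python.  This is not my actual app, just an illustration: the function names, the "discount" field name, and the keys are placeholders, and the URL shape assumes the input-URL style that data.sparkfun.com streams use (public key in the path, private key as a query parameter).

```python
import time
import urllib.parse
import urllib.request


def build_input_url(public_key, private_key, discount):
    """Build the URL that logs one value to a stream.

    The field name "discount" is a placeholder; it has to match
    whatever field the stream was created with.
    """
    query = urllib.parse.urlencode({"private_key": private_key, "discount": discount})
    return "https://data.sparkfun.com/input/{}?{}".format(public_key, query)


def push_if_changed(previous, current, push):
    """Call push(current) only when the discount differs from the last one seen.

    Returns the value to remember for the next poll.
    """
    if current is None:
        return previous  # scrape failed this time; keep the old value
    if current != previous:
        push(current)
    return current


def poll_forever(read_discount, public_key, private_key, interval_seconds=3600):
    """Check the discount once per interval and log only the changes."""
    def push(value):
        urllib.request.urlopen(build_input_url(public_key, private_key, value))

    last = None
    while True:
        last = push_if_changed(last, read_discount(), push)
        time.sleep(interval_seconds)
```

Here read_discount would be whatever function downloads the page and runs the regex.  Keeping the comparison in its own small function means it can be tried out with a plain list instead of a real network call.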

You can check out the raw data here.  There's some garbage in there from testing, so just ignore that.

https://data.sparkfun.com/streams/2J5Knq47y5cNbOrq3lj2

Next, I will be addressing the visualization part of this project.  When (if) I figure it out, I'll blog about it.  Hopefully, I'll be able to host Google Charts in a blog.

Are you interested in data for a different gift card?  Post a comment, and I might just add it to my query!
