Scraping the Apple App Store
iOS developers would profit from the historical rating, review, and rank information that iTunes provides, so they should be able to download and store it easily. Unfortunately, Apple is a tad paranoid about the App Store data it exposes. We think that an app distributor (Apple included) should provide programmatic (API) access to its store’s data. If you want to build your own “App Store Scraper”, you will find a few hints below.
Developers normally access the App Store via Apple iTunes. iTunes behaves like a specialized browser that sends HTTP queries to a web server. The web server replies differently depending on whether it identifies the caller as iTunes or as a web browser. If you want to see all reviews in the UK for the application with id=xxxxxxxxx (replace xxxxxxxxx with a real application id), you should request the file:
http://itunes.apple.com/WebObjects/MZStore.woa/wa/customerReviews?s=143444&id=xxxxxxxxx&displayable-kind=11
If you paste this URL into your browser, you won’t see the same amount of information you would see in iTunes. You might not see anything at all, and your browser may ask to open iTunes instead. Still, the URL above is the same one iTunes visits; the only difference lies in the HTTP headers iTunes sends with the request (such as its User-Agent and store-front id). Fortunately, you can trick Apple’s server into believing you are using iTunes when you are not, by making the request with cURL, a command-line tool available on most GNU/Linux distributions that has also been ported to Windows.
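The customer-reviews URL used in this post has two variable parts: the store front (the s= parameter, 143444 in the UK example) and the application id. A small helper such as the hypothetical review_url below, written for a POSIX shell, makes it easy to build the URL for other apps or stores. It is only a sketch: the URL pattern is copied from this post, and displayable-kind=11 is kept as-is without assuming anything about its meaning.

```shell
#!/bin/sh
# Hypothetical helper: build the customer-reviews URL for one app in one store.
# The URL pattern is the one used in this post; 143444 is the UK store front.
review_url() {
  storefront=$1   # numeric store-front id, e.g. 143444
  app_id=$2       # numeric application id
  printf 'http://itunes.apple.com/WebObjects/MZStore.woa/wa/customerReviews?s=%s&id=%s&displayable-kind=11\n' \
    "$storefront" "$app_id"
}
```

For example, review_url 143444 123456789 (with 123456789 standing in for a real id) prints http://itunes.apple.com/WebObjects/MZStore.woa/wa/customerReviews?s=143444&id=123456789&displayable-kind=11.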
2. Open a terminal window (press META+R, then type cmd);
Once cURL is installed, on Windows or on *nix, paste the following command into your terminal:
curl -H "Host: itunes.apple.com" -H "Accept-Language: en-us, en;q=0.50" -H "X-Apple-Store-Front: 143444,5" -H "X-Apple-Tz: 3600" -A "iTunes/9.2.1 (Macintosh; Intel Mac OS X 10.5.8) AppleWebKit/533.16" "http://itunes.apple.com/WebObjects/MZStore.woa/wa/customerReviews?s=143444&id=xxxxxxxxx&displayable-kind=11"
You should now see the actual XML file that iTunes receives, with all the reviews.
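To build up the historical record mentioned at the start of this post, the same request can be wrapped in a function and run periodically (for instance from cron), saving each response under a dated file name. The sketch below assumes a POSIX shell on *nix and that the endpoint still answers as described here; fetch_reviews and its file naming scheme are hypothetical, while the headers and User-Agent mirror the cURL command in this post.

```shell
#!/bin/sh
# Hypothetical sketch: fetch the review XML for one app in one store front and
# save it as reviews-<app_id>-<storefront>-<YYYY-MM-DD>.xml, so repeated runs
# accumulate a history. Endpoint and headers mirror the cURL command in this post.

ITUNES_UA='iTunes/9.2.1 (Macintosh; Intel Mac OS X 10.5.8) AppleWebKit/533.16'

# Usage: fetch_reviews STOREFRONT APP_ID [OUTDIR]
fetch_reviews() {
  storefront=$1
  app_id=$2
  outdir=${3:-.}   # default to the current directory
  url="http://itunes.apple.com/WebObjects/MZStore.woa/wa/customerReviews?s=$storefront&id=$app_id&displayable-kind=11"
  out="$outdir/reviews-$app_id-$storefront-$(date +%Y-%m-%d).xml"

  # -s: no progress meter; -f: fail on HTTP errors instead of saving the error page;
  # -A: send the iTunes User-Agent; -o: write the response body to $out.
  # The Host header is omitted because curl derives it from the URL.
  curl -s -f \
    -H 'Accept-Language: en-us, en;q=0.50' \
    -H "X-Apple-Store-Front: $storefront,5" \
    -H 'X-Apple-Tz: 3600' \
    -A "$ITUNES_UA" \
    -o "$out" \
    "$url" || return 1

  printf '%s\n' "$out"   # report where the snapshot was saved
}
```

A call such as fetch_reviews 143444 123456789 ./snapshots (again with a placeholder id) would write one file per day into ./snapshots; keeping the store front in the file name lets you track several countries side by side.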