Scraping the Apple App Store
iOS developers would profit from the historical rating, review, and rank information that iTunes provides, and so should be able to download and store it easily. Unfortunately, Apple is a tad paranoid about the data it exposes on the App Store. We think that an app distributor (including Apple) should provide programmatic (API) access to its store’s data. If you want to build your own “App Store Scraper”, you will find a few hints below.
Developers normally access the App Store via Apple iTunes. iTunes behaves like a specialized browser that sends HTTP queries to a web server. The web server replies in different ways depending on whether it identifies the caller as iTunes or as a web browser. If you want to see all reviews in the UK for the application with id=xxxxxxxxx (look for a real id starting from here), you should request the file:
If you paste this URL into your browser, you won’t see the same amount of information you would see in iTunes. You might not see anything at all, and your browser may ask to open iTunes. Still, the URL above is the same one iTunes visits; the only difference is the way iTunes sends the request. Fortunately, you can trick Apple’s server into believing you are using iTunes when you’re actually not, by making the request via cURL, a common application on most GNU/Linux distributions that has also been ported to Windows.
1. If you are on Windows and do not have cURL installed, download it from here, unzip it, and add the bin directory to the PATH variable;
2. Open a terminal window (Win+R, then type cmd);
Once you have cURL installed, on either Windows or *nix, copy and paste the following into your terminal:
curl -H 'Host: itunes.apple.com' -H 'Accept-Language: en-us, en;q=0.50' -H 'X-Apple-Store-Front: 143444,5' -H 'X-Apple-Tz: 3600' -U 'iTunes/9.2.1 (Macintosh; Intel Mac OS X 10.5.8) AppleWebKit/533.16' 'http://itunes.apple.com/WebObjects/MZStore.woa/wa/customerReviews?s=143444&id=xxxxxxxxx&displayable-kind=11'
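If you are prompted for a password, just press Enter (the prompt appears because curl’s -U option is really --proxy-user and expects user:password; -A is the option that sets the User-Agent). You should now see the actual XML file seen by iTunes, with all reviews.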
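If you plan to repeat the request for several applications, the command above can be wrapped in a small shell script. This is only a minimal sketch: the helper names reviews_url and fetch_reviews are made up for illustration, and it uses -A (curl's User-Agent option) on the assumption that this is what the -U in the original command was meant to do.

```shell
#!/usr/bin/env bash
# Sketch only: reviews_url and fetch_reviews are hypothetical helper names.

# Build the customerReviews URL.
#   $1 = storefront id (143444 in the command above), $2 = application id
reviews_url() {
  printf 'http://itunes.apple.com/WebObjects/MZStore.woa/wa/customerReviews?s=%s&id=%s&displayable-kind=11\n' \
    "$1" "$2"
}

# Send the request with the iTunes-style headers from the command above.
# Assumption: -A (User-Agent) replaces the original -U (proxy user).
fetch_reviews() {
  curl -s \
    -H 'Accept-Language: en-us, en;q=0.50' \
    -H "X-Apple-Store-Front: $1,5" \
    -H 'X-Apple-Tz: 3600' \
    -A 'iTunes/9.2.1 (Macintosh; Intel Mac OS X 10.5.8) AppleWebKit/533.16' \
    "$(reviews_url "$1" "$2")"
}

# Example (needs network access; replace xxxxxxxxx with a real id):
#   fetch_reviews 143444 xxxxxxxxx > reviews.xml
```

Keeping the URL construction in its own function makes it easy to check the query string without touching the network.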
Keep getting the error: -bash: syntax error near unexpected token `('
As I’m not a programmer, I have no idea what’s wrong.
Be sure you are copying the command on a single line:
curl -H 'Host: itunes.apple.com' -H 'Accept-Language: en-us, en;q=0.50' -H 'X-Apple-Store-Front: 143444,5' -H 'X-Apple-Tz: 3600' -U ': iTunes/9.2.1 (Macintosh; Intel Mac OS X 10.5.8) AppleWebKit/533.16' 'http://itunes.apple.com/WebObjects/MZStore.woa/wa/customerReviews?s=143444&id=xxxxxxxxx&displayable-kind=11'
To make it fully automatic
Actually, I have it reduced to this now and it still works:
curl -H "X-Apple-Store-Front: 143444,5" -U ":" "http://itunes.apple.com/WebObjects/MZStore.woa/wa/customerReviews?s=143444&id=452118074&displayable-kind=11&sort=4"
143444,5 is the ID of the iTunes UK storefront (the U.S. storefront is 143441). You can see all the others in this beautiful Ruby script: http://github.com/gonzoua/random-stuff/blob/master/appstorereviews.rb
sort=4 sorts by most recent review.
Unfortunately, this blog’s configuration may turn the quotes into typographic (curly) characters. If the quotes look curly after you copy and paste the commands above, replace them with plain ASCII quotes, or the shell will not parse the command.
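Since the point of the post is to keep historical data, here is a rough sketch of how the reduced command could be run once a day, saving each response under a dated file name. The helper names snapshot_name and save_snapshot are invented for this example. The -U ":" part is left out on the assumption that the storefront header is what the server actually checks, which I have not verified.

```shell
#!/usr/bin/env bash
# Sketch only: snapshot_name and save_snapshot are hypothetical helper names.

STOREFRONT=143444   # storefront id used in the commands above
SORT=4              # most recent reviews first, as described above

# File name for one snapshot: reviews_<app id>_<date>.xml
#   $1 = application id, $2 = date string
snapshot_name() {
  printf 'reviews_%s_%s.xml\n' "$1" "$2"
}

# Fetch the reviews for one application id and store them under today's date.
# Assumption: the X-Apple-Store-Front header alone is enough for the server.
save_snapshot() {
  curl -s -H "X-Apple-Store-Front: ${STOREFRONT},5" \
    "http://itunes.apple.com/WebObjects/MZStore.woa/wa/customerReviews?s=${STOREFRONT}&id=$1&displayable-kind=11&sort=${SORT}" \
    -o "$(snapshot_name "$1" "$(date +%Y-%m-%d)")"
}

# Example (needs network access), e.g. from a daily cron job:
#   for id in 452118074; do save_snapshot "$id"; done
```

Putting the date in the file name means earlier snapshots are never overwritten, so the history builds up one file per day and per application.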
Yep, it refuses to have normal quotes… Thanks for your comments by the way!
Thanks a lot for this post. Can anybody explain what “displayable-kind” means?
My Wireshark capture shows “displayable-kind=2” and I would like to understand what that means.
Thx a lot
Is there a paid service from Apple which helps us rip the above historical rating, review, price, and rank information and other details?
Have a look at this: http://www.apple.com/itunes/affiliates/resources/documentation/itunes-enterprise-partner-feed.html
The data they provide are not great, though.