Not working with stats.nba.com

Posted almost 7 years ago by atomant

Answered

I'm able to scrape the site locally, but it's not working through Crawlera. I've tried sending additional headers and it just hangs. I can get it working in the browser and locally in Ruby.


  

require 'net/http'
require 'uri'

# Create URL
url = 'http://stats.nba.com/stats/leaguedashteamstats?Season=2014-15&MeasureType=Base&SeasonType=Regular+Season&PerMode=PerGame&Conference=&DateFrom=&DateTo=&Division=&GameScope=&GameSegment=&LastNGames=0&LeagueID=00&Location=&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&SeasonSegment=&ShotClockRange=&StarterBench=&TeamID=0&VsConference=&VsDivision='
uri = URI.parse(url)

# Proxy: the Crawlera API key is the proxy username, the password is empty
proxy_host = "proxy.crawlera.com"
proxy_port = 8010
proxy_user = "APIKEY"
proxy_pass = ""
proxy = Net::HTTP::Proxy(proxy_host, proxy_port, proxy_user, proxy_pass)

# GET from site, with browser-like headers
request = Net::HTTP::Get.new(uri)
request["Accept-Language"] = "en-US,en;q=0.8,ru;q=0.6"
request["Accept-Encoding"] = "gzip, deflate"
request["Accept"] = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8"
request["Connection"] = "keep-alive"
# request["User-Agent"] = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
# req_options = { use_ssl: uri.scheme == "https" }

# Proxy response: send the request through Crawlera
response = proxy.start(uri.host, uri.port) do |http|
  http.request(request)
end

# Working bit without proxy
# response = Net::HTTP.start(uri.hostname, uri.port, req_options) do |http|
#   http.request(request)
# end
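
Side note: because Accept-Encoding is set by hand above, Net::HTTP won't inflate the response automatically, so once a response does come back I read it roughly like this (just a sketch, separate from the failing request itself):

require 'zlib'
require 'stringio'

# Accept-Encoding was set manually, which turns off Net::HTTP's transparent
# gunzip, so decompress the body before using it
body = if response["Content-Encoding"] == "gzip"
         Zlib::GzipReader.new(StringIO.new(response.body)).read
       else
         response.body
       end
puts response.code
puts body[0, 500]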
 

  


12 Comments


nestor posted almost 7 years ago Admin Best Answer


atomant posted almost 7 years ago

The code works fine with other URLs, just not the stats.nba.com one.


atomant posted almost 7 years ago

I added the rest of the headers from Chrome and I'm still seeing it time out every time:



 

      request["Accept-Language"] = "en-US,en;q=0.8,ru;q=0.6"
      request["Accept-Encoding"] = "gzip, deflate"
      request["Accept"] = "application/json, text/plain, */*"
      request["Connection"] = "keep-alive"
      request["x-nba-stats-token"] = "true"
      request["Referer"] = "http://stats.nba.com/teams/traditional/"
      request["x-nba-stats-origin"] = "stats"

 


nestor posted almost 7 years ago Admin

Don't think so. Try adding the rest of the browser headers, like "Cache-Control: max-age=0".
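
Something along these lines, on top of what's already being sent (just a sketch; copy the exact values from the request Chrome makes in the Network tab):

# Additional browser-style headers (example values; take yours from Chrome)
request["Cache-Control"] = "max-age=0"
request["Pragma"] = "no-cache"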


atomant posted almost 7 years ago

Still timing out after 3 minutes.

stats.nba.com will apparently time out for AWS IPs. Is it possible they're doing the same for proxy IPs?


nestor posted almost 7 years ago Admin

Can you increase it to 180 and try again? Crawlera might be retrying with a different IP, so the request could take longer.
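
With the Net::HTTP setup from the original post that would look roughly like this (sketch):

# Give Crawlera up to 180 seconds to answer before the client raises Net::ReadTimeout
response = proxy.start(uri.host, uri.port) do |http|
  http.read_timeout = 180
  http.request(request)
end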


atomant posted almost 7 years ago

It's a 60 second timeout.

I've tried it many times over the last several days.


nestor posted almost 7 years ago Admin

What's the timeout value on your client? It shouldn't be too short, otherwise Crawlera might not have enough time to respond.
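
For reference, Net::HTTP's timeouts are typically 60 seconds by default; you can check them on the proxied connection like this (sketch, reusing the proxy object from the original post):

# Inspect the client-side timeouts before starting the request
http = proxy.new(uri.host, uri.port)
puts http.open_timeout   # usually 60 by default
puts http.read_timeout   # usually 60 by default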


atomant posted almost 7 years ago

A timeout from the client:

 

Net::ReadTimeout: Net::ReadTimeout
        from (irb):65:in `block in irb_binding'
        from (irb):64
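
A retry wrapper along these lines (rough sketch) at least keeps the IRB session alive through the read timeouts:

# Retry the proxied request a couple of times on client-side read timeouts
attempts = 0
begin
  response = proxy.start(uri.host, uri.port) do |http|
    http.read_timeout = 180
    http.request(request)
  end
rescue Net::ReadTimeout
  attempts += 1
  retry if attempts < 3
  raise
end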

 


nestor posted almost 7 years ago Admin

What kind of timeout? A timeout from your client or a timeout error from Crawlera with a 504 code?
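
Something like this would tell the two apart (sketch, reusing the setup from the original post):

# A Crawlera-side timeout comes back as a normal HTTP response with a 504 code;
# a client-side timeout raises Net::ReadTimeout before any response arrives
begin
  response = proxy.start(uri.host, uri.port) { |http| http.request(request) }
  puts "Crawlera returned #{response.code}"   # 504 means Crawlera timed out upstream
rescue Net::ReadTimeout
  puts "Client-side read timeout, no response from Crawlera in time"
end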


atomant posted almost 7 years ago

Thanks for the quick response.

After I noticed that I'd left the API key in, I generated a new one.

I'm getting a timeout error when running it through Crawlera, but it works fine locally.


nestor posted almost 7 years ago Admin

That doesn't seem to be a Crawlera API Key. Also, please note that this is a public forum, so I've removed it from your post. In any case, please ensure you are using the API Key provided in your Crawlera dashboard (https://app.scrapinghub.com/o/orgid/crawlera/crawlerauser/setup) and let me know what error you get.
