Is there a way to use Portia to scrape a JSON response?
Best Answer
nestor
said
almost 6 years ago
Unfortunately, Splash cannot render that URL, so it won't be possible to scrape it with Portia. JSON responses can be scraped only as long as Portia is able to render them as HTML.
ivajason
Is there a way to use Portia to scrape a JSON response?
nestor
Yes, it is possible to extract a JSON response using Portia.
Bob Kolo
How is this done? I am getting no response from a URL that returns JSON in Chrome.
nestor
What's the URL?
Bob Kolo
https://api.nbc.com/v3.14/videos?fields[videos]=title,description,type,genre,vChipRating,vChipSubRatings,guid,published,runTime,airdate,available,seasonNumber,episodeNumber,expiration,entitlement,tveAuthWindow,nbcAuthWindow,externalAdId,uplynkStatus,dayPart,internalId,keywords,permalink,embedUrl,credits,selectedCountries,copyright&fields[shows]=active,category,colors,creditTypeLabel,description,frontends,genre,internalId,isCoppaCompliant,name,navigation,overrideFeaturedVideoCol,reference,schemaType,shortDescription,shortTitle,showTag,social,sortTitle,tuneIn,type,urlAlias&fields[images]=derivatives,path,width,attributes&fields[seasons]=seasonNumber,contestantTitle&fields[genereticProperties]=showCollection.collections,reltioGuestCalendar&include=image,show.season,show.genereticProperties.showCollection.collections&derivatives=landscape.widescreen.size640.x1&filter[show]=384bac0b-0daf-4947-8f93-0f060fe3451b&filter[available][value]=2018-01-09T09:00:00-05:00&filter[available][operator]=<=&filter[type][value]=Full Episode&filter[type][operator]==&filter[seasonNumber]=4&page[number]=2&page[size]=6&sort=-airdate
nestor
Unfortunately, Splash cannot render that URL, so it won't be possible to scrape it with Portia. JSON responses can be scraped only as long as Portia is able to render them as HTML.
rpm61
Is that why I can't seem to get any items out of this rather simple JSON response using regex?
https://jsonplaceholder.typicode.com/posts/1
I do get the response to load from this URL, but my annotations produce no items; it starts trying to extract and just keeps spinnin'. Is there some trick to annotations on straight JSON, or does everything need to be framed in HTML?
nestor
@rpm61 Enable JS in Portia and when you run the spider in Scrapinghub it should extract the JSON response.
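For anyone comparing this with the regex approach mentioned above: once a spider does have the raw JSON body, a minimal sketch of pulling fields out of it with Python's standard `json` module (outside Portia) might look like the following. The `extract_item` helper, the `SAMPLE_BODY` string, and the field names are hypothetical stand-ins for illustration, not output captured from either URL in this thread.

```python
import json

# Hypothetical sample body shaped like a single "post" object.
# The field names and values are illustrative assumptions only.
SAMPLE_BODY = '{"userId": 1, "id": 1, "title": "example title", "body": "example body"}'


def extract_item(raw_body, fields=("id", "title")):
    """Parse a JSON response body and keep only the requested fields.

    Fields missing from the response come back as None instead of raising.
    """
    data = json.loads(raw_body)
    return {name: data.get(name) for name in fields}


if __name__ == "__main__":
    print(extract_item(SAMPLE_BODY))
```

Parsing the body as JSON rather than matching it with a regex avoids problems with escaped quotes and key ordering, though it only helps when the raw response is actually available to your code.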