Is there a way to use Portia to scrape a JSON response?
0 Votes
nestor posted almost 7 years ago Admin Best Answer
Unfortunately, Splash cannot render that URL, so it won't be possible to scrape it with Portia. JSON responses can be scraped as long as Portia is able to render them as HTML.
0 Votes
7 Comments
nestor posted almost 7 years ago Admin
Yes, it is possible to extract a JSON response using Portia.
0 Votes
Bob Kolo posted almost 7 years ago
How is this done? I am getting no response from a URL that returns JSON in Chrome.
0 Votes
nestor posted almost 7 years ago Admin
What's the URL?
0 Votes
Bob Kolo posted almost 7 years ago
https://api.nbc.com/v3.14/videos?fields[videos]=title,description,type,genre,vChipRating,vChipSubRatings,guid,published,runTime,airdate,available,seasonNumber,episodeNumber,expiration,entitlement,tveAuthWindow,nbcAuthWindow,externalAdId,uplynkStatus,dayPart,internalId,keywords,permalink,embedUrl,credits,selectedCountries,copyright&fields[shows]=active,category,colors,creditTypeLabel,description,frontends,genre,internalId,isCoppaCompliant,name,navigation,overrideFeaturedVideoCol,reference,schemaType,shortDescription,shortTitle,showTag,social,sortTitle,tuneIn,type,urlAlias&fields[images]=derivatives,path,width,attributes&fields[seasons]=seasonNumber,contestantTitle&fields[genereticProperties]=showCollection.collections,reltioGuestCalendar&include=image,show.season,show.genereticProperties.showCollection.collections&derivatives=landscape.widescreen.size640.x1&filter[show]=384bac0b-0daf-4947-8f93-0f060fe3451b&filter[available][value]=2018-01-09T09:00:00-05:00&filter[available][operator]=<=&filter[type][value]=Full Episode&filter[type][operator]==&filter[seasonNumber]=4&page[number]=2&page[size]=6&sort=-airdate
0 Votes
nestor posted almost 7 years ago Admin Answer
Unfortunately, Splash cannot render that URL, so it won't be possible to scrape it with Portia. JSON responses can be scraped as long as Portia is able to render them as HTML.
0 Votes
rpm61 posted almost 7 years ago
Is that why I can't seem to get any items out of this rather simple JSON response using regex?
https://jsonplaceholder.typicode.com/posts/1
I do get the response to load from this URL, but my annotations produce no items; it starts trying to extract and just keeps spinning. Is there some trick to annotations on straight JSON, or does everything need to be framed in HTML?
0 Votes
nestor posted almost 7 years ago Admin
@rpm61 Enable JS in Portia; when you run the spider in Scrapinghub, it should extract the JSON response.
0 Votes
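For anyone comparing approaches: when the response is already plain JSON, like the jsonplaceholder URL above, the fields can be read with a JSON parser instead of regex annotations. The following is only a minimal sketch in plain Python, not something Portia does or exposes. The function name `extract_post` and the sample values are made up for illustration; the key names (`userId`, `id`, `title`, `body`) are assumed to match that endpoint's response.

```python
import json

# Illustrative body with the same keys as the jsonplaceholder post above.
# The values are invented for this example, not copied from the live API.
SAMPLE_BODY = '{"userId": 1, "id": 1, "title": "example title", "body": "first line\\nsecond line"}'


def extract_post(raw_body: str) -> dict:
    """Parse a JSON response body and keep only the fields of interest.

    Hypothetical helper: the output field names are arbitrary choices.
    """
    data = json.loads(raw_body)  # raises json.JSONDecodeError on invalid JSON
    return {
        "id": data["id"],
        "user_id": data["userId"],
        "title": data["title"],
        "body": data["body"],
    }


if __name__ == "__main__":
    print(extract_post(SAMPLE_BODY))
```

In a real spider the raw body would come from the HTTP response rather than a hard-coded string; parsing it as JSON avoids depending on how the page is rendered as HTML.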