Doing AutoExtract with PHP Curl

Posted almost 5 years ago by techytimo

I am trying to extract content from a URL with PHP:

$curl = curl_init();
curl_setopt_array($curl, [
  CURLOPT_HTTPHEADER => [
    'Content-Type: application/json',
    'APIKEY: myapikeymyapikey'
  ],
  CURLOPT_RETURNTRANSFER => 1,
  CURLOPT_URL => 'https://autoextract.scrapinghub.com/v1/extract',
  CURLOPT_POST => 1,
  CURLOPT_POSTFIELDS => [
    'url' => 'https://blog.scrapinghub.com/gopro-study',
    'pageType' => 'article'
  ]
]);
$resp = curl_exec($curl);
curl_close($curl);
return $resp;

I get the error:

{
  "title": "No authentication token provided",
  "type": "http://errors.xod.scrapinghub.com/unauthorized.html"
}

nestor posted almost 5 years ago Admin Best Answer

Try:


    CURLOPT_POSTFIELDS => json_encode([[
        'url' => 'https://blog.scrapinghub.com/gopro-study',
        'pageType' => 'article'
    ]])
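
The double brackets in the answer above are the important part. In PHP an associative array encodes to a single JSON object, while a list wrapping it encodes to a JSON array of objects, which is the shape the endpoint asks for (the "an array of dicts is expected" error quoted further down says as much). Here is a minimal comparison using only json_encode. JSON_UNESCAPED_SLASHES is there only to make the output readable, because escaped slashes are valid JSON too:

```php
<?php
$query = [
    'url'      => 'https://blog.scrapinghub.com/gopro-study',
    'pageType' => 'article',
];

// Single brackets: an associative array encodes to one JSON object.
$single = json_encode($query, JSON_UNESCAPED_SLASHES);
echo $single, PHP_EOL;
// {"url":"https://blog.scrapinghub.com/gopro-study","pageType":"article"}

// Double brackets: a list containing that array encodes to a JSON array of objects,
// which is the shape the AutoExtract endpoint asks for.
$batch = json_encode([$query], JSON_UNESCAPED_SLASHES);
echo $batch, PHP_EOL;
// [{"url":"https://blog.scrapinghub.com/gopro-study","pageType":"article"}]
```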


15 Comments

promoautotest posted about 4 years ago

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://autoextract.scrapinghub.com/v1/extract');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, "[{\"url\": \"https://www.automoto.it/listino\", \"pageType\": \"vehicle\"}]");
// API key as the username, empty password
curl_setopt($ch, CURLOPT_USERPWD, 'My Key' . ':');
curl_setopt($ch, CURLOPT_ENCODING, 'gzip, deflate');
$headers = array();
$headers[] = 'Content-Type: application/json';
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$result = curl_exec($ch);
curl_close($ch);
$json_string = json_decode($result, true);

I couldn't get all the car details. Please help.
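
Each query object in the request body names a single URL, so each query extracts a single page. If the cars are spread across several result pages, one option is to send one query per page URL in the same JSON array. This sketch rests on several assumptions. buildBatchBody is a hypothetical helper. The page URLs are placeholders, because this thread does not cover how the target site paginates. You should check the AutoExtract documentation for any per-request batch limit. JSON_THROW_ON_ERROR requires PHP 7.3 or later.

```php
<?php
// Hypothetical helper: builds a JSON array with one query object per URL.
function buildBatchBody(array $urls, string $pageType): string
{
    $queries = [];
    foreach ($urls as $url) {
        $queries[] = ['url' => $url, 'pageType' => $pageType];
    }
    return json_encode($queries, JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR);
}

// Placeholder page URLs: replace them with the real URL of each results page.
$pageUrls = [
    'https://www.example.com/listing?page=1',
    'https://www.example.com/listing?page=2',
];

// The result can be passed to CURLOPT_POSTFIELDS in place of the hand-written JSON string.
$body = buildBatchBody($pageUrls, 'vehicle');
echo $body, PHP_EOL;
```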


elbilportalendk posted over 4 years ago

I tried to scrape the details of multiple cars from the website below. It contains around 720 Tesla cars spread over several result pages. I tried to scrape those cars with cURL:


https://suchen.mobile.de/fahrzeuge/search.html?damageUnrepaired=NO_DAMAGE_UNREPAIRED&isSearchRequest=true&makeModelVariant1.makeId=135&scopeId=C&sfmr=false&sortOption.sortBy=creationTime&sortOption.sortOrder=DESCENDING

<?php
$curl = curl_init();
$username = 'My api key';
$password = '';
curl_setopt_array($curl, [
    CURLOPT_HTTPHEADER => [
      'Content-Type: application/json',
    ],
    CURLOPT_USERPWD => "$username:$password",
    CURLOPT_RETURNTRANSFER => 1,
    CURLOPT_URL => 'https://autoextract.scrapinghub.com/v1/extract',
    CURLOPT_HTTPAUTH => CURLAUTH_BASIC,
    CURLOPT_POST => 1,
    CURLOPT_POSTFIELDS => json_encode([[
        'url' => 'https://suchen.mobile.de/fahrzeuge/search.html?damageUnrepaired=NO_DAMAGE_UNREPAIRED&isSearchRequest=true&makeModelVariant1.makeId=135&scopeId=C&sfmr=false&sortOption.sortBy=creationTime&sortOption.sortOrder=DESCENDING',
        'pageType' => 'vehicle'
    ]])
]);
$resp = curl_exec($curl);
curl_close($curl);
echo "<pre>";print_r(json_decode($resp));
?>

I'm always getting the error below:

Proxy error: banned

Could you please provide API code that supports PHP?
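
"Proxy error: banned" shows up in the decoded response rather than as a cURL failure. That suggests it is reported per query, so it is worth checking each result before using it. This is a minimal sketch. The response shape is an assumption not confirmed in this thread: a JSON array with one entry per query, an error string on failures, and the extracted data under the requested pageType key on success. isList and splitResults are hypothetical helper names.

```php
<?php
// True for [] and for arrays with keys 0..n-1 in order (a JSON array, not an object).
function isList(array $a): bool
{
    return $a === [] || array_keys($a) === range(0, count($a) - 1);
}

// Hypothetical helper. ASSUMED response shape (verify against the AutoExtract docs):
// a JSON array with one entry per query, where a failed query has an "error" string
// (e.g. "Proxy error: banned") and a successful one has its data under the pageType key.
function splitResults(string $responseBody, string $pageType): array
{
    $decoded = json_decode($responseBody, true);
    if (!is_array($decoded) || !isList($decoded)) {
        // Request-level failure, e.g. the {"title": ..., "type": ...} objects seen in this thread.
        throw new RuntimeException('Unexpected response: ' . $responseBody);
    }

    $ok = [];
    $failed = [];
    foreach ($decoded as $i => $result) {
        if (isset($result['error'])) {
            $failed[$i] = $result['error'];
        } elseif (isset($result[$pageType])) {
            $ok[$i] = $result[$pageType];
        } else {
            $failed[$i] = "no '$pageType' data in result";
        }
    }
    return ['ok' => $ok, 'failed' => $failed];
}
```

Queries that end up in 'failed' can then be logged or retried separately, so they don't silently disappear from the output.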

nestor posted almost 5 years ago Admin

You're welcome!

techytimo posted almost 5 years ago

Thanks! That finally works.
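
The fixes from this thread come down to two things. The working request sends the AutoExtract key (not the Scrapy Cloud key) as the Basic-auth username with an empty password, and it sends a JSON array of query objects as the body. Here is a consolidated sketch. autoExtract is a hypothetical wrapper name, and the cURL-error and HTTP-status checks are additions that the thread does not discuss:

```php
<?php
// Hypothetical wrapper around the AutoExtract v1 endpoint used in this thread.
// Requires the curl extension; JSON_THROW_ON_ERROR needs PHP 7.3+.
// Usage (hypothetical env var name):
//   $data = autoExtract(getenv('AUTOEXTRACT_APIKEY'), 'https://blog.scrapinghub.com/gopro-study', 'article');
function autoExtract(string $apiKey, string $url, string $pageType): array
{
    $curl = curl_init('https://autoextract.scrapinghub.com/v1/extract');
    curl_setopt_array($curl, [
        CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
        CURLOPT_HTTPAUTH       => CURLAUTH_BASIC,
        CURLOPT_USERPWD        => $apiKey . ':',   // key as username, empty password
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST           => true,
        // A JSON array of query objects, not form fields.
        CURLOPT_POSTFIELDS     => json_encode(
            [['url' => $url, 'pageType' => $pageType]],
            JSON_THROW_ON_ERROR
        ),
    ]);
    $resp = curl_exec($curl);
    $status = curl_getinfo($curl, CURLINFO_RESPONSE_CODE);
    $error = curl_error($curl);
    curl_close($curl);

    if ($resp === false) {
        throw new RuntimeException('cURL error: ' . $error);
    }
    if ($status !== 200) {
        // e.g. the "No authentication token provided" error body
        throw new RuntimeException("HTTP $status: $resp");
    }
    return json_decode($resp, true, 512, JSON_THROW_ON_ERROR);
}
```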

techytimo posted almost 5 years ago

I'm dealing with another error:

{
  "detail": "an array of dicts is expected",
  "title": "Malformed JSON",
  "type": "http://errors.xod.scrapinghub.com/malformed-json.html"
}

My code is:

$curl = curl_init();
$username = 'myapikey';
$password = '';
curl_setopt_array($curl, [
    CURLOPT_HTTPHEADER => [
      'Content-Type: application/json',
    ],
    CURLOPT_USERPWD => "$username:$password",
    CURLOPT_RETURNTRANSFER => 1,
    CURLOPT_URL => 'https://autoextract.scrapinghub.com/v1/extract',
    CURLOPT_HTTPAUTH => CURLAUTH_BASIC,
    CURLOPT_POST => 1,
    CURLOPT_POSTFIELDS => http_build_query([
        'url' => 'https://blog.scrapinghub.com/gopro-study',
        'pageType' => 'article'
    ])
]);
$resp = curl_exec($curl);
curl_close($curl);
return $resp;
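
The "an array of dicts is expected" error comes from http_build_query. That function produces a form-encoded string rather than JSON, so the server has no array to read. Here is a small comparison using only standard PHP functions:

```php
<?php
$query = ['url' => 'https://blog.scrapinghub.com/gopro-study', 'pageType' => 'article'];

// http_build_query() produces a form-encoded string (application/x-www-form-urlencoded)...
$form = http_build_query($query);
echo $form, PHP_EOL;
// url=https%3A%2F%2Fblog.scrapinghub.com%2Fgopro-study&pageType=article

// ...which is not valid JSON, so json_decode() rejects it, presumably as the server's parser does.
var_dump(json_decode($form)); // NULL

// json_encode() on a list of queries gives the JSON array the endpoint expects.
echo json_encode([$query]), PHP_EOL;
```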

  

techytimo posted almost 5 years ago

That got past the error. I guess the documentation should have made it clear that the main API key doesn't work for the other APIs.

techytimo posted almost 5 years ago

Thanks, Nestor. I'll update you on this.

nestor posted almost 5 years ago Admin

Log in to your account. On the left side there's a Billing tab: open that page, select Autoextract, then go through the checkout page.

techytimo posted almost 5 years ago

How do I sign up for the Autoextract key when I already have the Cloud API key? I tried to sign up at https://scrapinghub.com/autoextract, but it told me my username is already in use, since I already have an account.

nestor posted almost 5 years ago Admin

That's the Scrapy Cloud API key; it doesn't work for Autoextract.


You can subscribe to Autoextract from your organization's billing page, with a free trial of 14 days or 10k requests. You will then get an API key for it.

techytimo posted almost 5 years ago

I am using the one API key on my account at https://app.scrapinghub.com/account/apikey

Do I need a paid subscription to test the Autoextract service?

nestor posted almost 5 years ago Admin

CURLOPT_USERNAME and CURLOPT_PASSWORD also exist, but yeah CURLOPT_USERPWD would work too.


Which APIKey are you using? I looked into your account and I don't see an Autoextract subscription.

Don't post the key here; I'm just wondering whether you have another account.
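
Both credential options mentioned above should produce the same Basic-auth credentials: the API key as username and an empty password. This sketch only configures a handle and sends nothing. The key is a placeholder and the variable names are illustrative:

```php
<?php
$apiKey = 'YOUR_AUTOEXTRACT_KEY'; // placeholder: the AutoExtract key, not the Scrapy Cloud one

// Variant 1: username and password joined as "username:password".
$withUserPwd = [
    CURLOPT_HTTPAUTH => CURLAUTH_BASIC,
    CURLOPT_USERPWD  => $apiKey . ':',
];

// Variant 2: the same credentials as two separate options.
$withSeparateOptions = [
    CURLOPT_HTTPAUTH => CURLAUTH_BASIC,
    CURLOPT_USERNAME => $apiKey,
    CURLOPT_PASSWORD => '',
];

// Either array is combined with the remaining options; the + operator keeps the
// integer CURLOPT_* keys intact, whereas array_merge() would renumber them.
$ch = curl_init('https://autoextract.scrapinghub.com/v1/extract');
curl_setopt_array($ch, $withSeparateOptions + [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST           => true,
]);
```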

techytimo posted almost 5 years ago

I guess you mean CURLOPT_USERPWD?

The code below produces the same error:

$curl = curl_init();
$username = 'myapikey';
$password = '';
curl_setopt_array($curl, [
    CURLOPT_HTTPHEADER => [
      'Content-Type: application/json'
    ],
    CURLOPT_USERPWD => "$username:$password",
    CURLOPT_RETURNTRANSFER => 1,
    CURLOPT_URL => 'https://autoextract.scrapinghub.com/v1/extract',
    CURLOPT_POST => 1,
    CURLOPT_HTTPAUTH => CURLAUTH_BASIC,
    CURLOPT_POSTFIELDS => http_build_query([
        'url' => 'https://blog.scrapinghub.com/gopro-study',
        'pageType' => 'article'
    ])
]);
$resp = curl_exec($curl);
curl_close($curl);
return $resp;

nestor posted almost 5 years ago Admin

Try passing your API key with CURLOPT_USERNAME, not as a header.
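
Passing the key via CURLOPT_USERNAME (or CURLOPT_USERPWD) makes cURL send a standard HTTP Basic Authorization header. The custom APIKEY header from the original question evidently isn't read by the server. To show what actually goes over the wire, this sketch builds the same header by hand, per RFC 7617 (base64 of "username:password"). basicAuthHeader is a hypothetical helper:

```php
<?php
// Hypothetical helper: the header cURL generates for Basic auth with
// username = API key and an empty password.
function basicAuthHeader(string $apiKey): string
{
    return 'Authorization: Basic ' . base64_encode($apiKey . ':');
}

echo basicAuthHeader('abc'), PHP_EOL; // Authorization: Basic YWJjOg==
```

Adding that string to CURLOPT_HTTPHEADER should be equivalent to setting CURLOPT_USERNAME, but letting cURL build it is simpler.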
