
Doing AutoExtract with PHP Curl

I am trying to extract content from a URL with PHP.

 

$curl = curl_init();
curl_setopt_array($curl, [
  CURLOPT_HTTPHEADER => [
    'Content-Type: application/json',
    'APIKEY: myapikeymyapikey'
  ],
  CURLOPT_RETURNTRANSFER => 1,
  CURLOPT_URL => 'https://autoextract.scrapinghub.com/v1/extract',
  CURLOPT_POST => 1,
  CURLOPT_POSTFIELDS => [
    'url' => 'https://blog.scrapinghub.com/gopro-study',
    'pageType' => 'article'
  ]
]);
$resp = curl_exec($curl);
curl_close($curl);
return $resp;


 

I get the error


{
title: "No authentication token provided",
type: "http://errors.xod.scrapinghub.com/unauthorized.html"
}



Try passing your APIKEY with CURLOPT_USERNAME not as a header.

I guess you mean CURLOPT_USERPWD?

The code below gives the same error.

  

$curl = curl_init();
$username = 'myapikey';
$password = '';
curl_setopt_array($curl, [
    CURLOPT_HTTPHEADER => [
      'Content-Type: application/json'
    ],
    CURLOPT_USERPWD => "$username:$password",
    CURLOPT_RETURNTRANSFER => 1,
    CURLOPT_URL => 'https://autoextract.scrapinghub.com/v1/extract',
    CURLOPT_POST => 1,
    CURLOPT_HTTPAUTH => CURLAUTH_BASIC,
    CURLOPT_POSTFIELDS => http_build_query([
        'url' => 'https://blog.scrapinghub.com/gopro-study',
        'pageType' => 'article'
    ])
]);
$resp = curl_exec($curl);
curl_close($curl);
return $resp;

  

CURLOPT_USERNAME and CURLOPT_PASSWORD also exist, but yeah CURLOPT_USERPWD would work too.
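
For reference, with HTTP Basic auth curl joins the username and password with a colon and sends them base64-encoded in an Authorization header. A minimal sketch, assuming the API key goes in as the username with an empty password, as in the code above; `basic_auth_header` is a hypothetical helper for illustration, not part of curl or the Autoextract API:

```php
<?php
// Hypothetical helper: builds the header that CURLOPT_USERPWD combined with
// CURLAUTH_BASIC would send for the given credentials.
function basic_auth_header(string $username, string $password = ''): string
{
    return 'Authorization: Basic ' . base64_encode($username . ':' . $password);
}

// Placeholder key, as in the thread; substitute your own.
echo basic_auth_header('myapikey'), "\n";
```

Seeing the header spelled out makes it easier to compare with the custom 'APIKEY: ...' header from the first attempt, which is not Basic auth at all.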


Which APIKey are you using? I looked into your account and I don't see an Autoextract subscription.

Don't post the key here, I'm just wondering if you have another account.

I am using the one API key on my account at https://app.scrapinghub.com/account/apikey

Do I need a paid subscription to test the Autoextract service?

That's the Scrapy Cloud API key; it doesn't work for Autoextract.


You can subscribe to Autoextract with a free trial (14 days or 10k requests) on your organization's billing page. You will then get an API key for it.

How do I sign up for the Autoextract key when I already have the Cloud API key? I tried to sign up here, but it told me my username is already in use since I already have an account. https://scrapinghub.com/autoextract

Log in to your account. On the left side there's a billing tab; open that page, select Autoextract, then follow the checkout page.

Thanks, Nestor. I'll update you on this.

It got past the error. I guess the documentation should have made it clear that the main API key does not work for other APIs.

I'm dealing with another error:

{
detail: "an array of dicts is expected",
title: "Malformed JSON",
type: "http://errors.xod.scrapinghub.com/malformed-json.html"
}

 My code is:

  

$curl = curl_init();
$username = 'myapikey';
$password = '';
curl_setopt_array($curl, [
    CURLOPT_HTTPHEADER => [
      'Content-Type: application/json',
    ],
    CURLOPT_USERPWD => "$username:$password",
    CURLOPT_RETURNTRANSFER => 1,
    CURLOPT_URL => 'https://autoextract.scrapinghub.com/v1/extract',
    CURLOPT_HTTPAUTH => CURLAUTH_BASIC,
    CURLOPT_POST => 1,
    CURLOPT_POSTFIELDS => http_build_query([
        'url' => 'https://blog.scrapinghub.com/gopro-study',
        'pageType' => 'article'
    ])
]);
$resp = curl_exec($curl);
curl_close($curl);
return $resp;

  

Answer

Try:


    CURLOPT_POSTFIELDS => json_encode([[
        'url' => 'https://blog.scrapinghub.com/gopro-study',
        'pageType' => 'article'
    ]])
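
To see why this helps: http_build_query produces a URL-encoded form string, while the error message asks for a JSON array of objects. A small sketch comparing the two request bodies for the same query (the variable names are only illustrative):

```php
<?php
$query = [
    'url' => 'https://blog.scrapinghub.com/gopro-study',
    'pageType' => 'article',
];

// Form-encoded string: this is not JSON, even though the
// Content-Type header says application/json.
$formBody = http_build_query($query);

// JSON array holding one object: the outer [] wraps the query in a list.
$jsonBody = json_encode([$query], JSON_UNESCAPED_SLASHES);

echo $formBody, "\n";
// url=https%3A%2F%2Fblog.scrapinghub.com%2Fgopro-study&pageType=article
echo $jsonBody, "\n";
// [{"url":"https://blog.scrapinghub.com/gopro-study","pageType":"article"}]
```

JSON_UNESCAPED_SLASHES is only there to keep the printed URL readable; plain json_encode yields equivalent JSON.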


Thanks! That finally works.

You're welcome!

I tried to scrape details of multiple cars from the website below. It contains around 720 Tesla cars, spread across paginated results. I tried to scrape those cars with curl:


https://suchen.mobile.de/fahrzeuge/search.html?damageUnrepaired=NO_DAMAGE_UNREPAIRED&isSearchRequest=true&makeModelVariant1.makeId=135&scopeId=C&sfmr=false&sortOption.sortBy=creationTime&sortOption.sortOrder=DESCENDING

<?php
$curl = curl_init();
$username = 'My api key';
$password = '';
curl_setopt_array($curl, [
    CURLOPT_HTTPHEADER => [
      'Content-Type: application/json',
    ],
    CURLOPT_USERPWD => "$username:$password",
    CURLOPT_RETURNTRANSFER => 1,
    CURLOPT_URL => 'https://autoextract.scrapinghub.com/v1/extract',
    CURLOPT_HTTPAUTH => CURLAUTH_BASIC,
    CURLOPT_POST => 1,
    CURLOPT_POSTFIELDS => json_encode([[
        'url' => 'https://suchen.mobile.de/fahrzeuge/search.html?damageUnrepaired=NO_DAMAGE_UNREPAIRED&isSearchRequest=true&makeModelVariant1.makeId=135&scopeId=C&sfmr=false&sortOption.sortBy=creationTime&sortOption.sortOrder=DESCENDING',
        'pageType' => 'vehicle'
    ]])
]);
$resp = curl_exec($curl);
curl_close($curl);
echo "<pre>";print_r(json_decode($resp));
?>
 


I always get the error below:

Proxy error: banned


Please provide API code that supports PHP.
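
One way to surface per-query failures like this in PHP is to decode the response and check each result for an error field. This is only a sketch: it assumes the response is a JSON array with one object per query and that a failed query carries an `error` string, which should be checked against the Autoextract documentation. `split_results` and the sample response are hypothetical and do not come from the API. It reports the failure; it does not work around the ban itself.

```php
<?php
// Hypothetical helper: separates successful results from failed ones.
// Assumption: each result decodes to an array and failures have an 'error' key.
function split_results(string $json): array
{
    $results = json_decode($json, true);
    if (!is_array($results)) {
        throw new RuntimeException('Response is not valid JSON: ' . json_last_error_msg());
    }

    $ok = [];
    $failed = [];
    foreach ($results as $result) {
        if (is_array($result) && isset($result['error'])) {
            $failed[] = $result;
        } else {
            $ok[] = $result;
        }
    }
    return ['ok' => $ok, 'failed' => $failed];
}

// Made-up sample standing in for $resp from curl_exec().
$sample = '[{"error": "Proxy error: banned"}, {"vehicle": {"name": "example"}}]';
$split = split_results($sample);

foreach ($split['failed'] as $result) {
    echo 'Query failed: ', $result['error'], "\n";
}
```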