Doing AutoExtract with PHP Curl

I am trying to extract content from a URL with PHP:

 

$curl = curl_init();
curl_setopt_array($curl, [
  CURLOPT_HTTPHEADER => [
    'Content-Type: application/json',
    'APIKEY: myapikeymyapikey'
  ],
  CURLOPT_RETURNTRANSFER => 1,
  CURLOPT_URL => 'https://autoextract.scrapinghub.com/v1/extract',
  CURLOPT_POST => 1,
  CURLOPT_POSTFIELDS => [
    'url' => 'https://blog.scrapinghub.com/gopro-study',
    'pageType' => 'article'
  ]
]);
$resp = curl_exec($curl);
curl_close($curl);
return $resp;


 

I get the error


{
title: "No authentication token provided",
type: "http://errors.xod.scrapinghub.com/unauthorized.html"
}


Best Answer

Try:


    CURLOPT_POSTFIELDS => json_encode([[
        'url' => 'https://blog.scrapinghub.com/gopro-study',
        'pageType' => 'article'
    ]])
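To illustrate why the double brackets matter, here is a minimal sketch comparing the two request bodies seen in this thread. The "Malformed JSON" error quoted elsewhere in the thread says the API expects an array of dicts, so the body has to be a JSON array containing one object per query. `buildExtractBody` is a hypothetical helper name, not part of any library. Note also that passing a plain PHP array to CURLOPT_POSTFIELDS makes cURL send it as multipart/form-data, not JSON.

```php
<?php
// Hypothetical helper: builds the request body as a JSON array of query objects.
function buildExtractBody(array $queries): string
{
    // array_values() forces a list, so json_encode() emits [...] rather than {...}.
    return json_encode(array_values($queries), JSON_UNESCAPED_SLASHES);
}

$query = [
    'url' => 'https://blog.scrapinghub.com/gopro-study',
    'pageType' => 'article',
];

// Form-encoded body: key=value pairs, which is not JSON at all.
$formBody = http_build_query($query);

// JSON body: an array containing one object, matching "an array of dicts".
$jsonBody = buildExtractBody([$query]);

echo $formBody, PHP_EOL;
echo $jsonBody, PHP_EOL;
```

The second line printed is the shape to pass as CURLOPT_POSTFIELDS, together with the 'Content-Type: application/json' header.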



$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://autoextract.scrapinghub.com/v1/extract');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, "[{\"url\": \"https://www.automoto.it/listino\", \"pageType\": \"vehicle\"}]");
// API key as the username, empty password
curl_setopt($ch, CURLOPT_USERPWD, 'My Key' . ':' . '');
curl_setopt($ch, CURLOPT_ENCODING, 'gzip, deflate');

$headers = array();
$headers[] = 'Content-Type: application/json';
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);

$result = curl_exec($ch);
$json_string = json_decode($result, true);

I couldn't get all the car details. Please help.


I tried to scrape multiple car details from the website below. It contains around 720 Tesla cars across paginated results. I tried to scrape those cars with cURL:


https://suchen.mobile.de/fahrzeuge/search.html?damageUnrepaired=NO_DAMAGE_UNREPAIRED&isSearchRequest=true&makeModelVariant1.makeId=135&scopeId=C&sfmr=false&sortOption.sortBy=creationTime&sortOption.sortOrder=DESCENDING

<?php
$curl = curl_init();
$username = 'My api key';
$password = '';
curl_setopt_array($curl, [
    CURLOPT_HTTPHEADER => [
      'Content-Type: application/json',
    ],
    CURLOPT_USERPWD => "$username:$password",
    CURLOPT_RETURNTRANSFER => 1,
    CURLOPT_URL => 'https://autoextract.scrapinghub.com/v1/extract',
    CURLOPT_HTTPAUTH => CURLAUTH_BASIC,
    CURLOPT_POST => 1,
    CURLOPT_POSTFIELDS => json_encode([[
        // The URL must be a single-line string with no trailing newline
        'url' => 'https://suchen.mobile.de/fahrzeuge/search.html?damageUnrepaired=NO_DAMAGE_UNREPAIRED&isSearchRequest=true&makeModelVariant1.makeId=135&scopeId=C&sfmr=false&sortOption.sortBy=creationTime&sortOption.sortOrder=DESCENDING',
        'pageType' => 'vehicle'
    ]])
]);
$resp = curl_exec($curl);
curl_close($curl);
echo "<pre>";print_r(json_decode($resp));
?>
 


I always get the error below:

Proxy error: banned


Please provide API code that supports PHP.
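A message such as "Proxy error: banned" is easier to act on when failed queries are separated from successful ones before printing. The sketch below assumes that the decoded response is an array with one element per query and that a failed element carries an "error" string. That shape is an assumption inferred from this thread, not confirmed API documentation. `decodeResponse` and `partitionResults` are hypothetical helper names, and the sample data is made up.

```php
<?php
// Hypothetical helper: decodes the raw body, returning [] when the request
// failed (curl_exec() returned false) or the body is not a JSON array.
function decodeResponse($raw): array
{
    if (!is_string($raw)) {
        return [];
    }
    $decoded = json_decode($raw, true);
    return is_array($decoded) ? $decoded : [];
}

// Hypothetical helper: separates results that carry an "error" key
// (an assumed response shape) from the rest.
function partitionResults(array $results): array
{
    $ok = [];
    $failed = [];
    foreach ($results as $result) {
        if (is_array($result) && isset($result['error'])) {
            $failed[] = $result;
        } else {
            $ok[] = $result;
        }
    }
    return ['ok' => $ok, 'failed' => $failed];
}

// Made-up sample shaped like the assumed response; not real API output.
$raw = '[{"query":{"id":"1"},"error":"Proxy error: banned"},'
     . '{"query":{"id":"2"},"vehicle":{"name":"example"}}]';

$parts = partitionResults(decodeResponse($raw));
echo count($parts['ok']), ' ok, ', count($parts['failed']), ' failed', PHP_EOL;
foreach ($parts['failed'] as $failure) {
    echo $failure['error'], PHP_EOL;
}
```

In real use, `$raw` would be the return value of curl_exec(), and the failed entries could be logged or retried instead of being printed with the rest.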

You're welcome!

Thanks! That finally works.

I'm dealing with another error:

{
detail: "an array of dicts is expected",
title: "Malformed JSON",
type: "http://errors.xod.scrapinghub.com/malformed-json.html"
}

 My code is:

  

$curl = curl_init();
$username = 'myapikey';
$password = '';
curl_setopt_array($curl, [
    CURLOPT_HTTPHEADER => [
      'Content-Type: application/json',
    ],
    CURLOPT_USERPWD => "$username:$password",
    CURLOPT_RETURNTRANSFER => 1,
    CURLOPT_URL => 'https://autoextract.scrapinghub.com/v1/extract',
    CURLOPT_HTTPAUTH => CURLAUTH_BASIC,
    CURLOPT_POST => 1,
    CURLOPT_POSTFIELDS => http_build_query([
        'url' => 'https://blog.scrapinghub.com/gopro-study',
        'pageType' => 'article'
    ])
]);
$resp = curl_exec($curl);
curl_close($curl);
return $resp;

  

That got past the error. I guess the documentation should have made it clear that the main API key does not work for other APIs.

Thanks, Nestor. I'll update you on this.

Log in to your account. On the left side there's a billing tab; open that page, select Autoextract, then follow the checkout page.

How do I sign up for the Autoextract key when I already have the Cloud API key? I tried to sign up here, but it told me my username is already in use since I already have an account: https://scrapinghub.com/autoextract

That's the Scrapy Cloud API key; it doesn't work for Autoextract.


You can subscribe to Autoextract for a 14 day free trial or 10k requests on your billing page on your org. You will then get an API Key for it.

I am using the one API key on my account at https://app.scrapinghub.com/account/apikey

Do I need a paid subscription to test the Autoextract service?

CURLOPT_USERNAME and CURLOPT_PASSWORD also exist, but yeah, CURLOPT_USERPWD would work too.


Which APIKey are you using? I looked into your account and I don't see an Autoextract subscription.

Don't post the key here, I'm just wondering if you have another account.

I guess you mean CURLOPT_USERPWD?

The code below brings the same error.

  

$curl = curl_init();
$username = 'myapikey';
$password = '';
curl_setopt_array($curl, [
    CURLOPT_HTTPHEADER => [
      'Content-Type: application/json'
    ],
    CURLOPT_USERPWD => "$username:$password",
    CURLOPT_RETURNTRANSFER => 1,
    CURLOPT_URL => 'https://autoextract.scrapinghub.com/v1/extract',
    CURLOPT_POST => 1,
    CURLOPT_HTTPAUTH => CURLAUTH_BASIC,
    CURLOPT_POSTFIELDS => http_build_query([
        'url' => 'https://blog.scrapinghub.com/gopro-study',
        'pageType' => 'article'
    ])
]);
$resp = curl_exec($curl);
curl_close($curl);
return $resp;

  

Try passing your APIKEY with CURLOPT_USERNAME, not as a header.
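As a rough sketch of that suggestion, the options below send the key through HTTP Basic auth (key as the username, empty password) in place of a custom 'APIKEY:' header. `buildAuthOptions` is a hypothetical helper name, 'myapikey' is a placeholder, and the snippet assumes the PHP cURL extension is loaded so that the CURLOPT_* constants exist.

```php
<?php
// Hypothetical helper: cURL options for HTTP Basic auth with the API key
// as the username and an empty password.
function buildAuthOptions(string $apiKey): array
{
    return [
        CURLOPT_HTTPAUTH => CURLAUTH_BASIC,
        CURLOPT_USERPWD => $apiKey . ':', // "username:password", password left empty
        CURLOPT_HTTPHEADER => ['Content-Type: application/json'],
        CURLOPT_RETURNTRANSFER => true,
    ];
}

$options = buildAuthOptions('myapikey'); // placeholder key

// The credential string cURL will use for Basic auth.
echo $options[CURLOPT_USERPWD], PHP_EOL;

// Basic auth transmits it base64-encoded in the Authorization header.
echo 'Authorization: Basic ', base64_encode($options[CURLOPT_USERPWD]), PHP_EOL;
```

These options can be merged into the array passed to curl_setopt_array() alongside CURLOPT_URL, CURLOPT_POST and CURLOPT_POSTFIELDS.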