
Difference in FormRequest POST behavior between Python 2 and Python 3

Hi Team,


The FormRequest POST below works fine in Python 2 on the scrapy:1.5 stack but does not work on scrapy:1.5-py3. Having the callback_method entry in the meta dict causes the failure. Please advise or suggest a workaround.


yield FormRequest(url=login_url, callback=self.after_login, formdata=form_data,
                  meta={'followup_url': url, 'callback_method': callback_method})


Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
GeneratorExit

Unhandled error in Deferred:

[twisted] Unhandled error in Deferred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/twisted/internet/base.py", line 1243, in run
    self.mainLoop()
  File "/usr/local/lib/python3.6/site-packages/twisted/internet/base.py", line 1252, in mainLoop
    self.runUntilCurrent()
  File "/usr/local/lib/python3.6/site-packages/twisted/internet/base.py", line 878, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/usr/local/lib/python3.6/site-packages/twisted/internet/task.py", line 671, in _tick
    taskObj._oneWorkUnit()
--- <exception caught here> ---
  File "/usr/local/lib/python3.6/site-packages/twisted/internet/task.py", line 517, in _oneWorkUnit
    result = next(self._iterator)
  File "/usr/local/lib/python3.6/site-packages/scrapy/utils/defer.py", line 63, in <genexpr>
    work = (callable(elem, *args, **named) for elem in iterable)
  File "/usr/local/lib/python3.6/site-packages/scrapy/core/scraper.py", line 183, in _process_spidermw_output
    self.crawler.engine.crawl(request=output, spider=spider)
  File "/usr/local/lib/python3.6/site-packages/scrapy/core/engine.py", line 210, in crawl
    self.schedule(request, spider)
  File "/usr/local/lib/python3.6/site-packages/scrapy/core/engine.py", line 216, in schedule
    if not self.slot.scheduler.enqueue_request(request):
  File "/usr/local/lib/python3.6/site-packages/scrapy/core/scheduler.py", line 57, in enqueue_request
    dqok = self._dqpush(request)
  File "/usr/local/lib/python3.6/site-packages/scrapy/core/scheduler.py", line 86, in _dqpush
    self.dqs.push(reqd, -request.priority)
  File "/usr/local/lib/python3.6/site-packages/queuelib/pqueue.py", line 35, in push
    q.push(obj) # this may fail (eg. serialization error)
  File "/usr/local/lib/python3.6/site-packages/scrapy/squeues.py", line 15, in push
    s = serialize(obj)
  File "/usr/local/lib/python3.6/site-packages/scrapy/squeues.py", line 27, in _pickle_serialize
    return pickle.dumps(obj, protocol=2)
builtins.TypeError: can't pickle _thread.lock objects


Please note that the statement above works fine in my local environment with Python 3.6. The issue occurs only when running on Scrapinghub.
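
One possible reason for the local vs. Scrapinghub difference, offered as an assumption rather than a confirmed diagnosis: the traceback goes through the scheduler's _dqpush, i.e. the disk queue, which pickles every request it stores. In a local run Scrapy only uses a disk queue when the JOBDIR setting is set, so a default local crawl keeps requests in memory and never pickles meta. A settings fragment like the one below might reproduce the error locally (the directory name is made up for illustration):

```python
# settings.py fragment (sketch): enabling JOBDIR makes the scheduler
# persist pending requests to disk, which requires pickling them.
# "crawls/repro-1" is a hypothetical directory name.
JOBDIR = "crawls/repro-1"
```

The same setting can be passed for a single run on the command line with -s JOBDIR=crawls/repro-1.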

 

Hi Team,

I was able to make it work on scrapy:1.5-py3 by passing the method name in meta and resolving it with getattr():

# Store the callback's name (a plain string) instead of the method object itself.
response.meta['callback_url'] = response.url
response.meta['callback_method'] = callback_method.__name__
yield FormRequest(url=login_url, callback=self.after_login, formdata=form_data, meta=response.meta)

def after_login(self, response):
    # Look the callback up on the spider by name.
    yield response.follow(url=response.meta['callback_url'],
                          callback=getattr(self, response.meta['callback_method']),
                          meta=response.meta)

An explanation of why the earlier code failed when passing a function via FormRequest meta on scrapy:1.5-py3 would be helpful.
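
A partial explanation that can be checked outside Scrapy: in Python 3, pickling a bound method also pickles the object it is bound to, so a spider method stored in meta pulls the whole spider into pickle.dumps. If anything on that object holds a thread lock, pickling fails with a TypeError like the one in the traceback. The sketch below uses a made-up DemoSpider class (not Scrapy's Spider) and assumes a lock attribute purely to trigger the error; where the lock lives in the real spider is not known from the traceback.

```python
import pickle
import threading


class DemoSpider:
    """Hypothetical stand-in for a spider; not Scrapy's Spider class."""

    def __init__(self):
        # Assumption: the real spider (or something it references)
        # holds a lock similar to this one.
        self._lock = threading.Lock()

    def parse_item(self):
        return "parsed"


def pickle_error(value):
    """Return the exception raised while pickling value, or None on success."""
    try:
        pickle.dumps(value, protocol=2)  # same protocol as in the traceback
    except (TypeError, pickle.PicklingError) as exc:
        return exc
    return None


spider = DemoSpider()

# A bound method is pickled via its instance, so the lock gets pickled too.
method_error = pickle_error(spider.parse_item)

# The method's name is a plain string and pickles without touching the spider.
name = spider.parse_item.__name__
name_error = pickle_error(name)

# getattr() turns the name back into the bound method on the receiving side.
resolved = getattr(spider, name)

print("bound method:", repr(method_error))
print("method name:", repr(name_error))
print("resolved call:", resolved())
```

This matches the getattr() workaround above: only the string travels in meta, and the callback is looked up on the spider when it is needed.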

