Getting Error caught on signal handler: <bound method ? on Yield
S
Skirmish Abdullah
started a topic
over 5 years ago
I have a scrapy script that works locally, but when I deploy it to Scrapinghub, it's giving all errors. Upon debugging, the error is coming from Yielding the item.
This is the error I get.
ERROR [scrapy.utils.signal] Error caught on signal handler: <bound method ?.item_scraped of <sh_scrapy.extension.HubstorageExtension object at 0x7fd39e6141d0>> Less
Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 150, in maybeDeferred
result = f(*args, **kw)
File "/usr/local/lib/python2.7/site-packages/pydispatch/robustapply.py", line 55, in robustApply
return receiver(*arguments, **named)
File "/usr/local/lib/python2.7/site-packages/sh_scrapy/extension.py", line 45, in item_scraped
item = self.exporter.export_item(item)
File "/usr/local/lib/python2.7/site-packages/scrapy/exporters.py", line 304, in export_item
result = dict(self._get_serialized_fields(item))
File "/usr/local/lib/python2.7/site-packages/scrapy/exporters.py", line 75, in _get_serialized_fields
value = self.serialize_field(field, field_name, item[field_name])
File "/usr/local/lib/python2.7/site-packages/scrapy/exporters.py", line 284, in serialize_field
return serializer(value)
File "/usr/local/lib/python2.7/site-packages/scrapy/exporters.py", line 290, in _serialize_value
return dict(self._serialize_dict(value))
File "/usr/local/lib/python2.7/site-packages/scrapy/exporters.py", line 300, in _serialize_dict
key = to_bytes(key) if self.binary else key
File "/usr/local/lib/python2.7/site-packages/scrapy/utils/python.py", line 117, in to_bytes
'object, got %s' % type(text).__name__)
TypeError: to_bytes must receive a unicode, str or bytes object, got int
It doesn't specify the field with issues, but by process of elimination, I came to realize it's this part of the code:
try:
item["media"] = {}
media_index = 0
media_content = response.xpath("//audio/source/@src").extract_first()
if media_content is not None:
item["media"][media_index] = {}
preview = item["media"][media_index]
preview["Media URL"] = media_content
preview["Media Type"] = "Audio"
media_index += 1
except IndexError:
print "Index error for media " + item["asset_url"]
I cleared some parts up to make it easier to tackle, but basically this part is the issue. Something it doesn't like about the item media.
I'm beginner in both Python and Scrapy. So sorry if this turns out to be silly basic Python mistake.
1 Comment
thriveni
said
over 5 years ago
Hi,
I see that the recent jobs do not have the errors. It would be great if you can share the solution that you put in place as it would help other users facing similar issue.
Skirmish Abdullah
I have a scrapy script that works locally, but when I deploy it to Scrapinghub, it's giving all errors. Upon debugging, the error is coming from Yielding the item.
This is the error I get.
It doesn't specify the field with issues, but by process of elimination, I came to realize it's this part of the code:
I cleared some parts up to make it easier to tackle, but basically this part is the issue. Something it doesn't like about the item media.
I'm beginner in both Python and Scrapy. So sorry if this turns out to be silly basic Python mistake.