I have a scrapy script that works locally, but when I deploy it to Scrapinghub, it's giving all errors. Upon debugging, the error is coming from Yielding the item.
This is the error I get.
ERROR [scrapy.utils.signal] Error caught on signal handler: <bound method ?.item_scraped of <sh_scrapy.extension.HubstorageExtension object at 0x7fd39e6141d0>> Less
Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 150, in maybeDeferred
result = f(*args, **kw)
File "/usr/local/lib/python2.7/site-packages/pydispatch/robustapply.py", line 55, in robustApply
return receiver(*arguments, **named)
File "/usr/local/lib/python2.7/site-packages/sh_scrapy/extension.py", line 45, in item_scraped
item = self.exporter.export_item(item)
File "/usr/local/lib/python2.7/site-packages/scrapy/exporters.py", line 304, in export_item
result = dict(self._get_serialized_fields(item))
File "/usr/local/lib/python2.7/site-packages/scrapy/exporters.py", line 75, in _get_serialized_fields
value = self.serialize_field(field, field_name, item[field_name])
File "/usr/local/lib/python2.7/site-packages/scrapy/exporters.py", line 284, in serialize_field
return serializer(value)
File "/usr/local/lib/python2.7/site-packages/scrapy/exporters.py", line 290, in _serialize_value
return dict(self._serialize_dict(value))
File "/usr/local/lib/python2.7/site-packages/scrapy/exporters.py", line 300, in _serialize_dict
key = to_bytes(key) if self.binary else key
File "/usr/local/lib/python2.7/site-packages/scrapy/utils/python.py", line 117, in to_bytes
'object, got %s' % type(text).__name__)
TypeError: to_bytes must receive a unicode, str or bytes object, got int
It doesn't specify the field with issues, but by process of elimination, I came to realize it's this part of the code:
try:
item["media"] = {}
media_index = 0
media_content = response.xpath("//audio/source/@src").extract_first()
if media_content is not None:
item["media"][media_index] = {}
preview = item["media"][media_index]
preview["Media URL"] = media_content
preview["Media Type"] = "Audio"
media_index += 1
except IndexError:
print "Index error for media " + item["asset_url"]
I cleared some parts up to make it easier to tackle, but basically this part is the issue. Something it doesn't like about the item media.
I'm beginner in both Python and Scrapy. So sorry if this turns out to be silly basic Python mistake.
0 Votes
1 Comments
thriveniposted
almost 6 years ago
Admin
Hi,
I see that the recent jobs do not have the errors. It would be great if you can share the solution that you put in place as it would help other users facing similar issue.
I have a scrapy script that works locally, but when I deploy it to Scrapinghub, it's giving all errors. Upon debugging, the error is coming from Yielding the item.
This is the error I get.
It doesn't specify the field with issues, but by process of elimination, I came to realize it's this part of the code:
I cleared some parts up to make it easier to tackle, but basically this part is the issue. Something it doesn't like about the item media.
I'm beginner in both Python and Scrapy. So sorry if this turns out to be silly basic Python mistake.
0 Votes
1 Comments
thriveni posted almost 6 years ago Admin
Hi,
I see that the recent jobs do not have the errors. It would be great if you can share the solution that you put in place as it would help other users facing similar issue.
0 Votes
Login to post a comment