Release notes
Scrapy 2.4.1 (2020-11-17)
- Fixed feed exports overwrite support (issue 4845, issue 4857, issue 4859)
- Fixed the AsyncIO event loop handling, which could make code hang (issue 4855, issue 4872)
- Fixed the IPv6-capable DNS resolver CachingHostnameResolver for download handlers that call reactor.resolve (issue 4802, issue 4803)
- Fixed the output of the genspider command showing placeholders instead of the import part of the generated spider module (issue 4874)
- Migrated Windows CI from Azure Pipelines to GitHub Actions (issue 4869, issue 4876)
Scrapy 2.4.0 (2020-10-11)
Highlights:
Python 3.5 support has been dropped.
The file_path method of media pipelines can now access the source item. This allows you to set a download file path based on item data.
The new item_export_kwargs key of the FEEDS setting allows you to define keyword parameters to pass to item exporter classes.
You can now choose whether feed exports overwrite or append to the output file. For example, when using the crawl or runspider commands, you can use the -O option instead of -o to overwrite the output file.
Zstd-compressed responses are now supported if zstandard is installed.
In settings, where the import path of a class is required, it is now possible to pass a class object instead.
Modified requirements
Python 3.6 or greater is now required; support for Python 3.5 has been dropped
As a result:
- When using PyPy, PyPy 7.2.0 or greater is now required
- For Amazon S3 storage support in feed exports or media pipelines, botocore 1.4.87 or greater is now required
- To use the images pipeline, Pillow 4.0.0 or greater is now required
(issue 4718, issue 4732, issue 4733, issue 4742, issue 4743, issue 4764)
Backward-incompatible changes
CookiesMiddleware once again discards cookies defined in Request.headers.
We decided to revert this bug fix, introduced in Scrapy 2.2.0, because it was reported that the current implementation could break existing code.
If you need to set cookies for a request, use the Request.cookies parameter.
A future version of Scrapy will include a new, better implementation of the reverted bug fix.
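For code that previously built a Cookie header by hand, one possible migration is to parse that header value into a dictionary and pass the result through the cookies parameter. The sketch below uses only the standard library; the helper name cookie_header_to_dict and the sample cookie values are illustrative and not part of Scrapy.

```python
from http.cookies import SimpleCookie


def cookie_header_to_dict(header_value):
    """Parse a raw ``Cookie`` header value into a plain dict (hypothetical helper)."""
    parsed = SimpleCookie()
    parsed.load(header_value)
    return {name: morsel.value for name, morsel in parsed.items()}


# Sample header value, made up for illustration.
cookies = cookie_header_to_dict("session=abc123; theme=dark")

# In a spider, the dict would then go to the request constructor
# instead of the headers:
#     yield scrapy.Request(url, cookies=cookies)
```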
Deprecation removals
- scrapy.extensions.feedexport.S3FeedStorage no longer reads the values of access_key and secret_key from the running project settings when they are not passed to its __init__ method; you must either pass those parameters to its __init__ method or use S3FeedStorage.from_crawler (issue 4356, issue 4411, issue 4688)
- Rule.process_request no longer admits callables which expect a single request parameter, rather than both request and response (issue 4818)
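As a rough illustration of the two-parameter form that Rule.process_request now requires, a callable might look like the sketch below. The function name and the source_page meta key are made up for this example.

```python
def tag_source_page(request, response):
    """Record which page the extracted link came from (hypothetical callable)."""
    # "source_page" is an arbitrary meta key chosen for illustration.
    request.meta["source_page"] = response.url
    # Returning None instead would filter the request out.
    return request


# In a CrawlSpider, the callable would be wired in roughly like this:
#     rules = (Rule(LinkExtractor(), process_request=tag_source_page),)
```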
Deprecations
- In custom media pipelines, signatures that do not accept a keyword-only item parameter in any of the methods that now support this parameter are now deprecated (issue 4628, issue 4686)
- In custom feed storage backend classes, __init__ method signatures that do not accept a keyword-only feed_options parameter are now deprecated (issue 547, issue 716, issue 4512)
- The scrapy.utils.python.WeakKeyCache class is now deprecated (issue 4684, issue 4701)
- The scrapy.utils.boto.is_botocore() function is now deprecated; use scrapy.utils.boto.is_botocore_available() instead (issue 4734, issue 4776)
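For the feed storage deprecation, a custom backend that accepts the keyword-only feed_options parameter could be shaped roughly as follows. This is a minimal sketch under several assumptions: the class name is hypothetical, the URI is treated as a plain local path, and the interface declaration that a real backend would normally carry is omitted to keep the snippet free of third-party imports.

```python
class LocalFileStorage:
    """Hypothetical feed storage backend that honours the overwrite feed option."""

    def __init__(self, uri, *, feed_options=None):
        # Assumption: the URI is used directly as a local file path.
        self.path = uri
        feed_options = feed_options or {}
        self.overwrite = feed_options.get("overwrite", False)

    @classmethod
    def from_crawler(cls, crawler, uri, *, feed_options=None):
        # Forward the feed options so that both entry points behave the same.
        return cls(uri, feed_options=feed_options)

    def open(self, spider):
        # Truncate the file when overwriting, append otherwise.
        return open(self.path, "wb" if self.overwrite else "ab")

    def store(self, file):
        file.close()
```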
New features
The following methods of media pipelines now accept an item keyword-only parameter containing the source item:
- In scrapy.pipelines.files.FilesPipeline: file_downloaded(), file_path(), media_downloaded(), media_to_download()
- In scrapy.pipelines.images.ImagesPipeline: file_downloaded(), file_path(), get_images(), image_downloaded(), media_downloaded(), media_to_download()
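To show how the item parameter might be used, here is a sketch of a file_path() override that builds the download path from item data. It is written as a mixin so that the snippet needs no Scrapy import; in a project it would be combined with FilesPipeline. The mixin name, the category field, and the assumption that items are dict-like are all illustrative.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


class ItemAwarePathMixin:
    """Hypothetical mixin; in a real project it would be combined with the
    pipeline, e.g. ``class ReportFiles(ItemAwarePathMixin, FilesPipeline)``."""

    def file_path(self, request, response=None, info=None, *, item=None):
        # Keep the file name found at the end of the request URL.
        filename = PurePosixPath(urlparse(request.url).path).name
        # "category" is an assumed item field; items are assumed dict-like.
        category = item.get("category", "uncategorized") if item else "uncategorized"
        return f"{category}/{filename}"
```

Accepting item as keyword-only with a default of None keeps the method callable from code paths that do not pass an item.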
The new item_export_kwargs key of the FEEDS setting allows you to define keyword parameters to pass to item exporter classes (issue 4606, issue 4768)
Feed exports gained overwrite support:
- When using the crawl or runspider commands, you can use the -O option instead of -o to overwrite the output file
- You can use the overwrite key in the FEEDS setting to configure whether to overwrite the output file (True) or append to its content (False)
- The __init__ and from_crawler methods of feed storage backend classes now receive a new keyword-only parameter, feed_options, which is a dictionary of feed options
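As an illustration of the overwrite and item_export_kwargs keys described above, a FEEDS setting could look like the sketch below. The file names are placeholders, and export_empty_fields is just one exporter keyword chosen as an example.

```python
# settings.py (sketch); the output file names are placeholders.
FEEDS = {
    "items.json": {
        "format": "json",
        # Start from an empty file on every run.
        "overwrite": True,
        # Forwarded as keyword arguments to the item exporter class.
        "item_export_kwargs": {"export_empty_fields": True},
    },
    "items.jl": {
        "format": "jsonlines",
        # Append to the existing content instead of replacing it.
        "overwrite": False,
    },
}

# Command-line counterpart of "overwrite": True, with a placeholder spider name:
#     scrapy crawl myspider -O items.json
```

Appending suits line-oriented formats such as JSON Lines, whereas appending to a single JSON document would generally not produce a valid file.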
Zstd-compressed responses are now supported if zstandard is installed (issue 4831)
In settings, where the import path of a class is required, it is now possible to pass a class object instead (issue 3870, issue 3873).
This also includes settings where only part of the value is an import path, such as DOWNLOADER_MIDDLEWARES or DOWNLOAD_HANDLERS.
Downloader middlewares can now override response.request.
If a downloader middleware returns a Response object from process_response() or process_exception() with a custom Request object assigned to response.request:
- The response is handled by the callback of that custom Request object, instead of being handled by the callback of the original Request object
- That custom Request object is now sent as the request argument to the response_received signal, instead of the original Request object
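The two items above can be combined in one sketch: a downloader middleware that assigns a custom request to response.request, registered in DOWNLOADER_MIDDLEWARES as a class object rather than an import path. The middleware name, the empty-body condition, the parse_empty spider method, and the priority value are assumptions made for illustration.

```python
class RerouteEmptyMiddleware:
    """Hypothetical middleware that reroutes empty responses to another callback."""

    def process_response(self, request, response, spider):
        if not response.body:
            # Attach a copy of the request that points at a different callback.
            # "parse_empty" is an assumed method on the spider.
            response.request = request.replace(callback=spider.parse_empty)
        return response


# The class object itself is used as the key; 543 is an arbitrary priority.
DOWNLOADER_MIDDLEWARES = {
    RerouteEmptyMiddleware: 543,
}
```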
When using the FTP feed storage backend:
- It is now possible to set the new overwrite feed option to False to append to an existing file instead of overwriting it
- The FTP password can now be omitted if it is not necessary
The __init__ method of CsvItemExporter now supports an errors parameter to indicate how to handle encoding errors (issue 4755)
When using asyncio, it is now possible to set a custom asyncio loop (issue 4306, issue 4414)
Serialized requests (see Jobs: pausing and resuming crawls) now support callbacks that are spider methods that delegate to other callables (issue 4756)
When a response is larger than DOWNLOAD_MAXSIZE, the logged message is now a warning, instead of an error (issue 3874, issue 3886, issue 4752)
Bug fixes
- The genspider command no longer overwrites existing files unless the --force option is used (issue 4561, issue 4616, issue 4623)
- Cookies with an empty value are no longer considered invalid cookies (issue 4772)
- The runspider command now supports files with the .pyw file extension (issue 4643, issue 4646)
- The