User Guide
httpx-cache provides:
- A sync/async httpx-compatible caching client and/or transport.
- Support for an in-memory dict cache and a file cache.
- Support for different serializers: dict, str, bytes, msgpack
Client
httpx recommends using a client instance for anything more than experimentation, one-off scripts, or prototypes.
Caching is one such advanced use case, which is why httpx-cache provides its own custom client. It has exactly the same features as the original httpx.Client (it inherits from the httpx.Client class), but wraps the default (or custom) transport in an httpx_cache.CacheControlTransport.
Usage with Default Values
Excluding the caching algorithms, httpx_cache.Client (or AsyncClient) behaves similarly to httpx.Client (or AsyncClient).
For caching, httpx_cache.Client adds 4 new keyword arguments:
- cache: an optional value for which cache type to use; defaults to an in-memory dict cache if not provided.
- cacheable_methods: tuple of str HTTP methods that support caching (if a request does not use one of these methods, its corresponding response will not be cached); defaults to ('GET',).
- cacheable_status_codes: tuple of int HTTP status codes that support caching (if a response does not have one of these status codes, it will not be cached); defaults to (200, 203, 300, 301, 308).
- always_cache: bool; if True, all valid responses will be cached, regardless of the no-store directive set in either the request or the response; defaults to False.
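To make the interplay of these arguments more concrete, here is a minimal sketch of the kind of decision they control. It is not httpx-cache's actual implementation: the function name should_cache and its header parsing are assumptions made for illustration, it assumes that a "valid" response is one that passes the method and status code checks, and real Cache-Control handling covers more directives than no-store.

```python
from typing import Mapping, Tuple


def should_cache(
    method: str,
    status_code: int,
    request_headers: Mapping[str, str],
    response_headers: Mapping[str, str],
    cacheable_methods: Tuple[str, ...] = ("GET",),
    cacheable_status_codes: Tuple[int, ...] = (200, 203, 300, 301, 308),
    always_cache: bool = False,
) -> bool:
    """Hypothetical helper: decide whether a response may be stored."""
    if method.upper() not in cacheable_methods:
        return False
    if status_code not in cacheable_status_codes:
        return False
    if always_cache:
        # Assumption: always_cache skips the no-store check entirely.
        return True
    for headers in (request_headers, response_headers):
        cache_control = ""
        for name, value in headers.items():
            # Header names are case-insensitive.
            if name.lower() == "cache-control":
                cache_control = value.lower()
        directives = {part.strip() for part in cache_control.split(",")}
        if "no-store" in directives:
            return False
    return True
```

With the defaults, a GET returning 200 is stored, a POST or a 404 is not, and a no-store directive on either side blocks storage unless always_cache=True is passed.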
Note: When using the httpx_cache client or transport, a new property is added to the response to specify whether it comes from the cache or not: response.from_cache: bool
Example usage:
import httpx_cache

with httpx_cache.Client() as client:
    response1 = client.get("https://httpbin.org/get")  # will be cached
    response2 = client.get("https://httpbin.org/get")  # will get it from cache
    assert response1.from_cache is False
    assert response2.from_cache is True
AsyncClient
Just as httpx provides httpx.AsyncClient, httpx_cache also provides an httpx_cache.AsyncClient that supports the same caching arguments as httpx_cache.Client.
import asyncio

import httpx_cache


async def main():
    # "async with" is only valid inside a coroutine, hence the wrapper.
    async with httpx_cache.AsyncClient() as client:
        response1 = await client.get("https://httpbin.org/get")  # will be cached
        response2 = await client.get("https://httpbin.org/get")  # will get it from cache
        assert response1.from_cache is False
        assert response2.from_cache is True


asyncio.run(main())
Response Stream
When using a streaming response, the response is not cached until the stream is fully consumed. To cache a response we need it to have a content property, and this content is set only once the user has fully consumed the stream.
(httpx_cache handles this automatically with a callback; it should have no effect on the user's usual routines when working with a stream.)
import logging
import tempfile

import rich.progress
from rich.logging import RichHandler

import httpx_cache

logging.basicConfig(
    level="DEBUG", format="%(message)s", datefmt="[%X]", handlers=[RichHandler()]
)
logger = logging.getLogger("httpx_cache.example")

with tempfile.NamedTemporaryFile() as download_file:
    url = "https://speed.hetzner.de/100MB.bin"
    with httpx_cache.Client() as client:
        logger.info(f"Running '{url}' download for the first time ...")
        with client.stream("GET", url) as response:
            total = int(response.headers["Content-Length"])
            logger.info(
                "A streaming response is cached only after the stream is consumed."
            )
            with rich.progress.Progress(
                "[progress.percentage]{task.percentage:>3.0f}%",
                rich.progress.BarColumn(bar_width=None),
                rich.progress.DownloadColumn(),
                rich.progress.TransferSpeedColumn(),
                rich.progress.TimeElapsedColumn(),
            ) as progress:
                download_task = progress.add_task("Download", total=total)
                for chunk in response.iter_bytes():
                    download_file.write(chunk)
                    progress.update(
                        download_task, completed=response.num_bytes_downloaded
                    )

        logger.info(f"Running same '{url}' download for the second time ...")
        logger.info(
            "The response is cached so it should take 0 seconds to iter over "
            "the bin again !"
        )
        with client.stream("GET", url) as response2:
            total = int(response2.headers["Content-Length"])
            with rich.progress.Progress(
                "[progress.percentage]{task.percentage:>3.0f}%",
                rich.progress.BarColumn(bar_width=None),
                rich.progress.DownloadColumn(),
                rich.progress.TransferSpeedColumn(),
                rich.progress.TimeElapsedColumn(),
            ) as progress:
                download_task = progress.add_task("Download", total=total)
                for chunk in response2.iter_bytes():
                    download_file.write(chunk)
                    progress.update(
                        download_task, completed=response2.num_bytes_downloaded
                    )
(This script is complete, it should run "as is")
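The "cache only once the stream is consumed" behaviour can be pictured with a small standalone sketch. This is not the library's code: CallbackStream is a hypothetical wrapper, written under the assumption that the callback simply receives the joined body after the last chunk has been read.

```python
from typing import Callable, Dict, Iterable, Iterator, List


class CallbackStream:
    """Hypothetical wrapper: yields chunks, then reports the full body."""

    def __init__(
        self, chunks: Iterable[bytes], on_complete: Callable[[bytes], None]
    ) -> None:
        self._chunks = chunks
        self._on_complete = on_complete

    def __iter__(self) -> Iterator[bytes]:
        collected: List[bytes] = []
        for chunk in self._chunks:
            collected.append(chunk)
            yield chunk
        # Only reached once the consumer has read every chunk.
        self._on_complete(b"".join(collected))


cache: Dict[str, bytes] = {}
stream = CallbackStream([b"hello ", b"world"], lambda body: cache.update(body=body))

iterator = iter(stream)
next(iterator)  # partially consumed: nothing has been cached yet
partially_cached = "body" in cache
remaining = list(iterator)  # fully consumed: the callback stores the body
```

The consumer iterates exactly as it would over a plain stream; the caching side effect happens only after the final chunk, which is why an abandoned download leaves nothing in the cache in this sketch.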
Transport
If you prefer to use the original httpx Client, httpx-cache also provides a transport that can be used directly with it.
The custom caching transport is created following the httpx guidelines for custom transports.
The (Async-)CacheControlTransport also accepts these 3 keyword arguments:
- cache: an optional value for which cache type to use; defaults to an in-memory dict cache if not provided.
- cacheable_methods: tuple of str HTTP methods that support caching (if a request does not use one of these methods, its corresponding response will not be cached); defaults to ('GET',).
- cacheable_status_codes: tuple of int HTTP status codes that support caching (if a response does not have one of these status codes, it will not be cached); defaults to (200, 203, 300, 301, 308).
Note: When using the httpx_cache client or transport, a new property is added to the response to specify whether it comes from the cache or not: response.from_cache: bool
import httpx
import httpx_cache

with httpx.Client(transport=httpx_cache.CacheControlTransport()) as client:
    response = client.get("https://httpbin.org/get")

# async with httpx.AsyncClient(transport=httpx_cache.AsyncCacheControlTransport()) as client:
#     response = await client.get("https://httpbin.org/get")
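As a rough mental model of what "wrapping a transport" means, the following standalone sketch puts a dict in front of an arbitrary callable that performs the real request. Everything in it (FakeResponse, CachingTransportSketch, the handle method, the fixed GET/200 rule) is hypothetical and greatly simplified; the real CacheControlTransport works with httpx request and response objects and honours cache-control rules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a response object."""

    status_code: int
    body: bytes
    from_cache: bool = False


class CachingTransportSketch:
    """Wraps any callable 'transport' and serves repeat requests from a dict."""

    def __init__(self, inner: Callable[[str, str], FakeResponse]) -> None:
        self._inner = inner
        self._cache: Dict[Tuple[str, str], FakeResponse] = {}

    def handle(self, method: str, url: str) -> FakeResponse:
        key = (method, url)
        cached = self._cache.get(key)
        if cached is not None:
            # Return a copy flagged as coming from the cache.
            return FakeResponse(cached.status_code, cached.body, from_cache=True)
        response = self._inner(method, url)
        # Simplified rule: only successful GET responses are stored.
        if method == "GET" and response.status_code == 200:
            self._cache[key] = response
        return response


calls: List[str] = []


def network(method: str, url: str) -> FakeResponse:
    """Pretend network call that records every URL it is asked for."""
    calls.append(url)
    return FakeResponse(200, b"payload")


transport = CachingTransportSketch(network)
first = transport.handle("GET", "https://httpbin.org/get")
second = transport.handle("GET", "https://httpbin.org/get")
```

The second call never reaches the wrapped callable, which is the same layering that lets a caching transport sit in front of either the default or a custom httpx transport.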
Cache Types
DictCache (default)
In-memory dict cache:
import httpx
import httpx_cache

with httpx_cache.Client(cache=httpx_cache.DictCache()) as client:
    response = client.get("https://httpbin.org/get")
FileCache
import httpx_cache

with httpx_cache.Client(cache=httpx_cache.FileCache()) as client:
    response = client.get("https://httpbin.org/get")
By default, the cached files are saved in the $HOME/.cache/httpx-cache folder.
The location can be customized using the cache_dir argument:
import httpx_cache

with httpx_cache.Client(cache=httpx_cache.FileCache(cache_dir="./my-custom-dir")) as client:
    response = client.get("https://httpbin.org/get")
fsspec/universal_pathlib integration
FileCache also works out of the box with fsspec/universal_pathlib, so you can use any filesystem supported by fsspec as the cache directory. Please check the fsspec/universal_pathlib docs for the list of supported filesystems (and schemes).
Example with an s3 filesystem:
(Don't forget to also install the s3fs package to use this backend: pip install universal_pathlib s3fs)
import httpx_cache
from upath import UPath

cache_dir = UPath("s3://my-bucket/httpx-cache")
cache = httpx_cache.FileCache(cache_dir=cache_dir)

with httpx_cache.Client(cache=cache) as client:
    response = client.get("https://httpbin.org/get")

# OR async client
# async with httpx_cache.AsyncClient(cache=cache) as client:
#     response = await client.get("https://httpbin.org/get")

# should contain one file, with the cached response
print([f for f in cache_dir.iterdir()])
RedisCache
You need to install the redis package to use this cache type, or install httpx-cache[redis] to pull it in automatically.
import httpx_cache
from httpx_cache.cache.redis import RedisCache

with httpx_cache.Client(cache=RedisCache(redis_url="redis://localhost:6379/0")) as client:
    response = client.get("https://httpbin.org/get")
By default, all cached responses are saved under the namespace httpx_cache.
Optionally, a TTL can be provided so that the cached responses expire after the given time (as a Python timedelta).
It also accepts direct instances of redis.Redis or redis.StrictRedis clients.
from datetime import timedelta  # needed for default_ttl below

import httpx_cache
from redis import Redis
from httpx_cache.cache.redis import RedisCache

redis_client = Redis(host="localhost", port=6379, db=0)
cache = RedisCache(redis=redis_client, namespace="my-custom-namespace", default_ttl=timedelta(hours=1))

with httpx_cache.Client(cache=cache) as client:
    response = client.get("https://httpbin.org/get")
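The effect of a namespace plus a default TTL can be illustrated without a Redis server. The sketch below is an in-memory stand-in, not RedisCache itself: the class TTLDictSketch, the "namespace:key" key layout, and the injectable clock are all assumptions chosen to keep the example deterministic.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, Optional, Tuple


class TTLDictSketch:
    """Hypothetical in-memory cache with namespaced keys and optional expiry."""

    def __init__(
        self,
        namespace: str,
        default_ttl: Optional[timedelta] = None,
        now: Callable[[], datetime] = datetime.now,
    ) -> None:
        self._namespace = namespace
        self._default_ttl = default_ttl
        self._now = now
        self._data: Dict[str, Tuple[bytes, Optional[datetime]]] = {}

    def _full_key(self, key: str) -> str:
        # Assumed key layout; the real cache may build its keys differently.
        return f"{self._namespace}:{key}"

    def set(self, key: str, value: bytes) -> None:
        expires = None
        if self._default_ttl is not None:
            expires = self._now() + self._default_ttl
        self._data[self._full_key(key)] = (value, expires)

    def get(self, key: str) -> Optional[bytes]:
        entry = self._data.get(self._full_key(key))
        if entry is None:
            return None
        value, expires = entry
        if expires is not None and self._now() >= expires:
            # Entry outlived its TTL: drop it and report a miss.
            del self._data[self._full_key(key)]
            return None
        return value


clock = [datetime(2024, 1, 1, 12, 0, 0)]
ttl_cache = TTLDictSketch(
    "my-custom-namespace", timedelta(hours=1), now=lambda: clock[0]
)
ttl_cache.set("GET https://httpbin.org/get", b"cached")
fresh = ttl_cache.get("GET https://httpbin.org/get")
clock[0] += timedelta(hours=2)  # simulate two hours passing
expired = ttl_cache.get("GET https://httpbin.org/get")
```

Within the hour the entry is returned; after the simulated two hours the lookup is a miss, so the next request would go back to the network.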
Serializer Types
Before an httpx.Response is cached, it needs to be serialized to a cacheable format supported by the cache type in use (Dict/File/Redis).
Serializer | DictCache | FileCache | RedisCache |
---|---|---|---|
DictSerializer | |||
StringJsonSerializer | |||
BytesJsonSerializer | |||
MsgPackSerializer | |||
A custom serializer can be used anytime with:
import httpx_cache

# The cache argument belongs to httpx_cache.Client, not httpx.Client.
with httpx_cache.Client(cache=httpx_cache.DictCache(serializer=httpx_cache.DictSerializer())) as client:
    response = client.get("https://httpbin.org/get")
httpx-cache provides the following serializers:
DictSerializer
The base serializer used by all other serializers. It converts an httpx.Response object into a Python dict that represents the response. The idea is that, using the created dict, we should be able to recreate exactly the same response.
The serialized dict has the following elements:
{
    "status_code": "int, required, status code of the response",
    "headers": "List[Tuple[str, str]], required, list of headers of the original response, can be an empty list",
    "encoding": "str, optional, encoding of the response if not Null",
    "_content": "bytes, optional, content of the response if it exists (usually if the stream is consumed, or the response originally has just a basic content); if not found, 'stream_content' should be provided.",
    "stream_content": "bytes, optional, used when the response contains a stream that is loaded only after the transport finishes its work; will be converted to an httpx.BytesStream when recreating the response."
}
StringJsonSerializer
Inherits from DictSerializer; this is the result of json.dumps applied to the dict generated above.
BytesJsonSerializer
Inherits from StringJsonSerializer; this is the utf-8 encoded JSON string.
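The chain from dict to JSON string to utf-8 bytes can be sketched with the standard library alone. This is an illustration of the idea, not the serializers' real code: the helper names are hypothetical, and base64-encoding the bytes content so that it fits in JSON is an assumption, since this guide does not say how the library represents bytes.

```python
import base64
import json

# A dict shaped like the DictSerializer output described above.
response_dict = {
    "status_code": 200,
    "headers": [("content-type", "text/plain")],
    "encoding": "utf-8",
    "_content": b"hello",
}


def to_json_string(data: dict) -> str:
    """Hypothetical dict -> str step (what a string JSON serializer produces)."""
    # Assumption: bytes are base64-encoded because JSON cannot hold raw bytes.
    encoded = dict(data, _content=base64.b64encode(data["_content"]).decode("ascii"))
    return json.dumps(encoded)


def from_json_string(text: str) -> dict:
    """Hypothetical inverse: rebuild the original dict from the JSON string."""
    decoded = json.loads(text)
    decoded["_content"] = base64.b64decode(decoded["_content"])
    # JSON has no tuples, so header pairs come back as lists.
    decoded["headers"] = [tuple(pair) for pair in decoded["headers"]]
    return decoded


as_string = to_json_string(response_dict)  # string form
as_bytes = as_string.encode("utf-8")  # bytes form, one step further
restored = from_json_string(as_bytes.decode("utf-8"))
```

Each form is derived from the previous one, which mirrors the inheritance described above, and the round trip returns a dict equal to the original.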
MsgPackSerializer (default)
Inherits from DictSerializer; this is the result of msgpack.dumps applied to the dict generated above.