Bulk Data Download Functions
This page documents all the bulk data download functions the library offers. New functions are added based on community feedback.
To suggest a bulk download function that might be useful in the library, or to provide feedback/questions on existing ones, join our Discord server.
Below is a short description of the available bulk data functions in the library.
Name | Summary
---|---
Full Range Aggregates | Historical OHLCV candles for a large duration
Bulk Ticker Details | Ticker details for a date range
Bulk Aggregate Bars (Full Range)
Available on both regular and async clients, this function makes it easy to get long-duration price history (OHLCV candles). For example: one-minute candles for AMD for the past 15 years.
How does the function work
Skip if you don’t care :D
This function works around the 50k-candle limit imposed by Polygon. 50k is enough for about a month of minute data, but not for 15 years, as you can tell.
The library splits your specified date range into smaller time chunks, fetches them in parallel (threads/coroutines) or sequentially (if you say so), merges all responses, drops duplicates and candles outside the specified range, and finally returns a single list of all candles.
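The split-fetch-merge-dedupe steps above can be sketched in plain Python. The fetch is stubbed out here, and `fetch_chunk`/`full_range` are illustrative names, not library APIs:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the real API call; the actual library
# fetches candles from Polygon for each (start, end) chunk.
def fetch_chunk(chunk):
    start, end = chunk
    # fake candles: dicts with a millisecond timestamp 't'
    return [{'t': ts} for ts in range(start, end + 1, 10)]

def full_range(chunks, range_start, range_end):
    # fetch all chunks in parallel
    with ThreadPoolExecutor(max_workers=4) as pool:
        responses = list(pool.map(fetch_chunk, chunks))
    # merge, drop duplicates and out-of-range candles, sort ascending
    merged = {c['t']: c for resp in responses for c in resp
              if range_start <= c['t'] <= range_end}
    return [merged[ts] for ts in sorted(merged)]

# overlapping chunks: the duplicate candle at the boundary gets dropped
candles = full_range([(0, 50), (50, 100)], 0, 100)
print(len(candles))  # 11 unique timestamps: 0, 10, ..., 100
```

Keying the merged dict by timestamp is what makes the boundary duplicates collapse to a single candle.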
General Advice
If you are looking to use this functionality for MANY symbols (read: more than 4-5), it is better to use the async client. Due to Python's GIL, the regular client can't run more than one thread pool at a time.
For most people, the default values should be enough, but for the ones who hate themselves ( :P ), it is possible to customize the behavior however they like.
The concept is the same for all clients (stocks, options, forex and crypto). Learning it once is enough, as they all have the same method names.
Enough Talking, Show me how to use it
You may call this function in two ways:

1. Calling the usual `client.get_aggregate_bars()` method and passing `full_range=True`.
2. Directly calling `client.get_full_range_aggregate_bars()` (added in `v1.0.9`). Do NOT use this one for now; there is a known issue to be fixed in a coming release.

For example, the below two calls are identical:
```python
# 'client' can be any client out of stocks, options, forex & crypto.
client.get_aggregate_bars('AMD', '2005-06-28', '2022-06-28', full_range=True)

# same effect as above, different call (added in v1.0.9)
client.get_full_range_aggregate_bars('AMD', '2005-06-28', '2022-06-28')
```
The output format is a single list of all candles.
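For illustration, here is a hand-built candle using Polygon's aggregate field names (`o`/`h`/`l`/`c`/`v` plus a millisecond Unix timestamp `t`; treat the exact field set as an assumption and check your actual responses) and how to turn its timestamp into a date:

```python
from datetime import datetime, timezone

# A hand-built candle using Polygon-style aggregate field names.
# The field names are an assumption here, not fetched from the API.
candle = {'o': 90.1, 'h': 92.5, 'l': 89.7, 'c': 91.8, 'v': 1_000_000,
          't': 1656374400000}

# convert the millisecond timestamp to an aware datetime
ts = datetime.fromtimestamp(candle['t'] / 1000, tz=timezone.utc)
print(ts.date())  # 2022-06-28
```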
Now that you know how to use it, below is how to customize the way the function runs, to suit your architecture and requirements.

- By default, the library runs an internal thread pool (regular sync client) or a set of coroutines (async client); both fetch the smaller time chunks in parallel.
- If you don't want it to run in parallel (not recommended), you can just specify `run_parallel=False`. Doing that will make the library request data one by one, using the last response received as the new start point until the end date is reached. This might be useful if you're running a thread pool of your own and don't want the internal thread pool to interfere with yours. On the async client, always prefer to run in parallel.
- In a parallel (default) run, if you deal with an asset which trades a higher number of hours in a day, you can add the argument `smaller_time_chunks=True`. This makes the library further reduce its time chunk size. This argument was renamed from `high_volatility` to `smaller_time_chunks` in `v1.0.9`.
- By default, the function will also print some warnings if they occur. You can turn those warnings off using `warnings=False`.
- When working with the parallel versions, you also have the ability to specify how many concurrent threads/coroutines to spawn using `max_concurrent_workers=<a new number>`. ONLY change it if you know you need it. This can sometimes help reduce load or gain a performance boost. The default is `your cpu core count * 5`.
- By default, the results returned will be in ascending order (oldest candles first in the final output). To get descending order, you can either pass an enum from `polygon.enums.SortOrder` (recommended) or pass the string `sort='desc'`.
I want to do it manually, but could use some help
Sure. The helper these functions use internally to split a large date range into smaller time chunks is also available to call directly.
It returns a list of time chunks with their respective start and end times. You can then use your own logic to get data for those chunks, and process and merge it however you like.
The method you want to call is `client.split_date_range()`, like so:
```python
import polygon

client = polygon.StocksClient('KEY')
time_frames = client.split_date_range('2005-06-28', '2022-06-28', timespan='minute')
print(time_frames)
```
By default, the list returned will have newer time frames first. To change that, pass `reverse=False`.
The argument `smaller_time_chunks` is available here too and can be used for assets which trade a high number of hours in a day. This argument was renamed from `high_volatility` to `smaller_time_chunks` in `v1.0.9`.
Here is the method signature
- Base.split_date_range(start, end, timespan: str, high_volatility: bool = False, reverse: bool = True) → list
Internal helper function to split a BIGGER date range into smaller chunks so aggregate bars data can be fetched easily. The chunk duration differs by timespan. For 1-minute bars, the multiplier would be 1 and the timespan would be 'minute'.
- Parameters:
start – start of the time frame. Accepts `date`/`datetime` objects or a string `YYYY-MM-DD`
end – end of the time frame. Accepts `date`/`datetime` objects or a string `YYYY-MM-DD`
timespan – the frequency type, like day or minute. See `polygon.enums.Timespan` for choices
high_volatility – specifies whether the symbol/security in question is highly volatile. If set to True, the lib uses a smaller chunk of time to ensure no data is missed due to the 50k candle limit. Defaults to False
reverse – if True (the default), reverses the order of chunks (chronologically)
- Returns:
A list of tuples. Each tuple is in format `(start, end)` and represents one chunk of the time frame
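As a rough illustration of what such a splitter returns, here is a simplified sketch with a fixed chunk size (the real helper picks chunk durations per timespan; this is not the library's actual code):

```python
from datetime import date, timedelta

# Simplified sketch of date-range splitting. The chunk size here is
# illustrative only; the library picks chunk durations per timespan.
def split_date_range(start: date, end: date, chunk_days: int = 30,
                     reverse: bool = True) -> list:
    chunks, cursor = [], start
    while cursor < end:
        chunk_end = min(cursor + timedelta(days=chunk_days), end)
        chunks.append((cursor, chunk_end))
        cursor = chunk_end
    if reverse:  # newer time frames first, like the library's default
        chunks.reverse()
    return chunks

frames = split_date_range(date(2022, 1, 1), date(2022, 3, 1))
print(frames[0])  # the newest chunk comes first
```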
Bulk Ticker Details
Available on both regular and async clients, this function makes it easy to get ticker details for a specified ticker, for each day in a given date range.
It’s useful for quickly collecting data such as historical outstanding shares for a symbol.
How does the function work
Skip if you don’t care :D
This function generates a final list of dates from the date range and/or the custom dates.
The responses for all dates are fetched in parallel (threads/coroutines) or sequentially (if you say so).
The function returns an `OrderedDict` with the dates as keys and the ticker details as values.
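The fetch-in-parallel-then-key-by-date idea can be sketched like this, with a stubbed `fetch_details` standing in for the real per-date request (`fetch_details` and `bulk_details` are illustrative names, not library APIs):

```python
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the per-date details request the library makes.
def fetch_details(day: str) -> dict:
    return {'date': day, 'outstanding_shares': 1_000_000}

def bulk_details(dates: list) -> OrderedDict:
    # fetch every date in parallel; pool.map preserves input order
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = pool.map(fetch_details, dates)
    return OrderedDict((day, details) for day, details in zip(dates, results))

res = bulk_details(['2022-07-09', '2022-07-10', '2022-07-11'])
print(list(res))  # keys iterate in the order the dates were supplied
```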
General Advice
If you are looking to use this functionality for MANY symbols (read: more than 4-5), it is better to use the async client. Due to Python's GIL, the regular client can't run more than one thread pool at a time.
For most people, the default values should be enough, but for the ones who hate themselves ( :P ), it is possible to customize the behavior however they like.
The method is ONLY available on `ReferenceClient`, for obvious reasons.
Enough Talking, Show me how to use it
Some example calls:
```python
res = client.get_bulk_ticker_details('AMD', '2005-06-28', '2022-07-11')
res = client.get_bulk_ticker_details('AMD', from_date='2005-06-28', to_date='2022-07-11')  # this & above are equivalent

res = client.get_bulk_ticker_details('NVDA', custom_dates=['2005-06-28', '2022-07-20'])  # without date range
res = client.get_bulk_ticker_details('NVDA', from_date='2005-07-02', to_date='2022-07-11',
                                     custom_dates=['2005-06-28', '2022-07-01'])  # with custom dates and a range
```
- Return Value
The function returns an `OrderedDict` with the dates as keys and the ticker details as values. Iterating over the result iterates over the dates in a fixed order (ascending by default). You can set `sort='desc'` to reverse that.
Customizing Behavior:
- When using the async client, just await the method call: `res = await client.get_bulk_ticker_details(...)`
- You CAN supply both a date range (from-to) and custom_dates, but you MUST supply at least one of them. Duplicate dates are dropped by the library internally.
- If you don't want it to run in parallel (not recommended), you can just specify `run_parallel=False`. Doing that will make the library request data one by one. This might be useful if you're running a thread pool of your own and don't want the internal thread pool to interfere with yours. On the async client, always prefer to run in parallel.
- By default, the function will also print some warnings if they occur. You can turn those warnings off using `warnings=False`.
- When working with the parallel versions, you also have the ability to specify how many concurrent threads/coroutines to spawn using `max_concurrent_workers=<a new number>`. ONLY change it if you know you need it. This can sometimes help reduce load or gain a performance boost. The default is `your cpu core count * 5`.
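How a date range and custom dates might be combined with duplicates dropped can be sketched as follows (an illustration of the idea, not the library's actual code; `build_date_list` is a hypothetical name):

```python
from datetime import date, timedelta

# Sketch of combining a date range with custom dates, dropping duplicates.
def build_date_list(from_date: date, to_date: date, custom_dates=()) -> list:
    days = (to_date - from_date).days
    range_dates = [from_date + timedelta(days=i) for i in range(days + 1)]
    # a set drops any custom date already inside the range
    return sorted(set(range_dates) | set(custom_dates))

dates = build_date_list(date(2022, 7, 1), date(2022, 7, 3),
                        custom_dates=[date(2022, 7, 2), date(2022, 7, 10)])
print(dates)  # 2022-07-02 appears only once; 2022-07-10 is appended
```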
Here is the method signature:
- SyncReferenceClient.get_bulk_ticker_details(symbol: str, from_date=None, to_date=None, custom_dates: list | None = None, run_parallel: bool = True, warnings: bool = True, sort='asc', max_concurrent_workers: int = 10) → OrderedDict
Get ticker details for a symbol for the specified date range and/or dates. Each response will have detailed information about the ticker and the company behind it (on THAT particular date). Official Docs
- Parameters:
symbol – the ticker symbol to get data for
from_date – the start date of the date range. Must be specified if custom_dates is not supplied
to_date – the end date of the date range. Must be specified if custom_dates is not supplied
custom_dates – a list of dates to get data for. You can specify this WITH a range. Each date can be a `date` or `datetime` object or a string `YYYY-MM-DD`
run_parallel – if True (the default), uses an internal `ThreadPool` to get the responses in parallel. Note that since Python has the GIL, if you have a ThreadPool of your own, only one ThreadPool will run at a time and the other will wait. Set to False to get all responses in sequence (will take time)
warnings – defaults to True, which prints warnings. Set to False to disable warnings
sort – the order of sorting the final results. Defaults to ascending order of dates. See `polygon.enums.SortOrder` for choices
max_concurrent_workers – only used if run_parallel is set to True. Controls how many worker threads are spawned in the internal thread pool. Defaults to `your cpu core count * 5`
- Returns:
An `OrderedDict` where keys are dates and values are the corresponding ticker details
I want to do it manually, but could use some help
Sure. The helper used to get a list of unique, sorted dates between two dates is also available to call directly. Call it like:
```python
# client can be any client instance out of stocks, options, references, forex or crypto
all_dates = client.get_dates_between('2005-03-08', '2022-06-28')
all_dates = client.get_dates_between('2005-03-08', '2022-06-29', include_to_date=False)
```
You can then use your own logic to get data for these dates, and process and aggregate them however you like. Here is the method signature:
- Base.get_dates_between(from_date=None, to_date=None, include_to_date: bool = True) → list
Get a list of dates between the two specified dates (from_date and to_date)
- Parameters:
from_date – The start date
to_date – The end date
include_to_date – Whether to include the end date in the list
- Returns:
A list of dates between the two specified dates
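For reference, a simplified sketch of what such a helper does (not the library's actual implementation; the real method also accepts `datetime` objects and `YYYY-MM-DD` strings):

```python
from datetime import date, timedelta

# Simplified sketch of generating every date between two endpoints.
def get_dates_between(from_date: date, to_date: date,
                      include_to_date: bool = True) -> list:
    days = (to_date - from_date).days + (1 if include_to_date else 0)
    return [from_date + timedelta(days=i) for i in range(days)]

dates = get_dates_between(date(2022, 6, 26), date(2022, 6, 28))
print(len(dates))  # 3 dates, end date included by default
```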