Goal: Ingest all order-related data into our data lake.
Question: How can we do this more efficiently, considering both time and credit cost? We are also finding that API response time gets slower as we page through more orders.
Current method:
Two queries, because we found this was faster than getting order data within a date range.
If it’s because of credits, then yes, it’s a less expensive approach, but I don’t think it will save you much time, and you’ll be making two requests instead of one.
Also, to manage credit limit usage you might include a pause between requests.
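A pause between requests could be sketched in Python roughly as follows. This is only an illustration: `throttled`, `send`, and `pause_seconds` are hypothetical names, and the right pause length depends on the account's credit quota, which is not stated in this thread.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def throttled(
    requests: Iterable[T],
    send: Callable[[T], R],
    pause_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[R]:
    """Send each request in turn, sleeping between consecutive requests.

    `send` is a hypothetical callable that performs one API call.
    `sleep` is injectable so the pause can be replaced in tests.
    """
    first = True
    for request in requests:
        if not first:
            # Pause only between requests, not before the first one.
            sleep(pause_seconds)
        first = False
        yield send(request)
```

Because the function is a generator, the pause happens lazily as results are consumed, so the caller can stop early without paying for the remaining waits.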
I do think both approaches should work fine
Let me know if it is because you are experiencing a specific issue when making those.
Thanks in advance!
We took that approach (get all order IDs first, then get order details by ID) because API response time increases dramatically when using a cursor to page through orders once the number of orders exceeds 4-5k. We increased the page size in our calls within quota limits, and with this two-step process the total number of calls and the time spent were both reduced. We have anywhere from ~4k to ~74k orders per day. It currently takes us ~1 calendar day to get ~1 week of data, with variance depending on orders per day. We got ~650k orders from 2020-05-01 to 2020-05-31.
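For readers following along, the two-step process described above might look roughly like this in Python. It is a sketch under assumptions: `list_order_ids` and `get_order` are hypothetical callables standing in for the two API queries, and the de-duplication step is an added precaution, not something the thread says is required.

```python
from typing import Callable, Dict, Iterable, List


def ingest_orders(
    list_order_ids: Callable[[], Iterable[str]],
    get_order: Callable[[str], Dict],
) -> List[Dict]:
    """Two-step ingestion: collect order IDs first, then fetch each order's detail.

    Both callables are hypothetical placeholders for the real API calls.
    """
    # Step 1: page through the lightweight ID-only query.
    # dict.fromkeys drops duplicate IDs while preserving their order.
    order_ids = list(dict.fromkeys(list_order_ids()))

    # Step 2: fetch the full order record for each unique ID.
    return [get_order(order_id) for order_id in order_ids]
```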
Total order id download: 4320
Next cursor: YXJyYXljb25uZWN0aW9uOjQzMTk=
Rest API started: 2020-06-17T22:32:37.148-07:00
Rest API ended: 2020-06-17T22:32:47.453-07:00
10 seconds to return the order IDs only
When we reach 5k:
Rest API started: 2020-06-17T22:38:52.733-07:00
Rest API ended: 2020-06-17T22:39:04.561-07:00
12 seconds
"request_id": "5eeafdec7df300f2824d1682"
As the number increases, it slows down further. At 6400:
Rest API started: 2020-06-17T22:43:48.536-07:00
Rest API ended: 2020-06-17T22:44:03.038-07:00
15 seconds
"request_id": "5eeaff14cfa49aee12c902b3"
Thanks for that @jwz !!
I see what you mean. Let me research a bit to see if there is something we could do to optimize that (because there are a lot of orders, it might take that long to load all of them and then paginate), or if there is some alternative to what you are doing.
I will let you know asap what I find.
Thanks again!
Tom
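The per-request durations quoted earlier in the thread can be recomputed from the logged start and end timestamps. A minimal Python sketch, using only the timestamps posted above (the `elapsed_seconds` helper and the `samples` layout are illustrative, and the "5k" sample is keyed as 5000 for convenience):

```python
from datetime import datetime


def elapsed_seconds(started: str, ended: str) -> float:
    """Return the seconds between two ISO 8601 timestamps with UTC offsets."""
    delta = datetime.fromisoformat(ended) - datetime.fromisoformat(started)
    return delta.total_seconds()


# Order IDs downloaded so far -> (request started, request ended), from the posted logs.
samples = {
    4320: ("2020-06-17T22:32:37.148-07:00", "2020-06-17T22:32:47.453-07:00"),
    5000: ("2020-06-17T22:38:52.733-07:00", "2020-06-17T22:39:04.561-07:00"),
    6400: ("2020-06-17T22:43:48.536-07:00", "2020-06-17T22:44:03.038-07:00"),
}

timings = {count: elapsed_seconds(*span) for count, span in samples.items()}

for count, seconds in timings.items():
    print(f"{count} order ids: {seconds:.3f} s")
```

With these three samples the durations come out at roughly 10.3, 11.8, and 14.5 seconds, which matches the rounded figures in the post.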
Hi @jwz
I just wanted to give you a heads up about the updated_at filter:
It applies only to the order object, so it won’t catch every case, such as when a shipment is created but the order status doesn’t change (for example, when you partially fulfill an order).
Let me know if this doesn’t change anything and you will still be working with this filter.
Thanks again!
Tom
Hi @tomasw
Thanks for the heads up!
We are getting shipments and products separately, as shown below, so I think we are OK overall for updated data? Please let me know if not.
Shipments Query
{
  shipments(date_from: "2020-04-20", date_to: "2020-04-21") {
    complexity
    request_id
    data(first: 15, after: "") {
      pageInfo {
        hasNextPage
        hasPreviousPage
        startCursor
        endCursor
      }
      edges {
        node {
          id
          legacy_id
          order_id
          user_id
          warehouse_id
          pending_shipment_id
          address {
            name
            address1
            address2
            city
            state
            country
            zip
            phone
          }
          picked_up
          needs_refund
          refunded
          delivered
          shipped_off_shiphero
          dropshipment
          created_date
          shipping_labels {
            id
            legacy_id
            account_id
          }
          warehouse {
            id
            legacy_id
            company_name
          }
          order {
            id
            legacy_id
          }
          line_items(first: 5, after: "") {
            pageInfo {
              hasNextPage
              hasPreviousPage
              startCursor
              endCursor
            }
            edges {
              node {
                line_item {
                  id
                  legacy_id
                  quantity
                }
              }
            }
          }
        }
      }
    }
  }
}
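Paging through a connection like the one in this query could be sketched in Python as below. It relies only on the `pageInfo { hasNextPage endCursor }` and `edges { node }` fields that the query requests. The `fetch_page` callable is hypothetical: it is assumed to run the query with the given `after` cursor and return the `data` connection as a dict.

```python
from typing import Callable, Dict, Iterator


def iter_nodes(fetch_page: Callable[[str], Dict]) -> Iterator[Dict]:
    """Yield every node from a cursor-paginated connection.

    `fetch_page(cursor)` is a hypothetical callable that returns one page
    shaped like {"pageInfo": {...}, "edges": [{"node": {...}}, ...]}.
    """
    cursor = ""  # An empty cursor requests the first page, as in the query.
    while True:
        connection = fetch_page(cursor)
        for edge in connection["edges"]:
            yield edge["node"]
        page_info = connection["pageInfo"]
        if not page_info["hasNextPage"]:
            break
        # Continue from the last item of the page just read.
        cursor = page_info["endCursor"]
```

The nested `line_items` connection has its own `pageInfo`, so a shipment with more line items than the requested page size would need the same loop applied at that level.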
Hi @jwz
As for the product query, are you expecting to see updates at the inventory level?
The reason I ask is that updated_at doesn’t reflect inventory updates, only other updates such as price, dimensions, etc.
For the orders query + shipments, I will have to ask around just to be sure I’m not missing any scenarios either. I will let you know what I find.
Hi @jwz
At the inventory level you might want to try the inventory_changes query. You can filter by date, and it should return the inventory changes made during that period along with their reasons.
Hi @jwz
The locations part may only work for Dynamic Slotting accounts. Your account is Static Slotting (which means you only have 1 location per product). I tested it, and if you remove that part it should work.
Something like this: