Get orders data more efficiently

Hi,

Goal: Ingest all order related data into our data lake.
Question: How can we do this more efficiently considering both time and credit cost? We are also finding that api response time gets slower as we go through more pages of orders.
Current method:
2 queries because we found it was faster this way rather than getting order data within date range

1. Query for order ID within date range

{
orders(updated_from: “2020-04-20 00:00:00”, updated_to: “2020-04-20 23:59:59”,) {
complexity
request_id
data(first: 90, sort: “order_number”) {
pageInfo {
hasNextPage
hasPreviousPage
startCursor
endCursor
}
edges {
node {
id
}
}
}
}
}

2. Query for order data using order ID

{
order(id: “T3JkZXI6MTA4NDMyODE2”) {
complexity
data {
authorizations {
transaction_id
authorized_amount
postauthed_amount
refunded_amount
card_type
date
}
line_items(first: 90, after:"") {
pageInfo {
hasNextPage
hasPreviousPage
startCursor
endCursor
}
edges {
node {
id
legacy_id
sku
partner_line_item_id
quantity
price
product_name
option_title
fulfillment_status
quantity_pending_fulfillment
quantity_shipped
warehouse
quantity_allocated
backorder_quantity
custom_options
custom_barcode
eligible_for_return
customs_value
warehouse_id
locked_to_warehouse_id
subtotal
barcode
created_at
updated_at
order_id
product {
id
legacy_id
account_id
}
serial_numbers {
id
legacy_id
line_item_id
serial_number
scanned
created_at
updated_at
}
}
}
}
rma_labels {
id
legacy_id
account_id
order_id
rma_id
shipment_id
shipping_name
tracking_number
status
carrier
shipping_method
cost
box_code
dimensions {
weight
height
width
length
}
address {
name
address1
address2
city
state
country
zip
phone
}
paper_pdf_location
thermal_pdf_location
pdf_location
image_location
delivered
picked_up
refunded
needs_refund
profile
full_size_to_print
partner_fulfillment_id
created_at
updated_at
}
returns {
id
legacy_id
account_id
order_id
partner_id
reason
status
label_type
label_cost
cost_to_customer
shipping_carrier
shipping_method
exchanges {
id
legacy_id
exchange_order_id
return_id
account_id
exchange_order {
id
line_items {
edges {
node {
id
}
}
}
}
original_return {
id
}
}
dimensions {
weight
height
width
length
}
address {
name
address1
address2
city
state
country
zip
phone
}
line_items {
id
legacy_id
account_id
line_item_id
warehouse_id
product_id
return_id
quantity
condition
is_component
type
reason
created_at
updated_at
line_item {
id
}
warehouse {
id
}
}
created_at
}
id
legacy_id
order_number
partner_order_id
shop_name
fulfillment_status
order_date
total_tax
subtotal
total_discounts
total_price
auto_print_return_label
custom_invoice_url
account_id
email
profile
gift_note
packing_note
required_ship_date
shipping_lines {
title
carrier
method
price
}
tags
holds {
fraud_hold
address_hold
shipping_method_hold
operator_hold
payment_hold
}
flagged
saturday_delivery
ignore_address_validation_errors
priority_flag
allocation_priority
shipping_address {
first_name
last_name
company
address1
address2
city
state
state_code
zip
country
country_code
email
phone
}
billing_address {
first_name
last_name
company
address1
address2
city
state
state_code
zip
country
country_code
email
phone
}
shipping_lines {
title
carrier
method
price
}
tags
flagged
saturday_delivery
ignore_address_validation_errors
priority_flag
allocation_priority
shipments {
id
legacy_id
order_id
user_id
warehouse_id
pending_shipment_id
address {
name
address1
address2
city
state
country
zip
phone
}
picked_up
needs_refund
refunded
delivered
shipped_off_shiphero
dropshipment
created_date
shipping_labels {
id
legacy_id
account_id
}
warehouse {
id
legacy_id
}
order {
id
legacy_id
}
line_items(first: 100) {
edges {
node {
line_item {
id
legacy_id
quantity
}
}
}
}
}
holds {
fraud_hold
address_hold
shipping_method_hold
operator_hold
payment_hold
}
}
}
}

Thank you!

1 Like

Hi, would like an answer to this as well. We have several integrations stalled due to the credit limits.

Hi @jwz @Devin_Mabra
So you are making something like this:

query {
  orders(order_date_from:"2020-03-04",order_date_to:"2020-03-05") {
    request_id
    complexity
    data(first: 10) {
      edges {
        node {
          id
          legacy_id
        }
      }
    }
  }
}  

And then with that ID doing:

query {
  order(id: "117384888") {
    request_id
    complexity
    data {
      id
      legacy_id
      order_number
      partner_order_id
      shop_name
      fulfillment_status
      order_date
      line_items(first: 5) {
        edges {
          node {
            id
            legacy_id
            sku
            partner_line_item_id
            quantity
            subtotal
            barcode
            created_at
            updated_at
            order_id
          }
        }
      }
    }
  }
}

Is that correct?

In that case why not do it at once with:

query {
  orders(order_date_from: "2020-03-04", order_date_to: "2020-03-05") {
    request_id
    complexity
    data(first: 10) {
      edges {
        node {
          id
          legacy_id
          order_number
          partner_order_id
          shop_name
          fulfillment_status
          order_date
          line_items(first: 5) {
            edges {
              node {
                id
                legacy_id
                sku
                partner_line_item_id
                quantity
                subtotal
                barcode
                created_at
                updated_at
                order_id
              }
            }
          }
        }
      }
    }
  }
}

If it’s because of credits, then yes it’s a less expensive approach, but I don’t think it might save you much time and you’ll be making two requests instead of one.

Also for sorting credit limit usage you might also include some pause in between requests.

I do think both approaches should work fine

Let me know if it is because you are experiencing a specific issue when making those.
Thanks in advance!

Thanks for your response!

We have taken that approach (Get all order ids first then get order detail by ids) due to api response time increasing dramatically when using cursor to go through orders once the order size is bigger than 4-5k. We increased the page size in our calls within quota limits and with this 2 step process, the total amount of calls and time spent was reduced. We have anywhere from ~4k to to ~74k orders per day. It is currently taking us ~1 calendar day to get ~1 week of data, with variance depending on orders per day. We got ~650k of orders from 2020-05-01 to 2020-05-31.

Update with example request IDs

Total order id download : 4320
Next cursor : YXJyYXljb25uZWN0aW9uOjQzMTk=
Rest API started :2020-06-17T22:32:37.148-07:00
Rest API ended :2020-06-17T22:32:47.453-07:00
10 seconds to return get order ids only

Query
{“query”:"{ orders(updated_from: “2020-05-29”, updated_to: “2020-05-30”) { complexity request_id data(first: 80, after: “YXJyYXljb25uZWN0aW9uOjQzOTk=” ) { pageInfo { hasNextPage hasPreviousPage startCursor endCursor } edges { node { id } } } } } " }

" request_id ":“5eeafcaa7726b9dae07b6181”

When reach 5k:
Rest API started :2020-06-17T22:38:52.733-07:00,
Rest API ended :2020-06-17T22:39:04.561-07:00
12 seconds
" request_id ":“5eeafdec7df300f2824d1682”

And with number increases it slow down further, 6400 :
Rest API started :2020-06-17T22:43:48.536-07:00
Rest API ended :2020-06-17T22:44:03.038-07:00
15 seconds
" request_id ":“5eeaff14cfa49aee12c902b3”

Thank you!

Thanks for that @jwz !!
I see what you mean. Let me research a bit to see if there is something we could to to optimize that (because as there is lot of orders it might take that time to load all of them and then paginate)
Or if there is some alternative to what you are doing.
I will let you know asap what I find.
Thanks again!
Tom

Hi @jwz
I just wanted to provide a heads up about this filter updated_at:
It is only for the order object so it won’t show all cases, such as when a shipment is created but the status doesn’t change (For example, when you partially fulfill an order).
Let me know if this doesn’t change anything and you still be working with this filter.
Thanks again!
Tom

Hi @tomasw
Thanks for the heads up!
We are getting shipments and products separately like below so I think we are OK overall for updated data? Please let me know if not.

Shipments Query

{
shipments(date_from:“2020-04-20”, date_to:“2020-04-21”) {
complexity
request_id
data(first: 15, after: “”) {
pageInfo {
hasNextPage
hasPreviousPage
startCursor
endCursor
}
edges {
node {
id
legacy_id
order_id
user_id
warehouse_id
pending_shipment_id
address {
name
address1
address2
city
state
country
zip
phone
}
picked_up
needs_refund
refunded
delivered
shipped_off_shiphero
dropshipment
created_date
shipping_labels {
id
legacy_id
account_id
}
warehouse {
id
legacy_id
company_name
}
order {
id
legacy_id
}
line_items(first: 5, after: “”) {
pageInfo {
hasNextPage
hasPreviousPage
startCursor
endCursor
}
edges {
node {
line_item {
id
legacy_id
quantity
}
}
}
}
}
}
}
}
}

Products Query

{
products(updated_from: “2020-04-20”, updated_to: “2020-04-21”) {
data(first: 10, after: “”, sort: “”) {
pageInfo {
hasNextPage
hasPreviousPage
startCursor
endCursor
}
edges {
node {
id
legacy_id
account_id
name
sku
barcode
country_of_manufacture
dimensions {
weight
height
width
length
}
tariff_code
kit
kit_build
no_air
final_sale
customs_value
customs_description
not_owned
dropship
needs_serial_number
thumbnail
large_thumbnail
created_at
updated_at
tags
images {
src
position
}
vendors {
vendor_id
vendor_sku
price
}
warehouse_products {
id
legacy_id
account_id
warehouse_id
warehouse_identifier
price
value
value_currency
on_hand
inventory_bin
inventory_overstock_bin
reserve_inventory
replenishment_level
reorder_amount
reorder_level
backorder
allocated
available
custom
customs_value
created_at
updated_at
}
}
}
}
}
}

Hi @jwz
As for the product query are you expecting to see updates at inventory level?
The reason I ask this is that updated_at doesn’t work for inventory updates, but other updates such as Price, Dimensions, etc.

For the orders query + shipments I will have to ask around just to be sure I’m not missing any scenarios either. I will let you know what I can find about it

Thanks again!
Tom

Hi @tomasw
Yes, we do want updates on inventory level. What would you recommend?
Thank you!

Hi @jwz
On inventory level you might want to try with the inventory_changes mutation, you can filter by date and it should return the inventory changes made during that period with their reason

Something like:

query {
  inventory_changes(date_from: "2020-06-30") {
    request_id
    complexity
    data(first: 25) {
      edges {
        node {
          user_id
          account_id
          warehouse_id
          sku
          previous_on_hand
          change_in_on_hand
          reason
          cycle_counted
          location_id
          created_at
          location {
            id
          }
        }
      }
    }
  }
}

Hi @tomasw
Thank you!

We got some errors, could you also take a look at these?

    {
        "message": "Unexpected Error",
        "operation": "inventory_changes",
        "field": "id",
        "request_id": "5f04b2e20fa8bfa5ea730093",
        "code": 22
    },

Hi @jwz
The locations part may only work for Dynamic Slotting accounts. Your account is Static Slotting (which means you only have 1 location per product) I tested it and if you remove that part it should work
Something like this:

{
  inventory_changes(date_from: "2020-07-06") {
    request_id
    complexity
    data(first: 25) {
      edges {
        node {
          user_id
          account_id
          warehouse_id
          sku
          previous_on_hand
          change_in_on_hand
          reason
          cycle_counted
          location_id
          created_at
        }
      }
    }
  }
}

Let me know if that looks better
Thanks again!