Unexplained missing records in bulk query

Hi there,

I’m actively troubleshooting missing records from warehouse_products, and I’m troubled by what I’ve found.

I’m using two queries: a bulk query which gets me all the warehouse_product records between two creation dates and an individual SKU query (which gets me the results of an individual SKU).

For this example, I’m looking for warehouse_products created between 2023-01-01 and 2023-02-01. When I get the results back from the bulk query, I would expect to find a SKU created on 2023-01-03 in the first page, but it isn’t there.

Here’s more information on the queries:

    1. request_id from bulk query: 63dd52821209bdd36ce5f223
    2. request_id from individual query: 63dd52c8b7c1372eb6a5a893

I’m quite concerned that we can’t trust the data coming out of the GraphQL API. Can someone please help us determine a root cause?

All the best,
Colin

Hey @cschouten,

Thanks for hanging in there!
Looking into this now, I’ll have an update for you shortly.

Best,
RayanP

Thanks @sh-agent.

I’m afraid this data gap is causing operational issues, so please let us know once a root cause is determined.

All the best,
Colin

Hey Colin!

Thanks for hanging in there and I apologize for the delay here.

When looking at the 100th result on page 1 of your query, I am seeing SKU: 40218850164835. This SKU has the created_at time of: “2023-01-03T01:26:11” according to the query I ran.

When I queried SKU 44464870981878, which you expected to be on page 1 of your results in the former query, the created_at time returned was: “2023-01-03T00:13:26”.

This seems to make sense to me, as 40218850164835 was made hours before 44464870981878.

Let me know if I missed anything here, or if this doesn’t make any sense.

Best,
RayanP

Hi @sh-agent. I’m still confused, but I might’ve provided the wrong request_ids.

You mentioned that SKU 40218850164835 should be the 100th result on Page 1 of the response. When I re-ran the bulk query (getting all warehouse_product records from 2023-01-01 to 2023-02-01), I don’t see that result. I would expect 40218850164835 to be on the first page since, again, it was created on “2023-01-03T01:26:11” and the last record on the bulk query response has a created_at timestamp of “2023-01-04T15:32:09”.

Bulk query request_id: 63e17ff3773f927790c0278e
Individual query request_id: 63e1806c1c0e06d17fd5c90b

Would you mind taking a look? Thanks!

All the best,
Colin

To clarify, what we’re ultimately trying to determine is whether we’re missing warehouse_product records when we make requests using that “bulk query.” The aforementioned results lead me to believe that’s the case.

All the best,
Colin

Hi @sh-agent,

Any updates on this issue?

All the best,
Colin

Hello @cschouten!

I believe what’s happening here comes down to the sorting order of the results.

By default, they are sorted by warehouse_id and then updated_at. You can use this value “YXJyYXljb25uZWN0aW9uOjExOTk=” as your after cursor, which corresponds to position 1199, and you will see that on that page you get the jump from one warehouse (WH) to another.
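If it helps to see where "position 1199" comes from, the cursor value above can be decoded as Base64. This is only a sketch: it assumes the cursor is a Base64-encoded "prefix:offset" string, which holds for this one value but is not something to rely on in general, since cursors are normally meant to be treated as opaque.

```python
import base64

# Cursor value quoted in the reply above.
cursor = "YXJyYXljb25uZWN0aW9uOjExOTk="

# Assumption: the cursor is Base64 text of the form "<prefix>:<offset>".
decoded = base64.b64decode(cursor).decode("utf-8")
prefix, _, offset = decoded.partition(":")
position = int(offset)

print(decoded)   # the decoded cursor text
print(position)  # the numeric offset within the result set
```

Decoding is only useful for inspection here; when paginating, keep passing back the cursor string exactly as the API returned it.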

You can try updating the code to this: data(first: $first, after: $after, sort: "created_at"), and the results will be ordered by creation datetime regardless of their warehouse_id and updated_at date.
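To illustrate why a record can seem to be "missing" from page 1 under the default ordering, here is a small local sketch. The records, the page size, and the first_page helper are all made up for illustration; this does not call the API, it only mimics the two sort orders described above.

```python
# Made-up records for illustration only; not real API data.
records = [
    {"sku": "A", "warehouse_id": 2, "created_at": "2023-01-03T08:00:00", "updated_at": "2023-01-05T10:00:00"},
    {"sku": "B", "warehouse_id": 1, "created_at": "2023-01-04T09:00:00", "updated_at": "2023-01-04T16:00:00"},
    {"sku": "C", "warehouse_id": 1, "created_at": "2023-01-02T09:00:00", "updated_at": "2023-01-06T08:00:00"},
    {"sku": "D", "warehouse_id": 2, "created_at": "2023-01-01T12:00:00", "updated_at": "2023-01-02T12:00:00"},
]

def first_page(rows, key, size):
    """Hypothetical helper: sort rows by key and return the SKUs on page 1."""
    return [r["sku"] for r in sorted(rows, key=key)[:size]]

# Default ordering as described in the reply: warehouse_id, then updated_at.
default_page = first_page(records, lambda r: (r["warehouse_id"], r["updated_at"]), 3)

# Ordering requested with sort: "created_at" (ISO timestamps sort correctly as strings).
created_page = first_page(records, lambda r: r["created_at"], 3)

print(default_page)
print(created_page)
```

In this toy data, SKU "A" was created before "B", yet it is absent from the first page under the default ordering because it belongs to a later warehouse_id. It has not been dropped; it simply lands on a later page.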

Please let me know if this helps.

Have a nice day!
TomasFD

Thanks @tomasfd for that clarity. That did help!

All the best,
Colin
