Continuing download via API without duplicating results

Hello!

I’m still very new to GraphQL queries, so I apologize if this is a dumb question.

I’m using the API to get picking stats (the picks_per_day query), and I’m wondering how to continue a download where I left off if it gets interrupted for any reason, without duplicating records I’ve already downloaded or missing any records I haven’t yet downloaded.

Let me give an example.

Let’s say I want to download a month’s worth of picks from the API. I set up my query to search between 2019-06-01 and 2019-06-30 and run it, looping through groups of 100 until I’ve exhausted my credits. When I check the results, maybe I’ve gotten as far as partway through the 20th. Since the credits refresh after an hour, I’ll have to wait to pick it back up, but when I do, I want to skip the records I’ve already gotten for the first part of the 20th while making sure I get the remainder of the 20th.
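The loop I have in mind looks roughly like this. This is only a sketch: fetch_page, the cursor strings, and the record shape are all made up (a fake fetch function stands in for the real API), and the whole idea only works if a saved cursor is still valid later, which is exactly what I’m asking about.

```python
def download_with_checkpoint(fetch_page, checkpoint=None):
    """Pull pages until the server returns nothing, remembering the last
    cursor seen so a later run can pick up where this one stopped."""
    records = []
    cursor = checkpoint
    while True:
        page = fetch_page(cursor)     # e.g. data(first: 100, after: cursor)
        if not page:
            break
        records.extend(rec for _, rec in page)
        cursor = page[-1][0]          # save the last cursor as the checkpoint
    return records, cursor

# Stand-in for the API: ten fake picks with sequential cursors.
FAKE = [(f"cur{i}", {"pick": i}) for i in range(10)]

def fake_fetch(after, page_size=4):
    start = 0 if after is None else [c for c, _ in FAKE].index(after) + 1
    return FAKE[start:start + page_size]

first_run, ckpt = download_with_checkpoint(fake_fetch)     # full download
resumed, _ = download_with_checkpoint(fake_fetch, "cur5")  # resume partway
```

With the fake data, the resumed run picks up cleanly after "cur5" with no duplicates, which is the behavior I’m hoping the real cursors support.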

So my questions are:

  1. Are the cursors that come down with the records fixed, or are they generated on the fly? In other words, if I save the last cursor I got from my previous results, can I use that to just pick up where I left off, even if it’s more than an hour later? What about if I change my date range to be 2019-06-20 through 2019-06-30 and still try to use the last cursor I downloaded?
  2. Are the cursors that are generated somewhat sequential? In other words, if I’m downloading records for the current day, but the day is still going on, and I come back later to download more records, will the cursors on the new records always be “higher” than the previously generated ones?

Thanks!

Hello Jeremy! Those are excellent questions btw.
Cursors may vary if the data or sorting changes, so I believe relying on them might not be the best approach.
Another option would be to break it down into days, so you can keep better track of credits.

Just out of curiosity: Is there any reason you are not using the app to get this information?
Thanks!

Thanks for the response, Tomas!

Another option would be to break it down into days, so you can keep better track of credits.

I group the days so I can save credits. In my experience so far, if I request 100 records but only 14 are returned, I’m still charged the full 100 credits. If I group my days, I can flow from one day into the next without losing those credits; requesting each day individually would waste a lot of credits.
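To put rough numbers on that trade-off: the "charged the full page even if fewer records come back" behavior is from my testing, but the record counts below are made up for illustration.

```python
import math

PAGE_SIZE = 100        # credits charged per request, even if fewer return
RECORDS_PER_DAY = 14   # suppose each day only has ~14 picks
DAYS = 30

# One request per day: every request is charged the full page.
per_day_cost = DAYS * PAGE_SIZE                                  # 3000 credits

# One grouped query over the month, paginating 100 records at a time.
total_records = RECORDS_PER_DAY * DAYS                           # 420 records
grouped_cost = math.ceil(total_records / PAGE_SIZE) * PAGE_SIZE  # 500 credits

print(per_day_cost, grouped_cost)  # 3000 500
```

With these (made-up) volumes, grouping the days is about six times cheaper in credits.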

Is there any reason you are not using the app to get this information?

I’m not sure what you’re asking. Which information are you referring to? The picks_per_day query?
Or information related to the technical side of how cursors work? I wasn’t even aware that there was an app. My understanding is that the picks_per_day query is only available via the API, but I’m interested to learn more.

Thanks!

Update:

I’ve been messing around with the picks_per_day query, and I see what tomasw means by the cursors changing if the sorting or data changes. Pulling overlapping date ranges yields different cursors for the same records. But this has me wondering…

Are there any downsides to always sending the exact same query, even if the runs are days, weeks, or months apart, but returning the records after the most recent cursor received?

For example, let’s say that today I run the picks_per_hour query, filter it to return everything after 2019-07-28, and order it by created_at (ascending) and order_number (ascending). If I run it today, I’ll get back a certain set of results, and we’ll say the last cursor I get back is “ae1yba8v6a4er9ace”. What would happen if I run that exact same query again tomorrow, still with the date of 2019-07-28 and the same sort, but use the “after” filter to return everything after the cursor “ae1yba8v6a4er9ace”?

For something like the picks_per_day query where the data is fairly sequential and the data doesn’t change except to gain more records, would this work?

Thanks!

@jeremyw cursors are tightly coupled to the query you are issuing, so using them with a different query can produce unexpected results.

You are right, picks_per_day is only available through the API; the previous suggestion to use the app was a misunderstanding, sorry about that.

To make querying more efficient you could filter by day, and there’s no need to always use the default maximum of 100. I would have the query iterate by 10 or 20 items per call, making one extra call to check for more (we are looking to fix the hasNextPage issue).

Something like:

query {
  picks_per_day(date_from: "2019-07-29", date_to: "2019-07-30") {
    complexity
    data(first: 10) {
      edges {
        node {
          xxx
        }
      }
    }
  }
}

You can run this on the 30th at 00:01, and then iterate, adding after: xxx with the cursor from the last result, until you have no results.

query {
  picks_per_day(date_from: "2019-07-29", date_to: "2019-07-30") {
    complexity
    data(first: 10, after: "last item cursor") {
      edges {
        node {
          xxx
        }
      }
    }
  }
}

Eventually you could alternate with a first: 1 to check whether there are more results, to avoid wasting 20 credits when there are none.
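The whole loop can be sketched in Python with a stub standing in for the GraphQL call. fake_post, the edge shape, and the probe-before-each-page strategy are illustrative only, not our client library.

```python
def fetch_range(post, page_size=10):
    """Pull edges page by page, adding `after:` with the cursor from the
    last result; a cheap first:1 probe decides when to stop, so an empty
    tail costs 1 credit instead of page_size."""
    nodes, after = [], None
    while True:
        if not post(first=1, after=after):   # "are there any more?"
            break
        edges = post(first=page_size, after=after)
        nodes.extend(e["node"] for e in edges)
        after = edges[-1]["cursor"]
    return nodes

# Fake connection: 23 picks with sequential cursors.
EDGES = [{"cursor": f"c{i}", "node": {"id": i}} for i in range(23)]

def fake_post(first, after):
    start = 0 if after is None else [e["cursor"] for e in EDGES].index(after) + 1
    return EDGES[start:start + first]

picks = fetch_range(fake_post)
```

With 23 fake records and pages of 10, this makes three full-page calls plus four 1-credit probes, and never fetches a record twice.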
