
Intermittent "NoSuchKey: The specified key does not exist" errors when using S3 backend #2714

Closed
rtkelly13 opened this issue May 9, 2019 · 28 comments · Fixed by #3089
Labels
p1 Bugs severe enough to be the next item assigned to an engineer
Milestones

Comments

@rtkelly13
Copy link

rtkelly13 commented May 9, 2019

I have been following the cloud backend change

#2455

When trying to run
pulumi login s3://pulumi-state-files and then pulumi up

I get the following error
error: failed to load checkpoint: blob (code=NotFound): NoSuchKey: The specified key does not exist.
I can navigate and interrogate that specific S3 bucket via the AWS CLI and have set up the related AWS_PROFILE environment variable.

Not quite sure what I am doing incorrectly; I have followed all the documentation and issue steps I have been able to find.

@lukehoban lukehoban added this to the 0.23 milestone May 10, 2019
@CyrusNajmabadi
Copy link
Contributor

CyrusNajmabadi commented May 28, 2019

Unfortunately, I don't repro this at all. Here's what I did:

  1. From AWS I made a bucket called cytestbucket2 (with all public access blocked).
  2. Then did a: pulumi login s3://cytestbucket2, which gave me:
cyrusn@DESKTOP-3IRRNND ~/go/src/github.com/pulumi/pulumi-aws/examples/topic[master ≡]$ pulumi login s3://cytestbucket2
Logged into DESKTOP-3IRRNND as cyrusn (s3://cytestbucket2)
  3. Then did a: pulumi up which gave me:
cyrusn@DESKTOP-3IRRNND ~/go/src/github.com/pulumi/pulumi-aws/examples/topic[master ≡]$ pulumi up --skip-preview
Updating (cytesttopic):

     Type                                Name                    Status
 +   pulumi:pulumi:Stack                 topic-cytesttopic       created
 +   ├─ aws:sns:TopicEventSubscription   for-each-url            created
 +   │  ├─ aws:iam:Role                  for-each-url            created
 +   │  ├─ aws:iam:RolePolicyAttachment  for-each-url-32be53a2   created
 +   │  ├─ aws:lambda:Function           for-each-url            created
 +   │  ├─ aws:sns:TopicSubscription     for-each-url            created
 +   │  └─ aws:lambda:Permission         for-each-url            created
 +   └─ aws:sns:Topic                    sites-to-process-topic  created

Resources:
    + 8 created

Duration: 24s

Permalink: <redacted>

That all seemed to work as desired.

Since this is not working for you, we'll need more information to track things down. Can you give us the set of steps you followed that caused the above errors?

Thanks!

@CyrusNajmabadi
Copy link
Contributor

Explicitly pinging @rtkelly13 to help ensure the above is noticed. Thanks!

@lukehoban
Copy link
Member

I can navigate and interrogate that specific S3 bucket via the AWS CLI and have set up the appropriate AWS_PROFILE environment.

It sounds like this part may be key to reproing this: specifically the use of an AWS_PROFILE environment variable to select the profile to use.

@CyrusNajmabadi
Copy link
Contributor

Will attempt to validate that scenario as well.

@CyrusNajmabadi
Copy link
Contributor

Unfortunately, I was still not able to repro a problem here. Note: when my profile information was unspecified/incorrect, I got messages of the form:

error: failed to load checkpoint: blob (code=Unknown): NoCredentialProviders: no valid providers in chain. Deprecated.        For verbose messaging see aws.Config.CredentialsChainVerboseErrors

However, if I set up my credentials to have a non-default profile (called user1), and I then did export AWS_PROFILE=user1, things worked totally fine.

I'm also going to try the basic approach of fully specifying things through env-vars to see if that works as well.

@CyrusNajmabadi
Copy link
Contributor

Note: we definitely want to get to the bottom of things to help you out. So any extra info/steps that can be provided would be great.

@CyrusNajmabadi
Copy link
Contributor

Additional info. When the bucket itself doesn't exist, I get this distinct error:

cyrusn@DESKTOP-3IRRNND ~/go/src/github.com/pulumi/pulumi-aws/examples/topic[master ≡]$ yarn run build && pulumi up
yarn run v1.15.2
$ tsc
Done in 7.05s.
error: could not query backend for stacks: error listing stacks: could not list bucket: blob (code=Unknown): NoSuchBucket: The specified bucket does not exist        status code: 404, request id: 3C2CABEED8636B03, host id:

So still scratching my head as to what's causing the NoSuchKey: The specified key does not exist. error. Will continue to look into things.

@CyrusNajmabadi
Copy link
Contributor

Hey @rtkelly13 I think I may have uncovered what you're hitting. Specifically, I was able to get that error message when I did the following:

  1. Be logged into the normal pulumi service backend (i.e. a normal pulumi login without an s3 blob backend).
  2. Made and updated my stack.
  3. Logged out of the normal backend, then logged into the s3 backend.
  4. Updated my stack.

In this case, pulumi thinks the stack (and its requisite info) exists somewhere, and it tries to load it, but it can't find it because it's now looking within the s3 bucket for the information.

First, can you confirm if this sounds like what you were encountering? If so, we can start brainstorming on what we want to happen here. At the very least, the error message should become clearer as to what the problem is. Whether we want to do other work beyond that is TBD.

@rtkelly13
Copy link
Author

Thank you for the work on this!

I have been using the local file backend, but that could be a similar cause to your reproduction; I wouldn't have expected a pre-existing stack on a different login to side effect in this way. It's good to know that the bucket and authentication seem to be working correctly for me.

I am using Windows also if that may help in my specific reproduction. I guess I have to import my stack into S3 before trying to link up with my existing stack? I think it was just that error that threw me off course; as you said, better error messages should avoid further issues being created.

I'm also not a huge fan of setting the AWS_PROFILE to say what credentials to use; it appears to be the only way AWS provides, unfortunately.

@CyrusNajmabadi
Copy link
Contributor

I am using Windows also if that may help in my specific reproduction, I guess I should be importing my stack into S3 before trying to sync up with my existing stack?

That sounds reasonable. It's possible (though not verified) that logging into the right backend, exporting your existing stack, logging out, logging into the new backend, and importing the stack might work. If it does, let us know.
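A rough sketch of that export/import migration, with hypothetical stack and bucket names (untested, per the caveat above):

```shell
# Hedged sketch of migrating a stack between backends.
# "mystack" and "my-state-bucket" are placeholder names.
pulumi login                                    # original service backend
pulumi stack select mystack
pulumi stack export --file mystack.checkpoint.json
pulumi logout

pulumi login s3://my-state-bucket                # new S3 backend
pulumi stack init mystack
pulumi stack import --file mystack.checkpoint.json
```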

I think it was just that error that threw me off course, as you said better error messages should avoid further issues being created.

Agreed. Will see what can be done here.

I'm also not a huge fan of setting the AWS_PROFILE to say what credentials to use, it seems to be the only way AWS provides unfortunately.

How many profiles do you use? I believe you can set a [default] in your credentials file which AWS will respect.

@CyrusNajmabadi
Copy link
Contributor

I guess I should be importing my stack into S3 before trying to sync up with my existing stack

Note: I think primarily we've been considering cloud-backends as a way for people to store their new stacks. Migration between backends hasn't really been something we've thought about or considered as something that needs to be supported. It may work, or it may not. I think we have the primitives to help support the workflow you want. But once you start trying to juggle all this, it may be required on your end to fudge things a little bit :)

@rtkelly13
Copy link
Author

rtkelly13 commented May 30, 2019

Tried just importing the stack into the S3 backend but got the same error as before.
I had already exported the state file into source control to avoid losing it if something happened to my environment.

error: failed to load checkpoint: blob (code=NotFound): NoSuchKey: The specified key does not exist.

Using the following command:
pulumi stack import --file .\{STATEFILENAME}.json

Going to give creating a brand new stack a go in S3 from the start; it will give me an idea of the file structure that is uploaded and of how to fudge it. Hopefully I can do it pretty easily, and I will put reproduction steps in here for anyone else who may need to do this.

Going to keep playing with this as it isn't a blocker for me personally. I know of a few issues around this area that I haven't fully reproduced enough to create issues for them.

How many profiles do you use? I believe you can set a [default] in your credentials file which AWS will respect.

default would be respected, but I tend to avoid having one set up, especially if I am switching between different AWS accounts; a profile per account, via local development or CI agents, is a good way to swap out users at each level of the deployment procedure.

I think it was just that error that threw me off course, as you said better error messages should avoid further issues being created.

Agreed. Will see what can be done here.

Knowing which key it is complaining about would massively help me when manually trying to migrate my stack over to S3, and I'm sure other people in my situation, since this is one of the more obscure errors.

@CyrusNajmabadi
Copy link
Contributor

Going to give creating a brand new stack a go in S3 from the start; it will give me an idea of the file structure that is uploaded and of how to fudge it.

I think (but am not totally certain) that part of this is due to another piece of data we store locally to keep track of which stack you're currently using. That information is normally in ~/.pulumi/workspaces/name_hash.... You might need to delete that so that we don't have this stale data sticking around, which is causing us to think you have a currently selected stack.

Note: it's entirely possible we need to fix up several things here. i.e. that logging out might need to do this, or that the stored stack information should know which backend you're using and not apply if it doesn't tally with your logged-in backend. I'll have to discuss with colleagues to decide what's likely the best approach here.
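A minimal sketch of clearing that cached workspace state, assuming the files live under ~/.pulumi/workspaces as described above (the exact file names include a project-specific hash and may differ by version, so back them up first):

```shell
# Back up, then remove, Pulumi's locally cached workspace files.
# The file names embed a project hash, so remove by pattern.
cp -r ~/.pulumi/workspaces ~/.pulumi/workspaces.bak
rm -f ~/.pulumi/workspaces/*.json
```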

@CyrusNajmabadi CyrusNajmabadi modified the milestones: 0.23, 0.24 May 31, 2019
@CyrusNajmabadi
Copy link
Contributor

Moving to 0.24 as this is more about a better error message in this scenario.

@rtkelly13
Copy link
Author

The way I got this working was to copy the appropriate stack files from ~/.pulumi/stacks
which for me were STACKNAME.json and STACKNAME.json.bak, into my S3 bucket in the same place. I ignored STACKNAME.json.attrs and STACKNAME.json.bak.attrs and it works without them!

After that, I ran pulumi stack select STACKNAME which worked, and I updated the stack and it worked as expected. If I had just used S3 from the beginning it would have been a very good experience!

I think there are still a few rough edges remaining from the change to support this cloud backend feature:
pulumi stack ls doesn't work to print out the available stacks.

When I query the stack it still says it is managed by my local desktop, which is obviously because I copied the files; if an update changed that information to the S3 path, that may be better?
Could help detect the change in login etc?

It seems like most of the things required to interact with a specific stack just work! Like you said it's more about user experience and error messages guiding users in the right direction the first time.
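The manual copy described above might look roughly like this, assuming placeholder STACKNAME and bucket names, and assuming the S3 backend keeps state under a .pulumi/stacks/ prefix mirroring the local layout:

```shell
# Hypothetical sketch of hand-copying local state into the S3 backend.
aws s3 cp ~/.pulumi/stacks/STACKNAME.json \
    s3://my-state-bucket/.pulumi/stacks/STACKNAME.json
aws s3 cp ~/.pulumi/stacks/STACKNAME.json.bak \
    s3://my-state-bucket/.pulumi/stacks/STACKNAME.json.bak

# Then point pulumi at the copied stack:
pulumi login s3://my-state-bucket
pulumi stack select STACKNAME
```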

@CyrusNajmabadi
Copy link
Contributor

Glad to hear you were able to work through things!

It seems like most of the things required to interact with a specific stack just work! As you said it's more about user experience and error messages guiding users in the right direction the first time.

Indeed. Our primary purpose was to just enable this high-value scenario for users. Secondary is smoothing out these rough edges, esp. for cases like migrating stacks. Def something we'd like to do, but it has to be prioritized against all the other work for the foreseeable future :)

@rtkelly13
Copy link
Author

@CyrusNajmabadi thank you for your help, really like pulumi and the approach taken!
Should this issue be closed and a more appropriate one be opened in relation to the outstanding issues?

@CyrusNajmabadi
Copy link
Contributor

I'm ok keeping this issue open. It's a good documentation of the problem. If we open a new issue, we'd just have to link back to this for whoever works on it to understand what's going on and get the necessary context.

Thanks!

@keen99
Copy link
Contributor

keen99 commented Jun 6, 2019

Created and updated my stack.
Logged out of the normal backend, then logged into the s3 backend.
Updated my stack.

Now that the AWS_PROFILE bug is fixed in 0.17.15, and the project I'm working on is ready to try to manage multiple accounts, I've started running into workflow issues around this as well.

It seems like there would be a strong case going forward to be able to track the login destination in the stack.yaml. Then with any stack-level interaction a login could occur automatically to the appropriate destination.

A good example of this in an s3-state-only world: I have 3 stacks: dev, stage, prod. Each of these lives in its own account. That means they require their own s3 bucket (with a globally unique name), and their own login step each time you decide to operate on the stack.

So instead of a promotion workflow like this:

pulumi up -s dev
pulumi up -s stage
pulumi up -s prod

I end up with this:

export AWS_PROFILE=dev
pulumi login s3://dev-bucket/
pulumi up -s dev
pulumi logout

export AWS_PROFILE=stage
pulumi login s3://stage-bucket/
pulumi up -s stage
pulumi logout

export AWS_PROFILE=prod
pulumi login s3://prod-bucket/
pulumi up -s prod
pulumi logout

Now multiply that by a single client with 3 or 4 products, and 2 or 3 clients in parallel, and you've got a strong requirement to start building a wrapper around pulumi just to handle pulumi.... I already have a strong drive to do that because of how the current stack is stored, which prevents operating on multiple stacks concurrently (or even in different shells).
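A hedged sketch of what such a wrapper might look like, assuming each stack name maps one-to-one to an AWS profile and a matching state bucket (all names here are placeholders, not anything from the thread):

```shell
# pulumi_wrap: run one pulumi command against one stack, doing the
# login/logout dance automatically. Assumes profile "dev" pairs with
# bucket "dev-bucket", and so on.
pulumi_wrap() {
  stack="$1"; shift
  AWS_PROFILE="$stack" pulumi login "s3://${stack}-bucket/"
  AWS_PROFILE="$stack" pulumi "$@" --stack "$stack"
  pulumi logout
}

# The promotion workflow above then collapses to:
#   pulumi_wrap dev up
#   pulumi_wrap stage up
#   pulumi_wrap prod up
```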

@keen99
Copy link
Contributor

keen99 commented Jun 6, 2019

I should note that a single bucket can handle all of the stages cross-account.

export AWS_PROFILE=... 
export AWS_REGION=.. 
pulumi login s3://bucket

Other pulumi up etc. operations will follow this - but if you set stack config aws:profile: newprofile the stack itself will follow the stack's profile, not the env var (at least in my light testing). State will still be stored based on the env var and login.

@lukehoban
Copy link
Member

cc @ellismg as the issues here really highlight that it might make sense to tie backends more carefully to specific stacks - or at least remember (in ~/.pulumi/workspaces?) the backend a stack was created in, so that at the very least a helpful error message can be reported?

@keen99
Copy link
Contributor

keen99 commented Jun 7, 2019

or at least remember (in ~/.pulumi/workspaces?)

Then it's a piece of config that doesn't follow the code, since it can't be committed. (Though I'm totally fine if that's optional and you can choose to commit it.)

For sure this is a change from the pulumi SaaS backend, but it seems critical now that there's more than a single backend to treat this as config for the code.

@keen99
Copy link
Contributor

keen99 commented Jun 7, 2019

you've got a strong requirement to start building a wrapper around pulumi just to handle pulumi

...and I wrote it. There's no native MFA handling for aws auth that requires MFA....

@ellismg
Copy link
Contributor

ellismg commented Jun 7, 2019

I understand the feeling here, and I agree that we can probably figure out something better. The thing that I am struggling with is how pulumi stack ls functions in this new world. There are two modes today:

pulumi stack ls

This lists all the stacks in the same project from the current backend. It must be run in a directory with a Pulumi.yaml either in it or in a parent folder. You see stacks for this project even if you don't have the config files for them on disk.

pulumi stack ls --all

This lists all stacks in the current backend. So you see every stack you have access to, even if you don't have the source code for their corresponding projects around.

If we move to a world where we use information in the specific Pulumi.<stack-name>.yaml file to help determine the backend, does that mean we lose the ability for pulumi stack ls --all to work? Does pulumi stack ls now simply show the set of stacks that have config files on disk?

@keen99
Copy link
Contributor

keen99 commented Jun 7, 2019

It seems like that would be functionality you'd lose, or might be constrained, when employing separate backends.

That said - it might be a nice touch to be able to have the state backend auth separately from a stack, so you could designate an AWS account+bucket that's independent of where the stack ends up. That would leave those features intact, and retain similar behavior to the pulumi backend.

I do think that having a per-stack backend should be an optional workflow.

The way I've structured my current wrapper, it enforces that a -s/--stack is passed (with a few exceptions), then it consumes aws:profile and pulumi:backend from the stack.yaml, then externally auths to aws (assume role, prompting for MFA as required - in the future, this would do SAML+MFA), then logs into pulumi:backend. Then it'll pass everything through to the real pulumi. There are definitely aspects of the overall system that probably won't map over sanely.

For the workflow I see running for the client this is for, this makes sense so far. (Imperfect, but that's life.)

@lukehoban lukehoban removed this from the 0.24 milestone Jul 8, 2019
@lukehoban lukehoban added this to the 0.25 milestone Jul 8, 2019
@bdchauvette
Copy link

I've started running into this symptom as well. Unfortunately, it's intermittent and I haven't been able to figure out what causes it. It does seem to have started happening once my state document got larger (>1mb?).

Sometimes pulumi up will work perfectly fine, and other times it will blow up with either pre-step or post-step errors like:

error: post-step event returned an error: failed to save snapshot: failed to load checkpoint: blob (code=NotFound): NoSuchKey: The specified key does not exist.    status code: 404, request id: [...], host id: [...]
error: pre-step event returned an error: failed to save snapshot: failed to load checkpoint: blob (code=NotFound): NoSuchKey: The specified key does not exist.    status code: 404, request id: [...], host id: [...]

The stack is trying to spin up an EKS cluster with a couple of node groups, and then create some Kubernetes resources in the cluster. The AWS infrastructure steps seem to work OK, and it seems like the errors usually happen when creating the Kubernetes resources?


$ pulumi version
v0.17.25
"dependencies": {
  "@pulumi/aws": "0.18.23",
  "@pulumi/awsx": "0.18.7",
  "@pulumi/eks": "0.18.9",
  "@pulumi/kubernetes": "0.25.2",
  "@pulumi/pulumi": "0.17.25"
},

@lukehoban
Copy link
Member

@bdchauvette which backend are you using? S3?

@lukehoban lukehoban modified the milestones: 0.25, 0.26 Aug 3, 2019
@lukehoban lukehoban added p1 Bugs severe enough to be the next item assigned to an engineer and feature/q3 labels Aug 3, 2019
@ellismg ellismg changed the title from S3 remote state not working to Intermittent "NoSuchKey: The specified key does not exist" errors when using S3 backend Aug 5, 2019
@lukehoban lukehoban removed the feature/q3 label Aug 8, 2019
ellismg adds a commit such referenced this issue Aug 14, 2019
For historical reasons, we used to need to load an existing
checkpoint to copy some data from it into the snapshot when saving a
new snapshot. The need for this was removed as part of the general
work in #2678, but we continued to load the checkpoint and then just
disregard the data that was returned (unless there was an error and
that error was not FileNotFound, in which case we would fail).

Our logic for checking if something was FileNotFound was correct when
we wrote it, but when we adopted go-cloud in order to have our
filestate backend also write to blob storage backends like S3, we
forgot that we had checks like `os.IsNotExists()` floating around
which were now incorrect. That meant if the file did not exist for
some reason, instead of going along as planned, we'd error out now
with an error saying something wasn't found.

When we write a checkpoint, we first "back up" the initial version
by renaming it to include a `.bak` suffix, then we write the new file
in place. However, this can run afoul of eventual consistency models
like S3's, since there will be a period of time in which a caller may
observe that the object is missing, even after a new version is
written (based on my understanding of [S3's consistency
model](https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel)).

Since we no longer need to actually copy any information from the
previous checkpoint, we can simply remove the call to load it
entirely.

As a follow up, we need to audit places inside the filebased backend
that assume `os.*` functions are going to do what we want them to do,
since in general they will not.

Fixes #2714
ellismg added a commit that referenced this issue Aug 16, 2019
ellismg added a commit that referenced this issue Aug 16, 2019
@pulumi pulumi deleted a comment from Mohsenam99 Mar 6, 2024

7 participants