-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Refactor AWS Chunked logic from HTTP clients #3635
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
083ea7a to
a6b040e
Compare
src/aws-cpp-sdk-core/include/aws/core/http/interceptor/ChunkingInterceptor.h
Outdated
Show resolved
Hide resolved
src/aws-cpp-sdk-core/include/aws/core/http/interceptor/ChunkingInterceptor.h
Outdated
Show resolved
Hide resolved
src/aws-cpp-sdk-core/source/http/interceptor/ChunkingInterceptor.cpp
Outdated
Show resolved
Hide resolved
src/aws-cpp-sdk-core/source/http/interceptor/ChunkingInterceptor.cpp
Outdated
Show resolved
Hide resolved
src/aws-cpp-sdk-core/include/aws/core/client/ClientConfiguration.h
Outdated
Show resolved
Hide resolved
src/aws-cpp-sdk-core/source/http/interceptor/ChunkingInterceptor.cpp
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we calling setg in the constructor?, specifically to nullptr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its to initialize the streambuf as an empty state.
Its also done the same way in the code snipet you showed me
class chunked_stream : public std::streambuf {
public:
chunked_stream(std::unique_ptr<std::istream> is) : source(std::move(is)) {
setg(nullptr, nullptr, nullptr);
}
protected:
int_type underflow() override {
if (gptr() == egptr()) {
if (!trailer.empty()) {
buffer = trailer[trailer_pos++];
if (trailer_pos == trailer.size()) {
trailer.clear();
trailer_pos = 0;
}
setg(&buffer, &buffer, &buffer + 1);
return traits_type::to_int_type(*gptr());
}
int ch = source->get();
if (ch == traits_type::eof()) {
return traits_type::eof();
}
buffer = static_cast<char>(ch);
char_count++;
if (char_count % 10 == 0) {
trailer = "<trailer>";
trailer_pos = 0;
}
setg(&buffer, &buffer, &buffer + 1);
}
return traits_type::to_int_type(*gptr());
}
private:
std::unique_ptr<std::istream> source;
char buffer;
std::string trailer;
size_t trailer_pos = 0;
int char_count = 0;
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the base constructor already sets them to null, so why are we doing it again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this buffer 8KB? why is this buffer a c array and not a c++ array? why not make this configurable?
AwsChunkedStream has a buffer of 64KB for double encoding, why are creating a second buffer to buffer that buffer into?
If the answer is "because i need to use the AwsChunkedStream logic to do what i need it to do" then we need to move the logic from chunked stream into this class to avoid a extra buffer, or add additionally APIs to the existing class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes you're right, this was originally setup just because we needed a streambuff interface. I think extending AwsChunkedStream API to add a streamBuff interface makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you need a default constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aws-sdk-cpp/src/aws-cpp-sdk-core/include/smithy/client/features/ChecksumInterceptor.h
Line 48 in 8a4f5e9
| ChecksumInterceptor() = default; |
The checksum interceptor also have a default constructor. But I think we actually dont need this default constructor here since we have the explicit clientconfig constructor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you have this as the last step and not the first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay moving this up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you call emplace back instead of using the initializer list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, this is redundant logic, changing it to an initializer list
src/aws-cpp-sdk-core/include/smithy/client/features/ChunkingInterceptor.h
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider the following code
auto main() -> int {
SDKOptions options{};
options.httpOptions.httpClientFactory_create_fn =
[]() -> std::shared_ptr<Aws::Http::HttpClientFactory> {
Aws::MakeShared<MyHttpClientFactory>();
};
SdkContext context{std::move(options)};
//.. sdk code
ClientConfiguration configuration{};
S3Client client{configuration};
return 0;
}would this operate correctly? or would the custom http client use the correct default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the base constructor already sets them to null, so why are we doing it again?
| Aws::Http::HttpRequest* m_request{nullptr}; | ||
| std::shared_ptr<Aws::IOStream> m_stream; | ||
| Aws::Utils::Array<char> m_data; | ||
| Aws::Utils::Array<char> m_buffer{DataBufferSize}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This effectively doubles the size of this class by adding another buffer, it seems the only reason m_buffer exists is so that we can access the underlying data in m_chunkingStream. how can we access the data in m_chunkingStream without creating another buffer? seems to me like m_chunkingStream should be a different type, perhaps a Aws::Vector or a Aws::Utils::Array<char> so that you can access the underlying data. You will however change how you add and read from it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I attempted to use accessible_stream_buf but it was more complicated then initially thought. (Failing intergration test)
So I decided to just change the m_chunkingStream to a vector to access the underlying data.
| { | ||
| // Create modified config for chunking interceptor | ||
| Aws::Client::ClientConfiguration chunkingConfig(*m_clientConfig); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you construct a object instead of referring to m_clientConfig directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking we should not mutate the user's clientConfig directly
src/aws-cpp-sdk-core/include/smithy/client/AwsSmithyClientBase.h
Outdated
Show resolved
Hide resolved
src/aws-cpp-sdk-core/source/smithy/client/AwsSmithyClientBase.cpp
Outdated
Show resolved
Hide resolved
Move aws-chunked encoding logic from individual HTTP clients to a centralized ChunkingInterceptor for better separation of concerns. - Add ChunkingInterceptor to handle aws-chunked encoding at request level - Remove custom chunking logic from CRT, Curl, and Windows HTTP clients - Simplify HTTP clients to focus on transport-only responsibilities - Maintain full backwards compatibility with existing APIs unit test for chunking stream added logic to detect custom http client and smart default
src/aws-cpp-sdk-core/include/smithy/client/features/ChunkingInterceptor.h
Show resolved
Hide resolved
| Aws::MakeShared<ChecksumInterceptor>("AwsSmithyClientBase", *m_clientConfig), | ||
| Aws::MakeShared<features::ChunkingInterceptor>("AwsSmithyClientBase", [this]() { | ||
| Aws::Client::ClientConfiguration chunkingConfig = *m_clientConfig; | ||
| chunkingConfig.httpClientChunkedMode = m_httpClient->IsDefaultAwsHttpClient() ? m_clientConfig->httpClientChunkedMode : Aws::Client::HttpClientChunkedMode::CLIENT_IMPLEMENTATION; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the behavior we want? If it's our implementation (IsDefaultAwsHttpClient() == true) then don't we want to force Aws::Client::HttpClientChunkedMode::DEFAULT so that the chunking is applied? And if it's not our implementation then we follow whatever the customer has set, ie m_clientConfig->httpClientChunkedMode. Seems to me this condition should be reversed:
chunkingConfig.httpClientChunkedMode = m_httpClient->IsDefaultAwsHttpClient() ? Aws::Client::HttpClientChunkedMode::DEFAULT : m_clientConfig->httpClientChunkedMode;
I think we should default initialize clientConfig.httpClientChunkedMode = HttpClientChunkedMode::CLIENT_IMPLEMENTATION; so the client can opt-in for their HttpClient implementation if they want to, but for our implementation we should ensure it's going to use chunking
| clientConfig.httpLibOverride = Aws::Http::TransferLibType::DEFAULT_CLIENT; | ||
|
|
||
| // Users can explicitly set CLIENT_IMPLEMENTATION if their custom client handles chunking | ||
| clientConfig.httpClientChunkedMode = HttpClientChunkedMode::DEFAULT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be
clientConfig.httpClientChunkedMode = HttpClientChunkedMode::CLIENT_IMPLEMENTATION;
We don't want to change existing client's HttpClient behavior unless they opt-in to our chunking logic.
sbiscigl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alright its in a good place now, please fix the constructor and do some memory tests and im good to ship
| m_interceptors({ | ||
| Aws::MakeShared<ChecksumInterceptor>("AwsSmithyClientBase", *m_clientConfig), | ||
| Aws::MakeShared<features::ChunkingInterceptor>("AwsSmithyClientBase", [this]() { | ||
| Aws::Client::ClientConfiguration chunkingConfig = *m_clientConfig; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok all of this exists because of my comment that we should be passing the whole client configuration, i'll admit i was wrong, lets use the enum value, that makes this all together better then you dont need any of this and a simple constructor call to
m_httpClient->IsDefaultAwsHttpClient() ? Aws::Client::HttpClientChunkedMode::DEFAULT : m_clientConfig->httpClientChunkedMode
will suffice, lets do that again, thats simpler and makes the better, sorry for the shuffling around there, lets do that though, that will make this better
Issue #, if available:
#3297
Description of changes:
Our goal is specifically to refactor this logic out of http clients and into a interceptor. We will introduce a core ChunkingInterceptor that owns all aws-chunked behavior for streaming requests and remove the chunking logics from the individual HTTP clients.
Check all that applies:
Check which platforms you have built SDK on to verify the correctness of this PR.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.