Sizing the thread pool at or near the core count is an incomplete answer to your thread count problem.
You have the connector, with its selectors and acceptors.
Then there is the incoming (server side) request processing. Even assuming 1 browser, you'll want to know its limit on concurrent requests per server.
Then you have the outgoing requests (the client in the proxy), which also use some of those threads.
Then you'll also want to understand the usage patterns on that 1 browser.
For example, when I wake up in the morning, I switch to my active Chrome window, open a new window, and then open all of my "morning" bookmarks as separate tabs. That instantly creates 54 initial connections (and tabs), which blossom to over 400 unique server connections once the responses start flowing back and other page resources are requested. After about 1 second I can start using some of the quicker websites while I wait for the rest to load.
Am I unusual? Perhaps.
Am I unique? Definitely not. I have encountered many individuals who do similar things (often with fewer tabs, rarely more).
Could I use RSS instead? I wish, but most of these sites don't offer it anymore.
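
To put rough numbers on the pieces listed above, here is a minimal back-of-the-envelope sketch. The class name, method, and parameter values are hypothetical and purely illustrative. It assumes blocking request handling, where every in-flight request holds a pool thread for its duration, and that the outgoing proxy requests draw from the same pool; an asynchronous setup would need fewer threads.

```java
public final class ThreadBudget {
    private ThreadBudget() {}

    /**
     * Rough estimate of the threads a proxy needs at peak.
     * Every parameter is an assumption you have to supply from
     * your own configuration and measurements.
     */
    public static int estimate(int acceptors,
                               int selectors,
                               int concurrentServerRequests,
                               int concurrentClientRequests) {
        // Acceptors and selectors are counted as occupying threads
        // for as long as the connector is running (assumption).
        int connectorThreads = acceptors + selectors;

        // Incoming requests being processed plus outgoing proxy
        // requests, each assumed to hold one thread while in flight.
        int requestThreads = concurrentServerRequests + concurrentClientRequests;

        return connectorThreads + requestThreads;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();

        // 1 acceptor and 2 selectors are made-up values; 54 is the
        // initial burst of connections from the bookmark example,
        // each assumed to trigger one outgoing request.
        int needed = estimate(1, 2, 54, 54);

        System.out.println("cores = " + cores);
        System.out.println("estimated threads at peak = " + needed);
    }
}
```

Even with these modest made-up inputs, the estimate lands well above the core count of a typical machine, and that is before the burst grows toward the 400 connections mentioned above.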