The QIML (Quantitative Investment and Machine Learning) WeChat public account is a leading self-media outlet in the fields of quantitative investment, hedge funds, Fintech, artificial intelligence, and big data. It has 300K+ followers from mutual funds, private funds, brokerages, futures firms, banks, insurers, and universities, and has been named "Author of the Year" by the Tencent Cloud+ Community for two consecutive years.
In 2022, the QIML public account launched yet another brand-new series:
QIML has collected real interview questions from top global hedge funds and major tech companies. We hope this brings readers a fresh job-hunting and learning experience!
Previous issue: Part 1
This issue's questions come from: Citadel, Two Sigma, Morgan Stanley
Part 2
▌Source: Citadel
▌Difficulty: Medium
Question
Compare and contrast Gaussian Naive Bayes (GNB) and logistic regression. When would you use one over the other?

Answer
Both Gaussian naive Bayes (GNB) and logistic regression can be used for classification. Each model has advantages and disadvantages, and these determine which one to choose under what circumstances. They are discussed below, along with the models' similarities and differences:
Advantages:  
1. GNB requires only a small number of observations to be adequately trained; it is also easy to use and reasonably fast to implement; interpretation of the results produced by GNB can also be highly useful.
2. Logistic regression has a simple interpretation in terms of class probabilities, and it allows inferences to be made about features (i.e., variables) and identification of the most relevant of these with respect to prediction.
Disadvantages:  
1. By assuming features (i.e., variables) to be independent, GNB can be wrongly employed in problems where that assumption does not hold true, a very common occurrence.
2. Not being highly flexible, logistic regression may fail to capture interactions between features and so may lose predictive power. It can also overfit if very little data is available for training.
Differences:
1. Since logistic regression directly learns $P(y \mid x)$, it is a discriminative classifier, whereas GNB directly estimates $P(y)$ and $P(x \mid y)$ and so is a generative classifier.
2. Logistic regression requires an optimization setup (where weights cannot be learned directly through counts), whereas GNB requires no such setup.
Similarities:
1. Both methods are linear decision functions generated from the training data.
2. GNB's implied $P(y \mid x)$ has the same form as that of logistic regression (but with particular parameters).
Given these advantages and disadvantages, logistic regression would be preferable when training data size is not an issue, since GNB's assumption of conditional independence breaks down if features are correlated. However, in cases where training data are limited or the data-generating process includes strong priors, GNB may be preferable.
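As a quick illustration, here is a minimal sketch (not part of the original answer) that compares the two models with scikit-learn on synthetic data whose redundant features deliberately violate GNB's independence assumption; the dataset sizes and parameters are illustrative choices:

```python
# Sketch: compare GNB and logistic regression on synthetic data with
# correlated features, where GNB's independence assumption is violated.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Redundant features are linear combinations of informative ones, so the
# conditional-independence assumption behind GNB does not hold here.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=3,
                           n_redundant=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for name, model in [("GNB", GaussianNB()),
                    ("LogReg", LogisticRegression(max_iter=1000))]:
    acc = model.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name}: test accuracy = {acc:.3f}")

# With ample data and correlated features, logistic regression typically
# wins; refitting on a small slice of X_tr lets GNB close the gap.
```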
---
▌Source: Two Sigma
▌Difficulty: Hard
Question
Describe the kernel trick in SVMs and give a simple example. How do you decide what kernel to choose?

Answer
The idea behind the kernel trick is that data that cannot be separated by a hyperplane in its current dimensionality may become linearly separable when projected into a higher-dimensional space. We can take any data and map it to a higher dimension through a variety of functions $\phi$. However, if $\phi$ is difficult to compute, then we have a problem; instead, it is desirable to compute the kernel value $k(x, z) = \phi(x)^\top \phi(z)$ directly, without blowing up the computation. For instance, say we have two examples $x, z \in \mathbb{R}^d$ and want to map them to a quadratic space.
We have the following feature map, consisting of all pairwise products of the coordinates:
$$\phi(x) = (x_1 x_1,\, x_1 x_2,\, \ldots,\, x_d x_d)$$
and we can use the following kernel, computed entirely in the original space:
$$k(x, z) = (x^\top z)^2 = \sum_{i=1}^{d}\sum_{j=1}^{d}(x_i x_j)(z_i z_j) = \phi(x)^\top \phi(z)$$
If we now change $n = 2$ (quadratic) to arbitrary $n$, we can have arbitrarily complex $\phi$. As long as we perform computations in the original feature space (without an explicit feature transformation), we avoid the long compute time while still effectively mapping our data to a higher dimension!
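To make the equivalence concrete, here is a small numerical check (a sketch assuming the quadratic feature map $\phi$ described above):

```python
# Sketch: verify that the quadratic kernel (x.z)^2 equals the inner
# product of explicit quadratic feature maps phi(x) and phi(z).
import numpy as np

def phi(x):
    """Explicit quadratic feature map: all pairwise products x_i * x_j."""
    return np.outer(x, x).ravel()

rng = np.random.default_rng(0)
x, z = rng.normal(size=3), rng.normal(size=3)

explicit = phi(x) @ phi(z)           # O(d^2) explicit features
kernel   = (x @ z) ** 2              # O(d) work in the original space
print(np.isclose(explicit, kernel))  # True
```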
In terms of which kernel to choose, we can choose between linear and nonlinear kernels, and these will be for linear and nonlinear problems, respectively. For linear problems, we can use a linear kernel. For nonlinear problems, a good default is the radial basis function (RBF, i.e., Gaussian) kernel; polynomial kernels are another option. In real-life problems, domain knowledge can be handy; in the absence of such knowledge, these defaults are good starting points.
We could also try several kernels, set up a hyper-parameter search (a grid search, for example), and compare them to one another. Based on the loss function at hand, or on performance metrics (accuracy, F1, AUC of the ROC curve, etc.), we can determine which kernel is appropriate.
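A sketch of such a search using scikit-learn's SVC (the dataset, kernel candidates, and grid values are illustrative choices, not prescribed by the answer):

```python
# Sketch: grid-search over SVM kernels and hyper-parameters, scoring
# each candidate by cross-validated accuracy.
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# A nonlinearly separable toy problem.
X, y = make_moons(n_samples=500, noise=0.2, random_state=0)

param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": ["scale", 0.1, 1]},
    {"kernel": ["poly"], "C": [1], "degree": [2, 3]},
]
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, f"CV accuracy = {search.best_score_:.3f}")
```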
---
▌Source: Morgan Stanley
▌Difficulty: Hard
Question
Say we have N observations for some variable which we model as being drawn from a Gaussian distribution. What are your best guesses for the parameters of the distribution?

Answer
Assume we have some dataset $X$ consisting of $n$ i.i.d. observations $x_1, \ldots, x_n \sim \mathcal{N}(\mu, \sigma^2)$.
Our likelihood function is $L(\mu, \sigma^2) = \prod_{i=1}^{n} f(x_i \mid \mu, \sigma^2)$, where
$$f(x_i \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right)$$
and therefore the log-likelihood is given by:
$$\ell(\mu, \sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2$$
Taking the derivative of the log-likelihood with respect to $\mu$ and setting the result to 0 yields the following:
$$\frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu) = 0$$
Simplifying the result yields $\sum_{i=1}^{n} x_i = n\mu$, and therefore the maximum likelihood estimate for $\mu$ is given by:
$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
To obtain the variance, we take the derivative of the log-likelihood with respect to $\sigma^2$ and set the result equal to 0:
$$\frac{\partial \ell}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i-\mu)^2 = 0$$
Simplifying yields the following:
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\hat{\mu})^2$$
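As a quick sanity check of these estimators, here is a minimal sketch (the simulated sample and use of scipy are illustrative) showing that the closed-form MLEs match a numerical maximization of the log-likelihood:

```python
# Sketch: closed-form Gaussian MLEs vs. a numerical optimum of the
# log-likelihood.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)

# Closed-form MLEs derived above (note: 1/n variance, not 1/(n-1)).
mu_hat = x.mean()
var_hat = ((x - mu_hat) ** 2).mean()

# Numerical MLE: minimize the negative log-likelihood over (mu, sigma^2).
nll = lambda p: -norm.logpdf(x, loc=p[0], scale=np.sqrt(p[1])).sum()
res = minimize(nll, x0=[0.0, 1.0], bounds=[(None, None), (1e-6, None)])

print(mu_hat, var_hat)  # closed form
print(res.x)            # numerical optimum (approximately equal)
```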
---
▌Source: Two Sigma
▌Difficulty: Hard
Question
Suppose you are running a linear regression and model the error terms as being normally distributed. Show that, in this setup, maximizing the likelihood of the data is equivalent to minimizing the sum of the squared residuals. 

Answer
In matrix form, we assume $y$ is distributed as a multivariate Gaussian:
$$y \sim \mathcal{N}(X\beta, \sigma^2 I)$$
The likelihood of $y$ given the above is
$$L(\beta) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\!\left(-\frac{1}{2\sigma^2}(y - X\beta)^\top(y - X\beta)\right)$$
of which we can take the log in order to optimize:
$$\log L(\beta) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}(y - X\beta)^\top(y - X\beta)$$
Note that, when taking a derivative with respect to $\beta$, the first term is a constant, so we can ignore it, making our optimization problem the following:
$$\max_{\beta}\; -\frac{1}{2\sigma^2}(y - X\beta)^\top(y - X\beta)$$
We can drop the constant factor and flip the sign to rewrite this as:
$$\min_{\beta}\; (y - X\beta)^\top(y - X\beta) = \min_{\beta} \sum_{i=1}^{n}(y_i - x_i^\top\beta)^2,$$
which is exactly equivalent to minimizing the sum of the squared residuals.
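The equivalence can also be checked numerically; a minimal sketch (the simulated design matrix, coefficients, and noise scale are illustrative):

```python
# Sketch: the beta that minimizes the sum of squared residuals is the
# same beta that maximizes the Gaussian likelihood of the data.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=1.0, size=n)

# Least squares: minimize ||y - X beta||^2 in closed form.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# MLE: maximize the Gaussian log-likelihood of the residuals
# (sigma held fixed at 1 here, since it does not affect the argmax).
nll = lambda b: -norm.logpdf(y - X @ b, scale=1.0).sum()
beta_mle = minimize(nll, x0=np.zeros(d)).x

print(np.allclose(beta_ols, beta_mle, atol=1e-4))  # True
```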
---
▌Source: Citadel
▌Difficulty: Hard
Question
Describe the model formulation behind logistic regression. How do you maximize the log-likelihood of a given model (using the two-class case)?

Answer
Logistic regression aims to classify $x$ into one of $K$ classes by modeling the log-odds of each class against a reference class $K$ as a linear function:
$$\log\frac{P(y = k \mid x)}{P(y = K \mid x)} = \beta_k^\top x, \quad k = 1, \ldots, K-1$$
Therefore, the model is equivalent to the following, where the denominator normalizes the numerator over the $K$ classes:
$$P(y = k \mid x) = \frac{\exp(\beta_k^\top x)}{1 + \sum_{j=1}^{K-1}\exp(\beta_j^\top x)}$$
The log-likelihood over $n$ observations, in general, is the following:
$$\ell(\beta) = \sum_{i=1}^{n} \log P(y_i \mid x_i;\, \beta)$$
For the two-class case, code the classes as $y_i = 1$ (class 1) and $y_i = 0$ (class 2). Then we have the following:
$$P(y = 1 \mid x) = \frac{\exp(\beta^\top x)}{1 + \exp(\beta^\top x)}, \qquad P(y = 0 \mid x) = \frac{1}{1 + \exp(\beta^\top x)}$$
Writing $p(x_i;\beta) = P(y_i = 1 \mid x_i;\, \beta)$, the log-likelihood can be written as follows:
$$\ell(\beta) = \sum_{i=1}^{n}\left[\, y_i \log p(x_i;\beta) + (1 - y_i)\log\big(1 - p(x_i;\beta)\big)\right]$$
Simplifying yields the following:
$$\ell(\beta) = \sum_{i=1}^{n}\left[\, y_i \log\frac{p(x_i;\beta)}{1 - p(x_i;\beta)} + \log\big(1 - p(x_i;\beta)\big)\right]$$
Substituting for the probabilities yields the following:
$$\ell(\beta) = \sum_{i=1}^{n}\left[\, y_i\, \beta^\top x_i - \log\big(1 + \exp(\beta^\top x_i)\big)\right]$$
To maximize this log-likelihood, take the derivative and set it equal to 0:
$$\frac{\partial \ell}{\partial \beta} = \sum_{i=1}^{n} x_i\big(y_i - p(x_i;\beta)\big) = 0$$
We note that
$$\frac{\partial}{\partial \beta}\log\big(1 + \exp(\beta^\top x_i)\big) = \frac{\exp(\beta^\top x_i)}{1 + \exp(\beta^\top x_i)}\, x_i = p(x_i;\beta)\, x_i,$$
which is equivalent to the latter half of the above expression.
These equations have no closed-form solution, however, so they must be solved iteratively (e.g., by Newton-Raphson or gradient ascent) until convergence.
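For concreteness, a minimal sketch of such an iterative scheme, using plain gradient ascent on the log-likelihood derived above (the simulated data, step size, and iteration count are illustrative):

```python
# Sketch: fit two-class logistic regression by gradient ascent on
# l(beta) = sum_i [ y_i * b.x_i - log(1 + exp(b.x_i)) ].
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 3
X = rng.normal(size=(n, d))
beta_true = np.array([1.5, -1.0, 0.5])
y = (rng.random(n) < 1 / (1 + np.exp(-X @ beta_true))).astype(float)

beta = np.zeros(d)
lr = 0.1
for _ in range(500):
    p = 1 / (1 + np.exp(-X @ beta))  # p(x_i; beta)
    grad = X.T @ (y - p) / n         # gradient of the mean log-likelihood
    beta += lr * grad                # ascend: maximize l(beta)

print(beta)  # close to beta_true, up to sampling noise
```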
---
Related Reading
Crush the Machine Learning Interview!
All substance! A Quant currently at Citadel shares job-hunting experience
G-Research: Real Quant Researcher Interview Questions
We did our best! Answers to G-Research's real quant interview questions are out!
Quant Puzzle: A Premium Treat!
Exclusive! China Quant Private Fund Interview Q&A Series: 鸣石投资 (Mingshi Investment)
Exclusive! China Quant Private Fund Interview Q&A Series: 白鹭资管 (Egret Asset Management)
Quant Job-Hunting Series: Jane Street Brain-Burning Puzzles (2019-2020)
Two Sigma: The Interview Is Pretty Hard (Interview Experience Included)!
How Many Can You Solve? Jane Street Brain-Burning Interview Questions!
Exclusive! A Roundup of LeetCode Interview Questions from Top Global Hedge Funds