| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| The power of depth for feedforward neural networks | R. Eldan, O. Shamir | Conference on Learning Theory, 907-940 | 829 | 2016 |
| Learnability, stability and uniform convergence | S. Shalev-Shwartz, O. Shamir, N. Srebro, K. Sridharan | Journal of Machine Learning Research 11, 2635-2670 | 745* | 2010 |
| Optimal distributed online prediction using mini-batches | O. Dekel, R. Gilad-Bachrach, O. Shamir, L. Xiao | Journal of Machine Learning Research 13 (1) | 697 | 2012 |
| Making gradient descent optimal for strongly convex stochastic optimization | A. Rakhlin, O. Shamir, K. Sridharan | arXiv preprint arXiv:1109.5647 | 678 | 2011 |
| Stochastic gradient descent for non-smooth optimization: convergence results and optimal averaging schemes | O. Shamir, T. Zhang | International Conference on Machine Learning, 71-79 | 562 | 2013 |
| Communication-efficient distributed optimization using an approximate Newton-type method | O. Shamir, N. Srebro, T. Zhang | International Conference on Machine Learning, 1000-1008 | 538 | 2014 |
| On the computational efficiency of training neural networks | R. Livni, S. Shalev-Shwartz, O. Shamir | Advances in Neural Information Processing Systems 27 | 534 | 2014 |
| Size-independent sample complexity of neural networks | N. Golowich, A. Rakhlin, O. Shamir | Conference on Learning Theory, 297-299 | 441 | 2018 |
| Better mini-batch algorithms via accelerated gradient methods | A. Cotter, O. Shamir, N. Srebro, K. Sridharan | Advances in Neural Information Processing Systems 24 | 350 | 2011 |
| Adaptively learning the crowd kernel | O. Tamuz, C. Liu, S. Belongie, O. Shamir, A. T. Kalai | arXiv preprint arXiv:1105.1033 | 298 | 2011 |
| Nonstochastic multi-armed bandits with graph-structured feedback | N. Alon, N. Cesa-Bianchi, C. Gentile, S. Mannor, Y. Mansour, O. Shamir | SIAM Journal on Computing 46 (6), 1785-1826 | 258* | 2017 |
| Spurious local minima are common in two-layer ReLU neural networks | I. Safran, O. Shamir | International Conference on Machine Learning, 4433-4441 | 251 | 2018 |
| Learning and generalization with the information bottleneck | O. Shamir, S. Sabato, N. Tishby | Theoretical Computer Science 411 (29-30), 2696-2711 | 210 | 2010 |
| Depth-width tradeoffs in approximating natural functions with neural networks | I. Safran, O. Shamir | International Conference on Machine Learning, 2979-2987 | 203* | 2017 |
| An optimal algorithm for bandit and zero-order convex optimization with two-point feedback | O. Shamir | Journal of Machine Learning Research 18 (1), 1703-1713 | 195 | 2017 |
| Communication complexity of distributed convex learning and optimization | Y. Arjevani, O. Shamir | Advances in Neural Information Processing Systems 28 | 194 | 2015 |
| Learning to classify with missing and corrupted features | O. Dekel, O. Shamir | Proceedings of the 25th International Conference on Machine Learning, 216-223 | 194 | 2008 |
| On the complexity of bandit and derivative-free stochastic convex optimization | O. Shamir | Conference on Learning Theory, 3-24 | 187 | 2013 |
| Proving the lottery ticket hypothesis: pruning is all you need | E. Malach, G. Yehudai, S. Shalev-Shwartz, O. Shamir | International Conference on Machine Learning, 6682-6691 | 186 | 2020 |
| Failures of gradient-based deep learning | S. Shalev-Shwartz, O. Shamir, S. Shammah | International Conference on Machine Learning, 3067-3075 | 175 | 2017 |