| donnemartin/data-science-ipython-notebooks | 25,668 | | 0 | 0 | over 2 years ago | 0 | | 34 | other | Python | Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines. |
| Yelp/mrjob | 2,584 | | 112 | 2 | over 3 years ago | 62 | December 15, 2021 | 211 | other | Python | Run MapReduce jobs on Hadoop or Amazon Web Services |
| HariSekhon/DevOps-Bash-tools | 2,224 | | 0 | 0 | about 2 years ago | 0 | | 5 | mit | Shell | 1000+ DevOps Bash Scripts - AWS, GCP, Kubernetes, Docker, CI/CD, APIs, SQL, PostgreSQL, MySQL, Hive, Impala, Kafka, Hadoop, Jenkins, GitHub, GitLab, BitBucket, Azure DevOps, TeamCity, Spotify, MP3, LDAP, Code/Build Linting, pkg mgmt for Linux, Mac, Python, Perl, Ruby, NodeJS, Golang, Advanced dotfiles: .bashrc, .vimrc, .gitconfig, .screenrc, tmux.. |
| HariSekhon/Nagios-Plugins | 1,111 | | 0 | 0 | over 2 years ago | 0 | | 71 | other | Python | 450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc... |
| HariSekhon/DevOps-Python-tools | 709 | | 0 | 0 | over 2 years ago | 0 | | 37 | mit | Python | 80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc. |
| nchammas/flintrock | 627 | | 4 | 0 | over 2 years ago | 14 | November 27, 2023 | 36 | apache-2.0 | Python | A command-line tool for launching Apache Spark clusters. |
| awslabs/aws-glue-libs | 568 | | 0 | 0 | over 2 years ago | 0 | | 96 | other | Python | AWS Glue Libraries are additions and enhancements to Spark for ETL operations. |
| OBenner/data-engineering-interview-questions | 554 | | 0 | 0 | over 2 years ago | 0 | | 0 | | | 2000+ data engineer interview questions. |
| databricks/spark-redshift | 514 | | 4 | 1 | over 6 years ago | 10 | November 01, 2016 | 134 | apache-2.0 | Scala | Redshift data source for Apache Spark |
| hortonworks/cloudbreak | 348 | | 0 | 0 | about 2 years ago | 0 | | 41 | apache-2.0 | Java | CDP Public Cloud is an integrated analytics and data management platform deployed on cloud services. It offers broad data analytics and artificial intelligence functionality along with secure user access and data governance features. |