Archives: big data

column-oriented DB

Free and open-source columnar DBs

Database Name | Implementation Language | Notes
Apache Druid | Java | started in 2011 for low-latency massive ingestion and queries
Apache Kudu | C++ | released in 2016 to complete the Apache Hadoop ecosystem
Apache Pinot | Java | open sourced in 2015 for real-time low-latency analytics
Calpont InfiniDB | C++ |
ClickHouse | C++ | released in 2016 to …

Cassandra Query

Time-series DB using Cassandra

Try it out:

cqlsh:demo> CREATE TABLE demo.users3 (lastname text, firstname text, time timestamp, PRIMARY KEY (lastname, time));
cqlsh:demo> INSERT INTO demo.users3 (lastname, firstname, time) VALUES ('test1', 'testfir', 164447) USING TTL 20;
cqlsh:demo> SELECT firstname FROM demo.users3;

Data Modeling

However, in Cassandra, the data access queries …
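To make the data model of the demo table concrete, here is a toy in-memory sketch in Python. It is an illustration under stated assumptions, not how Cassandra stores data: with PRIMARY KEY (lastname, time), lastname is the partition key that groups rows, time is the clustering column that orders rows inside a partition (ascending by default), writing the same key again overwrites the row, and USING TTL 20 hides the row 20 seconds after it was written. The names ToyTimeSeriesTable and FakeClock are hypothetical.

```python
# Toy model of demo.users3 with PRIMARY KEY (lastname, time).
# Illustration only: not how Cassandra actually stores or expires data.
import time


class FakeClock:
    """Hypothetical controllable clock so TTL expiry can be demonstrated."""

    def __init__(self, now=0.0):
        self.now = now

    def __call__(self):
        return self.now


class ToyTimeSeriesTable:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        # partition key (lastname) -> {clustering key (time): (firstname, expires_at)}
        self._partitions = {}

    def insert(self, lastname, firstname, ts, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        # Writing the same (lastname, time) again overwrites the row (upsert).
        self._partitions.setdefault(lastname, {})[ts] = (firstname, expires_at)

    def select(self, lastname):
        """Return live (time, firstname) rows of one partition, ordered by time."""
        now = self._clock()
        rows = self._partitions.get(lastname, {})
        return [
            (ts, rows[ts][0])
            for ts in sorted(rows)
            if rows[ts][1] is None or rows[ts][1] > now
        ]


clock = FakeClock()
table = ToyTimeSeriesTable(clock)
table.insert("test1", "testfir", 164447, ttl=20)
table.insert("test1", "other", 100)
print(table.select("test1"))  # [(100, 'other'), (164447, 'testfir')]
clock.now = 21  # 21 seconds later the TTL'd row has expired
print(table.select("test1"))  # [(100, 'other')]
```

Because reads are served per partition and already sorted by the clustering column, a query such as "all rows for lastname = 'test1', ordered by time" is cheap, which is why Cassandra tables are usually designed around the queries they must answer.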

bigdata OLTP, OLAP

Row-based vs column-based DB or format
Row-based -> good for OLTP (transactions), e.g. Cassandra
Column-based -> good for OLAP (analytics: aggregations only need to read the columns they use), e.g. Druid, Parquet (a column-based data format):
Hadoop: big data storage. What are the alternatives? S3 in the cloud?
Pinot vs Cassandra …
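The row-vs-column trade-off can be sketched in plain Python. This is a toy model under stated assumptions: the records and field names (id, region, amount) are hypothetical, and Python lists stand in for on-disk pages. An OLTP-style point lookup wants one whole record at once (row layout), while an OLAP-style aggregate over one field only needs that field's values (column layout).

```python
# Hypothetical records, used only to contrast the two layouts.
rows = [
    {"id": 1, "region": "us", "amount": 120.0},
    {"id": 2, "region": "eu", "amount": 80.0},
    {"id": 3, "region": "us", "amount": 45.5},
]


def to_columns(records):
    """Pivot row-oriented records into a column-oriented dict of lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)


def get_by_id_row_layout(records, record_id):
    # OLTP-style point lookup: all fields of the record are stored together.
    return next(r for r in records if r["id"] == record_id)


def total_amount_row_layout(records):
    # Row layout: iterate full records; on disk every field would be read.
    return sum(r["amount"] for r in records)


def total_amount_column_layout(cols):
    # Column layout: scan only the 'amount' column; other columns are untouched.
    return sum(cols["amount"])


print(get_by_id_row_layout(rows, 2))        # {'id': 2, 'region': 'eu', 'amount': 80.0}
print(total_amount_row_layout(rows))        # 245.5
print(total_amount_column_layout(columns))  # 245.5
```

Both layouts give the same answer; the difference is how much data has to be touched. Columnar formats such as Parquet also tend to compress well, because each column stores values of a single type next to each other.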

bigdata platform with Kubernetes or Hadoop

Hadoop vs Kubernetes:

Component | Hadoop | Kubernetes
Compute | MapReduce | Spark on K8s, Flink (stream)
Storage | HDFS | S3? Any better one?
Resource manager | YARN/Mesos | K8s itself

During its evolution phase, Hadoop provided three main functionalities that made it a Big Data-ready solution: a distributed compute mechanism (MapReduce), robust data storage (HDFS), and a resource manager (YARN/Mesos). But modern technologies now …
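The MapReduce model mentioned above can be illustrated with the classic word count. This is a single-process Python sketch of the programming model only: a map step that emits key/value pairs, a shuffle that groups values by key, and a reduce step per key. The function names are hypothetical; a real Hadoop job would run many map and reduce tasks in parallel across the cluster and read its input from HDFS.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key."""
    return key, sum(values)


def word_count(lines):
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())


print(word_count(["big data on Hadoop", "big data on Kubernetes"]))
# {'big': 2, 'data': 2, 'on': 2, 'hadoop': 1, 'kubernetes': 1}
```

Engines such as Spark keep the same map/shuffle/reduce idea but express it through higher-level operators, which is part of why they can run on either YARN or Kubernetes as the resource manager.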

Apache Kafka big picture and quick start

What is Apache Kafka? (big picture)

I found that the article by Jay Kreps presents a very good big picture of what Kafka is supposed to do: you can use Kafka to build a stream data platform. Here are the pictures from that article. The big idea is simple: many business processes can be modeled …
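At its core, Kafka stores each topic partition as an append-only log: producers append records, each record gets a sequential offset, and every consumer tracks its own offset, so many independent readers can consume the same stream. The Python sketch that follows is a toy, in-memory model of that idea under stated assumptions; ToyLog, ToyConsumer and the event names are hypothetical and not part of any Kafka client API.

```python
class ToyLog:
    """Append-only log: records are never modified, only appended (toy model)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset assigned to the new record

    def read(self, offset, max_records=10):
        return self._records[offset:offset + max_records]


class ToyConsumer:
    """Each consumer keeps its own offset into the shared log."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_records=10):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)  # advance past what was just read
        return batch


orders = ToyLog()
orders.append("order_created")
orders.append("order_paid")

billing = ToyConsumer(orders)
analytics = ToyConsumer(orders)
print(billing.poll())     # ['order_created', 'order_paid']
print(analytics.poll(1))  # ['order_created']
orders.append("order_shipped")
print(billing.poll())     # ['order_shipped']
print(analytics.poll())   # ['order_paid', 'order_shipped']
```

The two consumers read at different speeds without affecting each other. In real Kafka the log is also partitioned and replicated across brokers, and records are kept according to a retention policy rather than forever.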